author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-07-24 09:54:23 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-07-24 09:54:44 +0000 |
commit | 836b47cb7e99a977c5a23b059ca1d0b5065d310e (patch) | |
tree | 1604da8f482d02effa033c94a84be42bc0c848c3 /src/web/api/queries | |
parent | Releasing debian version 1.44.3-2. (diff) | |
Merging upstream version 1.46.3.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/web/api/queries')
43 files changed, 8668 insertions, 0 deletions
diff --git a/src/web/api/queries/README.md b/src/web/api/queries/README.md new file mode 100644 index 000000000..c9025f752 --- /dev/null +++ b/src/web/api/queries/README.md @@ -0,0 +1,181 @@ +# Database queries/lookup + +This document explains in detail the options available to retrieve data from the Netdata timeseries database in order to configure alerts, create badges or +create custom charts. + +The Netdata database can be queried with the `/api/v1/data` and `/api/v1/badge.svg` REST API methods. The database is also queried from the `lookup` line +in an [alert configuration](/src/health/REFERENCE.md). + +Every data query accepts the following parameters: + +|name|required|description| +|:--:|:------:|:----------| +|`chart`|yes|The chart to be queried.| +|`points`|no|The number of points to be returned. Netdata can reduce number of points by applying query grouping methods. If not given, the result will have the same granularity as the database (although this relates to `gtime`).| +|`before`|no|The absolute timestamp or the relative (to now) time the query should finish evaluating data. If not given, it defaults to the timestamp of the latest point in the database.| +|`after`|no|The absolute timestamp or the relative (to `before`) time the query should start evaluating data. if not given, it defaults to the timestamp of the oldest point in the database.| +|`group`|no|The grouping method to use when reducing the points the database has. If not given, it defaults to `average`.| +|`gtime`|no|A resampling period to change the units of the metrics (i.e. setting this to `60` will convert `per second` metrics to `per minute`. If not given it defaults to granularity of the database.| +|`options`|no|A bitmap of options that can affect the operation of the query. Only 2 options are used by the query engine: `unaligned` and `percentage`. All the other options are used by the output formatters. 
The default is to return aligned data.| +|`dimensions`|no|A simple pattern to filter the dimensions to be queried. The default is to return all the dimensions of the chart.| + +## Operation + +The query engine works as follows (in this order): + +#### Time-frame + +`after` and `before` define a time-frame, accepting: + +- **absolute timestamps** (unix timestamps, i.e. seconds since epoch). + +- **relative timestamps**: + + `before` is relative to now and `after` is relative to `before`. + + Example: `before=-60&after=-60` evaluates to the time-frame from -120 up to -60 seconds in + the past, relative to the latest entry of the database of the chart. + +The engine verifies that the time-frame requested is available at the database: + +- If the requested time-frame overlaps with the database, the excess requested + will be truncated. + +- If the requested time-frame does not overlap with the database, the engine will + return an empty data set. + +At the end of this operation, `after` and `before` are absolute timestamps. + +#### Data grouping + +Database points grouping is applied when the caller requests a time-frame to be +expressed with fewer points, compared to what is available at the database. + +There are 2 uses that enable this feature: + +- The caller requests a specific number of `points` to be returned. + + For example, for a time-frame of 10 minutes, the database has 600 points (1/sec), + while the caller requested these 10 minutes to be expressed in 200 points. + + This feature is used by Netdata dashboards when you zoom-out the charts. + The dashboard is requesting the number of points the user's screen has. + This saves bandwidth and speeds up the browser (fewer points to evaluate for drawing the charts). +- The caller requests a **re-sampling** of the database, by setting `gtime` to any value + above the granularity of the chart. + + For example, the chart's units is `requests/sec` and caller wants `requests/min`. 
+ +Using `points` and `gtime` the query engine tries to find a best fit for **database-points** +vs **result-points** (we call this ratio `group points`). It always tries to keep `group points` +an integer. Keep in mind the query engine may shift `after` if required. See also the [example](#example). + +#### Time-frame Alignment + +Alignment is a very important aspect of Netdata queries. Without it, the animated +charts on the dashboards would constantly [change shape](#example) during incremental updates. + +To provide consistent grouping through time, the query engine (by default) aligns +`after` and `before` to be a multiple of `group points`. + +For example, if `group points` is 60 and alignment is enabled, the engine will return +each point with durations XX:XX:00 - XX:XX:59, matching whole minutes. + +To disable alignment, pass `&options=unaligned` to the query. + +#### Query Execution + +To execute the query, the engine evaluates all dimensions of the chart, one after another. + +The engine does not evaluate dimensions that do not match the [simple pattern](/src/libnetdata/simple_pattern/README.md) +given at the `dimensions` parameter, except when `options=percentage` is given (this option +requires all the dimensions to be evaluated to find the percentage of each dimension vs to chart +total). + +For each dimension, it starts evaluating values starting at `after` (not inclusive) towards +`before` (inclusive). + +For each value it calls the **grouping method** given with the `&group=` query parameter +(the default is `average`). + +## Grouping methods + +The following grouping methods are supported. These are given all the values in the time-frame +and they group the values every `group points`. 
+ +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=net.eth0&options=unaligned&dimensions=received&group=min&after=-60&label=min&value_color=blue) finds the minimum value +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=net.eth0&options=unaligned&dimensions=received&group=max&after=-60&label=max&value_color=lightblue) finds the maximum value +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=net.eth0&options=unaligned&dimensions=received&group=average&after=-60&label=average&value_color=yellow) finds the average value +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=net.eth0&options=unaligned&dimensions=received&group=sum&units=kilobits&after=-60&label=sum&value_color=orange) adds all the values and returns the sum +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=net.eth0&options=unaligned&dimensions=received&group=median&after=-60&label=median&value_color=red) sorts the values and returns the value in the middle of the list +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=net.eth0&options=unaligned&dimensions=received&group=stddev&after=-60&label=stddev&value_color=green) finds the standard deviation of the values +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=net.eth0&options=unaligned&dimensions=received&group=cv&after=-60&label=cv&units=pcent&value_color=yellow) finds the relative standard deviation (coefficient of variation) of the values +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=net.eth0&options=unaligned&dimensions=received&group=ses&after=-60&label=ses&value_color=brown) finds the exponential weighted moving average of the values +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=net.eth0&options=unaligned&dimensions=received&group=des&after=-60&label=des&value_color=blue) applies Holt-Winters double exponential smoothing +- 
![](https://registry.my-netdata.io/api/v1/badge.svg?chart=net.eth0&options=unaligned&dimensions=received&group=incremental_sum&after=-60&label=incremental_sum&value_color=red) finds the difference of the last vs the first value + +The examples shown above show live information from the `received` traffic on the `eth0` interface of the global Netdata registry. +Inspect any of the badges to see the parameters provided. You can directly issue the request to the registry server's API yourself, e.g. by +passing the following to get the value shown on the badge for the sum of the values within the period: + +``` +https://registry.my-netdata.io/api/v1/data?chart=net.eth0&options=unaligned&dimensions=received&group=sum&units=kilobits&after=-60&label=sum&points=1 +``` + +## Further processing + +The result of the query engine is always a structure that has dimensions and values +for each dimension. + +Formatting modules are then used to convert this result in many different formats and return it +to the caller. + +## Performance + +The query engine is highly optimized for speed. Most of its modules implement "online" +versions of the algorithms, requiring just one pass on the database values to produce +the result. + +## Example + +When Netdata is reducing metrics, it tries to return always the same boundaries. So, if we want 10s averages, it will always return points starting at a `unix timestamp % 10 = 0`. + +Let's see why this is needed, by looking at the error case. + +Assume we have 5 points: + +|time|value| +|:--:|:---:| +|00:01|1| +|00:02|2| +|00:03|3| +|00:04|4| +|00:05|5| + +At 00:04 you ask for 2 points for 4 seconds in the past. So `group = 2`. Netdata would return: + +|point|time|value| +|:---:|:--:|:---:| +|1|00:01 - 00:02|1.5| +|2|00:03 - 00:04|3.5| + +A second later the chart is to be refreshed, and makes again the same request at 00:05. 
These are the points that would have been returned: + +|point|time|value| +|:---:|:--:|:---:| +|1|00:02 - 00:03|2.5| +|2|00:04 - 00:05|4.5| + +**Wait a moment!** The chart was shifted just one point and it changed value! Point 2 was 3.5 and when shifted to point 1 is 2.5! If you see this in a chart, it's a mess. The charts change shape constantly. + +For this reason, Netdata always aligns the data it returns to the `group`. + +When you request `points=1`, Netdata understands that you need 1 point for the whole database, so `group = 3600`. Then it tries to find the starting point which would be `timestamp % 3600 = 0` Within a database of 3600 seconds, there is one such point for sure. Then it tries to find the average of 3600 points. But, most probably it will not find 3600 of them (for just 1 out of 3600 seconds this query will return something). + +So, the proper way to query the database is to also set at least `after`. The following call will returns 1 point for the last complete 10-second duration (it starts at `timestamp % 10 = 0`): + +<http://netdata.firehol.org/api/v1/data?chart=system.cpu&points=1&after=-10&options=seconds> + +When you keep calling this URL, you will see that it returns one new value every 10 seconds, and the timestamp always ends with zero. Similarly, if you say `points=1&after=-5` it will always return timestamps ending with 0 or 5. + + diff --git a/src/web/api/queries/average/README.md b/src/web/api/queries/average/README.md new file mode 100644 index 000000000..1ad78bee5 --- /dev/null +++ b/src/web/api/queries/average/README.md @@ -0,0 +1,50 @@ +<!-- +title: "Average or Mean" +sidebar_label: "Average or Mean" +custom_edit_url: https://github.com/netdata/netdata/edit/master/src/web/api/queries/average/README.md +learn_status: "Published" +learn_topic_type: "References" +learn_rel_path: "Developers/Web/Api/Queries" +--> + +# Average or Mean + +> This query is available as `average` and `mean`. 
+ +An average is a single number taken as representative of a list of numbers. + +It is calculated as: + +``` +average = sum(numbers) / count(numbers) +``` + +## how to use + +Use it in alerts like this: + +``` + alarm: my_alert + on: my_chart +lookup: average -1m unaligned of my_dimension + warn: $this > 1000 +``` + +`average` does not change the units. For example, if the chart units is `requests/sec`, the result +will be again expressed in the same units. + +It can also be used in APIs and badges as `&group=average` in the URL. + +## Examples + +Examining last 1 minute `successful` web server responses: + +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=min&after=-60&label=min) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=average&after=-60&label=average&value_color=orange) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=max&after=-60&label=max) + +## References + +- <https://en.wikipedia.org/wiki/Average>. 
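Editor's illustration (not part of the commit): the one-pass average described in this README can be sketched as below. This is a minimal standalone sketch, assuming plain `double` in place of `NETDATA_DOUBLE`; the names `avg_state`, `avg_add` and `avg_flush` are hypothetical, and the resampling branch of the real header is left out.

```c
#include <stddef.h>

/* Hypothetical names; plain double stands in for NETDATA_DOUBLE. */
struct avg_state {
    double sum;    /* running sum of the values seen since the last flush */
    size_t count;  /* number of values seen since the last flush */
};

/* Accumulate one value; a single pass over the data is enough. */
static void avg_add(struct avg_state *s, double value) {
    s->sum += value;
    s->count++;
}

/* Return the mean of the accumulated values and reset the state.
 * *empty is set to 1 when nothing was accumulated, 0 otherwise. */
static double avg_flush(struct avg_state *s, int *empty) {
    double result = 0.0;

    *empty = (s->count == 0);
    if (s->count)
        result = s->sum / (double)s->count;

    s->sum = 0.0;
    s->count = 0;
    return result;
}
```

Adding values and flushing once per group mirrors the add/flush split used by the grouping headers in this diff, with the empty case reported separately instead of dividing by zero.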
+ + diff --git a/src/web/api/queries/average/average.c b/src/web/api/queries/average/average.c new file mode 100644 index 000000000..f54dcb243 --- /dev/null +++ b/src/web/api/queries/average/average.c @@ -0,0 +1,4 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "average.h" + diff --git a/src/web/api/queries/average/average.h b/src/web/api/queries/average/average.h new file mode 100644 index 000000000..2d77cc571 --- /dev/null +++ b/src/web/api/queries/average/average.h @@ -0,0 +1,62 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_QUERY_AVERAGE_H +#define NETDATA_API_QUERY_AVERAGE_H + +#include "../query.h" +#include "../rrdr.h" + +// ---------------------------------------------------------------------------- +// average + +struct tg_average { + NETDATA_DOUBLE sum; + size_t count; +}; + +static inline void tg_average_create(RRDR *r, const char *options __maybe_unused) { + r->time_grouping.data = onewayalloc_callocz(r->internal.owa, 1, sizeof(struct tg_average)); +} + +// resets when switches dimensions +// so, clear everything to restart +static inline void tg_average_reset(RRDR *r) { + struct tg_average *g = (struct tg_average *)r->time_grouping.data; + g->sum = 0; + g->count = 0; +} + +static inline void tg_average_free(RRDR *r) { + onewayalloc_freez(r->internal.owa, r->time_grouping.data); + r->time_grouping.data = NULL; +} + +static inline void tg_average_add(RRDR *r, NETDATA_DOUBLE value) { + struct tg_average *g = (struct tg_average *)r->time_grouping.data; + g->sum += value; + g->count++; +} + +static inline NETDATA_DOUBLE tg_average_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct tg_average *g = (struct tg_average *)r->time_grouping.data; + + NETDATA_DOUBLE value; + + if(unlikely(!g->count)) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + else { + if(unlikely(r->time_grouping.resampling_group != 1)) + value = g->sum / r->time_grouping.resampling_divisor; + else + value = g->sum / 
g->count; + } + + g->sum = 0.0; + g->count = 0; + + return value; +} + +#endif //NETDATA_API_QUERY_AVERAGE_H diff --git a/src/web/api/queries/countif/README.md b/src/web/api/queries/countif/README.md new file mode 100644 index 000000000..a40535395 --- /dev/null +++ b/src/web/api/queries/countif/README.md @@ -0,0 +1,40 @@ +<!-- +title: "CountIf" +sidebar_label: "CountIf" +custom_edit_url: https://github.com/netdata/netdata/edit/master/src/web/api/queries/countif/README.md +learn_status: "Published" +learn_topic_type: "References" +learn_rel_path: "Developers/Web/Api/Queries" +--> + +# CountIf + +> This query is available as `countif`. + +CountIf returns the percentage of points in the database that satisfy the condition supplied. + +The following conditions are available: + +- `!` or `!=` or `<>`, different than +- `=` or `:`, equal to +- `>`, greater than +- `<`, less than +- `>=`, greater or equal to +- `<=`, less or equal to + +The target number and the desired condition can be set using the `group_options` query parameter, as a string, like in these examples: + +- `!0`, to match any number except zero. +- `>=-3` to match any number bigger or equal to -3. + +. When an invalid condition is given, the web server can deliver a not accurate response. + +## how to use + +This query cannot be used in alerts. + +`countif` changes the units of charts. The result of the calculation is always from zero to 1, expressing the percentage of database points that matched the condition. + +In APIs and badges can be used like this: `&group=countif&group_options=>10` in the URL. 
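Editor's illustration (not part of the commit): a simplified sketch of how a `group_options` condition such as `>10` or `!0` could be parsed and applied. The names `parse_cond`, `cond_matches` and `countif_percent` are hypothetical, `strtod` stands in for Netdata's own number parser, and the operator handling is a simplification rather than a copy of the upstream parser. Note that the README text above says the result ranges from zero to 1, while the flush function in `countif.h` in this same diff multiplies by 100; the sketch follows the header and returns 0 to 100.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

enum cmp { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE };

struct cond {
    enum cmp op;
    double target;
};

/* Parse strings such as "!0", ">=-3", "<>5" or "=1" (hypothetical helper).
 * A missing operator, or a NULL/empty string, means "equal to". */
static struct cond parse_cond(const char *s) {
    struct cond c = { CMP_EQ, 0.0 };
    if (!s) return c;

    while (isspace((unsigned char)*s)) s++;

    switch (*s) {
        case '!':
            s++;
            if (*s == '=' || *s == ':') s++;
            c.op = CMP_NE;
            break;
        case '>':
            s++;
            if (*s == '=' || *s == ':') { s++; c.op = CMP_GE; }
            else c.op = CMP_GT;
            break;
        case '<':
            s++;
            if (*s == '>') { s++; c.op = CMP_NE; }
            else if (*s == '=' || *s == ':') { s++; c.op = CMP_LE; }
            else c.op = CMP_LT;
            break;
        case '=':
        case ':':
            s++;
            break;
        default:
            break; /* bare number: keep CMP_EQ */
    }

    c.target = strtod(s, NULL); /* strtod skips leading whitespace itself */
    return c;
}

static int cond_matches(struct cond c, double v) {
    switch (c.op) {
        case CMP_EQ: return v == c.target;
        case CMP_NE: return v != c.target;
        case CMP_LT: return v <  c.target;
        case CMP_LE: return v <= c.target;
        case CMP_GT: return v >  c.target;
        case CMP_GE: return v >= c.target;
    }
    return 0;
}

/* Percentage (0..100) of the n values that satisfy the condition. */
static double countif_percent(const char *cond_str, const double *values, size_t n) {
    if (n == 0) return 0.0;

    struct cond c = parse_cond(cond_str);
    size_t matched = 0;
    for (size_t i = 0; i < n; i++)
        if (cond_matches(c, values[i])) matched++;

    return (double)matched * 100.0 / (double)n;
}
```

For the values 0, 5, 10, 15 the condition `>10` matches one value out of four, so the sketch returns 25.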
+ + diff --git a/src/web/api/queries/countif/countif.c b/src/web/api/queries/countif/countif.c new file mode 100644 index 000000000..8a3a1f50b --- /dev/null +++ b/src/web/api/queries/countif/countif.c @@ -0,0 +1,7 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "countif.h" + +// ---------------------------------------------------------------------------- +// countif + diff --git a/src/web/api/queries/countif/countif.h b/src/web/api/queries/countif/countif.h new file mode 100644 index 000000000..af204a95a --- /dev/null +++ b/src/web/api/queries/countif/countif.h @@ -0,0 +1,148 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_QUERY_COUNTIF_H +#define NETDATA_API_QUERY_COUNTIF_H + +#include "../query.h" +#include "../rrdr.h" + +enum tg_countif_cmp { + TG_COUNTIF_EQUAL, + TG_COUNTIF_NOTEQUAL, + TG_COUNTIF_LESS, + TG_COUNTIF_LESSEQUAL, + TG_COUNTIF_GREATER, + TG_COUNTIF_GREATEREQUAL, +}; + +struct tg_countif { + enum tg_countif_cmp comparison; + NETDATA_DOUBLE target; + size_t count; + size_t matched; +}; + +static inline void tg_countif_create(RRDR *r, const char *options __maybe_unused) { + struct tg_countif *g = onewayalloc_callocz(r->internal.owa, 1, sizeof(struct tg_countif)); + r->time_grouping.data = g; + + if(options && *options) { + // skip any leading spaces + while(isspace((uint8_t)*options)) options++; + + // find the comparison function + switch(*options) { + case '!': + options++; + if(*options != '=' && *options != ':') + options--; + g->comparison = TG_COUNTIF_NOTEQUAL; + break; + + case '>': + options++; + if(*options == '=' || *options == ':') { + g->comparison = TG_COUNTIF_GREATEREQUAL; + } + else { + options--; + g->comparison = TG_COUNTIF_GREATER; + } + break; + + case '<': + options++; + if(*options == '>') { + g->comparison = TG_COUNTIF_NOTEQUAL; + } + else if(*options == '=' || *options == ':') { + g->comparison = TG_COUNTIF_LESSEQUAL; + } + else { + options--; + g->comparison = TG_COUNTIF_LESS; + } + break; + 
+ default: + case '=': + case ':': + g->comparison = TG_COUNTIF_EQUAL; + break; + } + if(*options) options++; + + // skip everything up to the first digit + while(isspace((uint8_t)*options)) options++; + + g->target = str2ndd(options, NULL); + } + else { + g->target = 0.0; + g->comparison = TG_COUNTIF_EQUAL; + } +} + +// resets when switches dimensions +// so, clear everything to restart +static inline void tg_countif_reset(RRDR *r) { + struct tg_countif *g = (struct tg_countif *)r->time_grouping.data; + g->matched = 0; + g->count = 0; +} + +static inline void tg_countif_free(RRDR *r) { + onewayalloc_freez(r->internal.owa, r->time_grouping.data); + r->time_grouping.data = NULL; +} + +static inline void tg_countif_add(RRDR *r, NETDATA_DOUBLE value) { + struct tg_countif *g = (struct tg_countif *)r->time_grouping.data; + switch(g->comparison) { + case TG_COUNTIF_GREATER: + if(value > g->target) g->matched++; + break; + + case TG_COUNTIF_GREATEREQUAL: + if(value >= g->target) g->matched++; + break; + + case TG_COUNTIF_LESS: + if(value < g->target) g->matched++; + break; + + case TG_COUNTIF_LESSEQUAL: + if(value <= g->target) g->matched++; + break; + + case TG_COUNTIF_EQUAL: + if(value == g->target) g->matched++; + break; + + case TG_COUNTIF_NOTEQUAL: + if(value != g->target) g->matched++; + break; + } + g->count++; +} + +static inline NETDATA_DOUBLE tg_countif_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct tg_countif *g = (struct tg_countif *)r->time_grouping.data; + + NETDATA_DOUBLE value; + + if(unlikely(!g->count)) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + else { + value = (NETDATA_DOUBLE)g->matched * 100 / (NETDATA_DOUBLE)g->count; + } + + g->matched = 0; + g->count = 0; + + return value; +} + +#endif //NETDATA_API_QUERY_COUNTIF_H diff --git a/src/web/api/queries/des/README.md b/src/web/api/queries/des/README.md new file mode 100644 index 000000000..6dc19e732 --- /dev/null +++ b/src/web/api/queries/des/README.md @@ 
-0,0 +1,77 @@ +<!-- +title: "double exponential smoothing" +sidebar_label: "double exponential smoothing" +custom_edit_url: https://github.com/netdata/netdata/edit/master/src/web/api/queries/des/README.md +learn_status: "Published" +learn_topic_type: "References" +learn_rel_path: "Developers/Web/Api/Queries" +--> + +# double exponential smoothing + +Exponential smoothing is one of many window functions commonly applied to smooth data in signal +processing, acting as low-pass filters to remove high frequency noise. + +Simple exponential smoothing does not do well when there is a trend in the data. +In such situations, several methods were devised under the name "double exponential smoothing" +or "second-order exponential smoothing.", which is the recursive application of an exponential +filter twice, thus being termed "double exponential smoothing". + +In simple terms, this is like an average value, but more recent values are given more weight +and the trend of the values influences significantly the result. + +> **IMPORTANT** +> +> It is common for `des` to provide "average" values that far beyond the minimum or the maximum +> values found in the time-series. +> `des` estimates these values because of it takes into account the trend. + +This module implements the "Holt-Winters double exponential smoothing". + +Netdata automatically adjusts the weight (`alpha`) and the trend (`beta`) based on the number +of values processed, using the formula: + +``` +window = max(number of values, 15) +alpha = 2 / (window + 1) +beta = 2 / (window + 1) +``` + +You can change the fixed value `15` by setting in `netdata.conf`: + +``` +[web] + des max window = 15 +``` + +## how to use + +Use it in alerts like this: + +``` + alarm: my_alert + on: my_chart +lookup: des -1m unaligned of my_dimension + warn: $this > 1000 +``` + +`des` does not change the units. For example, if the chart units is `requests/sec`, the result +will be again expressed in the same units. 
+ +It can also be used in APIs and badges as `&group=des` in the URL. + +## Examples + +Examining last 1 minute `successful` web server responses: + +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=min&after=-60&label=min) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=average&after=-60&label=average&value_color=yellow) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=ses&after=-60&label=single+exponential+smoothing&value_color=yellow) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=des&after=-60&label=double+exponential+smoothing&value_color=orange) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=max&after=-60&label=max) + +## References + +- <https://en.wikipedia.org/wiki/Exponential_smoothing>. 
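Editor's illustration (not part of the commit): a minimal standalone sketch of the double exponential smoothing recurrence, modelled on the update steps of `des.h` in this same diff but using plain `double` and a hypothetical function name, `des_smooth`. The README formula above writes `max(number of values, 15)`, whereas the header caps the window at the configured maximum (a `min`); the sketch follows the header. How the real engine derives the window from the query is not reproduced here, so the window is simply passed in as a parameter.

```c
#include <stddef.h>

/* Smooth n values and return the final level (hypothetical helper).
 * window is capped at max_window; alpha and beta are both 2 / (window + 1). */
static double des_smooth(const double *values, size_t n, double window, double max_window) {
    if (n == 0) return 0.0;

    if (window > max_window) window = max_window;

    double alpha = 2.0 / (window + 1.0);
    double beta = alpha;

    /* first value: level and trend both start at that value */
    double level = values[0];
    double trend = values[0];

    for (size_t i = 1; i < n; i++) {
        double value = values[i];

        if (i == 1) {
            /* second value: the trend becomes the first difference */
            trend = value - trend;
            level = value;
        }

        double last_level = level;
        level = alpha * value + (1.0 - alpha) * (level + trend);
        trend = beta * (level - last_level) + (1.0 - beta) * trend;
    }

    return level;
}
```

With the rising series 1, 2, 3, 4 and a window of 4 (so `alpha = beta = 0.4`), the sketch returns about 4.02, slightly above the largest input, which is the overshoot the IMPORTANT note in this README warns about.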
+ + diff --git a/src/web/api/queries/des/des.c b/src/web/api/queries/des/des.c new file mode 100644 index 000000000..d0e234e23 --- /dev/null +++ b/src/web/api/queries/des/des.c @@ -0,0 +1,8 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include <web/api/queries/rrdr.h> +#include "des.h" + + +// ---------------------------------------------------------------------------- +// single exponential smoothing diff --git a/src/web/api/queries/des/des.h b/src/web/api/queries/des/des.h new file mode 100644 index 000000000..3153d497c --- /dev/null +++ b/src/web/api/queries/des/des.h @@ -0,0 +1,138 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_QUERIES_DES_H +#define NETDATA_API_QUERIES_DES_H + +#include "../query.h" +#include "../rrdr.h" + +struct tg_des { + NETDATA_DOUBLE alpha; + NETDATA_DOUBLE alpha_other; + NETDATA_DOUBLE beta; + NETDATA_DOUBLE beta_other; + + NETDATA_DOUBLE level; + NETDATA_DOUBLE trend; + + size_t count; +}; + +static size_t tg_des_max_window_size = 15; + +static inline void tg_des_init(void) { + long long ret = config_get_number(CONFIG_SECTION_WEB, "des max tg_des_window", (long long)tg_des_max_window_size); + if(ret <= 1) { + config_set_number(CONFIG_SECTION_WEB, "des max tg_des_window", (long long)tg_des_max_window_size); + } + else { + tg_des_max_window_size = (size_t) ret; + } +} + +static inline NETDATA_DOUBLE tg_des_window(RRDR *r, struct tg_des *g) { + (void)g; + + NETDATA_DOUBLE points; + if(r->view.group == 1) { + // provide a running DES + points = (NETDATA_DOUBLE)r->time_grouping.points_wanted; + } + else { + // provide a SES with flush points + points = (NETDATA_DOUBLE)r->view.group; + } + + // https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average + // A commonly used value for alpha is 2 / (N + 1) + return (points > (NETDATA_DOUBLE)tg_des_max_window_size) ? 
(NETDATA_DOUBLE)tg_des_max_window_size : points; +} + +static inline void tg_des_set_alpha(RRDR *r, struct tg_des *g) { + // https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average + // A commonly used value for alpha is 2 / (N + 1) + + g->alpha = 2.0 / (tg_des_window(r, g) + 1.0); + g->alpha_other = 1.0 - g->alpha; + + //info("alpha for chart '%s' is " CALCULATED_NUMBER_FORMAT, r->st->name, g->alpha); +} + +static inline void tg_des_set_beta(RRDR *r, struct tg_des *g) { + // https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average + // A commonly used value for alpha is 2 / (N + 1) + + g->beta = 2.0 / (tg_des_window(r, g) + 1.0); + g->beta_other = 1.0 - g->beta; + + //info("beta for chart '%s' is " CALCULATED_NUMBER_FORMAT, r->st->name, g->beta); +} + +static inline void tg_des_create(RRDR *r, const char *options __maybe_unused) { + struct tg_des *g = (struct tg_des *)onewayalloc_mallocz(r->internal.owa, sizeof(struct tg_des)); + tg_des_set_alpha(r, g); + tg_des_set_beta(r, g); + g->level = 0.0; + g->trend = 0.0; + g->count = 0; + r->time_grouping.data = g; +} + +// resets when switches dimensions +// so, clear everything to restart +static inline void tg_des_reset(RRDR *r) { + struct tg_des *g = (struct tg_des *)r->time_grouping.data; + g->level = 0.0; + g->trend = 0.0; + g->count = 0; + + // fprintf(stderr, "\nDES: "); + +} + +static inline void tg_des_free(RRDR *r) { + onewayalloc_freez(r->internal.owa, r->time_grouping.data); + r->time_grouping.data = NULL; +} + +static inline void tg_des_add(RRDR *r, NETDATA_DOUBLE value) { + struct tg_des *g = (struct tg_des *)r->time_grouping.data; + + if(likely(g->count > 0)) { + // we have at least a number so far + + if(unlikely(g->count == 1)) { + // the second value we got + g->trend = value - g->trend; + g->level = value; + } + + // for the values, except the first + NETDATA_DOUBLE last_level = g->level; + g->level = (g->alpha * value) + (g->alpha_other * (g->level + g->trend)); + 
g->trend = (g->beta * (g->level - last_level)) + (g->beta_other * g->trend); + } + else { + // the first value we got + g->level = g->trend = value; + } + + g->count++; + + //fprintf(stderr, "value: " CALCULATED_NUMBER_FORMAT ", level: " CALCULATED_NUMBER_FORMAT ", trend: " CALCULATED_NUMBER_FORMAT "\n", value, g->level, g->trend); +} + +static inline NETDATA_DOUBLE tg_des_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct tg_des *g = (struct tg_des *)r->time_grouping.data; + + if(unlikely(!g->count || !netdata_double_isnumber(g->level))) { + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + return 0.0; + } + + //fprintf(stderr, " RESULT for %zu values = " CALCULATED_NUMBER_FORMAT " \n", g->count, g->level); + + return g->level; +} + +#endif //NETDATA_API_QUERIES_DES_H diff --git a/src/web/api/queries/incremental_sum/README.md b/src/web/api/queries/incremental_sum/README.md new file mode 100644 index 000000000..6f02abe7d --- /dev/null +++ b/src/web/api/queries/incremental_sum/README.md @@ -0,0 +1,45 @@ +<!-- +title: "Incremental Sum (`incremental_sum`)" +sidebar_label: "Incremental Sum (`incremental_sum`)" +custom_edit_url: https://github.com/netdata/netdata/edit/master/src/web/api/queries/incremental_sum/README.md +learn_status: "Published" +learn_topic_type: "References" +learn_rel_path: "Developers/Web/Api/Queries" +--> + +# Incremental Sum (`incremental_sum`) + +This modules finds the incremental sum of a period, which `last value - first value`. + +The result may be positive (rising) or negative (falling) depending on the first and last values. + +## how to use + +Use it in alerts like this: + +``` + alarm: my_alert + on: my_chart +lookup: incremental_sum -1m unaligned of my_dimension + warn: $this > 1000 +``` + +`incremental_sum` does not change the units. For example, if the chart units is `requests/sec`, the result +will be again expressed in the same units. + +It can also be used in APIs and badges as `&group=incremental_sum` in the URL. 
+ +## Examples + +Examining last 1 minute `successful` web server responses: + +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=min&after=-60&label=min) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=average&after=-60&label=average) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=max&after=-60&label=max) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=incremental_sum&after=-60&label=incremental+sum&value_color=orange) + +## References + +- none + + diff --git a/src/web/api/queries/incremental_sum/incremental_sum.c b/src/web/api/queries/incremental_sum/incremental_sum.c new file mode 100644 index 000000000..88072f297 --- /dev/null +++ b/src/web/api/queries/incremental_sum/incremental_sum.c @@ -0,0 +1,7 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "incremental_sum.h" + +// ---------------------------------------------------------------------------- +// incremental sum + diff --git a/src/web/api/queries/incremental_sum/incremental_sum.h b/src/web/api/queries/incremental_sum/incremental_sum.h new file mode 100644 index 000000000..f110c5861 --- /dev/null +++ b/src/web/api/queries/incremental_sum/incremental_sum.h @@ -0,0 +1,71 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_QUERY_INCREMENTAL_SUM_H +#define NETDATA_API_QUERY_INCREMENTAL_SUM_H + +#include "../query.h" +#include "../rrdr.h" + +struct tg_incremental_sum { + NETDATA_DOUBLE first; + NETDATA_DOUBLE last; + size_t count; +}; + +// resets when switches dimensions +// so, clear everything to restart +static inline void tg_incremental_sum_reset(RRDR *r) { + struct tg_incremental_sum *g = (struct tg_incremental_sum 
*)r->time_grouping.data; + g->first = NAN; + g->last = NAN; + g->count = 0; +} + +static inline void tg_incremental_sum_create(RRDR *r, const char *options __maybe_unused) { + r->time_grouping.data = onewayalloc_mallocz(r->internal.owa, sizeof(struct tg_incremental_sum)); + tg_incremental_sum_reset(r); +} + +static inline void tg_incremental_sum_free(RRDR *r) { + onewayalloc_freez(r->internal.owa, r->time_grouping.data); + r->time_grouping.data = NULL; +} + +static inline void tg_incremental_sum_add(RRDR *r, NETDATA_DOUBLE value) { + struct tg_incremental_sum *g = (struct tg_incremental_sum *)r->time_grouping.data; + + if(unlikely(!g->count)) { + if(isnan(g->first)) + g->first = value; + else + g->last = value; + + g->count++; + } + else { + g->last = value; + g->count++; + } +} + +static inline NETDATA_DOUBLE tg_incremental_sum_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct tg_incremental_sum *g = (struct tg_incremental_sum *)r->time_grouping.data; + + NETDATA_DOUBLE value; + + if(unlikely(!g->count || isnan(g->first) || isnan(g->last))) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + else { + value = g->last - g->first; + } + + g->first = g->last; + g->last = NAN; + g->count = 0; + + return value; +} + +#endif //NETDATA_API_QUERY_INCREMENTAL_SUM_H diff --git a/src/web/api/queries/max/README.md b/src/web/api/queries/max/README.md new file mode 100644 index 000000000..ae634e05e --- /dev/null +++ b/src/web/api/queries/max/README.md @@ -0,0 +1,42 @@ +<!-- +title: "Max" +sidebar_label: "Max" +custom_edit_url: https://github.com/netdata/netdata/edit/master/src/web/api/queries/max/README.md +learn_status: "Published" +learn_topic_type: "References" +learn_rel_path: "Developers/Web/Api/Queries" +--> + +# Max + +This module finds the max value in the time-frame given. 
+ +## how to use + +Use it in alerts like this: + +``` + alarm: my_alert + on: my_chart +lookup: max -1m unaligned of my_dimension + warn: $this > 1000 +``` + +`max` does not change the units. For example, if the chart units is `requests/sec`, the result +will be again expressed in the same units. + +It can also be used in APIs and badges as `&group=max` in the URL. + +## Examples + +Examining last 1 minute `successful` web server responses: + +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=min&after=-60&label=min) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=average&after=-60&label=average) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=max&after=-60&label=max&value_color=orange) + +## References + +- <https://en.wikipedia.org/wiki/Sample_maximum_and_minimum>. 
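Editorial sketch (not part of the patch): the `max.h` header added later in this same commit compares absolute values in `tg_max_add()`, so with mixed-sign data `max` returns the value with the largest magnitude, not the largest signed value. The standalone function below only illustrates that comparison. `max_by_magnitude` and `magnitude` are hypothetical names; plain `double` stands in for `NETDATA_DOUBLE`, and `magnitude()` is assumed to behave like `fabsndd()`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stands in for fabsndd() from the Netdata sources. */
static double magnitude(double v) {
    return v < 0.0 ? -v : v;
}

/* Hypothetical helper: applies the tg_max_add() comparison to a whole array. */
static double max_by_magnitude(const double *values, size_t n) {
    assert(n > 0);
    double max = values[0];              /* the first value is always accepted */
    for (size_t i = 1; i < n; i++) {
        /* keep the value whose absolute value is larger */
        if (magnitude(values[i]) > magnitude(max))
            max = values[i];
    }
    return max;
}
```

For the input `{3, -5, 4}` this sketch returns `-5`. `tg_min_add()` in `min.h` mirrors the same test with `<`, so `min` returns the value closest to zero.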
+ + diff --git a/src/web/api/queries/max/max.c b/src/web/api/queries/max/max.c new file mode 100644 index 000000000..cc5999a29 --- /dev/null +++ b/src/web/api/queries/max/max.c @@ -0,0 +1,7 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "max.h" + +// ---------------------------------------------------------------------------- +// max + diff --git a/src/web/api/queries/max/max.h b/src/web/api/queries/max/max.h new file mode 100644 index 000000000..c26bb79ad --- /dev/null +++ b/src/web/api/queries/max/max.h @@ -0,0 +1,59 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_QUERY_MAX_H +#define NETDATA_API_QUERY_MAX_H + +#include "../query.h" +#include "../rrdr.h" + +struct tg_max { + NETDATA_DOUBLE max; + size_t count; +}; + +static inline void tg_max_create(RRDR *r, const char *options __maybe_unused) { + r->time_grouping.data = onewayalloc_callocz(r->internal.owa, 1, sizeof(struct tg_max)); +} + +// resets when switches dimensions +// so, clear everything to restart +static inline void tg_max_reset(RRDR *r) { + struct tg_max *g = (struct tg_max *)r->time_grouping.data; + g->max = 0; + g->count = 0; +} + +static inline void tg_max_free(RRDR *r) { + onewayalloc_freez(r->internal.owa, r->time_grouping.data); + r->time_grouping.data = NULL; +} + +static inline void tg_max_add(RRDR *r, NETDATA_DOUBLE value) { + struct tg_max *g = (struct tg_max *)r->time_grouping.data; + + if(!g->count || fabsndd(value) > fabsndd(g->max)) { + g->max = value; + g->count++; + } +} + +static inline NETDATA_DOUBLE tg_max_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct tg_max *g = (struct tg_max *)r->time_grouping.data; + + NETDATA_DOUBLE value; + + if(unlikely(!g->count)) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + else { + value = g->max; + } + + g->max = 0.0; + g->count = 0; + + return value; +} + +#endif //NETDATA_API_QUERY_MAX_H diff --git a/src/web/api/queries/median/README.md 
b/src/web/api/queries/median/README.md new file mode 100644 index 000000000..e6f6c04e7 --- /dev/null +++ b/src/web/api/queries/median/README.md @@ -0,0 +1,64 @@ +<!-- +title: "Median" +sidebar_label: "Median" +description: "Use median in API queries and health entities to find the 'middle' value from a sample, eliminating any unwanted spikes in the returned metrics." +custom_edit_url: https://github.com/netdata/netdata/edit/master/src/web/api/queries/median/README.md +learn_status: "Published" +learn_topic_type: "References" +learn_rel_path: "Developers/Web/Api/Queries" +--> + +# Median + +The median is the value separating the higher half from the lower half of a data sample +(a population or a probability distribution). For a data set, it may be thought of as the +"middle" value. + +`median` is not an accurate average. However, it eliminates all spikes, by sorting +all the values in a period, and selecting the value in the middle of the sorted array. + +Netdata also supports `trimmed-median`, which trims a percentage of the smaller and bigger values prior to finding the +median. The following `trimmed-median` functions are defined: + +- `trimmed-median1` +- `trimmed-median2` +- `trimmed-median3` +- `trimmed-median5` +- `trimmed-median10` +- `trimmed-median15` +- `trimmed-median20` +- `trimmed-median25` + +The function `trimmed-median` is an alias for `trimmed-median5`. + +## how to use + +Use it in alerts like this: + +``` + alarm: my_alert + on: my_chart +lookup: median -1m unaligned of my_dimension + warn: $this > 1000 +``` + +`median` does not change the units. For example, if the chart units is `requests/sec`, the result +will be again expressed in the same units. + +It can also be used in APIs and badges as `&group=median` in the URL. Additionally, a percentage may be given with +`&group_options=` to trim all small and big values before finding the median. 
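Editorial sketch (not part of the patch): judging from `tg_median_flush()` in the `median.h` header of this commit, the trim percentage is applied to the value range (`max - min`), not to the number of samples. The standalone function below is a minimal approximation of that logic. `trimmed_median`, `median_sorted` and `cmp_double` are hypothetical names; `median_sorted()` assumes the conventional definition, because `median_on_sorted_series()` is not shown in this diff; `qsort()` replaces `sort_series()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Assumption: conventional median of an already sorted array. */
static double median_sorted(const double *s, size_t n) {
    return (n % 2) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
}

/* percent is a fraction in [0, 0.5], e.g. 0.05 for trimmed-median5.
 * Sorts s in place. */
static double trimmed_median(double *s, size_t n, double percent) {
    assert(n > 0 && percent >= 0.0 && percent <= 0.5);
    qsort(s, n, sizeof(double), cmp_double);

    size_t start = 0, end = n - 1;
    if (percent > 0.0) {
        /* the trim band is a share of the value range, not of the sample count */
        double delta = (s[n - 1] - s[0]) * percent;
        double wanted_min = s[0] + delta;
        double wanted_max = s[n - 1] - delta;

        while (start < n - 1 && s[start] < wanted_min) start++;
        while (end > start && s[end] > wanted_max) end--;
    }
    return median_sorted(&s[start], end - start + 1);
}
```

With the samples `{0, 1, 20, 21, 22, 100}` and `percent = 0.10`, values below roughly 10 and above roughly 90 are discarded, so the sketch takes the median of `{20, 21, 22}`. With `percent = 0.0` it behaves like the plain median.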
+ +## Examples + +Examining last 1 minute `successful` web server responses: + +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=min&after=-60&label=min) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=average&after=-60&label=average) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=median&after=-60&label=median&value_color=orange) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=max&after=-60&label=max) + +## References + +- <https://en.wikipedia.org/wiki/Median>. + + diff --git a/src/web/api/queries/median/median.c b/src/web/api/queries/median/median.c new file mode 100644 index 000000000..9865b485c --- /dev/null +++ b/src/web/api/queries/median/median.c @@ -0,0 +1,6 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "median.h" + +// ---------------------------------------------------------------------------- +// median diff --git a/src/web/api/queries/median/median.h b/src/web/api/queries/median/median.h new file mode 100644 index 000000000..3d6d35925 --- /dev/null +++ b/src/web/api/queries/median/median.h @@ -0,0 +1,143 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_QUERIES_MEDIAN_H +#define NETDATA_API_QUERIES_MEDIAN_H + +#include "../query.h" +#include "../rrdr.h" + +struct tg_median { + size_t series_size; + size_t next_pos; + NETDATA_DOUBLE percent; + + NETDATA_DOUBLE *series; +}; + +static inline void tg_median_create_internal(RRDR *r, const char *options, NETDATA_DOUBLE def) { + long entries = r->view.group; + if(entries < 10) entries = 10; + + struct tg_median *g = (struct tg_median *)onewayalloc_callocz(r->internal.owa, 1, sizeof(struct tg_median)); + g->series = 
onewayalloc_mallocz(r->internal.owa, entries * sizeof(NETDATA_DOUBLE)); + g->series_size = (size_t)entries; + + g->percent = def; + if(options && *options) { + g->percent = str2ndd(options, NULL); + if(!netdata_double_isnumber(g->percent)) g->percent = 0.0; + if(g->percent < 0.0) g->percent = 0.0; + if(g->percent > 50.0) g->percent = 50.0; + } + + g->percent = g->percent / 100.0; + r->time_grouping.data = g; +} + +static inline void tg_median_create(RRDR *r, const char *options) { + tg_median_create_internal(r, options, 0.0); +} +static inline void tg_median_create_trimmed_1(RRDR *r, const char *options) { + tg_median_create_internal(r, options, 1.0); +} +static inline void tg_median_create_trimmed_2(RRDR *r, const char *options) { + tg_median_create_internal(r, options, 2.0); +} +static inline void tg_median_create_trimmed_3(RRDR *r, const char *options) { + tg_median_create_internal(r, options, 3.0); +} +static inline void tg_median_create_trimmed_5(RRDR *r, const char *options) { + tg_median_create_internal(r, options, 5.0); +} +static inline void tg_median_create_trimmed_10(RRDR *r, const char *options) { + tg_median_create_internal(r, options, 10.0); +} +static inline void tg_median_create_trimmed_15(RRDR *r, const char *options) { + tg_median_create_internal(r, options, 15.0); +} +static inline void tg_median_create_trimmed_20(RRDR *r, const char *options) { + tg_median_create_internal(r, options, 20.0); +} +static inline void tg_median_create_trimmed_25(RRDR *r, const char *options) { + tg_median_create_internal(r, options, 25.0); +} + +// resets when switches dimensions +// so, clear everything to restart +static inline void tg_median_reset(RRDR *r) { + struct tg_median *g = (struct tg_median *)r->time_grouping.data; + g->next_pos = 0; +} + +static inline void tg_median_free(RRDR *r) { + struct tg_median *g = (struct tg_median *)r->time_grouping.data; + if(g) onewayalloc_freez(r->internal.owa, g->series); + + onewayalloc_freez(r->internal.owa, 
r->time_grouping.data); + r->time_grouping.data = NULL; +} + +static inline void tg_median_add(RRDR *r, NETDATA_DOUBLE value) { + struct tg_median *g = (struct tg_median *)r->time_grouping.data; + + if(unlikely(g->next_pos >= g->series_size)) { + g->series = onewayalloc_doublesize( r->internal.owa, g->series, g->series_size * sizeof(NETDATA_DOUBLE)); + g->series_size *= 2; + } + + g->series[g->next_pos++] = value; +} + +static inline NETDATA_DOUBLE tg_median_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct tg_median *g = (struct tg_median *)r->time_grouping.data; + + size_t available_slots = g->next_pos; + NETDATA_DOUBLE value; + + if(unlikely(!available_slots)) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + else if(available_slots == 1) { + value = g->series[0]; + } + else { + sort_series(g->series, available_slots); + + size_t start_slot = 0; + size_t end_slot = available_slots - 1; + + if(g->percent > 0.0) { + NETDATA_DOUBLE min = g->series[0]; + NETDATA_DOUBLE max = g->series[available_slots - 1]; + NETDATA_DOUBLE delta = (max - min) * g->percent; + + NETDATA_DOUBLE wanted_min = min + delta; + NETDATA_DOUBLE wanted_max = max - delta; + + for (start_slot = 0; start_slot < available_slots; start_slot++) + if (g->series[start_slot] >= wanted_min) break; + + for (end_slot = available_slots - 1; end_slot > start_slot; end_slot--) + if (g->series[end_slot] <= wanted_max) break; + } + + if(start_slot == end_slot) + value = g->series[start_slot]; + else + value = median_on_sorted_series(&g->series[start_slot], end_slot - start_slot + 1); + } + + if(unlikely(!netdata_double_isnumber(value))) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + + //log_series_to_stderr(g->series, g->next_pos, value, "median"); + + g->next_pos = 0; + + return value; +} + +#endif //NETDATA_API_QUERIES_MEDIAN_H diff --git a/src/web/api/queries/min/README.md b/src/web/api/queries/min/README.md new file mode 100644 index 
000000000..35acb8c9e --- /dev/null +++ b/src/web/api/queries/min/README.md @@ -0,0 +1,42 @@ +<!-- +title: "Min" +sidebar_label: "Min" +custom_edit_url: https://github.com/netdata/netdata/edit/master/src/web/api/queries/min/README.md +learn_status: "Published" +learn_topic_type: "References" +learn_rel_path: "Developers/Web/Api/Queries" +--> + +# Min + +This module finds the min value in the time-frame given. + +## how to use + +Use it in alerts like this: + +``` + alarm: my_alert + on: my_chart +lookup: min -1m unaligned of my_dimension + warn: $this > 1000 +``` + +`min` does not change the units. For example, if the chart units is `requests/sec`, the result +will be again expressed in the same units. + +It can also be used in APIs and badges as `&group=min` in the URL. + +## Examples + +Examining last 1 minute `successful` web server responses: + +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=min&after=-60&label=min&value_color=orange) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=average&after=-60&label=average) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=max&after=-60&label=max) + +## References + +- <https://en.wikipedia.org/wiki/Sample_maximum_and_minimum>. 
+ + diff --git a/src/web/api/queries/min/min.c b/src/web/api/queries/min/min.c new file mode 100644 index 000000000..cefa7cf31 --- /dev/null +++ b/src/web/api/queries/min/min.c @@ -0,0 +1,7 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "min.h" + +// ---------------------------------------------------------------------------- +// min + diff --git a/src/web/api/queries/min/min.h b/src/web/api/queries/min/min.h new file mode 100644 index 000000000..3c53dfd1d --- /dev/null +++ b/src/web/api/queries/min/min.h @@ -0,0 +1,59 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_QUERY_MIN_H +#define NETDATA_API_QUERY_MIN_H + +#include "../query.h" +#include "../rrdr.h" + +struct tg_min { + NETDATA_DOUBLE min; + size_t count; +}; + +static inline void tg_min_create(RRDR *r, const char *options __maybe_unused) { + r->time_grouping.data = onewayalloc_callocz(r->internal.owa, 1, sizeof(struct tg_min)); +} + +// resets when switches dimensions +// so, clear everything to restart +static inline void tg_min_reset(RRDR *r) { + struct tg_min *g = (struct tg_min *)r->time_grouping.data; + g->min = 0; + g->count = 0; +} + +static inline void tg_min_free(RRDR *r) { + onewayalloc_freez(r->internal.owa, r->time_grouping.data); + r->time_grouping.data = NULL; +} + +static inline void tg_min_add(RRDR *r, NETDATA_DOUBLE value) { + struct tg_min *g = (struct tg_min *)r->time_grouping.data; + + if(!g->count || fabsndd(value) < fabsndd(g->min)) { + g->min = value; + g->count++; + } +} + +static inline NETDATA_DOUBLE tg_min_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct tg_min *g = (struct tg_min *)r->time_grouping.data; + + NETDATA_DOUBLE value; + + if(unlikely(!g->count)) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + else { + value = g->min; + } + + g->min = 0.0; + g->count = 0; + + return value; +} + +#endif //NETDATA_API_QUERY_MIN_H diff --git a/src/web/api/queries/percentile/README.md 
b/src/web/api/queries/percentile/README.md new file mode 100644 index 000000000..88abf8d5c --- /dev/null +++ b/src/web/api/queries/percentile/README.md @@ -0,0 +1,62 @@ +<!-- +title: "Percentile" +sidebar_label: "Percentile" +description: "Use percentile in API queries and health entities to find the 'percentile' value from a sample, eliminating any unwanted spikes in the returned metrics." +custom_edit_url: https://github.com/netdata/netdata/edit/master/src/web/api/queries/percentile/README.md +learn_status: "Published" +learn_topic_type: "References" +learn_rel_path: "Developers/Web/Api/Queries" +--> + +# Percentile + +The percentile is the average value of a series using only the smaller N percentile of the values. +(a population or a probability distribution). + +Netdata applies linear interpolation on the last point, if the percentile requested does not give a round number of +points. + +The following percentile aliases are defined: + +- `percentile25` +- `percentile50` +- `percentile75` +- `percentile80` +- `percentile90` +- `percentile95` +- `percentile97` +- `percentile98` +- `percentile99` + +The default `percentile` is an alias for `percentile95`. +Any percentile may be requested using the `group_options` query parameter. + +## how to use + +Use it in alerts like this: + +``` + alarm: my_alert + on: my_chart +lookup: percentile95 -1m unaligned of my_dimension + warn: $this > 1000 +``` + +`percentile` does not change the units. For example, if the chart units is `requests/sec`, the result +will be again expressed in the same units. + +It can also be used in APIs and badges as `&group=percentile` in the URL and the additional parameter `group_options` +may be used to request any percentile (e.g. `&group=percentile&group_options=96`). 
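Editorial sketch (not part of the patch): the averaging and interpolation described above can be approximated in standalone C as follows. It is modelled on `tg_percentile_flush()` from the `percentile.h` header in this commit, but `percentile_average` and `cmp_double` are hypothetical names, plain `double` stands in for `NETDATA_DOUBLE`, and `qsort()` replaces `sort_series()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* percent is a fraction in [0, 1], e.g. 0.95 for percentile95.
 * Sorts s in place. */
static double percentile_average(double *s, size_t n, double percent) {
    assert(n > 0 && percent >= 0.0 && percent <= 1.0);
    qsort(s, n, sizeof(double), cmp_double);

    double min = s[0], max = s[n - 1];
    if (min == max) return min;

    /* whole slots covered by the requested percentile (at least one) */
    size_t slots = (size_t)((double)n * percent);
    if (!slots) slots = 1;

    /* weights used when the percentile ends between two slots */
    double used = (double)slots / (double)n;
    double delta = percent - used;
    double w_next = 0.0, w_last = 0.0;
    if (delta > 0.0) {
        double one_slot = (double)(slots + 1) / (double)n - used;
        w_next = delta / one_slot;
        w_last = 1.0 - w_next;
    }

    /* non-negative series are walked from the smallest value upwards,
     * any series containing negative values from the largest downwards */
    int start, stop, step, last, next;
    if (min >= 0.0 && max >= 0.0) {
        start = 0; stop = (int)slots; last = stop - 1; next = stop; step = 1;
    } else {
        start = (int)n - 1; stop = start - (int)slots; last = stop + 1; next = stop; step = -1;
    }

    double sum = 0.0;
    for (int i = start; i != stop; i += step) sum += s[i];

    size_t counted = slots;
    if (w_next > 0.0 && next >= 0 && next < (int)n) {
        sum += s[next] * w_next;   /* partially counted next slot */
        sum += s[last] * w_last;   /* re-weighted last full slot */
        counted++;
    }
    return sum / (double)counted;
}
```

For the samples `{1, 2, 3, 4}` and `percent = 0.5` the sketch averages the two smallest values. With `percent = 0.625` the requested share ends halfway between the second and third slot, so both interpolation weights are 0.5.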
+ +## Examples + +Examining last 1 minute `successful` web server responses: + +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=min&after=-60&label=min) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=average&after=-60&label=average) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=percentile95&after=-60&label=percentile95&value_color=orange) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=max&after=-60&label=max) + +## References + +- <https://en.wikipedia.org/wiki/Percentile>. diff --git a/src/web/api/queries/percentile/percentile.c b/src/web/api/queries/percentile/percentile.c new file mode 100644 index 000000000..da3b32696 --- /dev/null +++ b/src/web/api/queries/percentile/percentile.c @@ -0,0 +1,6 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "percentile.h" + +// ---------------------------------------------------------------------------- +// median diff --git a/src/web/api/queries/percentile/percentile.h b/src/web/api/queries/percentile/percentile.h new file mode 100644 index 000000000..0532f9d3f --- /dev/null +++ b/src/web/api/queries/percentile/percentile.h @@ -0,0 +1,172 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_QUERIES_PERCENTILE_H +#define NETDATA_API_QUERIES_PERCENTILE_H + +#include "../query.h" +#include "../rrdr.h" + +struct tg_percentile { + size_t series_size; + size_t next_pos; + NETDATA_DOUBLE percent; + + NETDATA_DOUBLE *series; +}; + +static inline void tg_percentile_create_internal(RRDR *r, const char *options, NETDATA_DOUBLE def) { + long entries = r->view.group; + if(entries < 10) entries = 10; + + struct tg_percentile *g = (struct tg_percentile 
*)onewayalloc_callocz(r->internal.owa, 1, sizeof(struct tg_percentile)); + g->series = onewayalloc_mallocz(r->internal.owa, entries * sizeof(NETDATA_DOUBLE)); + g->series_size = (size_t)entries; + + g->percent = def; + if(options && *options) { + g->percent = str2ndd(options, NULL); + if(!netdata_double_isnumber(g->percent)) g->percent = 0.0; + if(g->percent < 0.0) g->percent = 0.0; + if(g->percent > 100.0) g->percent = 100.0; + } + + g->percent = g->percent / 100.0; + r->time_grouping.data = g; +} + +static inline void tg_percentile_create_25(RRDR *r, const char *options) { + tg_percentile_create_internal(r, options, 25.0); +} +static inline void tg_percentile_create_50(RRDR *r, const char *options) { + tg_percentile_create_internal(r, options, 50.0); +} +static inline void tg_percentile_create_75(RRDR *r, const char *options) { + tg_percentile_create_internal(r, options, 75.0); +} +static inline void tg_percentile_create_80(RRDR *r, const char *options) { + tg_percentile_create_internal(r, options, 80.0); +} +static inline void tg_percentile_create_90(RRDR *r, const char *options) { + tg_percentile_create_internal(r, options, 90.0); +} +static inline void tg_percentile_create_95(RRDR *r, const char *options) { + tg_percentile_create_internal(r, options, 95.0); +} +static inline void tg_percentile_create_97(RRDR *r, const char *options) { + tg_percentile_create_internal(r, options, 97.0); +} +static inline void tg_percentile_create_98(RRDR *r, const char *options) { + tg_percentile_create_internal(r, options, 98.0); +} +static inline void tg_percentile_create_99(RRDR *r, const char *options) { + tg_percentile_create_internal(r, options, 99.0); +} + +// resets when switches dimensions +// so, clear everything to restart +static inline void tg_percentile_reset(RRDR *r) { + struct tg_percentile *g = (struct tg_percentile *)r->time_grouping.data; + g->next_pos = 0; +} + +static inline void tg_percentile_free(RRDR *r) { + struct tg_percentile *g = (struct tg_percentile 
*)r->time_grouping.data; + if(g) onewayalloc_freez(r->internal.owa, g->series); + + onewayalloc_freez(r->internal.owa, r->time_grouping.data); + r->time_grouping.data = NULL; +} + +static inline void tg_percentile_add(RRDR *r, NETDATA_DOUBLE value) { + struct tg_percentile *g = (struct tg_percentile *)r->time_grouping.data; + + if(unlikely(g->next_pos >= g->series_size)) { + g->series = onewayalloc_doublesize( r->internal.owa, g->series, g->series_size * sizeof(NETDATA_DOUBLE)); + g->series_size *= 2; + } + + g->series[g->next_pos++] = value; +} + +static inline NETDATA_DOUBLE tg_percentile_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct tg_percentile *g = (struct tg_percentile *)r->time_grouping.data; + + NETDATA_DOUBLE value; + size_t available_slots = g->next_pos; + + if(unlikely(!available_slots)) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + else if(available_slots == 1) { + value = g->series[0]; + } + else { + sort_series(g->series, available_slots); + + NETDATA_DOUBLE min = g->series[0]; + NETDATA_DOUBLE max = g->series[available_slots - 1]; + + if (min != max) { + size_t slots_to_use = (size_t)((NETDATA_DOUBLE)available_slots * g->percent); + if(!slots_to_use) slots_to_use = 1; + + NETDATA_DOUBLE percent_to_use = (NETDATA_DOUBLE)slots_to_use / (NETDATA_DOUBLE)available_slots; + NETDATA_DOUBLE percent_delta = g->percent - percent_to_use; + + NETDATA_DOUBLE percent_interpolation_slot = 0.0; + NETDATA_DOUBLE percent_last_slot = 0.0; + if(percent_delta > 0.0) { + NETDATA_DOUBLE percent_to_use_plus_1_slot = (NETDATA_DOUBLE)(slots_to_use + 1) / (NETDATA_DOUBLE)available_slots; + NETDATA_DOUBLE percent_1slot = percent_to_use_plus_1_slot - percent_to_use; + + percent_interpolation_slot = percent_delta / percent_1slot; + percent_last_slot = 1 - percent_interpolation_slot; + } + + int start_slot, stop_slot, step, last_slot, interpolation_slot; + if(min >= 0.0 && max >= 0.0) { + start_slot = 0; + stop_slot = start_slot + 
(int)slots_to_use; + last_slot = stop_slot - 1; + interpolation_slot = stop_slot; + step = 1; + } + else { + start_slot = (int)available_slots - 1; + stop_slot = start_slot - (int)slots_to_use; + last_slot = stop_slot + 1; + interpolation_slot = stop_slot; + step = -1; + } + + value = 0.0; + for(int slot = start_slot; slot != stop_slot ; slot += step) + value += g->series[slot]; + + size_t counted = slots_to_use; + if(percent_interpolation_slot > 0.0 && interpolation_slot >= 0 && interpolation_slot < (int)available_slots) { + value += g->series[interpolation_slot] * percent_interpolation_slot; + value += g->series[last_slot] * percent_last_slot; + counted++; + } + + value = value / (NETDATA_DOUBLE)counted; + } + else + value = min; + } + + if(unlikely(!netdata_double_isnumber(value))) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + + //log_series_to_stderr(g->series, g->next_pos, value, "percentile"); + + g->next_pos = 0; + + return value; +} + +#endif //NETDATA_API_QUERIES_PERCENTILE_H diff --git a/src/web/api/queries/query.c b/src/web/api/queries/query.c new file mode 100644 index 000000000..c97b546b1 --- /dev/null +++ b/src/web/api/queries/query.c @@ -0,0 +1,3736 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "query.h" +#include "web/api/formatters/rrd2json.h" +#include "rrdr.h" + +#include "average/average.h" +#include "countif/countif.h" +#include "incremental_sum/incremental_sum.h" +#include "max/max.h" +#include "median/median.h" +#include "min/min.h" +#include "sum/sum.h" +#include "stddev/stddev.h" +#include "ses/ses.h" +#include "des/des.h" +#include "percentile/percentile.h" +#include "trimmed_mean/trimmed_mean.h" + +#define QUERY_PLAN_MIN_POINTS 10 +#define POINTS_TO_EXPAND_QUERY 5 + +// ---------------------------------------------------------------------------- + +static struct { + const char *name; + uint32_t hash; + RRDR_TIME_GROUPING value; + RRDR_TIME_GROUPING add_flush; + + // One time initialization for the 
module. + // This is called once, when netdata starts. + void (*init)(void); + + // Allocate all required structures for a query. + // This is called once for each netdata query. + void (*create)(struct rrdresult *r, const char *options); + + // Cleanup collected values, but don't destroy the structures. + // This is called when the query engine switches dimensions, + // as part of the same query (so same chart, switching metric). + void (*reset)(struct rrdresult *r); + + // Free all resources allocated for the query. + void (*free)(struct rrdresult *r); + + // Add a single value into the calculation. + // The module may decide to cache it, or use it in the fly. + void (*add)(struct rrdresult *r, NETDATA_DOUBLE value); + + // Generate a single result for the values added so far. + // More values and points may be requested later. + // It is up to the module to reset its internal structures + // when flushing it (so for a few modules it may be better to + // continue after a flush as if nothing changed, for others a + // cleanup of the internal structures may be required). 
+ NETDATA_DOUBLE (*flush)(struct rrdresult *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr); + + TIER_QUERY_FETCH tier_query_fetch; +} api_v1_data_groups[] = { + {.name = "average", + .hash = 0, + .value = RRDR_GROUPING_AVERAGE, + .add_flush = RRDR_GROUPING_AVERAGE, + .init = NULL, + .create= tg_average_create, + .reset = tg_average_reset, + .free = tg_average_free, + .add = tg_average_add, + .flush = tg_average_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "avg", // alias on 'average' + .hash = 0, + .value = RRDR_GROUPING_AVERAGE, + .add_flush = RRDR_GROUPING_AVERAGE, + .init = NULL, + .create= tg_average_create, + .reset = tg_average_reset, + .free = tg_average_free, + .add = tg_average_add, + .flush = tg_average_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "mean", // alias on 'average' + .hash = 0, + .value = RRDR_GROUPING_AVERAGE, + .add_flush = RRDR_GROUPING_AVERAGE, + .init = NULL, + .create= tg_average_create, + .reset = tg_average_reset, + .free = tg_average_free, + .add = tg_average_add, + .flush = tg_average_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-mean1", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEAN1, + .add_flush = RRDR_GROUPING_TRIMMED_MEAN, + .init = NULL, + .create= tg_trimmed_mean_create_1, + .reset = tg_trimmed_mean_reset, + .free = tg_trimmed_mean_free, + .add = tg_trimmed_mean_add, + .flush = tg_trimmed_mean_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-mean2", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEAN2, + .add_flush = RRDR_GROUPING_TRIMMED_MEAN, + .init = NULL, + .create= tg_trimmed_mean_create_2, + .reset = tg_trimmed_mean_reset, + .free = tg_trimmed_mean_free, + .add = tg_trimmed_mean_add, + .flush = tg_trimmed_mean_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-mean3", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEAN3, + .add_flush = RRDR_GROUPING_TRIMMED_MEAN, + .init = NULL, + 
.create= tg_trimmed_mean_create_3, + .reset = tg_trimmed_mean_reset, + .free = tg_trimmed_mean_free, + .add = tg_trimmed_mean_add, + .flush = tg_trimmed_mean_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-mean5", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEAN, + .add_flush = RRDR_GROUPING_TRIMMED_MEAN, + .init = NULL, + .create= tg_trimmed_mean_create_5, + .reset = tg_trimmed_mean_reset, + .free = tg_trimmed_mean_free, + .add = tg_trimmed_mean_add, + .flush = tg_trimmed_mean_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-mean10", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEAN10, + .add_flush = RRDR_GROUPING_TRIMMED_MEAN, + .init = NULL, + .create= tg_trimmed_mean_create_10, + .reset = tg_trimmed_mean_reset, + .free = tg_trimmed_mean_free, + .add = tg_trimmed_mean_add, + .flush = tg_trimmed_mean_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-mean15", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEAN15, + .add_flush = RRDR_GROUPING_TRIMMED_MEAN, + .init = NULL, + .create= tg_trimmed_mean_create_15, + .reset = tg_trimmed_mean_reset, + .free = tg_trimmed_mean_free, + .add = tg_trimmed_mean_add, + .flush = tg_trimmed_mean_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-mean20", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEAN20, + .add_flush = RRDR_GROUPING_TRIMMED_MEAN, + .init = NULL, + .create= tg_trimmed_mean_create_20, + .reset = tg_trimmed_mean_reset, + .free = tg_trimmed_mean_free, + .add = tg_trimmed_mean_add, + .flush = tg_trimmed_mean_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-mean25", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEAN25, + .add_flush = RRDR_GROUPING_TRIMMED_MEAN, + .init = NULL, + .create= tg_trimmed_mean_create_25, + .reset = tg_trimmed_mean_reset, + .free = tg_trimmed_mean_free, + .add = tg_trimmed_mean_add, + .flush = tg_trimmed_mean_flush, + .tier_query_fetch = 
TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-mean", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEAN, + .add_flush = RRDR_GROUPING_TRIMMED_MEAN, + .init = NULL, + .create= tg_trimmed_mean_create_5, + .reset = tg_trimmed_mean_reset, + .free = tg_trimmed_mean_free, + .add = tg_trimmed_mean_add, + .flush = tg_trimmed_mean_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "incremental_sum", + .hash = 0, + .value = RRDR_GROUPING_INCREMENTAL_SUM, + .add_flush = RRDR_GROUPING_INCREMENTAL_SUM, + .init = NULL, + .create= tg_incremental_sum_create, + .reset = tg_incremental_sum_reset, + .free = tg_incremental_sum_free, + .add = tg_incremental_sum_add, + .flush = tg_incremental_sum_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "incremental-sum", + .hash = 0, + .value = RRDR_GROUPING_INCREMENTAL_SUM, + .add_flush = RRDR_GROUPING_INCREMENTAL_SUM, + .init = NULL, + .create= tg_incremental_sum_create, + .reset = tg_incremental_sum_reset, + .free = tg_incremental_sum_free, + .add = tg_incremental_sum_add, + .flush = tg_incremental_sum_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "median", + .hash = 0, + .value = RRDR_GROUPING_MEDIAN, + .add_flush = RRDR_GROUPING_MEDIAN, + .init = NULL, + .create= tg_median_create, + .reset = tg_median_reset, + .free = tg_median_free, + .add = tg_median_add, + .flush = tg_median_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-median1", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEDIAN1, + .add_flush = RRDR_GROUPING_MEDIAN, + .init = NULL, + .create= tg_median_create_trimmed_1, + .reset = tg_median_reset, + .free = tg_median_free, + .add = tg_median_add, + .flush = tg_median_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-median2", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEDIAN2, + .add_flush = RRDR_GROUPING_MEDIAN, + .init = NULL, + .create= tg_median_create_trimmed_2, + .reset = tg_median_reset, + .free 
= tg_median_free, + .add = tg_median_add, + .flush = tg_median_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-median3", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEDIAN3, + .add_flush = RRDR_GROUPING_MEDIAN, + .init = NULL, + .create= tg_median_create_trimmed_3, + .reset = tg_median_reset, + .free = tg_median_free, + .add = tg_median_add, + .flush = tg_median_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-median5", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEDIAN, + .add_flush = RRDR_GROUPING_MEDIAN, + .init = NULL, + .create= tg_median_create_trimmed_5, + .reset = tg_median_reset, + .free = tg_median_free, + .add = tg_median_add, + .flush = tg_median_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-median10", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEDIAN10, + .add_flush = RRDR_GROUPING_MEDIAN, + .init = NULL, + .create= tg_median_create_trimmed_10, + .reset = tg_median_reset, + .free = tg_median_free, + .add = tg_median_add, + .flush = tg_median_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-median15", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEDIAN15, + .add_flush = RRDR_GROUPING_MEDIAN, + .init = NULL, + .create= tg_median_create_trimmed_15, + .reset = tg_median_reset, + .free = tg_median_free, + .add = tg_median_add, + .flush = tg_median_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-median20", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEDIAN20, + .add_flush = RRDR_GROUPING_MEDIAN, + .init = NULL, + .create= tg_median_create_trimmed_20, + .reset = tg_median_reset, + .free = tg_median_free, + .add = tg_median_add, + .flush = tg_median_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-median25", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEDIAN25, + .add_flush = RRDR_GROUPING_MEDIAN, + .init = NULL, + .create= tg_median_create_trimmed_25, + .reset = 
tg_median_reset, + .free = tg_median_free, + .add = tg_median_add, + .flush = tg_median_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "trimmed-median", + .hash = 0, + .value = RRDR_GROUPING_TRIMMED_MEDIAN, + .add_flush = RRDR_GROUPING_MEDIAN, + .init = NULL, + .create= tg_median_create_trimmed_5, + .reset = tg_median_reset, + .free = tg_median_free, + .add = tg_median_add, + .flush = tg_median_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "percentile25", + .hash = 0, + .value = RRDR_GROUPING_PERCENTILE25, + .add_flush = RRDR_GROUPING_PERCENTILE, + .init = NULL, + .create= tg_percentile_create_25, + .reset = tg_percentile_reset, + .free = tg_percentile_free, + .add = tg_percentile_add, + .flush = tg_percentile_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "percentile50", + .hash = 0, + .value = RRDR_GROUPING_PERCENTILE50, + .add_flush = RRDR_GROUPING_PERCENTILE, + .init = NULL, + .create= tg_percentile_create_50, + .reset = tg_percentile_reset, + .free = tg_percentile_free, + .add = tg_percentile_add, + .flush = tg_percentile_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "percentile75", + .hash = 0, + .value = RRDR_GROUPING_PERCENTILE75, + .add_flush = RRDR_GROUPING_PERCENTILE, + .init = NULL, + .create= tg_percentile_create_75, + .reset = tg_percentile_reset, + .free = tg_percentile_free, + .add = tg_percentile_add, + .flush = tg_percentile_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "percentile80", + .hash = 0, + .value = RRDR_GROUPING_PERCENTILE80, + .add_flush = RRDR_GROUPING_PERCENTILE, + .init = NULL, + .create= tg_percentile_create_80, + .reset = tg_percentile_reset, + .free = tg_percentile_free, + .add = tg_percentile_add, + .flush = tg_percentile_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "percentile90", + .hash = 0, + .value = RRDR_GROUPING_PERCENTILE90, + .add_flush = RRDR_GROUPING_PERCENTILE, + .init = NULL, 
+ .create= tg_percentile_create_90, + .reset = tg_percentile_reset, + .free = tg_percentile_free, + .add = tg_percentile_add, + .flush = tg_percentile_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "percentile95", + .hash = 0, + .value = RRDR_GROUPING_PERCENTILE, + .add_flush = RRDR_GROUPING_PERCENTILE, + .init = NULL, + .create= tg_percentile_create_95, + .reset = tg_percentile_reset, + .free = tg_percentile_free, + .add = tg_percentile_add, + .flush = tg_percentile_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "percentile97", + .hash = 0, + .value = RRDR_GROUPING_PERCENTILE97, + .add_flush = RRDR_GROUPING_PERCENTILE, + .init = NULL, + .create= tg_percentile_create_97, + .reset = tg_percentile_reset, + .free = tg_percentile_free, + .add = tg_percentile_add, + .flush = tg_percentile_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "percentile98", + .hash = 0, + .value = RRDR_GROUPING_PERCENTILE98, + .add_flush = RRDR_GROUPING_PERCENTILE, + .init = NULL, + .create= tg_percentile_create_98, + .reset = tg_percentile_reset, + .free = tg_percentile_free, + .add = tg_percentile_add, + .flush = tg_percentile_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "percentile99", + .hash = 0, + .value = RRDR_GROUPING_PERCENTILE99, + .add_flush = RRDR_GROUPING_PERCENTILE, + .init = NULL, + .create= tg_percentile_create_99, + .reset = tg_percentile_reset, + .free = tg_percentile_free, + .add = tg_percentile_add, + .flush = tg_percentile_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "percentile", + .hash = 0, + .value = RRDR_GROUPING_PERCENTILE, + .add_flush = RRDR_GROUPING_PERCENTILE, + .init = NULL, + .create= tg_percentile_create_95, + .reset = tg_percentile_reset, + .free = tg_percentile_free, + .add = tg_percentile_add, + .flush = tg_percentile_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "min", + .hash = 0, + .value = RRDR_GROUPING_MIN, + 
.add_flush = RRDR_GROUPING_MIN, + .init = NULL, + .create= tg_min_create, + .reset = tg_min_reset, + .free = tg_min_free, + .add = tg_min_add, + .flush = tg_min_flush, + .tier_query_fetch = TIER_QUERY_FETCH_MIN + }, + {.name = "max", + .hash = 0, + .value = RRDR_GROUPING_MAX, + .add_flush = RRDR_GROUPING_MAX, + .init = NULL, + .create= tg_max_create, + .reset = tg_max_reset, + .free = tg_max_free, + .add = tg_max_add, + .flush = tg_max_flush, + .tier_query_fetch = TIER_QUERY_FETCH_MAX + }, + {.name = "sum", + .hash = 0, + .value = RRDR_GROUPING_SUM, + .add_flush = RRDR_GROUPING_SUM, + .init = NULL, + .create= tg_sum_create, + .reset = tg_sum_reset, + .free = tg_sum_free, + .add = tg_sum_add, + .flush = tg_sum_flush, + .tier_query_fetch = TIER_QUERY_FETCH_SUM + }, + + // standard deviation + {.name = "stddev", + .hash = 0, + .value = RRDR_GROUPING_STDDEV, + .add_flush = RRDR_GROUPING_STDDEV, + .init = NULL, + .create= tg_stddev_create, + .reset = tg_stddev_reset, + .free = tg_stddev_free, + .add = tg_stddev_add, + .flush = tg_stddev_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "cv", // coefficient variation is calculated by stddev + .hash = 0, + .value = RRDR_GROUPING_CV, + .add_flush = RRDR_GROUPING_CV, + .init = NULL, + .create= tg_stddev_create, // not an error, stddev calculates this too + .reset = tg_stddev_reset, // not an error, stddev calculates this too + .free = tg_stddev_free, // not an error, stddev calculates this too + .add = tg_stddev_add, // not an error, stddev calculates this too + .flush = tg_stddev_coefficient_of_variation_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "rsd", // alias of 'cv' + .hash = 0, + .value = RRDR_GROUPING_CV, + .add_flush = RRDR_GROUPING_CV, + .init = NULL, + .create= tg_stddev_create, // not an error, stddev calculates this too + .reset = tg_stddev_reset, // not an error, stddev calculates this too + .free = tg_stddev_free, // not an error, stddev calculates this too + .add 
= tg_stddev_add, // not an error, stddev calculates this too + .flush = tg_stddev_coefficient_of_variation_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + + // single exponential smoothing + {.name = "ses", + .hash = 0, + .value = RRDR_GROUPING_SES, + .add_flush = RRDR_GROUPING_SES, + .init = tg_ses_init, + .create= tg_ses_create, + .reset = tg_ses_reset, + .free = tg_ses_free, + .add = tg_ses_add, + .flush = tg_ses_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "ema", // alias for 'ses' + .hash = 0, + .value = RRDR_GROUPING_SES, + .add_flush = RRDR_GROUPING_SES, + .init = NULL, + .create= tg_ses_create, + .reset = tg_ses_reset, + .free = tg_ses_free, + .add = tg_ses_add, + .flush = tg_ses_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + {.name = "ewma", // alias for ses + .hash = 0, + .value = RRDR_GROUPING_SES, + .add_flush = RRDR_GROUPING_SES, + .init = NULL, + .create= tg_ses_create, + .reset = tg_ses_reset, + .free = tg_ses_free, + .add = tg_ses_add, + .flush = tg_ses_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + + // double exponential smoothing + {.name = "des", + .hash = 0, + .value = RRDR_GROUPING_DES, + .add_flush = RRDR_GROUPING_DES, + .init = tg_des_init, + .create= tg_des_create, + .reset = tg_des_reset, + .free = tg_des_free, + .add = tg_des_add, + .flush = tg_des_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + + {.name = "countif", + .hash = 0, + .value = RRDR_GROUPING_COUNTIF, + .add_flush = RRDR_GROUPING_COUNTIF, + .init = NULL, + .create= tg_countif_create, + .reset = tg_countif_reset, + .free = tg_countif_free, + .add = tg_countif_add, + .flush = tg_countif_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + }, + + // terminator + {.name = NULL, + .hash = 0, + .value = RRDR_GROUPING_UNDEFINED, + .add_flush = RRDR_GROUPING_AVERAGE, + .init = NULL, + .create= tg_average_create, + .reset = tg_average_reset, + .free = tg_average_free, + .add = tg_average_add, + .flush = 
tg_average_flush, + .tier_query_fetch = TIER_QUERY_FETCH_AVERAGE + } +}; + +void time_grouping_init(void) { + int i; + + for(i = 0; api_v1_data_groups[i].name ; i++) { + api_v1_data_groups[i].hash = simple_hash(api_v1_data_groups[i].name); + + if(api_v1_data_groups[i].init) + api_v1_data_groups[i].init(); + } +} + +const char *time_grouping_id2txt(RRDR_TIME_GROUPING group) { + int i; + + for(i = 0; api_v1_data_groups[i].name ; i++) { + if(api_v1_data_groups[i].value == group) { + return api_v1_data_groups[i].name; + } + } + + return "average"; +} + +RRDR_TIME_GROUPING time_grouping_txt2id(const char *name) { + int i; + + uint32_t hash = simple_hash(name); + for(i = 0; api_v1_data_groups[i].name ; i++) + if(unlikely(hash == api_v1_data_groups[i].hash && !strcmp(name, api_v1_data_groups[i].name))) + return api_v1_data_groups[i].value; + + return RRDR_GROUPING_AVERAGE; +} + +RRDR_TIME_GROUPING time_grouping_parse(const char *name, RRDR_TIME_GROUPING def) { + int i; + + uint32_t hash = simple_hash(name); + for(i = 0; api_v1_data_groups[i].name ; i++) + if(unlikely(hash == api_v1_data_groups[i].hash && !strcmp(name, api_v1_data_groups[i].name))) + return api_v1_data_groups[i].value; + + return def; +} + +const char *time_grouping_tostring(RRDR_TIME_GROUPING group) { + int i; + + for(i = 0; api_v1_data_groups[i].name ; i++) + if(unlikely(group == api_v1_data_groups[i].value)) + return api_v1_data_groups[i].name; + + return "unknown"; +} + +static void rrdr_set_grouping_function(RRDR *r, RRDR_TIME_GROUPING group_method) { + int i, found = 0; + for(i = 0; !found && api_v1_data_groups[i].name ;i++) { + if(api_v1_data_groups[i].value == group_method) { + r->time_grouping.create = api_v1_data_groups[i].create; + r->time_grouping.reset = api_v1_data_groups[i].reset; + r->time_grouping.free = api_v1_data_groups[i].free; + r->time_grouping.add = api_v1_data_groups[i].add; + r->time_grouping.flush = api_v1_data_groups[i].flush; + r->time_grouping.tier_query_fetch = 
api_v1_data_groups[i].tier_query_fetch; + r->time_grouping.add_flush = api_v1_data_groups[i].add_flush; + found = 1; + } + } + if(!found) { + errno = 0; + internal_error(true, "QUERY: grouping method %u not found. Using 'average'", (unsigned int)group_method); + r->time_grouping.create = tg_average_create; + r->time_grouping.reset = tg_average_reset; + r->time_grouping.free = tg_average_free; + r->time_grouping.add = tg_average_add; + r->time_grouping.flush = tg_average_flush; + r->time_grouping.tier_query_fetch = TIER_QUERY_FETCH_AVERAGE; + r->time_grouping.add_flush = RRDR_GROUPING_AVERAGE; + } +} + +static inline void time_grouping_add(RRDR *r, NETDATA_DOUBLE value, const RRDR_TIME_GROUPING add_flush) { + switch(add_flush) { + case RRDR_GROUPING_AVERAGE: + tg_average_add(r, value); + break; + + case RRDR_GROUPING_MAX: + tg_max_add(r, value); + break; + + case RRDR_GROUPING_MIN: + tg_min_add(r, value); + break; + + case RRDR_GROUPING_MEDIAN: + tg_median_add(r, value); + break; + + case RRDR_GROUPING_STDDEV: + case RRDR_GROUPING_CV: + tg_stddev_add(r, value); + break; + + case RRDR_GROUPING_SUM: + tg_sum_add(r, value); + break; + + case RRDR_GROUPING_COUNTIF: + tg_countif_add(r, value); + break; + + case RRDR_GROUPING_TRIMMED_MEAN: + tg_trimmed_mean_add(r, value); + break; + + case RRDR_GROUPING_PERCENTILE: + tg_percentile_add(r, value); + break; + + case RRDR_GROUPING_SES: + tg_ses_add(r, value); + break; + + case RRDR_GROUPING_DES: + tg_des_add(r, value); + break; + + case RRDR_GROUPING_INCREMENTAL_SUM: + tg_incremental_sum_add(r, value); + break; + + default: + r->time_grouping.add(r, value); + break; + } +} + +static inline NETDATA_DOUBLE time_grouping_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr, const RRDR_TIME_GROUPING add_flush) { + switch(add_flush) { + case RRDR_GROUPING_AVERAGE: + return tg_average_flush(r, rrdr_value_options_ptr); + + case RRDR_GROUPING_MAX: + return tg_max_flush(r, rrdr_value_options_ptr); + + case RRDR_GROUPING_MIN: + 
return tg_min_flush(r, rrdr_value_options_ptr); + + case RRDR_GROUPING_MEDIAN: + return tg_median_flush(r, rrdr_value_options_ptr); + + case RRDR_GROUPING_STDDEV: + return tg_stddev_flush(r, rrdr_value_options_ptr); + + case RRDR_GROUPING_CV: + return tg_stddev_coefficient_of_variation_flush(r, rrdr_value_options_ptr); + + case RRDR_GROUPING_SUM: + return tg_sum_flush(r, rrdr_value_options_ptr); + + case RRDR_GROUPING_COUNTIF: + return tg_countif_flush(r, rrdr_value_options_ptr); + + case RRDR_GROUPING_TRIMMED_MEAN: + return tg_trimmed_mean_flush(r, rrdr_value_options_ptr); + + case RRDR_GROUPING_PERCENTILE: + return tg_percentile_flush(r, rrdr_value_options_ptr); + + case RRDR_GROUPING_SES: + return tg_ses_flush(r, rrdr_value_options_ptr); + + case RRDR_GROUPING_DES: + return tg_des_flush(r, rrdr_value_options_ptr); + + case RRDR_GROUPING_INCREMENTAL_SUM: + return tg_incremental_sum_flush(r, rrdr_value_options_ptr); + + default: + return r->time_grouping.flush(r, rrdr_value_options_ptr); + } +} + +RRDR_GROUP_BY group_by_parse(char *s) { + RRDR_GROUP_BY group_by = RRDR_GROUP_BY_NONE; + + while(s) { + char *key = strsep_skip_consecutive_separators(&s, ",| "); + if (!key || !*key) continue; + + if (strcmp(key, "selected") == 0) + group_by |= RRDR_GROUP_BY_SELECTED; + + if (strcmp(key, "dimension") == 0) + group_by |= RRDR_GROUP_BY_DIMENSION; + + if (strcmp(key, "instance") == 0) + group_by |= RRDR_GROUP_BY_INSTANCE; + + if (strcmp(key, "percentage-of-instance") == 0) + group_by |= RRDR_GROUP_BY_PERCENTAGE_OF_INSTANCE; + + if (strcmp(key, "label") == 0) + group_by |= RRDR_GROUP_BY_LABEL; + + if (strcmp(key, "node") == 0) + group_by |= RRDR_GROUP_BY_NODE; + + if (strcmp(key, "context") == 0) + group_by |= RRDR_GROUP_BY_CONTEXT; + + if (strcmp(key, "units") == 0) + group_by |= RRDR_GROUP_BY_UNITS; + } + + if((group_by & RRDR_GROUP_BY_SELECTED) && (group_by & ~RRDR_GROUP_BY_SELECTED)) { + internal_error(true, "group-by given by query has 'selected' together with more 
groupings"); + group_by = RRDR_GROUP_BY_SELECTED; // remove all other groupings + } + + if(group_by & RRDR_GROUP_BY_PERCENTAGE_OF_INSTANCE) + group_by = RRDR_GROUP_BY_PERCENTAGE_OF_INSTANCE; // remove all other groupings + + return group_by; +} + +void buffer_json_group_by_to_array(BUFFER *wb, RRDR_GROUP_BY group_by) { + if(group_by == RRDR_GROUP_BY_NONE) + buffer_json_add_array_item_string(wb, "none"); + else { + if (group_by & RRDR_GROUP_BY_DIMENSION) + buffer_json_add_array_item_string(wb, "dimension"); + + if (group_by & RRDR_GROUP_BY_INSTANCE) + buffer_json_add_array_item_string(wb, "instance"); + + if (group_by & RRDR_GROUP_BY_PERCENTAGE_OF_INSTANCE) + buffer_json_add_array_item_string(wb, "percentage-of-instance"); + + if (group_by & RRDR_GROUP_BY_LABEL) + buffer_json_add_array_item_string(wb, "label"); + + if (group_by & RRDR_GROUP_BY_NODE) + buffer_json_add_array_item_string(wb, "node"); + + if (group_by & RRDR_GROUP_BY_CONTEXT) + buffer_json_add_array_item_string(wb, "context"); + + if (group_by & RRDR_GROUP_BY_UNITS) + buffer_json_add_array_item_string(wb, "units"); + + if (group_by & RRDR_GROUP_BY_SELECTED) + buffer_json_add_array_item_string(wb, "selected"); + } +} + +RRDR_GROUP_BY_FUNCTION group_by_aggregate_function_parse(const char *s) { + if(strcmp(s, "average") == 0) + return RRDR_GROUP_BY_FUNCTION_AVERAGE; + + if(strcmp(s, "avg") == 0) + return RRDR_GROUP_BY_FUNCTION_AVERAGE; + + if(strcmp(s, "min") == 0) + return RRDR_GROUP_BY_FUNCTION_MIN; + + if(strcmp(s, "max") == 0) + return RRDR_GROUP_BY_FUNCTION_MAX; + + if(strcmp(s, "sum") == 0) + return RRDR_GROUP_BY_FUNCTION_SUM; + + if(strcmp(s, "percentage") == 0) + return RRDR_GROUP_BY_FUNCTION_PERCENTAGE; + + return RRDR_GROUP_BY_FUNCTION_AVERAGE; +} + +const char *group_by_aggregate_function_to_string(RRDR_GROUP_BY_FUNCTION group_by_function) { + switch(group_by_function) { + default: + case RRDR_GROUP_BY_FUNCTION_AVERAGE: + return "average"; + + case RRDR_GROUP_BY_FUNCTION_MIN: + return "min"; + + 
case RRDR_GROUP_BY_FUNCTION_MAX: + return "max"; + + case RRDR_GROUP_BY_FUNCTION_SUM: + return "sum"; + + case RRDR_GROUP_BY_FUNCTION_PERCENTAGE: + return "percentage"; + } +} + +// ---------------------------------------------------------------------------- +// helpers to find our way in RRDR + +static inline RRDR_VALUE_FLAGS *UNUSED_FUNCTION(rrdr_line_options)(RRDR *r, long rrdr_line) { + return &r->o[ rrdr_line * r->d ]; +} + +static inline NETDATA_DOUBLE *UNUSED_FUNCTION(rrdr_line_values)(RRDR *r, long rrdr_line) { + return &r->v[ rrdr_line * r->d ]; +} + +static inline long rrdr_line_init(RRDR *r __maybe_unused, time_t t __maybe_unused, long rrdr_line) { + rrdr_line++; + + internal_fatal(rrdr_line >= (long)r->n, + "QUERY: requested to step above RRDR size for query '%s'", + r->internal.qt->id); + + internal_fatal(r->t[rrdr_line] != t, + "QUERY: wrong timestamp at RRDR line %ld, expected %ld, got %ld, of query '%s'", + rrdr_line, r->t[rrdr_line], t, r->internal.qt->id); + + return rrdr_line; +} + +// ---------------------------------------------------------------------------- +// tier management + +static bool query_metric_is_valid_tier(QUERY_METRIC *qm, size_t tier) { + if(!qm->tiers[tier].smh || !qm->tiers[tier].db_first_time_s || !qm->tiers[tier].db_last_time_s || !qm->tiers[tier].db_update_every_s) + return false; + + return true; +} + +static size_t query_metric_first_working_tier(QUERY_METRIC *qm) { + for(size_t tier = 0; tier < storage_tiers ; tier++) { + + // find the db time-range for this tier for all metrics + STORAGE_METRIC_HANDLE *smh = qm->tiers[tier].smh; + time_t first_time_s = qm->tiers[tier].db_first_time_s; + time_t last_time_s = qm->tiers[tier].db_last_time_s; + time_t update_every_s = qm->tiers[tier].db_update_every_s; + + if(!smh || !first_time_s || !last_time_s || !update_every_s) + continue; + + return tier; + } + + return 0; +} + +static long query_plan_points_coverage_weight(time_t db_first_time_s, time_t db_last_time_s, time_t 
db_update_every_s, time_t after_wanted, time_t before_wanted, size_t points_wanted, size_t tier __maybe_unused) { + if(db_first_time_s == 0 || + db_last_time_s == 0 || + db_update_every_s == 0 || + db_first_time_s > before_wanted || + db_last_time_s < after_wanted) + return -LONG_MAX; + + long long common_first_t = MAX(db_first_time_s, after_wanted); + long long common_last_t = MIN(db_last_time_s, before_wanted); + + long long time_coverage = (common_last_t - common_first_t) * 1000000LL / (before_wanted - after_wanted); + long long points_wanted_in_coverage = (long long)points_wanted * time_coverage / 1000000LL; + + long long points_available = (common_last_t - common_first_t) / db_update_every_s; + long long points_delta = (long)(points_available - points_wanted_in_coverage); + long long points_coverage = (points_delta < 0) ? (long)(points_available * time_coverage / points_wanted_in_coverage) : time_coverage; + + // a way to benefit higher tiers + // points_coverage += (long)tier * 10000; + + if(points_available <= 0) + return -LONG_MAX; + + return (long)(points_coverage + (25000LL * tier)); // 2.5% benefit for each higher tier +} + +static size_t query_metric_best_tier_for_timeframe(QUERY_METRIC *qm, time_t after_wanted, time_t before_wanted, size_t points_wanted) { + if(unlikely(storage_tiers < 2)) + return 0; + + if(unlikely(after_wanted == before_wanted || points_wanted <= 0)) + return query_metric_first_working_tier(qm); + + if(points_wanted < QUERY_PLAN_MIN_POINTS) + // when selecting tiers, aim for a resolution of at least QUERY_PLAN_MIN_POINTS points + points_wanted = (before_wanted - after_wanted) > QUERY_PLAN_MIN_POINTS ? 
QUERY_PLAN_MIN_POINTS : before_wanted - after_wanted; + + time_t min_first_time_s = 0; + time_t max_last_time_s = 0; + + for(size_t tier = 0; tier < storage_tiers ; tier++) { + time_t first_time_s = qm->tiers[tier].db_first_time_s; + time_t last_time_s = qm->tiers[tier].db_last_time_s; + + if(!min_first_time_s || (first_time_s && first_time_s < min_first_time_s)) + min_first_time_s = first_time_s; + + if(!max_last_time_s || (last_time_s && last_time_s > max_last_time_s)) + max_last_time_s = last_time_s; + } + + for(size_t tier = 0; tier < storage_tiers ; tier++) { + + // find the db time-range for this tier for all metrics + STORAGE_METRIC_HANDLE *smh = qm->tiers[tier].smh; + time_t first_time_s = qm->tiers[tier].db_first_time_s; + time_t last_time_s = qm->tiers[tier].db_last_time_s; + time_t update_every_s = qm->tiers[tier].db_update_every_s; + + if( !smh || + !first_time_s || + !last_time_s || + !update_every_s || + first_time_s > before_wanted || + last_time_s < after_wanted + ) { + qm->tiers[tier].weight = -LONG_MAX; + continue; + } + + internal_fatal(first_time_s > before_wanted || last_time_s < after_wanted, "QUERY: invalid db durations"); + + qm->tiers[tier].weight = query_plan_points_coverage_weight( + min_first_time_s, max_last_time_s, update_every_s, + after_wanted, before_wanted, points_wanted, tier); + } + + size_t best_tier = 0; + for(size_t tier = 1; tier < storage_tiers ; tier++) { + if(qm->tiers[tier].weight >= qm->tiers[best_tier].weight) + best_tier = tier; + } + + return best_tier; +} + +static size_t rrddim_find_best_tier_for_timeframe(QUERY_TARGET *qt, time_t after_wanted, time_t before_wanted, size_t points_wanted) { + if(unlikely(storage_tiers < 2)) + return 0; + + if(unlikely(after_wanted == before_wanted || points_wanted <= 0)) { + internal_error(true, "QUERY: '%s' has invalid params to tier calculation", qt->id); + return 0; + } + + long weight[storage_tiers]; + + for(size_t tier = 0; tier < storage_tiers ; tier++) { + + time_t 
common_first_time_s = 0; + time_t common_last_time_s = 0; + time_t common_update_every_s = 0; + + // find the db time-range for this tier for all metrics + for(size_t i = 0, used = qt->query.used; i < used ; i++) { + QUERY_METRIC *qm = query_metric(qt, i); + + time_t first_time_s = qm->tiers[tier].db_first_time_s; + time_t last_time_s = qm->tiers[tier].db_last_time_s; + time_t update_every_s = qm->tiers[tier].db_update_every_s; + + if(!first_time_s || !last_time_s || !update_every_s) + continue; + + if(!common_first_time_s) + common_first_time_s = first_time_s; + else + common_first_time_s = MIN(first_time_s, common_first_time_s); + + if(!common_last_time_s) + common_last_time_s = last_time_s; + else + common_last_time_s = MAX(last_time_s, common_last_time_s); + + if(!common_update_every_s) + common_update_every_s = update_every_s; + else + common_update_every_s = MIN(update_every_s, common_update_every_s); + } + + weight[tier] = query_plan_points_coverage_weight(common_first_time_s, common_last_time_s, common_update_every_s, after_wanted, before_wanted, points_wanted, tier); + } + + size_t best_tier = 0; + for(size_t tier = 1; tier < storage_tiers ; tier++) { + if(weight[tier] >= weight[best_tier]) + best_tier = tier; + } + + if(weight[best_tier] == -LONG_MAX) + best_tier = 0; + + return best_tier; +} + +static time_t rrdset_find_natural_update_every_for_timeframe(QUERY_TARGET *qt, time_t after_wanted, time_t before_wanted, size_t points_wanted, RRDR_OPTIONS options, size_t tier) { + size_t best_tier; + if((options & RRDR_OPTION_SELECTED_TIER) && tier < storage_tiers) + best_tier = tier; + else + best_tier = rrddim_find_best_tier_for_timeframe(qt, after_wanted, before_wanted, points_wanted); + + // find the db minimum update every for this tier for all metrics + time_t common_update_every_s = default_rrd_update_every; + for(size_t i = 0, used = qt->query.used; i < used ; i++) { + QUERY_METRIC *qm = query_metric(qt, i); + + time_t update_every_s = 
qm->tiers[best_tier].db_update_every_s; + + if(!i) + common_update_every_s = update_every_s; + else + common_update_every_s = MIN(update_every_s, common_update_every_s); + } + + return common_update_every_s; +} + +// ---------------------------------------------------------------------------- +// query ops + +typedef struct query_point { + STORAGE_POINT sp; + NETDATA_DOUBLE value; + bool added; +#ifdef NETDATA_INTERNAL_CHECKS + size_t id; +#endif +} QUERY_POINT; + +QUERY_POINT QUERY_POINT_EMPTY = { + .sp = STORAGE_POINT_UNSET, + .value = NAN, + .added = false, +#ifdef NETDATA_INTERNAL_CHECKS + .id = 0, +#endif +}; + +#ifdef NETDATA_INTERNAL_CHECKS +#define query_point_set_id(point, point_id) (point).id = point_id +#else +#define query_point_set_id(point, point_id) debug_dummy() +#endif + +typedef struct query_engine_ops { + // configuration + RRDR *r; + QUERY_METRIC *qm; + time_t view_update_every; + time_t query_granularity; + TIER_QUERY_FETCH tier_query_fetch; + + // query planer + size_t current_plan; + time_t current_plan_expire_time; + time_t plan_expanded_after; + time_t plan_expanded_before; + + // storage queries + size_t tier; + struct query_metric_tier *tier_ptr; + struct storage_engine_query_handle *seqh; + + // aggregating points over time + size_t group_points_non_zero; + size_t group_points_added; + STORAGE_POINT group_point; // aggregates min, max, sum, count, anomaly count for each group point + STORAGE_POINT query_point; // aggregates min, max, sum, count, anomaly count across the whole query + RRDR_VALUE_FLAGS group_value_flags; + + // statistics + size_t db_total_points_read; + size_t db_points_read_per_tier[RRD_STORAGE_TIERS]; + + struct { + time_t expanded_after; + time_t expanded_before; + struct storage_engine_query_handle handle; + bool initialized; + bool finalized; + } plans[QUERY_PLANS_MAX]; + + struct query_engine_ops *next; +} QUERY_ENGINE_OPS; + + +// ---------------------------------------------------------------------------- +// 
query planer + +#define query_plan_should_switch_plan(ops, now) ((now) >= (ops)->current_plan_expire_time) + +static size_t query_planer_expand_duration_in_points(time_t this_update_every, time_t next_update_every) { + + time_t delta = this_update_every - next_update_every; + if(delta < 0) delta = -delta; + + size_t points; + if(delta < this_update_every * POINTS_TO_EXPAND_QUERY) + points = POINTS_TO_EXPAND_QUERY; + else + points = (delta + this_update_every - 1) / this_update_every; + + return points; +} + +static void query_planer_initialize_plans(QUERY_ENGINE_OPS *ops) { + QUERY_METRIC *qm = ops->qm; + + for(size_t p = 0; p < qm->plan.used ; p++) { + size_t tier = qm->plan.array[p].tier; + time_t update_every = qm->tiers[tier].db_update_every_s; + + size_t points_to_add_to_after; + if(p > 0) { + // there is another plan before to this + + size_t tier0 = qm->plan.array[p - 1].tier; + time_t update_every0 = qm->tiers[tier0].db_update_every_s; + + points_to_add_to_after = query_planer_expand_duration_in_points(update_every, update_every0); + } + else + points_to_add_to_after = (tier == 0) ? 
0 : POINTS_TO_EXPAND_QUERY; + + size_t points_to_add_to_before; + if(p + 1 < qm->plan.used) { + // there is another plan after to this + + size_t tier1 = qm->plan.array[p+1].tier; + time_t update_every1 = qm->tiers[tier1].db_update_every_s; + + points_to_add_to_before = query_planer_expand_duration_in_points(update_every, update_every1); + } + else + points_to_add_to_before = POINTS_TO_EXPAND_QUERY; + + time_t after = qm->plan.array[p].after - (time_t)(update_every * points_to_add_to_after); + time_t before = qm->plan.array[p].before + (time_t)(update_every * points_to_add_to_before); + + ops->plans[p].expanded_after = after; + ops->plans[p].expanded_before = before; + + ops->r->internal.qt->db.tiers[tier].queries++; + + struct query_metric_tier *tier_ptr = &qm->tiers[tier]; + STORAGE_ENGINE *eng = query_metric_storage_engine(ops->r->internal.qt, qm, tier); + storage_engine_query_init(eng->seb, tier_ptr->smh, &ops->plans[p].handle, + after, before, ops->r->internal.qt->request.priority); + + ops->plans[p].initialized = true; + ops->plans[p].finalized = false; + } +} + +static void query_planer_finalize_plan(QUERY_ENGINE_OPS *ops, size_t plan_id) { + // QUERY_METRIC *qm = ops->qm; + + if(ops->plans[plan_id].initialized && !ops->plans[plan_id].finalized) { + storage_engine_query_finalize(&ops->plans[plan_id].handle); + ops->plans[plan_id].initialized = false; + ops->plans[plan_id].finalized = true; + } +} + +static void query_planer_finalize_remaining_plans(QUERY_ENGINE_OPS *ops) { + QUERY_METRIC *qm = ops->qm; + + for(size_t p = 0; p < qm->plan.used ; p++) + query_planer_finalize_plan(ops, p); +} + +static void query_planer_activate_plan(QUERY_ENGINE_OPS *ops, size_t plan_id, time_t overwrite_after __maybe_unused) { + QUERY_METRIC *qm = ops->qm; + + internal_fatal(plan_id >= qm->plan.used, "QUERY: invalid plan_id given"); + internal_fatal(!ops->plans[plan_id].initialized, "QUERY: plan has not been initialized"); + internal_fatal(ops->plans[plan_id].finalized, 
"QUERY: plan has been finalized"); + + internal_fatal(qm->plan.array[plan_id].after > qm->plan.array[plan_id].before, "QUERY: flipped after/before"); + + ops->tier = qm->plan.array[plan_id].tier; + ops->tier_ptr = &qm->tiers[ops->tier]; + ops->seqh = &ops->plans[plan_id].handle; + ops->current_plan = plan_id; + + if(plan_id + 1 < qm->plan.used && qm->plan.array[plan_id + 1].after < qm->plan.array[plan_id].before) + ops->current_plan_expire_time = qm->plan.array[plan_id + 1].after; + else + ops->current_plan_expire_time = qm->plan.array[plan_id].before; + + ops->plan_expanded_after = ops->plans[plan_id].expanded_after; + ops->plan_expanded_before = ops->plans[plan_id].expanded_before; +} + +static bool query_planer_next_plan(QUERY_ENGINE_OPS *ops, time_t now, time_t last_point_end_time) { + QUERY_METRIC *qm = ops->qm; + + size_t old_plan = ops->current_plan; + + time_t next_plan_before_time; + do { + ops->current_plan++; + + if (ops->current_plan >= qm->plan.used) { + ops->current_plan = old_plan; + ops->current_plan_expire_time = ops->r->internal.qt->window.before; + // let the query run with current plan + // we will not switch it + return false; + } + + next_plan_before_time = qm->plan.array[ops->current_plan].before; + } while(now >= next_plan_before_time || last_point_end_time >= next_plan_before_time); + + if(!query_metric_is_valid_tier(qm, qm->plan.array[ops->current_plan].tier)) { + ops->current_plan = old_plan; + ops->current_plan_expire_time = ops->r->internal.qt->window.before; + return false; + } + + query_planer_finalize_plan(ops, old_plan); + query_planer_activate_plan(ops, ops->current_plan, MIN(now, last_point_end_time)); + return true; +} + +static int compare_query_plan_entries_on_start_time(const void *a, const void *b) { + QUERY_PLAN_ENTRY *p1 = (QUERY_PLAN_ENTRY *)a; + QUERY_PLAN_ENTRY *p2 = (QUERY_PLAN_ENTRY *)b; + return (p1->after < p2->after)?-1:1; +} + +static bool query_plan(QUERY_ENGINE_OPS *ops, time_t after_wanted, time_t before_wanted, 
size_t points_wanted) { + QUERY_METRIC *qm = ops->qm; + + // put our selected tier as the first plan + size_t selected_tier; + bool switch_tiers = true; + + if((ops->r->internal.qt->window.options & RRDR_OPTION_SELECTED_TIER) + && ops->r->internal.qt->window.tier < storage_tiers + && query_metric_is_valid_tier(qm, ops->r->internal.qt->window.tier)) { + selected_tier = ops->r->internal.qt->window.tier; + switch_tiers = false; + } + else { + selected_tier = query_metric_best_tier_for_timeframe(qm, after_wanted, before_wanted, points_wanted); + + if(!query_metric_is_valid_tier(qm, selected_tier)) + return false; + } + + if(qm->tiers[selected_tier].db_first_time_s > before_wanted || + qm->tiers[selected_tier].db_last_time_s < after_wanted) { + // we don't have any data to satisfy this query + return false; + } + + qm->plan.used = 1; + qm->plan.array[0].tier = selected_tier; + qm->plan.array[0].after = (qm->tiers[selected_tier].db_first_time_s < after_wanted) ? after_wanted : qm->tiers[selected_tier].db_first_time_s; + qm->plan.array[0].before = (qm->tiers[selected_tier].db_last_time_s > before_wanted) ? before_wanted : qm->tiers[selected_tier].db_last_time_s; + + if(switch_tiers) { + // the selected tier + time_t selected_tier_first_time_s = qm->plan.array[0].after; + time_t selected_tier_last_time_s = qm->plan.array[0].before; + + // check if our selected tier can start the query + if (selected_tier_first_time_s > after_wanted) { + // we need some help from other tiers + for (size_t tr = (int)selected_tier + 1; tr < storage_tiers && qm->plan.used < QUERY_PLANS_MAX ; tr++) { + if(!query_metric_is_valid_tier(qm, tr)) + continue; + + // find the first time of this tier + time_t tier_first_time_s = qm->tiers[tr].db_first_time_s; + time_t tier_last_time_s = qm->tiers[tr].db_last_time_s; + + // can it help? 
+ if (tier_first_time_s < selected_tier_first_time_s && tier_first_time_s <= before_wanted && tier_last_time_s >= after_wanted) { + // it can help us add detail at the beginning of the query + QUERY_PLAN_ENTRY t = { + .tier = tr, + .after = (tier_first_time_s < after_wanted) ? after_wanted : tier_first_time_s, + .before = selected_tier_first_time_s, + }; + ops->plans[qm->plan.used].initialized = false; + ops->plans[qm->plan.used].finalized = false; + qm->plan.array[qm->plan.used++] = t; + + internal_fatal(!t.after || !t.before, "QUERY: invalid plan selected"); + + // prepare for the tier + selected_tier_first_time_s = t.after; + + if (t.after <= after_wanted) + break; + } + } + } + + // check if our selected tier can finish the query + if (selected_tier_last_time_s < before_wanted) { + // we need some help from other tiers + for (int tr = (int)selected_tier - 1; tr >= 0 && qm->plan.used < QUERY_PLANS_MAX ; tr--) { + if(!query_metric_is_valid_tier(qm, tr)) + continue; + + // find the last time of this tier + time_t tier_first_time_s = qm->tiers[tr].db_first_time_s; + time_t tier_last_time_s = qm->tiers[tr].db_last_time_s; + + //buffer_sprintf(wb, ": EVAL BEFORE tier %d, %ld", tier, last_time_s); + + // can it help? + if (tier_last_time_s > selected_tier_last_time_s && tier_first_time_s <= before_wanted && tier_last_time_s >= after_wanted) { + // it can help us add detail at the end of the query + QUERY_PLAN_ENTRY t = { + .tier = tr, + .after = selected_tier_last_time_s, + .before = (tier_last_time_s > before_wanted) ? 
before_wanted : tier_last_time_s, + }; + ops->plans[qm->plan.used].initialized = false; + ops->plans[qm->plan.used].finalized = false; + qm->plan.array[qm->plan.used++] = t; + + // prepare for the tier + selected_tier_last_time_s = t.before; + + internal_fatal(!t.after || !t.before, "QUERY: invalid plan selected"); + + if (t.before >= before_wanted) + break; + } + } + } + } + + // sort the query plan + if(qm->plan.used > 1) + qsort(&qm->plan.array, qm->plan.used, sizeof(QUERY_PLAN_ENTRY), compare_query_plan_entries_on_start_time); + + if(!query_metric_is_valid_tier(qm, qm->plan.array[0].tier)) + return false; + +#ifdef NETDATA_INTERNAL_CHECKS + for(size_t p = 0; p < qm->plan.used ;p++) { + internal_fatal(qm->plan.array[p].after > qm->plan.array[p].before, "QUERY: flipped after/before"); + internal_fatal(qm->plan.array[p].after < after_wanted, "QUERY: too small plan first time"); + internal_fatal(qm->plan.array[p].before > before_wanted, "QUERY: too big plan last time"); + } +#endif + + query_planer_initialize_plans(ops); + query_planer_activate_plan(ops, 0, 0); + + return true; +} + + +// ---------------------------------------------------------------------------- +// dimension level query engine + +#define query_interpolate_point(this_point, last_point, now) do { \ + if(likely( \ + /* the point to interpolate is more than 1s wide */ \ + (this_point).sp.end_time_s - (this_point).sp.start_time_s > 1 \ + \ + /* the two points are exactly next to each other */ \ + && (last_point).sp.end_time_s == (this_point).sp.start_time_s \ + \ + /* both points are valid numbers */ \ + && netdata_double_isnumber((this_point).value) \ + && netdata_double_isnumber((last_point).value) \ + \ + )) { \ + (this_point).value = (last_point).value + ((this_point).value - (last_point).value) * (1.0 - (NETDATA_DOUBLE)((this_point).sp.end_time_s - (now)) / (NETDATA_DOUBLE)((this_point).sp.end_time_s - (this_point).sp.start_time_s)); \ + (this_point).sp.end_time_s = now; \ + } \ +} while(0) + 
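A note on the `query_interpolate_point` macro above: when a database point is wider than one second and starts exactly where the previous point ended, the macro cuts it at `now` and linearly interpolates its value between the previous and the current value. The following is a minimal, standalone sketch of that arithmetic only. `sketch_point` and `sketch_interpolate_point` are hypothetical names introduced for illustration, and `isfinite()` is assumed here as a stand-in for `netdata_double_isnumber()`; this is not the code from the commit.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, reduced stand-in for QUERY_POINT: only the fields the
 * interpolation touches. */
typedef struct {
    time_t start_time_s;
    time_t end_time_s;
    double value;
} sketch_point;

/* Cut `this_point` at `now` and linearly interpolate its value between
 * last_point->value and this_point->value.
 * Returns false (leaving the point untouched) when the preconditions
 * checked by the macro do not hold. */
static bool sketch_interpolate_point(sketch_point *this_point,
                                     const sketch_point *last_point,
                                     time_t now) {
    assert(this_point != NULL && last_point != NULL);

    time_t width = this_point->end_time_s - this_point->start_time_s;

    /* the point to interpolate must be more than 1s wide */
    if (width <= 1)
        return false;

    /* the two points must be exactly next to each other */
    if (last_point->end_time_s != this_point->start_time_s)
        return false;

    /* both values must be valid numbers (assumption: isfinite() plays
     * the role of netdata_double_isnumber()) */
    if (!isfinite(this_point->value) || !isfinite(last_point->value))
        return false;

    /* fraction of the point that lies after `now` */
    double remaining = (double)(this_point->end_time_s - now) / (double)width;

    this_point->value = last_point->value
                      + (this_point->value - last_point->value) * (1.0 - remaining);
    this_point->end_time_s = now;
    return true;
}
```

For example, with a previous point covering 0 to 60 with value 10 and a current point covering 60 to 120 with value 20, cutting at `now = 90` leaves half of the current point in the future, so the interpolated value is 10 + (20 - 10) * 0.5 = 15 and the point's end time becomes 90.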
+#define query_add_point_to_group(r, point, ops, add_flush) do { \ + if(likely(netdata_double_isnumber((point).value))) { \ + if(likely(fpclassify((point).value) != FP_ZERO)) \ + (ops)->group_points_non_zero++; \ + \ + if(unlikely((point).sp.flags & SN_FLAG_RESET)) \ + (ops)->group_value_flags |= RRDR_VALUE_RESET; \ + \ + time_grouping_add(r, (point).value, add_flush); \ + \ + storage_point_merge_to((ops)->group_point, (point).sp); \ + if(!(point).added) \ + storage_point_merge_to((ops)->query_point, (point).sp); \ + } \ + \ + (ops)->group_points_added++; \ +} while(0) + +static __thread QUERY_ENGINE_OPS *released_ops = NULL; + +static void rrd2rrdr_query_ops_freeall(RRDR *r __maybe_unused) { + while(released_ops) { + QUERY_ENGINE_OPS *ops = released_ops; + released_ops = ops->next; + + onewayalloc_freez(r->internal.owa, ops); + } +} + +static void rrd2rrdr_query_ops_release(QUERY_ENGINE_OPS *ops) { + if(!ops) return; + + ops->next = released_ops; + released_ops = ops; +} + +static QUERY_ENGINE_OPS *rrd2rrdr_query_ops_get(RRDR *r) { + QUERY_ENGINE_OPS *ops; + if(released_ops) { + ops = released_ops; + released_ops = ops->next; + } + else { + ops = onewayalloc_mallocz(r->internal.owa, sizeof(QUERY_ENGINE_OPS)); + } + + memset(ops, 0, sizeof(*ops)); + return ops; +} + +static QUERY_ENGINE_OPS *rrd2rrdr_query_ops_prep(RRDR *r, size_t query_metric_id) { + QUERY_TARGET *qt = r->internal.qt; + + QUERY_ENGINE_OPS *ops = rrd2rrdr_query_ops_get(r); + *ops = (QUERY_ENGINE_OPS) { + .r = r, + .qm = query_metric(qt, query_metric_id), + .tier_query_fetch = r->time_grouping.tier_query_fetch, + .view_update_every = r->view.update_every, + .query_granularity = (time_t)(r->view.update_every / r->view.group), + .group_value_flags = RRDR_VALUE_NOTHING, + }; + + if(!query_plan(ops, qt->window.after, qt->window.before, qt->window.points)) { + rrd2rrdr_query_ops_release(ops); + return NULL; + } + + return ops; +} + +static void rrd2rrdr_query_execute(RRDR *r, size_t dim_id_in_rrdr, 
QUERY_ENGINE_OPS *ops) { + QUERY_TARGET *qt = r->internal.qt; + QUERY_METRIC *qm = ops->qm; + + const RRDR_TIME_GROUPING add_flush = r->time_grouping.add_flush; + + ops->group_point = STORAGE_POINT_UNSET; + ops->query_point = STORAGE_POINT_UNSET; + + RRDR_OPTIONS options = qt->window.options; + size_t points_wanted = qt->window.points; + time_t after_wanted = qt->window.after; + time_t before_wanted = qt->window.before; (void)before_wanted; + +// bool debug_this = false; +// if(strcmp("user", string2str(rd->id)) == 0 && strcmp("system.cpu", string2str(rd->rrdset->id)) == 0) +// debug_this = true; + + size_t points_added = 0; + + long rrdr_line = -1; + bool use_anomaly_bit_as_value = (r->internal.qt->window.options & RRDR_OPTION_ANOMALY_BIT) ? true : false; + + NETDATA_DOUBLE min = r->view.min, max = r->view.max; + + QUERY_POINT last2_point = QUERY_POINT_EMPTY; + QUERY_POINT last1_point = QUERY_POINT_EMPTY; + QUERY_POINT new_point = QUERY_POINT_EMPTY; + + // ONE POINT READ-AHEAD + // when we switch plans, we read-ahead a point from the next plan + // to join them smoothly at the exact time the next plan begins + STORAGE_POINT next1_point = STORAGE_POINT_UNSET; + + time_t now_start_time = after_wanted - ops->query_granularity; + time_t now_end_time = after_wanted + ops->view_update_every - ops->query_granularity; + + size_t db_points_read_since_plan_switch = 0; (void)db_points_read_since_plan_switch; + size_t query_is_finished_counter = 0; + + // The main loop, based on the query granularity we need + for( ; points_added < points_wanted && query_is_finished_counter <= 10 ; + now_start_time = now_end_time, now_end_time += ops->view_update_every) { + + if(unlikely(query_plan_should_switch_plan(ops, now_end_time))) { + query_planer_next_plan(ops, now_end_time, new_point.sp.end_time_s); + db_points_read_since_plan_switch = 0; + } + + // read all the points of the db, prior to the time we need (now_end_time) + + size_t count_same_end_time = 0; + while(count_same_end_time 
< 100) { + if(likely(count_same_end_time == 0)) { + last2_point = last1_point; + last1_point = new_point; + } + + if(unlikely(storage_engine_query_is_finished(ops->seqh))) { + query_is_finished_counter++; + + if(count_same_end_time != 0) { + last2_point = last1_point; + last1_point = new_point; + } + new_point = QUERY_POINT_EMPTY; + new_point.sp.start_time_s = last1_point.sp.end_time_s; + new_point.sp.end_time_s = now_end_time; +// +// if(debug_this) netdata_log_info("QUERY: is finished() returned true"); +// + break; + } + else + query_is_finished_counter = 0; + + // fetch the new point + { + STORAGE_POINT sp; + if(likely(storage_point_is_unset(next1_point))) { + db_points_read_since_plan_switch++; + sp = storage_engine_query_next_metric(ops->seqh); + ops->db_points_read_per_tier[ops->tier]++; + ops->db_total_points_read++; + + if(unlikely(options & RRDR_OPTION_ABSOLUTE)) + storage_point_make_positive(sp); + } + else { + // ONE POINT READ-AHEAD + sp = next1_point; + storage_point_unset(next1_point); + db_points_read_since_plan_switch = 1; + } + + // ONE POINT READ-AHEAD + if(unlikely(query_plan_should_switch_plan(ops, sp.end_time_s) && + query_planer_next_plan(ops, now_end_time, new_point.sp.end_time_s))) { + + // The end time of the current point, crosses our plans (tiers) + // so, we switched plan (tier) + // + // There are 2 cases now: + // + // A. the entire point of the previous plan is to the future of point from the next plan + // B. 
part of the point of the previous plan overlaps with the point from the next plan + + STORAGE_POINT sp2 = storage_engine_query_next_metric(ops->seqh); + ops->db_points_read_per_tier[ops->tier]++; + ops->db_total_points_read++; + + if(unlikely(options & RRDR_OPTION_ABSOLUTE)) + storage_point_make_positive(sp2); + + if(sp.start_time_s > sp2.start_time_s) + // the point from the previous plan is useless + sp = sp2; + else + // let the query run from the previous plan + // but setting this will also cut off the interpolation + // of the point from the previous plan + next1_point = sp2; + } + + new_point.sp = sp; + new_point.added = false; + query_point_set_id(new_point, ops->db_total_points_read); + +// if(debug_this) +// netdata_log_info("QUERY: got point %zu, from time %ld to %ld // now from %ld to %ld // query from %ld to %ld", +// new_point.id, new_point.start_time, new_point.end_time, now_start_time, now_end_time, after_wanted, before_wanted); +// + // get the right value from the point we got + if(likely(!storage_point_is_unset(sp) && !storage_point_is_gap(sp))) { + + if(unlikely(use_anomaly_bit_as_value)) + new_point.value = storage_point_anomaly_rate(new_point.sp); + + else { + switch (ops->tier_query_fetch) { + default: + case TIER_QUERY_FETCH_AVERAGE: + new_point.value = sp.sum / (NETDATA_DOUBLE)sp.count; + break; + + case TIER_QUERY_FETCH_MIN: + new_point.value = sp.min; + break; + + case TIER_QUERY_FETCH_MAX: + new_point.value = sp.max; + break; + + case TIER_QUERY_FETCH_SUM: + new_point.value = sp.sum; + break; + } + } + } + else + new_point.value = NAN; + } + + // check if the db is giving us zero duration points + if(unlikely(db_points_read_since_plan_switch > 1 && + new_point.sp.start_time_s == new_point.sp.end_time_s)) { + + internal_error(true, "QUERY: '%s', dimension '%s' next_metric() returned " + "point %zu from %ld to %ld, that are both equal", + qt->id, query_metric_id(qt, qm), + new_point.id, new_point.sp.start_time_s, new_point.sp.end_time_s); +
+ new_point.sp.start_time_s = new_point.sp.end_time_s - ops->tier_ptr->db_update_every_s; + } + + // check if the db is advancing the query + if(unlikely(db_points_read_since_plan_switch > 1 && + new_point.sp.end_time_s <= last1_point.sp.end_time_s)) { + + internal_error(true, + "QUERY: '%s', dimension '%s' next_metric() returned " + "point %zu from %ld to %ld, before the " + "last point %zu from %ld to %ld, " + "now is %ld to %ld", + qt->id, query_metric_id(qt, qm), + new_point.id, new_point.sp.start_time_s, new_point.sp.end_time_s, + last1_point.id, last1_point.sp.start_time_s, last1_point.sp.end_time_s, + now_start_time, now_end_time); + + count_same_end_time++; + continue; + } + count_same_end_time = 0; + + // decide how to use this point + if(likely(new_point.sp.end_time_s < now_end_time)) { // likely to favor tier0 + // this db point ends before our now_end_time + + if(likely(new_point.sp.end_time_s >= now_start_time)) { // likely to favor tier0 + // this db point ends after our now_start time + + query_add_point_to_group(r, new_point, ops, add_flush); + new_point.added = true; + } + else { + // we don't need this db point + // it is totally outside our current time-frame + + // this is desirable for the first point of the query + // because it allows us to interpolate the next point + // at exactly the time we will want + + // we only log if this is not point 1 + internal_error(new_point.sp.end_time_s < ops->plan_expanded_after && + db_points_read_since_plan_switch > 1, + "QUERY: '%s', dimension '%s' next_metric() " + "returned point %zu from %ld time %ld, " + "which is entirely before our current timeframe %ld to %ld " + "(and before the entire query, after %ld, before %ld)", + qt->id, query_metric_id(qt, qm), + new_point.id, new_point.sp.start_time_s, new_point.sp.end_time_s, + now_start_time, now_end_time, + ops->plan_expanded_after, ops->plan_expanded_before); + } + + } + else { + // the point ends in the future + // so, we will interpolate it below, at 
the inner loop + break; + } + } + + if(unlikely(count_same_end_time)) { + internal_error(true, + "QUERY: '%s', dimension '%s', the database does not advance the query," + " it returned an end time less than or equal to the end time of the last " + "point we got %ld, %zu times", + qt->id, query_metric_id(qt, qm), + last1_point.sp.end_time_s, count_same_end_time); + + if(unlikely(new_point.sp.end_time_s <= last1_point.sp.end_time_s)) + new_point.sp.end_time_s = now_end_time; + } + + time_t stop_time = new_point.sp.end_time_s; + if(unlikely(!storage_point_is_unset(next1_point) && next1_point.start_time_s >= now_end_time)) { + // ONE POINT READ-AHEAD + // the point crosses the start time of the + // read ahead storage point we have read + stop_time = next1_point.start_time_s; + } + + // the inner loop + // we have 3 points in memory: last2, last1, new + // we select the one to use based on their timestamps + + internal_fatal(now_end_time > stop_time || points_added >= points_wanted, + "QUERY: first part of query provides invalid point to interpolate (now_end_time %ld, stop_time %ld)", + now_end_time, stop_time); + + do { + // now_start_time is wrong in this loop + // but, we don't need it + + QUERY_POINT current_point; + + if(likely(now_end_time > new_point.sp.start_time_s)) { + // it is time for our NEW point to be used + current_point = new_point; + new_point.added = true; // first copy, then set it, so that new_point will not be added again + query_interpolate_point(current_point, last1_point, now_end_time); + +// internal_error(current_point.id > 0 +// && last1_point.id == 0 +// && current_point.end_time > after_wanted +// && current_point.end_time > now_end_time, +// "QUERY: '%s', dimension '%s', after %ld, before %ld, view update every %ld," +// " query granularity %ld, interpolating point %zu (from %ld to %ld) at %ld," +// " but we could really favor by having last_point1 in this query.", +// qt->id, string2str(qm->dimension.id), +// after_wanted, before_wanted, +//
ops.view_update_every, ops.query_granularity, +// current_point.id, current_point.start_time, current_point.end_time, +// now_end_time); + } + else if(likely(now_end_time <= last1_point.sp.end_time_s)) { + // our LAST point is still valid + current_point = last1_point; + last1_point.added = true; // first copy, then set it, so that last1_point will not be added again + query_interpolate_point(current_point, last2_point, now_end_time); + +// internal_error(current_point.id > 0 +// && last2_point.id == 0 +// && current_point.end_time > after_wanted +// && current_point.end_time > now_end_time, +// "QUERY: '%s', dimension '%s', after %ld, before %ld, view update every %ld," +// " query granularity %ld, interpolating point %zu (from %ld to %ld) at %ld," +// " but we could really favor by having last_point2 in this query.", +// qt->id, string2str(qm->dimension.id), +// after_wanted, before_wanted, ops.view_update_every, ops.query_granularity, +// current_point.id, current_point.start_time, current_point.end_time, +// now_end_time); + } + else { + // a GAP, we don't have a value this time + current_point = QUERY_POINT_EMPTY; + } + + query_add_point_to_group(r, current_point, ops, add_flush); + + rrdr_line = rrdr_line_init(r, now_end_time, rrdr_line); + size_t rrdr_o_v_index = rrdr_line * r->d + dim_id_in_rrdr; + + // find the place to store our values + RRDR_VALUE_FLAGS *rrdr_value_options_ptr = &r->o[rrdr_o_v_index]; + + // update the dimension options + if(likely(ops->group_points_non_zero)) + r->od[dim_id_in_rrdr] |= RRDR_DIMENSION_NONZERO; + + // store the specific point options + *rrdr_value_options_ptr = ops->group_value_flags; + + // store the group value + NETDATA_DOUBLE group_value = time_grouping_flush(r, rrdr_value_options_ptr, add_flush); + r->v[rrdr_o_v_index] = group_value; + + r->ar[rrdr_o_v_index] = storage_point_anomaly_rate(ops->group_point); + + if(likely(points_added || r->internal.queries_count)) { + // find the min/max across all dimensions + + 
if(unlikely(group_value < min)) min = group_value; + if(unlikely(group_value > max)) max = group_value; + + } + else { + // runs only when r->internal.queries_count == 0 && points_added == 0 + // so, on the first point added for the query. + min = max = group_value; + } + + points_added++; + ops->group_points_added = 0; + ops->group_value_flags = RRDR_VALUE_NOTHING; + ops->group_points_non_zero = 0; + ops->group_point = STORAGE_POINT_UNSET; + + now_end_time += ops->view_update_every; + } while(now_end_time <= stop_time && points_added < points_wanted); + + // the loop above increased "now" by ops->view_update_every, + // but the main loop will increase it too, + // so, let's undo the last iteration of this loop + now_end_time -= ops->view_update_every; + } + query_planer_finalize_remaining_plans(ops); + + qm->query_points = ops->query_point; + + // fill the rest of the points with empty values + while (points_added < points_wanted) { + rrdr_line++; + size_t rrdr_o_v_index = rrdr_line * r->d + dim_id_in_rrdr; + r->o[rrdr_o_v_index] = RRDR_VALUE_EMPTY; + r->v[rrdr_o_v_index] = 0.0; + r->ar[rrdr_o_v_index] = 0.0; + points_added++; + } + + r->internal.queries_count++; + r->view.min = min; + r->view.max = max; + + r->stats.result_points_generated += points_added; + r->stats.db_points_read += ops->db_total_points_read; + for(size_t tr = 0; tr < storage_tiers ; tr++) + qt->db.tiers[tr].points += ops->db_points_read_per_tier[tr]; +} + +// ---------------------------------------------------------------------------- +// fill the gap of a tier + +void store_metric_at_tier(RRDDIM *rd, size_t tier, struct rrddim_tier *t, STORAGE_POINT sp, usec_t now_ut); + +void rrdr_fill_tier_gap_from_smaller_tiers(RRDDIM *rd, size_t tier, time_t now_s) { + if(unlikely(tier >= storage_tiers)) return; +#ifdef ENABLE_DBENGINE + if(default_backfill == RRD_BACKFILL_NONE) return; +#else + return; +#endif + + struct rrddim_tier *t = &rd->tiers[tier]; + if(unlikely(!t)) return; + + time_t 
latest_time_s = storage_engine_latest_time_s(t->seb, t->smh); + time_t granularity = (time_t)t->tier_grouping * (time_t)rd->rrdset->update_every; + time_t time_diff = now_s - latest_time_s; + + // if the user wants only NEW backfilling, and we don't have any data +#ifdef ENABLE_DBENGINE + if(default_backfill == RRD_BACKFILL_NEW && latest_time_s <= 0) return; +#else + return; +#endif + + // there is really nothing we can do + if(now_s <= latest_time_s || time_diff < granularity) return; + + struct storage_engine_query_handle seqh; + + // for each lower tier + for(int read_tier = (int)tier - 1; read_tier >= 0 ; read_tier--){ + time_t smaller_tier_first_time = storage_engine_oldest_time_s(rd->tiers[read_tier].seb, rd->tiers[read_tier].smh); + time_t smaller_tier_last_time = storage_engine_latest_time_s(rd->tiers[read_tier].seb, rd->tiers[read_tier].smh); + if(smaller_tier_last_time <= latest_time_s) continue; // it is as bad as we are + + long after_wanted = (latest_time_s < smaller_tier_first_time) ? 
smaller_tier_first_time : latest_time_s; + long before_wanted = smaller_tier_last_time; + + struct rrddim_tier *tmp = &rd->tiers[read_tier]; + storage_engine_query_init(tmp->seb, tmp->smh, &seqh, after_wanted, before_wanted, STORAGE_PRIORITY_HIGH); + + size_t points_read = 0; + + while(!storage_engine_query_is_finished(&seqh)) { + + STORAGE_POINT sp = storage_engine_query_next_metric(&seqh); + points_read++; + + if(sp.end_time_s > latest_time_s) { + latest_time_s = sp.end_time_s; + store_metric_at_tier(rd, tier, t, sp, sp.end_time_s * USEC_PER_SEC); + } + } + + storage_engine_query_finalize(&seqh); + store_metric_collection_completed(); + global_statistics_backfill_query_completed(points_read); + + //internal_error(true, "DBENGINE: backfilled chart '%s', dimension '%s', tier %d, from %ld to %ld, with %zu points from tier %d", + // rd->rrdset->name, rd->name, tier, after_wanted, before_wanted, points, tr); + } +} + +// ---------------------------------------------------------------------------- +// fill RRDR for the whole chart + +#ifdef NETDATA_INTERNAL_CHECKS +static void rrd2rrdr_log_request_response_metadata(RRDR *r + , RRDR_OPTIONS options __maybe_unused + , RRDR_TIME_GROUPING group_method + , bool aligned + , size_t group + , time_t resampling_time + , size_t resampling_group + , time_t after_wanted + , time_t after_requested + , time_t before_wanted + , time_t before_requested + , size_t points_requested + , size_t points_wanted + //, size_t after_slot + //, size_t before_slot + , const char *msg + ) { + + QUERY_TARGET *qt = r->internal.qt; + time_t first_entry_s = qt->db.first_time_s; + time_t last_entry_s = qt->db.last_time_s; + + internal_error( + true, + "rrd2rrdr() on %s update every %ld with %s grouping %s (group: %zu, resampling_time: %ld, resampling_group: %zu), " + "after (got: %ld, want: %ld, req: %ld, db: %ld), " + "before (got: %ld, want: %ld, req: %ld, db: %ld), " + "duration (got: %ld, want: %ld, req: %ld, db: %ld), " + "points (got: %zu, want: 
%zu, req: %zu), " + "%s" + , qt->id + , qt->window.query_granularity + + // grouping + , (aligned) ? "aligned" : "unaligned" + , time_grouping_id2txt(group_method) + , group + , resampling_time + , resampling_group + + // after + , r->view.after + , after_wanted + , after_requested + , first_entry_s + + // before + , r->view.before + , before_wanted + , before_requested + , last_entry_s + + // duration + , (long)(r->view.before - r->view.after + qt->window.query_granularity) + , (long)(before_wanted - after_wanted + qt->window.query_granularity) + , (long)before_requested - after_requested + , (long)((last_entry_s - first_entry_s) + qt->window.query_granularity) + + // points + , r->rows + , points_wanted + , points_requested + + // message + , msg + ); +} +#endif // NETDATA_INTERNAL_CHECKS + +// #define DEBUG_QUERY_LOGIC 1 + +#ifdef DEBUG_QUERY_LOGIC +#define query_debug_log_init() BUFFER *debug_log = buffer_create(1000) +#define query_debug_log(args...) buffer_sprintf(debug_log, ##args) +#define query_debug_log_fin() { \ + netdata_log_info("QUERY: '%s', after:%ld, before:%ld, duration:%ld, points:%zu, res:%ld - wanted => after:%ld, before:%ld, points:%zu, group:%zu, granularity:%ld, resgroup:%ld, resdiv:" NETDATA_DOUBLE_FORMAT_AUTO " %s", qt->id, after_requested, before_requested, before_requested - after_requested, points_requested, resampling_time_requested, after_wanted, before_wanted, points_wanted, group, query_granularity, resampling_group, resampling_divisor, buffer_tostring(debug_log)); \ + buffer_free(debug_log); \ + debug_log = NULL; \ + } +#define query_debug_log_free() do { buffer_free(debug_log); } while(0) +#else +#define query_debug_log_init() debug_dummy() +#define query_debug_log(args...) 
debug_dummy() +#define query_debug_log_fin() debug_dummy() +#define query_debug_log_free() debug_dummy() +#endif + +bool query_target_calculate_window(QUERY_TARGET *qt) { + if (unlikely(!qt)) return false; + + size_t points_requested = (long)qt->request.points; + time_t after_requested = qt->request.after; + time_t before_requested = qt->request.before; + RRDR_TIME_GROUPING group_method = qt->request.time_group_method; + time_t resampling_time_requested = qt->request.resampling_time; + RRDR_OPTIONS options = qt->window.options; + size_t tier = qt->request.tier; + time_t update_every = qt->db.minimum_latest_update_every_s ? qt->db.minimum_latest_update_every_s : 1; + + // RULES + // points_requested = 0 + // the user wants all the natural points the database has + // + // after_requested = 0 + // the user wants to start the query from the oldest point in our database + // + // before_requested = 0 + // the user wants the query to end at the latest point in our database + // + // when natural points are wanted, the query has to be aligned to the update_every + // of the database + + size_t points_wanted = points_requested; + time_t after_wanted = after_requested; + time_t before_wanted = before_requested; + + bool aligned = !(options & RRDR_OPTION_NOT_ALIGNED); + bool automatic_natural_points = (points_wanted == 0); + bool relative_period_requested = false; + bool natural_points = (options & RRDR_OPTION_NATURAL_POINTS) || automatic_natural_points; + bool before_is_aligned_to_db_end = false; + + query_debug_log_init(); + + if (ABS(before_requested) <= API_RELATIVE_TIME_MAX || ABS(after_requested) <= API_RELATIVE_TIME_MAX) { + relative_period_requested = true; + natural_points = true; + options |= RRDR_OPTION_NATURAL_POINTS; + query_debug_log(":relative+natural"); + } + + // if the user wants virtual points, make sure we do it + if (options & RRDR_OPTION_VIRTUAL_POINTS) + natural_points = false; + + // set the right flag about natural and virtual points + if
(natural_points) { + options |= RRDR_OPTION_NATURAL_POINTS; + + if (options & RRDR_OPTION_VIRTUAL_POINTS) + options &= ~RRDR_OPTION_VIRTUAL_POINTS; + } + else { + options |= RRDR_OPTION_VIRTUAL_POINTS; + + if (options & RRDR_OPTION_NATURAL_POINTS) + options &= ~RRDR_OPTION_NATURAL_POINTS; + } + + if (after_wanted == 0 || before_wanted == 0) { + relative_period_requested = true; + + time_t first_entry_s = qt->db.first_time_s; + time_t last_entry_s = qt->db.last_time_s; + + if (first_entry_s == 0 || last_entry_s == 0) { + internal_error(true, "QUERY: no data detected on query '%s' (db first_entry_t = %ld, last_entry_t = %ld)", qt->id, first_entry_s, last_entry_s); + after_wanted = qt->window.after; + before_wanted = qt->window.before; + + if(after_wanted == before_wanted) + after_wanted = before_wanted - update_every; + + if (points_wanted == 0) { + points_wanted = (before_wanted - after_wanted) / update_every; + query_debug_log(":zero points_wanted %zu", points_wanted); + } + } + else { + query_debug_log(":first_entry_t %ld, last_entry_t %ld", first_entry_s, last_entry_s); + + if (after_wanted == 0) { + after_wanted = first_entry_s; + query_debug_log(":zero after_wanted %ld", after_wanted); + } + + if (before_wanted == 0) { + before_wanted = last_entry_s; + before_is_aligned_to_db_end = true; + query_debug_log(":zero before_wanted %ld", before_wanted); + } + + if (points_wanted == 0) { + points_wanted = (last_entry_s - first_entry_s) / update_every; + query_debug_log(":zero points_wanted %zu", points_wanted); + } + } + } + + if (points_wanted == 0) { + points_wanted = 600; + query_debug_log(":zero600 points_wanted %zu", points_wanted); + } + + // convert our before_wanted and after_wanted to absolute + rrdr_relative_window_to_absolute_query(&after_wanted, &before_wanted, NULL, unittest_running); + query_debug_log(":relative2absolute after %ld, before %ld", after_wanted, before_wanted); + + if (natural_points && (options & RRDR_OPTION_SELECTED_TIER) && tier > 0 && 
storage_tiers > 1) { + update_every = rrdset_find_natural_update_every_for_timeframe( + qt, after_wanted, before_wanted, points_wanted, options, tier); + + if (update_every <= 0) update_every = qt->db.minimum_latest_update_every_s; + query_debug_log(":natural update every %ld", update_every); + } + + // this is the update_every of the query + // it may be different to the update_every of the database + time_t query_granularity = (natural_points) ? update_every : 1; + if (query_granularity <= 0) query_granularity = 1; + query_debug_log(":query_granularity %ld", query_granularity); + + // align before_wanted and after_wanted to query_granularity + if (before_wanted % query_granularity) { + before_wanted -= before_wanted % query_granularity; + query_debug_log(":granularity align before_wanted %ld", before_wanted); + } + + if (after_wanted % query_granularity) { + after_wanted -= after_wanted % query_granularity; + query_debug_log(":granularity align after_wanted %ld", after_wanted); + } + + // automatic_natural_points is set when the user wants all the points available in the database + if (automatic_natural_points) { + points_wanted = (before_wanted - after_wanted + 1) / query_granularity; + if (unlikely(points_wanted <= 0)) points_wanted = 1; + query_debug_log(":auto natural points_wanted %zu", points_wanted); + } + + time_t duration = before_wanted - after_wanted; + + // if the resampling time is too big, extend the duration to the past + if (unlikely(resampling_time_requested > duration)) { + after_wanted = before_wanted - resampling_time_requested; + duration = before_wanted - after_wanted; + query_debug_log(":resampling after_wanted %ld", after_wanted); + } + + // if the duration is not aligned to resampling time + // extend the duration to the past, to avoid a gap at the chart + // only when the missing duration is above 1/10th of a point + if (resampling_time_requested > query_granularity && duration % resampling_time_requested) { + time_t delta = duration % 
resampling_time_requested; + if (delta > resampling_time_requested / 10) { + after_wanted -= resampling_time_requested - delta; + duration = before_wanted - after_wanted; + query_debug_log(":resampling2 after_wanted %ld", after_wanted); + } + } + + // the available points of the query + size_t points_available = (duration + 1) / query_granularity; + if (unlikely(points_available <= 0)) points_available = 1; + query_debug_log(":points_available %zu", points_available); + + if (points_wanted > points_available) { + points_wanted = points_available; + query_debug_log(":max points_wanted %zu", points_wanted); + } + + if(points_wanted > 86400 && !unittest_running) { + points_wanted = 86400; + query_debug_log(":absolute max points_wanted %zu", points_wanted); + } + + // calculate the desired grouping of source data points + size_t group = points_available / points_wanted; + if (group == 0) group = 1; + + // round "group" to the closest integer + if (points_available % points_wanted > points_wanted / 2) + group++; + + query_debug_log(":group %zu", group); + + if (points_wanted * group * query_granularity < (size_t)duration) { + // the grouping we are going to do, is not enough + // to cover the entire duration requested, so + // we have to change the number of points, to make sure we will + // respect the timeframe as closely as possible + + // let's see how many points are optimal + points_wanted = points_available / group; + + if (points_wanted * group < points_available) + points_wanted++; + + if (unlikely(points_wanted == 0)) + points_wanted = 1; + + query_debug_log(":optimal points %zu", points_wanted); + } + + // resampling_time_requested enforces a certain grouping multiple + NETDATA_DOUBLE resampling_divisor = 1.0; + size_t resampling_group = 1; + if (unlikely(resampling_time_requested > query_granularity)) { + // the points we should group to satisfy gtime + resampling_group = resampling_time_requested / query_granularity; + if
(unlikely(resampling_time_requested % query_granularity))
+            resampling_group++;
+
+        query_debug_log(":resampling group %zu", resampling_group);
+
+        // adapt group according to resampling_group
+        if (unlikely(group < resampling_group)) {
+            group = resampling_group; // do not allow grouping below the desired one
+            query_debug_log(":group less res %zu", group);
+        }
+        if (unlikely(group % resampling_group)) {
+            group += resampling_group - (group % resampling_group); // make sure group is multiple of resampling_group
+            query_debug_log(":group mod res %zu", group);
+        }
+
+        // resampling_divisor = group / resampling_group;
+        resampling_divisor = (NETDATA_DOUBLE) (group * query_granularity) / (NETDATA_DOUBLE) resampling_time_requested;
+        query_debug_log(":resampling divisor " NETDATA_DOUBLE_FORMAT, resampling_divisor);
+    }
+
+    // now that we have group, align the requested timeframe to fit it.
+    if (aligned && before_wanted % (group * query_granularity)) {
+        if (before_is_aligned_to_db_end)
+            before_wanted -= before_wanted % (time_t)(group * query_granularity);
+        else
+            before_wanted += (time_t)(group * query_granularity) - before_wanted % (time_t)(group * query_granularity);
+        query_debug_log(":align before_wanted %ld", before_wanted);
+    }
+
+    after_wanted = before_wanted - (time_t)(points_wanted * group * query_granularity) + query_granularity;
+    query_debug_log(":final after_wanted %ld", after_wanted);
+
+    duration = before_wanted - after_wanted;
+    query_debug_log(":final duration %ld", duration + 1);
+
+    query_debug_log_fin();
+
+    internal_error(points_wanted != duration / (query_granularity * group) + 1,
+                   "QUERY: points_wanted %zu is not points %zu",
+                   points_wanted, (size_t)(duration / (query_granularity * group) + 1));
+
+    internal_error(group < resampling_group,
+                   "QUERY: group %zu is less than the desired group points %zu",
+                   group, resampling_group);
+
+    internal_error(group > resampling_group && group % resampling_group,
+                   "QUERY: group %zu is not a multiple of the desired group points %zu",
+                   group, resampling_group);
+
+    // -------------------------------------------------------------------------
+    // update QUERY_TARGET with our calculations
+
+    qt->window.after = after_wanted;
+    qt->window.before = before_wanted;
+    qt->window.relative = relative_period_requested;
+    qt->window.points = points_wanted;
+    qt->window.group = group;
+    qt->window.time_group_method = group_method;
+    qt->window.time_group_options = qt->request.time_group_options;
+    qt->window.query_granularity = query_granularity;
+    qt->window.resampling_group = resampling_group;
+    qt->window.resampling_divisor = resampling_divisor;
+    qt->window.options = options;
+    qt->window.tier = tier;
+    qt->window.aligned = aligned;
+
+    return true;
+}
+
+// ----------------------------------------------------------------------------
+// group by
+
+struct group_by_label_key {
+    DICTIONARY *values;
+};
+
+static void group_by_label_key_insert_cb(const DICTIONARY_ITEM *item __maybe_unused, void *value, void *data) {
+    // add the key to our r->label_keys global keys dictionary
+    DICTIONARY *label_keys = data;
+    dictionary_set(label_keys, dictionary_acquired_item_name(item), NULL, 0);
+
+    // create a dictionary for the values of this key
+    struct group_by_label_key *k = value;
+    k->values = dictionary_create_advanced(DICT_OPTION_SINGLE_THREADED | DICT_OPTION_DONT_OVERWRITE_VALUE, NULL, 0);
+}
+
+static void group_by_label_key_delete_cb(const DICTIONARY_ITEM *item __maybe_unused, void *value, void *data __maybe_unused) {
+    struct group_by_label_key *k = value;
+    dictionary_destroy(k->values);
+}
+
+static int rrdlabels_traversal_cb_to_group_by_label_key(const char *name, const char *value, RRDLABEL_SRC ls __maybe_unused, void *data) {
+    DICTIONARY *dl = data;
+    struct group_by_label_key *k = dictionary_set(dl, name, NULL, sizeof(struct group_by_label_key));
+    dictionary_set(k->values, value, NULL, 0);
+    return 1;
+}
+
+void rrdr_json_group_by_labels(BUFFER *wb, const char *key, RRDR *r, RRDR_OPTIONS options) {
+    if(!r->label_keys || !r->dl)
+        return;
+
+    buffer_json_member_add_object(wb, key);
+
+    void *t;
+    dfe_start_read(r->label_keys, t) {
+        buffer_json_member_add_array(wb, t_dfe.name);
+
+        for(size_t d = 0; d < r->d ;d++) {
+            if(!rrdr_dimension_should_be_exposed(r->od[d], options))
+                continue;
+
+            struct group_by_label_key *k = dictionary_get(r->dl[d], t_dfe.name);
+            if(k) {
+                buffer_json_add_array_item_array(wb);
+                void *tt;
+                dfe_start_read(k->values, tt) {
+                    buffer_json_add_array_item_string(wb, tt_dfe.name);
+                }
+                dfe_done(tt);
+                buffer_json_array_close(wb);
+            }
+            else
+                buffer_json_add_array_item_string(wb, NULL);
+        }
+
+        buffer_json_array_close(wb);
+    }
+    dfe_done(t);
+
+    buffer_json_object_close(wb); // key
+}
+
+static void rrd2rrdr_set_timestamps(RRDR *r) {
+    QUERY_TARGET *qt = r->internal.qt;
+
+    internal_fatal(qt->window.points != r->n, "QUERY: mismatch to the number of points in qt and r");
+
+    r->view.group = qt->window.group;
+    r->view.update_every = (int) query_view_update_every(qt);
+    r->view.before = qt->window.before;
+    r->view.after = qt->window.after;
+
+    r->time_grouping.points_wanted = qt->window.points;
+    r->time_grouping.resampling_group = qt->window.resampling_group;
+    r->time_grouping.resampling_divisor = qt->window.resampling_divisor;
+
+    r->rows = qt->window.points;
+
+    size_t points_wanted = qt->window.points;
+    time_t after_wanted = qt->window.after;
+    time_t before_wanted = qt->window.before; (void)before_wanted;
+
+    time_t view_update_every = r->view.update_every;
+    time_t query_granularity = (time_t)(r->view.update_every / r->view.group);
+
+    size_t rrdr_line = 0;
+    time_t first_point_end_time = after_wanted + view_update_every - query_granularity;
+    time_t now_end_time = first_point_end_time;
+
+    while (rrdr_line < points_wanted) {
+        r->t[rrdr_line++] = now_end_time;
+        now_end_time += view_update_every;
+    }
+
+    internal_fatal(r->t[0] != first_point_end_time, "QUERY: wrong first timestamp in the query");
+    internal_error(r->t[points_wanted - 1] != before_wanted,
+                   "QUERY: wrong last timestamp in the query, expected %ld, found %ld",
+                   before_wanted, r->t[points_wanted - 1]);
+}
+
+static void query_group_by_make_dimension_key(BUFFER *key, RRDR_GROUP_BY group_by, size_t group_by_id, QUERY_TARGET *qt, QUERY_NODE *qn, QUERY_CONTEXT *qc, QUERY_INSTANCE *qi, QUERY_DIMENSION *qd __maybe_unused, QUERY_METRIC *qm, bool query_has_percentage_of_group) {
+    buffer_flush(key);
+    if(unlikely(!query_has_percentage_of_group && qm->status & RRDR_DIMENSION_HIDDEN)) {
+        buffer_strcat(key, "__hidden_dimensions__");
+    }
+    else if(unlikely(group_by & RRDR_GROUP_BY_SELECTED)) {
+        buffer_strcat(key, "selected");
+    }
+    else {
+        if (group_by & RRDR_GROUP_BY_DIMENSION) {
+            buffer_fast_strcat(key, "|", 1);
+            buffer_strcat(key, query_metric_name(qt, qm));
+        }
+
+        if (group_by & (RRDR_GROUP_BY_INSTANCE|RRDR_GROUP_BY_PERCENTAGE_OF_INSTANCE)) {
+            buffer_fast_strcat(key, "|", 1);
+            buffer_strcat(key, string2str(query_instance_id_fqdn(qi, qt->request.version)));
+        }
+
+        if (group_by & RRDR_GROUP_BY_LABEL) {
+            RRDLABELS *labels = rrdinstance_acquired_labels(qi->ria);
+            for (size_t l = 0; l < qt->group_by[group_by_id].used; l++) {
+                buffer_fast_strcat(key, "|", 1);
+                rrdlabels_get_value_to_buffer_or_unset(labels, key, qt->group_by[group_by_id].label_keys[l], "[unset]");
+            }
+        }
+
+        if (group_by & RRDR_GROUP_BY_NODE) {
+            buffer_fast_strcat(key, "|", 1);
+            buffer_strcat(key, qn->rrdhost->machine_guid);
+        }
+
+        if (group_by & RRDR_GROUP_BY_CONTEXT) {
+            buffer_fast_strcat(key, "|", 1);
+            buffer_strcat(key, rrdcontext_acquired_id(qc->rca));
+        }
+
+        if (group_by & RRDR_GROUP_BY_UNITS) {
+            buffer_fast_strcat(key, "|", 1);
+            buffer_strcat(key, query_target_has_percentage_units(qt) ? "%" : rrdinstance_acquired_units(qi->ria));
+        }
+    }
+}
+
+static void query_group_by_make_dimension_id(BUFFER *key, RRDR_GROUP_BY group_by, size_t group_by_id, QUERY_TARGET *qt, QUERY_NODE *qn, QUERY_CONTEXT *qc, QUERY_INSTANCE *qi, QUERY_DIMENSION *qd __maybe_unused, QUERY_METRIC *qm, bool query_has_percentage_of_group) {
+    buffer_flush(key);
+    if(unlikely(!query_has_percentage_of_group && qm->status & RRDR_DIMENSION_HIDDEN)) {
+        buffer_strcat(key, "__hidden_dimensions__");
+    }
+    else if(unlikely(group_by & RRDR_GROUP_BY_SELECTED)) {
+        buffer_strcat(key, "selected");
+    }
+    else {
+        if (group_by & RRDR_GROUP_BY_DIMENSION) {
+            buffer_strcat(key, query_metric_name(qt, qm));
+        }
+
+        if (group_by & (RRDR_GROUP_BY_INSTANCE|RRDR_GROUP_BY_PERCENTAGE_OF_INSTANCE)) {
+            if (buffer_strlen(key) != 0)
+                buffer_fast_strcat(key, ",", 1);
+
+            if (group_by & RRDR_GROUP_BY_NODE)
+                buffer_strcat(key, rrdinstance_acquired_id(qi->ria));
+            else
+                buffer_strcat(key, string2str(query_instance_id_fqdn(qi, qt->request.version)));
+        }
+
+        if (group_by & RRDR_GROUP_BY_LABEL) {
+            RRDLABELS *labels = rrdinstance_acquired_labels(qi->ria);
+            for (size_t l = 0; l < qt->group_by[group_by_id].used; l++) {
+                if (buffer_strlen(key) != 0)
+                    buffer_fast_strcat(key, ",", 1);
+                rrdlabels_get_value_to_buffer_or_unset(labels, key, qt->group_by[group_by_id].label_keys[l], "[unset]");
+            }
+        }
+
+        if (group_by & RRDR_GROUP_BY_NODE) {
+            if (buffer_strlen(key) != 0)
+                buffer_fast_strcat(key, ",", 1);
+
+            buffer_strcat(key, qn->rrdhost->machine_guid);
+        }
+
+        if (group_by & RRDR_GROUP_BY_CONTEXT) {
+            if (buffer_strlen(key) != 0)
+                buffer_fast_strcat(key, ",", 1);
+
+            buffer_strcat(key, rrdcontext_acquired_id(qc->rca));
+        }
+
+        if (group_by & RRDR_GROUP_BY_UNITS) {
+            if (buffer_strlen(key) != 0)
+                buffer_fast_strcat(key, ",", 1);
+
+            buffer_strcat(key, query_target_has_percentage_units(qt) ? "%" : rrdinstance_acquired_units(qi->ria));
+        }
+    }
+}
+
+static void query_group_by_make_dimension_name(BUFFER *key, RRDR_GROUP_BY group_by, size_t group_by_id, QUERY_TARGET *qt, QUERY_NODE *qn, QUERY_CONTEXT *qc, QUERY_INSTANCE *qi, QUERY_DIMENSION *qd __maybe_unused, QUERY_METRIC *qm, bool query_has_percentage_of_group) {
+    buffer_flush(key);
+    if(unlikely(!query_has_percentage_of_group && qm->status & RRDR_DIMENSION_HIDDEN)) {
+        buffer_strcat(key, "__hidden_dimensions__");
+    }
+    else if(unlikely(group_by & RRDR_GROUP_BY_SELECTED)) {
+        buffer_strcat(key, "selected");
+    }
+    else {
+        if (group_by & RRDR_GROUP_BY_DIMENSION) {
+            buffer_strcat(key, query_metric_name(qt, qm));
+        }
+
+        if (group_by & (RRDR_GROUP_BY_INSTANCE|RRDR_GROUP_BY_PERCENTAGE_OF_INSTANCE)) {
+            if (buffer_strlen(key) != 0)
+                buffer_fast_strcat(key, ",", 1);
+
+            if (group_by & RRDR_GROUP_BY_NODE)
+                buffer_strcat(key, rrdinstance_acquired_name(qi->ria));
+            else
+                buffer_strcat(key, string2str(query_instance_name_fqdn(qi, qt->request.version)));
+        }
+
+        if (group_by & RRDR_GROUP_BY_LABEL) {
+            RRDLABELS *labels = rrdinstance_acquired_labels(qi->ria);
+            for (size_t l = 0; l < qt->group_by[group_by_id].used; l++) {
+                if (buffer_strlen(key) != 0)
+                    buffer_fast_strcat(key, ",", 1);
+                rrdlabels_get_value_to_buffer_or_unset(labels, key, qt->group_by[group_by_id].label_keys[l], "[unset]");
+            }
+        }
+
+        if (group_by & RRDR_GROUP_BY_NODE) {
+            if (buffer_strlen(key) != 0)
+                buffer_fast_strcat(key, ",", 1);
+
+            buffer_strcat(key, rrdhost_hostname(qn->rrdhost));
+        }
+
+        if (group_by & RRDR_GROUP_BY_CONTEXT) {
+            if (buffer_strlen(key) != 0)
+                buffer_fast_strcat(key, ",", 1);
+
+            buffer_strcat(key, rrdcontext_acquired_id(qc->rca));
+        }
+
+        if (group_by & RRDR_GROUP_BY_UNITS) {
+            if (buffer_strlen(key) != 0)
+                buffer_fast_strcat(key, ",", 1);
+
+            buffer_strcat(key, query_target_has_percentage_units(qt) ? "%" : rrdinstance_acquired_units(qi->ria));
+        }
+    }
+}
+
+struct rrdr_group_by_entry {
+    size_t priority;
+    size_t count;
+    STRING *id;
+    STRING *name;
+    STRING *units;
+    RRDR_DIMENSION_FLAGS od;
+    DICTIONARY *dl;
+};
+
+static RRDR *rrd2rrdr_group_by_initialize(ONEWAYALLOC *owa, QUERY_TARGET *qt) {
+    RRDR *r_tmp = NULL;
+    RRDR_OPTIONS options = qt->window.options;
+
+    if(qt->request.version < 2) {
+        // v1 query
+        RRDR *r = rrdr_create(owa, qt, qt->query.used, qt->window.points);
+        if(unlikely(!r)) {
+            internal_error(true, "QUERY: cannot create RRDR for %s, after=%ld, before=%ld, dimensions=%u, points=%zu",
+                           qt->id, qt->window.after, qt->window.before, qt->query.used, qt->window.points);
+            return NULL;
+        }
+        r->group_by.r = NULL;
+
+        for(size_t d = 0; d < qt->query.used ; d++) {
+            QUERY_METRIC *qm = query_metric(qt, d);
+            QUERY_DIMENSION *qd = query_dimension(qt, qm->link.query_dimension_id);
+            r->di[d] = rrdmetric_acquired_id_dup(qd->rma);
+            r->dn[d] = rrdmetric_acquired_name_dup(qd->rma);
+        }
+
+        rrd2rrdr_set_timestamps(r);
+        return r;
+    }
+    // v2 query
+
+    // parse all the group-by label keys
+    for(size_t g = 0; g < MAX_QUERY_GROUP_BY_PASSES ;g++) {
+        if (qt->request.group_by[g].group_by & RRDR_GROUP_BY_LABEL &&
+            qt->request.group_by[g].group_by_label && *qt->request.group_by[g].group_by_label)
+            qt->group_by[g].used = quoted_strings_splitter_query_group_by_label(
+                    qt->request.group_by[g].group_by_label, qt->group_by[g].label_keys,
+                    GROUP_BY_MAX_LABEL_KEYS);
+
+        if (!qt->group_by[g].used)
+            qt->request.group_by[g].group_by &= ~RRDR_GROUP_BY_LABEL;
+    }
+
+    // make sure there are valid group-by methods
+    for(size_t g = 0; g < MAX_QUERY_GROUP_BY_PASSES ;g++) {
+        if(!(qt->request.group_by[g].group_by & SUPPORTED_GROUP_BY_METHODS))
+            qt->request.group_by[g].group_by = (g == 0) ? RRDR_GROUP_BY_DIMENSION : RRDR_GROUP_BY_NONE;
+    }
+
+    bool query_has_percentage_of_group = query_target_has_percentage_of_group(qt);
+
+    // merge all group-by options to upper levels,
+    // so that the top level has all the groupings of the inner levels,
+    // and each subsequent level has all the groupings of its inner levels.
+    for(size_t g = 0; g < MAX_QUERY_GROUP_BY_PASSES - 1 ;g++) {
+        if(qt->request.group_by[g].group_by == RRDR_GROUP_BY_NONE)
+            continue;
+
+        if(qt->request.group_by[g].group_by == RRDR_GROUP_BY_SELECTED) {
+            for (size_t r = g + 1; r < MAX_QUERY_GROUP_BY_PASSES; r++)
+                qt->request.group_by[r].group_by = RRDR_GROUP_BY_NONE;
+        }
+        else {
+            for (size_t r = g + 1; r < MAX_QUERY_GROUP_BY_PASSES; r++) {
+                if (qt->request.group_by[r].group_by == RRDR_GROUP_BY_NONE)
+                    continue;
+
+                if (qt->request.group_by[r].group_by != RRDR_GROUP_BY_SELECTED) {
+                    if(qt->request.group_by[r].group_by & RRDR_GROUP_BY_PERCENTAGE_OF_INSTANCE)
+                        qt->request.group_by[g].group_by |= RRDR_GROUP_BY_INSTANCE;
+                    else
+                        qt->request.group_by[g].group_by |= qt->request.group_by[r].group_by;
+
+                    if(qt->request.group_by[r].group_by & RRDR_GROUP_BY_LABEL) {
+                        for (size_t lr = 0; lr < qt->group_by[r].used; lr++) {
+                            bool found = false;
+                            for (size_t lg = 0; lg < qt->group_by[g].used; lg++) {
+                                if (strcmp(qt->group_by[g].label_keys[lg], qt->group_by[r].label_keys[lr]) == 0) {
+                                    found = true;
+                                    break;
+                                }
+                            }
+
+                            if (!found && qt->group_by[g].used < GROUP_BY_MAX_LABEL_KEYS * MAX_QUERY_GROUP_BY_PASSES)
+                                qt->group_by[g].label_keys[qt->group_by[g].used++] = qt->group_by[r].label_keys[lr];
+                        }
+                    }
+                }
+            }
+        }
+    }
+
+    int added = 0;
+    RRDR *first_r = NULL, *last_r = NULL;
+    BUFFER *key = buffer_create(0, NULL);
+    struct rrdr_group_by_entry *entries = onewayalloc_mallocz(owa, qt->query.used * sizeof(struct rrdr_group_by_entry));
+    DICTIONARY *groups = dictionary_create(DICT_OPTION_SINGLE_THREADED | DICT_OPTION_DONT_OVERWRITE_VALUE);
+    DICTIONARY *label_keys = NULL;
+
+    for(size_t g = 0; g < MAX_QUERY_GROUP_BY_PASSES ;g++) {
+        RRDR_GROUP_BY group_by = qt->request.group_by[g].group_by;
+        RRDR_GROUP_BY_FUNCTION aggregation_method = qt->request.group_by[g].aggregation;
+
+        if(group_by == RRDR_GROUP_BY_NONE)
+            break;
+
+        memset(entries, 0, qt->query.used * sizeof(struct rrdr_group_by_entry));
+        dictionary_flush(groups);
+        added = 0;
+
+        size_t hidden_dimensions = 0;
+        bool final_grouping = (g == MAX_QUERY_GROUP_BY_PASSES - 1 || qt->request.group_by[g + 1].group_by == RRDR_GROUP_BY_NONE) ? true : false;
+
+        if (final_grouping && (options & RRDR_OPTION_GROUP_BY_LABELS))
+            label_keys = dictionary_create_advanced(DICT_OPTION_SINGLE_THREADED | DICT_OPTION_DONT_OVERWRITE_VALUE, NULL, 0);
+
+        QUERY_INSTANCE *last_qi = NULL;
+        size_t priority = 0;
+        time_t update_every_max = 0;
+        for (size_t d = 0; d < qt->query.used; d++) {
+            QUERY_METRIC *qm = query_metric(qt, d);
+            QUERY_DIMENSION *qd = query_dimension(qt, qm->link.query_dimension_id);
+            QUERY_INSTANCE *qi = query_instance(qt, qm->link.query_instance_id);
+            QUERY_CONTEXT *qc = query_context(qt, qm->link.query_context_id);
+            QUERY_NODE *qn = query_node(qt, qm->link.query_node_id);
+
+            if (qi != last_qi) {
+                last_qi = qi;
+
+                time_t update_every = rrdinstance_acquired_update_every(qi->ria);
+                if (update_every > update_every_max)
+                    update_every_max = update_every;
+            }
+
+            priority = qd->priority;
+
+            if(qm->status & RRDR_DIMENSION_HIDDEN)
+                hidden_dimensions++;
+
+            // --------------------------------------------------------------------
+            // generate the group by key
+
+            query_group_by_make_dimension_key(key, group_by, g, qt, qn, qc, qi, qd, qm, query_has_percentage_of_group);
+
+            // lookup the key in the dictionary
+
+            int pos = -1;
+            int *set = dictionary_set(groups, buffer_tostring(key), &pos, sizeof(pos));
+            if (*set == -1) {
+                // the key just added to the dictionary
+
+                *set = pos = added++;
+
+                // ----------------------------------------------------------------
+                // generate the dimension id
+
+                query_group_by_make_dimension_id(key, group_by, g, qt, qn, qc, qi, qd, qm, query_has_percentage_of_group);
+                entries[pos].id = string_strdupz(buffer_tostring(key));
+
+                // ----------------------------------------------------------------
+                // generate the dimension name
+
+                query_group_by_make_dimension_name(key, group_by, g, qt, qn, qc, qi, qd, qm, query_has_percentage_of_group);
+                entries[pos].name = string_strdupz(buffer_tostring(key));
+
+                // add the rest of the info
+                entries[pos].units = rrdinstance_acquired_units_dup(qi->ria);
+                entries[pos].priority = priority;
+
+                if (label_keys) {
+                    entries[pos].dl = dictionary_create_advanced(
+                            DICT_OPTION_SINGLE_THREADED | DICT_OPTION_FIXED_SIZE | DICT_OPTION_DONT_OVERWRITE_VALUE,
+                            NULL, sizeof(struct group_by_label_key));
+                    dictionary_register_insert_callback(entries[pos].dl, group_by_label_key_insert_cb, label_keys);
+                    dictionary_register_delete_callback(entries[pos].dl, group_by_label_key_delete_cb, label_keys);
+                }
+            } else {
+                // the key found in the dictionary
+                pos = *set;
+            }
+
+            entries[pos].count++;
+
+            if (unlikely(priority < entries[pos].priority))
+                entries[pos].priority = priority;
+
+            if(g > 0)
+                last_r->dgbs[qm->grouped_as.slot] = pos;
+            else
+                qm->grouped_as.first_slot = pos;
+
+            qm->grouped_as.slot = pos;
+            qm->grouped_as.id = entries[pos].id;
+            qm->grouped_as.name = entries[pos].name;
+            qm->grouped_as.units = entries[pos].units;
+
+            // copy the dimension flags decided by the query target
+            // we need this, because if a dimension is explicitly selected
+            // the query target adds to it the non-zero flag
+            qm->status |= RRDR_DIMENSION_GROUPED;
+
+            if(query_has_percentage_of_group)
+                // when the query has percentage of group
+                // there will be no hidden dimensions in the final query,
+                // so we have to remove the hidden flag from all dimensions
+                entries[pos].od |= qm->status & ~RRDR_DIMENSION_HIDDEN;
+            else
+                entries[pos].od |= qm->status;
+
+            if (entries[pos].dl)
+                rrdlabels_walkthrough_read(rrdinstance_acquired_labels(qi->ria),
+                                           rrdlabels_traversal_cb_to_group_by_label_key, entries[pos].dl);
+        }
+
+        RRDR *r = rrdr_create(owa, qt, added, qt->window.points);
+        if (!r) {
+            internal_error(true,
+                           "QUERY: cannot create group by RRDR for %s, after=%ld, before=%ld, dimensions=%d, points=%zu",
+                           qt->id, qt->window.after, qt->window.before, added, qt->window.points);
+            goto cleanup;
+        }
+        // prevent double free at cleanup in case of error
+        added = 0;
+
+        // link this RRDR
+        if(!last_r)
+            first_r = last_r = r;
+        else
+            last_r->group_by.r = r;
+
+        last_r = r;
+
+        rrd2rrdr_set_timestamps(r);
+        r->dp = onewayalloc_callocz(owa, r->d, sizeof(*r->dp));
+        r->dview = onewayalloc_callocz(owa, r->d, sizeof(*r->dview));
+        r->dgbc = onewayalloc_callocz(owa, r->d, sizeof(*r->dgbc));
+        r->gbc = onewayalloc_callocz(owa, r->n * r->d, sizeof(*r->gbc));
+        r->dqp = onewayalloc_callocz(owa, r->d, sizeof(STORAGE_POINT));
+
+        if(hidden_dimensions && ((group_by & RRDR_GROUP_BY_PERCENTAGE_OF_INSTANCE) || (aggregation_method == RRDR_GROUP_BY_FUNCTION_PERCENTAGE)))
+            // this is where we are going to group the hidden dimensions
+            r->vh = onewayalloc_mallocz(owa, r->n * r->d * sizeof(*r->vh));
+
+        if(!final_grouping)
+            // this is where we are going to store the slot in the next RRDR
+            // that we are going to group by the dimension of this RRDR
+            r->dgbs = onewayalloc_callocz(owa, r->d, sizeof(*r->dgbs));
+
+        if (label_keys) {
+            r->dl = onewayalloc_callocz(owa, r->d, sizeof(DICTIONARY *));
+            r->label_keys = label_keys;
+            label_keys = NULL;
+        }
+
+        // zero r (dimension options, names, and ids)
+        // this is required, because group-by may lead to empty dimensions
+        for (size_t d = 0; d < r->d; d++) {
+            r->di[d] = entries[d].id;
+            r->dn[d] = entries[d].name;
+
+            r->od[d] = entries[d].od;
+            r->du[d] = entries[d].units;
+            r->dp[d] = entries[d].priority;
+            r->dgbc[d] = entries[d].count;
+
+            if (r->dl)
+                r->dl[d] = entries[d].dl;
+        }
+
+        // initialize partial trimming
+        r->partial_data_trimming.max_update_every = update_every_max * 2;
+        r->partial_data_trimming.expected_after =
+                (!query_target_aggregatable(qt) &&
+                 qt->window.before >= qt->window.now - r->partial_data_trimming.max_update_every) ?
+                qt->window.before - r->partial_data_trimming.max_update_every :
+                qt->window.before;
+        r->partial_data_trimming.trimmed_after = qt->window.before;
+
+        // make all values empty
+        for (size_t i = 0; i != r->n; i++) {
+            NETDATA_DOUBLE *cn = &r->v[i * r->d];
+            RRDR_VALUE_FLAGS *co = &r->o[i * r->d];
+            NETDATA_DOUBLE *ar = &r->ar[i * r->d];
+            NETDATA_DOUBLE *vh = r->vh ? &r->vh[i * r->d] : NULL;
+
+            for (size_t d = 0; d < r->d; d++) {
+                cn[d] = NAN;
+                ar[d] = 0.0;
+                co[d] = RRDR_VALUE_EMPTY;
+
+                if(vh)
+                    vh[d] = NAN;
+            }
+        }
+    }
+
+    if(!first_r || !last_r)
+        goto cleanup;
+
+    r_tmp = rrdr_create(owa, qt, 1, qt->window.points);
+    if (!r_tmp) {
+        internal_error(true,
+                       "QUERY: cannot create group by temporary RRDR for %s, after=%ld, before=%ld, dimensions=%d, points=%zu",
+                       qt->id, qt->window.after, qt->window.before, 1, qt->window.points);
+        goto cleanup;
+    }
+    rrd2rrdr_set_timestamps(r_tmp);
+    r_tmp->group_by.r = first_r;
+
+cleanup:
+    if(!first_r || !last_r || !r_tmp) {
+        if(r_tmp) {
+            r_tmp->group_by.r = NULL;
+            rrdr_free(owa, r_tmp);
+        }
+
+        if(first_r) {
+            RRDR *r = first_r;
+            while (r) {
+                r_tmp = r->group_by.r;
+                r->group_by.r = NULL;
+                rrdr_free(owa, r);
+                r = r_tmp;
+            }
+        }
+
+        if(entries && added) {
+            for (int d = 0; d < added; d++) {
+                string_freez(entries[d].id);
+                string_freez(entries[d].name);
+                string_freez(entries[d].units);
+                dictionary_destroy(entries[d].dl);
+            }
+        }
+        dictionary_destroy(label_keys);
+
+        first_r = last_r = r_tmp = NULL;
+    }
+
+    buffer_free(key);
+    onewayalloc_freez(owa, entries);
+    dictionary_destroy(groups);
+
+    return r_tmp;
+}
+
+static void rrd2rrdr_group_by_add_metric(RRDR *r_dst, size_t d_dst, RRDR *r_tmp, size_t d_tmp,
+                                         RRDR_GROUP_BY_FUNCTION group_by_aggregate_function,
+                                         STORAGE_POINT *query_points, size_t pass __maybe_unused) {
+    if(!r_tmp || r_dst == r_tmp || !(r_tmp->od[d_tmp] & RRDR_DIMENSION_QUERIED))
+        return;
+
+    internal_fatal(r_dst->n != r_tmp->n, "QUERY: group-by source and destination do not have the same number of rows");
+    internal_fatal(d_dst >= r_dst->d, "QUERY: group-by destination dimension number exceeds destination RRDR size");
+    internal_fatal(d_tmp >= r_tmp->d, "QUERY: group-by source dimension number exceeds source RRDR size");
+    internal_fatal(!r_dst->dqp, "QUERY: group-by destination is not properly prepared (missing dqp array)");
+    internal_fatal(!r_dst->gbc, "QUERY: group-by destination is not properly prepared (missing gbc array)");
+
+    bool hidden_dimension_on_percentage_of_group = (r_tmp->od[d_tmp] & RRDR_DIMENSION_HIDDEN) && r_dst->vh;
+
+    if(!hidden_dimension_on_percentage_of_group) {
+        r_dst->od[d_dst] |= r_tmp->od[d_tmp];
+        storage_point_merge_to(r_dst->dqp[d_dst], *query_points);
+    }
+
+    // do the group_by
+    for(size_t i = 0; i != rrdr_rows(r_tmp) ; i++) {
+
+        size_t idx_tmp = i * r_tmp->d + d_tmp;
+        NETDATA_DOUBLE n_tmp = r_tmp->v[ idx_tmp ];
+        RRDR_VALUE_FLAGS o_tmp = r_tmp->o[ idx_tmp ];
+        NETDATA_DOUBLE ar_tmp = r_tmp->ar[ idx_tmp ];
+
+        if(o_tmp & RRDR_VALUE_EMPTY)
+            continue;
+
+        size_t idx_dst = i * r_dst->d + d_dst;
+        NETDATA_DOUBLE *cn = (hidden_dimension_on_percentage_of_group) ? &r_dst->vh[ idx_dst ] : &r_dst->v[ idx_dst ];
+        RRDR_VALUE_FLAGS *co = &r_dst->o[ idx_dst ];
+        NETDATA_DOUBLE *ar = &r_dst->ar[ idx_dst ];
+        uint32_t *gbc = &r_dst->gbc[ idx_dst ];
+
+        switch(group_by_aggregate_function) {
+            default:
+            case RRDR_GROUP_BY_FUNCTION_AVERAGE:
+            case RRDR_GROUP_BY_FUNCTION_SUM:
+            case RRDR_GROUP_BY_FUNCTION_PERCENTAGE:
+                if(isnan(*cn))
+                    *cn = n_tmp;
+                else
+                    *cn += n_tmp;
+                break;
+
+            case RRDR_GROUP_BY_FUNCTION_MIN:
+                if(isnan(*cn) || n_tmp < *cn)
+                    *cn = n_tmp;
+                break;
+
+            case RRDR_GROUP_BY_FUNCTION_MAX:
+                if(isnan(*cn) || n_tmp > *cn)
+                    *cn = n_tmp;
+                break;
+        }
+
+        if(!hidden_dimension_on_percentage_of_group) {
+            *co &= ~RRDR_VALUE_EMPTY;
+            *co |= (o_tmp & (RRDR_VALUE_RESET | RRDR_VALUE_PARTIAL));
+            *ar += ar_tmp;
+            (*gbc)++;
+        }
+    }
+}
+
+static void rrdr2rrdr_group_by_partial_trimming(RRDR *r) {
+    time_t trimmable_after = r->partial_data_trimming.expected_after;
+
+    // find the point just before the trimmable ones
+    ssize_t i = (ssize_t)r->n - 1;
+    for( ; i >= 0 ;i--) {
+        if (r->t[i] < trimmable_after)
+            break;
+    }
+
+    if(unlikely(i < 0))
+        return;
+
+    // internal_error(true, "Found trimmable index %zd (from 0 to %zu)", i, r->n - 1);
+
+    size_t last_row_gbc = 0;
+    for (; i < (ssize_t)r->n; i++) {
+        size_t row_gbc = 0;
+        for (size_t d = 0; d < r->d; d++) {
+            if (unlikely(!(r->od[d] & RRDR_DIMENSION_QUERIED)))
+                continue;
+
+            row_gbc += r->gbc[ i * r->d + d ];
+        }
+
+        // internal_error(true, "GBC of index %zd is %zu", i, row_gbc);
+
+        if (unlikely(r->t[i] >= trimmable_after && (row_gbc < last_row_gbc || !row_gbc))) {
+            // discard the rest of the points
+            // internal_error(true, "Discarding points %zd to %zu", i, r->n - 1);
+            r->partial_data_trimming.trimmed_after = r->t[i];
+            r->rows = i;
+            break;
+        }
+        else
+            last_row_gbc = row_gbc;
+    }
+}
+
+static void rrdr2rrdr_group_by_calculate_percentage_of_group(RRDR *r) {
+    if(!r->vh)
+        return;
+
+    if(query_target_aggregatable(r->internal.qt) && query_has_group_by_aggregation_percentage(r->internal.qt))
+        return;
+
+    for(size_t i = 0; i < r->n ;i++) {
+        NETDATA_DOUBLE *cn = &r->v[ i * r->d ];
+        NETDATA_DOUBLE *ch = &r->vh[ i * r->d ];
+
+        for(size_t d = 0; d < r->d ;d++) {
+            NETDATA_DOUBLE n = cn[d];
+            NETDATA_DOUBLE h = ch[d];
+
+            if(isnan(n))
+                cn[d] = 0.0;
+
+            else if(isnan(h))
+                cn[d] = 100.0;
+
+            else
+                cn[d] = n * 100.0 / (n + h);
+        }
+    }
+}
+
+static void rrd2rrdr_convert_values_to_percentage_of_total(RRDR *r) {
+    if(!(r->internal.qt->window.options & RRDR_OPTION_PERCENTAGE) || query_target_aggregatable(r->internal.qt))
+        return;
+
+    size_t global_min_max_values = 0;
+    NETDATA_DOUBLE global_min = NAN, global_max = NAN;
+
+    for(size_t i = 0; i != r->n ;i++) {
+        NETDATA_DOUBLE *cn = &r->v[ i * r->d ];
+        RRDR_VALUE_FLAGS *co = &r->o[ i * r->d ];
+
+        NETDATA_DOUBLE total = 0;
+        for (size_t d = 0; d < r->d; d++) {
+            if (unlikely(!(r->od[d] & RRDR_DIMENSION_QUERIED)))
+                continue;
+
+            if(co[d] & RRDR_VALUE_EMPTY)
+                continue;
+
+            total += cn[d];
+        }
+
+        if(total == 0.0)
+            total = 1.0;
+
+        for (size_t d = 0; d < r->d; d++) {
+            if (unlikely(!(r->od[d] & RRDR_DIMENSION_QUERIED)))
+                continue;
+
+            if(co[d] & RRDR_VALUE_EMPTY)
+                continue;
+
+            NETDATA_DOUBLE n = cn[d];
+            n = cn[d] = n * 100.0 / total;
+
+            if(unlikely(!global_min_max_values++))
+                global_min = global_max = n;
+            else {
+                if(n < global_min)
+                    global_min = n;
+                if(n > global_max)
+                    global_max = n;
+            }
+        }
+    }
+
+    r->view.min = global_min;
+    r->view.max = global_max;
+
+    if(!r->dview)
+        // v1 query
+        return;
+
+    // v2 query
+
+    for (size_t d = 0; d < r->d; d++) {
+        if (unlikely(!(r->od[d] & RRDR_DIMENSION_QUERIED)))
+            continue;
+
+        size_t count = 0;
+        NETDATA_DOUBLE min = 0.0, max = 0.0, sum = 0.0, ars = 0.0;
+        for(size_t i = 0; i != r->rows ;i++) { // we use r->rows to respect trimming
+            size_t idx = i * r->d + d;
+
+            RRDR_VALUE_FLAGS o = r->o[ idx ];
+
+            if (o & RRDR_VALUE_EMPTY)
+                continue;
+
+            NETDATA_DOUBLE ar = r->ar[ idx ];
+            ars += ar;
+
+            NETDATA_DOUBLE n = r->v[ idx ];
+            sum += n;
+
+            if(!count++)
+                min = max = n;
+            else {
+                if(n < min)
+                    min = n;
+                if(n > max)
+                    max = n;
+            }
+        }
+
+        r->dview[d] = (STORAGE_POINT) {
+            .sum = sum,
+            .count = count,
+            .min = min,
+            .max = max,
+            .anomaly_count = (size_t)(ars * (NETDATA_DOUBLE)count),
+        };
+    }
+}
+
+static RRDR *rrd2rrdr_group_by_finalize(RRDR *r_tmp) {
+    QUERY_TARGET *qt = r_tmp->internal.qt;
+
+    if(!r_tmp->group_by.r) {
+        // v1 query
+        rrd2rrdr_convert_values_to_percentage_of_total(r_tmp);
+        return r_tmp;
+    }
+    // v2 query
+
+    // do the additional passes on RRDRs
+    RRDR *last_r = r_tmp->group_by.r;
+    rrdr2rrdr_group_by_calculate_percentage_of_group(last_r);
+
+    RRDR *r = last_r->group_by.r;
+    size_t pass = 0;
+    while(r) {
+        pass++;
+        for(size_t d = 0; d < last_r->d ;d++) {
+            rrd2rrdr_group_by_add_metric(r, last_r->dgbs[d], last_r, d,
+                                         qt->request.group_by[pass].aggregation,
+                                         &last_r->dqp[d], pass);
+        }
+        rrdr2rrdr_group_by_calculate_percentage_of_group(r);
+
+        last_r = r;
+        r = last_r->group_by.r;
+    }
+
+    // free all RRDRs except the last one
+    r = r_tmp;
+    while(r != last_r) {
+        r_tmp = r->group_by.r;
+        r->group_by.r = NULL;
+        rrdr_free(r->internal.owa, r);
+        r = r_tmp;
+    }
+    r = last_r;
+
+    // find the final aggregation
+    RRDR_GROUP_BY_FUNCTION aggregation = qt->request.group_by[0].aggregation;
+    for(size_t g = 0; g < MAX_QUERY_GROUP_BY_PASSES ;g++)
+        if(qt->request.group_by[g].group_by != RRDR_GROUP_BY_NONE)
+            aggregation = qt->request.group_by[g].aggregation;
+
+    if(!query_target_aggregatable(qt) && r->partial_data_trimming.expected_after < qt->window.before)
+        rrdr2rrdr_group_by_partial_trimming(r);
+
+    // apply averaging, remove RRDR_VALUE_EMPTY, find the non-zero dimensions, min and max
+    size_t global_min_max_values = 0;
+    size_t dimensions_nonzero = 0;
+    NETDATA_DOUBLE global_min = NAN, global_max = NAN;
+    for (size_t d = 0; d < r->d; d++) {
+        if (unlikely(!(r->od[d] & RRDR_DIMENSION_QUERIED)))
+            continue;
+
+        size_t points_nonzero = 0;
+        NETDATA_DOUBLE min = 0, max = 0, sum = 0, ars = 0;
+        size_t count = 0;
+
+        for(size_t i = 0; i != r->n ;i++) {
+            size_t idx = i * r->d + d;
+
+            NETDATA_DOUBLE *cn = &r->v[ idx ];
+            RRDR_VALUE_FLAGS *co = &r->o[ idx ];
+            NETDATA_DOUBLE *ar = &r->ar[ idx ];
+            uint32_t gbc = r->gbc[ idx ];
+
+            if(likely(gbc)) {
+                *co &= ~RRDR_VALUE_EMPTY;
+
+                if(gbc != r->dgbc[d])
+                    *co |= RRDR_VALUE_PARTIAL;
+
+                NETDATA_DOUBLE n;
+
+                sum += *cn;
+                ars += *ar;
+
+                if(aggregation == RRDR_GROUP_BY_FUNCTION_AVERAGE && !query_target_aggregatable(qt))
+                    n = (*cn /= gbc);
+                else
+                    n = *cn;
+
+                if(!query_target_aggregatable(qt))
+                    *ar /= gbc;
+
+                if(islessgreater(n, 0.0))
+                    points_nonzero++;
+
+                if(unlikely(!count))
+                    min = max = n;
+                else {
+                    if(n < min)
+                        min = n;
+
+                    if(n > max)
+                        max = n;
+                }
+
+                if(unlikely(!global_min_max_values++))
+                    global_min = global_max = n;
+                else {
+                    if(n < global_min)
+                        global_min = n;
+
+                    if(n > global_max)
+                        global_max = n;
+                }
+
+                count += gbc;
+            }
+        }
+
+        if(points_nonzero) {
+            r->od[d] |= RRDR_DIMENSION_NONZERO;
+            dimensions_nonzero++;
+        }
+
+        r->dview[d] = (STORAGE_POINT) {
+            .sum = sum,
+            .count = count,
+            .min = min,
+            .max = max,
+            .anomaly_count = (size_t)(ars * RRDR_DVIEW_ANOMALY_COUNT_MULTIPLIER / 100.0),
+        };
+    }
+
+    r->view.min = global_min;
+    r->view.max = global_max;
+
+    if(!dimensions_nonzero && (qt->window.options & RRDR_OPTION_NONZERO)) {
+        // all dimensions are zero
+        // remove the nonzero option
+        qt->window.options &= ~RRDR_OPTION_NONZERO;
+    }
+
+    rrd2rrdr_convert_values_to_percentage_of_total(r);
+
+    // update query instance counts in query host and query context
+    {
+        size_t h = 0, c = 0, i = 0;
+        for(; h < qt->nodes.used ; h++) {
+            QUERY_NODE *qn = &qt->nodes.array[h];
+
+            for(; c < qt->contexts.used ;c++) {
+                QUERY_CONTEXT *qc = &qt->contexts.array[c];
+
+                if(!rrdcontext_acquired_belongs_to_host(qc->rca, qn->rrdhost))
+                    break;
+
+                for(; i < qt->instances.used ;i++) {
+                    QUERY_INSTANCE *qi = &qt->instances.array[i];
+
+                    if(!rrdinstance_acquired_belongs_to_context(qi->ria, qc->rca))
+                        break;
+
+                    if(qi->metrics.queried) {
+                        qc->instances.queried++;
+                        qn->instances.queried++;
+                    }
+                    else if(qi->metrics.failed) {
+                        qc->instances.failed++;
+                        qn->instances.failed++;
+                    }
+                }
+            }
+        }
+    }
+
+    return r;
+}
+
+// ----------------------------------------------------------------------------
+// query entry point
+
+RRDR *rrd2rrdr_legacy(
+        ONEWAYALLOC *owa,
+        RRDSET *st, size_t points, time_t after, time_t before,
+        RRDR_TIME_GROUPING group_method, time_t resampling_time, RRDR_OPTIONS options, const char *dimensions,
+        const char *group_options, time_t timeout_ms, size_t tier, QUERY_SOURCE query_source,
+        STORAGE_PRIORITY priority) {
+
+    QUERY_TARGET_REQUEST qtr = {
+            .version = 1,
+            .st = st,
+            .points = points,
+            .after = after,
+            .before = before,
+            .time_group_method = group_method,
+            .resampling_time = resampling_time,
+            .options = options,
+            .dimensions = dimensions,
+            .time_group_options = group_options,
+            .timeout_ms = timeout_ms,
+            .tier = tier,
+            .query_source = query_source,
+            .priority = priority,
+    };
+
+    QUERY_TARGET *qt = query_target_create(&qtr);
+    RRDR *r = rrd2rrdr(owa, qt);
+    if(!r) {
+        query_target_release(qt);
+        return NULL;
+    }
+
+    r->internal.release_with_rrdr_qt = qt;
+    return r;
+}
+
+RRDR *rrd2rrdr(ONEWAYALLOC *owa, QUERY_TARGET *qt) {
+    if(!qt || !owa)
+        return NULL;
+
+    // qt.window members are the WANTED ones.
+    // qt.request members are the REQUESTED ones.
+
+    RRDR *r_tmp = rrd2rrdr_group_by_initialize(owa, qt);
+    if(!r_tmp)
+        return NULL;
+
+    // the RRDR we group-by at
+    RRDR *r = (r_tmp->group_by.r) ? r_tmp->group_by.r : r_tmp;
+
+    // the final RRDR to return to callers
+    RRDR *last_r = r_tmp;
+    while(last_r->group_by.r)
+        last_r = last_r->group_by.r;
+
+    if(qt->window.relative)
+        last_r->view.flags |= RRDR_RESULT_FLAG_RELATIVE;
+    else
+        last_r->view.flags |= RRDR_RESULT_FLAG_ABSOLUTE;
+
+    // -------------------------------------------------------------------------
+    // assign the processor functions
+    rrdr_set_grouping_function(r_tmp, qt->window.time_group_method);
+
+    // allocate any memory required by the grouping method
+    r_tmp->time_grouping.create(r_tmp, qt->window.time_group_options);
+
+    // -------------------------------------------------------------------------
+    // do the work for each dimension
+
+    time_t max_after = 0, min_before = 0;
+    size_t max_rows = 0;
+
+    long dimensions_used = 0, dimensions_nonzero = 0;
+    size_t last_db_points_read = 0;
+    size_t last_result_points_generated = 0;
+
+    internal_fatal(released_ops, "QUERY: released_ops should be NULL when the query starts");
+
+    query_progress_set_finish_line(qt->request.transaction, qt->query.used);
+
+    QUERY_ENGINE_OPS **ops = NULL;
+    if(qt->query.used)
+        ops = onewayalloc_callocz(owa, qt->query.used, sizeof(QUERY_ENGINE_OPS *));
+
+    size_t capacity = libuv_worker_threads * 10;
+    size_t max_queries_to_prepare = (qt->query.used > (capacity - 1)) ? (capacity - 1) : qt->query.used;
+    size_t queries_prepared = 0;
+    while(queries_prepared < max_queries_to_prepare) {
+        // preload another query
+        ops[queries_prepared] = rrd2rrdr_query_ops_prep(r_tmp, queries_prepared);
+        queries_prepared++;
+    }
+
+    QUERY_NODE *last_qn = NULL;
+    usec_t last_ut = now_monotonic_usec();
+    usec_t last_qn_ut = last_ut;
+
+    for(size_t d = 0; d < qt->query.used ; d++) {
+        QUERY_METRIC *qm = query_metric(qt, d);
+        QUERY_DIMENSION *qd = query_dimension(qt, qm->link.query_dimension_id);
+        QUERY_INSTANCE *qi = query_instance(qt, qm->link.query_instance_id);
+        QUERY_CONTEXT *qc = query_context(qt, qm->link.query_context_id);
+        QUERY_NODE *qn = query_node(qt, qm->link.query_node_id);
+
+        usec_t now_ut = last_ut;
+        if(qn != last_qn) {
+            if(last_qn)
+                last_qn->duration_ut = now_ut - last_qn_ut;
+
+            last_qn = qn;
+            last_qn_ut = now_ut;
+        }
+
+        if(queries_prepared < qt->query.used) {
+            // preload another query
+            ops[queries_prepared] = rrd2rrdr_query_ops_prep(r_tmp, queries_prepared);
+            queries_prepared++;
+        }
+
+        size_t dim_in_rrdr_tmp = (r_tmp != r) ? 
0 : d; + + // set the query target dimension options to rrdr + r_tmp->od[dim_in_rrdr_tmp] = qm->status; + + // reset the grouping for the new dimension + r_tmp->time_grouping.reset(r_tmp); + + if(ops[d]) { + rrd2rrdr_query_execute(r_tmp, dim_in_rrdr_tmp, ops[d]); + r_tmp->od[dim_in_rrdr_tmp] |= RRDR_DIMENSION_QUERIED; + + now_ut = now_monotonic_usec(); + qm->duration_ut = now_ut - last_ut; + last_ut = now_ut; + + if(r_tmp != r) { + // copy back whatever got updated from the temporary r + + // the query updates RRDR_DIMENSION_NONZERO + qm->status = r_tmp->od[dim_in_rrdr_tmp]; + + // the query updates these + r->view.min = r_tmp->view.min; + r->view.max = r_tmp->view.max; + r->view.after = r_tmp->view.after; + r->view.before = r_tmp->view.before; + r->rows = r_tmp->rows; + + rrd2rrdr_group_by_add_metric(r, qm->grouped_as.first_slot, r_tmp, dim_in_rrdr_tmp, + qt->request.group_by[0].aggregation, &qm->query_points, 0); + } + + rrd2rrdr_query_ops_release(ops[d]); // reuse this ops allocation + ops[d] = NULL; + + qi->metrics.queried++; + qc->metrics.queried++; + qn->metrics.queried++; + + qd->status |= QUERY_STATUS_QUERIED; + qm->status |= RRDR_DIMENSION_QUERIED; + + if(qt->request.version >= 2) { + // we need to make the query points positive now + // since we will aggregate it across multiple dimensions + storage_point_make_positive(qm->query_points); + storage_point_merge_to(qi->query_points, qm->query_points); + storage_point_merge_to(qc->query_points, qm->query_points); + storage_point_merge_to(qn->query_points, qm->query_points); + storage_point_merge_to(qt->query_points, qm->query_points); + } + } + else { + qi->metrics.failed++; + qc->metrics.failed++; + qn->metrics.failed++; + + qd->status |= QUERY_STATUS_FAILED; + qm->status |= RRDR_DIMENSION_FAILED; + + continue; + } + + global_statistics_rrdr_query_completed( + 1, + r_tmp->stats.db_points_read - last_db_points_read, + r_tmp->stats.result_points_generated - last_result_points_generated, + 
qt->request.query_source); + + last_db_points_read = r_tmp->stats.db_points_read; + last_result_points_generated = r_tmp->stats.result_points_generated; + + if(qm->status & RRDR_DIMENSION_NONZERO) + dimensions_nonzero++; + + // verify all dimensions are aligned + if(unlikely(!dimensions_used)) { + min_before = r->view.before; + max_after = r->view.after; + max_rows = r->rows; + } + else { + if(r->view.after != max_after) { + internal_error(true, "QUERY: 'after' mismatch between dimensions for chart '%s': max is %zu, dimension '%s' has %zu", + rrdinstance_acquired_id(qi->ria), (size_t)max_after, rrdmetric_acquired_id(qd->rma), (size_t)r->view.after); + + r->view.after = (r->view.after > max_after) ? r->view.after : max_after; + } + + if(r->view.before != min_before) { + internal_error(true, "QUERY: 'before' mismatch between dimensions for chart '%s': max is %zu, dimension '%s' has %zu", + rrdinstance_acquired_id(qi->ria), (size_t)min_before, rrdmetric_acquired_id(qd->rma), (size_t)r->view.before); + + r->view.before = (r->view.before < min_before) ? r->view.before : min_before; + } + + if(r->rows != max_rows) { + internal_error(true, "QUERY: 'rows' mismatch between dimensions for chart '%s': max is %zu, dimension '%s' has %zu", + rrdinstance_acquired_id(qi->ria), (size_t)max_rows, rrdmetric_acquired_id(qd->rma), (size_t)r->rows); + + r->rows = (r->rows > max_rows) ? 
r->rows : max_rows; + } + } + + dimensions_used++; + + bool cancel = false; + if (qt->request.interrupt_callback && qt->request.interrupt_callback(qt->request.interrupt_callback_data)) { + cancel = true; + nd_log(NDLS_ACCESS, NDLP_NOTICE, "QUERY INTERRUPTED"); + } + + if (qt->request.timeout_ms && ((NETDATA_DOUBLE)(now_ut - qt->timings.received_ut) / 1000.0) > (NETDATA_DOUBLE)qt->request.timeout_ms) { + cancel = true; + nd_log(NDLS_ACCESS, NDLP_WARNING, "QUERY CANCELED RUNTIME EXCEEDED %0.2f ms (LIMIT %lld ms)", + (NETDATA_DOUBLE)(now_ut - qt->timings.received_ut) / 1000.0, (long long)qt->request.timeout_ms); + } + + if(cancel) { + r->view.flags |= RRDR_RESULT_FLAG_CANCEL; + + for(size_t i = d + 1; i < queries_prepared ; i++) { + if(ops[i]) { + query_planer_finalize_remaining_plans(ops[i]); + rrd2rrdr_query_ops_release(ops[i]); + ops[i] = NULL; + } + } + + break; + } + else + query_progress_done_step(qt->request.transaction, 1); + } + + // free all resources used by the grouping method + r_tmp->time_grouping.free(r_tmp); + + // get the final RRDR to send to the caller + r = rrd2rrdr_group_by_finalize(r_tmp); + +#ifdef NETDATA_INTERNAL_CHECKS + if (dimensions_used && !(r->view.flags & RRDR_RESULT_FLAG_CANCEL)) { + if(r->internal.log) + rrd2rrdr_log_request_response_metadata(r, qt->window.options, qt->window.time_group_method, qt->window.aligned, qt->window.group, qt->request.resampling_time, qt->window.resampling_group, + qt->window.after, qt->request.after, qt->window.before, qt->request.before, + qt->request.points, qt->window.points, /*after_slot, before_slot,*/ + r->internal.log); + + if(r->rows != qt->window.points) + rrd2rrdr_log_request_response_metadata(r, qt->window.options, qt->window.time_group_method, qt->window.aligned, qt->window.group, qt->request.resampling_time, qt->window.resampling_group, + qt->window.after, qt->request.after, qt->window.before, qt->request.before, + qt->request.points, qt->window.points, /*after_slot, before_slot,*/ + "got 
'points' is not wanted 'points'"); + + if(qt->window.aligned && (r->view.before % query_view_update_every(qt)) != 0) + rrd2rrdr_log_request_response_metadata(r, qt->window.options, qt->window.time_group_method, qt->window.aligned, qt->window.group, qt->request.resampling_time, qt->window.resampling_group, + qt->window.after, qt->request.after, qt->window.before, qt->request.before, + qt->request.points, qt->window.points, /*after_slot, before_slot,*/ + "'before' is not aligned but alignment is required"); + + // 'after' should not be aligned, since we start inside the first group + //if(qt->window.aligned && (r->after % group) != 0) + // rrd2rrdr_log_request_response_metadata(r, qt->window.options, qt->window.group_method, qt->window.aligned, qt->window.group, qt->request.resampling_time, qt->window.resampling_group, qt->window.after, after_requested, before_wanted, before_requested, points_requested, points_wanted, after_slot, before_slot, "'after' is not aligned but alignment is required"); + + if(r->view.before != qt->window.before) + rrd2rrdr_log_request_response_metadata(r, qt->window.options, qt->window.time_group_method, qt->window.aligned, qt->window.group, qt->request.resampling_time, qt->window.resampling_group, + qt->window.after, qt->request.after, qt->window.before, qt->request.before, + qt->request.points, qt->window.points, /*after_slot, before_slot,*/ + "chart is not aligned to requested 'before'"); + + if(r->view.before != qt->window.before) + rrd2rrdr_log_request_response_metadata(r, qt->window.options, qt->window.time_group_method, qt->window.aligned, qt->window.group, qt->request.resampling_time, qt->window.resampling_group, + qt->window.after, qt->request.after, qt->window.before, qt->request.before, + qt->request.points, qt->window.points, /*after_slot, before_slot,*/ + "got 'before' is not wanted 'before'"); + + // reported 'after' varies, depending on group + if(r->view.after != qt->window.after) + rrd2rrdr_log_request_response_metadata(r, 
qt->window.options, qt->window.time_group_method, qt->window.aligned, qt->window.group, qt->request.resampling_time, qt->window.resampling_group, + qt->window.after, qt->request.after, qt->window.before, qt->request.before, + qt->request.points, qt->window.points, /*after_slot, before_slot,*/ + "got 'after' is not wanted 'after'"); + + } +#endif + + // free the query pipelining ops + for(size_t d = 0; d < qt->query.used ; d++) { + rrd2rrdr_query_ops_release(ops[d]); + ops[d] = NULL; + } + rrd2rrdr_query_ops_freeall(r); + internal_fatal(released_ops, "QUERY: released_ops should be NULL when the query ends"); + + onewayalloc_freez(owa, ops); + + if(likely(dimensions_used && (qt->window.options & RRDR_OPTION_NONZERO) && !dimensions_nonzero)) + // when all the dimensions are zero, we should return all of them + qt->window.options &= ~RRDR_OPTION_NONZERO; + + qt->timings.executed_ut = now_monotonic_usec(); + + return r; +} diff --git a/src/web/api/queries/query.h b/src/web/api/queries/query.h new file mode 100644 index 000000000..37202a0ba --- /dev/null +++ b/src/web/api/queries/query.h @@ -0,0 +1,100 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_DATA_QUERY_H +#define NETDATA_API_DATA_QUERY_H + +#ifdef __cplusplus +extern "C" { +#endif + +typedef enum rrdr_time_grouping { + RRDR_GROUPING_UNDEFINED = 0, + RRDR_GROUPING_AVERAGE, + RRDR_GROUPING_MIN, + RRDR_GROUPING_MAX, + RRDR_GROUPING_SUM, + RRDR_GROUPING_INCREMENTAL_SUM, + RRDR_GROUPING_TRIMMED_MEAN1, + RRDR_GROUPING_TRIMMED_MEAN2, + RRDR_GROUPING_TRIMMED_MEAN3, + RRDR_GROUPING_TRIMMED_MEAN, + RRDR_GROUPING_TRIMMED_MEAN10, + RRDR_GROUPING_TRIMMED_MEAN15, + RRDR_GROUPING_TRIMMED_MEAN20, + RRDR_GROUPING_TRIMMED_MEAN25, + RRDR_GROUPING_MEDIAN, + RRDR_GROUPING_TRIMMED_MEDIAN1, + RRDR_GROUPING_TRIMMED_MEDIAN2, + RRDR_GROUPING_TRIMMED_MEDIAN3, + RRDR_GROUPING_TRIMMED_MEDIAN, + RRDR_GROUPING_TRIMMED_MEDIAN10, + RRDR_GROUPING_TRIMMED_MEDIAN15, + RRDR_GROUPING_TRIMMED_MEDIAN20, + 
RRDR_GROUPING_TRIMMED_MEDIAN25, + RRDR_GROUPING_PERCENTILE25, + RRDR_GROUPING_PERCENTILE50, + RRDR_GROUPING_PERCENTILE75, + RRDR_GROUPING_PERCENTILE80, + RRDR_GROUPING_PERCENTILE90, + RRDR_GROUPING_PERCENTILE, + RRDR_GROUPING_PERCENTILE97, + RRDR_GROUPING_PERCENTILE98, + RRDR_GROUPING_PERCENTILE99, + RRDR_GROUPING_STDDEV, + RRDR_GROUPING_CV, + RRDR_GROUPING_SES, + RRDR_GROUPING_DES, + RRDR_GROUPING_COUNTIF, +} RRDR_TIME_GROUPING; + +const char *time_grouping_id2txt(RRDR_TIME_GROUPING group); +RRDR_TIME_GROUPING time_grouping_txt2id(const char *name); + +void time_grouping_init(void); +RRDR_TIME_GROUPING time_grouping_parse(const char *name, RRDR_TIME_GROUPING def); +const char *time_grouping_tostring(RRDR_TIME_GROUPING group); + +typedef enum rrdr_group_by { + RRDR_GROUP_BY_NONE = 0, + RRDR_GROUP_BY_SELECTED = (1 << 0), + RRDR_GROUP_BY_DIMENSION = (1 << 1), + RRDR_GROUP_BY_INSTANCE = (1 << 2), + RRDR_GROUP_BY_LABEL = (1 << 3), + RRDR_GROUP_BY_NODE = (1 << 4), + RRDR_GROUP_BY_CONTEXT = (1 << 5), + RRDR_GROUP_BY_UNITS = (1 << 6), + RRDR_GROUP_BY_PERCENTAGE_OF_INSTANCE = (1 << 7), +} RRDR_GROUP_BY; + +#define SUPPORTED_GROUP_BY_METHODS (\ + RRDR_GROUP_BY_SELECTED |\ + RRDR_GROUP_BY_DIMENSION |\ + RRDR_GROUP_BY_INSTANCE |\ + RRDR_GROUP_BY_LABEL |\ + RRDR_GROUP_BY_NODE |\ + RRDR_GROUP_BY_CONTEXT |\ + RRDR_GROUP_BY_UNITS |\ + RRDR_GROUP_BY_PERCENTAGE_OF_INSTANCE \ +) + +struct web_buffer; + +RRDR_GROUP_BY group_by_parse(char *s); +void buffer_json_group_by_to_array(struct web_buffer *wb, RRDR_GROUP_BY group_by); + +typedef enum rrdr_group_by_function { + RRDR_GROUP_BY_FUNCTION_AVERAGE = 0, + RRDR_GROUP_BY_FUNCTION_MIN, + RRDR_GROUP_BY_FUNCTION_MAX, + RRDR_GROUP_BY_FUNCTION_SUM, + RRDR_GROUP_BY_FUNCTION_PERCENTAGE, +} RRDR_GROUP_BY_FUNCTION; + +RRDR_GROUP_BY_FUNCTION group_by_aggregate_function_parse(const char *s); +const char *group_by_aggregate_function_to_string(RRDR_GROUP_BY_FUNCTION group_by_function); + +#ifdef __cplusplus +} +#endif + +#endif 
//NETDATA_API_DATA_QUERY_H diff --git a/src/web/api/queries/rrdr.c b/src/web/api/queries/rrdr.c new file mode 100644 index 000000000..2a0016891 --- /dev/null +++ b/src/web/api/queries/rrdr.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "rrdr.h" + +/* +static void rrdr_dump(RRDR *r) +{ + long c, i; + RRDDIM *d; + + fprintf(stderr, "\nCHART %s (%s)\n", r->st->id, r->st->name); + + for(c = 0, d = r->st->dimensions; d ;c++, d = d->next) { + fprintf(stderr, "DIMENSION %s (%s), %s%s%s%s\n" + , d->id + , d->name + , (r->od[c] & RRDR_EMPTY)?"EMPTY ":"" + , (r->od[c] & RRDR_RESET)?"RESET ":"" + , (r->od[c] & RRDR_DIMENSION_HIDDEN)?"HIDDEN ":"" + , (r->od[c] & RRDR_DIMENSION_NONZERO)?"NONZERO ":"" + ); + } + + if(r->rows <= 0) { + fprintf(stderr, "RRDR does not have any values in it.\n"); + return; + } + + fprintf(stderr, "RRDR includes %d values in it:\n", r->rows); + + // for each line in the array + for(i = 0; i < r->rows ;i++) { + NETDATA_DOUBLE *cn = &r->v[ i * r->d ]; + RRDR_DIMENSION_FLAGS *co = &r->o[ i * r->d ]; + + // print the id and the timestamp of the line + fprintf(stderr, "%ld %ld ", i + 1, r->t[i]); + + // for each dimension + for(c = 0, d = r->st->dimensions; d ;c++, d = d->next) { + if(unlikely(r->od[c] & RRDR_DIMENSION_HIDDEN)) continue; + if(unlikely(!(r->od[c] & RRDR_DIMENSION_NONZERO))) continue; + + if(co[c] & RRDR_EMPTY) + fprintf(stderr, "null "); + else + fprintf(stderr, NETDATA_DOUBLE_FORMAT " %s%s%s%s " + , cn[c] + , (co[c] & RRDR_EMPTY)?"E":" " + , (co[c] & RRDR_RESET)?"R":" " + , (co[c] & RRDR_DIMENSION_HIDDEN)?"H":" " + , (co[c] & RRDR_DIMENSION_NONZERO)?"N":" " + ); + } + + fprintf(stderr, "\n"); + } +} +*/ + +inline void rrdr_free(ONEWAYALLOC *owa, RRDR *r) { + if(unlikely(!r)) return; + + for(size_t d = 0; d < r->d ;d++) { + string_freez(r->di[d]); + string_freez(r->dn[d]); + string_freez(r->du[d]); + } + + query_target_release(r->internal.release_with_rrdr_qt); + + onewayalloc_freez(owa, r->t); + 
onewayalloc_freez(owa, r->v); + onewayalloc_freez(owa, r->vh); + onewayalloc_freez(owa, r->o); + onewayalloc_freez(owa, r->od); + onewayalloc_freez(owa, r->di); + onewayalloc_freez(owa, r->dn); + onewayalloc_freez(owa, r->du); + onewayalloc_freez(owa, r->dp); + onewayalloc_freez(owa, r->dview); + onewayalloc_freez(owa, r->dqp); + onewayalloc_freez(owa, r->ar); + onewayalloc_freez(owa, r->gbc); + onewayalloc_freez(owa, r->dgbc); + onewayalloc_freez(owa, r->dgbs); + + if(r->dl) { + for(size_t d = 0; d < r->d ;d++) + dictionary_destroy(r->dl[d]); + + onewayalloc_freez(owa, r->dl); + } + + dictionary_destroy(r->label_keys); + + if(r->group_by.r) { + // prevent accidental infinite recursion + r->group_by.r->group_by.r = NULL; + + // do not release qt twice + r->group_by.r->internal.qt = NULL; + + rrdr_free(owa, r->group_by.r); + } + + onewayalloc_freez(owa, r); +} + +RRDR *rrdr_create(ONEWAYALLOC *owa, QUERY_TARGET *qt, size_t dimensions, size_t points) { + if(unlikely(!qt)) + return NULL; + + // create the rrdr + RRDR *r = onewayalloc_callocz(owa, 1, sizeof(RRDR)); + r->internal.owa = owa; + r->internal.qt = qt; + + r->view.before = qt->window.before; + r->view.after = qt->window.after; + r->time_grouping.points_wanted = points; + r->d = (int)dimensions; + r->n = (int)points; + + if(points && dimensions) { + r->v = onewayalloc_mallocz(owa, points * dimensions * sizeof(NETDATA_DOUBLE)); + r->o = onewayalloc_mallocz(owa, points * dimensions * sizeof(RRDR_VALUE_FLAGS)); + r->ar = onewayalloc_mallocz(owa, points * dimensions * sizeof(NETDATA_DOUBLE)); + } + + if(points) { + r->t = onewayalloc_callocz(owa, points, sizeof(time_t)); + } + + if(dimensions) { + r->od = onewayalloc_mallocz(owa, dimensions * sizeof(RRDR_DIMENSION_FLAGS)); + r->di = onewayalloc_callocz(owa, dimensions, sizeof(STRING *)); + r->dn = onewayalloc_callocz(owa, dimensions, sizeof(STRING *)); + r->du = onewayalloc_callocz(owa, dimensions, sizeof(STRING *)); + } + + r->view.group = 1; + 
r->view.update_every = 1; + + return r; +} diff --git a/src/web/api/queries/rrdr.h b/src/web/api/queries/rrdr.h new file mode 100644 index 000000000..d36d3f5b3 --- /dev/null +++ b/src/web/api/queries/rrdr.h @@ -0,0 +1,215 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_QUERIES_RRDR_H +#define NETDATA_QUERIES_RRDR_H + +#include "libnetdata/libnetdata.h" +#include "web/api/queries/query.h" + +#ifdef __cplusplus +extern "C" { +#endif + +typedef enum tier_query_fetch { + TIER_QUERY_FETCH_SUM, + TIER_QUERY_FETCH_MIN, + TIER_QUERY_FETCH_MAX, + TIER_QUERY_FETCH_AVERAGE +} TIER_QUERY_FETCH; + +typedef enum rrdr_options { + RRDR_OPTION_NONZERO = (1 << 0), // don't output dimensions with just zero values + RRDR_OPTION_REVERSED = (1 << 1), // output the rows in reverse order (oldest to newest) + RRDR_OPTION_ABSOLUTE = (1 << 2), // values positive, for DATASOURCE_SSV before summing + RRDR_OPTION_DIMS_MIN2MAX = (1 << 3), // when adding dimensions, use max - min, instead of sum + RRDR_OPTION_DIMS_AVERAGE = (1 << 4), // when adding dimensions, use average, instead of sum + RRDR_OPTION_DIMS_MIN = (1 << 5), // when adding dimensions, use minimum, instead of sum + RRDR_OPTION_DIMS_MAX = (1 << 6), // when adding dimensions, use maximum, instead of sum + RRDR_OPTION_SECONDS = (1 << 7), // output seconds, instead of dates + RRDR_OPTION_MILLISECONDS = (1 << 8), // output milliseconds, instead of dates + RRDR_OPTION_NULL2ZERO = (1 << 9), // do not show nulls, convert them to zeros + RRDR_OPTION_OBJECTSROWS = (1 << 10), // each row of values should be an object, not an array + RRDR_OPTION_GOOGLE_JSON = (1 << 11), // comply with google JSON/JSONP specs + RRDR_OPTION_JSON_WRAP = (1 << 12), // wrap the response in a JSON header with info about the result + RRDR_OPTION_LABEL_QUOTES = (1 << 13), // in CSV output, wrap header labels in double quotes + RRDR_OPTION_PERCENTAGE = (1 << 14), // give values as percentage of total + RRDR_OPTION_NOT_ALIGNED = (1 << 15), // do not 
align charts for persistent timeframes + RRDR_OPTION_DISPLAY_ABS = (1 << 16), // for badges, display the absolute value, but calculate colors with sign + RRDR_OPTION_MATCH_IDS = (1 << 17), // when filtering dimensions, match only IDs + RRDR_OPTION_MATCH_NAMES = (1 << 18), // when filtering dimensions, match only names + RRDR_OPTION_NATURAL_POINTS = (1 << 19), // return the natural points of the database + RRDR_OPTION_VIRTUAL_POINTS = (1 << 20), // return virtual points + RRDR_OPTION_ANOMALY_BIT = (1 << 21), // Return the anomaly bit stored in each collected_number + RRDR_OPTION_RETURN_RAW = (1 << 22), // Return raw data for aggregating across multiple nodes + RRDR_OPTION_RETURN_JWAR = (1 << 23), // Return anomaly rates in jsonwrap + RRDR_OPTION_SELECTED_TIER = (1 << 24), // Use the selected tier for the query + RRDR_OPTION_ALL_DIMENSIONS = (1 << 25), // Return the full dimensions list + RRDR_OPTION_SHOW_DETAILS = (1 << 26), // v2 returns detailed object tree + RRDR_OPTION_DEBUG = (1 << 27), // v2 returns request description + RRDR_OPTION_MINIFY = (1 << 28), // remove JSON spaces and newlines from JSON output + RRDR_OPTION_GROUP_BY_LABELS = (1 << 29), // v2 returns flattened labels per dimension of the chart + + // internal ones - not to be exposed to the API + RRDR_OPTION_INTERNAL_AR = (1 << 31), // internal use only, to let the formatters know we want to render the anomaly rate +} RRDR_OPTIONS; + +typedef enum context_v2_options { + CONTEXT_V2_OPTION_MINIFY = (1 << 0), // remove JSON spaces and newlines from JSON output + CONTEXT_V2_OPTION_DEBUG = (1 << 1), // show the request + CONTEXT_V2_OPTION_ALERTS_WITH_CONFIGURATIONS = (1 << 2), // include alert configurations (used by /api/v2/alert_transitions) + CONTEXT_V2_OPTION_ALERTS_WITH_INSTANCES = (1 << 3), // include alert instances (used by /api/v2/alerts) + CONTEXT_V2_OPTION_ALERTS_WITH_VALUES = (1 << 4), // include alert latest values (used by /api/v2/alerts) + CONTEXT_V2_OPTION_ALERTS_WITH_SUMMARY = (1 << 5), // 
include alerts summary counters (used by /api/v2/alerts) +} CONTEXTS_V2_OPTIONS; + +typedef enum context_v2_alert_status { + CONTEXT_V2_ALERT_UNINITIALIZED = (1 << 5), // include UNINITIALIZED alerts + CONTEXT_V2_ALERT_UNDEFINED = (1 << 6), // include UNDEFINED alerts + CONTEXT_V2_ALERT_CLEAR = (1 << 7), // include CLEAR alerts + CONTEXT_V2_ALERT_RAISED = (1 << 8), // include WARNING & CRITICAL alerts + CONTEXT_V2_ALERT_WARNING = (1 << 9), // include WARNING alerts + CONTEXT_V2_ALERT_CRITICAL = (1 << 10), // include CRITICAL alerts +} CONTEXTS_V2_ALERT_STATUS; + +#define CONTEXTS_V2_ALERT_STATUSES (CONTEXT_V2_ALERT_UNINITIALIZED|CONTEXT_V2_ALERT_UNDEFINED|CONTEXT_V2_ALERT_CLEAR|CONTEXT_V2_ALERT_RAISED|CONTEXT_V2_ALERT_WARNING|CONTEXT_V2_ALERT_CRITICAL) + +typedef enum __attribute__ ((__packed__)) rrdr_value_flag { + + // IMPORTANT: + // THIS IS AN AGREED BIT MAP BETWEEN AGENT, CLOUD FRONT-END AND CLOUD BACK-END + // DO NOT CHANGE THE MAPPINGS ! + + RRDR_VALUE_NOTHING = 0, // no flag set (a good default) + RRDR_VALUE_EMPTY = (1 << 0), // the database value is empty + RRDR_VALUE_RESET = (1 << 1), // the database value is marked as reset (overflown) + RRDR_VALUE_PARTIAL = (1 << 2), // the database provides partial data about this point (used in group-by) +} RRDR_VALUE_FLAGS; + +typedef enum __attribute__ ((__packed__)) rrdr_dimension_flag { + RRDR_DIMENSION_DEFAULT = 0, + RRDR_DIMENSION_HIDDEN = (1 << 0), // the dimension is hidden (not to be presented to callers) + RRDR_DIMENSION_NONZERO = (1 << 1), // the dimension is non zero (contains non-zero values) + RRDR_DIMENSION_SELECTED = (1 << 2), // the dimension has been selected for query + RRDR_DIMENSION_QUERIED = (1 << 3), // the dimension has been queried + RRDR_DIMENSION_FAILED = (1 << 4), // the dimension failed to be queried + RRDR_DIMENSION_GROUPED = (1 << 5), // the dimension has been grouped in this RRDR +} RRDR_DIMENSION_FLAGS; + +// RRDR result options +typedef enum __attribute__ ((__packed__)) 
rrdr_result_flags { + RRDR_RESULT_FLAG_ABSOLUTE = (1 << 0), // the query uses absolute time-frames + // (can be cached by browsers and proxies) + RRDR_RESULT_FLAG_RELATIVE = (1 << 1), // the query uses relative time-frames + // (should not to be cached by browsers and proxies) + RRDR_RESULT_FLAG_CANCEL = (1 << 2), // the query needs to be cancelled +} RRDR_RESULT_FLAGS; + +#define RRDR_DVIEW_ANOMALY_COUNT_MULTIPLIER 1000.0 + +typedef struct rrdresult { + size_t d; // the number of dimensions + size_t n; // the number of values in the arrays (number of points per dimension) + size_t rows; // the number of actual rows used + + RRDR_DIMENSION_FLAGS *od; // the options for the dimensions + + STRING **di; // array of d dimension ids + STRING **dn; // array of d dimension names + STRING **du; // array of d dimension units + uint32_t *dgbs; // array of d dimension group by slots - NOT ALLOCATED when RRDR is created + uint32_t *dgbc; // array of d dimension group by counts - NOT ALLOCATED when RRDR is created + uint32_t *dp; // array of d dimension priority - NOT ALLOCATED when RRDR is created + DICTIONARY **dl; // array of d dimension labels - NOT ALLOCATED when RRDR is created + STORAGE_POINT *dqp; // array of d dimensions query points - NOT ALLOCATED when RRDR is created + STORAGE_POINT *dview; // array of d dimensions group by view - NOT ALLOCATED when RRDR is created + NETDATA_DOUBLE *vh; // array of n x d hidden values, while grouping - NOT ALLOCATED when RRDR is created + + DICTIONARY *label_keys; + + time_t *t; // array of n timestamps + NETDATA_DOUBLE *v; // array n x d values + RRDR_VALUE_FLAGS *o; // array n x d options for each value returned + NETDATA_DOUBLE *ar; // array n x d of anomaly rates (0 - 100) + uint32_t *gbc; // array n x d of group by count - NOT ALLOCATED when RRDR is created + + struct { + size_t group; // how many collected values were grouped for each row - NEEDED BY GROUPING FUNCTIONS + time_t after; + time_t before; + time_t update_every; // 
what is the suggested update frequency in seconds + NETDATA_DOUBLE min; + NETDATA_DOUBLE max; + RRDR_RESULT_FLAGS flags; // RRDR_RESULT_FLAG_* + } view; + + struct { + size_t db_points_read; + size_t result_points_generated; + } stats; + + struct { + void *data; // the internal data of the grouping function + + // grouping function pointers + RRDR_TIME_GROUPING add_flush; + void (*create)(struct rrdresult *r, const char *options); + void (*reset)(struct rrdresult *r); + void (*free)(struct rrdresult *r); + void (*add)(struct rrdresult *r, NETDATA_DOUBLE value); + NETDATA_DOUBLE (*flush)(struct rrdresult *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr); + + TIER_QUERY_FETCH tier_query_fetch; // which value to use from STORAGE_POINT + + size_t points_wanted; // used by SES and DES + size_t resampling_group; // used by AVERAGE + NETDATA_DOUBLE resampling_divisor; // used by AVERAGE + } time_grouping; + + struct { + struct rrdresult *r; + } group_by; + + struct { + time_t max_update_every; + time_t expected_after; + time_t trimmed_after; + } partial_data_trimming; + + struct { + ONEWAYALLOC *owa; // the allocator used + struct query_target *qt; // the QUERY_TARGET + size_t contexts; // temp needed between json_wrapper_begin2() and json_wrapper_end2() + size_t queries_count; // temp needed to know if a query is the first executed + +#ifdef NETDATA_INTERNAL_CHECKS + const char *log; +#endif + + struct query_target *release_with_rrdr_qt; + } internal; +} RRDR; + +#define rrdr_rows(r) ((r)->rows) + +#include "database/rrd.h" +void rrdr_free(ONEWAYALLOC *owa, RRDR *r); +RRDR *rrdr_create(ONEWAYALLOC *owa, struct query_target *qt, size_t dimensions, size_t points); + +#include "../web_api_v1.h" +#include "web/api/queries/query.h" + +RRDR *rrd2rrdr_legacy( + ONEWAYALLOC *owa, + RRDSET *st, size_t points, time_t after, time_t before, + RRDR_TIME_GROUPING group_method, time_t resampling_time, RRDR_OPTIONS options, const char *dimensions, + const char *group_options, time_t 
timeout_ms, size_t tier, QUERY_SOURCE query_source, + STORAGE_PRIORITY priority); + +RRDR *rrd2rrdr(ONEWAYALLOC *owa, struct query_target *qt); +bool query_target_calculate_window(struct query_target *qt); + +#ifdef __cplusplus +} +#endif + +#endif //NETDATA_QUERIES_RRDR_H diff --git a/src/web/api/queries/ses/README.md b/src/web/api/queries/ses/README.md new file mode 100644 index 000000000..e2fd65d7a --- /dev/null +++ b/src/web/api/queries/ses/README.md @@ -0,0 +1,65 @@ +<!-- +title: "Single (or Simple) Exponential Smoothing (`ses`)" +sidebar_label: "Single (or Simple) Exponential Smoothing (`ses`)" +custom_edit_url: https://github.com/netdata/netdata/edit/master/src/web/api/queries/ses/README.md +learn_status: "Published" +learn_topic_type: "References" +learn_rel_path: "Developers/Web/Api/Queries" +--> + +# Single (or Simple) Exponential Smoothing (`ses`) + +> This query is also available as `ema` and `ewma`. + +An exponential moving average (`ema`), also known as an exponentially weighted moving average (`ewma`), +is a first-order infinite impulse response filter that applies weighting factors which decrease +exponentially. The weighting for each older datum decreases exponentially, never reaching zero. + +In simple terms, this is like an average value, but more recent values are given more weight. + +Netdata automatically adjusts the weight (`alpha`) based on the number of values processed, +using the formula: + +``` +window = min(number of values, 15) +alpha = 2 / (window + 1) +``` + +You can change the cap of `15` by setting this in `netdata.conf`: + +``` +[web] + ses max window = 15 +``` + +## how to use + +Use it in alerts like this: + +``` + alarm: my_alert + on: my_chart +lookup: ses -1m unaligned of my_dimension + warn: $this > 1000 +``` + +`ses` does not change the units. For example, if the chart unit is `requests/sec`, the exponential +moving average is expressed in the same unit. 
+ +It can also be used in APIs and badges as `&group=ses` in the URL. + +## Examples + +Examining last 1 minute `successful` web server responses: + +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=min&after=-60&label=min) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=average&after=-60&label=average&value_color=yellow) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=ses&after=-60&label=single+exponential+smoothing&value_color=orange) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=max&after=-60&label=max) + +## References + +- <https://en.wikipedia.org/wiki/Moving_average#exponential-moving-average> +- <https://en.wikipedia.org/wiki/Exponential_smoothing>. 
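The weighting described in this README can be sketched in a few lines of C. This is a minimal illustration, not the code shipped in `ses.h`: the names `ses_window`, `ses_alpha` and `ses_smooth` are hypothetical, plain `double` stands in for `NETDATA_DOUBLE`, and the window is assumed to be capped at the configured maximum, as `tg_ses_window()` does in this patch.

```c
#include <stddef.h>

/* Window used to derive alpha: the number of values, capped at max_window
 * (assumed default: 15). Hypothetical helper, not part of Netdata. */
static double ses_window(size_t values, size_t max_window) {
    return (double)(values > max_window ? max_window : values);
}

/* alpha = 2 / (window + 1) */
static double ses_alpha(size_t values, size_t max_window) {
    return 2.0 / (ses_window(values, max_window) + 1.0);
}

/* Smooth n samples. The first sample seeds the level, so it passes through
 * unchanged; every later sample is blended in with weight alpha.
 * Returns 0.0 for an empty input. */
static double ses_smooth(const double *samples, size_t n, double alpha) {
    if (n == 0)
        return 0.0;

    double level = samples[0];
    for (size_t i = 0; i < n; i++)
        level = alpha * samples[i] + (1.0 - alpha) * level;

    return level;
}
```

With 60 values and a cap of 15, `ses_alpha(60, 15)` evaluates to `2 / 16`, so each new sample contributes 12.5% of the smoothed level. With only 3 values the window shrinks to 3 and alpha rises to `2 / 4`, so the result follows recent samples more closely.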
+ + diff --git a/src/web/api/queries/ses/ses.c b/src/web/api/queries/ses/ses.c new file mode 100644 index 000000000..39eb445a0 --- /dev/null +++ b/src/web/api/queries/ses/ses.c @@ -0,0 +1,8 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "ses.h" + + +// ---------------------------------------------------------------------------- +// single exponential smoothing + diff --git a/src/web/api/queries/ses/ses.h b/src/web/api/queries/ses/ses.h new file mode 100644 index 000000000..de8645ff0 --- /dev/null +++ b/src/web/api/queries/ses/ses.h @@ -0,0 +1,92 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_QUERIES_SES_H +#define NETDATA_API_QUERIES_SES_H + +#include "../query.h" +#include "../rrdr.h" + +struct tg_ses { + NETDATA_DOUBLE alpha; + NETDATA_DOUBLE alpha_other; + NETDATA_DOUBLE level; + size_t count; +}; + +static size_t tg_ses_max_window_size = 15; + +static inline void tg_ses_init(void) { + long long ret = config_get_number(CONFIG_SECTION_WEB, "ses max tg_des_window", (long long)tg_ses_max_window_size); + if(ret <= 1) { + config_set_number(CONFIG_SECTION_WEB, "ses max tg_des_window", (long long)tg_ses_max_window_size); + } + else { + tg_ses_max_window_size = (size_t) ret; + } +} + +static inline NETDATA_DOUBLE tg_ses_window(RRDR *r, struct tg_ses *g) { + (void)g; + + NETDATA_DOUBLE points; + if(r->view.group == 1) { + // provide a running DES + points = (NETDATA_DOUBLE)r->time_grouping.points_wanted; + } + else { + // provide a SES with flush points + points = (NETDATA_DOUBLE)r->view.group; + } + + return (points > (NETDATA_DOUBLE)tg_ses_max_window_size) ? 
(NETDATA_DOUBLE)tg_ses_max_window_size : points; +} + +static inline void tg_ses_set_alpha(RRDR *r, struct tg_ses *g) { + // https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average + // A commonly used value for alpha is 2 / (N + 1) + g->alpha = 2.0 / (tg_ses_window(r, g) + 1.0); + g->alpha_other = 1.0 - g->alpha; +} + +static inline void tg_ses_create(RRDR *r, const char *options __maybe_unused) { + struct tg_ses *g = (struct tg_ses *)onewayalloc_callocz(r->internal.owa, 1, sizeof(struct tg_ses)); + tg_ses_set_alpha(r, g); + g->level = 0.0; + r->time_grouping.data = g; +} + +// resets when switches dimensions +// so, clear everything to restart +static inline void tg_ses_reset(RRDR *r) { + struct tg_ses *g = (struct tg_ses *)r->time_grouping.data; + g->level = 0.0; + g->count = 0; +} + +static inline void tg_ses_free(RRDR *r) { + onewayalloc_freez(r->internal.owa, r->time_grouping.data); + r->time_grouping.data = NULL; +} + +static inline void tg_ses_add(RRDR *r, NETDATA_DOUBLE value) { + struct tg_ses *g = (struct tg_ses *)r->time_grouping.data; + + if(unlikely(!g->count)) + g->level = value; + + g->level = g->alpha * value + g->alpha_other * g->level; + g->count++; +} + +static inline NETDATA_DOUBLE tg_ses_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct tg_ses *g = (struct tg_ses *)r->time_grouping.data; + + if(unlikely(!g->count || !netdata_double_isnumber(g->level))) { + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + return 0.0; + } + + return g->level; +} + +#endif //NETDATA_API_QUERIES_SES_H diff --git a/src/web/api/queries/stddev/README.md b/src/web/api/queries/stddev/README.md new file mode 100644 index 000000000..76cfee1f1 --- /dev/null +++ b/src/web/api/queries/stddev/README.md @@ -0,0 +1,97 @@ +<!-- +title: "standard deviation (`stddev`)" +sidebar_label: "standard deviation (`stddev`)" +custom_edit_url: https://github.com/netdata/netdata/edit/master/src/web/api/queries/stddev/README.md +learn_status: "Published" 
+learn_topic_type: "References" +learn_rel_path: "Developers/Web/Api/Queries" +--> + +# standard deviation (`stddev`) + +The standard deviation is a measure that is used to quantify the amount of variation or dispersion +of a set of data values. + +A low standard deviation indicates that the data points tend to be close to the mean (also called the +expected value) of the set, while a high standard deviation indicates that the data points are spread +out over a wider range of values. + +## how to use + +Use it in alerts like this: + +``` + alarm: my_alert + on: my_chart +lookup: stddev -1m unaligned of my_dimension + warn: $this > 1000 +``` + +`stddev` does not change the units. For example, if the chart units are `requests/sec`, the standard +deviation is expressed in the same units. + +It can also be used in APIs and badges as `&group=stddev` in the URL. + +## Examples + +Examining the last minute of `successful` web server responses: + +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&dimensions=success&group=min&after=-60&label=min) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&dimensions=success&group=average&after=-60&label=average&value_color=yellow) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&dimensions=success&group=stddev&after=-60&label=standard+deviation&value_color=orange) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&dimensions=success&group=max&after=-60&label=max) + +## References + +Check <https://en.wikipedia.org/wiki/Standard_deviation>. + +--- + +# Coefficient of variation (`cv`) + +> This query is also available as `rsd`. + +The coefficient of variation (`cv`), also known as relative standard deviation (`rsd`), +is a standardized measure of dispersion of a probability distribution or frequency distribution. 
+ +It is defined as the ratio of the **standard deviation** to the **mean**. + +In simple terms, it expresses the variation as a percentage of the mean. So, if the average value of a metric is 1000 +and its standard deviation is 100 (meaning that it typically varies between about 900 and 1100), then `cv` is 10%. + +This is an easy way to check the % variation, without using absolute values. + +For example, you may trigger an alert if your web server requests/sec `cv` is above 20 (`%`) +over the last minute. So if your web server was serving 1000 reqs/sec over the last minute, +it will trigger the alert if it had spikes below 800/sec or above 1200/sec. + +## how to use + +Use it in alerts like this: + +``` + alarm: my_alert + on: my_chart +lookup: cv -1m unaligned of my_dimension + units: % + warn: $this > 20 +``` + +The units reported by `cv` are always `%`. + +It can also be used in APIs and badges as `&group=cv` in the URL. + +## Examples + +Examining the last minute of `successful` web server responses: + +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&dimensions=success&group=min&after=-60&label=min) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&dimensions=success&group=average&after=-60&label=average&value_color=yellow) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&dimensions=success&group=cv&after=-60&label=coefficient+of+variation&value_color=orange&units=pcent) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&dimensions=success&group=max&after=-60&label=max) + +## References + +Check <https://en.wikipedia.org/wiki/Coefficient_of_variation>. 
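The relationship between mean, variance and `cv` can be sketched in a few lines of C. The running update below follows the same Knuth/Welford recurrence that the accompanying `stddev.h` cites; the names `struct welford`, `welford_add`, `welford_mean`, `welford_variance` and `cv_percent` are hypothetical and exist only for this example. The standard deviation is the square root of the variance, which the caller is assumed to compute before calling `cv_percent`.

```c
#include <stddef.h>

// Illustrative only: running mean and sample variance (Welford's method).
// These names are hypothetical, not Netdata functions.
struct welford {
    long count;
    double mean;   // running mean
    double s;      // running sum of squared deviations from the mean
};

static void welford_add(struct welford *w, double value) {
    w->count++;
    double delta = value - w->mean;            // distance from the old mean
    w->mean += delta / (double)w->count;       // update the mean
    w->s += delta * (value - w->mean);         // uses both old and new mean
}

static double welford_mean(const struct welford *w) {
    return (w->count > 0) ? w->mean : 0.0;
}

static double welford_variance(const struct welford *w) {
    // sample variance: divide by (count - 1)
    return (w->count > 1) ? w->s / (double)(w->count - 1) : 0.0;
}

// cv as a percentage: 100 * stddev / |mean|.
// Returning 0 for a zero mean is a choice made for this sketch only.
static double cv_percent(double stddev, double mean) {
    double m = (mean < 0.0) ? -mean : mean;
    return (m > 0.0) ? 100.0 * stddev / m : 0.0;
}
```

For the figures used above, a mean of 1000 and a standard deviation of 100 give `cv_percent(100.0, 1000.0)`, which evaluates to 10.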
+ + diff --git a/src/web/api/queries/stddev/stddev.c b/src/web/api/queries/stddev/stddev.c new file mode 100644 index 000000000..8f5431194 --- /dev/null +++ b/src/web/api/queries/stddev/stddev.c @@ -0,0 +1,61 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "stddev.h" + + +// ---------------------------------------------------------------------------- +// stddev + +/* + * Mean = average + * +NETDATA_DOUBLE grouping_flush_mean(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct grouping_stddev *g = (struct grouping_stddev *)r->grouping.grouping_data; + + NETDATA_DOUBLE value; + + if(unlikely(!g->count)) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + else { + value = mean(g); + + if(!isnormal(value)) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + } + + grouping_reset_stddev(r); + + return value; +} + */ + +/* + * It is not advised to use this version of variance directly + * +NETDATA_DOUBLE grouping_flush_variance(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct grouping_stddev *g = (struct grouping_stddev *)r->grouping.grouping_data; + + NETDATA_DOUBLE value; + + if(unlikely(!g->count)) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + else { + value = variance(g); + + if(!isnormal(value)) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + } + + grouping_reset_stddev(r); + + return value; +} +*/
\ No newline at end of file diff --git a/src/web/api/queries/stddev/stddev.h b/src/web/api/queries/stddev/stddev.h new file mode 100644 index 000000000..f7a1a06c3 --- /dev/null +++ b/src/web/api/queries/stddev/stddev.h @@ -0,0 +1,120 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_QUERIES_STDDEV_H +#define NETDATA_API_QUERIES_STDDEV_H + +#include "../query.h" +#include "../rrdr.h" + +// this implementation comes from: +// https://www.johndcook.com/blog/standard_deviation/ + +struct tg_stddev { + long count; + NETDATA_DOUBLE m_oldM, m_newM, m_oldS, m_newS; +}; + +static inline void tg_stddev_create(RRDR *r, const char *options __maybe_unused) { + r->time_grouping.data = onewayalloc_callocz(r->internal.owa, 1, sizeof(struct tg_stddev)); +} + +// resets when switches dimensions +// so, clear everything to restart +static inline void tg_stddev_reset(RRDR *r) { + struct tg_stddev *g = (struct tg_stddev *)r->time_grouping.data; + g->count = 0; +} + +static inline void tg_stddev_free(RRDR *r) { + onewayalloc_freez(r->internal.owa, r->time_grouping.data); + r->time_grouping.data = NULL; +} + +static inline void tg_stddev_add(RRDR *r, NETDATA_DOUBLE value) { + struct tg_stddev *g = (struct tg_stddev *)r->time_grouping.data; + + g->count++; + + // See Knuth TAOCP vol 2, 3rd edition, page 232 + if (g->count == 1) { + g->m_oldM = g->m_newM = value; + g->m_oldS = 0.0; + } + else { + g->m_newM = g->m_oldM + (value - g->m_oldM) / g->count; + g->m_newS = g->m_oldS + (value - g->m_oldM) * (value - g->m_newM); + + // set up for next iteration + g->m_oldM = g->m_newM; + g->m_oldS = g->m_newS; + } +} + +static inline NETDATA_DOUBLE tg_stddev_mean(struct tg_stddev *g) { + return (g->count > 0) ? g->m_newM : 0.0; +} + +static inline NETDATA_DOUBLE tg_stddev_variance(struct tg_stddev *g) { + return ( (g->count > 1) ? 
g->m_newS/(NETDATA_DOUBLE)(g->count - 1) : 0.0 ); +} +static inline NETDATA_DOUBLE tg_stddev_stddev(struct tg_stddev *g) { + return sqrtndd(tg_stddev_variance(g)); +} + +static inline NETDATA_DOUBLE tg_stddev_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct tg_stddev *g = (struct tg_stddev *)r->time_grouping.data; + + NETDATA_DOUBLE value; + + if(likely(g->count > 1)) { + value = tg_stddev_stddev(g); + + if(!netdata_double_isnumber(value)) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + } + else if(g->count == 1) { + value = 0.0; + } + else { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + + tg_stddev_reset(r); + + return value; +} + +// https://en.wikipedia.org/wiki/Coefficient_of_variation +static inline NETDATA_DOUBLE tg_stddev_coefficient_of_variation_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct tg_stddev *g = (struct tg_stddev *)r->time_grouping.data; + + NETDATA_DOUBLE value; + + if(likely(g->count > 1)) { + NETDATA_DOUBLE m = tg_stddev_mean(g); + value = 100.0 * tg_stddev_stddev(g) / ((m < 0)? 
-m : m); + + if(unlikely(!netdata_double_isnumber(value))) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + } + else if(g->count == 1) { + // one value collected + value = 0.0; + } + else { + // no values collected + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + + tg_stddev_reset(r); + + return value; +} + +#endif //NETDATA_API_QUERIES_STDDEV_H diff --git a/src/web/api/queries/sum/README.md b/src/web/api/queries/sum/README.md new file mode 100644 index 000000000..dd29b9c5b --- /dev/null +++ b/src/web/api/queries/sum/README.md @@ -0,0 +1,45 @@ +<!-- +title: "Sum" +sidebar_label: "Sum" +custom_edit_url: https://github.com/netdata/netdata/edit/master/src/web/api/queries/sum/README.md +learn_status: "Published" +learn_topic_type: "References" +learn_rel_path: "Developers/Web/Api/Queries" +--> + +# Sum + +This module sums all the values in the time-frame requested. + +You can use `sum` to find the volume of something over a period. + +## how to use + +Use it in alarms like this: + +``` + alarm: my_alarm + on: my_chart +lookup: sum -1m unaligned of my_dimension + warn: $this > 1000 +``` + +`sum` does not change the units. For example, if the chart units is `requests/sec`, the result +will be again expressed in the same units. + +It can also be used in APIs and badges as `&group=sum` in the URL. 
+ +## Examples + +Examining last 1 minute `successful` web server responses: + +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=min&after=-60&label=min) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=average&after=-60&label=average) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=max&after=-60&label=max) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=sum&after=-60&label=1m+sum&value_color=orange&units=requests) + +## References + +- <https://en.wikipedia.org/wiki/Summation>. + + diff --git a/src/web/api/queries/sum/sum.c b/src/web/api/queries/sum/sum.c new file mode 100644 index 000000000..cf4484217 --- /dev/null +++ b/src/web/api/queries/sum/sum.c @@ -0,0 +1,9 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "sum.h" + +// ---------------------------------------------------------------------------- +// sum + + + diff --git a/src/web/api/queries/sum/sum.h b/src/web/api/queries/sum/sum.h new file mode 100644 index 000000000..5e07f45d6 --- /dev/null +++ b/src/web/api/queries/sum/sum.h @@ -0,0 +1,56 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_QUERY_SUM_H +#define NETDATA_API_QUERY_SUM_H + +#include "../query.h" +#include "../rrdr.h" + +struct tg_sum { + NETDATA_DOUBLE sum; + size_t count; +}; + +static inline void tg_sum_create(RRDR *r, const char *options __maybe_unused) { + r->time_grouping.data = onewayalloc_callocz(r->internal.owa, 1, sizeof(struct tg_sum)); +} + +// resets when switches dimensions +// so, clear everything to restart +static inline void tg_sum_reset(RRDR *r) { + struct tg_sum *g = (struct tg_sum *)r->time_grouping.data; + g->sum = 0; + g->count = 0; +} + +static 
inline void tg_sum_free(RRDR *r) { + onewayalloc_freez(r->internal.owa, r->time_grouping.data); + r->time_grouping.data = NULL; +} + +static inline void tg_sum_add(RRDR *r, NETDATA_DOUBLE value) { + struct tg_sum *g = (struct tg_sum *)r->time_grouping.data; + g->sum += value; + g->count++; +} + +static inline NETDATA_DOUBLE tg_sum_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct tg_sum *g = (struct tg_sum *)r->time_grouping.data; + + NETDATA_DOUBLE value; + + if(unlikely(!g->count)) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + else { + value = g->sum; + } + + g->sum = 0.0; + g->count = 0; + + return value; +} + +#endif //NETDATA_API_QUERY_SUM_H diff --git a/src/web/api/queries/trimmed_mean/README.md b/src/web/api/queries/trimmed_mean/README.md new file mode 100644 index 000000000..969023292 --- /dev/null +++ b/src/web/api/queries/trimmed_mean/README.md @@ -0,0 +1,60 @@ +<!-- +title: "Trimmed Mean" +sidebar_label: "Trimmed Mean" +description: "Use trimmed-mean in API queries and health entities to find the average value from a sample, eliminating any unwanted spikes in the returned metrics." +custom_edit_url: https://github.com/netdata/netdata/edit/master/src/web/api/queries/trimmed_mean/README.md +learn_status: "Published" +learn_topic_type: "References" +learn_rel_path: "Developers/Web/Api/Queries" +--> + +# Trimmed Mean + +The trimmed mean is the average value of a series excluding the smallest and biggest points. + +Netdata applies linear interpolation on the last point, if the percentage requested to be excluded does not give a +round number of points. + +The following percentile aliases are defined: + +- `trimmed-mean1` +- `trimmed-mean2` +- `trimmed-mean3` +- `trimmed-mean5` +- `trimmed-mean10` +- `trimmed-mean15` +- `trimmed-mean20` +- `trimmed-mean25` + +The default `trimmed-mean` is an alias for `trimmed-mean5`. +Any percentage may be requested using the `group_options` query parameter. 
+ +## how to use + +Use it in alerts like this: + +``` + alarm: my_alert + on: my_chart +lookup: trimmed-mean5 -1m unaligned of my_dimension + warn: $this > 1000 +``` + +`trimmed-mean` does not change the units. For example, if the chart units are `requests/sec`, the result +is expressed in the same units. + +It can also be used in APIs and badges as `&group=trimmed-mean` in the URL, and the additional parameter `group_options` +may be used to request any percentage (e.g. `&group=trimmed-mean&group_options=29`). + +## Examples + +Examining the last minute of `successful` web server responses: + +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=min&after=-60&label=min) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=average&after=-60&label=average) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=trimmed-mean5&after=-60&label=trimmed-mean5&value_color=orange) +- ![](https://registry.my-netdata.io/api/v1/badge.svg?chart=web_log_nginx.response_statuses&options=unaligned&dimensions=success&group=max&after=-60&label=max) + +## References + +- <https://en.wikipedia.org/wiki/Truncated_mean>. 
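To make the idea concrete, here is a deliberately simplified sketch of a trimmed mean in C. It drops a whole number of points from each end of the sorted series and averages the rest; it does not reproduce the linear interpolation of the boundary point that Netdata applies when the percentage does not give a round number of points. The names `cmp_double` and `trimmed_mean_simple` are hypothetical and not part of Netdata.

```c
#include <stddef.h>
#include <stdlib.h>

// qsort comparator for doubles (ascending)
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

// Illustrative only: average of `values` after discarding `percent` percent
// of the points from EACH end. Sorts `values` in place.
// No interpolation is done, unlike the real implementation.
static double trimmed_mean_simple(double *values, size_t n, double percent) {
    if (n == 0)
        return 0.0;

    // clamp the percentage to the meaningful range
    if (percent < 0.0)  percent = 0.0;
    if (percent > 50.0) percent = 50.0;

    qsort(values, n, sizeof(double), cmp_double);

    // whole points to drop from each side
    size_t drop = (size_t)((double)n * percent / 100.0);
    if (2 * drop >= n)
        drop = (n - 1) / 2;               // always keep at least one point

    double sum = 0.0;
    for (size_t i = drop; i < n - drop; i++)
        sum += values[i];

    return sum / (double)(n - 2 * drop);
}
```

For the series `1, 2, 3, 4, 100` and 20%, one point is dropped from each end, so the spike at 100 no longer influences the result, which becomes the average of `2, 3, 4`.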
diff --git a/src/web/api/queries/trimmed_mean/trimmed_mean.c b/src/web/api/queries/trimmed_mean/trimmed_mean.c new file mode 100644 index 000000000..c50db7ed6 --- /dev/null +++ b/src/web/api/queries/trimmed_mean/trimmed_mean.c @@ -0,0 +1,7 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "trimmed_mean.h" + +// ---------------------------------------------------------------------------- +// median + diff --git a/src/web/api/queries/trimmed_mean/trimmed_mean.h b/src/web/api/queries/trimmed_mean/trimmed_mean.h new file mode 100644 index 000000000..3c09015bf --- /dev/null +++ b/src/web/api/queries/trimmed_mean/trimmed_mean.h @@ -0,0 +1,169 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_QUERIES_TRIMMED_MEAN_H +#define NETDATA_API_QUERIES_TRIMMED_MEAN_H + +#include "../query.h" +#include "../rrdr.h" + +struct tg_trimmed_mean { + size_t series_size; + size_t next_pos; + NETDATA_DOUBLE percent; + + NETDATA_DOUBLE *series; +}; + +static inline void tg_trimmed_mean_create_internal(RRDR *r, const char *options, NETDATA_DOUBLE def) { + long entries = r->view.group; + if(entries < 10) entries = 10; + + struct tg_trimmed_mean *g = (struct tg_trimmed_mean *)onewayalloc_callocz(r->internal.owa, 1, sizeof(struct tg_trimmed_mean)); + g->series = onewayalloc_mallocz(r->internal.owa, entries * sizeof(NETDATA_DOUBLE)); + g->series_size = (size_t)entries; + + g->percent = def; + if(options && *options) { + g->percent = str2ndd(options, NULL); + if(!netdata_double_isnumber(g->percent)) g->percent = 0.0; + if(g->percent < 0.0) g->percent = 0.0; + if(g->percent > 50.0) g->percent = 50.0; + } + + g->percent = 1.0 - ((g->percent / 100.0) * 2.0); + r->time_grouping.data = g; +} + +static inline void tg_trimmed_mean_create_1(RRDR *r, const char *options) { + tg_trimmed_mean_create_internal(r, options, 1.0); +} +static inline void tg_trimmed_mean_create_2(RRDR *r, const char *options) { + tg_trimmed_mean_create_internal(r, options, 2.0); +} +static inline 
void tg_trimmed_mean_create_3(RRDR *r, const char *options) { + tg_trimmed_mean_create_internal(r, options, 3.0); +} +static inline void tg_trimmed_mean_create_5(RRDR *r, const char *options) { + tg_trimmed_mean_create_internal(r, options, 5.0); +} +static inline void tg_trimmed_mean_create_10(RRDR *r, const char *options) { + tg_trimmed_mean_create_internal(r, options, 10.0); +} +static inline void tg_trimmed_mean_create_15(RRDR *r, const char *options) { + tg_trimmed_mean_create_internal(r, options, 15.0); +} +static inline void tg_trimmed_mean_create_20(RRDR *r, const char *options) { + tg_trimmed_mean_create_internal(r, options, 20.0); +} +static inline void tg_trimmed_mean_create_25(RRDR *r, const char *options) { + tg_trimmed_mean_create_internal(r, options, 25.0); +} + +// resets when switches dimensions +// so, clear everything to restart +static inline void tg_trimmed_mean_reset(RRDR *r) { + struct tg_trimmed_mean *g = (struct tg_trimmed_mean *)r->time_grouping.data; + g->next_pos = 0; +} + +static inline void tg_trimmed_mean_free(RRDR *r) { + struct tg_trimmed_mean *g = (struct tg_trimmed_mean *)r->time_grouping.data; + if(g) onewayalloc_freez(r->internal.owa, g->series); + + onewayalloc_freez(r->internal.owa, r->time_grouping.data); + r->time_grouping.data = NULL; +} + +static inline void tg_trimmed_mean_add(RRDR *r, NETDATA_DOUBLE value) { + struct tg_trimmed_mean *g = (struct tg_trimmed_mean *)r->time_grouping.data; + + if(unlikely(g->next_pos >= g->series_size)) { + g->series = onewayalloc_doublesize( r->internal.owa, g->series, g->series_size * sizeof(NETDATA_DOUBLE)); + g->series_size *= 2; + } + + g->series[g->next_pos++] = value; +} + +static inline NETDATA_DOUBLE tg_trimmed_mean_flush(RRDR *r, RRDR_VALUE_FLAGS *rrdr_value_options_ptr) { + struct tg_trimmed_mean *g = (struct tg_trimmed_mean *)r->time_grouping.data; + + NETDATA_DOUBLE value; + size_t available_slots = g->next_pos; + + if(unlikely(!available_slots)) { + value = 0.0; + 
*rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + else if(available_slots == 1) { + value = g->series[0]; + } + else { + sort_series(g->series, available_slots); + + NETDATA_DOUBLE min = g->series[0]; + NETDATA_DOUBLE max = g->series[available_slots - 1]; + + if (min != max) { + size_t slots_to_use = (size_t)((NETDATA_DOUBLE)available_slots * g->percent); + if(!slots_to_use) slots_to_use = 1; + + NETDATA_DOUBLE percent_to_use = (NETDATA_DOUBLE)slots_to_use / (NETDATA_DOUBLE)available_slots; + NETDATA_DOUBLE percent_delta = g->percent - percent_to_use; + + NETDATA_DOUBLE percent_interpolation_slot = 0.0; + NETDATA_DOUBLE percent_last_slot = 0.0; + if(percent_delta > 0.0) { + NETDATA_DOUBLE percent_to_use_plus_1_slot = (NETDATA_DOUBLE)(slots_to_use + 1) / (NETDATA_DOUBLE)available_slots; + NETDATA_DOUBLE percent_1slot = percent_to_use_plus_1_slot - percent_to_use; + + percent_interpolation_slot = percent_delta / percent_1slot; + percent_last_slot = 1 - percent_interpolation_slot; + } + + int start_slot, stop_slot, step, last_slot, interpolation_slot; + if(min >= 0.0 && max >= 0.0) { + start_slot = (int)((available_slots - slots_to_use) / 2); + stop_slot = start_slot + (int)slots_to_use; + last_slot = stop_slot - 1; + interpolation_slot = stop_slot; + step = 1; + } + else { + start_slot = (int)available_slots - 1 - (int)((available_slots - slots_to_use) / 2); + stop_slot = start_slot - (int)slots_to_use; + last_slot = stop_slot + 1; + interpolation_slot = stop_slot; + step = -1; + } + + value = 0.0; + for(int slot = start_slot; slot != stop_slot ; slot += step) + value += g->series[slot]; + + size_t counted = slots_to_use; + if(percent_interpolation_slot > 0.0 && interpolation_slot >= 0 && interpolation_slot < (int)available_slots) { + value += g->series[interpolation_slot] * percent_interpolation_slot; + value += g->series[last_slot] * percent_last_slot; + counted++; + } + + value = value / (NETDATA_DOUBLE)counted; + } + else + value = min; + } + + 
if(unlikely(!netdata_double_isnumber(value))) { + value = 0.0; + *rrdr_value_options_ptr |= RRDR_VALUE_EMPTY; + } + + //log_series_to_stderr(g->series, g->next_pos, value, "trimmed_mean"); + + g->next_pos = 0; + + return value; +} + +#endif //NETDATA_API_QUERIES_TRIMMED_MEAN_H diff --git a/src/web/api/queries/weights.c b/src/web/api/queries/weights.c new file mode 100644 index 000000000..44928fea8 --- /dev/null +++ b/src/web/api/queries/weights.c @@ -0,0 +1,2105 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#include "daemon/common.h" +#include "database/KolmogorovSmirnovDist.h" + +#define MAX_POINTS 10000 +int enable_metric_correlations = CONFIG_BOOLEAN_YES; +int metric_correlations_version = 1; +WEIGHTS_METHOD default_metric_correlations_method = WEIGHTS_METHOD_MC_KS2; + +typedef struct weights_stats { + NETDATA_DOUBLE max_base_high_ratio; + size_t db_points; + size_t result_points; + size_t db_queries; + size_t db_points_per_tier[RRD_STORAGE_TIERS]; + size_t binary_searches; +} WEIGHTS_STATS; + +// ---------------------------------------------------------------------------- +// parse and render metric correlations methods + +static struct { + const char *name; + WEIGHTS_METHOD value; +} weights_methods[] = { + { "ks2" , WEIGHTS_METHOD_MC_KS2} + , { "volume" , WEIGHTS_METHOD_MC_VOLUME} + , { "anomaly-rate" , WEIGHTS_METHOD_ANOMALY_RATE} + , { "value" , WEIGHTS_METHOD_VALUE} + , { NULL , 0 } +}; + +WEIGHTS_METHOD weights_string_to_method(const char *method) { + for(int i = 0; weights_methods[i].name ;i++) + if(strcmp(method, weights_methods[i].name) == 0) + return weights_methods[i].value; + + return default_metric_correlations_method; +} + +const char *weights_method_to_string(WEIGHTS_METHOD method) { + for(int i = 0; weights_methods[i].name ;i++) + if(weights_methods[i].value == method) + return weights_methods[i].name; + + return "unknown"; +} + +// ---------------------------------------------------------------------------- +// The results per dimension 
are aggregated into a dictionary + +typedef enum { + RESULT_IS_BASE_HIGH_RATIO = (1 << 0), + RESULT_IS_PERCENTAGE_OF_TIME = (1 << 1), +} RESULT_FLAGS; + +struct register_result { + RESULT_FLAGS flags; + RRDHOST *host; + RRDCONTEXT_ACQUIRED *rca; + RRDINSTANCE_ACQUIRED *ria; + RRDMETRIC_ACQUIRED *rma; + NETDATA_DOUBLE value; + STORAGE_POINT highlighted; + STORAGE_POINT baseline; + usec_t duration_ut; +}; + +static DICTIONARY *register_result_init() { + DICTIONARY *results = dictionary_create_advanced(DICT_OPTION_SINGLE_THREADED | DICT_OPTION_FIXED_SIZE, NULL, sizeof(struct register_result)); + return results; +} + +static void register_result_destroy(DICTIONARY *results) { + dictionary_destroy(results); +} + +static void register_result(DICTIONARY *results, RRDHOST *host, RRDCONTEXT_ACQUIRED *rca, RRDINSTANCE_ACQUIRED *ria, + RRDMETRIC_ACQUIRED *rma, NETDATA_DOUBLE value, RESULT_FLAGS flags, + STORAGE_POINT *highlighted, STORAGE_POINT *baseline, WEIGHTS_STATS *stats, + bool register_zero, usec_t duration_ut) { + + if(!netdata_double_isnumber(value)) return; + + // make it positive + NETDATA_DOUBLE v = fabsndd(value); + + // no need to store zero scored values + if(unlikely(fpclassify(v) == FP_ZERO && !register_zero)) + return; + + // keep track of the max of the baseline / highlight ratio + if((flags & RESULT_IS_BASE_HIGH_RATIO) && v > stats->max_base_high_ratio) + stats->max_base_high_ratio = v; + + struct register_result t = { + .flags = flags, + .host = host, + .rca = rca, + .ria = ria, + .rma = rma, + .value = v, + .duration_ut = duration_ut, + }; + + if(highlighted) + t.highlighted = *highlighted; + + if(baseline) + t.baseline = *baseline; + + // we can use the pointer address or RMA as a unique key for each metric + char buf[20 + 1]; + ssize_t len = snprintfz(buf, sizeof(buf) - 1, "%p", rma); + dictionary_set_advanced(results, buf, len, &t, sizeof(struct register_result), NULL); +} + +// 
---------------------------------------------------------------------------- +// Generation of JSON output for the results + +static void results_header_to_json(DICTIONARY *results __maybe_unused, BUFFER *wb, + time_t after, time_t before, + time_t baseline_after, time_t baseline_before, + size_t points, WEIGHTS_METHOD method, + RRDR_TIME_GROUPING group, RRDR_OPTIONS options, uint32_t shifts, + size_t examined_dimensions __maybe_unused, usec_t duration, + WEIGHTS_STATS *stats) { + + buffer_json_member_add_time_t(wb, "after", after); + buffer_json_member_add_time_t(wb, "before", before); + buffer_json_member_add_time_t(wb, "duration", before - after); + buffer_json_member_add_uint64(wb, "points", points); + + if(method == WEIGHTS_METHOD_MC_KS2 || method == WEIGHTS_METHOD_MC_VOLUME) { + buffer_json_member_add_time_t(wb, "baseline_after", baseline_after); + buffer_json_member_add_time_t(wb, "baseline_before", baseline_before); + buffer_json_member_add_time_t(wb, "baseline_duration", baseline_before - baseline_after); + buffer_json_member_add_uint64(wb, "baseline_points", points << shifts); + } + + buffer_json_member_add_object(wb, "statistics"); + { + buffer_json_member_add_double(wb, "query_time_ms", (double) duration / (double) USEC_PER_MS); + buffer_json_member_add_uint64(wb, "db_queries", stats->db_queries); + buffer_json_member_add_uint64(wb, "query_result_points", stats->result_points); + buffer_json_member_add_uint64(wb, "binary_searches", stats->binary_searches); + buffer_json_member_add_uint64(wb, "db_points_read", stats->db_points); + + buffer_json_member_add_array(wb, "db_points_per_tier"); + { + for (size_t tier = 0; tier < storage_tiers; tier++) + buffer_json_add_array_item_uint64(wb, stats->db_points_per_tier[tier]); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + + buffer_json_member_add_string(wb, "group", time_grouping_tostring(group)); + buffer_json_member_add_string(wb, "method", weights_method_to_string(method)); + 
rrdr_options_to_buffer_json_array(wb, "options", options); +} + +static size_t registered_results_to_json_charts(DICTIONARY *results, BUFFER *wb, + time_t after, time_t before, + time_t baseline_after, time_t baseline_before, + size_t points, WEIGHTS_METHOD method, + RRDR_TIME_GROUPING group, RRDR_OPTIONS options, uint32_t shifts, + size_t examined_dimensions, usec_t duration, + WEIGHTS_STATS *stats) { + + buffer_json_initialize(wb, "\"", "\"", 0, true, (options & RRDR_OPTION_MINIFY) ? BUFFER_JSON_OPTIONS_MINIFY : BUFFER_JSON_OPTIONS_DEFAULT); + + results_header_to_json(results, wb, after, before, baseline_after, baseline_before, + points, method, group, options, shifts, examined_dimensions, duration, stats); + + buffer_json_member_add_object(wb, "correlated_charts"); + + size_t charts = 0, total_dimensions = 0; + struct register_result *t; + RRDINSTANCE_ACQUIRED *last_ria = NULL; // never access this - we use it only for comparison + dfe_start_read(results, t) { + if(t->ria != last_ria) { + last_ria = t->ria; + + if(charts) { + buffer_json_object_close(wb); // dimensions + buffer_json_object_close(wb); // chart:id + } + + buffer_json_member_add_object(wb, rrdinstance_acquired_id(t->ria)); + buffer_json_member_add_string(wb, "context", rrdcontext_acquired_id(t->rca)); + buffer_json_member_add_object(wb, "dimensions"); + charts++; + } + buffer_json_member_add_double(wb, rrdmetric_acquired_name(t->rma), t->value); + total_dimensions++; + } + dfe_done(t); + + // close dimensions and chart + if (total_dimensions) { + buffer_json_object_close(wb); // dimensions + buffer_json_object_close(wb); // chart:id + } + + buffer_json_object_close(wb); + + buffer_json_member_add_uint64(wb, "correlated_dimensions", total_dimensions); + buffer_json_member_add_uint64(wb, "total_dimensions_count", examined_dimensions); + buffer_json_finalize(wb); + + return total_dimensions; +} + +static size_t registered_results_to_json_contexts(DICTIONARY *results, BUFFER *wb, + time_t after, time_t 
before, + time_t baseline_after, time_t baseline_before, + size_t points, WEIGHTS_METHOD method, + RRDR_TIME_GROUPING group, RRDR_OPTIONS options, uint32_t shifts, + size_t examined_dimensions, usec_t duration, + WEIGHTS_STATS *stats) { + + buffer_json_initialize(wb, "\"", "\"", 0, true, (options & RRDR_OPTION_MINIFY) ? BUFFER_JSON_OPTIONS_MINIFY : BUFFER_JSON_OPTIONS_DEFAULT); + + results_header_to_json(results, wb, after, before, baseline_after, baseline_before, + points, method, group, options, shifts, examined_dimensions, duration, stats); + + buffer_json_member_add_object(wb, "contexts"); + + size_t contexts = 0, charts = 0, total_dimensions = 0, context_dims = 0, chart_dims = 0; + NETDATA_DOUBLE contexts_total_weight = 0.0, charts_total_weight = 0.0; + struct register_result *t; + RRDCONTEXT_ACQUIRED *last_rca = NULL; + RRDINSTANCE_ACQUIRED *last_ria = NULL; + dfe_start_read(results, t) { + + if(t->rca != last_rca) { + last_rca = t->rca; + + if(contexts) { + buffer_json_object_close(wb); // dimensions + buffer_json_member_add_double(wb, "weight", charts_total_weight / (double) chart_dims); + buffer_json_object_close(wb); // chart:id + buffer_json_object_close(wb); // charts + buffer_json_member_add_double(wb, "weight", contexts_total_weight / (double) context_dims); + buffer_json_object_close(wb); // context + } + + buffer_json_member_add_object(wb, rrdcontext_acquired_id(t->rca)); + buffer_json_member_add_object(wb, "charts"); + + contexts++; + charts = 0; + context_dims = 0; + contexts_total_weight = 0.0; + + last_ria = NULL; + } + + if(t->ria != last_ria) { + last_ria = t->ria; + + if(charts) { + buffer_json_object_close(wb); // dimensions + buffer_json_member_add_double(wb, "weight", charts_total_weight / (double) chart_dims); + buffer_json_object_close(wb); // chart:id + } + + buffer_json_member_add_object(wb, rrdinstance_acquired_id(t->ria)); + buffer_json_member_add_object(wb, "dimensions"); + + charts++; + chart_dims = 0; + charts_total_weight = 0.0; 
+ } + + buffer_json_member_add_double(wb, rrdmetric_acquired_name(t->rma), t->value); + charts_total_weight += t->value; + contexts_total_weight += t->value; + chart_dims++; + context_dims++; + total_dimensions++; + } + dfe_done(t); + + // close dimensions and chart + if (total_dimensions) { + buffer_json_object_close(wb); // dimensions + buffer_json_member_add_double(wb, "weight", charts_total_weight / (double) chart_dims); + buffer_json_object_close(wb); // chart:id + buffer_json_object_close(wb); // charts + buffer_json_member_add_double(wb, "weight", contexts_total_weight / (double) context_dims); + buffer_json_object_close(wb); // context + } + + buffer_json_object_close(wb); + + buffer_json_member_add_uint64(wb, "correlated_dimensions", total_dimensions); + buffer_json_member_add_uint64(wb, "total_dimensions_count", examined_dimensions); + buffer_json_finalize(wb); + + return total_dimensions; +} + +struct query_weights_data { + QUERY_WEIGHTS_REQUEST *qwr; + + SIMPLE_PATTERN *scope_nodes_sp; + SIMPLE_PATTERN *scope_contexts_sp; + SIMPLE_PATTERN *nodes_sp; + SIMPLE_PATTERN *contexts_sp; + SIMPLE_PATTERN *instances_sp; + SIMPLE_PATTERN *dimensions_sp; + SIMPLE_PATTERN *labels_sp; + SIMPLE_PATTERN *alerts_sp; + + usec_t timeout_us; + bool timed_out; + bool interrupted; + + struct query_timings timings; + + size_t examined_dimensions; + bool register_zero; + + DICTIONARY *results; + WEIGHTS_STATS stats; + + uint32_t shifts; + + struct query_versions versions; +}; + +#define AGGREGATED_WEIGHT_EMPTY (struct aggregated_weight) { \ + .min = NAN, \ + .max = NAN, \ + .sum = NAN, \ + .count = 0, \ + .hsp = STORAGE_POINT_UNSET, \ + .bsp = STORAGE_POINT_UNSET, \ +} + +#define merge_into_aw(aw, t) do { \ + if(!(aw).count) { \ + (aw).count = 1; \ + (aw).min = (aw).max = (aw).sum = (t)->value; \ + (aw).hsp = (t)->highlighted; \ + if(baseline) \ + (aw).bsp = (t)->baseline; \ + } \ + else { \ + (aw).count++; \ + (aw).sum += (t)->value; \ + if((t)->value < (aw).min) \ + 
(aw).min = (t)->value; \ + if((t)->value > (aw).max) \ + (aw).max = (t)->value; \ + storage_point_merge_to((aw).hsp, (t)->highlighted); \ + if(baseline) \ + storage_point_merge_to((aw).bsp, (t)->baseline); \ + } \ +} while(0) + +static void results_header_to_json_v2(DICTIONARY *results __maybe_unused, BUFFER *wb, struct query_weights_data *qwd, + time_t after, time_t before, + time_t baseline_after, time_t baseline_before, + size_t points, WEIGHTS_METHOD method, + RRDR_TIME_GROUPING group, RRDR_OPTIONS options, uint32_t shifts, + size_t examined_dimensions __maybe_unused, usec_t duration __maybe_unused, + WEIGHTS_STATS *stats, bool group_by) { + + buffer_json_member_add_object(wb, "request"); + buffer_json_member_add_string(wb, "method", weights_method_to_string(method)); + rrdr_options_to_buffer_json_array(wb, "options", options); + + buffer_json_member_add_object(wb, "scope"); + buffer_json_member_add_string(wb, "scope_nodes", qwd->qwr->scope_nodes ? qwd->qwr->scope_nodes : "*"); + buffer_json_member_add_string(wb, "scope_contexts", qwd->qwr->scope_contexts ? qwd->qwr->scope_contexts : "*"); + buffer_json_object_close(wb); + + buffer_json_member_add_object(wb, "selectors"); + buffer_json_member_add_string(wb, "nodes", qwd->qwr->nodes ? qwd->qwr->nodes : "*"); + buffer_json_member_add_string(wb, "contexts", qwd->qwr->contexts ? qwd->qwr->contexts : "*"); + buffer_json_member_add_string(wb, "instances", qwd->qwr->instances ? qwd->qwr->instances : "*"); + buffer_json_member_add_string(wb, "dimensions", qwd->qwr->dimensions ? qwd->qwr->dimensions : "*"); + buffer_json_member_add_string(wb, "labels", qwd->qwr->labels ? qwd->qwr->labels : "*"); + buffer_json_member_add_string(wb, "alerts", qwd->qwr->alerts ? 
qwd->qwr->alerts : "*"); + buffer_json_object_close(wb); + + buffer_json_member_add_object(wb, "window"); + buffer_json_member_add_time_t(wb, "after", qwd->qwr->after); + buffer_json_member_add_time_t(wb, "before", qwd->qwr->before); + buffer_json_member_add_uint64(wb, "points", qwd->qwr->points); + if(qwd->qwr->options & RRDR_OPTION_SELECTED_TIER) + buffer_json_member_add_uint64(wb, "tier", qwd->qwr->tier); + else + buffer_json_member_add_string(wb, "tier", NULL); + buffer_json_object_close(wb); + + if(method == WEIGHTS_METHOD_MC_KS2 || method == WEIGHTS_METHOD_MC_VOLUME) { + buffer_json_member_add_object(wb, "baseline"); + buffer_json_member_add_time_t(wb, "baseline_after", qwd->qwr->baseline_after); + buffer_json_member_add_time_t(wb, "baseline_before", qwd->qwr->baseline_before); + buffer_json_object_close(wb); + } + + buffer_json_member_add_object(wb, "aggregations"); + buffer_json_member_add_object(wb, "time"); + buffer_json_member_add_string(wb, "time_group", time_grouping_tostring(qwd->qwr->time_group_method)); + buffer_json_member_add_string(wb, "time_group_options", qwd->qwr->time_group_options); + buffer_json_object_close(wb); // time + + buffer_json_member_add_array(wb, "metrics"); + buffer_json_add_array_item_object(wb); + { + buffer_json_member_add_array(wb, "group_by"); + buffer_json_group_by_to_array(wb, qwd->qwr->group_by.group_by); + buffer_json_array_close(wb); + +// buffer_json_member_add_array(wb, "group_by_label"); +// buffer_json_array_close(wb); + + buffer_json_member_add_string(wb, "aggregation", group_by_aggregate_function_to_string(qwd->qwr->group_by.aggregation)); + } + buffer_json_object_close(wb); // 1st group by + buffer_json_array_close(wb); // array + buffer_json_object_close(wb); // aggregations + + buffer_json_member_add_uint64(wb, "timeout", qwd->qwr->timeout_ms); + buffer_json_object_close(wb); // request + + buffer_json_member_add_object(wb, "view"); + buffer_json_member_add_string(wb, "format", (group_by)?"grouped":"full"); + 
buffer_json_member_add_string(wb, "time_group", time_grouping_tostring(group)); + + buffer_json_member_add_object(wb, "window"); + buffer_json_member_add_time_t(wb, "after", after); + buffer_json_member_add_time_t(wb, "before", before); + buffer_json_member_add_time_t(wb, "duration", before - after); + buffer_json_member_add_uint64(wb, "points", points); + buffer_json_object_close(wb); + + if(method == WEIGHTS_METHOD_MC_KS2 || method == WEIGHTS_METHOD_MC_VOLUME) { + buffer_json_member_add_object(wb, "baseline"); + buffer_json_member_add_time_t(wb, "after", baseline_after); + buffer_json_member_add_time_t(wb, "before", baseline_before); + buffer_json_member_add_time_t(wb, "duration", baseline_before - baseline_after); + buffer_json_member_add_uint64(wb, "points", points << shifts); + buffer_json_object_close(wb); + } + + buffer_json_object_close(wb); // view + + buffer_json_member_add_object(wb, "db"); + { + buffer_json_member_add_uint64(wb, "db_queries", stats->db_queries); + buffer_json_member_add_uint64(wb, "query_result_points", stats->result_points); + buffer_json_member_add_uint64(wb, "binary_searches", stats->binary_searches); + buffer_json_member_add_uint64(wb, "db_points_read", stats->db_points); + + buffer_json_member_add_array(wb, "db_points_per_tier"); + { + for (size_t tier = 0; tier < storage_tiers; tier++) + buffer_json_add_array_item_uint64(wb, stats->db_points_per_tier[tier]); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); // db +} + +typedef enum { + WPT_DIMENSION = 0, + WPT_INSTANCE = 1, + WPT_CONTEXT = 2, + WPT_NODE = 3, + WPT_GROUP = 4, +} WEIGHTS_POINT_TYPE; + +struct aggregated_weight { + const char *name; + NETDATA_DOUBLE min; + NETDATA_DOUBLE max; + NETDATA_DOUBLE sum; + size_t count; + STORAGE_POINT hsp; + STORAGE_POINT bsp; +}; + +static inline void storage_point_to_json(BUFFER *wb, WEIGHTS_POINT_TYPE type, ssize_t di, ssize_t ii, ssize_t ci, ssize_t ni, struct aggregated_weight *aw, RRDR_OPTIONS options 
__maybe_unused, bool baseline) { + if(type != WPT_GROUP) { + buffer_json_add_array_item_array(wb); + buffer_json_add_array_item_uint64(wb, type); // "type" + buffer_json_add_array_item_int64(wb, ni); + if (type != WPT_NODE) { + buffer_json_add_array_item_int64(wb, ci); + if (type != WPT_CONTEXT) { + buffer_json_add_array_item_int64(wb, ii); + if (type != WPT_INSTANCE) + buffer_json_add_array_item_int64(wb, di); + else + buffer_json_add_array_item_string(wb, NULL); + } + else { + buffer_json_add_array_item_string(wb, NULL); + buffer_json_add_array_item_string(wb, NULL); + } + } + else { + buffer_json_add_array_item_string(wb, NULL); + buffer_json_add_array_item_string(wb, NULL); + buffer_json_add_array_item_string(wb, NULL); + } + buffer_json_add_array_item_double(wb, (aw->count) ? aw->sum / (NETDATA_DOUBLE)aw->count : 0.0); // "weight" + } + else { + buffer_json_member_add_array(wb, "v"); + buffer_json_add_array_item_array(wb); + buffer_json_add_array_item_double(wb, aw->min); // "min" + buffer_json_add_array_item_double(wb, (aw->count) ? aw->sum / (NETDATA_DOUBLE)aw->count : 0.0); // "avg" + buffer_json_add_array_item_double(wb, aw->max); // "max" + buffer_json_add_array_item_double(wb, aw->sum); // "sum" + buffer_json_add_array_item_uint64(wb, aw->count); // "count" + buffer_json_array_close(wb); + } + + buffer_json_add_array_item_array(wb); + buffer_json_add_array_item_double(wb, aw->hsp.min); // "min" + buffer_json_add_array_item_double(wb, (aw->hsp.count) ? 
aw->hsp.sum / (NETDATA_DOUBLE) aw->hsp.count : 0.0); // "avg" + buffer_json_add_array_item_double(wb, aw->hsp.max); // "max" + buffer_json_add_array_item_double(wb, aw->hsp.sum); // "sum" + buffer_json_add_array_item_uint64(wb, aw->hsp.count); // "count" + buffer_json_add_array_item_uint64(wb, aw->hsp.anomaly_count); // "anomaly_count" + buffer_json_array_close(wb); + + if(baseline) { + buffer_json_add_array_item_array(wb); + buffer_json_add_array_item_double(wb, aw->bsp.min); // "min" + buffer_json_add_array_item_double(wb, (aw->bsp.count) ? aw->bsp.sum / (NETDATA_DOUBLE) aw->bsp.count : 0.0); // "avg" + buffer_json_add_array_item_double(wb, aw->bsp.max); // "max" + buffer_json_add_array_item_double(wb, aw->bsp.sum); // "sum" + buffer_json_add_array_item_uint64(wb, aw->bsp.count); // "count" + buffer_json_add_array_item_uint64(wb, aw->bsp.anomaly_count); // "anomaly_count" + buffer_json_array_close(wb); + } + + buffer_json_array_close(wb); +} + +static void multinode_data_schema(BUFFER *wb, RRDR_OPTIONS options __maybe_unused, const char *key, bool baseline, bool group_by) { + buffer_json_member_add_object(wb, key); // schema + + buffer_json_member_add_string(wb, "type", "array"); + buffer_json_member_add_array(wb, "items"); + + if(group_by) { + buffer_json_add_array_item_object(wb); + { + buffer_json_member_add_string(wb, "name", "weight"); + buffer_json_member_add_string(wb, "type", "array"); + buffer_json_member_add_array(wb, "labels"); + { + buffer_json_add_array_item_string(wb, "min"); + buffer_json_add_array_item_string(wb, "avg"); + buffer_json_add_array_item_string(wb, "max"); + buffer_json_add_array_item_string(wb, "sum"); + buffer_json_add_array_item_string(wb, "count"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + } + else { + buffer_json_add_array_item_object(wb); + buffer_json_member_add_string(wb, "name", "row_type"); + buffer_json_member_add_string(wb, "type", "integer"); + buffer_json_member_add_array(wb, "value"); + 
buffer_json_add_array_item_string(wb, "dimension"); + buffer_json_add_array_item_string(wb, "instance"); + buffer_json_add_array_item_string(wb, "context"); + buffer_json_add_array_item_string(wb, "node"); + buffer_json_array_close(wb); + buffer_json_object_close(wb); + + buffer_json_add_array_item_object(wb); + { + buffer_json_member_add_string(wb, "name", "ni"); + buffer_json_member_add_string(wb, "type", "integer"); + buffer_json_member_add_string(wb, "dictionary", "nodes"); + } + buffer_json_object_close(wb); + + buffer_json_add_array_item_object(wb); + { + buffer_json_member_add_string(wb, "name", "ci"); + buffer_json_member_add_string(wb, "type", "integer"); + buffer_json_member_add_string(wb, "dictionary", "contexts"); + } + buffer_json_object_close(wb); + + buffer_json_add_array_item_object(wb); + { + buffer_json_member_add_string(wb, "name", "ii"); + buffer_json_member_add_string(wb, "type", "integer"); + buffer_json_member_add_string(wb, "dictionary", "instances"); + } + buffer_json_object_close(wb); + + buffer_json_add_array_item_object(wb); + { + buffer_json_member_add_string(wb, "name", "di"); + buffer_json_member_add_string(wb, "type", "integer"); + buffer_json_member_add_string(wb, "dictionary", "dimensions"); + } + buffer_json_object_close(wb); + + buffer_json_add_array_item_object(wb); + { + buffer_json_member_add_string(wb, "name", "weight"); + buffer_json_member_add_string(wb, "type", "number"); + } + buffer_json_object_close(wb); + } + + buffer_json_add_array_item_object(wb); + { + buffer_json_member_add_string(wb, "name", "timeframe"); + buffer_json_member_add_string(wb, "type", "array"); + buffer_json_member_add_array(wb, "labels"); + { + buffer_json_add_array_item_string(wb, "min"); + buffer_json_add_array_item_string(wb, "avg"); + buffer_json_add_array_item_string(wb, "max"); + buffer_json_add_array_item_string(wb, "sum"); + buffer_json_add_array_item_string(wb, "count"); + buffer_json_add_array_item_string(wb, "anomaly_count"); + } + 
buffer_json_array_close(wb); + buffer_json_member_add_object(wb, "calculations"); + buffer_json_member_add_string(wb, "anomaly rate", "anomaly_count * 100 / count"); + buffer_json_object_close(wb); + } + buffer_json_object_close(wb); + + if(baseline) { + buffer_json_add_array_item_object(wb); + { + buffer_json_member_add_string(wb, "name", "baseline timeframe"); + buffer_json_member_add_string(wb, "type", "array"); + buffer_json_member_add_array(wb, "labels"); + { + buffer_json_add_array_item_string(wb, "min"); + buffer_json_add_array_item_string(wb, "avg"); + buffer_json_add_array_item_string(wb, "max"); + buffer_json_add_array_item_string(wb, "sum"); + buffer_json_add_array_item_string(wb, "count"); + buffer_json_add_array_item_string(wb, "anomaly_count"); + } + buffer_json_array_close(wb); + buffer_json_member_add_object(wb, "calculations"); + buffer_json_member_add_string(wb, "anomaly rate", "anomaly_count * 100 / count"); + buffer_json_object_close(wb); + } + buffer_json_object_close(wb); + } + + buffer_json_array_close(wb); // items + buffer_json_object_close(wb); // schema +} + +struct dict_unique_node { + bool existing; + bool exposed; + uint32_t i; + RRDHOST *host; + usec_t duration_ut; +}; + +struct dict_unique_name_units { + bool existing; + bool exposed; + uint32_t i; + const char *units; +}; + +struct dict_unique_id_name { + bool existing; + bool exposed; + uint32_t i; + const char *id; + const char *name; +}; + +static inline struct dict_unique_node *dict_unique_node_add(DICTIONARY *dict, RRDHOST *host, ssize_t *max_id) { + struct dict_unique_node *dun = dictionary_set(dict, host->machine_guid, NULL, sizeof(struct dict_unique_node)); + if(!dun->existing) { + dun->existing = true; + dun->host = host; + dun->i = *max_id; + (*max_id)++; + } + + return dun; +} + +static inline struct dict_unique_name_units *dict_unique_name_units_add(DICTIONARY *dict, const char *name, const char *units, ssize_t *max_id) { + struct dict_unique_name_units *dun = 
dictionary_set(dict, name, NULL, sizeof(struct dict_unique_name_units)); + if(!dun->existing) { + dun->units = units; + dun->existing = true; + dun->i = *max_id; + (*max_id)++; + } + + return dun; +} + +static inline struct dict_unique_id_name *dict_unique_id_name_add(DICTIONARY *dict, const char *id, const char *name, ssize_t *max_id) { + char key[1024 + 1]; + snprintfz(key, sizeof(key) - 1, "%s:%s", id, name); + struct dict_unique_id_name *dun = dictionary_set(dict, key, NULL, sizeof(struct dict_unique_id_name)); + if(!dun->existing) { + dun->existing = true; + dun->i = *max_id; + (*max_id)++; + dun->id = id; + dun->name = name; + } + + return dun; +} + +static size_t registered_results_to_json_multinode_no_group_by( + DICTIONARY *results, BUFFER *wb, + time_t after, time_t before, + time_t baseline_after, time_t baseline_before, + size_t points, WEIGHTS_METHOD method, + RRDR_TIME_GROUPING group, RRDR_OPTIONS options, uint32_t shifts, + size_t examined_dimensions, struct query_weights_data *qwd, + WEIGHTS_STATS *stats, + struct query_versions *versions) { + buffer_json_initialize(wb, "\"", "\"", 0, true, (options & RRDR_OPTION_MINIFY) ? 
BUFFER_JSON_OPTIONS_MINIFY : BUFFER_JSON_OPTIONS_DEFAULT); + buffer_json_member_add_uint64(wb, "api", 2); + + results_header_to_json_v2(results, wb, qwd, after, before, baseline_after, baseline_before, + points, method, group, options, shifts, examined_dimensions, + qwd->timings.executed_ut - qwd->timings.received_ut, stats, false); + + version_hashes_api_v2(wb, versions); + + bool baseline = method == WEIGHTS_METHOD_MC_KS2 || method == WEIGHTS_METHOD_MC_VOLUME; + multinode_data_schema(wb, options, "schema", baseline, false); + + DICTIONARY *dict_nodes = dictionary_create_advanced(DICT_OPTION_SINGLE_THREADED | DICT_OPTION_DONT_OVERWRITE_VALUE | DICT_OPTION_FIXED_SIZE, NULL, sizeof(struct dict_unique_node)); + DICTIONARY *dict_contexts = dictionary_create_advanced(DICT_OPTION_SINGLE_THREADED | DICT_OPTION_DONT_OVERWRITE_VALUE | DICT_OPTION_FIXED_SIZE, NULL, sizeof(struct dict_unique_name_units)); + DICTIONARY *dict_instances = dictionary_create_advanced(DICT_OPTION_SINGLE_THREADED | DICT_OPTION_DONT_OVERWRITE_VALUE | DICT_OPTION_FIXED_SIZE, NULL, sizeof(struct dict_unique_id_name)); + DICTIONARY *dict_dimensions = dictionary_create_advanced(DICT_OPTION_SINGLE_THREADED | DICT_OPTION_DONT_OVERWRITE_VALUE | DICT_OPTION_FIXED_SIZE, NULL, sizeof(struct dict_unique_id_name)); + + buffer_json_member_add_array(wb, "result"); + + struct aggregated_weight node_aw = AGGREGATED_WEIGHT_EMPTY, context_aw = AGGREGATED_WEIGHT_EMPTY, instance_aw = AGGREGATED_WEIGHT_EMPTY; + struct register_result *t; + RRDHOST *last_host = NULL; + RRDCONTEXT_ACQUIRED *last_rca = NULL; + RRDINSTANCE_ACQUIRED *last_ria = NULL; + struct dict_unique_name_units *context_dun = NULL; + struct dict_unique_node *node_dun = NULL; + struct dict_unique_id_name *instance_dun = NULL; + struct dict_unique_id_name *dimension_dun = NULL; + ssize_t di = -1, ii = -1, ci = -1, ni = -1; + ssize_t di_max = 0, ii_max = 0, ci_max = 0, ni_max = 0; + size_t total_dimensions = 0; + dfe_start_read(results, t) { + + // close 
instance + if(t->ria != last_ria && last_ria) { + storage_point_to_json(wb, WPT_INSTANCE, di, ii, ci, ni, &instance_aw, options, baseline); + instance_dun->exposed = true; + last_ria = NULL; + instance_aw = AGGREGATED_WEIGHT_EMPTY; + } + + // close context + if(t->rca != last_rca && last_rca) { + storage_point_to_json(wb, WPT_CONTEXT, di, ii, ci, ni, &context_aw, options, baseline); + context_dun->exposed = true; + last_rca = NULL; + context_aw = AGGREGATED_WEIGHT_EMPTY; + } + + // close node + if(t->host != last_host && last_host) { + storage_point_to_json(wb, WPT_NODE, di, ii, ci, ni, &node_aw, options, baseline); + node_dun->exposed = true; + last_host = NULL; + node_aw = AGGREGATED_WEIGHT_EMPTY; + } + + // open node + if(t->host != last_host) { + last_host = t->host; + node_dun = dict_unique_node_add(dict_nodes, t->host, &ni_max); + ni = node_dun->i; + } + + // open context + if(t->rca != last_rca) { + last_rca = t->rca; + context_dun = dict_unique_name_units_add(dict_contexts, rrdcontext_acquired_id(t->rca), + rrdcontext_acquired_units(t->rca), &ci_max); + ci = context_dun->i; + } + + // open instance + if(t->ria != last_ria) { + last_ria = t->ria; + instance_dun = dict_unique_id_name_add(dict_instances, rrdinstance_acquired_id(t->ria), rrdinstance_acquired_name(t->ria), &ii_max); + ii = instance_dun->i; + } + + dimension_dun = dict_unique_id_name_add(dict_dimensions, rrdmetric_acquired_id(t->rma), rrdmetric_acquired_name(t->rma), &di_max); + di = dimension_dun->i; + + struct aggregated_weight aw = { + .min = t->value, + .max = t->value, + .sum = t->value, + .count = 1, + .hsp = t->highlighted, + .bsp = t->baseline, + }; + + storage_point_to_json(wb, WPT_DIMENSION, di, ii, ci, ni, &aw, options, baseline); + node_dun->exposed = true; + context_dun->exposed = true; + instance_dun->exposed = true; + dimension_dun->exposed = true; + + merge_into_aw(instance_aw, t); + merge_into_aw(context_aw, t); + merge_into_aw(node_aw, t); + + node_dun->duration_ut += 
t->duration_ut; + total_dimensions++; + } + dfe_done(t); + + // close instance + if(last_ria) { + storage_point_to_json(wb, WPT_INSTANCE, di, ii, ci, ni, &instance_aw, options, baseline); + instance_dun->exposed = true; + } + + // close context + if(last_rca) { + storage_point_to_json(wb, WPT_CONTEXT, di, ii, ci, ni, &context_aw, options, baseline); + context_dun->exposed = true; + } + + // close node + if(last_host) { + storage_point_to_json(wb, WPT_NODE, di, ii, ci, ni, &node_aw, options, baseline); + node_dun->exposed = true; + } + + buffer_json_array_close(wb); // points + + buffer_json_member_add_object(wb, "dictionaries"); + buffer_json_member_add_array(wb, "nodes"); + { + struct dict_unique_node *dun; + dfe_start_read(dict_nodes, dun) { + if(!dun->exposed) + continue; + + buffer_json_add_array_item_object(wb); + buffer_json_node_add_v2(wb, dun->host, dun->i, dun->duration_ut, true); + buffer_json_object_close(wb); + } + dfe_done(dun); + } + buffer_json_array_close(wb); + + buffer_json_member_add_array(wb, "contexts"); + { + struct dict_unique_name_units *dun; + dfe_start_read(dict_contexts, dun) { + if(!dun->exposed) + continue; + + buffer_json_add_array_item_object(wb); + buffer_json_member_add_string(wb, "id", dun_dfe.name); + buffer_json_member_add_string(wb, "units", dun->units); + buffer_json_member_add_int64(wb, "ci", dun->i); + buffer_json_object_close(wb); + } + dfe_done(dun); + } + buffer_json_array_close(wb); + + buffer_json_member_add_array(wb, "instances"); + { + struct dict_unique_id_name *dun; + dfe_start_read(dict_instances, dun) { + if(!dun->exposed) + continue; + + buffer_json_add_array_item_object(wb); + buffer_json_member_add_string(wb, "id", dun->id); + if(dun->id != dun->name) + buffer_json_member_add_string(wb, "nm", dun->name); + buffer_json_member_add_int64(wb, "ii", dun->i); + buffer_json_object_close(wb); + } + dfe_done(dun); + } + buffer_json_array_close(wb); + + buffer_json_member_add_array(wb, "dimensions"); + { + struct 
dict_unique_id_name *dun; + dfe_start_read(dict_dimensions, dun) { + if(!dun->exposed) + continue; + + buffer_json_add_array_item_object(wb); + buffer_json_member_add_string(wb, "id", dun->id); + if(dun->id != dun->name) + buffer_json_member_add_string(wb, "nm", dun->name); + buffer_json_member_add_int64(wb, "di", dun->i); + buffer_json_object_close(wb); + } + dfe_done(dun); + } + buffer_json_array_close(wb); + + buffer_json_object_close(wb); //dictionaries + + buffer_json_agents_v2(wb, &qwd->timings, 0, false, true); + buffer_json_member_add_uint64(wb, "correlated_dimensions", total_dimensions); + buffer_json_member_add_uint64(wb, "total_dimensions_count", examined_dimensions); + buffer_json_finalize(wb); + + dictionary_destroy(dict_nodes); + dictionary_destroy(dict_contexts); + dictionary_destroy(dict_instances); + dictionary_destroy(dict_dimensions); + + return total_dimensions; +} + +static size_t registered_results_to_json_multinode_group_by( + DICTIONARY *results, BUFFER *wb, + time_t after, time_t before, + time_t baseline_after, time_t baseline_before, + size_t points, WEIGHTS_METHOD method, + RRDR_TIME_GROUPING group, RRDR_OPTIONS options, uint32_t shifts, + size_t examined_dimensions, struct query_weights_data *qwd, + WEIGHTS_STATS *stats, + struct query_versions *versions) { + buffer_json_initialize(wb, "\"", "\"", 0, true, (options & RRDR_OPTION_MINIFY) ? 
BUFFER_JSON_OPTIONS_MINIFY : BUFFER_JSON_OPTIONS_DEFAULT); + buffer_json_member_add_uint64(wb, "api", 2); + + results_header_to_json_v2(results, wb, qwd, after, before, baseline_after, baseline_before, + points, method, group, options, shifts, examined_dimensions, + qwd->timings.executed_ut - qwd->timings.received_ut, stats, true); + + version_hashes_api_v2(wb, versions); + + bool baseline = method == WEIGHTS_METHOD_MC_KS2 || method == WEIGHTS_METHOD_MC_VOLUME; + multinode_data_schema(wb, options, "v_schema", baseline, true); + + DICTIONARY *group_by = dictionary_create_advanced(DICT_OPTION_SINGLE_THREADED | DICT_OPTION_DONT_OVERWRITE_VALUE | DICT_OPTION_FIXED_SIZE, + NULL, sizeof(struct aggregated_weight)); + + struct register_result *t; + size_t total_dimensions = 0; + BUFFER *key = buffer_create(0, NULL); + BUFFER *name = buffer_create(0, NULL); + dfe_start_read(results, t) { + + buffer_flush(key); + buffer_flush(name); + + if(qwd->qwr->group_by.group_by & RRDR_GROUP_BY_DIMENSION) { + buffer_strcat(key, rrdmetric_acquired_name(t->rma)); + buffer_strcat(name, rrdmetric_acquired_name(t->rma)); + } + if(qwd->qwr->group_by.group_by & RRDR_GROUP_BY_INSTANCE) { + if(buffer_strlen(key)) { + buffer_fast_strcat(key, ",", 1); + buffer_fast_strcat(name, ",", 1); + } + + buffer_strcat(key, rrdinstance_acquired_id(t->ria)); + buffer_strcat(name, rrdinstance_acquired_name(t->ria)); + + if(!(qwd->qwr->group_by.group_by & RRDR_GROUP_BY_NODE)) { + buffer_fast_strcat(key, "@", 1); + buffer_fast_strcat(name, "@", 1); + buffer_strcat(key, t->host->machine_guid); + buffer_strcat(name, rrdhost_hostname(t->host)); + } + } + if(qwd->qwr->group_by.group_by & RRDR_GROUP_BY_NODE) { + if(buffer_strlen(key)) { + buffer_fast_strcat(key, ",", 1); + buffer_fast_strcat(name, ",", 1); + } + + buffer_strcat(key, t->host->machine_guid); + buffer_strcat(name, rrdhost_hostname(t->host)); + } + if(qwd->qwr->group_by.group_by & RRDR_GROUP_BY_CONTEXT) { + if(buffer_strlen(key)) { + 
buffer_fast_strcat(key, ",", 1); + buffer_fast_strcat(name, ",", 1); + } + + buffer_strcat(key, rrdcontext_acquired_id(t->rca)); + buffer_strcat(name, rrdcontext_acquired_id(t->rca)); + } + if(qwd->qwr->group_by.group_by & RRDR_GROUP_BY_UNITS) { + if(buffer_strlen(key)) { + buffer_fast_strcat(key, ",", 1); + buffer_fast_strcat(name, ",", 1); + } + + buffer_strcat(key, rrdcontext_acquired_units(t->rca)); + buffer_strcat(name, rrdcontext_acquired_units(t->rca)); + } + + struct aggregated_weight *aw = dictionary_set(group_by, buffer_tostring(key), NULL, sizeof(struct aggregated_weight)); + if(!aw->name) { + aw->name = strdupz(buffer_tostring(name)); + aw->min = aw->max = aw->sum = t->value; + aw->count = 1; + aw->hsp = t->highlighted; + aw->bsp = t->baseline; + } + else + merge_into_aw(*aw, t); + + total_dimensions++; + } + dfe_done(t); + buffer_free(key); key = NULL; + buffer_free(name); name = NULL; + + struct aggregated_weight *aw; + buffer_json_member_add_array(wb, "result"); + dfe_start_read(group_by, aw) { + const char *k = aw_dfe.name; + const char *n = aw->name; + + buffer_json_add_array_item_object(wb); + buffer_json_member_add_string(wb, "id", k); + + if(strcmp(k, n) != 0) + buffer_json_member_add_string(wb, "nm", n); + + storage_point_to_json(wb, WPT_GROUP, 0, 0, 0, 0, aw, options, baseline); + buffer_json_object_close(wb); + + freez((void *)aw->name); + } + dfe_done(aw); + buffer_json_array_close(wb); // result + + buffer_json_agents_v2(wb, &qwd->timings, 0, false, true); + buffer_json_member_add_uint64(wb, "correlated_dimensions", total_dimensions); + buffer_json_member_add_uint64(wb, "total_dimensions_count", examined_dimensions); + buffer_json_finalize(wb); + + dictionary_destroy(group_by); + + return total_dimensions; +} + +// ---------------------------------------------------------------------------- +// KS2 algorithm functions + +typedef long int DIFFS_NUMBERS; +#define DOUBLE_TO_INT_MULTIPLIER 100000 + +static inline int 
binary_search_bigger_than(const DIFFS_NUMBERS arr[], int left, int size, DIFFS_NUMBERS K) { + // binary search to find the smallest index + // of a value in the array that is greater than K + + int right = size; + while(left < right) { + int middle = (int)(((unsigned int)(left + right)) >> 1); + + if(arr[middle] > K) + right = middle; + + else + left = middle + 1; + } + + return left; +} + +int compare_diffs(const void *left, const void *right) { + DIFFS_NUMBERS lt = *(DIFFS_NUMBERS *)left; + DIFFS_NUMBERS rt = *(DIFFS_NUMBERS *)right; + + // https://stackoverflow.com/a/3886497/1114110 + return (lt > rt) - (lt < rt); +} + +static size_t calculate_pairs_diff(DIFFS_NUMBERS *diffs, NETDATA_DOUBLE *arr, size_t size) { + NETDATA_DOUBLE *last = &arr[size - 1]; + size_t added = 0; + + while(last > arr) { + NETDATA_DOUBLE second = *last--; + NETDATA_DOUBLE first = *last; + *diffs++ = (DIFFS_NUMBERS)((first - second) * (NETDATA_DOUBLE)DOUBLE_TO_INT_MULTIPLIER); + added++; + } + + return added; +} + +static double ks_2samp( + DIFFS_NUMBERS baseline_diffs[], int base_size, + DIFFS_NUMBERS highlight_diffs[], int high_size, + uint32_t base_shifts) { + + qsort(baseline_diffs, base_size, sizeof(DIFFS_NUMBERS), compare_diffs); + qsort(highlight_diffs, high_size, sizeof(DIFFS_NUMBERS), compare_diffs); + + // Now we should be calculating this: + // + // For each number in the diffs arrays, we should find the index of the + // first number bigger than it in both arrays and calculate the % of this index + // vs the total array size. Once we have the 2 percentages, we should find + // the min and max across the delta of all of them. + // + // It should look like this: + // + // base_pcent = binary_search_bigger_than(...) / base_size; + // high_pcent = binary_search_bigger_than(...) / high_size; + // delta = base_pcent - high_pcent; + // if(delta < min) min = delta; + // if(delta > max) max = delta; + // + // This would require a lot of multiplications and divisions.
+ // + // To speed it up, we do the binary search to find the index of each number + // and then scale the high index up by the power of two (shifts) by which + // the baseline is bigger than the highlight, so the 2 indexes are comparable. + // We also keep track of the original indexes with min and max, to properly + // calculate their percentages once the loops finish. + + + // initialize min and max using the first number of baseline_diffs + DIFFS_NUMBERS K = baseline_diffs[0]; + int base_idx = binary_search_bigger_than(baseline_diffs, 1, base_size, K); + int high_idx = binary_search_bigger_than(highlight_diffs, 0, high_size, K); + int delta = base_idx - (high_idx << base_shifts); + int min = delta, max = delta; + int base_min_idx = base_idx; + int base_max_idx = base_idx; + int high_min_idx = high_idx; + int high_max_idx = high_idx; + + // do the baseline_diffs starting from 1 (we did position 0 above) + for(int i = 1; i < base_size; i++) { + K = baseline_diffs[i]; + base_idx = binary_search_bigger_than(baseline_diffs, i + 1, base_size, K); // starting from i + 1, since baseline_diffs is sorted + high_idx = binary_search_bigger_than(highlight_diffs, 0, high_size, K); + + delta = base_idx - (high_idx << base_shifts); + if(delta < min) { + min = delta; + base_min_idx = base_idx; + high_min_idx = high_idx; + } + else if(delta > max) { + max = delta; + base_max_idx = base_idx; + high_max_idx = high_idx; + } + } + + // do the highlight_diffs starting from 0 + for(int i = 0; i < high_size; i++) { + K = highlight_diffs[i]; + base_idx = binary_search_bigger_than(baseline_diffs, 0, base_size, K); + high_idx = binary_search_bigger_than(highlight_diffs, i + 1, high_size, K); // starting from i + 1, since highlight_diffs is sorted + + delta = base_idx - (high_idx << base_shifts); + if(delta < min) { + min = delta; + base_min_idx = base_idx; + high_min_idx = high_idx; + } + else if(delta > max) { + max = delta; + base_max_idx = base_idx; + high_max_idx = high_idx; + } + } + + // now we have the min, max and 
their indexes + // properly calculate min and max as dmin and dmax + double dbase_size = (double)base_size; + double dhigh_size = (double)high_size; + double dmin = ((double)base_min_idx / dbase_size) - ((double)high_min_idx / dhigh_size); + double dmax = ((double)base_max_idx / dbase_size) - ((double)high_max_idx / dhigh_size); + + dmin = -dmin; + if(islessequal(dmin, 0.0)) dmin = 0.0; + else if(isgreaterequal(dmin, 1.0)) dmin = 1.0; + + double d; + if(isgreaterequal(dmin, dmax)) d = dmin; + else d = dmax; + + double en = round(dbase_size * dhigh_size / (dbase_size + dhigh_size)); + + // under these conditions, KSfbar() crashes + if(unlikely(isnan(en) || isinf(en) || en == 0.0 || isnan(d) || isinf(d))) + return NAN; + + return KSfbar((int)en, d); +} + +static double kstwo( + NETDATA_DOUBLE baseline[], int baseline_points, + NETDATA_DOUBLE highlight[], int highlight_points, + uint32_t base_shifts) { + + // -1 in size, since the calculate_pairs_diffs() returns one less point + DIFFS_NUMBERS baseline_diffs[baseline_points - 1]; + DIFFS_NUMBERS highlight_diffs[highlight_points - 1]; + + int base_size = (int)calculate_pairs_diff(baseline_diffs, baseline, baseline_points); + int high_size = (int)calculate_pairs_diff(highlight_diffs, highlight, highlight_points); + + if(unlikely(!base_size || !high_size)) + return NAN; + + if(unlikely(base_size != baseline_points - 1 || high_size != highlight_points - 1)) { + netdata_log_error("Metric correlations: internal error - calculate_pairs_diff() returns the wrong number of entries"); + return NAN; + } + + return ks_2samp(baseline_diffs, base_size, highlight_diffs, high_size, base_shifts); +} + +NETDATA_DOUBLE *rrd2rrdr_ks2( + ONEWAYALLOC *owa, RRDHOST *host, + RRDCONTEXT_ACQUIRED *rca, RRDINSTANCE_ACQUIRED *ria, RRDMETRIC_ACQUIRED *rma, + time_t after, time_t before, size_t points, RRDR_OPTIONS options, + RRDR_TIME_GROUPING time_group_method, const char *time_group_options, size_t tier, + WEIGHTS_STATS *stats, + size_t *entries, 
+ STORAGE_POINT *sp + ) { + + NETDATA_DOUBLE *ret = NULL; + + QUERY_TARGET_REQUEST qtr = { + .version = 1, + .host = host, + .rca = rca, + .ria = ria, + .rma = rma, + .after = after, + .before = before, + .points = points, + .options = options, + .time_group_method = time_group_method, + .time_group_options = time_group_options, + .tier = tier, + .query_source = QUERY_SOURCE_API_WEIGHTS, + .priority = STORAGE_PRIORITY_SYNCHRONOUS, + }; + + QUERY_TARGET *qt = query_target_create(&qtr); + RRDR *r = rrd2rrdr(owa, qt); + if(!r) + goto cleanup; + + stats->db_queries++; + stats->result_points += r->stats.result_points_generated; + stats->db_points += r->stats.db_points_read; + for(size_t tr = 0; tr < storage_tiers ; tr++) + stats->db_points_per_tier[tr] += r->internal.qt->db.tiers[tr].points; + + if(r->d != 1 || r->internal.qt->query.used != 1) { + netdata_log_error("WEIGHTS: on query '%s' expected 1 dimension in RRDR but got %zu r->d and %zu qt->query.used", + r->internal.qt->id, r->d, (size_t)r->internal.qt->query.used); + goto cleanup; + } + + if(unlikely(r->od[0] & RRDR_DIMENSION_HIDDEN)) + goto cleanup; + + if(unlikely(!(r->od[0] & RRDR_DIMENSION_QUERIED))) + goto cleanup; + + if(unlikely(!(r->od[0] & RRDR_DIMENSION_NONZERO))) + goto cleanup; + + if(rrdr_rows(r) < 2) + goto cleanup; + + *entries = rrdr_rows(r); + ret = onewayalloc_mallocz(owa, sizeof(NETDATA_DOUBLE) * rrdr_rows(r)); + + if(sp) + *sp = r->internal.qt->query.array[0].query_points; + + // copy the points of the dimension to a contiguous array + // there is no need to check for empty values, since empty values are already zero + // https://github.com/netdata/netdata/blob/6e3144683a73a2024d51425b20ecfd569034c858/web/api/queries/average/average.c#L41-L43 + memcpy(ret, r->v, rrdr_rows(r) * sizeof(NETDATA_DOUBLE)); + +cleanup: + rrdr_free(owa, r); + query_target_release(qt); + return ret; +} + +static void rrdset_metric_correlations_ks2( + RRDHOST *host, + RRDCONTEXT_ACQUIRED *rca, RRDINSTANCE_ACQUIRED 
*ria, RRDMETRIC_ACQUIRED *rma, + DICTIONARY *results, + time_t baseline_after, time_t baseline_before, + time_t after, time_t before, + size_t points, RRDR_OPTIONS options, + RRDR_TIME_GROUPING time_group_method, const char *time_group_options, size_t tier, + uint32_t shifts, + WEIGHTS_STATS *stats, bool register_zero + ) { + + options |= RRDR_OPTION_NATURAL_POINTS; + + usec_t started_ut = now_monotonic_usec(); + ONEWAYALLOC *owa = onewayalloc_create(16 * 1024); + + size_t high_points = 0; + STORAGE_POINT highlighted_sp; + NETDATA_DOUBLE *highlight = rrd2rrdr_ks2( + owa, host, rca, ria, rma, after, before, points, + options, time_group_method, time_group_options, tier, stats, &high_points, &highlighted_sp); + + if(!highlight) + goto cleanup; + + size_t base_points = 0; + STORAGE_POINT baseline_sp; + NETDATA_DOUBLE *baseline = rrd2rrdr_ks2( + owa, host, rca, ria, rma, baseline_after, baseline_before, high_points << shifts, + options, time_group_method, time_group_options, tier, stats, &base_points, &baseline_sp); + + if(!baseline) + goto cleanup; + + stats->binary_searches += 2 * (base_points - 1) + 2 * (high_points - 1); + + double prob = kstwo(baseline, (int)base_points, highlight, (int)high_points, shifts); + if(!isnan(prob) && !isinf(prob)) { + + // these conditions should never happen, but still let's check + if(unlikely(prob < 0.0)) { + netdata_log_error("Metric correlations: kstwo() returned a negative number: %f", prob); + prob = -prob; + } + if(unlikely(prob > 1.0)) { + netdata_log_error("Metric correlations: kstwo() returned a number above 1.0: %f", prob); + prob = 1.0; + } + + usec_t ended_ut = now_monotonic_usec(); + + // to spread the results evenly, 0.0 needs to be the less correlated and 1.0 the most correlated + // so, we flip the result of kstwo() + register_result(results, host, rca, ria, rma, 1.0 - prob, RESULT_IS_BASE_HIGH_RATIO, &highlighted_sp, + &baseline_sp, stats, register_zero, ended_ut - started_ut); + } + +cleanup: + 
onewayalloc_destroy(owa); +} + +// ---------------------------------------------------------------------------- +// VOLUME algorithm functions + +static void merge_query_value_to_stats(QUERY_VALUE *qv, WEIGHTS_STATS *stats, size_t queries) { + stats->db_queries += queries; + stats->result_points += qv->result_points; + stats->db_points += qv->points_read; + for(size_t tier = 0; tier < storage_tiers ; tier++) + stats->db_points_per_tier[tier] += qv->storage_points_per_tier[tier]; +} + +static void rrdset_metric_correlations_volume( + RRDHOST *host, + RRDCONTEXT_ACQUIRED *rca, RRDINSTANCE_ACQUIRED *ria, RRDMETRIC_ACQUIRED *rma, + DICTIONARY *results, + time_t baseline_after, time_t baseline_before, + time_t after, time_t before, + RRDR_OPTIONS options, RRDR_TIME_GROUPING time_group_method, const char *time_group_options, + size_t tier, + WEIGHTS_STATS *stats, bool register_zero) { + + options |= RRDR_OPTION_MATCH_IDS | RRDR_OPTION_ABSOLUTE | RRDR_OPTION_NATURAL_POINTS; + + QUERY_VALUE baseline_average = rrdmetric2value(host, rca, ria, rma, baseline_after, baseline_before, + options, time_group_method, time_group_options, tier, 0, + QUERY_SOURCE_API_WEIGHTS, STORAGE_PRIORITY_SYNCHRONOUS); + merge_query_value_to_stats(&baseline_average, stats, 1); + + if(!netdata_double_isnumber(baseline_average.value)) { + // this means no data for the baseline window, but we may have data for the highlighted one - assume zero + baseline_average.value = 0.0; + } + + QUERY_VALUE highlight_average = rrdmetric2value(host, rca, ria, rma, after, before, + options, time_group_method, time_group_options, tier, 0, + QUERY_SOURCE_API_WEIGHTS, STORAGE_PRIORITY_SYNCHRONOUS); + merge_query_value_to_stats(&highlight_average, stats, 1); + + if(!netdata_double_isnumber(highlight_average.value)) + return; + + if(baseline_average.value == highlight_average.value) { + // they are the same - let's move on + return; + } + + if((options & RRDR_OPTION_ANOMALY_BIT) && highlight_average.value < 
baseline_average.value) { + // when working on anomaly bits, we are looking for an increase in the anomaly rate + return; + } + + char highlight_countif_options[50 + 1]; + snprintfz(highlight_countif_options, 50, "%s" NETDATA_DOUBLE_FORMAT, highlight_average.value < baseline_average.value ? "<" : ">", baseline_average.value); + QUERY_VALUE highlight_countif = rrdmetric2value(host, rca, ria, rma, after, before, + options, RRDR_GROUPING_COUNTIF, highlight_countif_options, tier, 0, + QUERY_SOURCE_API_WEIGHTS, STORAGE_PRIORITY_SYNCHRONOUS); + merge_query_value_to_stats(&highlight_countif, stats, 1); + + if(!netdata_double_isnumber(highlight_countif.value)) { + netdata_log_info("WEIGHTS: highlighted countif query failed, but highlighted average worked - strange..."); + return; + } + + // this represents the percentage of time + // the highlighted window was above/below the baseline window + // (above or below depending on their averages) + highlight_countif.value = highlight_countif.value / 100.0; // countif returns 0 - 100.0 + + RESULT_FLAGS flags; + NETDATA_DOUBLE pcent = NAN; + if(isgreater(baseline_average.value, 0.0) || isless(baseline_average.value, 0.0)) { + flags = RESULT_IS_BASE_HIGH_RATIO; + pcent = (highlight_average.value - baseline_average.value) / baseline_average.value * highlight_countif.value; + } + else { + flags = RESULT_IS_PERCENTAGE_OF_TIME; + pcent = highlight_countif.value; + } + + register_result(results, host, rca, ria, rma, pcent, flags, &highlight_average.sp, &baseline_average.sp, stats, + register_zero, baseline_average.duration_ut + highlight_average.duration_ut + highlight_countif.duration_ut); +} + +// ---------------------------------------------------------------------------- +// VALUE / ANOMALY RATE algorithm functions + +static void rrdset_weights_value( + RRDHOST *host, + RRDCONTEXT_ACQUIRED *rca, RRDINSTANCE_ACQUIRED *ria, RRDMETRIC_ACQUIRED *rma, + DICTIONARY *results, + time_t after, time_t before, + RRDR_OPTIONS options, 
RRDR_TIME_GROUPING time_group_method, const char *time_group_options, + size_t tier, + WEIGHTS_STATS *stats, bool register_zero) { + + options |= RRDR_OPTION_MATCH_IDS | RRDR_OPTION_NATURAL_POINTS; + + QUERY_VALUE qv = rrdmetric2value(host, rca, ria, rma, after, before, + options, time_group_method, time_group_options, tier, 0, + QUERY_SOURCE_API_WEIGHTS, STORAGE_PRIORITY_SYNCHRONOUS); + + merge_query_value_to_stats(&qv, stats, 1); + + if(netdata_double_isnumber(qv.value)) + register_result(results, host, rca, ria, rma, qv.value, 0, &qv.sp, NULL, stats, register_zero, qv.duration_ut); +} + +static void rrdset_weights_multi_dimensional_value(struct query_weights_data *qwd) { + QUERY_TARGET_REQUEST qtr = { + .version = 1, + .scope_nodes = qwd->qwr->scope_nodes, + .scope_contexts = qwd->qwr->scope_contexts, + .nodes = qwd->qwr->nodes, + .contexts = qwd->qwr->contexts, + .instances = qwd->qwr->instances, + .dimensions = qwd->qwr->dimensions, + .labels = qwd->qwr->labels, + .alerts = qwd->qwr->alerts, + .after = qwd->qwr->after, + .before = qwd->qwr->before, + .points = 1, + .options = qwd->qwr->options | RRDR_OPTION_NATURAL_POINTS, + .time_group_method = qwd->qwr->time_group_method, + .time_group_options = qwd->qwr->time_group_options, + .tier = qwd->qwr->tier, + .timeout_ms = qwd->qwr->timeout_ms, + .query_source = QUERY_SOURCE_API_WEIGHTS, + .priority = STORAGE_PRIORITY_NORMAL, + }; + + ONEWAYALLOC *owa = onewayalloc_create(16 * 1024); + QUERY_TARGET *qt = query_target_create(&qtr); + RRDR *r = rrd2rrdr(owa, qt); + + if(!r || rrdr_rows(r) != 1 || !r->d || r->d != r->internal.qt->query.used) + goto cleanup; + + QUERY_VALUE qv = { + .after = r->view.after, + .before = r->view.before, + .points_read = r->stats.db_points_read, + .result_points = r->stats.result_points_generated, + }; + + size_t queries = 0; + for(size_t d = 0; d < r->d ;d++) { + if(!rrdr_dimension_should_be_exposed(r->od[d], qwd->qwr->options)) + continue; + + long i = 0; // only one row + NETDATA_DOUBLE 
*cn = &r->v[ i * r->d ]; + NETDATA_DOUBLE *ar = &r->ar[ i * r->d ]; + + qv.value = cn[d]; + qv.anomaly_rate = ar[d]; + storage_point_merge_to(qv.sp, r->internal.qt->query.array[d].query_points); + + if(netdata_double_isnumber(qv.value)) { + QUERY_METRIC *qm = query_metric(r->internal.qt, d); + QUERY_DIMENSION *qd = query_dimension(r->internal.qt, qm->link.query_dimension_id); + QUERY_INSTANCE *qi = query_instance(r->internal.qt, qm->link.query_instance_id); + QUERY_CONTEXT *qc = query_context(r->internal.qt, qm->link.query_context_id); + QUERY_NODE *qn = query_node(r->internal.qt, qm->link.query_node_id); + + register_result(qwd->results, qn->rrdhost, qc->rca, qi->ria, qd->rma, qv.value, 0, &qv.sp, + NULL, &qwd->stats, qwd->register_zero, qm->duration_ut); + } + + queries++; + } + + merge_query_value_to_stats(&qv, &qwd->stats, queries); + +cleanup: + rrdr_free(owa, r); + query_target_release(qt); + onewayalloc_destroy(owa); +} + +// ---------------------------------------------------------------------------- + +int compare_netdata_doubles(const void *left, const void *right) { + NETDATA_DOUBLE lt = *(NETDATA_DOUBLE *)left; + NETDATA_DOUBLE rt = *(NETDATA_DOUBLE *)right; + + // https://stackoverflow.com/a/3886497/1114110 + return (lt > rt) - (lt < rt); +} + +static inline int binary_search_bigger_than_netdata_double(const NETDATA_DOUBLE arr[], int left, int size, NETDATA_DOUBLE K) { + // binary search to find the index + // of the first value in the array that is greater than K + + int right = size; + while(left < right) { + int middle = (int)(((unsigned int)(left + right)) >> 1); + + if(arr[middle] > K) + right = middle; + + else + left = middle + 1; + } + + return left; +} + +// ---------------------------------------------------------------------------- +// spread the results evenly according to their value + +static size_t spread_results_evenly(DICTIONARY *results, WEIGHTS_STATS *stats) { + struct register_result *t; + + // count the 
dimensions + size_t dimensions = dictionary_entries(results); + if(!dimensions) return 0; + + if(stats->max_base_high_ratio == 0.0) + stats->max_base_high_ratio = 1.0; + + // create an array of the right size and copy all the values in it + NETDATA_DOUBLE slots[dimensions]; + dimensions = 0; + dfe_start_read(results, t) { + if(t->flags & RESULT_IS_PERCENTAGE_OF_TIME) + t->value = t->value * stats->max_base_high_ratio; + + slots[dimensions++] = t->value; + } + dfe_done(t); + + if(!dimensions) return 0; // Coverity fix + + // sort the array with the values of all dimensions + qsort(slots, dimensions, sizeof(NETDATA_DOUBLE), compare_netdata_doubles); + + // skip the duplicates in the sorted array + NETDATA_DOUBLE last_value = NAN; + size_t unique_values = 0; + for(size_t i = 0; i < dimensions ;i++) { + if(likely(slots[i] != last_value)) + slots[unique_values++] = last_value = slots[i]; + } + + // this cannot happen, but coverity thinks otherwise... + if(!unique_values) + unique_values = dimensions; + + // calculate the weight of each slot, using the number of unique values + NETDATA_DOUBLE slot_weight = 1.0 / (NETDATA_DOUBLE)unique_values; + + dfe_start_read(results, t) { + int slot = binary_search_bigger_than_netdata_double(slots, 0, (int)unique_values, t->value); + NETDATA_DOUBLE v = slot * slot_weight; + if(unlikely(v > 1.0)) v = 1.0; + v = 1.0 - v; + t->value = v; + } + dfe_done(t); + + return dimensions; +} + +// ---------------------------------------------------------------------------- +// The main function + +static ssize_t weights_for_rrdmetric(void *data, RRDHOST *host, RRDCONTEXT_ACQUIRED *rca, RRDINSTANCE_ACQUIRED *ria, RRDMETRIC_ACQUIRED *rma) { + struct query_weights_data *qwd = data; + QUERY_WEIGHTS_REQUEST *qwr = qwd->qwr; + + if(qwd->qwr->interrupt_callback && qwd->qwr->interrupt_callback(qwd->qwr->interrupt_callback_data)) { + qwd->interrupted = true; + return -1; + } + + qwd->examined_dimensions++; + + switch(qwr->method) { + case 
WEIGHTS_METHOD_VALUE: + rrdset_weights_value( + host, rca, ria, rma, + qwd->results, + qwr->after, qwr->before, + qwr->options, qwr->time_group_method, qwr->time_group_options, qwr->tier, + &qwd->stats, qwd->register_zero + ); + break; + + case WEIGHTS_METHOD_ANOMALY_RATE: + qwr->options |= RRDR_OPTION_ANOMALY_BIT; + rrdset_weights_value( + host, rca, ria, rma, + qwd->results, + qwr->after, qwr->before, + qwr->options, qwr->time_group_method, qwr->time_group_options, qwr->tier, + &qwd->stats, qwd->register_zero + ); + break; + + case WEIGHTS_METHOD_MC_VOLUME: + rrdset_metric_correlations_volume( + host, rca, ria, rma, + qwd->results, + qwr->baseline_after, qwr->baseline_before, + qwr->after, qwr->before, + qwr->options, qwr->time_group_method, qwr->time_group_options, qwr->tier, + &qwd->stats, qwd->register_zero + ); + break; + + default: + case WEIGHTS_METHOD_MC_KS2: + rrdset_metric_correlations_ks2( + host, rca, ria, rma, + qwd->results, + qwr->baseline_after, qwr->baseline_before, + qwr->after, qwr->before, qwr->points, + qwr->options, qwr->time_group_method, qwr->time_group_options, qwr->tier, qwd->shifts, + &qwd->stats, qwd->register_zero + ); + break; + } + + qwd->timings.executed_ut = now_monotonic_usec(); + if(qwd->timings.executed_ut - qwd->timings.received_ut > qwd->timeout_us) { + qwd->timed_out = true; + return -1; + } + + query_progress_done_step(qwr->transaction, 1); + + return 1; +} + +static ssize_t weights_do_context_callback(void *data, RRDCONTEXT_ACQUIRED *rca, bool queryable_context) { + if(!queryable_context) + return false; + + struct query_weights_data *qwd = data; + + bool has_retention = false; + switch(qwd->qwr->method) { + case WEIGHTS_METHOD_VALUE: + case WEIGHTS_METHOD_ANOMALY_RATE: + has_retention = rrdcontext_retention_match(rca, qwd->qwr->after, qwd->qwr->before); + break; + + case WEIGHTS_METHOD_MC_KS2: + case WEIGHTS_METHOD_MC_VOLUME: + has_retention = rrdcontext_retention_match(rca, qwd->qwr->after, qwd->qwr->before); + 
if(has_retention) + has_retention = rrdcontext_retention_match(rca, qwd->qwr->baseline_after, qwd->qwr->baseline_before); + break; + } + + if(!has_retention) + return 0; + + ssize_t ret = weights_foreach_rrdmetric_in_context(rca, + qwd->instances_sp, + NULL, + qwd->labels_sp, + qwd->alerts_sp, + qwd->dimensions_sp, + true, true, qwd->qwr->version, + weights_for_rrdmetric, qwd); + return ret; +} + +ssize_t weights_do_node_callback(void *data, RRDHOST *host, bool queryable) { + if(!queryable) + return 0; + + struct query_weights_data *qwd = data; + + ssize_t ret = query_scope_foreach_context(host, qwd->qwr->scope_contexts, + qwd->scope_contexts_sp, qwd->contexts_sp, + weights_do_context_callback, queryable, qwd); + + return ret; +} + +int web_api_v12_weights(BUFFER *wb, QUERY_WEIGHTS_REQUEST *qwr) { + + char *error = NULL; + int resp = HTTP_RESP_OK; + + // if the user didn't give a timeout + // assume 5 minutes + if(!qwr->timeout_ms) + qwr->timeout_ms = 5 * 60 * MSEC_PER_SEC; + + // if the timeout is less than 1 second + // make it at least 1 second + if(qwr->timeout_ms < (long)(1 * MSEC_PER_SEC)) + qwr->timeout_ms = 1 * MSEC_PER_SEC; + + struct query_weights_data qwd = { + .qwr = qwr, + + .scope_nodes_sp = string_to_simple_pattern(qwr->scope_nodes), + .scope_contexts_sp = string_to_simple_pattern(qwr->scope_contexts), + .nodes_sp = string_to_simple_pattern(qwr->nodes), + .contexts_sp = string_to_simple_pattern(qwr->contexts), + .instances_sp = string_to_simple_pattern(qwr->instances), + .dimensions_sp = string_to_simple_pattern(qwr->dimensions), + .labels_sp = string_to_simple_pattern(qwr->labels), + .alerts_sp = string_to_simple_pattern(qwr->alerts), + .timeout_us = qwr->timeout_ms * USEC_PER_MS, + .timed_out = false, + .examined_dimensions = 0, + .register_zero = true, + .results = register_result_init(), + .stats = {}, + .shifts = 0, + .timings = { + .received_ut = now_monotonic_usec(), + } + }; + + if(!rrdr_relative_window_to_absolute_query(&qwr->after, 
&qwr->before, NULL, false)) + buffer_no_cacheable(wb); + else + buffer_cacheable(wb); + + if (qwr->before <= qwr->after) { + resp = HTTP_RESP_BAD_REQUEST; + error = "Invalid selected time-range."; + goto cleanup; + } + + if(qwr->method == WEIGHTS_METHOD_MC_KS2 || qwr->method == WEIGHTS_METHOD_MC_VOLUME) { + if(!qwr->points) qwr->points = 500; + + if(qwr->baseline_before <= API_RELATIVE_TIME_MAX) + qwr->baseline_before += qwr->after; + + rrdr_relative_window_to_absolute_query(&qwr->baseline_after, &qwr->baseline_before, NULL, false); + + if (qwr->baseline_before <= qwr->baseline_after) { + resp = HTTP_RESP_BAD_REQUEST; + error = "Invalid baseline time-range."; + goto cleanup; + } + + // baseline should be a power of two multiple of highlight + long long base_delta = qwr->baseline_before - qwr->baseline_after; + long long high_delta = qwr->before - qwr->after; + uint32_t multiplier = (uint32_t)round((double)base_delta / (double)high_delta); + + // check if the multiplier is a power of two + // https://stackoverflow.com/a/600306/1114110 + if((multiplier & (multiplier - 1)) != 0) { + // it is not power of two + // let's find the closest power of two + // https://stackoverflow.com/a/466242/1114110 + multiplier--; + multiplier |= multiplier >> 1; + multiplier |= multiplier >> 2; + multiplier |= multiplier >> 4; + multiplier |= multiplier >> 8; + multiplier |= multiplier >> 16; + multiplier++; + } + + // convert the multiplier to the number of shifts + // we need to do, to divide baseline numbers to match + // the highlight ones + while(multiplier > 1) { + qwd.shifts++; + multiplier = multiplier >> 1; + } + + // if the baseline size will not comply to MAX_POINTS + // lower the window of the baseline + while(qwd.shifts && (qwr->points << qwd.shifts) > MAX_POINTS) + qwd.shifts--; + + // if the baseline size still does not comply to MAX_POINTS + // lower the resolution of the highlight and the baseline + while((qwr->points << qwd.shifts) > MAX_POINTS) + qwr->points = 
qwr->points >> 1; + + if(qwr->points < 15) { + resp = HTTP_RESP_BAD_REQUEST; + error = "Too few points available, at least 15 are needed."; + goto cleanup; + } + + // adjust the baseline to be multiplier times bigger than the highlight + qwr->baseline_after = qwr->baseline_before - (high_delta << qwd.shifts); + } + + if(qwr->options & RRDR_OPTION_NONZERO) { + qwd.register_zero = false; + + // remove it to run the queries without it + qwr->options &= ~RRDR_OPTION_NONZERO; + } + + if(qwr->host && qwr->version == 1) + weights_do_node_callback(&qwd, qwr->host, true); + else { + if((qwd.qwr->method == WEIGHTS_METHOD_VALUE || qwd.qwr->method == WEIGHTS_METHOD_ANOMALY_RATE) && (qwd.contexts_sp || qwd.scope_contexts_sp)) { + rrdset_weights_multi_dimensional_value(&qwd); + } + else { + query_scope_foreach_host(qwd.scope_nodes_sp, qwd.nodes_sp, + weights_do_node_callback, &qwd, + &qwd.versions, + NULL); + } + } + + if(!qwd.register_zero) { + // put it back, to show it in the response + qwr->options |= RRDR_OPTION_NONZERO; + } + + if(qwd.timed_out) { + error = "timed out"; + resp = HTTP_RESP_GATEWAY_TIMEOUT; + goto cleanup; + } + + if(qwd.interrupted) { + error = "interrupted"; + resp = HTTP_RESP_CLIENT_CLOSED_REQUEST; + goto cleanup; + } + + if(!qwd.register_zero) + qwr->options |= RRDR_OPTION_NONZERO; + + if(!(qwr->options & RRDR_OPTION_RETURN_RAW) && qwr->method != WEIGHTS_METHOD_VALUE) + spread_results_evenly(qwd.results, &qwd.stats); + + usec_t ended_usec = qwd.timings.executed_ut = now_monotonic_usec(); + + // generate the json output we need + buffer_flush(wb); + + size_t added_dimensions = 0; + switch(qwr->format) { + case WEIGHTS_FORMAT_CHARTS: + added_dimensions = + registered_results_to_json_charts( + qwd.results, wb, + qwr->after, qwr->before, + qwr->baseline_after, qwr->baseline_before, + qwr->points, qwr->method, qwr->time_group_method, qwr->options, qwd.shifts, + qwd.examined_dimensions, + ended_usec - qwd.timings.received_ut, &qwd.stats); + break; + + case 
WEIGHTS_FORMAT_CONTEXTS: + added_dimensions = + registered_results_to_json_contexts( + qwd.results, wb, + qwr->after, qwr->before, + qwr->baseline_after, qwr->baseline_before, + qwr->points, qwr->method, qwr->time_group_method, qwr->options, qwd.shifts, + qwd.examined_dimensions, + ended_usec - qwd.timings.received_ut, &qwd.stats); + break; + + default: + case WEIGHTS_FORMAT_MULTINODE: + // we don't support these groupings in weights + qwr->group_by.group_by &= ~(RRDR_GROUP_BY_LABEL|RRDR_GROUP_BY_SELECTED|RRDR_GROUP_BY_PERCENTAGE_OF_INSTANCE); + if(qwr->group_by.group_by == RRDR_GROUP_BY_NONE) { + added_dimensions = + registered_results_to_json_multinode_no_group_by( + qwd.results, wb, + qwr->after, qwr->before, + qwr->baseline_after, qwr->baseline_before, + qwr->points, qwr->method, qwr->time_group_method, qwr->options, qwd.shifts, + qwd.examined_dimensions, + &qwd, &qwd.stats, &qwd.versions); + } + else { + added_dimensions = + registered_results_to_json_multinode_group_by( + qwd.results, wb, + qwr->after, qwr->before, + qwr->baseline_after, qwr->baseline_before, + qwr->points, qwr->method, qwr->time_group_method, qwr->options, qwd.shifts, + qwd.examined_dimensions, + &qwd, &qwd.stats, &qwd.versions); + } + break; + } + + if(!added_dimensions && qwr->version < 2) { + error = "no results produced."; + resp = HTTP_RESP_NOT_FOUND; + } + +cleanup: + simple_pattern_free(qwd.scope_nodes_sp); + simple_pattern_free(qwd.scope_contexts_sp); + simple_pattern_free(qwd.nodes_sp); + simple_pattern_free(qwd.contexts_sp); + simple_pattern_free(qwd.instances_sp); + simple_pattern_free(qwd.dimensions_sp); + simple_pattern_free(qwd.labels_sp); + simple_pattern_free(qwd.alerts_sp); + + register_result_destroy(qwd.results); + + if(error) { + buffer_flush(wb); + buffer_sprintf(wb, "{\"error\": \"%s\" }", error); + } + + return resp; +} + +// ---------------------------------------------------------------------------- +// unittest + +/* + +Unit tests against the output of this: + 
+https://github.com/scipy/scipy/blob/4cf21e753cf937d1c6c2d2a0e372fbc1dbbeea81/scipy/stats/_stats_py.py#L7275-L7449 + +import matplotlib.pyplot as plt +import pandas as pd +import numpy as np +import scipy as sp +from scipy import stats + +data1 = np.array([ 1111, -2222, 33, 100, 100, 15555, -1, 19999, 888, 755, -1, -730 ]) +data2 = np.array([365, -123, 0]) +data1 = np.sort(data1) +data2 = np.sort(data2) +n1 = data1.shape[0] +n2 = data2.shape[0] +data_all = np.concatenate([data1, data2]) +cdf1 = np.searchsorted(data1, data_all, side='right') / n1 +cdf2 = np.searchsorted(data2, data_all, side='right') / n2 +print(data_all) +print("\ndata1", data1, cdf1) +print("\ndata2", data2, cdf2) +cddiffs = cdf1 - cdf2 +print("\ncddiffs", cddiffs) +minS = np.clip(-np.min(cddiffs), 0, 1) +maxS = np.max(cddiffs) +print("\nmin", minS) +print("max", maxS) +m, n = sorted([float(n1), float(n2)], reverse=True) +en = m * n / (m + n) +d = max(minS, maxS) +prob = stats.distributions.kstwo.sf(d, np.round(en)) +print("\nprob", prob) + +*/ + +static int double_expect(double v, const char *str, const char *descr) { + char buf[100 + 1]; + snprintfz(buf, sizeof(buf) - 1, "%0.6f", v); + int ret = strcmp(buf, str) ? 
1 : 0; + + fprintf(stderr, "%s %s, expected %s, got %s\n", ret?"FAILED":"OK", descr, str, buf); + return ret; +} + +static int mc_unittest1(void) { + int bs = 3, hs = 3; + DIFFS_NUMBERS base[3] = { 1, 2, 3 }; + DIFFS_NUMBERS high[3] = { 3, 4, 6 }; + + double prob = ks_2samp(base, bs, high, hs, 0); + return double_expect(prob, "0.222222", "3x3"); +} + +static int mc_unittest2(void) { + int bs = 6, hs = 3; + DIFFS_NUMBERS base[6] = { 1, 2, 3, 10, 10, 15 }; + DIFFS_NUMBERS high[3] = { 3, 4, 6 }; + + double prob = ks_2samp(base, bs, high, hs, 1); + return double_expect(prob, "0.500000", "6x3"); +} + +static int mc_unittest3(void) { + int bs = 12, hs = 3; + DIFFS_NUMBERS base[12] = { 1, 2, 3, 10, 10, 15, 111, 19999, 8, 55, -1, -73 }; + DIFFS_NUMBERS high[3] = { 3, 4, 6 }; + + double prob = ks_2samp(base, bs, high, hs, 2); + return double_expect(prob, "0.347222", "12x3"); +} + +static int mc_unittest4(void) { + int bs = 12, hs = 3; + DIFFS_NUMBERS base[12] = { 1111, -2222, 33, 100, 100, 15555, -1, 19999, 888, 755, -1, -730 }; + DIFFS_NUMBERS high[3] = { 365, -123, 0 }; + + double prob = ks_2samp(base, bs, high, hs, 2); + return double_expect(prob, "0.777778", "12x3"); +} + +int mc_unittest(void) { + int errors = 0; + + errors += mc_unittest1(); + errors += mc_unittest2(); + errors += mc_unittest3(); + errors += mc_unittest4(); + + return errors; +} + diff --git a/src/web/api/queries/weights.h b/src/web/api/queries/weights.h new file mode 100644 index 000000000..be7e5a8b3 --- /dev/null +++ b/src/web/api/queries/weights.h @@ -0,0 +1,70 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +#ifndef NETDATA_API_WEIGHTS_H +#define NETDATA_API_WEIGHTS_H 1 + +#include "query.h" + +typedef enum { + WEIGHTS_METHOD_MC_KS2 = 1, + WEIGHTS_METHOD_MC_VOLUME = 2, + WEIGHTS_METHOD_ANOMALY_RATE = 3, + WEIGHTS_METHOD_VALUE = 4, +} WEIGHTS_METHOD; + +typedef enum { + WEIGHTS_FORMAT_CHARTS = 1, + WEIGHTS_FORMAT_CONTEXTS = 2, + WEIGHTS_FORMAT_MULTINODE = 3, +} WEIGHTS_FORMAT; + +extern int 
enable_metric_correlations; +extern int metric_correlations_version; +extern WEIGHTS_METHOD default_metric_correlations_method; + +typedef bool (*weights_interrupt_callback_t)(void *data); + +typedef struct query_weights_request { + size_t version; + RRDHOST *host; + const char *scope_nodes; + const char *scope_contexts; + const char *nodes; + const char *contexts; + const char *instances; + const char *dimensions; + const char *labels; + const char *alerts; + + struct { + RRDR_GROUP_BY group_by; + char *group_by_label; + RRDR_GROUP_BY_FUNCTION aggregation; + } group_by; + + WEIGHTS_METHOD method; + WEIGHTS_FORMAT format; + RRDR_TIME_GROUPING time_group_method; + const char *time_group_options; + time_t baseline_after; + time_t baseline_before; + time_t after; + time_t before; + size_t points; + RRDR_OPTIONS options; + size_t tier; + time_t timeout_ms; + + weights_interrupt_callback_t interrupt_callback; + void *interrupt_callback_data; + + nd_uuid_t *transaction; +} QUERY_WEIGHTS_REQUEST; + +int web_api_v12_weights(BUFFER *wb, QUERY_WEIGHTS_REQUEST *qwr); + +WEIGHTS_METHOD weights_string_to_method(const char *method); +const char *weights_method_to_string(WEIGHTS_METHOD method); +int mc_unittest(void); + +#endif //NETDATA_API_WEIGHTS_H |
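
The Python reference quoted in the unit-test comment of weights.c derives the two-sample KS statistic d from the empirical CDFs of both samples. As a rough illustration only, that step might be sketched in standalone C as follows. The names `ks_statistic`, `count_le` and `cmp_double` are hypothetical and not part of netdata, and the sketch leaves out the `base_shifts` scaling and the `KSfbar()` probability step that `ks_2samp()` applies in the diff above.

```c
#include <stddef.h>
#include <stdlib.h>

// Comparator for qsort() on doubles (hypothetical helper, not netdata code).
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

// Number of elements in the sorted array arr[0..n) that are <= k, which is
// also the index of the first element greater than k. This corresponds to
// np.searchsorted(arr, k, side='right') in the Python reference.
static size_t count_le(const double *arr, size_t n, double k) {
    size_t left = 0, right = n;
    while(left < right) {
        size_t mid = left + (right - left) / 2;
        if(arr[mid] > k) right = mid;
        else left = mid + 1;
    }
    return left;
}

// Two-sample KS statistic: the largest absolute gap between the two
// empirical CDFs. Sorts both inputs in place. Returns -1.0 on empty input.
double ks_statistic(double *a, size_t na, double *b, size_t nb) {
    if(!na || !nb) return -1.0;

    qsort(a, na, sizeof(double), cmp_double);
    qsort(b, nb, sizeof(double), cmp_double);

    // At the largest observed value both CDFs equal 1, so the difference
    // reaches 0 there; starting both extremes at 0 is therefore safe.
    double dmin = 0.0, dmax = 0.0;

    // Evaluate cdf_a(x) - cdf_b(x) at every observed point of both samples.
    for(size_t i = 0; i < na + nb; i++) {
        double x = (i < na) ? a[i] : b[i - na];
        double diff = (double)count_le(a, na, x) / (double)na
                    - (double)count_le(b, nb, x) / (double)nb;
        if(diff < dmin) dmin = diff;
        if(diff > dmax) dmax = diff;
    }

    return (-dmin > dmax) ? -dmin : dmax;
}
```

For the samples {1, 2, 3} and {3, 4, 6} this sketch gives d = 2/3, and for the two arrays in the Python snippet it gives d = 5/12. Turning d into a probability would still require the survival function step, which is not reproduced here.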