summaryrefslogtreecommitdiffstats
path: root/libnetdata/adaptive_resortable_list/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'libnetdata/adaptive_resortable_list/README.md')
-rw-r--r--libnetdata/adaptive_resortable_list/README.md93
1 files changed, 93 insertions, 0 deletions
diff --git a/libnetdata/adaptive_resortable_list/README.md b/libnetdata/adaptive_resortable_list/README.md
new file mode 100644
index 000000000..0ba3ec9b5
--- /dev/null
+++ b/libnetdata/adaptive_resortable_list/README.md
@@ -0,0 +1,93 @@
+
+# Adaptive Re-sortable List (ARL)
+
+This library allows netdata to read a series of `name - value` pairs
+in the **fastest possible way**.
+
+ARLs are used all over netdata, as they are the most
+CPU utilization efficient way to process `/proc` files. They are used to
+process both vertical (csv like) and horizontal (one pair per line) `name - value` pairs.
+
+## How ARL works
+
+It maintains a linked list of all `NAME` (keywords), sorted in the
+order found in the data source. The linked list is kept
+sorted at all times - the data source may change at any time, the
+linked list will adapt at the next iteration.
+
+### Initialization
+
+During initialization (just once), the caller:
+
+- calls `arl_create()` to create the ARL
+
+- calls `arl_expect()` multiple times to register the expected keywords
+
+The library will call the `processor()` function (given to
+`arl_create()`), for each expected keyword found.
+The default `processor()` expects `dst` to be an `unsigned long long *`.
+
+Each `name` keyword may have a different `processor()` (by calling
+`arl_expect_custom()` instead of `arl_expect()`).
+
+### Data collection iterations
+
+For each iteration through the data source, the caller:
+
+- calls `arl_begin()` to initiate a data collection iteration.
+ This is to be called just ONCE every time the source is re-evaluated.
+
+- calls `arl_check()` for each entry read from the file.
+
+### Cleanup
+
+When the caller exits:
+
+- calls `arl_free()` to destroy this and free all memory.
+
+### Performance
+
+ARL maintains a list of `name` keywords found in the data source (even the ones
+that are not useful for data collection).
+
+If the data source maintains the same order on the `name-value` pairs, for each
+each call to `arl_check()` only an `strcmp()` is executed to verify the
+expected order has not changed, a counter is incremented and a pointer is changed.
+So, if the data source has 100 `name-value` pairs, and their order remains constant
+over time, 100 successful `strcmp()` are executed.
+
+In the unlikely event that an iteration sees the data source with a different order,
+for each out-of-order keyword, a full search of the remaining keywords is made. But
+this search uses 32bit hashes, not string comparisons, so it should also be fast.
+
+When all expectations are satisfied (even in the middle of an iteration),
+the call to `arl_check()` will return 1, to signal the caller to stop the loop,
+saving valuable CPU resources for the rest of the data source.
+
+In the following test we used alternative methods to process, **1M times**,
+a data source like `/proc/meminfo`, already tokenized, in memory,
+to extract the same number of expected metrics:
+
+test|code|string comparison|number parsing|duration
+:---:|:---:|:---:|:---:|:---:|
+1|if-else-if-else-if|`strcmp()`|`strtoull()`|4630.337 ms
+2|nested loops|inline `simple_hash()` and `strcmp()`|`strtoull()`|1597.481 ms
+3|nested loops|inline `simple_hash()` and `strcmp()`|`str2ull()`|923.523 ms
+4|if-else-if-else-if|inline `simple_hash()` and `strcmp()`|`strtoull()`| 854.574 ms
+5|if-else-if-else-if|statement expression `simple_hash()` and `strcmp()`|`strtoull()`|912.013 ms
+6|if-continue|inline `simple_hash()` and `strcmp()`|`strtoull()`|842.279 ms
+7|if-else-if-else-if|inline `simple_hash()` and `strcmp()`|`str2ull()`|602.837 ms
+8|ARL|ARL|`strtoull()`|350.360 ms
+9|ARL|ARL|`str2ull()`|157.237 ms
+
+Compared to unoptimized code (test No 1: 4.6sec):
+
+ - before ARL netdata was using test No **7** with hashing and a custom `str2ull()` to achieve 602ms.
+ - the current ARL implementation is test No **9** that needs only 157ms (29 times faster vs unoptimized code, about 4 times faster vs optimized code).
+
+[Check the source code of this test](../../tests/profile/benchmark-value-pairs.c).
+
+## Limitations
+
+Do not use ARL if the a name/keyword may appear more than once in the
+source data.