libnetdata/adaptive_resortable_list/README.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103

<!--
title: "Adaptive Re-sortable List (ARL)"
custom_edit_url: https://github.com/netdata/netdata/edit/master/libnetdata/adaptive_resortable_list/README.md
sidebar_label: "Adaptive Re-sortable List (ARL)"
learn_status: "Published"
learn_topic_type: "Tasks"
learn_rel_path: "Developers/libnetdata libraries"
-->

# Adaptive Re-sortable List (ARL)

This library allows Netdata to read a series of `name - value` pairs
in the **fastest possible way**.

ARLs are used all over Netdata, as they are the most
CPU utilization efficient way to process `/proc` files. They are used to
process both vertical (csv like) and horizontal (one pair per line) `name - value` pairs.

## How ARL works

It maintains a linked list of all `NAME` (keywords), sorted in the
order found in the data source. The linked list is kept
sorted at all times - the data source may change at any time, the
linked list will adapt at the next iteration.

### Initialization

During initialization (just once), the caller:

-   calls `arl_create()` to create the ARL

-   calls `arl_expect()` multiple times to register the expected keywords

The library will call the `processor()` function (given to
`arl_create()`), for each expected keyword found.
The default `processor()` expects `dst` to be an `unsigned long long *`.

Each `name` keyword may have a different `processor()` (by calling
`arl_expect_custom()` instead of `arl_expect()`).

### Data collection iterations

For each iteration through the data source, the caller:

-   calls `arl_begin()` to initiate a data collection iteration.
     This is to be called just ONCE every time the source is re-evaluated.

-   calls `arl_check()` for each entry read from the file.

### Cleanup

When the caller exits:

-   calls `arl_free()` to destroy this and free all memory.

### Performance

ARL maintains a list of `name` keywords found in the data source (even the ones
that are not useful for data collection).

If the data source maintains the same order on the `name-value` pairs, for each
each call to `arl_check()` only an `strcmp()` is executed to verify the
expected order has not changed, a counter is incremented and a pointer is changed.
So, if the data source has 100 `name-value` pairs, and their order remains constant
over time, 100 successful `strcmp()` are executed.

In the unlikely event that an iteration sees the data source with a different order,
for each out-of-order keyword, a full search of the remaining keywords is made. But
this search uses 32bit hashes, not string comparisons, so it should also be fast. 

When all expectations are satisfied (even in the middle of an iteration),
the call to `arl_check()` will return 1, to signal the caller to stop the loop,
saving valuable CPU resources for the rest of the data source. 

In the following test we used alternative methods to process, **1M times**,
a data source like `/proc/meminfo`, already tokenized, in memory,
to extract the same number of expected metrics:

|test|code|string comparison|number parsing|duration|
|:--:|:--:|:---------------:|:------------:|:------:|
|1|if-else-if-else-if|`strcmp()`|`strtoull()`|4630.337 ms|
|2|nested loops|inline `simple_hash()` and `strcmp()`|`strtoull()`|1597.481 ms|
|3|nested loops|inline `simple_hash()` and `strcmp()`|`str2ull()`|923.523 ms|
|4|if-else-if-else-if|inline `simple_hash()` and `strcmp()`|`strtoull()`|854.574 ms|
|5|if-else-if-else-if|statement expression `simple_hash()` and `strcmp()`|`strtoull()`|912.013 ms|
|6|if-continue|inline `simple_hash()` and `strcmp()`|`strtoull()`|842.279 ms|
|7|if-else-if-else-if|inline `simple_hash()` and `strcmp()`|`str2ull()`|602.837 ms|
|8|ARL|ARL|`strtoull()`|350.360 ms|
|9|ARL|ARL|`str2ull()`|157.237 ms|

Compared to unoptimized code (test No 1: 4.6sec):

-   before ARL Netdata was using test No **7** with hashing and a custom `str2ull()` to achieve 602ms.
-   the current ARL implementation is test No **9** that needs only 157ms (29 times faster vs unoptimized code, about 4 times faster vs optimized code).

[Check the source code of this test](https://raw.githubusercontent.com/netdata/netdata/master/tests/profile/benchmark-value-pairs.c).

## Limitations

Do not use ARL if the a name/keyword may appear more than once in the
source data.