Lookup Tables
=============

**NOTE: this is proposed functionality, which is NOT YET IMPLEMENTED!**

**Lookup tables are a powerful construct to obtain "class" information
based on message content (e.g. to build log file names for different
server types, departments or remote offices).**

The base idea is to use a message variable as an index into a table
which then returns another value. For example, $fromhost-ip could be
used as an index, with the table value representing the type of server
or the department or remote office it is located in. A main point with
lookup tables is that the lookup is very fast. So while lookup tables
can be emulated with if-elseif constructs, they are generally much
faster. Also, it is possible to reload lookup tables during rsyslog
runtime without the need for a full restart.

The lookup table itself exists in a separate configuration file (one
per table). This file is loaded on rsyslog startup and when a reload is
requested.

There are different types of lookup tables:

-  **string** - the value to be looked up is an arbitrary string. Only
   exact string matches are found.
-  **array** - the value to be looked up is an integer number from a
   consecutive set. The set does not need to start at zero or one, but
   there must be no number missing. So, for example, 5,6,7,8,9 would be a
   valid set of index values, while 1,2,4,5 would not be (due to the
   missing 3). A match happens if the requested number is present.
-  **sparseArray** - the value to be looked up is an integer value, but
   there may be gaps inside the set of values (usually there are large
   gaps). A typical use case would be the matching of IPv4 address
   information. A match happens on the closest index value that is less
   than or equal to the requested value.

Note that integer index values are represented as unsigned 32-bit numbers.
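
The matching rules of the three table types can be illustrated with a
small Python sketch. It is not rsyslog code; the function names and the
data layout are assumptions made for illustration only, and the
sparseArray rule is read here as "use the closest index that is less than
or equal to the requested value".

```python
from bisect import bisect_right


def lookup_string(table, key, nomatch=""):
    """string type: only an exact match on the key succeeds."""
    return table.get(key, nomatch)


def lookup_array(first_index, values, key, nomatch=""):
    """array type: indexes are consecutive, so the position is key - first_index."""
    pos = key - first_index
    return values[pos] if 0 <= pos < len(values) else nomatch


def lookup_sparse(indexes, values, key, nomatch=""):
    """sparseArray type: indexes are sorted but may have gaps.

    Assumption: the entry with the closest index <= key is the match.
    """
    pos = bisect_right(indexes, key) - 1
    return values[pos] if pos >= 0 else nomatch
```

With indexes 10, 100 and 1000, a sparse lookup of 150 falls into the entry
stored at index 100, while a lookup of 5 finds nothing and yields the
nomatch value.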

Lookup tables can be accessed via the lookup() built-in function. The core
idea is to set a local variable to the lookup result and later on use
that local variable in templates.

More details on usage now follow.

Lookup Table File Format
------------------------

Lookup table files contain a single JSON object. This object consists of
a header and a table part.

Header
~~~~~~

The header is the top-level JSON object. It has the parameters "version",
"nomatch", and "type". The version parameter must be given and must
always be 1 for this version of rsyslog. The nomatch parameter is
optional. If specified, it contains the value to be used if lookup() is
provided an index value for which no entry exists. The default for
"nomatch" is the empty string. The type parameter specifies the type of
lookup to be done.

Table
~~~~~

The "table" parameter must be an array of elements, even if only a single
value exists (for obvious reasons, we do not expect this to occur often).
Each array element must contain two fields: "index" and "value".

Example
~~~~~~~

This is a sample of what an IP-to-office mapping may look like:

::

    { "version":1, "nomatch":"unk", "type":"string",
      "table":[ {"index":"10.0.1.1", "value":"A" },
              {"index":"10.0.1.2", "value":"A" },
              {"index":"10.0.1.3", "value":"A" },
              {"index":"10.0.2.1", "value":"B" },
              {"index":"10.0.2.2", "value":"B" },
              {"index":"10.0.2.3", "value":"B" }
            ]
    }

Note: if a different IP comes in, the value "unk" is returned thanks to
the nomatch parameter in the first line.
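
To make the file format concrete, the following Python sketch parses a
shortened copy of the sample above and performs a string lookup. It is
only an illustration of the described semantics, not the rsyslog
implementation; ``load_table`` and ``lookup`` are hypothetical helper
names.

```python
import json

# Shortened copy of the sample table file shown above.
SAMPLE = """
{ "version":1, "nomatch":"unk", "type":"string",
  "table":[ {"index":"10.0.1.1", "value":"A" },
            {"index":"10.0.2.1", "value":"B" } ]
}
"""


def load_table(text):
    """Parse a table file and return (entries, nomatch)."""
    doc = json.loads(text)
    if doc.get("version") != 1:
        raise ValueError("unsupported lookup table version")
    entries = {e["index"]: e["value"] for e in doc["table"]}
    # "nomatch" is optional and defaults to the empty string.
    return entries, doc.get("nomatch", "")


def lookup(loaded, index):
    """Return the value for index, or the nomatch value if it is absent."""
    entries, nomatch = loaded
    # The index is converted to a string before the lookup is done.
    return entries.get(str(index), nomatch)


office_table = load_table(SAMPLE)
```

Here ``lookup(office_table, "10.0.1.1")`` yields "A", while an address
that is not listed, such as "192.168.0.1", yields "unk".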

RainerScript Statements
-----------------------

lookup\_table() Object
~~~~~~~~~~~~~~~~~~~~~~

This statement defines and initially loads a lookup table. Its format is
as follows:

::

    lookup_table(name="name" file="/path/to/file" reloadOnHUP="on|off")

Parameters
^^^^^^^^^^

-  **name** (mandatory)

   Defines the name of the lookup table for further reference inside the
   configuration. Names must be unique. Note that it is possible, though
   not advisable, to have different names for the same file.
-  **file** (mandatory)

   Specifies the full path for the lookup table file. This file must be
   readable for the user rsyslog is run under (important when dropping
   privileges). It must point to a valid lookup table file as described
   above.
-  **reloadOnHUP** (optional, default "on")

   Specifies if the table shall automatically be reloaded as part of
   HUP processing. For static tables, the default is "off" and
   specifying "on" triggers an error message. Note that the default of
   "on" may be somewhat suboptimal performance-wise, but probably is
   what the user intuitively expects. Turn it off if you know that you
   do not need the automatic reload capability.

lookup() Function
~~~~~~~~~~~~~~~~~

This function is used to actually do the table lookup. Format:

::

    lookup("name", indexvalue)

Parameters
^^^^^^^^^^

-  **return value**

   The function returns the string that is associated with the given
   indexvalue. If the indexvalue is not present inside the lookup table,
   the "nomatch" string is returned (or an empty string if it is not
   defined).
-  **name** (constant string)

   The lookup table to be used. Note that this must be specified as a
   constant. In theory, variable table names could be made possible, but
   their runtime behaviour is not as good as for static names, and we do
   not (yet) see good use cases where dynamic table names could be
   useful.
-  **indexvalue** (expression)

   The value to be looked up. While this is an arbitrary RainerScript
   expression, its final value is always converted to a string in order
   to conduct the lookup. For example, "lookup(table, 3+4)" would be
   exactly the same as "lookup(table, "7")". In most cases, indexvalue
   will probably be a single variable, but it could also be the result
   of all RainerScript-supported expression types (like string
   concatenation or substring extraction). Valid samples are
   "lookup(name, $fromhost-ip & $hostname)" or "lookup(name,
   substr($fromhost-ip, 0, 5))" as well as of course the usual
   "lookup(table, $fromhost-ip)".

load\_lookup\_table Statement
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Note: in the final implementation, this MAY be implemented as an
action. This is a low-level decision that must be made during the detail
development process. Parameters and semantics will remain the same if
this happens.**

This statement is used to reload a lookup table. It will fail if the
table is static. While this statement is executed, lookups to this table
are temporarily blocked. So for large tables, there may be a slight
performance hit during the load phase. It is assumed that a triggering
condition is always used to load the table.

::

    load_lookup_table(name="name" errOnFail="on|off" valueOnFail="value")

Parameters
^^^^^^^^^^

-  **name** (string)

   The lookup table to be used.
-  **errOnFail** (boolean, default "on")

   Specifies whether or not an error message is to be emitted if there
   are any problems reloading the lookup table.
-  **valueOnFail** (optional, string)

   This parameter affects processing if the lookup table cannot be
   loaded for some reason: If the parameter is not present, the previous
   table will be kept in use. If the parameter is given, the previous
   table will no longer be used, and instead an empty table with
   nomatch=valueOnFail will be generated. In short, that means when the
   parameter is set and the reload fails, all matches will always return
   what is specified in valueOnFail.
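
The effect of valueOnFail on a failed reload can be sketched in Python as
follows. This is a simplified model under stated assumptions, not rsyslog
code: ``LookupTable`` and ``reload_table`` are hypothetical names, and a
parse error stands in for "the table cannot be loaded for some reason".

```python
import json


class LookupTable:
    """Minimal string-type table: a dict plus a nomatch value."""

    def __init__(self, entries, nomatch=""):
        self.entries = entries
        self.nomatch = nomatch

    def lookup(self, index):
        return self.entries.get(str(index), self.nomatch)


def reload_table(current, text, value_on_fail=None):
    """Return the table that is in use after a reload attempt."""
    try:
        doc = json.loads(text)
        entries = {e["index"]: e["value"] for e in doc["table"]}
        return LookupTable(entries, doc.get("nomatch", ""))
    except (ValueError, KeyError, TypeError):
        if value_on_fail is None:
            # Parameter not given: keep using the previous table.
            return current
        # Parameter given: empty table whose nomatch is valueOnFail.
        return LookupTable({}, value_on_fail)
```

With ``value_on_fail`` set, every lookup after a failed reload returns
that value, because the replacement table has no entries at all.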

Usage example
~~~~~~~~~~~~~

For clarity, we show only those parts of rsyslog.conf that affect lookup
tables. We use the remote office example, for which a sample lookup table
file is given above.

::

    lookup_table(name="ip2office" file="/path/to/ipoffice.lu"
                 reloadOnHUP="off")


    template(name="depfile" type="string"
             string="/var/log/%$usr.dep%/messages")

    set $usr.dep = lookup("ip2office", $fromhost-ip);
    action(type="omfile" dynfile="depfile")

    # support for reload "commands"
    if $fromhost-ip == "10.0.1.123"
       and $msg contains "reload office lookup table"
       then
       load_lookup_table(name="ip2office" errOnFail="on")

Note: for performance reasons, it makes sense to put the reload command
into a dedicated ruleset, bound to a specific listener, which then
should also be sufficiently secured, e.g. via TLS mutual auth.

Implementation Details
----------------------

The lookup table functionality is implemented via highly efficient
algorithms. The string lookup has O(log n) time complexity. The array
lookup is O(1). In case of sparseArray, we have O(log n).

To preserve space and, more importantly, increase cache hit performance,
equal data values are only stored once, no matter how often a lookup
index points to them.
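
The text above states only the complexity and the fact that equal values
are stored once; it does not name the data structures used. One way to
obtain both properties for the string type is a sorted key list searched
by binary search, with each key pointing into a list of distinct values.
The Python sketch below shows that assumed approach; the function names
are hypothetical.

```python
from bisect import bisect_left


def build_string_table(pairs):
    """Sort the keys for binary search and store each distinct value once."""
    values = []    # each distinct value appears exactly once
    slot_of = {}   # value -> position in `values` (needed at build time only)
    keys, slots = [], []
    for key, value in sorted(pairs):
        if value not in slot_of:
            slot_of[value] = len(values)
            values.append(value)
        keys.append(key)
        slots.append(slot_of[value])
    return keys, slots, values


def lookup(table, key, nomatch=""):
    """O(log n) exact-match lookup via binary search on the sorted keys."""
    keys, slots, values = table
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return values[slots[pos]]
    return nomatch
```

For the office example, the six index entries would share just two stored
values, "A" and "B".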