summaryrefslogtreecommitdiffstats
path: root/source/proposals/lookup_tables.rst
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-15 16:27:18 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-15 16:27:18 +0000
commitf7f20c3f5e0be02585741f5f54d198689ccd7866 (patch)
tree190d5e080f6cbcc40560b0ceaccfd883cb3faa01 /source/proposals/lookup_tables.rst
parentInitial commit. (diff)
downloadrsyslog-doc-f7f20c3f5e0be02585741f5f54d198689ccd7866.tar.xz
rsyslog-doc-f7f20c3f5e0be02585741f5f54d198689ccd7866.zip
Adding upstream version 8.2402.0+dfsg.upstream/8.2402.0+dfsg
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'source/proposals/lookup_tables.rst')
-rw-r--r--source/proposals/lookup_tables.rst240
1 files changed, 240 insertions, 0 deletions
diff --git a/source/proposals/lookup_tables.rst b/source/proposals/lookup_tables.rst
new file mode 100644
index 0000000..f61d673
--- /dev/null
+++ b/source/proposals/lookup_tables.rst
@@ -0,0 +1,240 @@
+Lookup Tables
+=============
+
+**NOTE: this is proposed functionality, which is NOT YET IMPLEMENTED!**
+
+**Lookup tables are a powerful construct to obtain "class" information
+based on message content (e.g. to build log file names for different
+server types, departments or remote offices).**
+
+The base idea is to use a message variable as an index into a table
+which then returns another value. For example, $fromhost-ip could be
+used as an index, with the table value representing the type of server
+or the department or remote office it is located in. A main point with
+lookup tables is that the lookup is very fast. So while lookup tables
+can be emulated with if-elseif constructs, they are generally much
+faster. Also, it is possible to reload lookup tables during rsyslog
+runtime without the need for a full restart.
+
+The lookup tables itself exists in a separate configuration file (one
+per table). This file is loaded on rsyslog startup and when a reload is
+requested.
+
+There are different types of lookup tables:
+
+- **string** - the value to be looked up is an arbitrary string. Only
+ exact some strings match.
+- **array** - the value to be looked up is an integer number from a
+ consecutive set. The set does not need to start at zero or one, but
+ there must be no number missing. So, for example 5,6,7,8,9 would be a
+ valid set of index values, while 1,2,4,5 would not be (due to missing
+ 2). A match happens if the requested number is present.
+- **sparseArray** - the value to be looked up is an integer value, but
+ there may be gaps inside the set of values (usually there are large
+ gaps). A typical use case would be the matching of IPv4 address
+ information. A match happens on the first value that is less than or
+ equal to the requested value.
+
+Note that index integer numbers are represented by unsigned 32 bits.
+
+Lookup tables can be access via the lookup() built-in function. The core
+idea is to set a local variable to the lookup result and later on use
+that local variable in templates.
+
+More details on usage now follow.
+
+Lookup Table File Format
+------------------------
+
+Lookup table files contain a single JSON object. This object contains of
+a header and a table part.
+
+Header
+~~~~~~
+
+The header is the top-level json. It has parameters "version", "nomatch",
+and "type". The version parameter must be given and must always be one
+for this version of rsyslog. The nomatch parameter is optional. If
+specified, it contains the value to be used if lookup() is provided an
+index value for which no entry exists. The default for "nomatch" is the
+empty string. Type specifies the type of lookup to be done.
+
+Table
+~~~~~
+
+This must be an array of elements, even if only a single value exists
+(for obvious reasons, we do not expect this to occur often). Each array
+element must contain two fields "index" and "value".
+
+Example
+~~~~~~~
+
+This is a sample of how an ip-to-office mapping may look like:
+
+::
+
+ { "version":1, "nomatch":"unk", "type":"string",
+ "table":[ {"index":"10.0.1.1", "value":"A" },
+ {"index":"10.0.1.2", "value":"A" },
+ {"index":"10.0.1.3", "value":"A" },
+ {"index":"10.0.2.1", "value":"B" },
+ {"index":"10.0.2.2", "value":"B" },
+ {"index":"10.0.2.3", "value":"B" }
+ ]
+ }
+
+Note: if a different IP comes in, the value "unk" is returned thanks to
+the nomatch parameter in the first line.
+
+RainerScript Statements
+-----------------------
+
+lookup\_table() Object
+~~~~~~~~~~~~~~~~~~~~~~
+
+This statement defines and initially loads a lookup table. Its format is
+as follows:
+
+::
+
+ lookup_table(name="name" file="/path/to/file" reloadOnHUP="on|off")
+
+Parameters
+^^^^^^^^^^
+
+- **name** (mandatory)
+
+ Defines the name of lookup table for further reference inside the
+ configuration. Names must be unique. Note that it is possible, though
+ not advisible, to have different names for the same file.
+- **file** (mandatory)
+
+ Specifies the full path for the lookup table file. This file must be
+ readable for the user rsyslog is run under (important when dropping
+ privileges). It must point to a valid lookup table file as described
+ above.
+- **reloadOnHUP** (optional, default "on")
+
+ Specifies if the table shall automatically be reloaded as part of
+ HUP processing. For static tables, the default is "off" and
+ specifying "on" triggers an error message. Note that the default of
+ "on" may be somewhat suboptimal performance-wise, but probably is
+ what the user intuitively expects. Turn it off if you know that you
+ do not need the automatic reload capability.
+
+lookup() Function
+~~~~~~~~~~~~~~~~~
+
+This function is used to actually do the table lookup. Format:
+
+::
+
+ lookup("name", indexvalue)
+
+Parameters
+^^^^^^^^^^
+
+- **return value**
+
+ The function returns the string that is associated with the given
+ indexvalue. If the indexvalue is not present inside the lookup table,
+ the "nomatch" string is returned (or an empty string if it is not
+ defined).
+- **name** (constant string)
+
+ The lookup table to be used. Note that this must be specified as a
+ constant. In theory, variable table names could be made possible, but
+ their runtime behaviour is not as good as for static names, and we do
+ not (yet) see good use cases where dynamic table names could be
+ useful.
+- **indexvalue** (expression)
+
+ The value to be looked up. While this is an arbitrary RainerScript
+ expression, it's final value is always converted to a string in order
+ to conduct the lookup. For example, "lookup(table, 3+4)" would be
+ exactly the same as "lookup(table, "7")". In most cases, indexvalue
+ will probably be a single variable, but it could also be the result
+ of all RainerScript-supported expression types (like string
+ concatenation or substring extraction). Valid samples are
+ "lookup(name, $fromhost-ip & $hostname)" or "lookup(name,
+ substr($fromhost-ip, 0, 5))" as well as of course the usual
+ "lookup(table, $fromhost-ip)".
+
+load\_lookup\_table Statement
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Note: in the final implementation, this MAY be implemented as an
+action. This is a low-level decesion that must be made during the detail
+development process. Parameters and semantics will remain the same of
+this happens.**
+
+This statement is used to reload a lookup table. It will fail if the
+table is static. While this statement is executed, lookups to this table
+are temporarily blocked. So for large tables, there may be a slight
+performance hit during the load phase. It is assume that always a
+triggering condition is used to load the table.
+
+::
+
+ load_lookup_table(name="name" errOnFail="on|off" valueOnFail="value")
+
+Parameters
+^^^^^^^^^^
+
+- **name** (string)
+
+ The lookup table to be used.
+- **errOnFail** (boolean, default "on")
+
+ Specifies whether or not an error message is to be emitted if there
+ are any problems reloading the lookup table.
+- **valueOnFail** (optional, string)
+
+ This parameter affects processing if the lookup table cannot be
+ loaded for some reason: If the parameter is not present, the previous
+ table will be kept in use. If the parameter is given, the previous
+ table will no longer be used, and instead an empty table be with
+ nomath=valueOnFail be generated. In short, that means when the
+ parameter is set and the reload fails, all matches will always return
+ what is specified in valueOnFail.
+
+Usage example
+~~~~~~~~~~~~~
+
+For clarity, we show only those parts of rsyslog.conf that affect lookup
+tables. We use the remote office example that an example lookup table
+file is given above for.
+
+::
+
+ lookup_table(name="ip2office" file="/path/to/ipoffice.lu"
+ reloadOnHUP="off")
+
+
+ template(name="depfile" type="string"
+ string="/var/log/%$usr.dep%/messages")
+
+ set $usr.dep = lookup("ip2office", $fromhost-ip);
+ action(type="omfile" dynfile="depfile")
+
+ # support for reload "commands"
+ if $fromhost-ip == "10.0.1.123"
+ and $msg contains "reload office lookup table"
+ then
+ load_lookup_table(name="ip2office" errOnFail="on")
+
+Note: for performance reasons, it makes sense to put the reload command
+into a dedicated ruleset, bound to a specific listener - which than
+should also be sufficiently secured, e.g. via TLS mutual auth.
+
+Implementation Details
+----------------------
+
+The lookup table functionality is implemented via highly efficient
+algorithms. The string lookup has O(log n) time complexity. The array
+lookup is O(1). In case of sparseArray, we have O(log n).
+
+To preserve space and, more important, increase cache hit performance,
+equal data values are only stored once, no matter how often a lookup
+index points to them.
+