author    Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-15 16:27:18 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-15 16:27:18 +0000
commit    f7f20c3f5e0be02585741f5f54d198689ccd7866 (patch)
tree      190d5e080f6cbcc40560b0ceaccfd883cb3faa01 /source/concepts
parent    Initial commit. (diff)
Adding upstream version 8.2402.0+dfsg. (upstream/8.2402.0+dfsg)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'source/concepts')
-rw-r--r--  source/concepts/index.rst          |  16
-rw-r--r--  source/concepts/janitor.rst        |  41
-rw-r--r--  source/concepts/messageparser.rst  | 304
-rw-r--r--  source/concepts/multi_ruleset.rst  | 320
-rw-r--r--  source/concepts/netstrm_drvr.rst   |  21
-rw-r--r--  source/concepts/ns_gtls.rst        |  91
-rw-r--r--  source/concepts/ns_ossl.rst        |  76
-rw-r--r--  source/concepts/ns_ptcp.rst        |  13
-rw-r--r--  source/concepts/queues.rst         | 524
-rw-r--r--  source/concepts/rfc5424layers.png  | bin 0 -> 10605 bytes
10 files changed, 1406 insertions, 0 deletions
diff --git a/source/concepts/index.rst b/source/concepts/index.rst
new file mode 100644
index 0000000..2119e40
--- /dev/null
+++ b/source/concepts/index.rst
@@ -0,0 +1,16 @@
+Concepts
+========
+
+This chapter describes important rsyslog concepts and objects. Where
+appropriate, it also refers to the configuration settings that affect
+the respective objects.
+
+.. toctree::
+ :maxdepth: 2
+
+ queues
+ janitor
+ messageparser
+ multi_ruleset
+ netstrm_drvr
+
diff --git a/source/concepts/janitor.rst b/source/concepts/janitor.rst
new file mode 100644
index 0000000..52765ed
--- /dev/null
+++ b/source/concepts/janitor.rst
@@ -0,0 +1,41 @@
+The Janitor Process
+===================
+The janitor process carries out periodic cleanup tasks. For example,
+it is used by
+:doc:`omfile <../configuration/modules/omfile>`
+to close files after a timeout has expired.
+
+The janitor runs periodically. As such, all tasks carried out via the
+janitor will be activated based on the interval at which it runs. This
+means that all janitor-related times set are approximate and should be
+considered as "no earlier than" (NET). If, for example, you set a timeout
+to 5 minutes and the janitor is run in 10-minute intervals, the timeout
+may actually happen after 5 minutes, but it may also take up to 20
+minutes for it to be detected.
+
+In general (see the note about HUP below), janitor-based activities scheduled
+to occur after *n* minutes will occur between *n* and *(n + 2\*janitorInterval)*
+minutes.
+
+To reduce the potential delay caused by janitor invocation,
+:ref:`the interval at which the janitor runs can be adjusted <global_janitorInterval>`\ .
+If high precision is
+required, it should be set to one minute. Janitor-based activities will
+still be NET times, but the time frame will be much smaller. In the
+example with the file timeout, it would be between 5 and 6 minutes if the
+janitor is run at a one-minute interval.
+
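+For example, a minimal sketch of such a setting in RainerScript format
+(assuming the global parameter name from the reference linked above and a
+value given in minutes):
+
+::
+
+    global(janitorInterval="1")   # run the janitor every minute
+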
+Note that the more frequently the janitor runs, the more often the
+system needs to wake up from a potential low-power state. This is not an
+issue for data center machines (which usually always run at full speed), but it
+may be an issue for power-constrained environments like notebooks. For
+such systems, a higher janitor interval may make sense.
+
+As a special case, sending a HUP signal to rsyslog also activates the
+janitor process. This can lead to too-frequent wakeups of janitor-related
+services. However, we don't expect this to cause any issues. If it does,
+it could be solved by creating a separate thread for the janitor. But as
+this takes up some system resources and is not considered useful, we
+have not implemented it that way. If the HUP/janitor interaction causes
+problems, let the rsyslog team know and we can change the implementation.
+
diff --git a/source/concepts/messageparser.rst b/source/concepts/messageparser.rst
new file mode 100644
index 0000000..f9886be
--- /dev/null
+++ b/source/concepts/messageparser.rst
@@ -0,0 +1,304 @@
+Message parsers in rsyslog
+==========================
+
+Written by `Rainer Gerhards <http://www.gerhards.net/rainer>`_
+(2009-11-06)
+
+Intro
+-----
+
+Message parsers are a feature of rsyslog 5.3.4 and above. In this
+article, I describe what message parsers are, what they can do and how
+they relate to the relevant standards. I will also describe what you
+cannot do with them. Finally, I give some advice on implementing your own
+custom parser.
+
+What are message parsers?
+-------------------------
+
+Well, the quick answer is that message parsers are the component of
+rsyslog that parses the syslog message after it has been received. Prior
+to rsyslog 5.3.4, message parsers were built into the rsyslog core
+itself and could not be modified (other than by modifying the rsyslog
+code).
+
+In 5.3.4, we changed that: message parsers are now loadable modules
+(just like input and output modules). That means that new message
+parsers can be added without modifying the rsyslog core, even without
+contributing something back to the project.
+
+But that doesn't answer what a message parser really is. What does it
+mean to "parse a message" and, maybe more importantly, what is a
+message? To answer these questions correctly, we need to dig down into
+the relevant standards. `RFC5424 <http://tools.ietf.org/html/rfc5424>`_
+specifies a layered architecture for the syslog protocol:
+
+.. figure:: rfc5424layers.png
+ :align: center
+ :alt: RFC5424 syslog protocol layers
+
+ RFC5424 syslog protocol layers
+
+The important distinction for us is between the syslog transport and the
+upper layers. The transport layer specifies how a stream of messages is
+assembled at the sender side and how this stream of messages is
+disassembled into the individual messages at the receiver side. In
+networking terminology, this is called "framing". The core idea is that
+each message is put into a so-called "frame", which then is transmitted
+over the communications link.
+
+The framing used depends on the protocol. For example, in UDP the
+"frame"-equivalent is a packet that is being sent (this also means that
+no two messages can travel within a single UDP packet). In "plain tcp
+syslog", the industry standard, LF is used as a frame delimiter (which
+also means that no multi-line message can properly be transmitted, a
+"design" flaw in plain tcp syslog). In
+`RFC5425 <http://tools.ietf.org/html/rfc5425>`_ there is a header in
+front of each frame that contains the size of the message. With this
+framing, any message content can properly be transferred.
+
+And now comes the important part: **message parsers do NOT operate at
+the transport layer**, they operate, as their name implies, on messages.
+So we can not use message parsers to change the underlying framing. For
+example, if a sender splits (for whatever reason) a single message into
+two and encapsulates these into two frames, there is no way a message
+parser could undo that.
+
+A typical example may be a multi-line message: let's assume some
+originator has generated a message for the format "A\\nB" (where \\n
+means LF). If that message is being transmitted via plain tcp syslog,
+the frame delimiter is LF. So the sender will delimit the frame with LF,
+but otherwise send the message unmodified onto the wire (because that is
+how things are -unfortunately- done in plain tcp syslog...). So the wire
+will see "A\\nB\\n". When this arrives at the receiver, the transport
+layer will undo the framing. When it sees the LF after A, it thinks it
+has found a valid frame delimiter (in fact, this is the correct view!). So
+the receiver will extract one complete message A and one complete message
+B, not knowing that they once were both part of a large multi-line
+message. These two messages are then passed to the upper layers, where
+the message parsers receive them and extract information. However, the
+message parsers never know (or even have a chance to see) that A and B
+belonged together. Even further, in rsyslog there is no guarantee that A
+will be parsed before B - concurrent operations may cause the reverse
+order (and do so very validly).
+
+The important lesson is: **message parsers can not be used to fix a
+broken framing**. You need a full protocol implementation to do that,
+which is the domain of input and output modules.
+
+I have now told you what you can not do with message parsers. But what
+are they good for? Thankfully, broken framing is not the primary problem
+of the syslog world. A wealth of different formats is. Unfortunately,
+many real-world implementations violate the relevant standards in one
+way or another. That makes it often very hard to extract meaningful
+information from a message or to process messages from different sources
+by the same rules. In my article `syslog parsing in
+rsyslog <syslog_parsing.html>`_ I have elaborated on all the real-world
+evil that you can usually see. So I won't repeat that here. But in
+short, the real problem is not the framing, but how to make malformed
+messages look well-formed.
+
+**This is what message parsers permit you to do: take a (well-known)
+malformed message, parse it according to its semantics and generate
+perfectly valid internal message representations from it.** So as long
+as messages are consistently in the same wrong format (and they usually
+are!), a message parser can look at that format, parse it, and make the
+message processable just as if it were well-formed in the first place.
+Plus, one can abuse the interface to do some other "interesting" tricks,
+but that would take us too far.
+
+While this functionality may not sound exciting, it actually solves a
+very big issue (which you only really understand if you have managed a
+system with various different syslog sources). Note that we were often
+able to process malformed messages in the past with the help of the
+property replacer and regular expressions. While this is nice, it has a
+performance hit. A message parser is C code, compiled to native machine
+code, and thus typically much faster than any regular expression
+based method (depending, of course, on the quality of the
+implementation...).
+
+How are message parsers used?
+-----------------------------
+
+In a simplified view, rsyslog
+
+#. first receives messages (via the input module),
+#. *then parses them (at the message level!)* and
+#. then processes them (operating on the internal message
+ representation).
+
+Message parsers are utilized in the second step (written in italics).
+Thus, they take the raw message (NOT frame!) received from the remote
+system and create the internal structure out of it that the other parts
+of rsyslog need in order to perform their processing. Parsing is vital,
+because an unparsed message can not be processed in the third stage, the
+actual application-level processing (like forwarding or writing to
+files).
+
+Parser Chains and how they Operate
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Rsyslog chains parsers together to provide flexibility. A **parser
+chain** contains all parsers that can potentially be used to parse a
+message. It is assumed that there is some way a parser can detect if the
+message it is being presented with is supported by it. If so, the parser will
+tell the rsyslog engine and parse the message. The rsyslog engine now
+calls each parser inside the chain (in sequence!) until the first parser
+is able to parse the message. After one parser has been found, the
+message is considered parsed and no other parsers are called on that
+message.
+
+Side-note: this method implies there are some "not-so-dirty" tricks
+available to modify the message by a parser module that declares itself
+as "unable to parse" but still does some message modification. This was
+not a primary design goal, but may be utilized, and the interface
+probably extended, to support generic filter modules. These would need
+to go to the root of the parser chain. As mentioned, the current system
+already supports this.
+
+The position inside the parser chain can be thought of as a priority:
+parsers sitting earlier in the chain take precedence over those sitting
+later in it. So more specific parsers should go earlier in the chain. A
+good example of how this works is the default parser set provided by
+rsyslog: rsyslog.rfc5424 and rsyslog.rfc3164, each one parses according
+to the RFC that has named it. RFC5424 was designed to be distinguishable
+from RFC3164 messages by the sequence "1 " immediately after the
+so-called PRI-part (don't worry about these words, it is sufficient if
+you understand there is a well-defined sequence used to identify RFC5424
+messages). In contrast, RFC3164 actually permits everything as a valid
+message. Thus the RFC3164 parser will always parse a message, sometimes
+with quite unexpected outcome (there is a lot of guesswork involved in
+that parser, which unfortunately is unavoidable due to existing
+technology limits). So the default parser chain is to try the RFC5424
+parser first and after it the RFC3164 parser. If we have a
+5424-formatted message, that parser will identify and parse it and the
+rsyslog engine will stop processing. But if we receive a legacy syslog
+message, the RFC5424 parser will detect that it can not parse it and return this
+status to the engine, which then calls the next parser inside the chain.
+That usually happens to be the RFC3164 parser, which will always process
+the message. But there could also be any other parser inside the chain,
+and then each one would be called until one that is able to parse the
+message is found.
+
+If we reversed the parser order, RFC5424 messages would be incorrectly
+parsed. Why? Because the RFC3164 parser will always parse every message,
+so if it were asked first, it would parse (and misinterpret) the
+5424-formatted message, report that it did so, and the rsyslog engine would
+never call the 5424 parser. So the order of the sequence is very important.
+
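+For illustration, here is roughly what the two formats look like on the
+wire (sample messages adapted from the respective RFCs, not taken from
+any real device):
+
+::
+
+    <34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed for lonvick
+    <34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8
+
+The "1 " right after the "<34>" PRI-part marks the first message as
+RFC5424; the second, legacy-format message has no such marker.
+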
+What happens if no parser in the chain could parse a message? Well, then
+we could not obtain the in-memory representation that is needed to
+further process the message. In that case, rsyslog has no other choice
+than to discard the message. If it does so, it will emit a warning
+message, but only for the first 1,000 incidents. This limit is a safety
+measure against message-loops, which otherwise could quickly result from
+a parser chain misconfiguration. **If you do not tolerate loss of
+unparsable messages, you must ensure that each message can be parsed.**
+You can easily achieve this by always using the "rsyslog.rfc3164" parser
+as the *last* parser inside parser chains. That may result in invalid
+parsing, but you will have a chance to see the invalid message (in debug
+mode, a warning message will be written to the debug log each time a
+message is dropped due to inability to parse it).
+
+Where are parser chains used?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We now know what parser chains are and how they operate. The question is
+now how many parser chains can be active and how it is decided which
+parser chain is used on which message. This is controlled via
+:doc:`rsyslog's rulesets <multi_ruleset>`. In short, multiple rulesets can be
+defined and there always exists at least one ruleset.
+A parser chain is bound to a
+specific ruleset. This is done by virtue of defining parsers via the
+:doc:`$RulesetParser <../configuration/ruleset/rsconf1_rulesetparser>`
+configuration directive
+(for specifics, see there). If no such directive is specified, the
+default parser chain is used. As of this writing, the default parser
+chain always consists of "rsyslog.rfc5424", "rsyslog.rfc3164", in that
+order. As soon as a parser is configured, the default list is cleared
+and the new parser is added to the end of the (initially empty)
+ruleset's parser chain.
+
+The important point to know is that parser chains are defined on a
+per-ruleset basis.
+
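+As a sketch, binding parsers to a ruleset in legacy format could look
+like this (the custom parser name "custom.pmexample" is a placeholder for
+a parser module you have loaded yourself; the RFC3164 parser is added as
+the catch-all at the end of the chain):
+
+::
+
+    $RuleSet remote
+    $RulesetParser custom.pmexample
+    $RulesetParser rsyslog.rfc3164
+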
+Can I use different parser chains for different devices?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The correct answer is: generally yes, but it depends. First of all,
+remember that input modules (and specific listeners) may be bound to
+specific rulesets. As parser chains "reside" in rulesets, binding to a
+ruleset also binds to the parser chain that is bound to that ruleset. As
+a number one prerequisite, the input module must support binding to
+different rulesets. Not all do, but their number is growing. For
+example, the important `imudp <imudp.html>`_ and `imtcp <imtcp.html>`_
+input modules support that functionality. Those that do not (for example
+`im3195 <im3195>`_) can only utilize the default ruleset and thus the
+parser chain defined in that ruleset.
+
+If you do not know if the input module in question supports ruleset
+binding, check its documentation page. Those that support it have the
+required directives.
+
+Note that it is currently under evaluation if rsyslog will support
+binding parser chains to specific inputs directly, without depending on
+the ruleset. There are some concerns that this may not be necessary but
+adds considerable complexity to the configuration. So this may or may
+not be possible in the future. In any case, if we decide to add it,
+input modules need to support it, so this functionality would require
+some time to implement.
+
+The cookbook recipe for using different parsers for different devices
+is given as an actual in-depth example in the
+`$RulesetParser` configuration directive
+doc page. In short, it is accomplished by defining specific rulesets for
+the required parser chains, defining different listener ports for each
+of the devices with different format and binding these listeners to the
+correct ruleset (and thus parser chains). Using that approach, a variety
+of different message formats can be supported via a single rsyslog
+instance.
+
+Which message parsers are available
+-----------------------------------
+
+As of this writing, there exist only two message parsers, one for
+RFC5424 format and one for legacy syslog (loosely described in
+`RFC3164 <http://tools.ietf.org/html/rfc3164>`_). These parsers are
+built-in and need not be explicitly loaded. However, message parsers can
+be added with relative ease by anyone who knows how to code in C. Then, they
+can be loaded via $ModLoad just like any other loadable module. It is
+expected that additional message parsers will be contributed to the rsyslog
+project over time, so that at some point there hopefully is a rich
+choice of them (I intend to add a browsable repository as soon as new
+parsers pop up).
+
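+To illustrate the loading mechanism in legacy format (the module name
+"pmexample" is purely hypothetical; once loaded, it would be selected via
+$RulesetParser under whatever parser name it registers):
+
+::
+
+    $ModLoad pmexample
+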
+How to write a message parser?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As a prerequisite, you need to know the exact format that the device is
+sending. Then, you need moderate C coding skills and a little bit of
+knowledge about rsyslog internals. I guess the rsyslog-specific part should not be that
+hard, as almost all information can be gained from the existing parsers.
+They are rather simple in structure and can be found under the "./tools"
+directory. They are named pmrfc3164.c and pmrfc5424.c. You need to
+follow the usual loadable module guidelines. It is my expectation that
+writing a parser should typically not take longer than a single day,
+with maybe a day more to get acquainted with rsyslog. Of course, I am
+not sure if the number is actually right.
+
+If you cannot program or have no time to do it, Adiscon can also write
+a message parser for you as part of the `rsyslog professional services
+offering <http://www.rsyslog.com/professional-services>`_.
+
+Conclusion
+----------
+
+Malformed syslog messages are a pain and unfortunately often seen in
+practice. Message parsers provide a fast and efficient solution for this
+problem. Different parsers can be defined for different devices, and
+they all convert message information into rsyslog's well-defined
+internal format. Message parsers were first introduced in rsyslog 5.3.4
+and also offer some interesting ideas that may be explored in the future
+- up to full message normalization capabilities. It is strongly
+recommended that anyone with a heterogeneous environment take a look at
+message parser capabilities.
diff --git a/source/concepts/multi_ruleset.rst b/source/concepts/multi_ruleset.rst
new file mode 100644
index 0000000..512558e
--- /dev/null
+++ b/source/concepts/multi_ruleset.rst
@@ -0,0 +1,320 @@
+Multiple Rulesets in rsyslog
+============================
+
+Starting with version 4.5.0 and 5.1.1,
+`rsyslog <http://www.rsyslog.com>`_ supports multiple rulesets within a
+single configuration. This is especially useful for routing the
+reception of remote messages to a set of specific rules. Note that the
+input module must support binding to non-standard rulesets, so the
+functionality may not be available with all inputs.
+
+In this document, I am using :doc:`imtcp <../configuration/modules/imtcp>`, an input module that
+supports binding to non-standard rulesets since rsyslog started to
+support them.
+
+What is a Ruleset?
+------------------
+
+If you have worked with (r)syslog.conf, you know that it is made up of
+what I call rules (others tend to call them selectors, a sysklogd term).
+Each rule consists of a filter and one or more actions to be carried out
+when the filter evaluates to true. A filter may be as simple as a
+traditional syslog priority based filter (like "\*.\*" or "mail.info")
+or as complex as a script-like expression. Details on that are covered in
+the config file documentation. After the filter come action specifiers,
+and an action is something that does something to a message, e.g. write
+it to a file or forward it to a remote logging server.
+
+A traditional configuration file is made up of one or more of these
+rules. When a new message arrives, its processing starts with the first
+rule (in order of appearance in rsyslog.conf) and continues for each
+rule until either all rules have been processed or a so-called "discard"
+action happens, in which case processing stops and the message is thrown
+away (which also happens after the last rule has been processed).
+
+The **multi-ruleset** support now makes it possible to specify more than one such
+rule sequence. You can think of a traditional config file as just a
+single default rule set, which is automatically bound to each of the
+inputs. This is even what actually happens. When rsyslog.conf is
+processed, the config file parser looks for the directive
+
+::
+
+ ruleset(name="rulesetname")
+
+where "rulesetname" is any name the user likes (but must not start with
+"RSYSLOG\_", which is the name space reserved for rsyslog use). If it
+finds this directive, it begins a new ruleset (if the name was not yet
+known) or switches to an already-existing one (if the name was known).
+All rules defined between this ruleset directive and the next one are
+appended to the named ruleset. Note that the reserved name
+"RSYSLOG\_DefaultRuleset" is used to specify rsyslogd's default ruleset.
+You can use that name wherever you can use a ruleset name, including
+when binding an input to it.
+
+Inside a ruleset, messages are processed as described above: they start
+with the first rule and rules are processed in the order of appearance
+in the configuration file until either there are no more rules or the
+discard action is executed. Note that with multiple rulesets no longer
+**all** rsyslog.conf rules are executed but **only** those that are
+contained within the specific ruleset.
+
+Inputs must explicitly bind to rulesets. If they don't, they are bound
+to the default ruleset.
+
+This brings up the next question:
+
+What does "To bind to a Ruleset" mean?
+--------------------------------------
+
+This term is used in the same sense as "to bind an IP address to an
+interface": it means that a specific input, or part of an input (like a
+tcp listener) will use a specific ruleset to "pass its messages to". So
+when a new message arrives, it will be processed via the bound ruleset.
+Rules from all other rulesets are irrelevant and will never be processed.
+
+This makes multiple rulesets very handy for processing local and remote
+messages via separate means: bind the respective receivers to different
+rule sets, and you do not need to separate the messages by any other
+method.
+
+Binding to rulesets is input-specific. For imtcp, this is done via the
+
+::
+
+ input(type="imtcp" port="514" ruleset="rulesetname")
+
+directive. Note that "rulesetname" must be the name of a ruleset that is
+already defined at the time the bind directive is given. There are many
+ways to make sure this happens, but I personally think that it is best
+to define all rule sets at the top of rsyslog.conf and define the inputs
+at the bottom. This kind of reverses the traditional recommended
+ordering, but seems to be a really useful and straightforward way of
+doing things.
+
+Why are rulesets important for different parser configurations?
+---------------------------------------------------------------
+
+Custom message parsers, used to handle different (and potentially
+otherwise-invalid) message formats, can be bound to rulesets. So
+multiple rulesets can be a very useful way to handle devices sending
+messages in different malformed formats in a consistent way.
+Unfortunately, this is not uncommon in the syslog world. An in-depth
+explanation with configuration sample can be found at the
+:doc:`$RulesetParser <../configuration/ruleset/rsconf1_rulesetparser>` configuration directive.
+
+Can I use a different Ruleset as the default?
+---------------------------------------------
+
+This is possible by using the
+
+::
+
+ $DefaultRuleset <name>
+
+directive. Please note, however, that this directive is actually global:
+that is, it does not modify the ruleset to which the next input is bound
+but rather provides a system-wide default rule set for those inputs that
+did not explicitly bind to one. As such, the directive can not be used
+as a work-around to bind non-default rulesets to inputs that do not
+support ruleset binding.
+
+Rulesets and Queues
+-------------------
+
+By default, rulesets do not have their own queue. It must be activated
+via the $RulesetCreateMainQueue directive or, if using RainerScript
+format, by specifying queue parameters on the ruleset directive, e.g.
+
+::
+
+ ruleset(name="whatever" queue.type="fixedArray" queue. ...)
+
+See :doc:`http://www.rsyslog.com/doc/master/rainerscript/queue\_parameters.html <../rainerscript/queue_parameters>`
+for more details.
+
+Please note that when a ruleset uses its own queue, processing of the ruleset
+happens **asynchronously** to the rest of processing. As such, any modifications
+made to the message object (e.g. message or local variables that are set) or
+discarding of the message object **have no effect outside that ruleset**. So
+if you want to modify the message object inside the ruleset, you **cannot**
+define a queue for it. Most importantly, you cannot call such a ruleset and expect the
+modified properties to be present when the call returns. Even more so, the
+call will most probably return before processing of the message has even
+begun in the ruleset in question.
+
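+As a small sketch of this pitfall (ruleset and variable names are made up
+for illustration): because the called ruleset below has its own queue,
+the call returns immediately and the variable set inside it is not
+visible to the caller.
+
+::
+
+    ruleset(name="withqueue" queue.type="fixedArray") {
+        set $!tag = "seen";    # takes effect only inside this async ruleset
+    }
+    ruleset(name="caller") {
+        call withqueue
+        # $!tag is most probably still unset at this point
+    }
+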
+Note that in RainerScript format specifying any "queue.\*" parameter can cause the
+creation of a dedicated queue and as such asynchronous processing. This is
+because queue parameters cannot be specified without a queue. Note, though,
+that the actual creation is **guaranteed** only if "queue.type" is specified
+as above. So if you intentionally want to assign a separate queue to the
+ruleset, do so as shown above.
+
+Examples
+--------
+
+Split local and remote logging
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Let's say you have a pretty standard system that logs its local messages
+to the usual bunch of files that are specified in the default
+rsyslog.conf. As an example, your rsyslog.conf might look like this:
+
+::
+
+ # ... module loading ...
+ # The authpriv file has restricted access.
+ authpriv.* /var/log/secure
+ # Log all the mail messages in one place.
+ mail.* /var/log/maillog
+ # Log cron stuff
+ cron.* /var/log/cron
+ # Everybody gets emergency messages
+ *.emerg *
+ ... more ...
+
+Now, you want to receive messages from a remote system and log these
+to a special file, but you do not want to have these messages written to
+the files specified above. The traditional approach is to add a rule in
+front of all others that filters on the message, processes it and then
+discards it:
+
+::
+
+ # ... module loading ...
+ # process remote messages
+ if $fromhost-ip == '192.0.2.1' then {
+ action(type="omfile" file="/var/log/remotefile02")
+ stop
+ }
+
+
+ # only messages not from 192.0.2.1 make it past this point
+
+ # The authpriv file has restricted access.
+ authpriv.* /var/log/secure
+ # Log all the mail messages in one place.
+ mail.* /var/log/maillog
+ # Log cron stuff
+ cron.* /var/log/cron
+ # Everybody gets emergency messages
+ *.emerg *
+ ... more ...
+
+Note that "stop" is the discard action! Also note that we assume that
+192.0.2.1 is the sole remote sender (to keep it simple).
+
+With multiple rulesets, we can simply define a dedicated ruleset for the
+remote reception case and bind it to the receiver. This may be written
+as follows:
+
+::
+
+ # ... module loading ...
+ # process remote messages
+ # define new ruleset and add rules to it:
+ ruleset(name="remote"){
+ action(type="omfile" file="/var/log/remotefile")
+ }
+ # local rules below never see the remote messages, because those are
+ # handled completely inside the "remote" ruleset
+
+ # bind ruleset to tcp listener and activate it:
+ input(type="imptcp" port="10514" ruleset="remote")
+
+Split local and remote logging for three different ports
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This example is almost like the first one, but it extends it a little
+bit. While it is very similar, I hope it is different enough to provide
+a useful example why you may want to have more than two rulesets.
+
+Again, we would like to use the "regular" log files for local logging,
+only. But this time we set up three syslog/tcp listeners, each one
+listening to a different port (in this example 10514, 10515, and 10516).
+Logs received from these receivers shall go into different files. Also,
+logs received from 10516 (and only from that port!) with "mail.\*"
+priority shall be written into a specific file and **not** be written to
+10516's general log file.
+
+This is the config:
+
+::
+
+ # ... module loading ...
+ # process remote messages
+
+ ruleset(name="remote10514"){
+ action(type="omfile" file="/var/log/remote10514")
+ }
+
+ ruleset(name="remote10515"){
+ action(type="omfile" file="/var/log/remote10515")
+ }
+
+ ruleset(name="remote10516"){
+     if prifilt("mail.*") then {
+         action(type="omfile" file="/var/log/mail10516")
+         stop
+         # note that the stop-command will prevent this message from
+         # being written to the remote10516 file - as usual...
+     }
+     action(type="omfile" file="/var/log/remote10516")
+ }
+
+
+ # and now define listeners bound to the relevant ruleset
+ input(type="imptcp" port="10514" ruleset="remote10514")
+ input(type="imptcp" port="10515" ruleset="remote10515")
+ input(type="imptcp" port="10516" ruleset="remote10516")
+
+Performance
+-----------
+
+Fewer Filters
+~~~~~~~~~~~~~
+
+No rule processing can be faster than not processing a rule at all. As
+such, it is useful for a high performance system to identify disjoint
+actions and try to split these off to different rule sets. In the
+example section, we had a case where three different tcp listeners need
+to write to three different files. This is a perfect example of where
+multiple rule sets are easier to use and offer more performance. The
+performance is better simply because there is no need to filter on the
+reception service - instead messages are automatically pushed to the
+right rule set and can be processed by very simple rules (maybe even
+with "\*.\*"-filters, the fastest ones available).
+
+Partitioning of Input Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Starting with rsyslog 5.3.4, rulesets permit higher concurrency. They
+offer the ability to run on their own "main" queue. What that means is
+that a dedicated queue is associated with a specific rule set. That means that
+inputs bound to that ruleset no longer need to compete with each
+other when they enqueue a data element into the queue. Instead, enqueue
+operations can be completed in parallel.
+
+An example: let us assume we have three TCP listeners. Without rulesets,
+each of them needs to insert messages into the main message queue. So if
+each of them wants to submit a newly arrived message into the queue at
+the same time, only one can do so while the others need to wait. With
+multiple rulesets, a separate queue can be created for each ruleset. If now
+each listener is bound to its own ruleset, concurrent message submission
+is possible. On a machine with a sufficiently large number of cores,
+this can result in dramatic performance improvement.
+
+It is highly advised that high-performance systems define a dedicated
+ruleset, with a dedicated queue for each of the inputs.
+
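+A minimal sketch of that advice in RainerScript format (reusing one of
+the listener ports from the example above; the queue type and size are
+placeholders to be tuned for the actual workload):
+
+::
+
+    ruleset(name="remote10514" queue.type="fixedArray" queue.size="50000") {
+        action(type="omfile" file="/var/log/remote10514")
+    }
+    input(type="imptcp" port="10514" ruleset="remote10514")
+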
+By default, rulesets do **not** have their own queue. It must be
+activated via the
+:doc:`$RulesetCreateMainQueue <../configuration/ruleset/rsconf1_rulesetcreatemainqueue>`
+directive.
+
+See Also
+--------
+.. toctree::
+ :maxdepth: 1
+
+ ../historical/multi_ruleset_legacy_format_samples
+
diff --git a/source/concepts/netstrm_drvr.rst b/source/concepts/netstrm_drvr.rst
new file mode 100644
index 0000000..182fcdf
--- /dev/null
+++ b/source/concepts/netstrm_drvr.rst
@@ -0,0 +1,21 @@
+NetStream Drivers
+=================
+
+Network stream drivers are a layer between various parts of rsyslogd
+(e.g. the imtcp module) and the transport layer. They provide sequenced
+delivery, authentication and confidentiality to the upper layers.
+Drivers implement different capabilities.
+
+Users need to know about netstream drivers because they need to
+configure the proper driver, and proper driver properties, to achieve
+desired results (e.g. :doc:`TLS-protected syslog <../tutorials/tls>`).
+
+Current Network Stream Drivers
+------------------------------
+
+.. toctree::
+ :maxdepth: 2
+
+ ns_ptcp
+ ns_gtls
+ ns_ossl
diff --git a/source/concepts/ns_gtls.rst b/source/concepts/ns_gtls.rst
new file mode 100644
index 0000000..17fe29c
--- /dev/null
+++ b/source/concepts/ns_gtls.rst
@@ -0,0 +1,91 @@
+gtls Network Stream Driver
+==========================
+
+This network stream driver implements a TLS
+protected transport via the `GnuTLS
+library <http://www.gnu.org/software/gnutls/>`_.
+
+**Available since:** 3.19.0 (suggested minimum 3.19.8 and above)
+
+Supported Driver Modes
+======================
+
+- **0** - unencrypted transmission (just like `ptcp <ns_ptcp.html>`_ driver)
+- **1** - TLS-protected operation
+
+.. note::
+
+ Mode 0 does not provide any benefit over the ptcp driver. This
+ mode exists for technical reasons, but should not be used. It may be
+ removed in the future.
+
+Supported Authentication Modes
+==============================
+
+- **anon** - anonymous authentication as described in IETF's
+ draft-ietf-syslog-transport-tls-12 Internet draft
+
+- **x509/fingerprint** - certificate fingerprint authentication as
+ described in IETF's draft-ietf-syslog-transport-tls-12 Internet draft.
+ The fingerprint must be provided as the SHA1 or the SHA256 hex string of
+ the certificate. Multiple values must be separated by comma (,).
+ A valid configuration would be, for example:
+ ::
+
+ StreamDriverPermittedPeers="SHA256:10:C4:26:1D:CB:3C:AB:12:DB:1A:F0:47:37:AE:6D:D2:DE:66:B5:71:B7:2E:5B:BB:AE:0C:7E:7F:5F:0D:E9:64,SHA1:DD:23:E3:E7:70:F5:B4:13:44:16:78:A5:5A:8C:39:48:53:A6:DD:25"
+
+- **x509/certvalid** - certificate validation only
+
+- **x509/name** - certificate validation and subject name authentication as
+ described in IETF's draft-ietf-syslog-transport-tls-12 Internet draft
+
+.. note::
+
+ "anon" does not permit authentication of the remote peer. As such,
+ this mode is vulnerable to man-in-the-middle attacks as well as
+ unauthorized access. It is recommended NOT to use this mode.
+ A certificate/key does not need to be configured in this authmode.
+
+.. note::
+
+ **Anon mode changes in:** v8.190 (or above)
+
+ - Anonymous Ciphers (DH and ECDH) are available in ANON mode.
+ Note: ECDH is not available on GnuTLS versions below 3.x.
+ - Server does not require a certificate anymore in anon mode.
+ - If Server has a certificate and the Client does not, the highest possible
+ ciphers will be selected.
+ - If both Server and Client do not have a certificate, the highest available
+ anon cipher will be used.
+
+x509/certvalid is a nonstandard mode. It validates the remote peer's
+certificate, but does not check the subject name. This is weak
+authentication that may be useful in scenarios where multiple devices
+are deployed and it is sufficient proof of authenticity that their
+certificates are signed by the CA the server trusts. This is better than
+anon authentication, but still not recommended.
+
+
+CheckExtendedKeyPurpose
+=======================
+
+- **off** - by default this binary argument is turned off, which means
+ that the Extended Key Usage extension of GnuTLS certificates is ignored
+ during certificate validation.
+
+- **on** - if you turn this option on, it will check that the peer's certificate
+ contains the value GNUTLS_KP_TLS_WWW_SERVER or GNUTLS_KP_TLS_WWW_CLIENT
+ respectively, depending on whether we are on the sending or receiving end of the
+ connection.
+
+PrioritizeSAN
+=============
+
+- **off** - by default this binary argument is turned off, which means
+ that validation of names in certificates follows the older RFC 5280: either
+ a Subject Alternative Name or a Common Name match is sufficient and the
+ connection is allowed.
+
+- **on** - if you turn this option on, it will perform stricter name checking
+ as per the newer RFC 6125: if any SAN is found, the contents of the CN are
+ completely ignored and name validity is decided based on the SAN only.
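+
+Example
+=======
+
+A minimal sketch of selecting and configuring this driver in RainerScript
+format (the file paths, peer name and port are placeholders, not defaults):
+
+::
+
+    global(
+        defaultNetstreamDriver="gtls"
+        defaultNetstreamDriverCAFile="/etc/rsyslog.d/ca.pem"
+        defaultNetstreamDriverCertFile="/etc/rsyslog.d/cert.pem"
+        defaultNetstreamDriverKeyFile="/etc/rsyslog.d/key.pem"
+    )
+    # driver mode "1" selects TLS-protected operation (see above)
+    module(load="imtcp"
+           streamDriver.mode="1"
+           streamDriver.authMode="x509/name"
+           permittedPeer=["client.example.com"])
+    input(type="imtcp" port="6514")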
diff --git a/source/concepts/ns_ossl.rst b/source/concepts/ns_ossl.rst
new file mode 100644
index 0000000..cbd002e
--- /dev/null
+++ b/source/concepts/ns_ossl.rst
@@ -0,0 +1,76 @@
+*****************************
+openssl Network Stream Driver
+*****************************
+
+=========================== ===========================================================================
+**Driver Name:**  **ossl**
+**Author:** Andre Lorbach <alorbach@adiscon.com>
+**Available since:** 8.36.0
+=========================== ===========================================================================
+
+
+Purpose
+=======
+
+This network stream driver implements a TLS protected transport
+via the `OpenSSL library <https://www.openssl.org/>`_.
+
+
+Supported Driver Modes
+======================
+
+- **0** - unencrypted transmission (just like `ptcp <ns_ptcp.html>`_ driver)
+- **1** - TLS-protected operation
+
+.. note::
+
+ Mode 0 does not provide any benefit over the ptcp driver. This
+ mode exists for technical reasons, but should not be used. It may be
+ removed in the future.
+
+
+Supported Authentication Modes
+==============================
+
+- **anon** - anonymous authentication as described in IETF's
+ draft-ietf-syslog-transport-tls-12 Internet draft
+
+- **x509/fingerprint** - certificate fingerprint authentication as
+ described in IETF's draft-ietf-syslog-transport-tls-12 Internet draft.
+ The fingerprint must be provided as the SHA1 or the SHA256 hex string of
+ the certificate. Multiple values must be separated by comma (,).
+ A valid configuration would be, for example:
+ ::
+
+ StreamDriverPermittedPeers="SHA256:10:C4:26:1D:CB:3C:AB:12:DB:1A:F0:47:37:AE:6D:D2:DE:66:B5:71:B7:2E:5B:BB:AE:0C:7E:7F:5F:0D:E9:64,SHA1:DD:23:E3:E7:70:F5:B4:13:44:16:78:A5:5A:8C:39:48:53:A6:DD:25"
+
+- **x509/certvalid** - certificate validation only. x509/certvalid is
+ a nonstandard mode. It validates the remote peer's certificate, but
+ does not check the subject name. This is weak authentication that may
+ be useful in scenarios where multiple devices are deployed and it is
+ sufficient proof of authenticity that their certificates are signed by
+ the CA the server trusts. This is better than anon authentication, but
+ still not recommended.
+
+- **x509/name** - certificate validation and subject name authentication as
+ described in IETF's draft-ietf-syslog-transport-tls-12 Internet draft
+
+.. note::
+
+ "anon" does not permit authentication of the remote peer. As such,
+ this mode is vulnerable to man-in-the-middle attacks as well as
+ unauthorized access. It is recommended NOT to use this mode.
+ A certificate / key does not need to be configured in this authmode.
+
+.. note::
+
+ **Anon mode changes in:** v8.190 (or above)
+
+ - Anonymous Ciphers (DH and ECDH) are available in ANON mode.
+ - Server does not require a certificate anymore in anon mode.
+ - If Server has a certificate and the Client does not, the highest possible
+ ciphers will be selected.
+ - If both Server and Client do not have a certificate, the highest available
+ anon cipher will be used.
+
+
diff --git a/source/concepts/ns_ptcp.rst b/source/concepts/ns_ptcp.rst
new file mode 100644
index 0000000..66d7062
--- /dev/null
+++ b/source/concepts/ns_ptcp.rst
@@ -0,0 +1,13 @@
+ptcp Network Stream Driver
+==========================
+
+This network stream driver implements a plain tcp
+transport without security properties.
+
+Supported Driver Modes
+
+- 0 - unencrypted transmission
+
+Supported Authentication Modes
+
+- "anon" - no authentication
diff --git a/source/concepts/queues.rst b/source/concepts/queues.rst
new file mode 100644
index 0000000..bb5647c
--- /dev/null
+++ b/source/concepts/queues.rst
@@ -0,0 +1,524 @@
+Understanding rsyslog Queues
+============================
+
+Rsyslog uses queues whenever two activities need to be loosely coupled.
+With a queue, one part of the system "produces" something while another
+part "consumes" this something. The "something" is most often syslog
+messages, but queues may also be used for other purposes.
+
+This document provides a good insight into technical details, operation
+modes and implications. In addition to it, an :doc:`rsyslog queue concepts
+overview <../whitepapers/queues_analogy>` document exists which tries to explain
+queues with the help of some analogies. That document is probably a better
+place to start reading about queues. I assume that once you have
+understood that document, the material here will be much easier to grasp
+and look much more natural.
+
+The most prominent example is the main message queue. Whenever rsyslog
+receives a message (e.g. locally, via UDP, TCP or in whatever other way),
+it places the message into the main message queue. Later, the message is
+dequeued by the rule processor, which then evaluates which actions are
+to be carried out. In front of each action, there is also a queue, which
+potentially de-couples the filter processing from the actual action
+(e.g. writing to file, database or forwarding to another host).
+
+Where are Queues Used?
+----------------------
+
+Currently, queues are used for the main message queue and for the
+actions.
+
+There is a single main message queue inside rsyslog. Each input module
+delivers messages to it. The main message queue worker filters messages
+based on rules specified in rsyslog.conf and dispatches them to the
+individual action queues. Once a message is in an action queue, it is
+deleted from the main message queue.
+
+There are multiple action queues, one for each configured action. By
+default, these queues operate in direct (non-queueing) mode. Action
+queues are fully configurable and thus can be changed to whatever is
+best for the given use case.
+
+Future versions of rsyslog will most probably utilize queues at other
+places, too.
+
+Wherever "*<object>*\ "  is used in the config file statements,
+substitute "*<object>*\ " with either "MainMsg" or "Action". The former
+will set main message queue parameters, the latter sets parameters for the
+next action that will be created. Action queue parameters can not be
+modified once the action has been specified. For example, to tell the
+main message queue to save its content on shutdown, use
+"*$MainMsgQueueSaveOnShutdown on*".
+
+If the same parameter is specified multiple times before a queue is
+created, the last one specified takes precedence. The main message queue
+is created after parsing the config file and all of its potential
+includes. An action queue is created each time an action selector is
+specified. Action queue parameters are reset to default after an action
+queue has been created (to provide a clean environment for the next
+action).
+
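+As a sketch, the same substitution pattern applied to the two queue
+types looks like this in legacy configuration format (the forwarding
+target is a placeholder):
+
+::
+
+    $MainMsgQueueSaveOnShutdown on   # parameter for the main message queue
+    $ActionQueueType LinkedList      # parameter for the *next* action only
+    *.* @@central.example.com        # this action gets the LinkedList queue
+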
+Not all queues necessarily support the full set of queue configuration
+parameters, because not all are applicable. For example, disk queues
+always have exactly one worker thread. This cannot be overridden by
+configuration parameters. Attempts to do so are ignored.
+
+Queue Modes
+-----------
+
+Rsyslog supports different queue modes, some with submodes. Each of them
+has specific advantages and disadvantages. Selecting the right queue
+mode is quite important when tuning rsyslogd. The queue mode (aka
+"type") is set via the "*$<object>QueueType*\ " config directive.
+
+Direct Queues
+~~~~~~~~~~~~~
+
+Direct queues are **non**-queuing queues. A queue in direct mode does
+neither queue nor buffer any of the queue elements but rather passes the
+element directly (and immediately) from the producer to the consumer.
+This sounds strange, but there is a good reason for this queue type.
+
+Direct mode queues allow queues to be used generically, even in places where
+queuing is not always desired. A good example is the queue in front of
+output actions. While it makes perfect sense to buffer forwarding
+actions or database writes, it makes only limited sense to build up a
+queue in front of simple local file writes. Yet, rsyslog still has a
+queue in front of every action. So for file writes, the queue mode can
+simply be set to "direct", in which case no queuing happens.
+
+Please note that a direct queue also is the only queue type that passes
+back the execution return code (success/failure) from the consumer to
+the producer. This, for example, is needed for the backup action logic.
+Consequently, backup actions require the to-be-checked action to use a
+"direct" mode queue.
+
+To create a direct queue, use the "*$<object>QueueType Direct*\ " config
+directive.
+
+Disk Queues
+~~~~~~~~~~~
+
+Disk queues use disk drives for buffering. The important fact is that
+they always use the disk and do not buffer anything in memory. Thus, the
+queue is ultra-reliable, but by far the slowest mode. For regular use
+cases, this queue mode is not recommended. It is useful if log data is
+so important that it must not be lost, even in extreme cases.
+
+When a disk queue is written, it is done in chunks. Each chunk is written
+to its own file. Files are named with a prefix (set via the
+"*$<object>QueueFilename*\ " config directive) and followed by a 7-digit
+number (starting at one and incremented for each file). Chunks are 10mb
+by default; a different size can be set via
+the "*$<object>QueueMaxFileSize*\ " config directive. Note that the size
+limit is not a sharp one: rsyslog always writes one complete queue
+entry, even if it violates the size limit. So chunks are actually a
+little bit (usually less than 1k) larger than the configured size. Each
+chunk also has a slightly different size for the same reason. If you observe
+different chunk sizes, you can relax: this is not a problem.
+
+Writing in chunks is used so that processed data can quickly be deleted
+and its space freed for other uses - while at the same time imposing no
+artificial upper limit on the disk space used. If a disk quota is set
+(instructions further below), be sure that the quota/chunk size allows
+at least two chunks to be written. Rsyslog currently does not check that
+and will fail miserably if a single chunk is over the quota.
+
+Creating new chunks costs performance but provides quicker ability to
+free disk space. The 10mb default is considered a good compromise
+between these two. However, it may make sense to adapt these settings to
+local policies. For example, if a disk queue is written on a dedicated
+200gb disk, it may make sense to use a 2gb (or even larger) chunk size.
+
+Please note, however, that the disk queue by default does not update its
+housekeeping structures every time it writes to disk. This is for
+performance reasons. In the event of failure, data will still be lost
+(unless the file structures are manually repaired). However,
+disk queues can be set to write bookkeeping information on checkpoints
+(every n records), so that this can be made ultra-reliable, too. If the
+checkpoint interval is set to one, no data can be lost, but the queue is
+exceptionally slow.
+
+Each queue can be placed on a different disk for best performance and/or
+isolation. This is currently selected by specifying different
+*$WorkDirectory* config directives before the queue creation statement.
+
+To create a disk queue, use the "*$<object>QueueType Disk*\ " config
+directive. Checkpoint intervals can be specified via
+"*$<object>QueueCheckpointInterval*\ ", with 0 meaning no checkpoints.
+Note that disk-based queues can be made very reliable by issuing a
+(f)sync after each write operation. Starting with version 4.3.2, this
+can be requested via "*$<object>QueueSyncQueueFiles on/off*\ ", with the
+default being off. Activating this option has a performance penalty, so
+it should not be turned on without reason.
+
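+Putting this together, a disk queue for an action might be configured in
+legacy format roughly like this (directory, file name prefix and the
+forwarding target are placeholders; numbers are examples only):
+
+::
+
+    $WorkDirectory /var/spool/rsyslog        # where queue files are kept
+    $ActionQueueType Disk
+    $ActionQueueFileName fwdq                # file name prefix of the chunks
+    $ActionQueueMaxFileSize 100m             # chunk size
+    $ActionQueueCheckpointInterval 100       # write bookkeeping every 100 records
+    *.* @@central.example.com
+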
+If you happen to lose or otherwise need to recreate the housekeeping
+structures and still have all your queue chunks, you can use the perl script
+included in the rsyslog package to regenerate them.
+Usage: recover_qi.pl -w *$WorkDirectory* -f QueueFileName -d 8 > QueueFileName.qi
+
+
+In-Memory Queues
+~~~~~~~~~~~~~~~~
+
+In-memory queue mode is what most people have on their mind when they
+think about computing queues. Here, the enqueued data elements are held
+in memory. Consequently, in-memory queues are very fast. But of course,
+they do not survive any program or operating system abort (which is usually
+tolerable and unlikely). Be sure to use a UPS if you use in-memory
+mode and your log data is important to you. Note that even in-memory
+queues may hold data for an infinite amount of time when e.g. an output
+destination system is down and there is no reason to move the data out
+of memory (lying around in memory for an extended period of time is NOT
+a reason). Pure in-memory queues can't store queue elements
+anywhere other than in main memory.
+
+There exist two different in-memory queue modes: LinkedList and
+FixedArray. Both are quite similar from the user's point of view, but
+utilize different algorithms.
+
+A FixedArray queue uses a fixed, pre-allocated array that holds pointers
+to queue elements. The majority of space is taken up by the actual user
+data elements, to which the pointers in the array point. The pointer
+array itself is comparatively small. However, it has a certain memory
+footprint even if the queue is empty. As there is no need to dynamically
+allocate any housekeeping structures, FixedArray offers the best run
+time performance (it uses the fewest CPU cycles). FixedArray is best if there
+is a relatively low number of queue elements expected and performance is
+desired. It is the default mode for the main message queue (with a limit
+of 10,000 elements).
+
+A LinkedList queue is quite the opposite. All housekeeping structures
+are dynamically allocated (in a linked list, as its name implies). This
+requires somewhat more runtime processing overhead, but ensures that
+memory is only allocated in cases where it is needed. LinkedList queues
+are especially well-suited for queues where only occasionally a
+higher-than-usual number of elements needs to be queued. A use case may be
+occasional message bursts. Memory permitting, such a queue could be limited to e.g.
+200,000 elements, which would take up memory only if actually used. A FixedArray
+queue may have a too large static memory footprint in such cases.
+
+**In general, it is advised to use LinkedList mode if in doubt**. The
+processing overhead compared to FixedArray is low and may be outweighed by
+the reduction in memory use. Paging in most-often-unused pointer array
+pages can be much slower than dynamically allocating them.
+
+To create an in-memory queue, use the "*$<object>QueueType
+LinkedList*\ " or  "*$<object>QueueType FixedArray*\ " config directive.
+
+Disk-Assisted Memory Queues
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If a disk queue name is defined for in-memory queues (via
+*$<object>QueueFileName*), they automatically become "disk-assisted"
+(DA). In that mode, data is written to disk (and read back) on an
+as-needed basis.
+
+Actually, the regular memory queue (called the "primary queue") and a
+disk queue (called the "DA queue") work in tandem in this mode. Most
+importantly, the disk queue is activated if the primary queue is full or
+needs to be persisted on shutdown. Disk-assisted queues combine the
+advantages of pure memory queues with those of  pure disk queues. Under
+normal operations, they are very fast and messages will never touch the
+disk. But if there is need to, an unlimited amount of messages can be
+buffered (actually limited by free disk space only) and data can be
+persisted between rsyslogd runs.
+
+With a DA-queue, both disk-specific and in-memory specific configuration
+parameters can be set. From the user's point of view, think of a DA
+queue like a "super-queue" which does all of this within a single queue [from
+the code perspective, there is some specific handling for this case, so
+it is actually much like a single object].
+
+DA queues are typically used to de-couple potentially long-running and
+unreliable actions (to make them reliable). For example, it is
+recommended to use a disk-assisted linked list in-memory queue in front
+of each database and "send via tcp" action. Doing so makes these actions
+reliable and de-couples their potential low execution speed from the
+rest of your rules (e.g. the local file writes). There is a howto on
+`massive database inserts <rsyslog_high_database_rate.html>`_ which
+nicely describes this use case. It may even be a good read if you do not
+intend to use databases.
+
+With DA queues, we do not simply write out everything to disk and then
+run as a disk queue once the in-memory queue is full. A much smarter
+algorithm is used, which involves a "high watermark" and a "low
+watermark". Both specify numbers of queued items. If the queue size
+reaches the high watermark, the queue begins to write data elements
+to disk. It does so until it reaches the low watermark. At
+this point, it stops writing until either the high watermark is reached
+again or the on-disk queue becomes empty, in which case the queue
+reverts back to pure in-memory mode. While holding at the low
+watermark, new elements are actually enqueued in memory. They are
+eventually written to disk, but only if the high watermark is ever
+reached again. If it isn't, these items never touch the disk. So even
+when a queue runs disk-assisted, there is in-memory data present (this
+is a big difference to pure disk queues!).
+
+This algorithm prevents unnecessary disk writes, but also leaves some
+additional buffer space for message bursts. Remember that creating disk
+files and writing to them is a lengthy operation. It is too lengthy to
+e.g. block receiving UDP messages. Doing so would result in message
+loss. Thus, the queue initiates DA mode, but still is able to receive
+messages and enqueue them - as long as the maximum queue size is not
+reached. The number of elements between the high water mark and the
+maximum queue size serves as this "emergency buffer". Size it according
+to your needs, if traffic is very bursty you will probably need a large
+buffer here. Keep in mind, though, that under normal operations these
+queue elements will probably never be used. Setting the high water mark
+too low will cause disk-assistance to be turned on more often than
+actually needed.
+
+The water marks can be set via the "*$<object>QueueHighWatermark*\ "
+and  "*$<object>QueueLowWatermark*\ " configuration file directives.
+Note that these are actual numbers, not percentages. Be sure they make
+sense (also in respect to "*$<object>QueueSize*\ "). Rsyslogd does
+perform some checks on the numbers provided, and issues warning when
+numbers are "suspicious".
+
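+For illustration, a disk-assisted action queue might be set up in legacy
+format roughly as follows (all numbers and the forwarding target are
+placeholders, not recommendations):
+
+::
+
+    $ActionQueueType LinkedList          # in-memory primary queue
+    $ActionQueueFileName dbq             # a file name prefix makes it disk-assisted
+    $ActionQueueSize 100000              # absolute number of elements, not a percentage
+    $ActionQueueHighWatermark 80000      # start writing to disk at this fill level
+    $ActionQueueLowWatermark 20000       # stop writing to disk again at this level
+    *.* @@central.example.com
+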
+Limiting the Queue Size
+-----------------------
+
+All queues, including disk queues, have a limit on the number of
+elements they can enqueue. This is set via the "*$<object>QueueSize*\ "
+config parameter. Note that the size is specified as a number of enqueued
+elements, not their actual memory size. Memory size limits can not be
+set. A conservative assumption is that a single syslog message takes up
+512 bytes on average (in-memory, NOT on the wire, this \*is\* a
+difference).
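+
+As a rough back-of-the-envelope sketch based on that 512-byte
+assumption, a queue size of 100,000 elements translates to::
+
+    100,000 messages * 512 bytes/message = 51,200,000 bytes, roughly 50 MB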
+
+Disk-assisted queues are special in that they do **not** have any such
+size limit; they can enqueue an unlimited number of elements. To prevent
+running out of space, disk and disk-assisted queues can be size-limited
+via the "*$<object>QueueMaxDiskSpace*\ " configuration parameter. If it
+is not set, the only limit is the available free space (and reaching
+this limit is currently not handled very gracefully, so avoid running
+into it!). If a limit is set, the queue cannot grow larger than it.
+Note, however, that the limit is approximate. The engine always writes
+complete records. As such, it is possible that slightly more than the
+set limit is used (usually less than 1k, given the average message
+size). Enforcing the limit strictly would hurt performance, and thus
+the design decision was to favour performance. If you don't like that
+policy, simply specify a slightly lower limit (e.g. 999,999K instead
+of 1G).
+
+In general, it is a good idea to limit the physical disk space even if
+you dedicate a whole disk to rsyslog. That way, you prevent it from
+running out of space (a future version will have auto-size-limit logic
+that kicks in in such situations).
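+
+A hedged sketch for a disk-assisted action queue (spool directory, file
+name and size value are illustrative only)::
+
+    # directory where spool files are created
+    $WorkDirectory /var/spool/rsyslog
+    $ActionQueueType LinkedList
+    # a file name enables disk assistance
+    $ActionQueueFileName dbq
+    # never let the spool files grow beyond roughly 1 GB
+    $ActionQueueMaxDiskSpace 1g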
+
+Worker Thread Pools
+-------------------
+
+Each queue (except in "direct" mode) has an associated pool of worker
+threads. Worker threads carry out the action to be performed on the
+enqueued data elements. As a concrete example, the main message queue's
+worker task is to apply filter logic to each incoming message and
+enqueue it to the relevant output queues (actions).
+
+Worker threads are started and stopped on an as-needed basis. On a
+system without activity, there may be no worker running at all. One is
+automatically started when a message comes in. Similarly, additional
+workers are started if the queue grows above a specific size. The
+"*$<object>QueueWorkerThreadMinimumMessages*\ " config parameter
+controls worker startup. It specifies the minimum number of elements
+that must be enqueued in order to justify starting a new worker. For
+example, let's assume it is set to 100. As long as no more than 100
+messages are in the queue, a single worker will be used. When more than
+100 messages arrive, a new worker thread is automatically started.
+Similarly, a third worker will be started when there are at least 300
+messages, a fourth when reaching 400, and so on.
+
+However, it does not make sense to have too many worker threads running
+in parallel. Thus, the upper limit can be set via
+"*$<object>QueueWorkerThreads*\ ". If it is, for example, set to four,
+no more than four workers will ever be started, no matter how many
+elements are enqueued.
+
+Worker threads that have been started are kept running until an
+inactivity timeout occurs. The timeout can be set via
+"*$<object>QueueWorkerTimeoutThreadShutdown*\ " and is specified in
+milliseconds. If you do not want to keep idle workers running, simply
+set it to 0, which means immediate timeout and thus immediate shutdown.
+But consider that creating threads involves some overhead, which is why
+we keep them running. If you would like to never shut down any worker
+threads, specify -1 for this parameter.
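+
+A minimal sketch for the main message queue (the values are
+illustrative assumptions, not recommendations)::
+
+    # never run more than four workers in parallel
+    $MainMsgQueueWorkerThreads 4
+    # start an additional worker per 100 queued messages
+    $MainMsgQueueWorkerThreadMinimumMessages 100
+    # shut idle workers down after one minute of inactivity
+    $MainMsgQueueWorkerTimeoutThreadShutdown 60000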
+
+Discarding Messages
+~~~~~~~~~~~~~~~~~~~
+
+If the queue reaches the so-called "discard watermark" (a number of
+queued elements), less important messages can automatically be
+discarded. This is done in an effort to save queue space for more
+important messages, which you would like even less to lose. Please note
+that whenever there are more than "discard watermark" messages, both
+newly incoming as well as already enqueued low-priority messages are
+discarded. The algorithm discards messages newly coming in as well as
+those at the front of the queue.
+
+The discard watermark is a last-resort setting. It should be set
+sufficiently high, but low enough to still allow for large message
+bursts. Please note that it takes effect immediately and thus shows
+results promptly - but that doesn't help if the burst consists mainly
+of high-priority messages...
+
+The discard watermark is set via the "*$<object>QueueDiscardMark*\ "
+directive. The priority of messages to be discarded is set via
+"*$<object>QueueDiscardSeverity*\ ". This directive accepts both the
+usual textual severity as well as a numerical one. To understand it, you
+must be aware of the numerical severity values. They are defined in RFC
+3164:
+
+ ====  =========================================
+ Code  Severity
+ ====  =========================================
+ 0     Emergency: system is unusable
+ 1     Alert: action must be taken immediately
+ 2     Critical: critical conditions
+ 3     Error: error conditions
+ 4     Warning: warning conditions
+ 5     Notice: normal but significant condition
+ 6     Informational: informational messages
+ 7     Debug: debug-level messages
+ ====  =========================================
+
+Anything with the specified severity or a (numerically) higher one is
+discarded. To turn message discarding off, simply set the discard
+watermark higher than the queue size. An alternative is to specify the
+numerical value 8 as the DiscardSeverity. This is also the default
+setting, to prevent unintentional message loss. So if you would like to
+use message discarding, you need to set
+"*$<object>QueueDiscardSeverity*\ " to an actual value.
+
+An interesting application is with disk-assisted queues: if the discard
+watermark is set lower than the high watermark, message discarding will
+start before the queue becomes disk-assisted. This may be a good thing
+if you would like to switch to disk-assisted mode only in cases where it
+is absolutely unavoidable and you prefer to discard less important
+messages first.
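+
+A hedged sketch combining both ideas for a disk-assisted action queue
+(all numbers are illustrative assumptions)::
+
+    $ActionQueueType LinkedList
+    $ActionQueueFileName dbq
+    $ActionQueueSize 100000
+    # begin spooling to disk at 80,000 queued messages
+    $ActionQueueHighWatermark 80000
+    # discard informational and debug messages already at 70,000,
+    # i.e. before the queue would go disk-assisted
+    $ActionQueueDiscardMark 70000
+    $ActionQueueDiscardSeverity 6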
+
+Filled-Up Queues
+----------------
+
+If the queue has reached either its configured maximum number of
+entries or its maximum disk space, it is finally full. If so, rsyslogd
+throttles the data element submitter. If that, for example, is a
+reliable input (TCP, local log socket), throttling will slow down the
+message originator, which is a good resolution for this scenario.
+
+During throttling, a disk-assisted queue continues to write to disk,
+messages are still discarded based on severity, and regular dequeuing
+and processing continues. So chances are good the situation will be
+resolved by simply throttling. Note, though, that throttling is
+highly undesirable for unreliable sources, like UDP message reception.
+So it is not a good thing to run into throttling mode at all.
+
+We cannot hold processing indefinitely, not even when throttling. For
+example, throttling the local log socket for too long would cause the
+system as a whole to come to a standstill. To prevent this, rsyslogd
+times out after a configured period ("*$<object>QueueTimeoutEnqueue*\ ",
+specified in milliseconds) if no space becomes available. As a last
+resort, it then discards the newly arrived message.
+
+If you do not want throttling, set the timeout to 0 - the message will
+then be discarded immediately. If you use a high timeout, be sure you
+know what you are doing: a high enqueue timeout on the main message
+queue can lead to something like a complete hang of the system. The
+same problem does not apply to action queues.
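+
+As a small, hedged example (the 2000 ms value is an arbitrary
+illustration)::
+
+    # wait at most two seconds for queue space, then discard the message
+    $MainMsgQueueTimeoutEnqueue 2000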
+
+Rate Limiting
+~~~~~~~~~~~~~
+
+Rate limiting provides a way to prevent rsyslogd from processing things
+too fast. It can, for example, prevent overrunning a receiver system.
+
+Currently, only limited rate-limiting features are available. The
+"*$<object>QueueDequeueSlowdown*\ " directive allows you to specify how
+long (in microseconds) dequeueing should be delayed. While simple, it
+is still powerful. For example, using a DequeueSlowdown delay of 1,000
+microseconds on a UDP send action ensures that no more than 1,000
+messages can be sent within a second (actually less, as there is also
+some time needed for the processing itself).
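+
+A hedged sketch of this very example (the target host is a
+placeholder)::
+
+    # delay each dequeue by 1,000 microseconds -> at most ~1,000 msg/s
+    $ActionQueueDequeueSlowdown 1000
+    # forward everything via UDP
+    *.* @remote.example.com:514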
+
+Processing Timeframes
+~~~~~~~~~~~~~~~~~~~~~
+
+Queues can be set to dequeue (process) messages only during certain
+timeframes. This is useful if you, for example, would like to transfer
+the bulk of messages only during off-peak hours, e.g. when you have only
+limited bandwidth on the network path to the central server.
+
+Currently, only a single timeframe is supported and, even worse, it can
+only be specified by the hour. It is not hard to extend rsyslog's
+capabilities in this regard - it simply has not been requested so far.
+So if you need more fine-grained control, let us know and we'll probably
+implement it. There are two configuration directives; both should be
+used together or results are unpredictable:
+"*$<object>QueueDequeueTimeBegin <hour>*\ " and
+"*$<object>QueueDequeueTimeEnd <hour>*\ ". The hour parameter must
+be specified in 24-hour format (so 10pm is 22). A use case for this
+parameter can be found in the `rsyslog
+wiki <http://wiki.rsyslog.com/index.php/OffPeakHours>`_.
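+
+As a hedged sketch (assuming an off-peak window from 01:00 to 05:00 and
+a placeholder central server), the queue buffers on disk during the day
+and forwards only at night::
+
+    $ActionQueueType LinkedList
+    # disk assistance, so daytime traffic can be buffered safely
+    $ActionQueueFileName nightq
+    $ActionQueueSaveOnShutdown on
+    # dequeue (forward) only between 1am and 5am
+    $ActionQueueDequeueTimeBegin 1
+    $ActionQueueDequeueTimeEnd 5
+    *.* @@central.example.com:514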
+
+Performance
+~~~~~~~~~~~
+
+The locking involved with maintaining the queue has a potentially large
+performance impact. How large this is, and if it exists at all, depends
+much on the configuration and actual use case. However, the queue is
+able to work on so-called "batches" when dequeueing data elements. With
+batches, multiple data elements are dequeued at once (with a single
+locking call). The queue dequeues all available elements up to a
+configured upper limit (*$<object>QueueDequeueBatchSize <number>*). It is
+important to note that the actual upper limit is dictated by
+availability. The queue engine will never wait for a batch to fill. So
+even if a high upper limit is configured, batches may consist of fewer
+elements, even just one, if there are no more elements waiting in the
+queue.
+
+Batching can improve performance considerably. Note, however, that it
+affects the order in which messages are passed to the queue worker
+threads, as each worker now receives a batch of messages. Also, the
+larger the batch size and the higher the maximum number of permitted
+worker threads, the more main memory is needed. For a busy server, large
+batch sizes (around 1,000 or even more elements) may be useful. Please
+note that with batching, up to BatchSize \* NumOfWorkers objects must be
+held in main memory (worst-case scenario), even when running in
+disk-only mode. So if you use the default 5 workers at the main message
+queue and set the batch size to 1,000, you need to be prepared that the
+main message queue holds up to 5,000 messages in main memory **in
+addition** to the configured queue size limits!
+
+The queue object's default maximum batch size is eight, but different
+defaults exist for the actual parts of rsyslog processing that
+utilize queues. So you need to check those objects' defaults.
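+
+A hedged sketch (the numbers are illustrative; the memory note follows
+the worst-case formula above)::
+
+    # dequeue up to 256 messages per locking call
+    $MainMsgQueueDequeueBatchSize 256
+    $MainMsgQueueWorkerThreads 2
+    # worst case: 2 workers * 256 messages held in memory in addition
+    # to the configured queue size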
+
+Terminating Queues
+~~~~~~~~~~~~~~~~~~
+
+Terminating a process sounds easy, but can be complex. Terminating a
+running queue is in fact the most complex operation a queue object can
+perform. You don't see this from a user's point of view, but it is quite
+hard work for the developer to do everything in the right order.
+
+The complexity arises when the queue still has data enqueued when it is
+shut down. Rsyslog tries to preserve as much of it as possible. As a
+first measure, there is a regular queue shutdown timeout
+("*$<object>QueueTimeoutShutdown*\ ", specified in milliseconds): the
+queue workers are given that time period to finish processing the queue.
+
+If after that period there is still data in the queue, workers are
+instructed to finish the current data element and then terminate. This
+essentially means any other data is lost. There is another timeout
+("*$<object>QueueTimeoutActionCompletion*\ ", also specified in
+milliseconds) that specifies how long the workers have to finish the
+current element. If that timeout expires, any remaining workers are
+cancelled and the queue is brought down.
+
+If you do not want to lose data on shutdown, the
+"*$<object>QueueSaveOnShutdown*\ " parameter can be set to "on". This
+requires either a disk or disk-assisted queue. If set, rsyslogd ensures
+that any queue elements are saved to disk before it terminates. This
+includes data elements whose processing was begun by workers that
+needed to be cancelled due to too-long processing. For a large queue,
+this operation may be lengthy. No timeout applies to a required shutdown
+save.
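+
+A hedged sketch of a shutdown-friendly action queue (the timeout values
+are illustrative assumptions)::
+
+    $ActionQueueType LinkedList
+    # disk assistance is required for SaveOnShutdown
+    $ActionQueueFileName shutq
+    # give workers up to 2 seconds to drain the queue on shutdown ...
+    $ActionQueueTimeoutShutdown 2000
+    # ... and up to 5 seconds to finish their current element
+    $ActionQueueTimeoutActionCompletion 5000
+    # persist anything left over instead of losing it
+    $ActionQueueSaveOnShutdown on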
diff --git a/source/concepts/rfc5424layers.png b/source/concepts/rfc5424layers.png
new file mode 100644
index 0000000..70192cc
--- /dev/null
+++ b/source/concepts/rfc5424layers.png
Binary files differ