Turning Lanes and Rsyslog Queues
================================

If there is a single object absolutely vital to understanding the way
rsyslog works, this object is queues. Queues offer a variety of
services, including support for multithreading. While there is elaborate
in-depth documentation on the ins and outs of :doc:`rsyslog queues
<../concepts/queues>`, some of the concepts are hard to grasp even for
experienced people. I think this is because rsyslog uses a very high
layer of abstraction which includes things that look quite unnatural,
like queues that do **not** actually queue...

With this document, I take a different approach: I will not describe
every specific detail of queue operation but hope to be able to provide
the core idea of how queues are used in rsyslog by using an analogy. I
will compare the rsyslog data flow with real-life traffic flowing at an
intersection.

But first let's set the stage for the rsyslog part. The graphic below
describes the data flow inside rsyslog:

.. figure:: dataflow.png
   :align: center
   :alt: rsyslog data flow

   rsyslog data flow

Note that there is a `video
tutorial <http://www.rsyslog.com/Article350.phtml>`_ available on the
data flow. It is not perfect, but may aid in understanding this picture.

For our needs, the important fact to know is that messages enter rsyslog
on "the left side" (for example, via UDP), are preprocessed, put
into the so-called main queue, taken off that queue, filtered and are
placed into one or several action queues (depending on filter results).
They leave rsyslog on "the right side" where output modules (like the
file or database writer) consume them.

So there are always **two** stages where a message (conceptually) is
queued: first in the main queue and later in *n* action-specific
queues (with *n* being the number of actions that need to process the
message in question, which is decided by the "Filter Engine"). As such,
a message will be in at least two queues during its lifetime (with the
exception of messages being discarded by the queue itself, but for the
purpose of this document, we will ignore that possibility).

Also, it is vitally important to understand that **each** action has a
queue sitting in front of it. If you have dug into the details of
rsyslog configuration, you have probably seen that a queue mode can be
set for each action. And the default queue mode is the so-called "direct
mode", in which "the queue does not actually enqueue data". That sounds
silly, but is not. It is an important abstraction that helps keep the
code clean.

To understand this, we first need to look at who is the active
component. In our data flow, the active part always sits to the left of
the object. For example, the "Preprocessor" is called by the inputs and
itself calls into the main message queue. That is, the receiving side of
the queue is called; it is passive. One might think that the "Parser &
Filter Engine" is an active component that actively pulls messages from
the queue. This is wrong! Actually, it is the queue that has a pool of
worker threads, and these workers pull data from the queue and then call
the passively waiting Parser and Filter Engine with those messages. So
the main message queue is the active part, the Parser and Filter Engine
is passive.
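
The active/passive split can be sketched in a few lines of Python. This is only a toy model and not rsyslog's implementation: the class name ``ActiveQueue``, the sentinel-based shutdown and the use of a single worker are assumptions made for illustration. The point it shows is that the consumer is a plain callable that never asks for data, while the queue owns the thread that pulls messages and pushes them onward.

```python
import queue
import threading


class ActiveQueue:
    """Toy queue that owns its worker thread and pushes to a passive consumer."""

    def __init__(self, consumer):
        self._consumer = consumer      # plain callable; it never pulls on its own
        self._items = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, msg):
        """Called by the producer, in the producer's own thread."""
        self._items.put(msg)

    def close(self):
        """Let the worker drain what is queued, then wait for it to stop."""
        self._items.put(None)          # sentinel marking the end of input
        self._worker.join()

    def _run(self):
        # The queue's worker is the active part: it dequeues a message
        # and calls the passively waiting consumer with it.
        while True:
            msg = self._items.get()
            if msg is None:
                break
            self._consumer(msg)


received = []
q = ActiveQueue(received.append)       # the "consumer" just stores what it is given
for line in ("msg 1", "msg 2", "msg 3"):
    q.enqueue(line)
q.close()
print(received)  # ['msg 1', 'msg 2', 'msg 3']
```

Note that ``received.append`` is never scheduled by anyone but the queue's worker, which mirrors the role of the Parser and Filter Engine described above.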

Let's now try an analogy for this part: Think about a TV show.
The show is produced in some TV studio, from there sent (actively) to a
radio tower. The radio tower passively receives from the studio and then
actively sends out a signal, which is passively received by your TV set.
In our simplified view, we have the following picture:

.. figure:: queue_analogy_tv.png
   :align: center
   :alt: rsyslog queues and TV analogy

   rsyslog queues and TV analogy

The lower part of the picture lists the equivalent rsyslog entities, in
an abstracted way. Every queue has a producer (in the above sample the
input) and a consumer (in the above sample the Parser and Filter
Engine). Their active and passive functions are equivalent to the TV
entities that are listed on top of the rsyslog entity. For example, an
rsyslog consumer can never actively initiate reception of a message in
the same way a TV set cannot actively "initiate" a TV show - both can
only "handle" (display or process) what is sent to them.

Now let's look at the action queues: here, the active part, the
producer, is the Parser and Filter Engine. The passive part is the
Action Processor. The latter does any processing that is necessary to
call the output plugin, in particular it processes the template to
create the plugin calling parameters (either a string or vector of
arguments). From the action queue's point of view, Action Processor and
Output form a single entity. Again, the TV set analogy holds. The Output
**does not** actively ask the queue for data, but rather passively waits
until the queue itself pushes some data to it.

Armed with this knowledge, we can now look at the way action queue modes
work. My analogy here is a junction, as shown below (note that the
colors in the pictures below are **not** related to the colors in the
pictures above!):

.. figure:: direct_queue0.png
   :align: center
   :alt: 

This is a very simple real-life traffic case: one road joins another. We
look at traffic on the straight road, here shown by blue and green
arrows. Traffic in the opposing direction is shown in blue. Traffic
flows without any delays as long as nobody takes turns. To be more
precise, if the opposing traffic takes a (right) turn, traffic still
continues to flow without delay. However, if a car in the red traffic
flow intends to make a left turn, the situation changes:

.. figure:: direct_queue1.png
   :align: center
   :alt: 

The turning car is represented by the green arrow. It cannot turn unless
there is a gap in the "blue traffic stream". And as this car blocks the
roadway, the remaining traffic (now shown in red, which should indicate
the block condition), must wait until the "green" car has made its turn.
So a queue will build up on that lane, waiting for the turn to be
completed. Note that in the examples below I do not care that much about
the properties of the opposing traffic. That is because its structure
is not really important for what I intend to show. Think about the blue
arrow as being a traffic stream that most of the time blocks
left-turners, but from time to time has a gap that is sufficiently large
for a left-turn to complete.

Our road network designers know that this may be unfortunate, and for
more important roads and junctions, they came up with the concept of
turning lanes:

.. figure:: direct_queue2.png
   :align: center
   :alt: 

Now, the car taking the turn can wait in a special area, the turning
lane. As such, the "straight" traffic is no longer blocked and can flow
in parallel to the turning lane (indicated by a now-green-again arrow).

However, the turning lane offers only finite space. So if too many cars
intend to take a left turn, and there is no gap in the "blue" traffic,
we end up with this well-known situation:

.. figure:: direct_queue3.png
   :align: center
   :alt: 

The turning lane is now filled up, resulting in a tailback of cars
intending to turn left on the main driving lane. The end result is that
"straight" traffic is again being blocked, just as in our initial
problem case without the turning lane. In essence, the turning lane has
provided some relief, but only for a limited number of cars. Street
system designers now try to weigh cost vs. benefit and create (costly)
turning lanes that are sufficiently large to prevent traffic jams in
most, but not all cases.

**Now let's dig a bit into the mathematical properties of turning
lanes.** We assume that cars all have the same length. So, in units of
cars, the length is always one (which is nice, as we don't need to care
about that factor any longer ;)). A turning lane has finite capacity of
*n* cars. As long as the number of cars wanting to take a turn is less
than or equal to *n*, "straight traffic" is not blocked (or the other way
round, traffic is blocked if at least *n + 1* cars want to take a
turn!). We can now find an optimal value for *n*: it is a function of
the probability that a car wants to turn and the cost of the turning
lane (as well as the probability there is a gap in the "blue" traffic,
but we ignore this in our simple sample). If we start from some finite
upper bound of *n*, we can decrease *n* to a point where it reaches
zero. But let's first look at *n = 1*, in which case exactly one car can
wait on the turning lane. More than one car, and the rest of the traffic
is blocked. Our everyday logic indicates that this is actually the
lowest boundary for *n*.

In an abstract view, however, *n* can be zero and that works nicely.
There still can be *n* cars at any given time on the turning lane, it
just happens that this means there can be no car at all on it. And, as
usual, if we have at least *n + 1* cars wanting to turn, the main
traffic flow is blocked. True, but *n + 1 = 0 + 1 = 1* so as soon as
there is any car wanting to take a turn, the main traffic flow is
blocked (remember, in all cases, I assume no sufficiently large gaps in
the opposing traffic).
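
The blocking rule is small enough to write down directly. The following Python sketch is only an illustration of the model above; the function name ``main_lane_blocked`` is made up for this example, and it keeps the same simplifying assumption that the opposing traffic never leaves a gap.

```python
def main_lane_blocked(lane_capacity: int, cars_wanting_to_turn: int) -> bool:
    """Straight traffic is blocked once at least n + 1 cars want to turn."""
    if lane_capacity < 0 or cars_wanting_to_turn < 0:
        raise ValueError("counts must be non-negative")
    return cars_wanting_to_turn >= lane_capacity + 1


for n in (3, 1, 0):
    # Smallest number of turning cars that blocks the main lane.
    first_blocking = next(c for c in range(10) if main_lane_blocked(n, c))
    print(f"n={n}: main lane blocks once {first_blocking} car(s) want to turn")
# n=3: main lane blocks once 4 car(s) want to turn
# n=1: main lane blocks once 2 car(s) want to turn
# n=0: main lane blocks once 1 car(s) want to turn
```

The same rule covers *n = 0* without any special case, which is exactly why treating "no turning lane" as "turning lane of size 0" is convenient.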

This is the situation our everyday perception calls "road without
turning lane". In my math model, it is a "road with turning lane of size
0". The subtle difference is important: my math model guarantees that,
in an abstract sense, there always is a turning lane, it may just be too
short. But it exists, even though we don't see it. And now I can claim
that even in my small home village, all roads have turning lanes, which
is rather impressive, isn't it? ;)

**And now we finally have arrived at rsyslog's queues!** Rsyslog action
queues exist for all actions just like all roads in my village have
turning lanes! And as in this real-life sample, it may be hard to see
the action queues for that reason. In rsyslog, the "direct" queue mode
is the equivalent to the 0-sized turning lane. And action queues are
the equivalent to turning lanes in general, with our real-life *n* being
the maximum queue size. The main traffic lane (which sometimes is
blocked) is the equivalent to the main message queue. And the periods
without gaps in the opposing traffic are equivalent to execution time of
an action. In a rough sketch, the rsyslog main and action queues look
like in the following picture.

.. figure:: direct_queue_rsyslog.png
   :align: center
   :alt: 

We need to read this picture from right to left (otherwise I would need
to redo all the graphics ;)). In action 3, you see a 0-sized turning
lane, aka an action queue in "direct" mode. All other queues are run in
non-direct modes, but with different sizes greater than 0.

Let us first use our car analogy: Assume we are in a car on the main
lane that wants to take a turn into the "action 4" road. We pass action 1,
where a number of cars wait in the turning lane and we pass action 2,
which has a slightly smaller, but still not filled up turning lane. So
we pass that without delay, too. Then we come to "action 3", which has
no turning lane. Unfortunately, the car in front of us wants to turn
left into that road, so it blocks the main lane. So, this time we need
to wait. An observer standing on the sidewalk may see that while we need
to wait, there are still some cars in the "action 4" turning lane. As
such, even though no new cars can arrive on the main lane, cars still
turn into the "action 4" lane. In other words, an observer standing in
"action 4" road is unable to see that traffic on the main lane is
blocked.

Now on to rsyslog: Unlike in the real-world traffic example,
messages in rsyslog can - at more or less the same time - "take turns"
into several roads at once. This is done by duplicating the message if
the road has a non-zero-sized "turning lane" - or in rsyslog terms a
queue that is running in any non-direct mode. If so, a deep copy of the
message object is made, that copy is placed into the action queue and
then the initial message proceeds on the "main lane". The action queue then
pushes the duplicates through action processing. This is also the reason
why a discard action inside a non-direct queue does not seem to have an
effect. Actually, it discards the copy that was just created, but the
original message object continues to flow.

In action 1, we have some entries in the action queue, as we have in
action 2 (where the queue is slightly shorter). As we have seen, new
messages pass actions 1 and 2 almost instantaneously. However, when a
message reaches action 3, its flow is blocked. Now, message processing
must wait for the action to complete. Processing flow in a direct mode
queue is something like a U-turn:

.. figure:: direct_queue_directq.png
   :align: center
   :alt: message processing in an rsyslog action queue in direct mode

   message processing in an rsyslog action queue in direct mode

The message starts to execute the action and once this is done,
processing flow continues. In a real-life analogy, this may be the route
of a delivery man who needs to drop a parcel in a side street before he
continues driving on the main route. As a side-note, think of what
happens with the rest of the delivery route, at least for today, if the
delivery truck has a serious accident in the side street. The rest of
the parcels won't be delivered today, will they? This is exactly how the
discard action works. It drops the message object inside the action and
thus the message will no longer be available for further delivery - but
as I said, only if the discard is done in a direct mode queue (I am
stressing this example because it often causes a lot of confusion).
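
Because this point causes so much confusion, here is a toy Python model of it. It is a sketch under stated assumptions, not rsyslog code: the ``Message`` class, the ``process`` function and the ``(name, direct, discards)`` tuples are all hypothetical, and the actions are run one after another instead of on separate threads. It only models which object a discard acts on: the deep copy in a non-direct queue, or the original message in a direct mode queue.

```python
import copy
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    discarded: bool = False


def process(msg, actions):
    """Walk the "main lane" and return the names of the actions msg reached."""
    reached = []
    for name, direct, discards in actions:
        if msg.discarded:
            break                      # the original is gone, nothing left to deliver
        reached.append(name)
        # Direct mode works on the original message object;
        # any non-direct mode works on a deep copy of it.
        target = msg if direct else copy.deepcopy(msg)
        if discards:
            target.discarded = True
    return reached


# Each entry: (action name, direct mode?, is it a discard action?)
queued_discard = [("a1", False, False), ("drop", False, True), ("a3", True, False)]
direct_discard = [("a1", False, False), ("drop", True, True), ("a3", True, False)]

print(process(Message("hello"), queued_discard))  # ['a1', 'drop', 'a3']
print(process(Message("hello"), direct_discard))  # ['a1', 'drop']
```

In the first run the discard only hits the copy, so the message still reaches ``a3``. In the second run the discard sits in a direct mode queue and the rest of the "delivery route" is skipped.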

Back to the overall scenario. We have seen that messages need to wait
for action 3 to complete. Does this necessarily mean that at the same
time no messages can be processed in action 4? Well, it depends. As in
the real-life scenario, action 4 will continue to receive traffic as
long as its action queue ("turn lane") is not drained. In our drawing,
it is not. So action 4 will be executed while messages still wait for
action 3 to be completed.

Now look at the overall picture from a slightly different angle:

.. figure:: direct_queue_rsyslog2.png
   :align: center
   :alt: message processing in an rsyslog action queue in direct mode

   message processing in an rsyslog action queue in direct mode

The number of all connected green and red arrows is four - one each for
action 1, 2 and 4 (this one is dotted as action 4 was a special case)
and one for the "main lane" as well as action 3 (this one contains the
sole red arrow). **This number is the lower bound for the number of
threads in rsyslog's output system ("right-hand part" of the main
message queue)!** Each of the connected arrows is a continuous thread
and each "turn lane" is a place where processing is forked onto a new
thread. Also, note that in action 3 the processing is carried out on the
main thread, but not in the non-direct queue modes.

I have said this is "the lower bound for the number of threads...". This
is with good reason: the main queue may have more than one worker thread
(individual action queues currently do not support this, but could do in
the future - there are good reasons for that, too, but exploring why
would take us too far away from what we intend to see). Note that you
configure an upper bound for the number of main message queue worker
threads. The actual number varies depending on a lot of operational
variables, most importantly the number of messages inside the queue. The
number *t\_m* of actually running threads is within the integer-interval
[0,confLimit] (with confLimit being the operator configured limit, which
defaults to 5). Output plugins may have more than one thread created by
themselves. It is quite unusual for an output plugin to create such
threads and so I assume we do not have any of these. Then, the overall
number of threads in rsyslog's filtering and output system is *t\_total
= t\_m + number of actions in non-direct modes*. Add the number of
inputs configured to that and you have the total number of threads
running in rsyslog at a given time (assuming again that inputs utilize
only one thread per plugin, a not-so-safe assumption).
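
As a worked example, the formula can be applied to the four actions in the figure. The helper names below are made up for this sketch, and the queue type strings are used purely as labels to tell direct from non-direct actions. The calculation also relies on the assumptions stated above: no extra threads inside output plugins and exactly one thread per input.

```python
from typing import Iterable


def output_side_threads(main_workers: int, action_queue_types: Iterable[str]) -> int:
    """t_total = t_m + number of actions in non-direct modes."""
    if main_workers < 0:
        raise ValueError("main_workers must be non-negative")
    non_direct = sum(1 for qtype in action_queue_types if qtype.lower() != "direct")
    return main_workers + non_direct


def total_threads(main_workers: int, action_queue_types: Iterable[str], inputs: int) -> int:
    """Add one thread per input (the not-so-safe assumption from the text)."""
    return output_side_threads(main_workers, action_queue_types) + inputs


# Actions 1, 2 and 4 run in a non-direct mode, action 3 in direct mode.
figure_actions = ["LinkedList", "LinkedList", "Direct", "LinkedList"]

print(output_side_threads(1, figure_actions))      # 4
print(total_threads(1, figure_actions, inputs=2))  # 6
```

With one main queue worker this reproduces the four connected arrows counted in the figure. Passing ``main_workers=0`` models the idle case, in which only the action queue threads would remain in the count.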

A quick side-note: I gave the lower bound for *t\_m* as zero, which is
somewhat in contrast to what I wrote at the beginning of the last paragraph.
Zero is actually correct, because rsyslog stops all worker threads when
there is no work to do. This is also true for the action queues. So the
ultimate lower bound for an rsyslog output system without any work to
carry out actually is zero. But this bound will never be reached when
there is a continuous flow of activity. And, if you are curious: if the
number of workers is zero, the worker wakeup process is actually handled
within the threading context of the "left-hand-side" (or producer) of
the queue. After being started, the worker begins to play the active
queue component again. All of this, of course, can be overridden with
configuration directives.
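
The "zero workers when idle" behaviour can be modelled as follows. Again, this is a simplified sketch and not how rsyslog actually implements it: ``OnDemandQueue``, ``wait_idle`` and the ``workers_started`` counter are invented for the example, and only a single worker is ever allowed. What it illustrates is that the worker terminates when the queue runs empty, and that the next ``enqueue`` call starts a new worker from within the producer's thread.

```python
import threading
from collections import deque


class OnDemandQueue:
    """Toy queue whose single worker exists only while there is work to do."""

    def __init__(self, consumer):
        self._consumer = consumer
        self._items = deque()
        self._lock = threading.Lock()
        self._worker = None
        self.workers_started = 0

    def enqueue(self, msg):
        with self._lock:
            self._items.append(msg)
            if self._worker is None:
                # No worker is running: wake-up happens in the producer's thread.
                self._worker = threading.Thread(target=self._run)
                self.workers_started += 1
                self._worker.start()

    def _run(self):
        while True:
            with self._lock:
                if not self._items:
                    self._worker = None    # nothing to do: the worker terminates
                    return
                msg = self._items.popleft()
            self._consumer(msg)            # from here on the worker is the active part

    def wait_idle(self):
        """Block until the queue is empty and no worker is left."""
        while True:
            with self._lock:
                worker = self._worker
            if worker is None:
                return
            worker.join()


received = []
q = OnDemandQueue(received.append)
for line in ("msg 1", "msg 2", "msg 3"):
    q.enqueue(line)
q.wait_idle()              # all work is done, so no worker thread is left
q.enqueue("msg 4")         # the producer has to start a fresh worker
q.wait_idle()
print(received)            # ['msg 1', 'msg 2', 'msg 3', 'msg 4']
print(q.workers_started >= 2)  # True
```

How many workers get started for the first three messages depends on timing, which is why the example only checks for "at least two" overall.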

When looking at the threading model, one can simply add n lanes to the
main lane but otherwise retain the traffic analogy. This is a very good
description of the actual process (think what this means to the "turning
lanes"; hint: there still is only one per action!).

**Let's try to do a wrap-up:** I have hopefully been able to show that
in rsyslog, an action queue "sits in front of" each output plugin.
Messages are received and flow, from input to output, over various
stages and two levels of queues to the outputs. Action queues are always
present, but may not easily be visible when in direct mode (where no
actual queuing takes place). The "road junction with turning lane"
analogy describes well the workings - and intent - of the various queue
levels in rsyslog.

On the output side, the queue is the active component, **not** the
consumer. As such, the consumer cannot ask the queue for anything (like
n number of messages) but rather is activated by the queue itself. As
such, a queue somewhat resembles a "living thing" whereas the outputs
are just tools that this "living thing" uses.

**Note that I left out a couple of subtleties**, especially when it
comes to error handling and terminating a queue (you hopefully have now
at least a rough idea why I say "terminating **a queue**" and not
"terminating an action" - *who is the "living thing"?*). An action
returns a status to the queue, but it is the queue that ultimately
decides which messages can finally be considered processed and which
not. Please note that the queue may even cancel an output right in the
middle of its action. This happens, if configured, when an output needs
more than a configured maximum processing time and is a guard condition
to prevent slow outputs from deferring an rsyslog restart for too long.
Especially in this case re-queuing and cleanup is not trivial. Also,
note that I did not discuss disk-assisted queue modes. The basic rules
apply, but there are some additional constraints, especially in regard
to the threading model. Transitioning between actual disk-assisted mode
and pure-in-memory-mode (which is done automatically when needed) is
also far from trivial and a real joy for an implementer to work on ;).

If you have not done so before, it may be worth reading
:doc:`Understanding rsyslog Queues <../concepts/queues>`, which most
importantly lists all the knobs you can turn to tweak queue operation.