diff options
Diffstat (limited to 'source/whitepapers/queues_analogy.rst')
-rw-r--r-- | source/whitepapers/queues_analogy.rst | 364 |
1 files changed, 364 insertions, 0 deletions
diff --git a/source/whitepapers/queues_analogy.rst b/source/whitepapers/queues_analogy.rst new file mode 100644 index 0000000..7d5fbac --- /dev/null +++ b/source/whitepapers/queues_analogy.rst @@ -0,0 +1,364 @@ +Turning Lanes and Rsyslog Queues +================================ + +If there is a single object absolutely vital to understanding the way +rsyslog works, this object is queues. Queues offer a variety of +services, including support for multithreading. While there is elaborate +in-depth documentation on the ins and outs of :doc:`rsyslog queues +<../concepts/queues>`, some of the concepts are hard to grasp even for +experienced people. I think this is because rsyslog uses a very high +layer of abstraction which includes things that look quite unnatural, +like queues that do **not** actually queue... + +With this document, I take a different approach: I will not describe +every specific detail of queue operation but hope to be able to provide +the core idea of how queues are used in rsyslog by using an analogy. I +will compare the rsyslog data flow with real-life traffic flowing at an +intersection. + +But first let's set the stage for the rsyslog part. The graphic below +describes the data flow inside rsyslog: + +.. figure:: dataflow.png + :align: center + :alt: rsyslog data flow + + rsyslog data flow + +Note that there is a `video +tutorial <http://www.rsyslog.com/Article350.phtml>`_ available on the +data flow. It is not perfect, but may aid in understanding this picture. + +For our needs, the important fact to know is that messages enter rsyslog +on "the left side" (for example, via UDP), are preprocessed, put +into the so-called main queue, taken off that queue, filtered and are +placed into one or several action queues (depending on filter results). +They leave rsyslog on "the right side" where output modules (like the +file or database writer) consume them. + +So there are always **two** stages where a message (conceptually) is +queued - first in the main queue and later on in *n* action specific +queues (with *n* being the number of actions that the message in +question needs to be processed by, what is being decided by the "Filter +Engine"). As such, a message will be in at least two queues during its +lifetime (with the exception of messages being discarded by the queue +itself, but for the purpose of this document, we will ignore that +possibility). + +Also, it is vitally important to understand that **each** action has a +queue sitting in front of it. If you have dug into the details of +rsyslog configuration, you have probably seen that a queue mode can be +set for each action. And the default queue mode is the so-called "direct +mode", in which "the queue does not actually enqueue data". That sounds +silly, but is not. It is an important abstraction that helps keep the +code clean. + +To understand this, we first need to look at who is the active +component. In our data flow, the active part always sits to the left of +the object. For example, the "Preprocessor" is being called by the +inputs and calls itself into the main message queue. That is, the queue +receiver is called, it is passive. One might think that the "Parser & +Filter Engine" is an active component that actively pulls messages from +the queue. This is wrong! Actually, it is the queue that has a pool of +worker threads, and these workers pull data from the queue and then call +the passively waiting Parser and Filter Engine with those messages. So +the main message queue is the active part, the Parser and Filter Engine +is passive. + +Let's now try an analogy analogy for this part: Think about a TV show. +The show is produced in some TV studio, from there sent (actively) to a +radio tower. The radio tower passively receives from the studio and then +actively sends out a signal, which is passively received by your TV set. +In our simplified view, we have the following picture: + +.. figure:: queue_analogy_tv.png + :align: center + :alt: rsyslog queues and TV analogy + + rsyslog queues and TV analogy + +The lower part of the picture lists the equivalent rsyslog entities, in +an abstracted way. Every queue has a producer (in the above sample the +input) and a consumer (in the above sample the Parser and Filter +Engine). Their active and passive functions are equivalent to the TV +entities that are listed on top of the rsyslog entity. For example, a +rsyslog consumer can never actively initiate reception of a message in +the same way a TV set cannot actively "initiate" a TV show - both can +only "handle" (display or process) what is sent to them. + +Now let's look at the action queues: here, the active part, the +producer, is the Parser and Filter Engine. The passive part is the +Action Processor. The latter does any processing that is necessary to +call the output plugin, in particular it processes the template to +create the plugin calling parameters (either a string or vector of +arguments). From the action queue's point of view, Action Processor and +Output form a single entity. Again, the TV set analogy holds. The Output +**does not** actively ask the queue for data, but rather passively waits +until the queue itself pushes some data to it. + +Armed with this knowledge, we can now look at the way action queue modes +work. My analogy here is a junction, as shown below (note that the +colors in the pictures below are **not** related to the colors in the +pictures above!): + +.. figure:: direct_queue0.png + :align: center + :alt: + +This is a very simple real-life traffic case: one road joins another. We +look at traffic on the straight road, here shown by blue and green +arrows. Traffic in the opposing direction is shown in blue. Traffic +flows without any delays as long as nobody takes turns. To be more +precise, if the opposing traffic takes a (right) turn, traffic still +continues to flow without delay. However, if a car in the red traffic +flow intends to do a (left, then) turn, the situation changes: + +.. figure:: direct_queue1.png + :align: center + :alt: + +The turning car is represented by the green arrow. It cannot turn unless +there is a gap in the "blue traffic stream". And as this car blocks the +roadway, the remaining traffic (now shown in red, which should indicate +the block condition), must wait until the "green" car has made its turn. +So a queue will build up on that lane, waiting for the turn to be +completed. Note that in the examples below I do not care that much about +the properties of the opposing traffic. That is, because its structure +is not really important for what I intend to show. Think about the blue +arrow as being a traffic stream that most of the time blocks +left-turners, but from time to time has a gap that is sufficiently large +for a left-turn to complete. + +Our road network designers know that this may be unfortunate, and for +more important roads and junctions, they came up with the concept of +turning lanes: + +.. figure:: direct_queue2.png + :align: center + :alt: + +Now, the car taking the turn can wait in a special area, the turning +lane. As such, the "straight" traffic is no longer blocked and can flow +in parallel to the turning lane (indicated by a now-green-again arrow). + +However, the turning lane offers only finite space. So if too many cars +intend to take a left turn, and there is no gap in the "blue" traffic, +we end up with this well-known situation: + +.. figure:: direct_queue3.png + :align: center + :alt: + +The turning lane is now filled up, resulting in a tailback of cars +intending to left turn on the main driving lane. The end result is that +"straight" traffic is again being blocked, just as in our initial +problem case without the turning lane. In essence, the turning lane has +provided some relief, but only for a limited amount of cars. Street +system designers now try to weight cost vs. benefit and create (costly) +turning lanes that are sufficiently large to prevent traffic jams in +most, but not all cases. + +**Now let's dig a bit into the mathematical properties of turning +lanes.** We assume that cars all have the same length. So, units of +cars, the length is always one (which is nice, as we don't need to care +about that factor any longer ;)). A turning lane has finite capacity of +*n* cars. As long as the number of cars wanting to take a turn is less +than or equal to *n*, "straight traffic" is not blocked (or the other way +round, traffic is blocked if at least *n + 1* cars want to take a +turn!). We can now find an optimal value for *n*: it is a function of +the probability that a car wants to turn and the cost of the turning +lane (as well as the probability there is a gap in the "blue" traffic, +but we ignore this in our simple sample). If we start from some finite +upper bound of *n*, we can decrease *n* to a point where it reaches +zero. But let's first look at *n = 1*, in which case exactly one car can +wait on the turning lane. More than one car, and the rest of the traffic +is blocked. Our everyday logic indicates that this is actually the +lowest boundary for *n*. + +In an abstract view, however, *n* can be zero and that works nicely. +There still can be *n* cars at any given time on the turning lane, it +just happens that this means there can be no car at all on it. And, as +usual, if we have at least *n + 1* cars wanting to turn, the main +traffic flow is blocked. True, but *n + 1 = 0 + 1 = 1* so as soon as +there is any car wanting to take a turn, the main traffic flow is +blocked (remember, in all cases, I assume no sufficiently large gaps in +the opposing traffic). + +This is the situation our everyday perception calls "road without +turning lane". In my math model, it is a "road with turning lane of size +0". The subtle difference is important: my math model guarantees that, +in an abstract sense, there always is a turning lane, it may just be too +short. But it exists, even though we don't see it. And now I can claim +that even in my small home village, all roads have turning lanes, which +is rather impressive, isn't it? ;) + +**And now we finally have arrived at rsyslog's queues!** Rsyslog action +queues exists for all actions just like all roads in my village have +turning lanes! And as in this real-life sample, it may be hard to see +the action queues for that reason. In rsyslog, the "direct" queue mode +is the equivalent to the 0-sized turning lane. And actions queues are +the equivalent to turning lanes in general, with our real-life *n* being +the maximum queue size. The main traffic line (which sometimes is +blocked) is the equivalent to the main message queue. And the periods +without gaps in the opposing traffic are equivalent to execution time of +an action. In a rough sketch, the rsyslog main and action queues look +like in the following picture. + +.. figure:: direct_queue_rsyslog.png + :align: center + :alt: + +We need to read this picture from right to left (otherwise I would need +to redo all the graphics ;)). In action 3, you see a 0-sized turning +lane, aka an action queue in "direct" mode. All other queues are run in +non-direct modes, but with different sizes greater than 0. + +Let us first use our car analogy: Assume we are in a car on the main +lane that wants to take turn into the "action 4" road. We pass action 1, +where a number of cars wait in the turning lane and we pass action 2, +which has a slightly smaller, but still not filled up turning lane. So +we pass that without delay, too. Then we come to "action 3", which has +no turning lane. Unfortunately, the car in front of us wants to turn +left into that road, so it blocks the main lane. So, this time we need +to wait. An observer standing on the sidewalk may see that while we need +to wait, there are still some cars in the "action 4" turning lane. As +such, even though no new cars can arrive on the main lane, cars still +turn into the "action 4" lane. In other words, an observer standing in +"action 4" road is unable to see that traffic on the main lane is +blocked. + +Now on to rsyslog: Other than in the real-world traffic example, +messages in rsyslog can - at more or less the same time - "take turns" +into several roads at once. This is done by duplicating the message if +the road has a non-zero-sized "turning lane" - or in rsyslog terms a +queue that is running in any non-direct mode. If so, a deep copy of the +message object is made, that placed into the action queue and then the +initial message proceeds on the "main lane". The action queue then +pushes the duplicates through action processing. This is also the reason +why a discard action inside a non-direct queue does not seem to have an +effect. Actually, it discards the copy that was just created, but the +original message object continues to flow. + +In action 1, we have some entries in the action queue, as we have in +action 2 (where the queue is slightly shorter). As we have seen, new +messages pass action one and two almost instantaneously. However, when a +messages reaches action 3, its flow is blocked. Now, message processing +must wait for the action to complete. Processing flow in a direct mode +queue is something like a U-turn: + +.. figure:: direct_queue_directq.png + :align: center + :alt: message processing in an rsyslog action queue in direct mode + + message processing in an rsyslog action queue in direct mode + +The message starts to execute the action and once this is done, +processing flow continues. In a real-life analogy, this may be the route +of a delivery man who needs to drop a parcel in a side street before he +continues driving on the main route. As a side-note, think of what +happens with the rest of the delivery route, at least for today, if the +delivery truck has a serious accident in the side street. The rest of +the parcels won't be delivered today, will they? This is exactly how the +discard action works. It drops the message object inside the action and +thus the message will no longer be available for further delivery - but +as I said, only if the discard is done in a direct mode queue (I am +stressing this example because it often causes a lot of confusion). + +Back to the overall scenario. We have seen that messages need to wait +for action 3 to complete. Does this necessarily mean that at the same +time no messages can be processed in action 4? Well, it depends. As in +the real-life scenario, action 4 will continue to receive traffic as +long as its action queue ("turn lane") is not drained. In our drawing, +it is not. So action 4 will be executed while messages still wait for +action 3 to be completed. + +Now look at the overall picture from a slightly different angle: + +.. figure:: direct_queue_rsyslog2.png + :align: center + :alt: message processing in an rsyslog action queue in direct mode + + message processing in an rsyslog action queue in direct mode + +The number of all connected green and red arrows is four - one each for +action 1, 2 and 4 (this one is dotted as action 4 was a special case) +and one for the "main lane" as well as action 3 (this one contains the +sole red arrow). **This number is the lower bound for the number of +threads in rsyslog's output system ("right-hand part" of the main +message queue)!** Each of the connected arrows is a continuous thread +and each "turn lane" is a place where processing is forked onto a new +thread. Also, note that in action 3 the processing is carried out on the +main thread, but not in the non-direct queue modes. + +I have said this is "the lower bound for the number of threads...". This +is with good reason: the main queue may have more than one worker thread +(individual action queues currently do not support this, but could do in +the future - there are good reasons for that, too but exploring why +would finally take us away from what we intend to see). Note that you +configure an upper bound for the number of main message queue worker +threads. The actual number varies depending on a lot of operational +variables, most importantly the number of messages inside the queue. The +number *t\_m* of actually running threads is within the integer-interval +[0,confLimit] (with confLimit being the operator configured limit, which +defaults to 5). Output plugins may have more than one thread created by +themselves. It is quite unusual for an output plugin to create such +threads and so I assume we do not have any of these. Then, the overall +number of threads in rsyslog's filtering and output system is *t\_total += t\_m + number of actions in non-direct modes*. Add the number of +inputs configured to that and you have the total number of threads +running in rsyslog at a given time (assuming again that inputs utilize +only one thread per plugin, a not-so-safe assumption). + +A quick side-note: I gave the lower bound for *t\_m* as zero, which is +somewhat in contrast to what I wrote at the beginning of the last paragraph. +Zero is actually correct, because rsyslog stops all worker threads when +there is no work to do. This is also true for the action queues. So the +ultimate lower bound for a rsyslog output system without any work to +carry out actually is zero. But this bound will never be reached when +there is continuous flow of activity. And, if you are curious: if the +number of workers is zero, the worker wakeup process is actually handled +within the threading context of the "left-hand-side" (or producer) of +the queue. After being started, the worker begins to play the active +queue component again. All of this, of course, can be overridden with +configuration directives. + +When looking at the threading model, one can simply add n lanes to the +main lane but otherwise retain the traffic analogy. This is a very good +description of the actual process (think what this means to the "turning +lanes"; hint: there still is only one per action!). + +**Let's try to do a warp-up:** I have hopefully been able to show that +in rsyslog, an action queue "sits in front of" each output plugin. +Messages are received and flow, from input to output, over various +stages and two level of queues to the outputs. Actions queues are always +present, but may not easily be visible when in direct mode (where no +actual queuing takes place). The "road junction with turning lane" +analogy well describes the way - and intent - of the various queue +levels in rsyslog. + +On the output side, the queue is the active component, **not** the +consumer. As such, the consumer cannot ask the queue for anything (like +n number of messages) but rather is activated by the queue itself. As +such, a queue somewhat resembles a "living thing" whereas the outputs +are just tools that this "living thing" uses. + +**Note that I left out a couple of subtleties**, especially when it +comes to error handling and terminating a queue (you hopefully have now +at least a rough idea why I say "terminating **a queue**" and not +"terminating an action" - *who is the "living thing"?*). An action +returns a status to the queue, but it is the queue that ultimately +decides which messages can finally be considered processed and which +not. Please note that the queue may even cancel an output right in the +middle of its action. This happens, if configured, if an output needs +more than a configured maximum processing time and is a guard condition +to prevent slow outputs from deferring a rsyslog restart for too long. +Especially in this case re-queuing and cleanup is not trivial. Also, +note that I did not discuss disk-assisted queue modes. The basic rules +apply, but there are some additional constraints, especially in regard +to the threading model. Transitioning between actual disk-assisted mode +and pure-in-memory-mode (which is done automatically when needed) is +also far from trivial and a real joy for an implementer to work on ;). + +If you have not done so before, it may be worth reading +:doc:`Understanding rsyslog Queues <../concepts/queues>`, which most +importantly lists all the knobs you can turn to tweak queue operation. |