Diffstat

 source/development/config_data_model.rst | 173
 source/development/debugging.rst         |  45
 source/development/dev_codestyle.rst     |  15
 source/development/dev_oplugins.rst      | 371
 source/development/dev_queue.rst         | 299
 source/development/dev_testbench.rst     |  20
 source/development/generic_design.rst    | 146
 source/development/index.rst             |   7
 source/development/queueWorkerLogic.jpg  | bin 0 -> 59405 bytes
 9 files changed, 1076 insertions, 0 deletions
diff --git a/source/development/config_data_model.rst b/source/development/config_data_model.rst
new file mode 100644
index 0000000..0bff2af
--- /dev/null
+++ b/source/development/config_data_model.rst
@@ -0,0 +1,173 @@

The rsyslog config data model
=============================

This document describes the config data model at a high level.
For details, it is suggested to review the actual source code.
The aim of this document is to provide a general understanding for
both rsyslog developers and developers writing config
management systems.

Objects
=======
Most config objects live in a flat space and are global to rsyslog.
However, actual rule processing is done via a script-like language.
These config scripts need to be represented via a tree structure.

Note that the language as currently implemented is Turing-complete
if the user makes use of very tricky constructs. It was never our
intention to provide a Turing-complete language and we will probably
try to disable these tricks in the future. However, this is not a
priority for us, as these users get what they deserve. For someone
working with the config, it is probably sufficient to know that
loops are **not** supported by the config language (even though you
can create loop-like structures). Thus, a tree is fully sufficient
to represent any configuration.

In the following sections, we'll quickly describe variables/properties,
flat structure elements and the execution tree.

Variables/Properties
--------------------
Rsyslog supports

* traditional syslog (RFC-based) message properties
* structured data content, including any non-syslog properties
* variables

  - global
  - local
  - message-enhancing (like message properties)

A description of these properties and variables is available elsewhere. As
far as a config processor is concerned, the important thing to know is
that they can be used during template definitions and script operations.
Flat Elements
-------------

Global Parameters
^^^^^^^^^^^^^^^^^
This element must contain all global parameters settable by rsyslog.
This includes elements from the global() as well as main_queue() config
statements. As of this writing, some global parameters can only be set
by legacy statements.

Note that main_queue() actually is a full queue definition.

Modules
^^^^^^^
This contains all loaded modules, among others:

* input modules
* output modules
* message modification modules
* message parsers

Note that for historical reasons some output modules are directly linked
into rsyslog and must not be specified.

Each module must be given only once. The data object must contain all
module-global parameters.

Inputs
^^^^^^
Describes all defined inputs with their parameters. It is built from the
input() statement or its legacy equivalent (ugly). Contains links to

* the module used for input
* the ruleset used for processing

Rulesets
^^^^^^^^
They contain the tree-like execution structure. However, rulesets
themselves are flat and cannot be nested. Note that there exist statements
that permit rulesets to call into each other, but all rulesets are in
the same flat top-level space.

Note that a ruleset has an associated queue object which (by default)
operates in direct mode. As a reminder, direct queues do not queue or
buffer any of the queue elements. In most cases this is sufficient,
but if the ruleset is bound to an input or is used to run
multiple actions independently (e.g., forwarding messages to two
destinations), then you should configure the associated queue object
as a real queue.

See the :doc:`Understanding rsyslog Queues <../concepts/queues>` or
:doc:`Turning Lanes and Rsyslog Queues <../whitepapers/queues_analogy>` docs
for more information.

Hierarchical Elements
---------------------
These are used for rule execution.
They are somewhat hard to fit into a
traditional config scheme, as they provide a full tree-like branching
structure.

Basically, a tree consists of statements and evaluations. Consider the
ruleset to be the root of the execution tree. It is rather common that
the tree's main level is a long linked list, with only actions being
branched out. This, for example, happens with a traditional
rsyslog.conf setting, which only contains files to be written based
on some priority filters. However, one must not be tricked into
thinking that this basic case is sufficient to support, as enterprise
users typically create far more complex cases.

In essence, rsyslog walks the tree, and executes statements while it
does so. Usually, a filter needs to be evaluated and execution branches
based on the filter outcome. The tree actually **is** an AST.

Execution Statements
^^^^^^^^^^^^^^^^^^^^
These are the easiest to implement as they are end nodes (and as such
nothing can be nested under them). They are most importantly created by
the action() config object, but also with statements like "set"
and "unset". Note that "call" is also considered a terminal node, even
though it executes *another* ruleset.

Note that actions have associated queues, so a queue object and its
parameters need to be present. When building configurations interactively,
it is suggested either not to configure queue parameters
by default or to do this only for actions where it makes sense (e.g.
connections to remote systems which may go offline).

Expression Evaluation
^^^^^^^^^^^^^^^^^^^^^
A full expression evaluation engine is available which does the typical
programming-language type of expression processing. The usual mathematical,
boolean and string operations are supported, as well as functions. As of
this writing, functions are hard-coded into rsyslog but may in the future
be part of a loadable module.
Evaluations can access all rsyslog properties
and variables. They may be nested arbitrarily deep.

Control-of-Flow Statements
^^^^^^^^^^^^^^^^^^^^^^^^^^
Remember that rsyslog intentionally does not support loop statements. So
control-of-flow boils down to

* conditional statements

  - "if ... then ... else ..."
  - syslog PRI-based filters
  - property-based filters

* stop

Here, "stop" terminates processing of the current message. The conditional
statements contain subbranches: "if" contains both "then" and "else"
subbranches, and the other two only the "then" subbranch. (Note: inside the
execution engine, the others may also have "else" branches, but these are
the result of the rsyslog config optimizer run and cannot be configured by
the user.)

When executing a config script, rsyslog executes the subbranch in question
and then continues to evaluate the next statement in the currently
executing branch that contained the conditional statement. If there is no
next statement, it goes up one layer. This is continued until the last
statement of the root statement list is reached. At that point execution
of the message is terminated and the message object destructed.
Again, think AST, as this is exactly what it is.

Note on Queue Objects
---------------------
Queue objects are **not** named objects inside the rsyslog configuration.
So their data is always contained within the object that uses the queue
(action(), ruleset(), main_queue()). From a UI perspective, this
unfortunately tends to complicate a config builder a bit.

diff --git a/source/development/debugging.rst b/source/development/debugging.rst
new file mode 100644
index 0000000..9b3a46d
--- /dev/null
+++ b/source/development/debugging.rst
@@ -0,0 +1,45 @@

Debugging
=========

**Author:** Pascal Withopf <pascalwithopf1@gmail.com>

The target audience is developers and users who need to debug an error with tests.
For debugging with rsyslog.conf see :doc:`troubleshooting <../troubleshooting/index>`.

Debugging with tests
--------------------

| When you want to solve a specific problem you will probably create a test
| and want to debug with it instead of configuring rsyslog. If you want to
| write a debug log you need to open the file **../rsyslog/tests/diag.sh**
| and delete the **#** in front of the two lines:

| **export RSYSLOG_DEBUG="debug nologfuncflow noprintmutexaction nostdout"**
| **export RSYSLOG_DEBUGLOG="log"**

| A debug log will be written now, but remember to put the **#** back again
| before committing your changes. Otherwise it won't work.

Memory debugging
----------------

| You can't use multiple memory debuggers at the same time. This will result
| in errors. Also remember to undo all changes in diag.sh after you are done,
| because it will also result in errors if you commit them with your work.

Valgrind
~~~~~~~~

| If you want to use Valgrind you need to enable it for tests.
| To do that open the file **../rsyslog/tests/diag.sh** and delete the **#**
| in front of the line:
| **valgrind="valgrind --malloc-fill=ff --free-fill=fe --log-fd=1"**
| This will enable Valgrind and you will get extra debugging output in your test-suite.log file.

Address sanitizer
~~~~~~~~~~~~~~~~~

| If you want to use the address sanitizer you need to set your CFLAGS. Use this command:
| **export CFLAGS="-g -fsanitize=address"**
| After this is done you need to configure and build rsyslog again, otherwise it won't work.

diff --git a/source/development/dev_codestyle.rst b/source/development/dev_codestyle.rst
new file mode 100644
index 0000000..f5407a2
--- /dev/null
+++ b/source/development/dev_codestyle.rst
@@ -0,0 +1,15 @@

rsyslog code style
==================

**Note**: the code style guide is still under construction. This guide lists
some basic style requirements.
**Code that does not match the code style guide will not pass CI testing.**

The following is required right now:

* we use ANSI C99
* indentation is done with tabs, not spaces
* trailing whitespace in lines is not permitted
* lines longer than 120 characters are not permitted;
  everything over 120 chars is rejected and must be reformatted

diff --git a/source/development/dev_oplugins.rst b/source/development/dev_oplugins.rst
new file mode 100644
index 0000000..5fb7baf
--- /dev/null
+++ b/source/development/dev_oplugins.rst
@@ -0,0 +1,371 @@

Writing Rsyslog Output Plugins
==============================

This page is the beginning of some developer documentation for writing
output plugins. Doing so is quite easy (and that was a design goal), but
there currently is only sparse documentation on the process available. I
was tempted NOT to write this guide because I know I will most
probably not be able to write a complete one.

However, I finally concluded that it may be better to have some
information and pointers than to have nothing.

Getting Started and Samples
---------------------------

The best way to get started with rsyslog plugin development is by looking at
existing plugins. All that start with "om" are **o**\ utput
**m**\ odules. That means they are primarily thought of as message
sinks. In theory, however, output plugins may aggregate other
functionality, too. Nobody has taken this route so far, so if you would
like to do that, it is highly suggested to post your plan on the rsyslog
mailing list first (so that we can offer advice).

The rsyslog distribution tarball contains the omstdout plugin, which is
extremely well suited for getting started. Just note that this plugin
itself is not meant for production use. But it is very simplistic and so
a really good starting point to grasp the core ideas.

In any case, you should also read the comments in
./runtime/module-template.h.
Output plugins are built from a large
set of code-generating macros. These macros handle most of the plumbing
needed by the interface. As long as no special callback to rsyslog is
needed (it typically is not), an output plugin does not really need to
be aware that it is executed by rsyslog. As a plug-in programmer, you
can (in most cases) "code as usual". However, all macros and entry
points need to be provided, and thus reading the code comments in the
files mentioned is highly suggested.

For testing, you need rsyslog's debugging support. Some useful
information is given in "`troubleshooting rsyslog <troubleshoot.html>`_"
from the doc set.

Special Topics
--------------

Threading
~~~~~~~~~

Rsyslog uses massive parallel processing and multithreading. However, a
plugin's entry points are guaranteed to never be called concurrently
**for the same action**. That means your plugin must be able to be
called concurrently by two or more threads, but you can be sure that for
the same instance no concurrent calls happen. This is guaranteed by the
interface specification and the rsyslog core guards against multiple
concurrent calls. An instance, in simple words, is one that shares a
single instanceData structure.

So as long as you do not mess around with global data, you do not need
to think about multithreading (and can apply a purely sequential
programming methodology).

Please note that during the configuration parsing stage of execution,
access to global variables for the configuration system is safe. In that
stage, the core will only call sequentially into the plugin.

Getting Message Data
~~~~~~~~~~~~~~~~~~~~

The doAction() entry point of your plugin is provided with messages to
be processed. It will only be activated after filtering and all other
conditions, so you do not need to apply any other conditional but can
simply process the message.
Note that you do NOT receive the full internal representation of the
message object. There are various (including historical) reasons for
this and, among others, this is a design decision based on security.

Your plugin will only receive what the end user has configured in a
$template statement. However, starting with 4.1.6, there are two ways of
receiving the template content. The default mode, and in most cases
sufficient and optimal, is to receive a single string with the expanded
template. As I said, this is usually optimal; think about writing things
to files, emailing content or forwarding it.

The important philosophy is that a plugin should **never** reformat any
of such strings - that would either remove the user's ability to fully
control message formats or it would lead to duplicating code that is
already present in the core. If you need some formatting that is not yet
present in the core, suggest it to the rsyslog project, best done by
sending a patch ;), and we will try hard to get it into the core (so
far, we could accept all such suggestions - no promise, though).

If a single string does not seem suitable for your application, the plugin
can also request access to the template components. The typical use case
seems to be databases, where you would like to access properties via
specific fields. With that mode, you receive a char \*\* array, where
each array element points to one field from the template (from left to
right). Fields start at array index 0 and a NULL pointer means you have
reached the end of the array (the typical Unix "poor man's linked list
in an array" design). Note, however, that each of the individual
components is a string. It is not a date stamp, number or whatever, but
a string. This is because rsyslog processes strings (from a high-level
design look at it) and so this is the natural data type.
Feel free to
convert to whatever you need, but keep in mind that malformed packets
may have led to field contents you'd never expect...

If you would like to use the array-based parameter passing method, note that
it is only available in rsyslog 4.1.6 and above. If you can accept that
your plugin will not be working with previous versions, you do not need
to handle pre-4.1.6 cases. However, it would be "nice" if you shut down
gracefully in these cases - otherwise the older rsyslog core engine will
pass you a string where you expect the array of pointers, which most
probably results in a segfault. To check whether or not the core
supports the functionality, you can use this code sequence:

::

    BEGINmodInit()
        rsRetVal localRet;
        rsRetVal (*pomsrGetSupportedTplOpts)(unsigned long *pOpts);
        unsigned long opts;
        int bArrayPassingSupported; /* does core support template passing as an array? */
    CODESTARTmodInit
        *ipIFVersProvided = CURR_MOD_IF_VERSION; /* we only support the current interface specification */
    CODEmodInit_QueryRegCFSLineHdlr
        /* check if the rsyslog core supports parameter passing code */
        bArrayPassingSupported = 0;
        localRet = pHostQueryEtryPt((uchar*)"OMSRgetSupportedTplOpts", &pomsrGetSupportedTplOpts);
        if(localRet == RS_RET_OK) {
            /* found entry point, so let's see if core supports array passing */
            CHKiRet((*pomsrGetSupportedTplOpts)(&opts));
            if(opts & OMSR_TPL_AS_ARRAY)
                bArrayPassingSupported = 1;
        } else if(localRet != RS_RET_ENTRY_POINT_NOT_FOUND) {
            ABORT_FINALIZE(localRet); /* something else went wrong, which is not acceptable */
        }
        DBGPRINTF("omstdout: array-passing is %ssupported by rsyslog core.\n", bArrayPassingSupported ? "" : "not ");

        if(!bArrayPassingSupported) {
            DBGPRINTF("rsyslog core too old, shutting down this plug-in\n");
            ABORT_FINALIZE(RS_RET_ERR);
        }

The code first checks if the core supports the OMSRgetSupportedTplOpts()
API (which is also not present in all versions!)
and, if so, queries the
core whether the OMSR\_TPL\_AS\_ARRAY mode is supported. If either does not
exist, the core is too old for this functionality. The sample snippet
above then shuts down, but a plugin may instead just do things
differently. In omstdout, you can see how a plugin may deal with the
situation.

**In any case, it is recommended that at least a graceful shutdown is
made and the array-passing capability not blindly be used.** In such
cases, we cannot guard the plugin from segfaulting and if the plugin
(as currently always) is run within rsyslog's process space, that
results in a segfault for rsyslog. So do not do this.

Another possible mode is OMSR\_TPL\_AS\_JSON, where instead of the
template a json-c memory object tree is passed to the module. The module
can extract data via json-c API calls. It MUST NOT modify the provided
structure. This mode is primarily aimed at plugins that need to process
tree-like data, as found for example in MongoDB or ElasticSearch.

Batching of Messages
~~~~~~~~~~~~~~~~~~~~

Starting with rsyslog 4.3.x, batching of output messages is supported.
Previously, only a single-message interface was supported.

With the **single message** plugin interface, each message is passed via
a separate call to the plugin. Most importantly, the rsyslog engine
assumes that each call to the plugin is a complete transaction and as
such assumes that messages are properly committed after the plugin returns
to the engine.

With the **batching** interface, rsyslog employs something along the
lines of "transactions". Obviously, the rsyslog core cannot make
non-transactional outputs fully transactional. But what it can do is
support the output telling the core which messages have been committed
by the output and which not yet. The core can then take care of those
uncommitted messages when problems occur.
For example, if a plugin has
received 50 messages but not yet told the core that it committed them,
and then returns an error state, the core assumes that all these 50
messages were **not** written to the output. The core then requeues all
50 messages and does the usual retry processing. Once the output plugin
tells the core that it is ready again to accept messages, the rsyslog
core will provide it with these 50 not yet committed messages again
(actually, at this point, the rsyslog core no longer knows that it is
re-submitting the messages). If, in contrast, the plugin had told rsyslog
that 40 of these 50 messages were committed (before it failed), then only
10 would have been requeued and resubmitted.

In order to provide an efficient implementation, there are some (mild)
constraints in that transactional model: first of all, rsyslog itself
specifies the ultimate transaction boundaries. That is, it tells the
plugin when a transaction begins and when it must finish. The plugin is
free to commit messages in between, but it **must** commit all work done
when the core tells it that the transaction ends. All messages passed in
between a begin and end transaction notification are called a batch of
messages. They are passed in one by one, just as without transaction
support. Note that batch sizes are variable within the range of 1 to a
user-configured maximum limit. Most importantly, that means that plugins
may receive batches of single messages, so they are required to commit
each message individually. If the plugin tries to be "smarter" than the
rsyslog engine and does not commit messages in those cases (for
example), the plugin puts message stream integrity at risk: once rsyslog
has notified the plugin of transaction end, it discards all messages as
it considers them committed and safe.
If now something goes wrong, the
rsyslog core does not try to recover lost messages (and keep in mind
that "goes wrong" includes such uncontrollable things as connection
loss to a database server). So it is highly recommended to fully abide
by the plugin interface details, even though you may think you can do it
better. The second reason for that is that the core engine will have
configuration settings that enable the user to tune the commit rate to their
use-case specific needs. And, as a relief: why would rsyslog ever decide
to use batches of one? There is a trivial case and that is when we have
very low activity so that no queue of messages builds up, in which case
it makes sense to commit work as it arrives. (As a side note, there are
some valid cases where a timeout-based commit feature makes sense. This
is also under evaluation and, once decided, the core will offer an
interface plus a way to preserve message stream integrity for
properly-crafted plugins.)

The second restriction is that if a plugin makes commits in between
(which is perfectly legal) those commits must be in order. So if a commit
is made for message ten out of 50, this means that messages one to nine
are also committed. It would be possible to remove this restriction, but
we have decided to deliberately introduce it to simplify things.

Output Plugin Transaction Interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to keep compatible with existing output plugins (and because it
introduces no complexity), the transactional plugin interface is built
on the traditional non-transactional one. Well... actually the
traditional interface was transactional since its introduction, in the
sense that each message was processed in its own transaction.
So the current
``doAction()`` entry point can be considered to have this structure (from
the transactional interface point of view):

::

    doAction()
    {
        beginTransaction()
        ProcessMessage()
        endTransaction()
    }

For the **transactional interface**, we now move these implicit
``beginTransaction()`` and ``endTransaction()`` calls out of the message
processing body, resulting in such a structure:

::

    beginTransaction()
    {
        /* prepare for transaction */
    }

    doAction()
    {
        ProcessMessage()
        /* maybe do partial commits */
    }

    endTransaction()
    {
        /* commit (rest of) batch */
    }

And this calling structure actually is the transactional interface! It
is as simple as this. For the new interface, the core calls a
``beginTransaction()`` entry point inside the plugin at the start of the
batch. Similarly, the core calls ``endTransaction()`` at the end of the
batch. The plugin must implement these entry points according to its
needs.

But how does the core know when to use the old or the new calling
interface? This is rather easy: when loading a plugin, the core queries
the plugin for the ``beginTransaction()`` and ``endTransaction()`` entry
points. If the plugin supports these, the new interface is used. If the
plugin does not support them, the old interface is used and rsyslog
implies that a commit is done after each message. Note that there is no
special "downlevel" handling necessary to support this. In the case of
the non-transactional interface, rsyslog considers each completed call
to ``doAction`` as a partial commit up to the current message. So
implementation inside the core is very straightforward.

Actually, **we recommend that the transactional entry points only be
defined by those plugins that actually need them**. All others should
not define them, in which case the default commit behaviour inside
rsyslog will apply (thus removing complexity from the plugin).
In order to support partial commits, special return codes must be
defined for ``doAction``. All of these return codes mean that processing
completed successfully. But they convey additional information about the
commit status as follows:

*RS\_RET\_OK*
    The record and all previous ones inside the batch have been committed.
    *Note:* this definition is what makes integrating plugins without the
    transaction begin/end calls so easy - this is the traditional "success"
    return state and, if every call returns it, there is no need for
    actually calling ``endTransaction()``, because there is no transaction
    open.

*RS\_RET\_DEFER\_COMMIT*
    The record has been processed, but is not yet committed. This is the
    expected state for transaction-aware plugins.

*RS\_RET\_PREVIOUS\_COMMITTED*
    The **previous** record inside the batch has been committed, but the
    current one not yet. This state is introduced to support sources that
    fill up buffers and commit once a buffer is completely filled. That
    may occur halfway into the next record, so it may be important to be
    able to tell the engine that everything up to the previous record is
    committed.

Note that the typical **calling cycle** is ``beginTransaction()``,
followed by *n* times ``doAction()``, followed by ``endTransaction()``.
However, if either ``beginTransaction()`` or ``doAction()`` returns
an error state (including RS\_RET\_SUSPENDED), then the transaction is
considered aborted. As a result, the remaining calls in this cycle (e.g.
``endTransaction()``) are never made and a new cycle (starting with
``beginTransaction()``) is begun when processing resumes. So an output
plugin must expect and handle those partial cycles gracefully.

**The question remains how a plugin can know if the core supports
batching.** First of all, even if the engine did not know it, the
plugin would return with RS\_RET\_DEFER\_COMMIT, which then would be
treated as an error by the engine.
This would effectively disable the
output, but cause no further harm (but may be harm enough in itself).

The real solution is to enable the plugin to query the rsyslog core
whether this feature is supported or not. At the time of the introduction
of batching, no such query interface existed. So we introduce it with that
release. What this means is that if an rsyslog core cannot provide this
query interface, it is a core that was built before batching support was
available. So the absence of a query interface indicates that the
transactional interface is not available. One might now be tempted to
think there is no need to do the actual check, but it is recommended to
ask the rsyslog engine explicitly if the transactional interface is
present and will be honored. This enables us to create versions in the
future which have, for whatever reason we do not yet know, no support
for this interface.

The logic to do these checks is contained in the ``INITChkCoreFeature``
macro, which can be used as follows:

::

    INITChkCoreFeature(bCoreSupportsBatching, CORE_FEATURE_BATCHING);

Here, bCoreSupportsBatching is a plugin-defined integer which after
execution is 1 if batches (and thus the transactional interface) are
supported and 0 otherwise. CORE\_FEATURE\_BATCHING is the feature we are
interested in. Future versions of rsyslog may contain additional
feature-test macros (you can see all of them in ./runtime/rsyslog.h).

Note that the ompgsql output plugin supports transactional mode in a
hybrid way and thus can be considered good example code.

Open Issues
-----------

- Processing errors handling
- reliable re-queue during error handling and queue termination

Licensing
~~~~~~~~~

From the rsyslog point of view, plugins constitute separate projects. As
such, we think plugins are not required to be compatible with GPLv3.
However, this is not legal advice.
If you intend to release something
under a non-GPLv3-compatible license it is probably best to consult with
your lawyer.

Most importantly, and this is definite, the rsyslog team does not expect
or require you to contribute your plugin to the rsyslog project (but of
course we are happy if you do).

diff --git a/source/development/dev_queue.rst b/source/development/dev_queue.rst
new file mode 100644
index 0000000..fb5a286
--- /dev/null
+++ b/source/development/dev_queue.rst
@@ -0,0 +1,299 @@

The rsyslog queue object
========================

This page reflects the status as of 2008-01-17. The documentation is
still incomplete. The target audience is developers and users who would
like to get an in-depth understanding of queues as used in
`rsyslog <http://www.rsyslog.com/>`_.

**Please note that this document is outdated and no longer reflects
the specifics of the queue object. However, I have decided to leave it
in the doc set, as the overall picture provided still is quite OK. I
intend to update this document somewhat later when I have reached the
"store-and-forward" milestone.**

Some definitions
----------------

A queue is DA-enabled if it is configured to use disk-assisted mode when
there is need to. A queue is in DA mode (or DA run mode) when it
actually runs disk-assisted.

Implementation Details
----------------------

Disk-Assisted Mode
~~~~~~~~~~~~~~~~~~

Memory-type queues may utilize disk-assisted (DA) mode. DA mode is
enabled whenever a queue file name prefix is provided. This is called
DA-enabled mode. If DA-enabled, the queue operates as a regular memory
queue until a high water mark is reached. If that happens, the queue
activates disk assistance (called "runs disk assisted" or "runs DA" -
you can find that often in source file comments). To do so, it creates a
helper queue instance (the DA queue).
At that point, there are two
+queues running - the primary queue's consumer changes to a
+shuffle-to-DA-queue consumer and the original primary consumer is
+assigned to the DA queue. Existing and new messages are spooled to the
+disk queue, from where the DA worker takes them and passes them on to
+the actual consumer for execution. In essence, the primary queue has now
+become a memory buffer for the DA queue. The primary queue will be
+drained until a low water mark is reached. At that point, processing is
+held. New messages enqueued to the primary queue will not be processed
+but kept in memory. Processing resumes when either the high water mark
+is reached again or the DA queue indicates it is empty. If the DA queue
+is empty, it is shut down and processing of the primary queue continues
+as a regular in-memory queue (aka "DA mode is shut down"). The whole
+thing iterates once the high water mark is hit again.
+
+There is one special case: if the primary queue is shut down and could
+not finish processing all messages within the configured timeout
+periods, the DA queue is instantiated to take up the remaining messages.
+These will be preserved and be processed during the next run. During
+that period, the DA queue runs in "enqueue-only" mode and does not
+execute any consumer. Draining the primary queue is typically very fast.
+If that behaviour is not desired, it can be turned off via parameters.
+In that case, any remaining in-memory messages are lost.
+
+Because, when running DA, two queues work closely together and worker
+threads (including the DA worker) may shut down at any time (due to
+timeout), synchronization of processing, startup, and shutdown is
+somewhat complex. I'll outline the exact conditions and steps below.
+I also do this so that I know clearly what to develop to, so please be
+patient if the information is a bit too in-depth ;)
+
+DA Run Mode Initialization
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Three cases:
+
+#. 
any time during queueEnqObj() when the high water mark is hit
+#. at queue startup if there is an on-disk queue present (presence of QI
+   file indicates presence of queue data)
+#. at queue shutdown if remaining in-memory data needs to be persisted
+   to disk
+
+In **case 1**, the worker pool is running. When switching to DA mode,
+all regular workers are sent termination commands. The DA worker is
+initiated. Regular workers may run in parallel to the DA worker until
+they terminate. Regular workers shall terminate as soon as their current
+consumer has completed. They shall not execute the DA consumer.
+
+In **case 2**, the worker pool is not yet running and is NOT started.
+The DA worker is initiated.
+
+In **case 3**, the worker pool is already shut down. The DA worker is
+initiated. The DA queue runs in enqueue-only mode.
+
+In all cases, the DA worker starts up and checks if DA mode is already
+fully initialized. If not, it initializes it, which most importantly
+means construction of the DA queue.
+
+Then, regular worker processing is carried out. That is, the queue
+worker will wait on an empty queue and terminate after a timeout.
+However, if any message is received, the DA consumer is executed. That
+consumer checks the low water mark. If the low water mark is reached, it
+stops processing until either the high water mark is reached again or
+the DA queue indicates it is empty (there is a pthread\_cond\_t for this
+synchronization).
+
+In theory, a **case-2** startup could lead to the worker becoming
+inactive and terminating while waiting on the primary queue to fill. In
+practice, this is highly unlikely (but only for the main message queue)
+because rsyslog issues a startup message. HOWEVER, we cannot rely on
+that, as it would introduce a race. If the primary rsyslog thread (the
+one that issues the message) is scheduled very late and there is a low
+inactivity timeout for queue workers, the queue worker may terminate
+before the startup message is issued. 
And if the on-disk queue holds
+only a few messages, it may become empty before the DA worker is
+re-initiated. So it is possible that the DA run mode termination
+criterion is met while no DA worker is running on the primary queue.
+
+In cases 1 and 3, the DA worker can never become inactive without
+hitting the DA shutdown criteria. In **case 1**, it either shuffles
+messages from the primary to the DA queue or it waits because it has
+hit the low water mark.
+
+In **case 3**, it always shuffles messages between the queues (because
+that's the sole purpose of that run). In order for this to happen, the
+high water mark has been set to the value of 1 when DA run mode has been
+initialized. This ensures that the regular logic can be applied to drain
+the primary queue. To prevent a hold due to reaching the low water mark,
+that mark must be changed to 0 before the DA worker starts.
+
+DA Run Mode Shutdown
+~~~~~~~~~~~~~~~~~~~~
+
+In essence, DA run mode is terminated when the DA queue is empty and the
+primary worker queue size is below the high water mark. It is also
+terminated when the primary queue is shut down. The decision to switch
+back to regular (non-DA) run mode is typically made by the DA worker. If
+it switches, the DA queue is destructed and the regular worker pool is
+restarted. In some cases, the queue shutdown process may initiate the
+"switch" (in this case more or less a clean shutdown of the DA queue).
+
+One might think that it would be more natural for the DA queue to detect
+being idle and shut down itself. However, there are some issues
+associated with that. Most importantly, all queue worker threads need to
+be shut down during queue destruction. Only after that has happened can
+final destruction steps happen (else we would have a myriad of races).
+However, it is the DA queue's worker thread that detects it is empty
+(empty queue detection always happens at the consumer side, and
+necessarily so). 
That would lead the DA queue's worker thread to initiate DA
+queue destruction, which in turn would lead to that very same thread
+being canceled (because workers must shut down before the queue can be
+destructed). Obviously, this does not work out (and I didn't even
+mention the other issues - so let's forget about it). As such, the
+thread that enqueues messages must destruct the queue - and that is the
+primary queue's DA worker thread.
+
+There are some subtleties due to thread synchronization and the fact
+that the DA consumer may not be running (in a **case-2 startup**). So it
+is not trivial to reliably change the queue back from DA run mode to
+regular run mode. The priority is a clean switch. We accept the fact
+that there may be situations where we cleanly shut down DA run mode,
+just to re-enable it with the very next message being enqueued. While
+unlikely, this will happen from time to time and is considered perfectly
+legal. We can't predict the future, and it would introduce too much
+complexity to try to do something against that (which would most
+probably even lead to worse performance under regular conditions).
+
+The primary queue's DA worker thread may wait at two different places:
+
+#. after reaching the low water mark and waiting for either high water
+   or DA queue empty
+#. at the regular pthread\_cond\_wait() on an empty primary queue
+
+Case 2 is unlikely, but may happen (see info above on a case-2 startup).
+
+**The DA worker may also not wait at all,** because it is actively
+executing and shuffling messages between the queues. In that case,
+however, the program flow passes both of the two wait conditions but
+simply does not wait.
+
+**Finally, the DA worker may be inactive** (again, with a case-2
+startup). In that case, no work(er) at all is executed. Most
+importantly, without the DA worker being active, nobody will ever detect
+the need to change back to regular mode. 
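The switchback condition that needs detecting here is the one given at the start of "DA Run Mode Shutdown" above: the DA queue is empty and the primary queue is below the high water mark, or the queue is shutting down altogether. As a C sketch, with hypothetical names that do not appear in the actual rsyslog code:

```c
/* Hypothetical names - a sketch of the DA worker's termination check,
 * not the actual rsyslog code. DA run mode may end when the primary
 * queue has drained below the high water mark and the DA (disk) queue
 * is empty, or when queue shutdown has been requested. */
int daWorkerShouldTerminate(int bShutdownRequested,
                            int iPrimaryQueueSize,
                            int iHighWtrMark,
                            int bDAQueueEmpty)
{
    if (bShutdownRequested)
        return 1; /* normal worker termination criterion */
    return iPrimaryQueueSize < iHighWtrMark && bDAQueueEmpty;
}
```

As the surrounding text explains, the hard part is not this check itself, but ensuring that some active DA worker actually gets to execute it.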
If we have this situation, the very next
+message enqueued will cause the switch, because then the DA run mode
+shutdown criterion is met. However, it may take close to an eternity for
+this message to arrive. During that time, disk and memory resources for
+the DA queue remain allocated. This also leaves processing in a
+sub-optimal state, and it may take longer than necessary to switch back
+to regular queue mode when a message burst happens. In extreme cases,
+this could even lead to a shutdown of DA run mode which takes so long
+that the high water mark is passed and DA run mode is immediately
+re-initialized, whereas with an immediate switch the message burst might
+have been processed by the in-memory queue without DA support.
+
+So in short, it is desirable to switch to regular run mode as soon as
+possible. To do this, we need an active DA worker. The easy solution is
+to initiate DA worker startup from the DA queue's worker once it detects
+an empty queue. To do so, the DA queue's worker must call into a "*DA
+worker startup initiation*\ " routine inside the main queue. As a
+reminder, the DA worker will most probably not receive the "DA queue
+empty" signal in that case, because (in most cases) it will have been
+sent long before the DA worker even waits for it. So **it is vital that
+DA run mode termination checks be done in the DA worker before it goes
+into any wait condition**.
+
+Please note that the "*DA worker startup initiation*\ " routine may be
+called concurrently from multiple initiators. **To prevent a race, it
+must be guarded by the queue mutex** and return without any action (and
+no error code!) if the DA worker is already initiated.
+
+All other cases can be handled by checking the termination criteria
+immediately at the start of the worker and then once again for each run.
+The logic follows this simplified flow diagram:
+
+|image0|
+
+.. |image0| image:: queueWorkerLogic.jpg
+
+Some of the more subtle aspects of worker processing (e.g. 
enqueue
+thread signaling and other fine things) have been left out in order to
+get the big picture. What is called "check DA mode switchback..." right
+after "worker init" is actually a check for the worker's termination
+criteria. Typically, **the worker termination criterion is a shutdown
+request**. However, **for a DA worker, termination is also requested if
+the queue size is below the high water mark AND the DA queue is empty**.
+There is also a third termination criterion, and it is not even on the
+chart: the inactivity timeout, which exists in all modes. Note that
+while the inactivity timeout shuts down a thread, it logically does not
+terminate the worker pool (or DA worker): workers are restarted on an
+as-needed basis. However, inactivity timeouts are very important because
+they require us to restart workers in some situations where we may
+expect a running one. So always keep them in mind.
+
+Queue Destruction
+~~~~~~~~~~~~~~~~~
+
+Now let's consider **the case of destruction of the primary queue.**
+During destruction, our focus is on losing as few messages as possible.
+If the queue is not DA-enabled, there is nothing but the configured
+timeouts to handle that situation. However, with a DA-enabled queue
+there are more options.
+
+If the queue is DA-enabled, it may be *configured to persist messages to
+disk before it is terminated*. In that case, loss of messages never
+occurs (at the price of a potentially lengthy shutdown). Even if that
+setting is not applied, the queue should drain as many messages as
+possible to the disk. For that reason, it makes no sense to wait on a
+low water mark. Also, if the queue is already in DA run mode, it does
+not make any sense to switch back to regular run mode during termination
+and then try to process some messages via the regular consumer. It is
+much more appropriate to try to completely drain the queue during the
+remaining timeout period. 
For the same reason, it is preferred that no
+new consumers be activated (via the DA queue's worker), as they only
+cost valuable CPU cycles and, more importantly, would potentially be
+long(er)-running and might need to be cancelled. To prevent all of
+that, **queue parameters are changed for DA-enabled queues:** the
+high water mark is set to 1 and the low water mark to 0 on the primary
+queue. The DA queue is commanded to run in enqueue-only mode. If the
+primary queue is *configured to persist messages to disk before it is
+terminated*, its SHUTDOWN timeout is changed to eternal. These
+parameters will cause the queue to drain as much as possible to disk
+(and they may cause a case 3 DA run mode initiation). Please note that
+once the primary queue has been drained, the DA queue's worker will
+automatically switch back to regular (non-DA) run mode. **It must be
+ensured that no worker cancellation occurs during that switchback**.
+Please note that the queue may not switch back to regular run mode if it
+is not *configured to persist messages to disk before it is terminated*.
+In order to apply the new parameters, **worker threads must be
+awakened.** Remember that we may not be in DA run mode at this stage. In
+that case, the regular workers must be awakened, which then will switch
+to DA run mode. It may also be that no worker is active at all; in that
+case, one must be initiated. If in DA run mode and the DA worker is
+inactive, the "*DA worker startup initiation*\ " routine must be called
+to activate it. That routine ensures only one DA worker is started even
+with multiple concurrent callers - this may be the case here. The DA
+queue's worker may have requested DA worker startup in order to
+terminate on an empty queue (which will probably not be honored as we
+have changed the low water mark).
+
+After all this is done, the queue destructor requests termination of the
+queue's worker threads. It will use the normal timeouts and potentially
+cancel too-long-running worker threads. 
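The "*DA worker startup initiation*" routine relied upon here must, as noted earlier, tolerate multiple concurrent callers. A minimal C sketch of such a mutex-guarded, idempotent routine follows; the names are hypothetical and the real rsyslog code differs in detail:

```c
#include <pthread.h>

/* Hypothetical names - sketch only, the real rsyslog code differs. */
typedef struct queue {
    pthread_mutex_t mut;   /* the queue mutex guarding queue state */
    int bDAWorkerStarted;  /* 1 once the DA worker has been initiated */
    int nRealStartups;     /* for illustration: counts actual startups */
} queue_t;

/* May be called concurrently by any number of initiators. Only the
 * first call actually initiates the DA worker; all later calls return
 * without any action and without an error code. */
void queueInitDAWorker(queue_t *q)
{
    pthread_mutex_lock(&q->mut);
    if (!q->bDAWorkerStarted) {
        q->bDAWorkerStarted = 1;
        q->nRealStartups++; /* the real code would spawn the thread here */
    }
    pthread_mutex_unlock(&q->mut);
}
```

Because the check and the state change happen under the same queue mutex, concurrent initiators cannot both see ``bDAWorkerStarted == 0``, which is exactly the race the text above warns about.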
**The shutdown process must
+ensure that all workers reach running state before they are commanded to
+terminate**. Otherwise, it may run into a race condition that could lead
+to a false shutdown with workers running asynchronously. As a few
+workers may have just been started to apply the new parameter settings,
+the probability of this race condition is extremely high, especially on
+single-CPU systems.
+
+After all workers have been shut down (or cancelled), the queue may
+still be in DA run mode. If so, this must be terminated, which now can
+simply be done by destructing the DA queue object. This is not a real
+switchback to regular run mode, but that doesn't matter because the
+queue object will soon be gone.
+
+Finally, the queue is mostly shut down and ready to be actually
+destructed. As a last try, the queuePersists() entry point is called. It
+is used to persist a non-DA-enabled queue in whatever way is possible
+for that queue. There may be no implementation for the specific queue
+type. Please note that this is not just a theoretical construct. This is
+an extremely important code path when the DA queue itself is destructed.
+Remember that it is a queue object in its own right. The DA queue is
+obviously not DA-enabled, so it calls into queuePersists() during its
+destruction - this is what enables us to persist the disk queue!
+
+After that point, leftover queue resources (mutexes, dynamic memory,
+...) are freed and the queue object is actually destructed.
+ diff --git a/source/development/dev_testbench.rst b/source/development/dev_testbench.rst new file mode 100644 index 0000000..40721c5 --- /dev/null +++ b/source/development/dev_testbench.rst @@ -0,0 +1,20 @@ +writing rsyslog tests
+=====================
+
+The rsyslog testbench is executed via `make check` or `make distcheck`. For details on
+these modes, see the GNU autotools documentation. 
The most important thing is that
+the `make distcheck` test execution environment is considerably different from its
+`make check` counterpart. The rsyslog testbench is crafted to handle both cases and
+does so with the (intensive) use of environment variables.
+
+The rsyslog testbench aims to support parallel tests. This is not yet fully implemented,
+but we are working towards that goal. This has a number of implications/requirements:
+
+* all file names, ports, etc. need to be unique
+* the diag.sh framework supports auto-generation capabilities to support this:
+  use `${RSYSLOG_DYNNAME}` as a prefix for all files you generate. For the frequently
+  used files, the framework already defines `${RSYSLOG_OUT_LOG}` and `${RSYSLOG_OUT_LOG2}`.
+
+
+When writing new tests, it is in general advisable to copy an existing test and change
+it. This also helps you get the required files in place. diff --git a/source/development/generic_design.rst b/source/development/generic_design.rst new file mode 100644 index 0000000..6c02ad6 --- /dev/null +++ b/source/development/generic_design.rst @@ -0,0 +1,146 @@ +Generic design of a syslogd
+---------------------------
+
+Written 2007-04-10 by `Rainer Gerhards <https://rainer.gerhards.net>`_
+
+The text below describes a generic approach to how a syslogd can be
+implemented. I created this description for some other project, where it
+was not used. Instead of throwing it away, I thought it would be a good
+addition to the rsyslog documentation. While rsyslog differs in details
+from the description below, it is sufficiently close to it. Further
+development of rsyslog will probably match it even more closely to the
+description.
+
+If you intend to read the rsyslog source code, I recommend reading this
+document here first. You will not find all of the same names and
+concepts inside rsyslog. However, I think your understanding will
+benefit from knowing the generic architecture. 
+ +:: + + + +-----------------+ + | "remote" PLOrig | + +-----------------+ + | + I +--------+-----+-----+ +-----+-------+------+-----+ + P | PLOrig | GWI | ... | | GWO | Store | Disc | ... | + C +--------+-----+-----+ +-----+-------+------+-----+ + | | ^ + v v | + +--------------+ +------------+ +--------------+ + | PLGenerator | | RelayEng | | CollectorEng | + +--------------+ +------------+ +--------------+ + | ^ ^ + | | | + v v | + +-------------+ +------------+ +--------------+ + | PLG Ext | | RelEng Ext | | CollcEng Ext | + +-------------+ +------------+ +--------------+ + | ^ ^ + | | | + v v | + +--------------------------------------------------------------+ + | Message Router | + +--------------------------------------------------------------+ + | ^ + v | + +--------------------------------------------------------------+ + | Message CoDec (e.g. RFC 3164, RFCYYYY) | + +--------------------------------------------------------------+ + | ^ + v | + +---------------------+-----------------------+----------------+ + | transport UDP | transport TLS | ... | + +---------------------+-----------------------+----------------+ + + Generic Syslog Application Architecture + +- A "syslog application" is an application whose purpose is the + processing of syslog messages. It may be part of a larger application + with a broader purpose. An example: a database application might come + with its own syslog send subsystem and not go through a central + syslog application. In the sense of this document, that application + is called a "syslog application" even though a casual observer might + correctly call it a database application and may not even know that + it supports sending of syslog messages. +- Payload is the information that is to be conveyed. Payload by itself + may have any format and is totally independent from to format + specified in this document. The "Message CoDec" of the syslog + application will bring it into the required format. 
+- Payload Originators ("PLOrig") are the original creators of payload.
+  Typically, these are application programs.
+- A "Remote PLOrig" is a payload originator residing in a different
+  application than the syslog application itself. That application may
+  reside on a different machine and may talk to the syslog application
+  via RPC.
+- A "PLOrig" is a payload originator residing within the syslog
+  application itself. Typically, this PLOrig emits syslog application
+  startup, shutdown, error and status log messages.
+- A "GWI" is an inbound gateway. For example, an SNMP-to-syslog gateway
+  may receive SNMP messages and translate them into syslog.
+- The ellipsis after "GWI" indicates that there are potentially a
+  variety of different other ways to originally generate payload.
+- A "PLGenerator" is a payload generator. It takes the information from
+  the payload-generating source and integrates it into the syslog
+  subsystem of the application. This is a highly theoretical concept.
+  In practice, there may not actually be any such component. Instead,
+  the payload generators (or other parts like the GWI) may talk
+  directly to the syslog subsystem. Conceptually, the "PLGenerator" is
+  the first component where the information is actually syslog content.
+- A "PLG Ext" is a payload generator extension. It is used to modify
+  the syslog information. An example of a "PLG Ext" might be the
+  addition of cryptographic signatures to the syslog information.
+- A "Message Router" is a component that accepts in- and outbound
+  syslog information and routes it to the proper next destination
+  inside the syslog application. The routing information itself is
+  expected to be learnt by operator configuration.
+- A "Message CoDec" is the message encoder/decoder. The encoder takes
+  syslog information and encodes it into the required format for a
+  syslog message. The decoder takes a syslog message and decodes it
+  into syslog information. 
Codecs for multiple syslog formats may be
+  present inside a single syslog application.
+- A transport (UDP, TLS, yet-to-be-defined ones) sends and receives
+  syslog messages. Multiple transports may be used by a single
+  syslog application at the same time. A single transport instance may
+  be used for both sending and receiving. Alternatively, a single
+  instance might be used exclusively for either sending or receiving.
+  Multiple instances may be used for different listener ports and
+  receivers.
+- A "RelayEng" is the relaying engine. It provides functionality
+  necessary for receiving syslog information and sending it to another
+  syslog application.
+- A "RelEng Ext" is an extension that processes syslog information as
+  it enters or exits a RelayEng. An example of such a component might
+  be a relay cryptographically signing received syslog messages. Such a
+  function might be useful to guarantee authenticity starting from a
+  given point inside a relay chain.
+- A "CollectorEng" is a collector engine. At this component, syslog
+  information leaves the syslog system and is translated into some
+  other form. After the CollectorEng, the information is no longer
+  defined to be of native syslog type.
+- A "CollcEng Ext" is a collector engine extension. It modifies syslog
+  information before it is passed on to the CollectorEng. An example
+  of this might be the verification of cryptographically signed syslog
+  message information. Please note that another implementation approach
+  would be to do the verification outside of the syslog application or
+  in a stage after "CollectorEng".
+- A "GWO" is an outbound gateway. An example of this might be the
+  forwarding of syslog information via SNMP or SMTP. Please note that
+  when a GWO directly connects to a GWI on a different syslog
+  application, no native exchange of syslog information takes place.
+  Instead, the native protocol of these gateways (e.g. SNMP) is used. 
+  The syslog information is embedded inside that protocol. Depending on
+  protocol and gateway implementation, some of the native syslog
+  information might be lost.
+- A "Store" is any way to persistently store the extracted syslog
+  information, e.g. to the file system or to a database.
+- "Disc" means the discarding of messages. Operators often find it
+  useful to discard noise messages and so most syslog applications
+  contain a way to do that.
+- The ellipsis after "Disc" indicates that there are potentially a
+  variety of different other ways to consume syslog information.
+- There may be multiple instances of each of the described components
+  in a single syslog application.
+- A syslog application is made up of all or some of the above-mentioned
+  components. diff --git a/source/development/index.rst b/source/development/index.rst new file mode 100644 index 0000000..02a5cc0 --- /dev/null +++ b/source/development/index.rst @@ -0,0 +1,7 @@ +Development
+===========
+
+.. toctree::
+   :glob:
+
+   * diff --git a/source/development/queueWorkerLogic.jpg b/source/development/queueWorkerLogic.jpg Binary files differnew file mode 100644 index 0000000..fb143c4 --- /dev/null +++ b/source/development/queueWorkerLogic.jpg