summaryrefslogtreecommitdiffstats
path: root/source/whitepapers/syslog_parsing.rst
blob: 3e3f2de12f9b5679ca0b4336d34b5c0ec728ef92 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
syslog parsing in rsyslog
=========================

*Written by* `Rainer Gerhards <https://rainer.gerhards.net/>`_,
`Großrinderfeld <https://www.rainer-gerhards.de/grossrinderfeld/>`_
*(2008-09-23)*

**We regularly receive messages asking why**
`rsyslog <http://www.rsyslog.com>`_ **parses this or that message
incorrectly.** Of course, it turns out that rsyslog does the right
thing, but the message sender does not. And also of course, this is not
even of the slightest help to the end user experiencing the problem ;).
So I thought I write this paper. It describes the problem source and
shows potential solutions (aha!).

Syslog Standardization
----------------------

The syslog protocol has not been standardized until relatively
recently. The first document "smelling" a bit like a standard is
:rfc:`3164`, which dates back to August
2001. The problem is that this document is no real standard. It has
assigned "informational" status by the `IETF <http://www.ietf.org>`_
which means it provides some hopefully useful information but does not
demand anything. It is impossible to "comply" to an informational
document. This, of course, doesn't stop marketing guys from telling they
comply to RFC3164 and it also does not stop some techs to tell you "this
and that does not comply to RFC3164, so it is <anybody else but them>'s
fault".

Then, there is :rfc:`3195`, which
is a real standard. In it's section 3 it makes (a somewhat questionable)
reference to (informational) RFC 3164 which may be interpreted in a way
that RFC3195 standardizes the format layed out in RFC 3164 by virtue of
referencing them. So RFC3195 seems to extend its standardization domain
to the concepts layed out in RFC 3164 (which is why I tend to find that
reference questionable). In that sense, RFC3195 standardizes the format
informationally described in RFC3164, Section 4. But it demands it only
for the scope of RFC3195, which is syslog over BEEP - and NOT syslog
over UDP. So one may argue whether or not the RFC3164 format could be
considered a standard for any non-BEEP (including UDP) syslog, too. In
the strict view I tend to have, it does not. Refering to the RFC3195
context usually does not help, because there are virtually no RFC3195
implementations available (at this time, I would consider this RFC a
failure).

Now let's for a short moment assume that RFC3195 would somehow be able
to demand RFC3164 format for non-BEEP syslog. So we could use RFC3164
format as a standard. But does that really help? Let's cite RFC 3164,
right at the beginning of section 4 (actually, this is the first
sentence):

    ::

           The payload of any IP packet that has a UDP destination port of 514
           MUST be treated as a syslog message. 


        Think a bit about it: this means that whatever is send to port 514 must be considered
        a valid syslog message. No format at all is demanded. So if "this is junk" is sent to
        UDP port 514 - voila, we have a valid message (interestingly, it is no longer a syslog
        message if it is sent to port 515 ;)). You may now argue that I am overdoing. So let's
        cite RFC 3164, Section 5.4, Example 2:


          Example 2

                Use the BFG!

           While this is a valid message, it has extraordinarily little useful
           information.

As you can see, RFC3164 explicitly states that no format at all is
required.

Now a side-note is due: all of this does not mean that the RFC3164
authors did not know what they were doing. No, right the contrary is
true: RFC3164 mission is to describe what has been seen in practice as
syslog messages and the conclusion is quite right that there is no
common understanding on the message format. This is also the reason why
RFC3164 is an informational document: it provides useful information,
but does not precisely specify anything.

After all of this bashing, I now have to admit that RFC3164 has some
format recommendations layed out in section 4. The format described has
quite some value in it and implementors recently try to follow it. This
format is usually meant when someone tells you that a software is
"RFC3164 compliant" or expects "RFC3164 compliant messages". I also have
to admit that rsyslog also uses this format and, in the sense outlined
here, expects messages received to be "RFC3164 compliant" (knowingly
that such a beast does not exist - I am simply lying here ;)).

Please note that there is some relief of the situation in reach. There
is a new normative syslog RFC series upcoming, and it specifies a
standard message format. At the time of this writing, the main documents
are sitting in the RFC editor queue waiting for a transport mapping to
be completed. I personally expect them to be assigned RFC numbers in
2009.

Update: the numbers are now assigned and the base RFC is :rfc:`5424`.

Practical Format Requirements
-----------------------------

From a practical point of view, the message format expected (and
generated by default in legacy mode) is:

::

    <PRI>TIMESTAMP SP HOST SP TAG MSG(Freetext)

SP is the ASCII "space" character and the definition of the rest of the
fields can be taken from RFC3164. Please note that there also is a lot
of confusion on what syntax and semantics the TAG actually has. This
format is called "legacy syslog" because it is not well specified (as
you know by now) and has been "inherited from the real world".

Rsyslog offers two parsers: one for the upcoming RFC series and one for
legacy format. We concentrate on the later. That parser applies some
logic to detect missing hostnames, is able to handle various ways the
TIMESTAMP is typically malformed. In short it applies a lot of guesswork
in trying to figure out what a message really means. I am sure the
guessing algorithm can be improved, and I am always trying that when I
see new malformed messages (and there is an ample set of them...).
However, this finds its limits where it is not possible to differentiate
between two entities which could be either. For example, look at this
message:

::

    <144>Tue Sep 23 11:40:01 taghost sample message

Does it contain a hostname? Maybe. The value "taghost" is a valid
hostname. Of course, it is also a valid tag. If it is a hostname, the
tag's value is "sample" and the msg value is "message". Or is the
hostname missing, the tag is "taghost" and msg is "sample message"? As a
human, I tend to say the later interpretation is correct. But that's
hard to tell the message parser (and, no, I do not intend to apply
artificial intelligence just to guess what the hostname value is...).

One approach is to configure the parser so that it never expects
hostnames. This becomes problematic if you receive messages from
multiple devices. Over time, I may implement parser conditionals, but
this is not yet available and I am not really sure if it is needed
complexity...

Things like this, happen. Even more scary formats happen in practice.
Even from mainstream vendors. For example, I was just asked about this
message (which, btw, finally made me write this article here):

::

    "<130> [ERROR] iapp_socket_task.c 399: iappSocketTask: iappRecvPkt returned error"

If you compare it with the format RFC3164 "suggests", you'll quickly
notice that the message is "a bit" malformed. Actually, even my human
intelligence is not sufficient to guess if there is a TAG or not (is
"[ERROR]" a tag or part of the message). I may not be the smartest guy,
but don't expect me to program a parser that is smarter than me.

To the best of my knowledge, these vendor's device's syslog format can
be configured, so it would probably be a good idea to include a
(sufficiently well-formed) timestamp, the sending hostname and (maybe?)
a tag to make this message well parseable. I will also once again take
this sample and see if we can apply some guesswork. For example, "[" can
not be part of a well-formed TIMESTAMP, so logic can conclude there is
not TIMESTAMP. Also, "[" can not be used inside a valid hostname, so
logic can conclude that the message contains no hostname. Even if I
implement this logic (which I will probably do), this is a partial
solution: it is impossible to guess if there is a tag or not
(honestly!). And, even worse, it is a solution only for those set of
messages that can be handled by the logic described. Now consider this
hypothetical message:

::

    "<130> [ERROR] host.example.net 2008-09-23 11-40-22 PST iapp_socket_task.c 399: iappSocketTask: iappRecvPkt returned error"

Obviously, it requires additional guesswork. If we iterate over all the
cases, we can very quickly see that it is impossible to guess everything
correct. In the example above we can not even surely tell if PST should
be a timezone or some other message property.

A potential solution is to generate a parser-table based parser, but
this requires considerable effort and also has quite some runtime
overhead. I try to avoid this for now (but I may do it, especially if
someone sponsors this work ;)). Side-note: if you want to be a bit
scared about potential formats, you may want to have a look at my paper
"`On the Nature of Syslog
Data <http://www.monitorware.com/en/workinprogress/nature-of-syslog-data.php>`_\ ".

Work-Around
-----------

**The number one work-around is to configure your devices so that they
emit (sufficiently) well-formed messages.** You should by now know what
these look like.

If that cure is not available, there are some things you can do in
rsyslog to handle the situation. First of all, be sure to read about
:doc:`rsyslog.conf format <../configuration/basic_structure>` and the
:doc:`property replacer <../configuration/property_replacer>` specifically. You need to
understand that everything is configured in rsyslog. And that the
message is parsed into properties. There are also properties available
which do not stem back directly to parsing. Most importantly, %fromhost%
property holds the name of the system rsyslog received the message from.
In non-relay cases, this can be used instead of hostname. In relay
cases, there is no cure other than to either fix the original sender or
at least one of the relays in front of the rsyslog instance in question.
Similarly, you can use %timegenerated% instead of %timereported%.
Timegenerated is the time the message hit rsyslog for the first time.
For non-relayed, locally connected peers, Timegenerated should be a very
close approximation of the actual time a message was formed at the
sender (depending, of course, on potential internal queueing inside the
sender). Also, you may use the %rawmsg% property together with the
several extraction modes the property replacer supports. Rawmsg contains
the message as it is received from the remote peer. In a sense, you can
implement a post-parser with this method.

To use these properties, you need to define your own templates and
assign them. Details can be found in the above-quoted documentation.
Just let's do a quick example. Let's say you have the horrible message
shown above and can not fix the sending device for some good reason. In
rsyslog.conf, you used to say:

::

    *.* /var/log/somefile

Of course, things do not work out well with that ill-formed message. So
you decide to dump the rawmsg to the file and pull the remote host and
time of message generation from rsyslog's internal properties (which,
btw, is clever, because otherwise there is no indication of these two
properties...). So you need to define a template for that and make sure
the template is used with your file logging action. This is how it may
look:

::

    $template, MalfromedMsgFormater,"%timegenerated% %fromhost% %rawmsg:::drop-last-lf%\n"
    *.* /var/log/somefile;MalformedMsgFormatter

This will make your log much nicer, but not look perfect. Experiment a
bit with the available properties and replacer extraction options to
fine-tune it to your needs.

The Ultimate Solution...
------------------------

Is available with rsyslog 5.3.4 and above. Here, we can define so-called
custom parsers. These are plugin modules, written in C and adapted to a
specific message format need. The big plus of custom parsers is that
they offer excellent performance and unlimited possibilities - far
better than any work-around could do. Custom parsers can be `bound to
specific rule sets <rsconf1_rulesetparser.html>`_ (and thus listening)
ports with relative ease. The only con is that they must be written.
However, if you are lucky, a parser for your device may already exist.
If not, you can opt to write it yourself, what is not too hard if you
know some C. Alternatively, Adiscon can program one for you as part of
the `rsyslog professional services
offering <http://www.rsyslog.com/professional-services>`_. In any case,
you should seriously consider custom parsers as an alternative if you
can not reconfigure your device to send decent message format.

Wrap-Up
-------

Syslog message format is not sufficiently standardized. There exists a
weak "standard" format, which is used by a good number of
implementations. However, there exist many others, including mainstream
vendor implementations, which have a (sometimes horribly) different
format. Rsyslog tries to deal with anomalies but can not guess right in
all instances. If possible, the sender should be configured to submit
well-formed messages. If that is not possible, you can work around these
issues with rsyslog's property replacer and template system. Or you can
use a suitable message parser or write one for your needs.

I hope this is a useful guide. You may also have a look at the `rsyslog
troubleshooting guide <troubleshoot.html>`_ for further help and places
where to ask questions.