summaryrefslogtreecommitdiffstats
path: root/source/configuration/property_replacer.rst
blob: 3d99f5f1fcd631d2c142827eaa0fbe9dbf630ffb (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
The Property Replacer
=====================

**The property replacer is a core component in rsyslogd's** `string template
system <templates.html>`_. A syslog message has a number of well-defined properties.
Each of these properties can be accessed **and** manipulated by
the property replacer. With it, it is easy to use only part of a
property value or manipulate the value, e.g. by converting all
characters to lower case.

Accessing Properties
--------------------

Syslog message properties are used inside templates. They are accessed
by putting them between percent signs. Properties can be modified by the
property replacer. The full syntax is as follows:

::

    %property:fromChar:toChar:options%

Available Properties
^^^^^^^^^^^^^^^^^^^^

The property replacer can use all :doc:`rsyslog properties <properties>`.

Character Positions
^^^^^^^^^^^^^^^^^^^^

**FromChar** and **toChar** are used to build substrings. They
specify the offset within the string that should be copied. Offset
counting starts at 1, so if you need to obtain the first 2 characters of
the message text, you can use this syntax: "%msg:1:2%". If you do not
wish to specify from and to, but you want to specify options, you still
need to include the colons. For example, if you would like to convert
the full message text to lower case, use "%msg:::lowercase%". If you
would like to extract from a position until the end of the string, you
can place a dollar-sign ("$") in toChar (e.g. %msg:10:$%, which will
extract from position 10 to the end of the string).

There is also support for **regular expressions**. To use them, you need
to place a "R" into FromChar. This tells rsyslog that a regular
expression instead of position-based extraction is desired. The actual
regular expression must then be provided in toChar. The regular
expression **must** be followed by the string "--end". It denotes the
end of the regular expression and will not become part of it. If you are
using regular expressions, the property replacer will return the part of
the property text that matches the regular expression. An example for a
property replacer sequence with a regular expression is:
"%msg:R:.\*Sev:. \\(.\*\\) \\[.\*--end%"

It is possible to specify some parameters after the "R". These are
comma-separated. They are:

R,<regexp-type>,<submatch>,<:doc:`nomatch <nomatch>`\ >,<match-number>

regexp-type is either "BRE" for Posix basic regular expressions or "ERE"
for extended ones. The string must be given in upper case. The default
is "BRE" to be consistent with earlier versions of rsyslog that did not
support ERE. The submatch identifies the submatch to be used with the
result. A single digit is supported. Match 0 is the full match, while 1
to 9 are the actual submatches. The match-number identifies which match
to use, if the expression occurs more than once inside the string.
Please note that the first match is number 0, the second 1 and so on. Up
to 10 matches (up to number 9) are supported. Please note that it would
be more natural to have the match-number in front of submatch, but this
would break backward-compatibility. So the match-number must be
specified after "nomatch".

:doc:`nomatch <nomatch>` specifies what should be used in
case no match is found.

The following is a sample of an ERE expression that takes the first
submatch from the message string and replaces the expression with the
full field if no match is found:

::

%msg:R,ERE,1,FIELD:for (vlan[0-9]\*):--end%

and this takes the first submatch of the second match of said
expression:

::

%msg:R,ERE,1,FIELD,1:for (vlan[0-9]\*):--end%

**Please note: there is also a** `rsyslog regular expression
checker/generator <http://www.rsyslog.com/tool-regex>`_ **online tool
available.** With that tool, you can check your regular expressions and
also generate a valid property replacer sequence. Usage of this tool is
recommended. Depending on the version offered, the tool may not cover
all subtleties that can be done with the property replacer. It
concentrates on the most often used cases. So it is still useful to
hand-craft expressions for demanding environments.

**Also, extraction can be done based on so-called "fields"**. To do so,
place a "F" into FromChar. A field in its current definition is anything
that is delimited by a delimiter character. The delimiter by default is
TAB (US-ASCII value 9). However, if can be changed to any other US-ASCII
character by specifying a comma and the **decimal** US-ASCII value of
the delimiter immediately after the "F". For example, to use comma (",")
as a delimiter, use this field specifier: "F,44".  If your syslog data
is delimited, this is a quicker way to extract than via regular
expressions (actually, a *much* quicker way). Field counting starts at
1. Field zero is accepted, but will always lead to a "field not found"
error. The same happens if a field number higher than the number of
fields in the property is requested. The field number must be placed in
the "ToChar" parameter. An example where the 3rd field (delimited by
TAB) from the msg property is extracted is as follows: "%msg:F:3%". The
same example with semicolon as delimiter is "%msg:F,59:3%".

The use of fields does not permit to select substrings, what is rather
unfortunate. To solve this issue, starting with 6.3.9, fromPos and toPos
can be specified for strings as well. However, the syntax is quite ugly,
but it was the only way to integrate this functionality into the
already-existing system. To do so, use ",fromPos" and ",toPos" during
field extraction. Let's assume you want to extract the substring from
position 5 to 9 in the previous example. Then, the syntax is as follows:
"%msg:F,59,5:3,9%". As you can see, "F,59" means field-mode, with
semicolon delimiter and ",5" means starting at position 5. Then "3,9"
means field 3 and string extraction to position 9.

Please note that the special characters "F" and "R" are case-sensitive.
Only upper case works, lower case will return an error. There are no
white spaces permitted inside the sequence (that will lead to error
messages and will NOT provide the intended result).

Each occurrence of the field delimiter starts a new field. However, if
you add a plus sign ("+") after the field delimiter, multiple
delimiters, one immediately after the others, are treated as separate
fields. This can be useful in cases where the syslog message contains
such sequences. A frequent case may be with code that is written as
follows:

````

::

    int n, m;
    ...
    syslog(LOG_ERR, "%d test %6d", n, m);

This will result into things like this in syslog messages: "1
test      2", "1 test     23", "1 test  234567"

As you can see, the fields are delimited by space characters, but their
exact number is unknown. They can properly be extracted as follows:

::

   "%msg:F,32:2%" to "%msg:F,32+:2%".

This feature was suggested by Zhuang Yuyao and implemented by him. It is
modeled after perl compatible regular expressions.

Property Options
^^^^^^^^^^^^^^^^

**Property options** are case-insensitive. Currently, the following
options are defined:

**uppercase**
  convert property to uppercase only

**lowercase**
  convert property text to lowercase only

**fixed-width**
  changes behaviour of toChar so that it pads the source string with spaces
  up to the value of toChar if the source string is shorter.
  *This feature was introduced in rsyslog 8.13.0*

**json**
  encode the value so that it can be used inside a JSON field. This means
  that several characters (according to the JSON spec) are being escaped, for 
  example US-ASCII LF is replaced by "\\n".
  The json option cannot be used together with either jsonf or csv options.

**jsonf**\[:outname\]
  (available in 6.3.9+)
  This signifies that the property should be expressed as a JSON field.
  That means not only the property is written, but rather a complete JSON field in
  the format

  ``"fieldname"="value"``

  where "fieldname" is given in the *outname* property (or the property name
  if none was assigned) and value is the end result of property replacer operation. 
  Note that value supports all property replacer options, like substrings, case 
  conversion and the like. Values are properly JSON-escaped, however field names are 
  (currently) not, so it is expected that proper field names are configured.
  The jsonf option cannot be used together with either json or csv options.

  For more information you can read `this article from Rainer's blog 
  <https://rainer.gerhards.net/2012/04/rsyslog-templates-json.html>`_.

**csv**
  formats the resulting field (after all modifications) in CSV format as
  specified in `RFC 4180 <http://www.ietf.org/rfc/rfc4180.txt>`_. Rsyslog
  will always use double quotes. Note that in order to have full
  CSV-formatted text, you need to define a proper template. An example is
  this one:
  $template csvline,"%syslogtag:::csv%,%msg:::csv%"
  Most importantly, you need to provide the commas between the fields
  inside the template.
  *This feature was introduced in rsyslog 4.1.6.*

**drop-last-lf**
  The last LF in the message (if any), is dropped. Especially useful for
  PIX.

**date-utc**
  convert data to UTC prior to outputting it (available since 8.18.0)

**date-mysql**
  format as mysql date

**date-rfc3164**
  format as RFC 3164 date

**date-rfc3164-buggyday**
  similar to date-rfc3164, but emulates a common coding error: RFC 3164
  demands that a space is written for single-digit days. With this option,
  a zero is written instead. This format seems to be used by syslog-ng and
  the date-rfc3164-buggyday option can be used in migration scenarios
  where otherwise lots of scripts would need to be adjusted. It is
  recommended *not* to use this option when forwarding to remote hosts -
  they may treat the date as invalid (especially when parsing strictly
  according to RFC 3164).

  *This feature was introduced in rsyslog 4.6.2 and v4 versions above and
  5.5.3 and all versions above.*

**date-rfc3339**
  format as RFC 3339 date
  
**date-unixtimestamp**
  Format as a unix timestamp (seconds since epoch)

**date-year**
  just the year part (4-digit) of a timestamp

**date-month**
  just the month part (2-digit) of a timestamp

**date-day**
  just the day part (2-digit) of a timestamp

**date-hour**
  just the hour part (2-digit, 24-hour clock) of a timestamp

**date-minute**
  just the minute part (2-digit) of a timestamp

**date-second**
  just the second part (2-digit) of a timestamp

**date-subseconds**
  just the subseconds of a timestamp (always 0 for a low precision
  timestamp)

**date-tzoffshour**
  just the timezone offset hour part (2-digit) of a timestamp

**date-tzoffsmin**
  just the timezone offset minute part (2-digit) of a timestamp. Note 
  that this is usually 0, but there are some time zones that have
  offsets which are not hourly-granular. If so, this is the minute
  offset.

**date-tzoffsdirection**
  just the timezone offset direction part of a timestamp. This 
  specifies if the offsets needs to be added ("+") or subtracted ("-")
  to the timestamp in order to get UTC.

**date-ordinal**
  returns the ordinal for the given day, e.g. it is 2 for January, 2nd

**date-iso-week** and **date-iso-week-year**
  return the ISO week number adn week-numbering year, which should be used together. See `ISO week date <https://en.wikipedia.org/wiki/ISO_week_date>`_ for more details

**date-week**
  returns the week number

**date-wday**
  just the weekday number of the timstamp. This is a single digit,
  with 0=Sunday, 1=Monday, ..., 6=Saturday.

**date-wdayname**
  just the abbreviated english name of the weekday (e.g. "Mon", "Sat") of
  the timestamp.

**escape-cc**
  replace control characters (ASCII value 127 and values less then 32)
  with an escape sequence. The sequence is "#<charval>" where charval is
  the 3-digit decimal value of the control character. For example, a
  tabulator would be replaced by "#009".
  Note: using this option requires that
  `$EscapeControlCharactersOnReceive <rsconf1_escapecontrolcharactersonreceive.html>`_
  is set to off.

**space-cc**
  replace control characters by spaces
  Note: using this option requires that
  `$EscapeControlCharactersOnReceive <rsconf1_escapecontrolcharactersonreceive.html>`_
  is set to off.

**drop-cc**
  drop control characters - the resulting string will neither contain
  control characters, escape sequences nor any other replacement character
  like space.
  Note: using this option requires that
  `$EscapeControlCharactersOnReceive <rsconf1_escapecontrolcharactersonreceive.html>`_
  is set to off.

**compressspace**
  compresses multiple spaces (US-ASCII SP character) inside the
  string to a single one. This compression happens at a very late
  stage in processing. Most importantly, it happens after substring
  extraction, so the **FromChar** and **ToChar** positions are **NOT**
  affected by this option. (available since v8.18.0)

**sp-if-no-1st-sp**
  This option looks scary and should probably not be used by a user. For
  any field given, it returns either a single space character or no
  character at all. Field content is never returned. A space is returned
  if (and only if) the first character of the field's content is NOT a
  space. This option is kind of a hack to solve a problem rooted in RFC
  3164: 3164 specifies no delimiter between the syslog tag sequence and
  the actual message text. Almost all implementation in fact delimit the
  two by a space. As of RFC 3164, this space is part of the message text
  itself. This leads to a problem when building the message (e.g. when
  writing to disk or forwarding). Should a delimiting space be included if
  the message does not start with one? If not, the tag is immediately
  followed by another non-space character, which can lead some log parsers
  to misinterpret what is the tag and what the message. The problem
  finally surfaced when the klog module was restructured and the tag
  correctly written. It exists with other message sources, too. The
  solution was the introduction of this special property replacer option.
  Now, the default template can contain a conditional space, which exists
  only if the message does not start with one. While this does not solve
  all issues, it should work good enough in the far majority of all cases.
  If you read this text and have no idea of what it is talking about -
  relax: this is a good indication you will never need this option. Simply
  forget about it ;)

**secpath-drop**
  Drops slashes inside the field (e.g. "a/b" becomes "ab"). Useful for
  secure pathname generation (with dynafiles).

**secpath-replace**
  Replace slashes inside the field by an underscore. (e.g. "a/b" becomes
  "a\_b"). Useful for secure pathname generation (with dynafiles).

To use multiple options, simply place them one after each other with a
comma delimiting them. For example "escape-cc,sp-if-no-1st-sp". If you
use conflicting options together, the last one will override the
previous one. For example, using "escape-cc,drop-cc" will use drop-cc
and "drop-cc,escape-cc" will use escape-cc mode.

Further Links
-------------

-  Article on ":doc:`Recording the Priority of Syslog
   Messages <../tutorials/recording_pri>`" (describes use of
   templates to record severity and facility of a message)
-  `Configuration file syntax <rsyslog_conf.html>`_, this is where you
   actually use the property replacer.

.. toctree::
   :maxdepth: 2

   nomatch