How reliable should reliable logging be?
========================================
With any logging, you need to decide what you want to do if the log cannot
be written:

* Do you want the application to stop because it can't write a log message?

or

* Do you want the application to continue, but not write the log message?

Note that this decision applies even if you are not logging remotely: the
local disk partition where you write logs can fill up, become read-only, or
have other problems.
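
To make the two choices concrete, here is a minimal Python sketch of an
application-side wrapper that applies either policy when a write fails. The
names ``emit``, ``LogWriteFailed`` and ``FullDisk`` are hypothetical, made up
for this illustration; ``FullDisk`` simulates a partition that has filled up.

```python
import errno
import io
from typing import TextIO


class LogWriteFailed(Exception):
    """Raised under the "stop" policy when a log line cannot be written."""


def emit(stream: TextIO, message: str, policy: str = "drop") -> bool:
    """Write one log line to `stream`, applying `policy` if the write fails.

    policy="stop": re-raise, so the application halts on a logging failure.
    policy="drop": discard the message and let the application continue.
    Returns True if the line was written, False if it was dropped.
    """
    if policy not in ("stop", "drop"):
        raise ValueError(f"unknown policy: {policy!r}")
    try:
        stream.write(message + "\n")
        stream.flush()
        return True
    except OSError as exc:
        if policy == "stop":
            raise LogWriteFailed(str(exc)) from exc
        return False


class FullDisk(io.TextIOBase):
    """Hypothetical stand-in for a log file on a full partition."""

    def write(self, s: str) -> int:
        raise OSError(errno.ENOSPC, "No space left on device")


print(emit(io.StringIO(), "service started"))          # True: written
print(emit(FullDisk(), "disk is full", policy="drop"))  # False: dropped
```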

The RFC for syslog (dating back a couple of decades, well before rsyslog
started) specifies that the application writing the log message should block
and wait for the log message to be processed. Rsyslog (like every other
modern syslog daemon) fudges this a bit: it buffers the log data in RAM
rather than following the original behavior of writing the data to disk and
doing an fsync before acknowledging the log message.
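
The original behavior is easy to picture in code. This is a minimal Python
sketch, assuming a local file: each message is written and ``os.fsync`` is
called before the function returns, so the caller is blocked until the OS
reports the data as stored. ``append_durably`` is a hypothetical name. (On
POSIX systems a newly created log file would also need its directory fsynced
to survive a crash; the sketch skips that.)

```python
import os
import tempfile


def append_durably(path: str, message: str) -> None:
    """Append one log line and return only once the OS reports it on disk.

    Mirrors the old write-then-fsync-then-acknowledge behavior: the caller
    is blocked for the full duration of the fsync.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o640)
    try:
        data = (message + "\n").encode("utf-8")
        written = 0
        while written < len(data):  # os.write may write only part of the data
            written += os.write(fd, data[written:])
        os.fsync(fd)  # block until the kernel has flushed the file to storage
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")
    append_durably(log_path, "first message")
    append_durably(log_path, "second message")
```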

If there is a problem with rsyslog's output, your application will keep
running until rsyslog fills its queues, and then it will stop.
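
A bounded in-memory queue shows this behavior directly. In this Python sketch
(an analogy, not rsyslog code), ``queue.Queue`` stands in for rsyslog's queue,
``log`` is a hypothetical helper, and the capacity of 3 is an arbitrary
illustrative value.

```python
import queue
from typing import Optional

# A bounded in-memory buffer standing in for rsyslog's main queue.
# The capacity of 3 is an arbitrary, illustrative value.
log_queue: "queue.Queue[str]" = queue.Queue(maxsize=3)


def log(message: str, timeout: Optional[float] = None) -> bool:
    """Hand one message to the buffer; nothing is draining it in this demo.

    While there is room, put() returns at once, so the application keeps
    running even though the output is down. Once the buffer is full, put()
    blocks: with timeout=None it waits forever (the "application stops"
    case). A timeout is passed below only so the demonstration terminates.
    """
    try:
        log_queue.put(message, timeout=timeout)
        return True
    except queue.Full:
        return False


# Output is down: nobody calls log_queue.get(), so the fourth message blocks.
results = [log(f"msg {i}", timeout=0.1) for i in range(4)]
print(results)  # [True, True, True, False]
```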

When you configure rsyslog to send the logs to another machine (either to
rsyslog on another machine or to some sort of database), you introduce a
significant new set of failure modes for the output from rsyslog.

You can configure the size of the rsyslog memory queues (I had one machine
dedicated to running rsyslog where I created queues large enough to use
more than 100 GB of RAM for logs).
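
A queue's size translates into how long an output outage it can absorb: its
capacity divided by the rate at which log data arrives. The workload numbers
below (10,000 messages/sec, 500 bytes per message) are assumptions for
illustration, and the sketch ignores rsyslog's per-message bookkeeping
overhead, so real capacity will be lower.

```python
def seconds_until_full(queue_bytes: float, msgs_per_sec: float,
                       avg_msg_bytes: float) -> float:
    """How long a queue can absorb an output outage before it fills up."""
    return queue_bytes / (msgs_per_sec * avg_msg_bytes)


GIB = 1024 ** 3
# Assumed workload (illustrative): 10,000 msgs/sec averaging 500 bytes each.
t = seconds_until_full(100 * GIB, 10_000, 500)
print(f"{t:.0f} s, about {t / 3600:.1f} h")  # 21475 s, about 6.0 h
```

Doubling the message rate halves that time; a bigger queue only postpones
the moment the application blocks.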

You can configure rsyslog to spill from its memory queues to disk queues
(disk-assisted queue mode) when its memory queues fill up.
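
Here is a simplified Python sketch of the disk-assisted idea: messages stay
in RAM until the memory part is full, then further messages are appended to a
file and read back in order later. This is not how rsyslog's disk queue is
implemented; the class and file names are hypothetical, and the sketch assumes
messages contain no newlines.

```python
import collections
import os
import tempfile
from typing import Deque, Optional


class DiskAssistedQueue:
    """FIFO holding up to `mem_capacity` messages in RAM, spilling the rest
    to an append-only file (a simplified sketch, not rsyslog's design)."""

    def __init__(self, mem_capacity: int, spill_path: str) -> None:
        self.mem_capacity = mem_capacity
        self.memory: Deque[str] = collections.deque()
        self.spill_path = spill_path
        self.read_offset = 0  # byte offset of the next unread spilled line
        self.spilled = 0      # spilled messages not yet read back

    def put(self, message: str) -> None:
        # Once anything is on disk, keep appending there so FIFO order holds.
        if self.spilled or len(self.memory) >= self.mem_capacity:
            with open(self.spill_path, "ab") as f:
                f.write(message.encode("utf-8") + b"\n")
            self.spilled += 1
        else:
            self.memory.append(message)

    def get(self) -> Optional[str]:
        """Return the oldest message, or None if the queue is empty."""
        if self.memory:
            return self.memory.popleft()
        if not self.spilled:
            return None
        with open(self.spill_path, "rb") as f:
            f.seek(self.read_offset)
            line = f.readline()
        self.read_offset += len(line)
        self.spilled -= 1
        if self.spilled == 0:
            os.remove(self.spill_path)  # disk part fully drained: reclaim it
            self.read_offset = 0
        return line[:-1].decode("utf-8")  # strip the trailing b"\n"


with tempfile.TemporaryDirectory() as tmp:
    q = DiskAssistedQueue(mem_capacity=2, spill_path=os.path.join(tmp, "spill.q"))
    for m in ["a", "b", "c", "d"]:
        q.put(m)  # "a", "b" stay in RAM; "c", "d" go to disk
    print([q.get() for _ in range(4)])  # ['a', 'b', 'c', 'd']
```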

You can create a separate set of queues for the action that has a high
probability of failing (sending to a remote machine via TCP in this case),
but this doesn't buy you more time, it just means that other logs can
continue to be written when the remote system is down.
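
The following Python sketch (hypothetical names, not rsyslog code) shows why a
separate queue helps the other outputs but does not buy the failing one more
time: the local action keeps receiving every message, while the remote
action's queue fills at the same rate as before and then overflows.

```python
import queue
from typing import List


class Dispatcher:
    """Fan each message out to two actions (hypothetical sketch).

    The local-file action is assumed always to succeed. The remote action,
    the one likely to fail, gets its own bounded queue, so a stalled remote
    only fills that queue instead of holding up local logging.
    """

    def __init__(self, remote_capacity: int) -> None:
        self.local_log: List[str] = []
        self.remote_queue: "queue.Queue[str]" = queue.Queue(maxsize=remote_capacity)
        self.remote_dropped = 0

    def dispatch(self, message: str) -> None:
        self.local_log.append(message)
        try:
            self.remote_queue.put_nowait(message)
        except queue.Full:
            # This sketch drops on overflow. Blocking here instead would
            # stall the local action too, once the remote queue is full.
            self.remote_dropped += 1


d = Dispatcher(remote_capacity=3)
for i in range(5):  # remote is down: nothing drains remote_queue
    d.dispatch(f"msg {i}")
print(len(d.local_log), d.remote_queue.qsize(), d.remote_dropped)  # 5 3 2
```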

You can configure rsyslog with high and low watermark levels: when the queue
fills past the high watermark, rsyslog starts discarding logs below a
specified severity, and stops doing so when the queue drops below the low
watermark.
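
A minimal Python sketch of the watermark behavior described above, using
syslog's numeric severities (0 = emerg through 7 = debug, so "below" a
severity means a larger number). The class and parameter names are
hypothetical; they are not rsyslog's configuration parameter names.

```python
from collections import deque
from typing import Deque, Tuple

# Syslog severities run from 0 (emerg) to 7 (debug); larger = less important.
ERR, INFO, DEBUG = 3, 6, 7


class WatermarkQueue:
    """Queue that sheds low-priority messages under pressure (sketch).

    Once the length reaches `high`, messages with severity `discard_from`
    or less important are discarded; discarding stops when the length falls
    below `low`. More important messages are always accepted here.
    """

    def __init__(self, high: int, low: int, discard_from: int) -> None:
        if not 0 < low < high:
            raise ValueError("need 0 < low < high")
        self.high, self.low, self.discard_from = high, low, discard_from
        self.items: Deque[Tuple[int, str]] = deque()
        self.discarding = False

    def put(self, severity: int, message: str) -> bool:
        """Enqueue a message; return False if it was discarded."""
        if len(self.items) >= self.high:
            self.discarding = True
        if self.discarding and severity >= self.discard_from:
            return False
        self.items.append((severity, message))
        return True

    def get(self) -> Tuple[int, str]:
        item = self.items.popleft()
        if self.discarding and len(self.items) < self.low:
            self.discarding = False
        return item


q = WatermarkQueue(high=4, low=2, discard_from=INFO)
for n in range(4):
    q.put(INFO, f"info {n}")         # accepted: the queue reaches the high mark
print(q.put(DEBUG, "debug detail"))  # False: discarded above the high mark
print(q.put(ERR, "disk error"))      # True: errors are still kept
```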

For rsyslog -> \*syslog, you can use UDP for your transport so that the logs
will get dropped at the network layer if the remote system is unresponsive.
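
UDP's fire-and-forget nature shows up in the fact that ``sendto`` succeeds
even when nothing is listening. This Python sketch sends one datagram to a
local port that (assuming nothing else grabs it in the meantime) has no
listener. The message format is a simplified ``<PRI>tag: text`` with
``PRI = facility * 8 + severity``, not a complete RFC 3164 or RFC 5424
message; ``send_udp_syslog`` is a hypothetical helper.

```python
import socket

FACILITY_USER, SEVERITY_INFO = 1, 6  # standard syslog numeric codes


def send_udp_syslog(host: str, port: int, tag: str, text: str) -> int:
    """Send one syslog-style datagram; return the bytes handed to the OS.

    There is no acknowledgement: if nothing is listening, or the packet is
    lost on the network, the sender is never told.
    """
    pri = FACILITY_USER * 8 + SEVERITY_INFO  # PRI = 1 * 8 + 6 = 14
    payload = f"<{pri}>{tag}: {text}".encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


# Find a local UDP port, then release it so that nothing is listening there.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
    probe.bind(("127.0.0.1", 0))
    unused_port = probe.getsockname()[1]

print(send_udp_syslog("127.0.0.1", unused_port, "myapp", "hello"))  # 16
```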

You have lots of options.

If you are really concerned with reliability, I should point out that using
TCP does not eliminate the possibility of losing logs when a remote system
goes down. When you send a message via TCP, the sender considers it sent
once it has been handed to the OS. The OS has a window of how much data it
allows to be outstanding (sent without acknowledgement from the remote
system), and when the TCP connection fails (due to a firewall or a remote
machine going down), the sending OS has no way to tell the application what
data is outstanding, so the outstanding data is lost. This is a smaller
window of loss than UDP, which will happily keep sending your data forever,
but it is still a potential for loss. Rsyslog offers RELP (Reliable Event
Logging Protocol), which addresses this problem by using application-level
acknowledgements, so no messages can get lost due to network issues. That
leaves memory buffering (both in rsyslog and in the OS after rsyslog tells
the OS to write the logs) as the remaining potential data loss points. Those
failures only trigger if the system crashes or rsyslog is shut down (and
yes, there are ways to address these as well).
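
The core of application-level acknowledgement can be sketched in a few lines
of Python: the sender keeps every message until the receiver acknowledges its
transaction number, and after a broken connection it resends whatever is still
unacknowledged. This is a sketch of the idea only, not the RELP wire protocol;
the class name is hypothetical, and ``txnr`` loosely borrows RELP's term for a
transaction number.

```python
from collections import OrderedDict
from typing import List


class AckingSender:
    """Retain each message until the receiver acknowledges it (sketch)."""

    def __init__(self) -> None:
        self.next_txnr = 1
        self.unacked: "OrderedDict[int, str]" = OrderedDict()

    def send(self, message: str) -> int:
        txnr = self.next_txnr
        self.next_txnr += 1
        self.unacked[txnr] = message  # kept until on_ack(txnr) is called
        # A real sender would now write (txnr, message) to the connection.
        return txnr

    def on_ack(self, txnr: int) -> None:
        """The receiver confirmed it has processed this message."""
        self.unacked.pop(txnr, None)

    def on_reconnect(self) -> List[str]:
        """After a broken connection, everything unacknowledged is resent."""
        return list(self.unacked.values())


s = AckingSender()
for text in ["a", "b", "c", "d"]:
    s.send(text)
s.on_ack(1)
s.on_ack(2)
# The connection drops here; "c" and "d" may or may not have arrived.
print(s.on_reconnect())  # ['c', 'd']
```

One consequence of this design: if a message arrived but its acknowledgement
was lost, the resend delivers it twice, so a receiver that cares needs to
deduplicate. The sketch does not handle that.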

The reason nothing today operates without the possibility of losing log
messages is that making the logs completely reliable absolutely kills
performance. With buffering, rsyslog can handle 400,000 logs/sec on a
low- to mid-range machine. With utterly reliable logs and spinning disks,
this rate drops to under 100 logs/sec. With a $5K PCI SSD card, you can get
up to about 4,000 logs/sec. In both cases, this comes at the cost of not
being able to use the disk for anything else on the system; if you do use
the disk for anything else, performance drops from there, and pretty
rapidly. This is why traditional syslog had a reputation for being very
slow.
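
A back-of-the-envelope calculation shows where the spinning-disk figure comes
from. Assuming (roughly) that each fsync has to wait about one full platter
rotation, a 7,200 RPM disk can complete at most 7200 / 60 = 120 synchronous
writes per second, and seeks and filesystem metadata updates push the real
number lower. Writing many buffered messages per fsync multiplies that
ceiling, which is one way to see why buffering is so much faster. The
function name is hypothetical.

```python
def max_msgs_per_sec(rpm: int, msgs_per_fsync: int = 1) -> float:
    """Rough ceiling on synced throughput for a spinning disk.

    Assumes each fsync waits about one full rotation (an approximation) and
    ignores seek time and filesystem metadata writes, which only lower it.
    """
    fsyncs_per_sec = rpm / 60  # one rotation per fsync
    return fsyncs_per_sec * msgs_per_fsync


print(max_msgs_per_sec(7200))        # 120.0: one message per fsync
print(max_msgs_per_sec(7200, 1000))  # 120000.0: 1,000 buffered messages per fsync
```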

See Also
--------
* https://rainer.gerhards.net/2008/04/on-unreliability-of-plain-tcp-syslog.html