author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 16:27:18 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 16:27:18 +0000 |
commit | f7f20c3f5e0be02585741f5f54d198689ccd7866 (patch) | |
tree | 190d5e080f6cbcc40560b0ceaccfd883cb3faa01 /source/whitepapers/reliable_logging.rst | |
parent | Initial commit. (diff) | |
Adding upstream version 8.2402.0+dfsg. (upstream/8.2402.0+dfsg)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'source/whitepapers/reliable_logging.rst')
-rw-r--r-- | source/whitepapers/reliable_logging.rst | 81 |
1 files changed, 81 insertions, 0 deletions
diff --git a/source/whitepapers/reliable_logging.rst b/source/whitepapers/reliable_logging.rst
new file mode 100644
index 0000000..aef2066
--- /dev/null
+++ b/source/whitepapers/reliable_logging.rst
@@ -0,0 +1,81 @@
+How reliable should reliable logging be?
+========================================
+With any logging, you need to decide what you want to do if the log cannot
+be written:
+
+* do you want the application to stop because it can't write a log message
+
+or
+
+* do you want the application to continue, but not write the log message
+
+Note that this decision is still there even if you are not logging
+remotely: your local disk partition where you are writing logs can fill up,
+become read-only, or have other problems.
+
+The RFC for syslog (dating back a couple of decades, well before rsyslog
+started) specifies that the application writing the log message should block
+and wait for the log message to be processed. Rsyslog (like every other
+modern syslog daemon) fudges this a bit and buffers the log data in RAM
+rather than following the original behavior of writing the data to disk and
+doing an fsync before acknowledging the log message.
+
+If you have a problem with your output from rsyslog, your application will
+keep running until rsyslog fills its queues, and then it will stop.
+
+When you configure rsyslog to send the logs to another machine (either to
+rsyslog on another machine or to some sort of database), you introduce a
+significant new set of failure modes for the output from rsyslog.
+
+You can configure the size of the rsyslog memory queues (I had one machine
+dedicated to running rsyslog where I created queues large enough to use
+>100G of RAM for logs).
+
+You can configure rsyslog to spill from its memory queues to disk queues
+(disk assisted queue mode) when it fills its memory queues.
+
+You can create a separate set of queues for the action that has a high
+probability of failing (sending to a remote machine via TCP in this case),
+but this doesn't buy you more time; it just means that other logs can
+continue to be written when the remote system is down.
+
+You can configure rsyslog to have high/low watermark levels: when the queue
+fills past the high watermark, rsyslog will start discarding logs below a
+specified severity, and stop doing so when it drops below the low watermark
+level.
+
+For rsyslog -> \*syslog, you can use UDP for your transport so that the logs
+will get dropped at the network layer if the remote system is unresponsive.
+
+You have lots of options.
+
+If you are really concerned with reliability, I should point out that using
+TCP does not eliminate the possibility of losing logs when a remote system
+goes down. When you send a message via TCP, the sender considers it sent
+when it's handed to the OS to send. The OS has a window of how much data
+it allows to be outstanding (sent without acknowledgement from the remote
+system), and when the TCP connection fails (due to a firewall or a remote
+machine going down), the sending OS has no way to tell the application what
+data is outstanding, so the outstanding data will be lost. This
+is a smaller window of loss than UDP, which will happily keep sending your
+data forever, but it's still a potential for loss. Rsyslog offers RELP
+(Reliable Event Logging Protocol), which addresses this problem by using
+application level acknowledgements so no messages can get lost due to
+network issues. That just leaves memory buffering (both in rsyslog and in
+the OS after rsyslog tells the OS to write the logs) as potential data loss
+points. Those failures will only trigger if the system crashes or rsyslog
+is shut down (and yes, there are ways to address these as well).
+
+The reason why nothing today operates without the possibility of losing
+log messages is that making the logs completely reliable absolutely kills
+performance. With buffering, rsyslog can handle 400,000 logs/sec on a
+low-mid range machine. With utterly reliable logs and spinning disks, this
+rate drops to <100 logs/sec. With a $5K PCI SSD card, you can get up to
+~4,000 logs/sec. In both cases, this comes at the cost of not being able to
+use the disk for anything else on the system (if you do use the disk for
+anything else, performance drops from there, and pretty rapidly). This is
+why traditional syslog had a reputation for being very slow.
+
+See Also
+--------
+* https://rainer.gerhards.net/2008/04/on-unreliability-of-plain-tcp-syslog.html
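
The high/low watermark behavior described in the whitepaper can be illustrated with a small, self-contained Python sketch. This is not rsyslog's implementation or configuration syntax. The class name `WatermarkQueue` and its parameters are hypothetical, and "past the high watermark" is assumed to mean a queue length strictly greater than `high`. Severities follow the usual syslog numbering, where 0 is the most important and 7 (debug) the least.

```python
from collections import deque


class WatermarkQueue:
    """Toy in-memory log queue with high/low watermark discarding.

    Hypothetical illustration only; names and semantics are assumptions.
    """

    def __init__(self, capacity, high, low, discard_severity):
        if not 0 <= low < high <= capacity:
            raise ValueError("need 0 <= low < high <= capacity")
        self.capacity = capacity
        self.high = high
        self.low = low
        # Messages with severity >= discard_severity (less important)
        # are dropped while the queue is in discarding mode.
        self.discard_severity = discard_severity
        self.items = deque()
        self.discarding = False
        self.dropped = 0

    def _update_mode(self):
        # Hysteresis: switch on above the high mark, off below the low mark,
        # and keep the previous mode anywhere in between.
        n = len(self.items)
        if n > self.high:
            self.discarding = True
        elif n < self.low:
            self.discarding = False

    def put(self, severity, msg):
        """Return True if the message was queued, False if it was discarded."""
        self._update_mode()
        if self.discarding and severity >= self.discard_severity:
            self.dropped += 1
            return False
        if len(self.items) >= self.capacity:
            # A real daemon would block the sender here; the sketch raises.
            raise BufferError("queue full")
        self.items.append((severity, msg))
        return True

    def get(self):
        """Remove and return the oldest (severity, msg) pair."""
        item = self.items.popleft()
        self._update_mode()
        return item


if __name__ == "__main__":
    q = WatermarkQueue(capacity=10, high=4, low=2, discard_severity=6)
    for i in range(5):
        q.put(6, f"info message {i}")
    print(q.put(6, "another info message"))  # discarded: queue is past the high mark
    print(q.put(3, "error message"))         # kept: more important than the threshold
```

The point of the two thresholds is hysteresis: once discarding starts, it continues until the queue has drained well below the level that triggered it, so the queue does not flip between modes on every message.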