Diffstat (limited to 'docs/JOURNAL_NATIVE_PROTOCOL.md')
-rw-r--r-- docs/JOURNAL_NATIVE_PROTOCOL.md 191
1 files changed, 191 insertions, 0 deletions
diff --git a/docs/JOURNAL_NATIVE_PROTOCOL.md b/docs/JOURNAL_NATIVE_PROTOCOL.md
new file mode 100644
index 0000000..77fb27f
--- /dev/null
+++ b/docs/JOURNAL_NATIVE_PROTOCOL.md
@@ -0,0 +1,191 @@
+---
+title: Native Journal Protocol
+category: Interfaces
+layout: default
+SPDX-License-Identifier: LGPL-2.1-or-later
+---
+
+# Native Journal Protocol
+
+`systemd-journald.service` accepts log data via various protocols:
+
+* Classic RFC3164 BSD syslog via the `/dev/log` socket
+* STDOUT/STDERR of programs via `StandardOutput=journal` + `StandardError=journal` in service files (both of which are default settings)
+* Kernel log messages via the `/dev/kmsg` device node
+* Audit records via the kernel's audit subsystem
+* Structured log messages via `journald`'s native protocol
+
+The last of these is what this document is about: if you are developing a
+program and want to pass structured log data to `journald`, it's the Journal's
+native protocol that you want to use. The systemd project provides the
+[`sd_journal_print(3)`](https://www.freedesktop.org/software/systemd/man/sd_journal_print.html)
+API that implements the client side of this protocol. This document explains
+what this interface does behind the scenes, in case you'd like to implement a
+client for it yourself, without linking to `libsystemd` — for example because
+you work in a programming language other than C or otherwise want to avoid the
+dependency.
+
+## Basics
+
+The native protocol of `journald` is spoken on the
+`/run/systemd/journal/socket` `AF_UNIX`/`SOCK_DGRAM` socket on which
+`systemd-journald.service` listens. Each datagram sent to this socket
+encapsulates one journal entry that shall be written. Since datagrams are
+subject to a size limit and we want to allow large journal entries, datagrams
+sent over this socket may come in one of two formats:
+
+* A datagram with the literal journal entry data as payload, without
+ any file descriptors attached.
+
+* A datagram with an empty payload, but with a single
+ [`memfd`](https://man7.org/linux/man-pages/man2/memfd_create.2.html)
+ file descriptor that contains the literal journal entry data.
+
+Other combinations are not permitted, i.e. datagrams with both payload and file
+descriptors, or datagrams with neither, or more than one file descriptor. Such
+datagrams are ignored. The `memfd` file descriptor should be fully sealed. The
+binary format in the datagram payload and in the `memfd` memory is
+identical. Typically a client would attempt to first send the data as datagram
+payload, but if this fails with an `EMSGSIZE` error it would immediately retry
+via the `memfd` logic.
+
+A client probably should bump up the `SO_SNDBUF` socket option of its `AF_UNIX`
+socket towards `journald` in order to delay blocking I/O as much as possible.
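+
+The send-then-fall-back logic described above can be sketched in Python as
+follows. This is a minimal sketch, not the `libsystemd` implementation: the
+helper names `connect_journal()` and `send_entry()`, the `memfd` name and the
+8 MiB buffer size are illustrative assumptions, and `socket.send_fds()`
+requires Python 3.9 or newer on Linux.
+
+```python
+import errno
+import fcntl
+import os
+import socket
+
+JOURNAL_SOCKET = "/run/systemd/journal/socket"  # path named in this document
+
+
+def connect_journal(path: str = JOURNAL_SOCKET,
+                    sndbuf: int = 8 * 1024 * 1024) -> socket.socket:
+    """Connect a datagram socket; the buffer size is an arbitrary example."""
+    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
+    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
+    sock.connect(path)
+    return sock
+
+
+def send_entry(sock: socket.socket, payload: bytes) -> str:
+    """Send one serialized entry and report which transport was used."""
+    try:
+        sock.send(payload)
+        return "datagram"
+    except OSError as e:
+        if e.errno != errno.EMSGSIZE:
+            raise
+    # Too large for a datagram: pass the same bytes in a sealed memfd.
+    fd = os.memfd_create("journal-entry",
+                         os.MFD_CLOEXEC | os.MFD_ALLOW_SEALING)
+    try:
+        view = memoryview(payload)
+        while view:
+            view = view[os.write(fd, view):]
+        fcntl.fcntl(fd, fcntl.F_ADD_SEALS,
+                    fcntl.F_SEAL_SHRINK | fcntl.F_SEAL_GROW
+                    | fcntl.F_SEAL_WRITE | fcntl.F_SEAL_SEAL)
+        # Empty payload, exactly one file descriptor attached.
+        socket.send_fds(sock, [b""], [fd])
+        return "memfd"
+    finally:
+        os.close(fd)
+```
+
+A caller would typically create the socket once with `connect_journal()` and
+then call `send_entry(sock, payload)` for every log entry.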
+
+## Data Format
+
+Each datagram should consist of a number of environment-like key/value
+assignments. Unlike environment variable assignments, however, the value may
+contain NUL bytes as well as any other binary data. Keys may not include the `=`
+or newline characters (or any other control characters or non-ASCII characters)
+and may not be empty.
+
+Serialization into the datagram payload or `memfd` is straightforward: each
+key/value pair is serialized via one of two methods:
+
+* The first method inserts a `=` character between key and value, and suffixes
+the result with `\n` (i.e. the newline character, ASCII code 10). Example: a
+key `FOO` with a value `BAR` is serialized `F`, `O`, `O`, `=`, `B`, `A`, `R`,
+`\n`.
+
+* The second method should be used if the value of a field contains a `\n`
+byte. In this case, the key name is serialized as is, followed by a `\n`
+character, followed by a (non-aligned) little-endian unsigned 64-bit integer
+encoding the size of the value, followed by the literal value data, followed by
+`\n`. Example: a key `FOO` with a value `BAR` may be serialized using this
+second method as: `F`, `O`, `O`, `\n`, `\003`, `\000`, `\000`, `\000`, `\000`,
+`\000`, `\000`, `\000`, `B`, `A`, `R`, `\n`.
+
+If the value of a key/value pair contains a newline character (`\n`), it *must*
+be serialized using the second method. If it does not, either method is
+permitted. However, it is generally recommended to use the first method
+wherever it is applicable, since the resulting datagrams are easily recognized
+and understood by the human eye without any manual binary decoding. This
+improves the debugging experience a lot, in particular with tools such as
+`strace` that can show datagram content as a text dump. After all, log messages
+are highly relevant for debugging programs, hence optimizing log traffic for
+readability without special tools is generally desirable.
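+
+The two serialization methods can be sketched in Python like this. The
+function names are illustrative assumptions, and the key check only encodes
+the rules stated in this section; `journald` itself may apply stricter rules
+to field names.
+
+```python
+import struct
+
+
+def is_valid_key(key: str) -> bool:
+    """Non-empty, printable ASCII only, and no '=' (rules as stated above)."""
+    return bool(key) and all(
+        0x20 <= ord(c) <= 0x7E and c != "=" for c in key)
+
+
+def serialize_field(key: str, value: bytes) -> bytes:
+    """Serialize one key/value pair, picking the method by value content."""
+    if not is_valid_key(key):
+        raise ValueError(f"invalid field name: {key!r}")
+    name = key.encode("ascii")
+    if b"\n" not in value:
+        # First method: KEY=VALUE\n
+        return name + b"=" + value + b"\n"
+    # Second method: KEY\n, 64-bit little-endian size, raw value, \n
+    return name + b"\n" + struct.pack("<Q", len(value)) + value + b"\n"
+
+
+def serialize_entry(fields) -> bytes:
+    """Concatenate the serialized form of (key, value) pairs in order."""
+    return b"".join(serialize_field(k, v) for k, v in fields)
+```
+
+For example, `serialize_field("FOO", b"BAR")` yields `b"FOO=BAR\n"`, while a
+value containing a newline is emitted in the length-prefixed form.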
+
+Note that keys that begin with `_` have special semantics in `journald`: they
+are *trusted* and implicitly appended by `journald` on the receiving
+side. Clients should not send them — if they do anyway, they will be ignored.
+
+The most important key/value pair to send is `MESSAGE=`, as that contains the
+actual log message text. Other relevant keys a client should send in most cases
+are `PRIORITY=`, `CODE_FILE=`, `CODE_LINE=`, `CODE_FUNC=`, `ERRNO=`. It's
+recommended to generate these fields implicitly on the client side. For further
+information see the [relevant documentation of these
+fields](https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html).
+
+The order in which the fields are serialized within one datagram is undefined
+and may be freely chosen by the client. The server side may or may not
+preserve that order when writing the entry to the Journal.
+
+Some programs might generate multi-line log messages (e.g. a stack unwinder
+generating log output about a stack trace, with one line for each stack
+frame). It's highly recommended to send these as a single datagram, using a
+single `MESSAGE=` field with embedded newline characters between the lines (the
+second serialization method described above must hence be used for this
+field). If possible do not split up individual events into multiple Journal
+events that might then be processed and written into the Journal as separate
+entries. The Journal toolchain is capable of handling multi-line log entries
+just fine, and it's generally preferred to have a single set of metadata fields
+associated with each multi-line message.
+
+Note that the same keys may be used multiple times within the same datagram,
+with different values. The Journal supports this and will write such entries to
+disk without complaining. This is useful for associating a single log entry
+with multiple suitable objects of the same type at once. This should only be
+used for specific Journal fields however, where this is expected. Do not use
+this for Journal fields where this is not expected and where code reasonably
+assumes per-event uniqueness of the keys. In most cases code that consumes and
+displays log entries is likely to ignore such non-unique fields or only
+consider the first of the specified values. Specifically, if a Journal entry
+contains multiple `MESSAGE=` fields, likely only the first one is
+displayed. Note that a well-written logging client library thus will not use a
+plain dictionary for accepting structured log metadata, but rather a data
+structure that allows non-unique keys, for example an array, or a dictionary
+that optionally maps to a set of values instead of a single value.
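+
+One possible shape for such a data structure, as a minimal Python sketch: a
+dictionary that maps each key to a list of values. The class name `FieldSet`
+and the field name `EXAMPLE_DEVICE` are hypothetical and only serve as
+illustration.
+
+```python
+from collections import defaultdict
+
+
+class FieldSet:
+    """Structured log metadata that tolerates repeated keys."""
+
+    def __init__(self):
+        self._values = defaultdict(list)
+
+    def add(self, key: str, value: str) -> None:
+        # Appending never overwrites an earlier value for the same key.
+        self._values[key].append(value)
+
+    def items(self):
+        # Yield every (key, value) pair, grouped by key.
+        for key, values in self._values.items():
+            for value in values:
+                yield key, value
+
+
+fields = FieldSet()
+fields.add("EXAMPLE_DEVICE", "sda")
+fields.add("MESSAGE", "Two devices affected.")
+fields.add("EXAMPLE_DEVICE", "sdb")
+```
+
+Iterating over `fields.items()` then produces both `EXAMPLE_DEVICE` values,
+each of which would be serialized as its own key/value pair.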
+
+## Example Datagram
+
+Here's an encoded message, with various common fields, all encoded according to
+the first serialization method, with the exception of one, where the value
+contains a newline character, and thus the second method must be used.
+
+```
+PRIORITY=3\n
+SYSLOG_FACILITY=3\n
+CODE_FILE=src/foobar.c\n
+CODE_LINE=77\n
+BINARY_BLOB\n
+\004\000\000\000\000\000\000\000xx\nx\n
+CODE_FUNC=some_func\n
+SYSLOG_IDENTIFIER=footool\n
+MESSAGE=Something happened.\n
+```
+
+(Lines are broken here after each `\n` to make things more readable. C-style
+backslash escaping is used.)
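+
+To double-check the encoding, the example can be decoded again. The following
+Python sketch is a hypothetical test helper, not `journald`'s parser: it
+assumes well-formed input and raises an exception on truncated data. The name
+`parse_entry()` is an illustrative assumption.
+
+```python
+import struct
+
+
+def parse_entry(data: bytes):
+    """Decode a serialized entry into a list of (key, value) pairs."""
+    fields = []
+    pos = 0
+    while pos < len(data):
+        nl = data.index(b"\n", pos)    # ValueError if no newline follows
+        eq = data.find(b"=", pos, nl)
+        if eq != -1:
+            # First method: KEY=VALUE\n
+            fields.append((data[pos:eq].decode("ascii"), data[eq + 1:nl]))
+            pos = nl + 1
+            continue
+        # Second method: KEY\n, 64-bit little-endian size, value, \n
+        key = data[pos:nl].decode("ascii")
+        (size,) = struct.unpack_from("<Q", data, nl + 1)
+        start = nl + 1 + 8
+        end = start + size
+        if data[end:end + 1] != b"\n":
+            raise ValueError("binary field is not terminated by a newline")
+        fields.append((key, data[start:end]))
+        pos = end + 1
+    return fields
+
+
+EXAMPLE = (b"PRIORITY=3\n"
+           b"SYSLOG_FACILITY=3\n"
+           b"CODE_FILE=src/foobar.c\n"
+           b"CODE_LINE=77\n"
+           b"BINARY_BLOB\n"
+           b"\x04\x00\x00\x00\x00\x00\x00\x00xx\nx\n"
+           b"CODE_FUNC=some_func\n"
+           b"SYSLOG_IDENTIFIER=footool\n"
+           b"MESSAGE=Something happened.\n")
+
+decoded = dict(parse_entry(EXAMPLE))
+```
+
+Here `decoded["BINARY_BLOB"]` holds the four bytes `xx\nx`, and all other
+fields come back as plain text values.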
+
+## Automatic Protocol Upgrading
+
+It might be wise to automatically upgrade to logging via the Journal's native
+protocol in clients that previously used the BSD syslog protocol. Behaviour in
+this case should be pretty obvious: try connecting a socket to
+`/run/systemd/journal/socket` first (on success use the native Journal
+protocol), and if that fails fall back to `/dev/log` (and use the BSD syslog
+protocol).
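+
+A minimal Python sketch of this fallback follows. The function name
+`open_log_socket()` is an illustrative assumption, and the sketch further
+assumes that `/dev/log` is a datagram socket, which may not hold on every
+system.
+
+```python
+import socket
+
+
+def open_log_socket(native_path: str = "/run/systemd/journal/socket",
+                    syslog_path: str = "/dev/log"):
+    """Return (protocol, socket), preferring the native Journal protocol."""
+    for protocol, path in (("native", native_path),
+                           ("syslog", syslog_path)):
+        sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
+        try:
+            sock.connect(path)
+        except OSError:
+            # This socket is not available, try the next candidate.
+            sock.close()
+            continue
+        return protocol, sock
+    raise ConnectionError("neither the Journal nor the syslog socket "
+                          "could be reached")
+```
+
+The returned protocol name tells the caller which wire format to use for all
+subsequent messages on that socket.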
+
+Programs normally logging to STDERR might also choose to upgrade to native
+Journal logging in case they are invoked via systemd's service logic, where
+STDOUT and STDERR are going to the Journal anyway. By preferring the native
+protocol over STDERR-based logging, structured metadata can be passed along,
+including priority information and more, none of which is available with
+STDERR-based logging. To detect automatically whether its STDERR is connected
+to the Journal's stream transport, a program can look for the `$JOURNAL_STREAM`
+environment variable. The systemd service logic sets this variable to a
+colon-separated pair of device and inode number (formatted in decimal ASCII) of
+the STDERR file descriptor. If the `.st_dev` and `.st_ino` fields of the
+`struct stat` data returned by `fstat(STDERR_FILENO, …)` match these values a
+program can be sure its STDERR is connected to the Journal, and may then opt to
+upgrade to the native Journal protocol via an `AF_UNIX` socket of its own, and
+cease to use STDERR.
+
+Why bother with this environment variable check? A service program invoked by
+systemd might employ shell-style I/O redirection on invoked subprograms, and
+those should likely not upgrade to the native Journal protocol, but instead
+continue to use the redirected file descriptors passed to them. Thus, by
+comparing the device and inode number of the actual STDERR file descriptor with
+the one the service manager passed, one can make sure that no I/O redirection
+took place for the current program.
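+
+The comparison can be sketched in Python as follows. The function name
+`stderr_is_journal()` is an illustrative assumption; the sketch treats a
+missing or malformed variable as "not connected to the Journal".
+
+```python
+import os
+
+
+def stderr_is_journal(environ=os.environ, fd: int = 2) -> bool:
+    """Check whether fd (STDERR by default) matches $JOURNAL_STREAM."""
+    value = environ.get("JOURNAL_STREAM")
+    if not value:
+        return False
+    try:
+        # Colon-separated device and inode number, in decimal ASCII.
+        dev_text, ino_text = value.split(":")
+        dev, ino = int(dev_text), int(ino_text)
+    except ValueError:
+        return False
+    st = os.fstat(fd)
+    return (st.st_dev, st.st_ino) == (dev, ino)
+```
+
+A program would call `stderr_is_journal()` once at startup and switch to the
+native protocol only if it returns `True`.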
+
+## Alternative Implementations
+
+If you are looking for alternative implementations of this protocol (besides
+systemd's own in `sd_journal_print()`), consider
+[GLib's](https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/gmessages.c) or
+[`dbus-broker`'s](https://github.com/bus1/dbus-broker/blob/main/src/util/log.c).
+
+And that's already all there is to it.