Diffstat (limited to 'collectors/systemd-journal.plugin/README.md')
-rw-r--r--  collectors/systemd-journal.plugin/README.md  407
1 file changed, 103 insertions(+), 304 deletions(-)
diff --git a/collectors/systemd-journal.plugin/README.md b/collectors/systemd-journal.plugin/README.md
index 51aa1b7cd..c3c639045 100644
--- a/collectors/systemd-journal.plugin/README.md
+++ b/collectors/systemd-journal.plugin/README.md
@@ -40,31 +40,34 @@ For more information check [this discussion](https://github.com/netdata/netdata/
The following are limitations related to the availability of the plugin:
-- This plugin is not available when Netdata is installed in a container. The problem is that `libsystemd` is not
- available in Alpine Linux (there is a `libsystemd`, but it is a dummy that returns failure on all calls). We plan to
- change this, by shipping Netdata containers based on Debian.
+- Netdata versions prior to 1.44, when shipped in a Docker container, do not include this plugin.
+ The problem is that `libsystemd` is not available in Alpine Linux (there is a `libsystemd`, but it is a dummy that
+ returns failure on all calls). Starting with Netdata version 1.44, Netdata containers use a Debian base image,
+ making this plugin available when Netdata is running in a container.
- For the same reason (lack of `systemd` support for Alpine Linux), the plugin is not available on `static` builds of
- Netdata (which are based on `muslc`, not `glibc`).
+ Netdata (which are based on `musl`, not `glibc`). If your Netdata is installed in `/opt/netdata`, you most likely
+ have a static build of Netdata.
- On old systemd systems (like CentOS 7), the plugin always runs in "full data query" mode, which makes it slower. The
reason is that the systemd API is missing some important calls we need to use the field indexes of the `systemd`
journal. However, when running in this mode, the plugin also offers negative matches on the data (like filtering for
all logs that do not have some field set), and this is why "full data query" mode is also offered as an option even
on newer versions of `systemd`.
-To use the plugin, install one of our native distribution packages, or install it from source.
-
#### `systemd` journal features
The following are limitations related to the features of `systemd` journal:
-- This plugin does not support binary field values. `systemd` journal has the ability to assign fields with binary data.
- This plugin assumes all fields contain text values (text in this context includes numbers).
+- This plugin assumes that binary field values are text fields with newlines in them. `systemd-journal` has the ability
+ to support binary fields, without specifying the nature of the binary data. However, binary fields are commonly used
+ to store log entries that include multiple lines of text. The plugin treats all binary fields as multi-line text.
- This plugin does not support multiple values per field for any given log entry. `systemd` journal has the ability to
accept the same field key, multiple times, with multiple values on a single log entry. This plugin will present the
last value and ignore the others for this log entry.
-- This plugin will only read journal files located in `/var/log/journal` or `/run/log/journal`. `systemd-remote` has the
+- This plugin will only read journal files located in `/var/log/journal` or `/run/log/journal`. `systemd-journal-remote` has the
ability to store journal files anywhere (user configured). If journal files are not located in `/var/log/journal`
- or `/run/log/journal` (and any of their subdirectories), the plugin will not find them.
+ or `/run/log/journal` (and any of their subdirectories), the plugin will not find them. A simple solution is to link
+ the other directories somewhere inside `/var/log/journal`. The plugin will pick them up, even if a sub-directory of
+ `/var/log/journal` is a link to a directory outside `/var/log/journal`.
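For example, if journals are stored on a separate data volume, a symbolic link under `/var/log/journal` makes them visible to the plugin (the paths below are hypothetical):

```sh
# Hypothetical paths: expose journal files stored outside /var/log/journal
# to the plugin by linking their directory under it.
sudo ln -s /mnt/journal-archive /var/log/journal/archive
```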
Other than the above, this plugin supports all features of `systemd` journals.
@@ -125,8 +128,8 @@ Usually `remote` journals are named by the IP of the server sending these logs.
extracts these IPs and performs a reverse DNS lookup to find their hostnames. When this is successful,
`remote` journals are named by the hostnames of the origin servers.
-For information about configuring a journals' centralization server,
-check [this FAQ item](#how-do-i-configure-a-journals-centralization-server).
+For information about configuring a journal centralization server,
+check [this FAQ item](#how-do-i-configure-a-journal-centralization-server).
## Journal Fields
@@ -153,6 +156,7 @@ The plugin automatically enriches certain fields to make them more user-friendly
- `_GID`, `OBJECT_GID`: the local group database is consulted to annotate them with group names.
- `_CAP_EFFECTIVE`: the encoded value is annotated with a human-readable list of the linux capabilities.
- `_SOURCE_REALTIME_TIMESTAMP`: the numeric value is annotated with human-readable datetime in UTC.
+- `MESSAGE_ID`: for known `MESSAGE_ID`s, the value is replaced with the well-known name of the event.
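Two of these annotations can be reproduced manually, which may help when inspecting raw journal entries (the hex bitmask and the timestamp below are arbitrary example values):

```sh
# _CAP_EFFECTIVE is a hex bitmask; capsh (from the libcap package, when
# installed) decodes it to capability names:
command -v capsh >/dev/null && capsh --decode=00000000a80425fb

# _SOURCE_REALTIME_TIMESTAMP is microseconds since the epoch; GNU date
# can render it as a UTC datetime:
date -u -d @"$(( 1697371786000000 / 1000000 ))"
```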
The values of all other fields are presented as found in the journals.
@@ -237,6 +241,11 @@ Full text search is combined with the selected filters.
The text box accepts asterisks `*` as wildcards. So, `a*b*c` means match anything that contains `a`, then `b` and
then `c` with anything between them.
+Spaces are treated as OR expressions, so `a*b c*d` means `a*b OR c*d`.
+
+Negative expressions are supported, by prefixing any string with `!`. Example: `!systemd *` means match anything that
+does not contain `systemd` on any of its fields.
+
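As an illustration (not the plugin's actual implementation), the same matching semantics can be approximated with `grep`:

```sh
# Sample data to demonstrate the matching semantics
printf 'xaybz\nxcwd\nsystemd started\nplain line\n' > /tmp/sample.log

# 'a*b c*d' -> lines containing a...b OR c...d ('*' becomes '.*')
grep -E 'a.*b|c.*d' /tmp/sample.log

# '!systemd' -> lines that do NOT contain 'systemd'
grep -v 'systemd' /tmp/sample.log
```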
## Query performance
Journal files are designed to be accessed by multiple readers and one writer, concurrently.
@@ -278,9 +287,9 @@ multiple journal files, over long time-frames.
During the development of this plugin, we submitted to `systemd` a number of patches that improve `journalctl`
performance by a factor of 14:
-- https://github.com/systemd/systemd/pull/29365
-- https://github.com/systemd/systemd/pull/29366
-- https://github.com/systemd/systemd/pull/29261
+- <https://github.com/systemd/systemd/pull/29365>
+- <https://github.com/systemd/systemd/pull/29366>
+- <https://github.com/systemd/systemd/pull/29261>
However, even after these patches are merged, `journalctl` will still be 2x slower than this Netdata plugin,
on multi-journal queries.
@@ -290,13 +299,85 @@ the Netdata plugin queries each file individually and then merges the results.
This is transparent, thanks to the `facets` library in `libnetdata` that handles on-the-fly indexing, filtering,
and searching of any dataset, independently of its source.
+## Performance at scale
+
+On busy logs servers, or when querying long timeframes that match millions of log entries, the plugin uses a sampling
+algorithm to allow it to respond promptly. It works like this:
+
+1. The latest 500k log entries are queried in full, evaluating all the fields of every single log entry. This evaluation
+ allows counting the unique values per field, updating the counters next to each value at the filters section of the
+ dashboard.
+2. When the latest 500k log entries have been processed and there is more data to read, the plugin divides another 500k
+ log entries evenly across the journal files matched by the query. So, it will continue to evaluate all the fields
+ of all log entries, up to the budget per file, aiming to fully query 1 million log entries in total.
+3. When the budget is hit for a given file, the plugin continues to scan log entries, but this time it does not evaluate
+ the fields and their values, so the counters per field and value are not updated. These unsampled log entries are
+ shown in the histogram with the label `[unsampled]`.
+4. The plugin continues to count `[unsampled]` entries until it has counted as many entries as it sampled, and at
+ least 1% of the journal file has been processed.
+5. When the `[unsampled]` budget is exhausted, the plugin stops processing the journal file and, based on the
+ processing completed so far and the number of entries in the journal file, estimates the remaining number of log
+ entries in that file. This is shown as `[estimated]` in the histogram.
+6. In systemd versions 254 or later, the plugin fetches the unique sequence number of each log entry and calculates
+ the percentage of the file matched by the query, versus the total number of log entries in the journal file.
+7. In systemd versions prior to 254, the plugin estimates the number of entries the journal file contributes to the
+ query, using the number of log entries it matched vs. the total duration the log file has entries for.
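The budget split in steps 1-3 can be sketched as follows (the numbers come from the text above; variable names and the number of files are our own example assumptions, not the plugin's internals):

```sh
FULL_BUDGET=500000   # entries fully evaluated for the latest data (step 1)
FILES=250            # journal files matched by the query (example value)

# Step 2: another 500k entries are divided evenly across the matched files
PER_FILE=$(( FULL_BUDGET / FILES ))
echo "per-file full-evaluation budget: $PER_FILE entries"

# Step 3+: beyond this budget, entries are counted as [unsampled] and,
# once that budget is also exhausted, the remainder is [estimated].
```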
+
+The above allows the plugin to respond promptly even when the journal files contain several dozen million log entries,
+while providing accurate estimations of the log entries over time at the histogram and enough counters at the fields
+filtering section to help users get an overview of the whole timeframe.
+
+Because the latest 500k log entries and 1% of all journal files (which are spread over time) are fully evaluated,
+including counting the number of appearances for each field value, the plugin usually provides an accurate
+representation of the whole timeframe.
+
+Keep in mind that although the plugin is quite effective and responds promptly when there are hundreds of journal files
+matching a query, response times may be longer when there are several thousand smaller files. systemd versions 254+
+attempt to solve this problem by allowing `systemd-journal-remote` to create larger files. However, for systemd
+versions prior to 254, `systemd-journal-remote` creates files of up to 32MB each; on very busy journal centralization
+servers aggregating several thousand log entries per second, the number of files can quickly grow to several tens of
+thousands. In such setups, the plugin should ideally skip processing journal files entirely, relying solely on
+estimations derived from each file's position in the sequence. However, this has not been implemented yet. To improve
+query performance in such setups, users have to query smaller timeframes.
+
+Another optimization, taking place on huge journal centralization points, is the initial scan of the database. The
+plugin needs to know the list of all journal files available, including the details of the first and last message in
+each of them. When there are several thousand files in a directory (as usually happens in `/var/log/journal/remote`),
+directory listing and examination of each file can take a considerable amount of time (even `ls -l` takes minutes).
+To work around this problem, the plugin uses `inotify` to receive file updates immediately and scans the journal files
+from newest to oldest, allowing the user interface to work immediately after startup for the most recent timeframes.
+
+### Best practices for better performance
+
+systemd-journal has been designed **first to be reliable** and then to be fast. It includes several mechanisms to
+ensure minimal data loss under all conditions (e.g. disk corruption, tampering, forward secure sealing) and, despite
+using several techniques to minimize disk footprint (like deduplication of log entries, linking of values and fields,
+and compression), the disk footprint of journal files remains significantly higher compared to other log management
+solutions.
+
+The higher disk footprint results in higher disk I/O during querying, since a lot more data has to be read from disk
+to evaluate a query. Query performance at scale can greatly benefit from a filesystem with transparent compression
+(like `btrfs` or `zfs`) for storing systemd-journal files.
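For example, on ZFS, transparent compression can be enabled on the dataset holding the journals (the dataset name below is hypothetical):

```sh
# Enable transparent zstd compression on the dataset storing journal files
sudo zfs set compression=zstd pool/var-log-journal

# Verify the compression ratio achieved so far
zfs get compressratio pool/var-log-journal
```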
+
+systemd-journal files are cached by the operating system. There is no database server to serve queries. Each file is
+opened and the query runs by directly accessing the data in it.
+
+Therefore, systemd-journal relies on the caching layer of the operating system to optimize query performance. The more
+RAM the system has, the faster queries get, although the memory used for caching will be reported as `cache`, not
+`used`. The first time a timeframe is accessed, query performance will be slower, but further queries on the same
+timeframe will be significantly faster, since the journal data are now cached in memory.
+
+So, on busy logs centralization systems, query performance can be improved significantly by using a filesystem with
+transparent compression for storing the journal files, and higher amounts of RAM.
+
## Configuration and maintenance
This Netdata plugin does not require any configuration or maintenance.
## FAQ
-### Can I use this plugin on journals' centralization servers?
+### Can I use this plugin on journal centralization servers?
Yes. You can centralize your logs using `systemd-journal-remote`, and then install Netdata
on this logs centralization server to explore the logs of all your infrastructure.
@@ -304,7 +385,7 @@ on this logs centralization server to explore the logs of all your infrastructur
This plugin will automatically provide multi-node views of your logs and also give you the ability to combine the logs
of multiple servers, as you see fit.
-Check [configuring a logs centralization server](#configuring-a-journals-centralization-server).
+Check [configuring a logs centralization server](#how-do-i-configure-a-journal-centralization-server).
### Can I use this plugin from a parent Netdata?
@@ -364,7 +445,7 @@ Yes. It is simple, fast and the software to do it is already in your systems.
For application and system logs, `systemd` journal is ideal, and the visibility you can get
by centralizing your system logs and using this Netdata plugin is unparalleled.
-### How do I configure a journals' centralization server?
+### How do I configure a journal centralization server?
A short summary to get a journal server running can be found below.
There are two strategies you can apply when it comes to a centralized server for `systemd` journal logs.
@@ -374,294 +455,13 @@ There are two strategies you can apply, when it comes down to a centralized serv
For more options and reference to documentation, check `man systemd-journal-remote` and `man systemd-journal-upload`.
-#### _passive_ journals' centralization without encryption
-
-> ℹ️ _passive_ is a journal server that waits for clients to push their metrics to it.
-
-> ⚠️ **IMPORTANT**
-> These instructions will copy your logs to a central server, without any encryption or authorization.
-> DO NOT USE THIS ON NON-TRUSTED NETWORKS.
-
-##### _passive_ server, without encryption
-
-On the centralization server install `systemd-journal-remote`:
-
-```sh
-# change this according to your distro
-sudo apt-get install systemd-journal-remote
-```
-
-Make sure the journal transfer protocol is `http`:
-
-```sh
-sudo cp /lib/systemd/system/systemd-journal-remote.service /etc/systemd/system/
-
-# edit it to make sure it says:
-# --listen-http=-3
-# not:
-# --listen-https=-3
-sudo nano /etc/systemd/system/systemd-journal-remote.service
-
-# reload systemd
-sudo systemctl daemon-reload
-```
-
-Optionally, if you want to change the port (the default is `19532`), edit `systemd-journal-remote.socket`
-
-```sh
-# edit the socket file
-sudo systemctl edit systemd-journal-remote.socket
-```
-
-and add the following lines into the instructed place, and choose your desired port; save and exit.
-
-```sh
-[Socket]
-ListenStream=<DESIRED_PORT>
-```
-
-Finally, enable it, so that it will start automatically upon receiving a connection:
-
-```
-# enable systemd-journal-remote
-sudo systemctl enable --now systemd-journal-remote.socket
-sudo systemctl enable systemd-journal-remote.service
-```
-
-`systemd-journal-remote` is now listening for incoming journals from remote hosts.
-
-##### _passive_ client, without encryption
-
-On the clients, install `systemd-journal-remote`:
-
-```sh
-# change this according to your distro
-sudo apt-get install systemd-journal-remote
-```
-
-Edit `/etc/systemd/journal-upload.conf` and set the IP address and the port of the server, like so:
-
-```
-[Upload]
-URL=http://centralization.server.ip:19532
-```
-
-Edit `systemd-journal-upload`, and add `Restart=always` to make sure the client will keep trying to push logs, even if the server is temporarily not there, like this:
-
-```sh
-sudo systemctl edit systemd-journal-upload
-```
-
-At the top, add:
-
-```
-[Service]
-Restart=always
-```
-
-Enable and start `systemd-journal-upload`, like this:
-
-```sh
-sudo systemctl enable systemd-journal-upload
-sudo systemctl start systemd-journal-upload
-```
-
-##### verify it works
-
-To verify the central server is receiving logs, run this on the central server:
-
-```sh
-sudo ls -l /var/log/journal/remote/
-```
-
-You should see new files from the client's IP.
-
-Also, `systemctl status systemd-journal-remote` should show something like this:
-
-```
-systemd-journal-remote.service - Journal Remote Sink Service
- Loaded: loaded (/etc/systemd/system/systemd-journal-remote.service; indirect; preset: disabled)
- Active: active (running) since Sun 2023-10-15 14:29:46 EEST; 2h 24min ago
-TriggeredBy: ● systemd-journal-remote.socket
- Docs: man:systemd-journal-remote(8)
- man:journal-remote.conf(5)
- Main PID: 2118153 (systemd-journal)
- Status: "Processing requests..."
- Tasks: 1 (limit: 154152)
- Memory: 2.2M
- CPU: 71ms
- CGroup: /system.slice/systemd-journal-remote.service
- └─2118153 /usr/lib/systemd/systemd-journal-remote --listen-http=-3 --output=/var/log/journal/remote/
-```
-
-Note the `status: "Processing requests..."` and the PID under `CGroup`.
-
-On the client `systemctl status systemd-journal-upload` should show something like this:
-
-```
-● systemd-journal-upload.service - Journal Remote Upload Service
- Loaded: loaded (/lib/systemd/system/systemd-journal-upload.service; enabled; vendor preset: disabled)
- Drop-In: /etc/systemd/system/systemd-journal-upload.service.d
- └─override.conf
- Active: active (running) since Sun 2023-10-15 10:39:04 UTC; 3h 17min ago
- Docs: man:systemd-journal-upload(8)
- Main PID: 4169 (systemd-journal)
- Status: "Processing input..."
- Tasks: 1 (limit: 13868)
- Memory: 3.5M
- CPU: 1.081s
- CGroup: /system.slice/systemd-journal-upload.service
- └─4169 /lib/systemd/systemd-journal-upload --save-state
-```
-
-Note the `Status: "Processing input..."` and the PID under `CGroup`.
-
-#### _passive_ journals' centralization with encryption using self-signed certificates
-
-> ℹ️ _passive_ is a journal server that waits for clients to push their metrics to it.
+#### _passive_ journal centralization without encryption
-##### _passive_ server, with encryption and self-singed certificates
+If you want to set up passive journal centralization without encryption, [check out our guide on it](https://github.com/netdata/netdata/blob/master/collectors/systemd-journal.plugin/passive_journal_centralization_guide_no_encryption.md).
-On the centralization server install `systemd-journal-remote` and `openssl`:
-
-```sh
-# change this according to your distro
-sudo apt-get install systemd-journal-remote openssl
-```
-
-Make sure the journal transfer protocol is `https`:
-
-```sh
-sudo cp /lib/systemd/system/systemd-journal-remote.service /etc/systemd/system/
-
-# edit it to make sure it says:
-# --listen-https=-3
-# not:
-# --listen-http=-3
-sudo nano /etc/systemd/system/systemd-journal-remote.service
-
-# reload systemd
-sudo systemctl daemon-reload
-```
-
-Optionally, if you want to change the port (the default is `19532`), edit `systemd-journal-remote.socket`
-
-```sh
-# edit the socket file
-sudo systemctl edit systemd-journal-remote.socket
-```
-
-and add the following lines into the instructed place, and choose your desired port; save and exit.
-
-```sh
-[Socket]
-ListenStream=<DESIRED_PORT>
-```
-
-Finally, enable it, so that it will start automatically upon receiving a connection:
-
-```sh
-# enable systemd-journal-remote
-sudo systemctl enable --now systemd-journal-remote.socket
-sudo systemctl enable systemd-journal-remote.service
-```
-
-`systemd-journal-remote` is now listening for incoming journals from remote hosts.
-
-Use [this script](https://gist.github.com/ktsaou/d62b8a6501cf9a0da94f03cbbb71c5c7) to create a self-signed certificates authority and certificates for all your servers.
-
-```sh
-wget -O systemd-journal-self-signed-certs.sh "https://gist.githubusercontent.com/ktsaou/d62b8a6501cf9a0da94f03cbbb71c5c7/raw/c346e61e0a66f45dc4095d254bd23917f0a01bd0/systemd-journal-self-signed-certs.sh"
-chmod 755 systemd-journal-self-signed-certs.sh
-```
-
-Edit the script and at its top, set your settings:
-
-```sh
-# The directory to save the generated certificates (and everything about this certificate authority).
-# This is only used on the node generating the certificates (usually on the journals server).
-DIR="/etc/ssl/systemd-journal-remote"
-
-# The journals centralization server name (the CN of the server certificate).
-SERVER="server-hostname"
-
-# All the DNS names or IPs this server is reachable at (the certificate will include them).
-# Journal clients can use any of them to connect to this server.
-# systemd-journal-upload validates its URL= hostname, against this list.
-SERVER_ALIASES=("DNS:server-hostname1" "DNS:server-hostname2" "IP:1.2.3.4" "IP:10.1.1.1" "IP:172.16.1.1")
-
-# All the names of the journal clients that will be sending logs to the server (the CNs of their certificates).
-# These names are used by systemd-journal-remote to name the journal files in /var/log/journal/remote/.
-# Also the remote hosts will be presented using these names on Netdata dashboards.
-CLIENTS=("vm1" "vm2" "vm3" "add_as_may_as_needed")
-```
-
-Then run the script:
-
-```sh
-sudo ./systemd-journal-self-signed-certs.sh
-```
-
-The script will create the directory `/etc/ssl/systemd-journal-remote` and in it you will find all the certificates needed.
-
-There will also be files named `runme-on-XXX.sh`. There will be 1 script for the server and 1 script for each of the clients. You can copy and paste (or `scp`) these scripts on your server and each of your clients and run them as root:
-
-```sh
-scp /etc/ssl/systemd-journal-remote/runme-on-XXX.sh XXX:/tmp/
-```
-
-Once the above is done, `ssh` to each server/client and do:
-
-```sh
-sudo bash /tmp/runme-on-XXX.sh
-```
-
-The scripts install the needed certificates, fix their file permissions to be accessible by systemd-journal-remote/upload, change `/etc/systemd/journal-remote.conf` (on the server) or `/etc/systemd/journal-upload.conf` on the clients and restart the relevant services.
-
-
-##### _passive_ client, with encryption and self-singed certificates
-
-On the clients, install `systemd-journal-remote`:
-
-```sh
-# change this according to your distro
-sudo apt-get install systemd-journal-remote
-```
-
-Edit `/etc/systemd/journal-upload.conf` and set the IP address and the port of the server, like so:
-
-```
-[Upload]
-URL=https://centralization.server.ip:19532
-```
-
-Make sure that `centralization.server.ip` is one of the `SERVER_ALIASES` when you created the certificates.
-
-Edit `systemd-journal-upload`, and add `Restart=always` to make sure the client will keep trying to push logs, even if the server is temporarily not there, like this:
-
-```sh
-sudo systemctl edit systemd-journal-upload
-```
-
-At the top, add:
-
-```
-[Service]
-Restart=always
-```
-
-Enable and start `systemd-journal-upload`, like this:
-
-```sh
-sudo systemctl enable systemd-journal-upload
-```
-
-Copy the relevant `runme-on-XXX.sh` script as described on server setup and run it:
-
-```sh
-sudo bash /tmp/runme-on-XXX.sh
-```
+#### _passive_ journal centralization with encryption using self-signed certificates
+If you want to set up passive journal centralization using self-signed certificates for encryption, [check out our guide on it](https://github.com/netdata/netdata/blob/master/collectors/systemd-journal.plugin/passive_journal_centralization_guide_self_signed_certs.md).
#### Limitations when using a logs centralization server
@@ -670,4 +470,3 @@ As of this writing `namespaces` support by `systemd` is limited:
- Docker containers cannot log to namespaces. Check [this issue](https://github.com/moby/moby/issues/41879).
- `systemd-journal-upload` automatically uploads `system` and `user` journals, but not `namespaces` journals. For this
you need to spawn a `systemd-journal-upload` per namespace.
-