path: root/collectors/log2journal/README.md
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-03-09 13:19:22 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-03-09 13:19:22 +0000
commitc21c3b0befeb46a51b6bf3758ffa30813bea0ff0 (patch)
tree9754ff1ca740f6346cf8483ec915d4054bc5da2d /collectors/log2journal/README.md
parentAdding upstream version 1.43.2. (diff)
Adding upstream version 1.44.3.upstream/1.44.3
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
-rw-r--r--collectors/log2journal/README.md912
1 files changed, 912 insertions, 0 deletions
diff --git a/collectors/log2journal/README.md b/collectors/log2journal/README.md
new file mode 100644
index 000000000..16ccc033c
--- /dev/null
+++ b/collectors/log2journal/README.md
@@ -0,0 +1,912 @@
+
+# log2journal
+
+`log2journal` and `systemd-cat-native` can be used to convert a structured log file, such as the ones generated by web servers, into `systemd-journal` entries.
+
+By combining these tools you can create advanced log processing pipelines sending any kind of structured text logs to systemd-journald. This is a simple, but powerful and efficient way to handle log processing.
+
+The process involves the usual piping of shell commands, to get and process the log files in realtime.
+
+The result looks like this, with nginx logs stored in systemd-journal:
+
+![image](https://github.com/netdata/netdata/assets/2662304/16b471ff-c5a1-4fcc-bcd5-83551e089f6c)
+
+
+The overall process looks like this:
+
+```bash
+tail -F /var/log/nginx/*.log |       # outputs log lines
+  log2journal 'PATTERN' |            # outputs Journal Export Format
+  systemd-cat-native                 # send to local/remote journald
+```
+
+These are the steps:
+
+1. `tail -F /var/log/nginx/*.log`<br/>this command will tail all `*.log` files in `/var/log/nginx/`. We use `-F` instead of `-f` to ensure that files will still be tailed after log rotation.
+2. `log2journal` is a Netdata program. It reads log entries and extracts fields, according to the PCRE2 pattern it accepts. It can also apply some basic operations on the fields, like injecting new fields, duplicating existing ones, or rewriting their values. The output of `log2journal` is in Systemd Journal Export Format, and it looks like this:
+ ```bash
+ KEY1=VALUE1 # << start of the first log line
+ KEY2=VALUE2
+ # << log lines separator
+ KEY1=VALUE1 # << start of the second log line
+ KEY2=VALUE2
+ ```
+3. `systemd-cat-native` is a Netdata program. It can send the logs to a local `systemd-journald` (journal namespaces supported), or to a remote `systemd-journal-remote`.
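
The text form of the Journal Export Format shown in step 2 can be sketched in a few lines of Python. This is an illustration only: the helper name `to_journal_export` is hypothetical and is not part of `log2journal`, and the sketch assumes single-line values. The real format also defines a binary-safe encoding for values that contain newlines, which is not implemented here.

```python
def to_journal_export(entries):
    """Serialize a list of {KEY: value} dicts as text-only Journal Export Format.

    Each field becomes one KEY=VALUE line and an empty line ends each entry.
    Assumption: values are single-line strings.
    """
    out = []
    for entry in entries:
        for key, value in entry.items():
            if "\n" in str(value):
                raise ValueError(f"multi-line value for {key} is not supported in this sketch")
            out.append(f"{key}={value}\n")
        out.append("\n")  # empty line: separator between log entries
    return "".join(out)


print(to_journal_export([
    {"KEY1": "VALUE1", "KEY2": "VALUE2"},  # first log line
    {"KEY1": "VALUE1", "KEY2": "VALUE2"},  # second log line
]), end="")
```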
+
+
+## Processing pipeline
+
+The sequence of processing in Netdata's `log2journal` is designed to methodically transform and prepare log data for export in the systemd Journal Export Format. This transformation occurs through a pipeline of stages, each with a specific role in processing the log entries. Here's a description of each stage in the sequence:
+
+1. **Input**<br/>
+ The tool reads one log line at a time from the input source. It supports different input formats such as JSON, logfmt, and free-form logs defined by PCRE2 patterns.
+
+2. **Extract Fields and Values**<br/>
+ Based on the input format (JSON, logfmt, or custom pattern), it extracts fields and their values from each log line. In the case of JSON and logfmt, it automatically extracts all fields. For custom patterns, it uses PCRE2 regular expressions, and fields are extracted based on sub-expressions defined in the pattern.
+
+3. **Transliteration**<br/>
+ Extracted fields are transliterated to the limited character set accepted by systemd-journal: capitals A-Z, digits 0-9, underscores.
+
+4. **Apply Optional Prefix**<br/>
+ If a prefix is specified, it is added to all keys. This happens before any other processing so that all subsequent matches and manipulations take the prefix into account.
+
+5. **Rename Fields**<br/>
+ Renames fields as specified in the configuration. This is used to change the names of the fields to match desired or required naming conventions.
+
+6. **Inject New Fields**<br/>
+ New fields are injected into the log data. This can include constants or values derived from other fields, using variable substitution.
+
+7. **Rewrite Field Values**<br/>
+ Applies rewriting rules to alter the values of the fields. This can involve complex transformations, including regular expressions and variable substitutions. The rewrite rules can also inject new fields into the data.
+
+8. **Filter Fields**<br/>
+ Fields are filtered based on include and exclude patterns. This stage selects which fields are to be sent to the journal, allowing for selective logging.
+
+9. **Output**<br/>
+ Finally, the processed log data is output in the Journal Export Format. This format is compatible with systemd's journaling system and can be sent to local or remote systemd journal systems, by piping the output of `log2journal` to `systemd-cat-native`.
+
+This pipeline ensures a flexible and comprehensive approach to log processing, allowing for a wide range of modifications and customizations to fit various logging requirements. Each stage builds upon the previous one, enabling complex log transformations and enrichments before the data is exported to the systemd journal.
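
To make the order of the stages concrete, here is a minimal Python sketch of stages 3 to 8 applied to one already-extracted log entry. It is not the `log2journal` implementation: the function names, the argument shapes and the exact transliteration rule (uppercase, everything else replaced by `_`) are assumptions made for illustration.

```python
import re


def transliterate(key):
    """Uppercase the key and replace anything outside A-Z, 0-9 and _ with _."""
    return re.sub(r"[^A-Z0-9_]", "_", key.upper())


def process(fields, prefix="", rename=None, inject=None, rewrite=None,
            include=None, exclude=None):
    """Run one extracted entry (a dict of strings) through stages 3 to 8."""
    # Stages 3 and 4: transliterate the keys, then add the prefix.
    out = {prefix + transliterate(k): v for k, v in fields.items()}
    # Stage 5: rename, given as {new_key: old_key}.
    for new_key, old_key in (rename or {}).items():
        if old_key in out:
            out[new_key] = out.pop(old_key)
    # Stage 6: inject new fields, expanding ${KEY} from the fields so far.
    for key, template in (inject or {}).items():
        out[key] = re.sub(r"\$\{(\w+)\}",
                          lambda m: out.get(m.group(1), ""), template)
    # Stage 7: rewrite; the first matching rule for a key wins.
    done = set()
    for key, pattern, value in (rewrite or []):
        if key in out and key not in done and re.search(pattern, out[key]):
            out[key] = value
            done.add(key)
    # Stage 8: filter on key names; exclude wins over include.
    return {k: v for k, v in out.items()
            if (include is None or re.search(include, k))
            and not (exclude is not None and re.search(exclude, k))}


entry = process(
    {"status": "503", "request": "GET / HTTP/1.1", "remote-addr": "1.2.3.4"},
    prefix="NGINX_",
    rename={"MESSAGE": "NGINX_REQUEST"},
    inject={"PRIORITY": "${NGINX_STATUS}"},
    rewrite=[("PRIORITY", r"^[123]", "6"), ("PRIORITY", r"^5", "3"),
             ("PRIORITY", r".*", "4")],
    exclude=r"^NGINX_REMOTE_ADDR$",
)
print(entry)
```

Because the prefix is applied before the later stages, the rename, inject and filter arguments all refer to the prefixed key names.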
+
+## Real-life example
+
+We have an nginx server logging in this standard combined log format:
+
+```bash
+ log_format combined '$remote_addr - $remote_user [$time_local] '
+ '"$request" $status $body_bytes_sent '
+ '"$http_referer" "$http_user_agent"';
+```
+
+### Extracting fields with a pattern
+
+First, let's find the right pattern for `log2journal`. We ask ChatGPT:
+
+```
+My nginx log uses this log format:
+
+log_format access '$remote_addr - $remote_user [$time_local] '
+ '"$request" $status $body_bytes_sent '
+ '"$http_referer" "$http_user_agent"';
+
+I want to use `log2journal` to convert this log for systemd-journal.
+`log2journal` accepts a PCRE2 regular expression, using the named groups
+in the pattern as the journal fields to extract from the logs.
+
+Please give me the PCRE2 pattern to extract all the fields from my nginx
+log files.
+```
+
+ChatGPT replies with this:
+
+```regexp
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+```
+
+Let's see what the above says:
+
+1. `(?x)`: enable PCRE2 extended mode. In this mode spaces and newlines in the pattern are ignored. To match a space you have to use `\s`. This mode allows us to split the pattern into multiple lines and add comments to it.
+2. `^`: match the beginning of the line.
+3. `(?<remote_addr>[^ ]+)`: match anything up to the first space (`[^ ]+`), and name it `remote_addr`.
+4. `\s`: match a space.
+5. `-`: match a hyphen.
+6. and so on...
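
If `log2journal` is not at hand, the pattern can be sanity-checked with Python's `re` module. This is only an approximation: `re` is not PCRE2, and it requires named groups to be written as `(?P<name>...)` instead of `(?<name>...)`. The helper name `extract` is hypothetical.

```python
import re

# The same pattern, with (?<name>...) rewritten as (?P<name>...) for Python.
# re.VERBOSE plays the role of the (?x) extended mode.
NGINX_COMBINED = re.compile(r'''
    ^
    (?P<remote_addr>[^ ]+) \s - \s
    (?P<remote_user>[^ ]+) \s
    \[
    (?P<time_local>[^\]]+)
    \]
    \s+ "
    (?P<request>
        (?P<request_method>[A-Z]+) \s+
        (?P<request_uri>[^ ]+) \s+
        (?P<server_protocol>[^"]+)
    )
    " \s+
    (?P<status>\d+) \s+
    (?P<body_bytes_sent>\d+) \s+
    "(?P<http_referer>[^"]*)" \s+
    "(?P<http_user_agent>[^"]*)"
''', re.VERBOSE)


def extract(line):
    """Return the named groups with uppercase keys, or None if no match."""
    m = NGINX_COMBINED.match(line)
    if m is None:
        return None
    return {name.upper(): value for name, value in m.groupdict().items()}


sample = ('1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] '
          '"GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"')
for key, value in sorted(extract(sample).items()):
    print(f"{key}={value}")
```

A line that does not follow the combined format makes `extract` return `None`, which is a quick way to spot a sample line that does not match the pattern.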
+
+We edit `nginx.yaml` and add it, like this:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+```
+
+Let's test it with a sample line (instead of `tail`):
+
+```bash
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
+BODY_BYTES_SENT=4172
+HTTP_REFERER=-
+HTTP_USER_AGENT=Go-http-client/1.1
+REMOTE_ADDR=1.2.3.4
+REMOTE_USER=-
+REQUEST=GET /index.html HTTP/1.1
+REQUEST_METHOD=GET
+REQUEST_URI=/index.html
+SERVER_PROTOCOL=HTTP/1.1
+STATUS=200
+TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+
+```
+
+As you can see, it extracted all the fields and converted their names to uppercase, as systemd-journal expects.
+
+### Prefixing field names
+
+To make sure the fields are unique for nginx and do not interfere with other applications, we should prefix them with `NGINX_`:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+
+prefix: 'NGINX_' # <<< we added this
+```
+
+And let's try it:
+
+```bash
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
+NGINX_BODY_BYTES_SENT=4172
+NGINX_HTTP_REFERER=-
+NGINX_HTTP_USER_AGENT=Go-http-client/1.1
+NGINX_REMOTE_ADDR=1.2.3.4
+NGINX_REMOTE_USER=-
+NGINX_REQUEST=GET /index.html HTTP/1.1
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/index.html
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+
+```
+
+### Renaming fields
+
+Now, all fields start with `NGINX_` but we want `NGINX_REQUEST` to be the `MESSAGE` of the log line, as we will see it by default in `journalctl` and the Netdata dashboard. Let's rename it:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+
+prefix: 'NGINX_'
+
+rename:                              # <<< we added this
+  - new_key: MESSAGE                 # <<< we added this
+    old_key: NGINX_REQUEST           # <<< we added this
+```
+
+Let's test it:
+
+```bash
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
+MESSAGE=GET /index.html HTTP/1.1 # <<< renamed !
+NGINX_BODY_BYTES_SENT=4172
+NGINX_HTTP_REFERER=-
+NGINX_HTTP_USER_AGENT=Go-http-client/1.1
+NGINX_REMOTE_ADDR=1.2.3.4
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/index.html
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+
+```
+
+### Injecting new fields
+
+To have a complete message in journals we need 3 fields: `MESSAGE`, `PRIORITY` and `SYSLOG_IDENTIFIER`. We have already added `MESSAGE` by renaming `NGINX_REQUEST`. We can also inject a `SYSLOG_IDENTIFIER` and `PRIORITY`.
+
+Ideally, we would want the 5xx errors to be red in our `journalctl` output and the dashboard. To achieve that we need to set the `PRIORITY` field to the right log level. Log priorities are numeric and follow the `syslog` priorities. Checking `/usr/include/sys/syslog.h` we can see these:
+
+```c
+#define LOG_EMERG 0 /* system is unusable */
+#define LOG_ALERT 1 /* action must be taken immediately */
+#define LOG_CRIT 2 /* critical conditions */
+#define LOG_ERR 3 /* error conditions */
+#define LOG_WARNING 4 /* warning conditions */
+#define LOG_NOTICE 5 /* normal but significant condition */
+#define LOG_INFO 6 /* informational */
+#define LOG_DEBUG 7 /* debug-level messages */
+```
+
+Avoid setting priority to 0 (`LOG_EMERG`), because such messages are broadcast to your terminals (the journal uses `wall` to let you know of such events). A good priority for errors is 3 (red) or 4 (yellow).
+
+To set the `PRIORITY` field in the output, we can use `NGINX_STATUS`. We will do this in 2 steps: a) inject the priority field as a copy of `NGINX_STATUS`, and then b) use a pattern on its value to rewrite it to the priority level we want.
+
+First, let's inject `SYSLOG_IDENTIFIER` and `PRIORITY`:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+
+prefix: 'NGINX_'
+
+rename:
+  - new_key: MESSAGE
+    old_key: NGINX_REQUEST
+
+inject:                              # <<< we added this
+  - key: PRIORITY                    # <<< we added this
+    value: '${NGINX_STATUS}'         # <<< we added this
+
+  - key: SYSLOG_IDENTIFIER           # <<< we added this
+    value: 'nginx-log'               # <<< we added this
+```
+
+Let's see what this does:
+
+```bash
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
+MESSAGE=GET /index.html HTTP/1.1
+NGINX_BODY_BYTES_SENT=4172
+NGINX_HTTP_REFERER=-
+NGINX_HTTP_USER_AGENT=Go-http-client/1.1
+NGINX_REMOTE_ADDR=1.2.3.4
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/index.html
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+PRIORITY=200 # <<< PRIORITY added
+SYSLOG_IDENTIFIER=nginx-log # <<< SYSLOG_IDENTIFIER added
+
+```
+
+### Rewriting field values
+
+Now we need to rewrite `PRIORITY` to the right syslog level based on its value (`NGINX_STATUS`). We will assign the priority 6 (info) when the status is 1xx, 2xx, 3xx, priority 5 (notice) when status is 4xx, priority 3 (error) when status is 5xx and anything else will go to priority 4 (warning). Let's do it:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+
+prefix: 'NGINX_'
+
+rename:
+  - new_key: MESSAGE
+    old_key: NGINX_REQUEST
+
+inject:
+  - key: PRIORITY
+    value: '${NGINX_STATUS}'
+
+  - key: SYSLOG_IDENTIFIER
+    value: 'nginx-log'
+
+rewrite:                             # <<< we added this
+  - key: PRIORITY                    # <<< we added this
+    match: '^[123]'                  # <<< we added this
+    value: 6                         # <<< we added this
+
+  - key: PRIORITY                    # <<< we added this
+    match: '^4'                      # <<< we added this
+    value: 5                         # <<< we added this
+
+  - key: PRIORITY                    # <<< we added this
+    match: '^5'                      # <<< we added this
+    value: 3                         # <<< we added this
+
+  - key: PRIORITY                    # <<< we added this
+    match: '.*'                      # <<< we added this
+    value: 4                         # <<< we added this
+```
+
+Rewrite rules are processed in order. By default, the first rule that matches a field stops further rewrite processing for that field. This is why the last rule, which matches everything, changes the priority to 4 only when none of the earlier rules matched.
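
The first-match-wins behaviour can be illustrated with a short Python sketch that mirrors the four `PRIORITY` rules. It uses Python's `re` rather than PCRE2, and the names `PRIORITY_RULES` and `priority_for` are hypothetical, chosen only for this example.

```python
import re

# (pattern, priority) pairs, in the same order as the rewrite rules.
PRIORITY_RULES = [
    (r"^[123]", 6),  # 1xx, 2xx, 3xx -> info
    (r"^4", 5),      # 4xx -> notice
    (r"^5", 3),      # 5xx -> error
    (r".*", 4),      # catch-all -> warning
]


def priority_for(status):
    """Return the priority of the first rule whose pattern matches status."""
    for pattern, priority in PRIORITY_RULES:
        if re.search(pattern, status):
            return priority  # first match wins; later rules are not tried
    return None


for status in ("200", "404", "503", "-"):
    print(status, "->", priority_for(status))
```

Moving the catch-all rule to the top of the list would make every status map to 4, which is why the order of the rules matters.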
+
+Let's test it:
+
+```bash
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
+MESSAGE=GET /index.html HTTP/1.1
+NGINX_BODY_BYTES_SENT=4172
+NGINX_HTTP_REFERER=-
+NGINX_HTTP_USER_AGENT=Go-http-client/1.1
+NGINX_REMOTE_ADDR=1.2.3.4
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/index.html
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+PRIORITY=6 # <<< PRIORITY rewritten here
+SYSLOG_IDENTIFIER=nginx-log
+
+```
+
+Rewrite rules are powerful. You can use named groups in them, as in the main pattern, to extract sub-fields, which you can then use in variable substitution. For example, you can use rewrite rules to anonymize URLs by removing customer IDs or transaction details from them.
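
As a rough illustration of named groups combined with variable substitution, the Python sketch below emulates a single rewrite rule. The `/customers/<id>/...` URL layout, the group names `head` and `tail`, and the helper `rewrite_value` are all hypothetical. Python's `re` needs `(?P<name>...)`, while a `log2journal` rule would use the PCRE2 form `(?<name>...)`.

```python
import re


def rewrite_value(value, match, template):
    """Emulate one rewrite rule.

    If `match` matches the value, return `template` with every ${name}
    replaced by the corresponding named group. Otherwise return the value
    unchanged.
    """
    m = re.search(match, value)
    if m is None:
        return value
    groups = m.groupdict()
    return re.sub(r"\$\{(\w+)\}",
                  lambda v: groups.get(v.group(1)) or "", template)


anonymized = rewrite_value(
    "/customers/12345/orders",
    r"^(?P<head>/customers/)\d+(?P<tail>.*)$",
    "${head}CUSTOMER_ID${tail}",
)
print(anonymized)  # /customers/CUSTOMER_ID/orders
```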
+
+### Sending logs to systemd-journal
+
+Now the message is ready to be sent to a systemd-journal. For this we use `systemd-cat-native`. This command can send such messages to a journal running on the localhost, a local journal namespace, or a `systemd-journal-remote` running on another server. By just appending `| systemd-cat-native` to the command, the message will be sent to the local journal.
+
+
+```bash
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml | systemd-cat-native
+# no output
+
+# let's find the message
+# journalctl -r -o verbose SYSLOG_IDENTIFIER=nginx-log
+Wed 2023-12-06 13:23:07.083299 EET [s=5290f0133f25407aaa1e2c451c0e4756;i=57194;b=0dfa96ecc2094cecaa8ec0efcb93b865;m=b133308867;t=60bd59346a289;x=5c1bdacf2b9c4bbd]
+ PRIORITY=6
+ _UID=0
+ _GID=0
+ _CAP_EFFECTIVE=1ffffffffff
+ _SELINUX_CONTEXT=unconfined
+ _BOOT_ID=0dfa96ecc2094cecaa8ec0efcb93b865
+ _MACHINE_ID=355c8eca894d462bbe4c9422caf7a8bb
+ _HOSTNAME=lab-logtest-src
+ _RUNTIME_SCOPE=system
+ _TRANSPORT=journal
+ MESSAGE=GET /index.html HTTP/1.1
+ NGINX_BODY_BYTES_SENT=4172
+ NGINX_HTTP_REFERER=-
+ NGINX_HTTP_USER_AGENT=Go-http-client/1.1
+ NGINX_REMOTE_ADDR=1.2.3.4
+ NGINX_REMOTE_USER=-
+ NGINX_REQUEST_METHOD=GET
+ NGINX_REQUEST_URI=/index.html
+ NGINX_SERVER_PROTOCOL=HTTP/1.1
+ NGINX_STATUS=200
+ NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+ SYSLOG_IDENTIFIER=nginx-log
+ _PID=114343
+ _COMM=systemd-cat-nat
+ _AUDIT_SESSION=253
+ _AUDIT_LOGINUID=1000
+ _SYSTEMD_CGROUP=/user.slice/user-1000.slice/session-253.scope
+ _SYSTEMD_SESSION=253
+ _SYSTEMD_OWNER_UID=1000
+ _SYSTEMD_UNIT=session-253.scope
+ _SYSTEMD_SLICE=user-1000.slice
+ _SYSTEMD_USER_SLICE=-.slice
+ _SYSTEMD_INVOCATION_ID=c59e33ead8c24880b027e317b89f9f76
+ _SOURCE_REALTIME_TIMESTAMP=1701861787083299
+
+```
+
+So, the log line, with all its fields parsed, ended up in systemd-journal. Now we can send all the nginx logs to systemd-journal like this:
+
+```bash
+tail -F /var/log/nginx/access.log |\
+ log2journal -f nginx.yaml |\
+ systemd-cat-native
+```
+
+## Best practices
+
+**Create a systemd service unit**: Add the above commands to a systemd unit file. Running the pipeline as a systemd unit lets you start and stop it and check its status. Furthermore, you can use the `LogNamespace=` directive of systemd service units to isolate your nginx logs from the logs of the rest of the system. Here is how to do it:
+
+Create the file `/etc/systemd/system/nginx-logs.service` (change `/path/to/nginx.yaml` to the right path):
+
+```
+[Unit]
+Description=NGINX Log to Systemd Journal
+After=network.target
+
+[Service]
+ExecStart=/bin/sh -c 'tail -F /var/log/nginx/access.log | log2journal -f /path/to/nginx.yaml | systemd-cat-native'
+LogNamespace=nginx-logs
+Restart=always
+RestartSec=3
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Reload systemd to grab this file:
+
+```bash
+sudo systemctl daemon-reload
+```
+
+Enable and start the service:
+
+```bash
+sudo systemctl enable nginx-logs.service
+sudo systemctl start nginx-logs.service
+```
+
+To see the logs of the namespace, use:
+
+```bash
+journalctl -f --namespace=nginx-logs
+```
+
+Netdata will automatically pick up the new namespace and present it in the list of sources on the dashboard.
+
+You can also instruct `systemd-cat-native` to log to a remote system, sending the logs to a `systemd-journal-remote` instance running on another server. Check [the manual of systemd-cat-native](https://github.com/netdata/netdata/blob/master/libnetdata/log/systemd-cat-native.md).
+
+
+## Performance
+
+`log2journal` and `systemd-cat-native` have been designed to process hundreds of thousands of log lines per second. They both utilize high performance indexing hashtables to speed up lookups, and queues that dynamically adapt to the number of log lines offered, offering a smooth and fast experience under all conditions.
+
+In our tests, the combined CPU utilization of `log2journal` and `systemd-cat-native` versus `promtail` with a similar configuration was 1 to 5. In other words, `log2journal` and `systemd-cat-native` combined needed about one fifth of the CPU that `promtail` needed for the same work.
+
+### PCRE2 patterns
+
+The key characteristic that can influence the performance of a log processing pipeline using these tools is the quality of the PCRE2 patterns used. Poorly written PCRE2 patterns can make processing significantly slower or more CPU intensive.
+
+The pattern `.*` in particular seems to have the biggest impact on CPU consumption, especially when multiple `.*` appear in the same pattern.
+
+Usually we use `.*` to indicate that we need to match everything up to a character, e.g. `.* ` to match up to a space. By replacing it with `[^ ]+` (meaning: match at least a character up to a space), the regular expression engine can be a lot more efficient, reducing the overall CPU utilization significantly.
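
The effect can be explored with a small Python experiment. Python's `re` is a backtracking engine like PCRE2, but it is a different implementation, so treat any timings as indicative only. The sample line and both patterns are made up for this example.

```python
import re
import timeit

line = "1.2.3.4 GET /index.html 200"

# Four space-separated fields, written in two ways.
GREEDY = re.compile(r"^(.*) (.*) (.*) (.*)$")               # relies on backtracking
NEGATED = re.compile(r"^([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+)$")  # stops at each space

# Both patterns extract the same fields from this line.
print(GREEDY.match(line).groups())
print(NEGATED.match(line).groups())

# Time each pattern; absolute numbers depend on the machine and on the engine.
for name, rx in (("greedy .*", GREEDY), ("negated [^ ]+", NEGATED)):
    seconds = timeit.timeit(lambda: rx.match(line), number=100_000)
    print(f"{name}: {seconds:.3f}s for 100,000 matches")
```

With `.*`, the engine first consumes the whole line and then gives characters back one by one until the rest of the pattern fits. With `[^ ]+`, each group stops at the first space without backtracking.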
+
+### Performance of systemd journals
+
+The ingestion pipeline of logs, from `tail` to `systemd-journald` or `systemd-journal-remote` is very efficient in all aspects. CPU utilization is better than any other system we tested and RAM usage is independent of the number of fields indexed, making systemd-journal one of the most efficient log management engines for ingesting high volumes of structured logs.
+
+High field cardinality does not have a noticeable impact on systemd-journal. The number of fields indexed and the number of unique values per field have a linear and predictable effect on the resource utilization of `systemd-journald` and `systemd-journal-remote`. This is unlike other log management solutions, like Loki, whose RAM requirements grow exponentially as cardinality increases, making it impractical for them to index the amount of information systemd journals can index.
+
+However, the number of fields added to journals influences the overall disk footprint. Fewer fields means more log entries per journal file, a smaller overall disk footprint and faster queries.
+
+systemd-journal files are primarily designed for security and reliability. This comes at the cost of disk footprint. The internal structure of journal files is such that, in case of corruption, data loss is kept to a minimum. To achieve this, certain data within the files needs to be aligned at predefined boundaries, so that the non-corrupted parts of a journal file can still be recovered.
+
+Although systemd-journald employs several techniques to optimize disk footprint, such as deduplication of log entries, shared indexes for fields and their values, and compression of long log entries, the disk footprint of journal files is generally 10x larger than that of other monitoring solutions, like Loki.
+
+This can be improved by storing journal files in a compressed filesystem. In our tests, a compressed filesystem can save up to 75% of the space required by journal files. The journal files will still be bigger than the overall disk footprint of other solutions, but the flexibility (index any number of fields), reliability (minimal potential data loss) and security (tampering protection and sealing) features of systemd-journal justify the difference.
+
+When you centralize logs to a remote system using versions of systemd prior to 254, `systemd-journal-remote` creates very small files (32MB). This increases the duplication of information across the files and, with it, the overall disk footprint. systemd versions 254+ added options to `systemd-journal-remote` to control the maximum size per file, which can significantly reduce this duplication.
+
+Another limitation of the `systemd-journald` ecosystem is the uncompressed transmission of logs across systems. `systemd-journal-remote`, up to version 254 that we tested, accepts encrypted but uncompressed data. This means that when centralizing logs to a logs server, the bandwidth required will be higher than with other log management solutions.
+
+## Security Considerations
+
+`log2journal` and `systemd-cat-native` are used to convert log files to structured logs in the systemd-journald ecosystem.
+
+Systemd-journal is a log management solution designed primarily for security and reliability. When configured properly, it can reliably and securely store your logs, ensuring they will be available and unchanged for as long as you need them.
+
+When sending logs to a remote system, `systemd-cat-native` can be configured the same way `systemd-journal-upload` is configured, using HTTPS and private keys to encrypt and secure their transmission over the network.
+
+When dealing with sensitive logs, organizations usually follow 2 strategies:
+
+1. Anonymize the logs before storing them, so that the stored logs do not have any sensitive information.
+2. Store the logs in full, including sensitive information, and carefully control who has access to them and how.
+
+Netdata can help in both cases.
+
+If you want to anonymize the logs before storing them, use rewriting rules at the `log2journal` phase to remove sensitive information from them. This process usually means matching the sensitive part and replacing it with `XXX`, `CUSTOMER_ID` or `CREDIT_CARD_NUMBER`, so that the resulting log entries stored in journal files will not include any such sensitive information.
+
+If, on the other hand, your organization prefers to keep the full logs and control who has access to them and how, use Netdata Cloud to assign roles to your team members and control which roles can access the journal logs in your environment.
+
+## `log2journal` options
+
+```
+
+Netdata log2journal v1.43.0-341-gdac4df856
+
+Convert logs to systemd Journal Export Format.
+
+ - JSON logs: extracts all JSON fields.
+ - logfmt logs: extracts all logfmt fields.
+ - free-form logs: uses PCRE2 patterns to extracts fields.
+
+Usage: ./log2journal [OPTIONS] PATTERN|json
+
+Options:
+
+ --file /path/to/file.yaml or -f /path/to/file.yaml
+ Read yaml configuration file for instructions.
+
+ --config CONFIG_NAME or -c CONFIG_NAME
+ Run with the internal YAML configuration named CONFIG_NAME.
+ Available internal YAML configs:
+
+ nginx-combined nginx-json default
+
+--------------------------------------------------------------------------------
+ INPUT PROCESSING
+
+ PATTERN
+ PATTERN should be a valid PCRE2 regular expression.
+ RE2 regular expressions (like the ones usually used in Go applications),
+ are usually valid PCRE2 patterns too.
+ Sub-expressions without named groups are evaluated, but their matches are
+ not added to the output.
+
+ - JSON mode
+ JSON mode is enabled when the pattern is set to: json
+ Field names are extracted from the JSON logs and are converted to the
+ format expected by Journal Export Format (all caps, only _ is allowed).
+
+ - logfmt mode
+ logfmt mode is enabled when the pattern is set to: logfmt
+ Field names are extracted from the logfmt logs and are converted to the
+ format expected by Journal Export Format (all caps, only _ is allowed).
+
+ All keys extracted from the input, are transliterated to match Journal
+ semantics (capital A-Z, digits 0-9, underscore).
+
+ In a YAML file:
+ ```yaml
+ pattern: 'PCRE2 pattern | json | logfmt'
+ ```
+
+--------------------------------------------------------------------------------
+ GLOBALS
+
+ --prefix PREFIX
+ Prefix all fields with PREFIX. The PREFIX is added before any other
+ processing, so that the extracted keys have to be matched with the PREFIX in
+ them. PREFIX is NOT transliterated and it is assumed to be systemd-journal
+ friendly.
+
+ In a YAML file:
+ ```yaml
+ prefix: 'PREFIX_' # prepend all keys with this prefix.
+ ```
+
+ --filename-key KEY
+ Add a field with KEY as the key and the current filename as value.
+ Automatically detects filenames when piped after 'tail -F',
+ and tail matches multiple filenames.
+ To inject the filename when tailing a single file, use --inject.
+
+ In a YAML file:
+ ```yaml
+ filename:
+ key: KEY
+ ```
+
+--------------------------------------------------------------------------------
+ RENAMING OF KEYS
+
+ --rename NEW=OLD
+ Rename fields. OLD has been transliterated and PREFIX has been added.
+ NEW is assumed to be systemd journal friendly.
+
+ Up to 512 renaming rules are allowed.
+
+ In a YAML file:
+ ```yaml
+ rename:
+ - new_key: KEY1
+ old_key: KEY2 # transliterated with PREFIX added
+ - new_key: KEY3
+ old_key: KEY4 # transliterated with PREFIX added
+ # add as many as required
+ ```
+
+--------------------------------------------------------------------------------
+ INJECTING NEW KEYS
+
+ --inject KEY=VALUE
+ Inject constant fields to the output (both matched and unmatched logs).
+ --inject entries are added to unmatched lines too, when their key is
+ not used in --inject-unmatched (--inject-unmatched override --inject).
+ VALUE can use variable like ${OTHER_KEY} to be replaced with the values
+ of other keys available.
+
+ Up to 512 fields can be injected.
+
+ In a YAML file:
+ ```yaml
+ inject:
+ - key: KEY1
+ value: 'VALUE1'
+ - key: KEY2
+ value: '${KEY3}${KEY4}' # gets the values of KEY3 and KEY4
+ # add as many as required
+ ```
+
+--------------------------------------------------------------------------------
+ REWRITING KEY VALUES
+
+ --rewrite KEY=/MATCH/REPLACE[/OPTIONS]
+ Apply a rewrite rule to the values of a specific key.
+ The first character after KEY= is the separator, which should also
+ be used between the MATCH, REPLACE and OPTIONS.
+
+ OPTIONS can be a comma separated list of `non-empty`, `dont-stop` and
+ `inject`.
+
+ When `non-empty` is given, MATCH is expected to be a variable
+ substitution using `${KEY1}${KEY2}`. Once the substitution is completed
+ the rule is matching the KEY only if the result is not empty.
+ When `non-empty` is not set, the MATCH string is expected to be a PCRE2
+ regular expression to be checked against the KEY value. This PCRE2
+ pattern may include named groups to extract parts of the KEY's value.
+
+ REPLACE supports variable substitution like `${variable}` against MATCH
+ named groups (when MATCH is a PCRE2 pattern) and `${KEY}` against the
+ keys defined so far.
+
+ Example:
+ --rewrite DATE=/^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/
+ ${day}/${month}/${year}
+ The above will rewrite dates in the format YYYY-MM-DD to DD/MM/YYYY.
+
+ Only one rewrite rule is applied per key; the sequence of rewrites for a
+ given key, stops once a rule matches it. This allows providing a sequence
+ of independent rewriting rules for the same key, matching the different
+ values the key may get, and also provide a catch-all rewrite rule at the
+ end, for setting the key value if no other rule matched it. The rewrite
+ rule can allow processing more rewrite rules when OPTIONS includes
+ the keyword 'dont-stop'.
+
+ Up to 512 rewriting rules are allowed.
+
+ In a YAML file:
+ ```yaml
+ rewrite:
+ # the order of these rules is important - processed top to bottom
+ - key: KEY1
+ match: 'PCRE2 PATTERN WITH NAMED GROUPS'
+ value: 'all match fields and input keys as ${VARIABLE}'
+ inject: BOOLEAN # yes = inject the field, don't just rewrite it
+ stop: BOOLEAN # no = continue processing, don't stop if matched
+ - key: KEY2
+ non_empty: '${KEY3}${KEY4}' # match only if this evaluates to non empty
+ value: 'all input keys as ${VARIABLE}'
+ inject: BOOLEAN # yes = inject the field, don't just rewrite it
+ stop: BOOLEAN # no = continue processing, don't stop if matched
+ # add as many rewrites as required
+ ```
+
+ By default rewrite rules are applied only on fields already defined.
+ This allows shipping YAML files that include more rewrites than are
+ required for a specific input file.
+ Rewrite rules however allow injecting new fields when OPTIONS include
+ the keyword `inject` or in YAML `inject: yes` is given.
+
+ MATCH on the command line can be empty to define an unconditional rule.
+ Similarly, `match` and `non_empty` can be omitted in the YAML file.
+--------------------------------------------------------------------------------
+ UNMATCHED LINES
+
+ --unmatched-key KEY
+ Include unmatched log entries in the output stream, with KEY as the
+ field name.
+ Usually it should be set to --unmatched-key=MESSAGE so that the
+ unmatched entry will appear as the log message in the journals.
+ Use --inject-unmatched to inject additional fields into unmatched lines.
+
+ In a YAML file:
+ ```yaml
+ unmatched:
+ key: MESSAGE # inject the error log as MESSAGE
+ ```
+
+ --inject-unmatched LINE
+ Inject lines into the output for each unmatched log entry.
+ Usually, --inject-unmatched=PRIORITY=3 is needed to mark the unmatched
+ lines as errors, so that they can easily be spotted in the journals.
+
+ Up to 512 such lines can be injected.
+
+ In a YAML file:
+ ```yaml
+ unmatched:
+ key: MESSAGE # inject the error log as MESSAGE
+ inject:
+ - key: KEY1
+ value: 'VALUE1'
+ # add as many constants as required
+ ```
+
+--------------------------------------------------------------------------------
+ FILTERING
+
+ --include PATTERN
+ Include only keys matching the PCRE2 PATTERN.
+ Useful when parsing JSON or logfmt logs, to include only the keys given.
+ The keys are matched after the PREFIX has been added to them.
+
+ --exclude PATTERN
+ Exclude the keys matching the PCRE2 PATTERN.
+ Useful when parsing JSON or logfmt logs, to exclude some of the keys given.
+ The keys are matched after the PREFIX has been added to them.
+
+ When both include and exclude patterns are set and both match a key,
+ exclude wins and the key will not be added. This works like a pipeline:
+ the key is first included and then excluded.
+
+ In a YAML file:
+ ```yaml
+ filter:
+ include: 'PCRE2 PATTERN MATCHING KEY NAMES TO INCLUDE'
+ exclude: 'PCRE2 PATTERN MATCHING KEY NAMES TO EXCLUDE'
+ ```
+
+--------------------------------------------------------------------------------
+ OTHER
+
+ -h, or --help
+ Display this help and exit.
+
+ --show-config
+ Show the configuration in YAML format before starting the job.
+ This is also an easy way to convert command line parameters to YAML.
+
+The program accepts all parameters as both --option=value and --option value.
+
+The maximum log line length accepted is 1048576 characters.
+
+PIPELINE AND SEQUENCE OF PROCESSING
+
+This is a simple diagram of the pipeline taking place:
+
+ +---------------------------------------------------+
+ | INPUT |
+ | read one log line at a time |
+ +---------------------------------------------------+
+ v v v v v v
+ +---------------------------------------------------+
+ | EXTRACT FIELDS AND VALUES |
+ | JSON, logfmt, or pattern based |
+ | (apply optional PREFIX - all keys use capitals) |
+ +---------------------------------------------------+
+ v v v v v v
+ +---------------------------------------------------+
+ | RENAME FIELDS |
+ | change the names of the fields |
+ +---------------------------------------------------+
+ v v v v v v
+ +---------------------------------------------------+
+ | INJECT NEW FIELDS |
+ | constants, or other field values as variables |
+ +---------------------------------------------------+
+ v v v v v v
+ +---------------------------------------------------+
+ | REWRITE FIELD VALUES |
+ | pipeline multiple rewriting rules to alter |
+ | the values of the fields |
+ +---------------------------------------------------+
+ v v v v v v
+ +---------------------------------------------------+
+ | FILTER FIELDS |
+ | use include and exclude patterns on the field |
+ | names, to select which fields are sent to journal |
+ +---------------------------------------------------+
+ v v v v v v
+ +---------------------------------------------------+
+ | OUTPUT |
+ | generate Journal Export Format |
+ +---------------------------------------------------+
+
+--------------------------------------------------------------------------------
+JOURNAL FIELDS RULES (enforced by systemd-journald)
+
+ - field names can be up to 64 characters
+ - the only allowed field characters are A-Z, 0-9 and underscore
+ - the first character of fields cannot be a digit
+ - protected journal fields start with underscore:
+ * they are accepted by systemd-journal-remote
+ * they are NOT accepted by a local systemd-journald
+
+ For best results, always include these fields:
+
+ MESSAGE=TEXT
+ The MESSAGE is the body of the log entry.
+ This field is what we usually see in our logs.
+
+ PRIORITY=NUMBER
+ PRIORITY sets the severity of the log entry.
+ 0=emerg, 1=alert, 2=crit, 3=err, 4=warn, 5=notice, 6=info, 7=debug
+ - Emergency events (0) are usually broadcast to all terminals.
+ - Emergency, alert, critical, and error (0-3) are usually colored red.
+ - Warning (4) entries are usually colored yellow.
+ - Notice (5) entries are usually bold or have a brighter white color.
+ - Info (6) entries are the default.
+ - Debug (7) entries are usually grayed or dimmed.
+
+ SYSLOG_IDENTIFIER=NAME
+ SYSLOG_IDENTIFIER sets the name of the application.
+ Use something descriptive, like: SYSLOG_IDENTIFIER=nginx-logs
+
+You can find the most common fields at 'man systemd.journal-fields'.
+
+```
+
+`log2journal` supports YAML configuration files, like the ones found [in this directory](https://github.com/netdata/netdata/tree/master/collectors/log2journal/log2journal.d).
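+
+As a rough illustration of how the `rewrite`, `unmatched` and `filter` sections described in the help text above might be combined, here is a minimal sketch. It is not taken from a shipped configuration. It assumes that an earlier stage has already placed the HTTP status code in a `PRIORITY` field, and the `NGINX_` field names in the filter are hypothetical.
+
+```yaml
+# Hypothetical sketch: assumes PRIORITY already holds an HTTP status code.
+rewrite:
+  # Rules for the same key are tried top to bottom;
+  # by default, processing for the key stops at the first rule that matches.
+  - key: PRIORITY
+    match: '^[123]'   # 1xx, 2xx, 3xx responses
+    value: '6'        # info
+  - key: PRIORITY
+    match: '^4'       # 4xx responses
+    value: '5'        # notice
+  - key: PRIORITY
+    match: '^5'       # 5xx responses
+    value: '3'        # err
+  - key: PRIORITY
+    match: '.*'       # catch-all, reached only if nothing above matched
+    value: '4'        # warning
+
+unmatched:
+  key: MESSAGE        # keep lines that failed to parse as the log message
+  inject:
+    - key: PRIORITY
+      value: '3'      # mark unparsed lines as errors
+
+filter:
+  # Hypothetical field names; keys are matched after the PREFIX is applied.
+  exclude: '^NGINX_(HTTP_REFERER|HTTP_USER_AGENT)$'
+```
+
+Because a matching rule ends the sequence for its key, the catch-all rule has to come last. Running `log2journal` with `--show-config` prints the configuration it ends up with, which is one way to check that a file like this was read as intended.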
+
+## `systemd-cat-native` options
+
+Read [the manual of systemd-cat-native](https://github.com/netdata/netdata/blob/master/libnetdata/log/systemd-cat-native.md).