author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-19 02:57:58 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-19 02:57:58 +0000
commit be1c7e50e1e8809ea56f2c9d472eccd8ffd73a97
tree 9754ff1ca740f6346cf8483ec915d4054bc5da2d /collectors/log2journal
parent Initial commit.
Adding upstream version 1.44.3.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
-rw-r--r-- collectors/log2journal/Makefile.am | 17
-rw-r--r-- collectors/log2journal/README.md | 912
-rw-r--r-- collectors/log2journal/log2journal-help.c | 377
-rw-r--r-- collectors/log2journal/log2journal-inject.c | 49
-rw-r--r-- collectors/log2journal/log2journal-json.c | 630
-rw-r--r-- collectors/log2journal/log2journal-logfmt.c | 226
-rw-r--r-- collectors/log2journal/log2journal-params.c | 404
-rw-r--r-- collectors/log2journal/log2journal-pattern.c | 54
-rw-r--r-- collectors/log2journal/log2journal-pcre2.c | 139
-rw-r--r-- collectors/log2journal/log2journal-rename.c | 21
-rw-r--r-- collectors/log2journal/log2journal-replace.c | 111
-rw-r--r-- collectors/log2journal/log2journal-rewrite.c | 51
-rw-r--r-- collectors/log2journal/log2journal-yaml.c | 964
-rw-r--r-- collectors/log2journal/log2journal.c | 569
-rw-r--r-- collectors/log2journal/log2journal.d/default.yaml | 15
-rw-r--r-- collectors/log2journal/log2journal.d/nginx-combined.yaml | 91
-rw-r--r-- collectors/log2journal/log2journal.d/nginx-json.yaml | 164
-rw-r--r-- collectors/log2journal/log2journal.h | 501
-rw-r--r-- collectors/log2journal/tests.d/default.output | 20
-rw-r--r-- collectors/log2journal/tests.d/full.output | 77
-rw-r--r-- collectors/log2journal/tests.d/full.yaml | 76
-rw-r--r-- collectors/log2journal/tests.d/json-exclude.output | 153
-rw-r--r-- collectors/log2journal/tests.d/json-include.output | 54
-rw-r--r-- collectors/log2journal/tests.d/json.log | 3
-rw-r--r-- collectors/log2journal/tests.d/json.output | 294
-rw-r--r-- collectors/log2journal/tests.d/logfmt.log | 5
-rw-r--r-- collectors/log2journal/tests.d/logfmt.output | 37
-rw-r--r-- collectors/log2journal/tests.d/logfmt.yaml | 34
-rw-r--r-- collectors/log2journal/tests.d/nginx-combined.log | 14
-rw-r--r-- collectors/log2journal/tests.d/nginx-combined.output | 210
-rw-r--r-- collectors/log2journal/tests.d/nginx-json.log | 9
-rw-r--r-- collectors/log2journal/tests.d/nginx-json.output | 296
-rwxr-xr-x collectors/log2journal/tests.sh | 148
33 files changed, 6725 insertions, 0 deletions
diff --git a/collectors/log2journal/Makefile.am b/collectors/log2journal/Makefile.am
new file mode 100644
index 00000000..b13d2160
--- /dev/null
+++ b/collectors/log2journal/Makefile.am
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-3.0-or-later
+
+AUTOMAKE_OPTIONS = subdir-objects
+MAINTAINERCLEANFILES = $(srcdir)/Makefile.in
+
+dist_noinst_DATA = \
+ tests.sh \
+ README.md \
+ tests.d/* \
+ $(NULL)
+
+log2journalconfigdir=$(libconfigdir)/log2journal.d
+dist_log2journalconfig_DATA = \
+ log2journal.d/nginx-combined.yaml \
+ log2journal.d/nginx-json.yaml \
+ log2journal.d/default.yaml \
+ $(NULL)
diff --git a/collectors/log2journal/README.md b/collectors/log2journal/README.md
new file mode 100644
index 00000000..16ccc033
--- /dev/null
+++ b/collectors/log2journal/README.md
@@ -0,0 +1,912 @@
+
+# log2journal
+
+`log2journal` and `systemd-cat-native` can be used to convert a structured log file, such as the ones generated by web servers, into `systemd-journal` entries.
+
+By combining these tools you can create advanced log processing pipelines sending any kind of structured text logs to systemd-journald. This is a simple, but powerful and efficient way to handle log processing.
+
+The process involves the usual piping of shell commands, to get and process the log files in realtime.
+
+The result looks like this (nginx logs in systemd-journal):
+
+![image](https://github.com/netdata/netdata/assets/2662304/16b471ff-c5a1-4fcc-bcd5-83551e089f6c)
+
+
+The overall process looks like this:
+
+```bash
+tail -F /var/log/nginx/*.log |       # outputs log lines
+  log2journal 'PATTERN' |            # outputs Journal Export Format
+  systemd-cat-native                 # send to local/remote journald
+```
+
+These are the steps:
+
+1. `tail -F /var/log/nginx/*.log`<br/>this command will tail all `*.log` files in `/var/log/nginx/`. We use `-F` instead of `-f` to ensure that files will still be tailed after log rotation.
+2. `log2journal` is a Netdata program. It reads log entries and extracts fields according to the PCRE2 pattern it is given. It can also apply some basic operations on the fields, like injecting new fields, duplicating existing ones, or rewriting their values. The output of `log2journal` is in Systemd Journal Export Format, and it looks like this:
+ ```bash
+ KEY1=VALUE1 # << start of the first log line
+ KEY2=VALUE2
+ # << log lines separator
+ KEY1=VALUE1 # << start of the second log line
+ KEY2=VALUE2
+ ```
+3. `systemd-cat-native` is a Netdata program. It can send the logs to a local `systemd-journald` (journal namespaces supported), or to a remote `systemd-journal-remote`.
+
+
+## Processing pipeline
+
+The sequence of processing in Netdata's `log2journal` is designed to methodically transform and prepare log data for export in the systemd Journal Export Format. This transformation occurs through a pipeline of stages, each with a specific role in processing the log entries. Here's a description of each stage in the sequence:
+
+1. **Input**<br/>
+ The tool reads one log line at a time from the input source. It supports different input formats such as JSON, logfmt, and free-form logs defined by PCRE2 patterns.
+
+2. **Extract Fields and Values**<br/>
+ Based on the input format (JSON, logfmt, or custom pattern), it extracts fields and their values from each log line. In the case of JSON and logfmt, it automatically extracts all fields. For custom patterns, it uses PCRE2 regular expressions, and fields are extracted based on sub-expressions defined in the pattern.
+
+3. **Transliteration**<br/>
+ Extracted fields are transliterated to the limited character set accepted by systemd-journal: capitals A-Z, digits 0-9, underscores.
+
+4. **Apply Optional Prefix**<br/>
+ If a prefix is specified, it is added to all keys. This happens before any other processing so that all subsequent matches and manipulations take the prefix into account.
+
+5. **Rename Fields**<br/>
+ Renames fields as specified in the configuration. This is used to change the names of the fields to match desired or required naming conventions.
+
+6. **Inject New Fields**<br/>
+ New fields are injected into the log data. This can include constants or values derived from other fields, using variable substitution.
+
+7. **Rewrite Field Values**<br/>
+ Applies rewriting rules to alter the values of the fields. This can involve complex transformations, including regular expressions and variable substitutions. The rewrite rules can also inject new fields into the data.
+
+8. **Filter Fields**<br/>
+ Fields are filtered based on include and exclude patterns. This stage selects which fields are to be sent to the journal, allowing for selective logging.
+
+9. **Output**<br/>
+ Finally, the processed log data is output in the Journal Export Format. This format is compatible with systemd's journaling system and can be sent to local or remote systemd journal systems, by piping the output of `log2journal` to `systemd-cat-native`.
+
+This pipeline ensures a flexible and comprehensive approach to log processing, allowing for a wide range of modifications and customizations to fit various logging requirements. Each stage builds upon the previous one, enabling complex log transformations and enrichments before the data is exported to the systemd journal.
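+
+As a rough illustration of how these stages map to a configuration file, here is a minimal sketch for logfmt input. The field name `msg`, the `MYAPP_` prefix and the `myapp` identifier are hypothetical; only the `pattern`, `prefix`, `rename` and `inject` keys come from this document:
+
+```yaml
+# Stages 1-2: read logfmt lines and extract every field.
+pattern: logfmt
+
+# Stages 3-4: keys are transliterated (msg -> MSG), then prefixed (MYAPP_MSG).
+prefix: 'MYAPP_'
+
+# Stage 5: rename rules reference the transliterated, prefixed key.
+rename:
+  - new_key: MESSAGE
+    old_key: MYAPP_MSG
+
+# Stage 6: inject a constant field into every entry.
+inject:
+  - key: SYSLOG_IDENTIFIER
+    value: 'myapp'
+```
+
+Because the prefix is applied before renaming, `old_key` has to be `MYAPP_MSG` and not `msg`.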
+
+## Real-life example
+
+We have an nginx server logging in this standard combined log format:
+
+```bash
+ log_format combined '$remote_addr - $remote_user [$time_local] '
+ '"$request" $status $body_bytes_sent '
+ '"$http_referer" "$http_user_agent"';
+```
+
+### Extracting fields with a pattern
+
+First, let's find the right pattern for `log2journal`. We ask ChatGPT:
+
+```
+My nginx log uses this log format:
+
+log_format access '$remote_addr - $remote_user [$time_local] '
+ '"$request" $status $body_bytes_sent '
+ '"$http_referer" "$http_user_agent"';
+
+I want to use `log2journal` to convert this log for systemd-journal.
+`log2journal` accepts a PCRE2 regular expression, using the named groups
+in the pattern as the journal fields to extract from the logs.
+
+Please give me the PCRE2 pattern to extract all the fields from my nginx
+log files.
+```
+
+ChatGPT replies with this:
+
+```regexp
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+```
+
+Let's see what the above says:
+
+1. `(?x)`: enable PCRE2 extended mode. In this mode spaces and newlines in the pattern are ignored. To match a space you have to use `\s`. This mode allows us to split the pattern into multiple lines and add comments to it.
+2. `^`: match the beginning of the line
+3. `(?<remote_addr>[^ ]+)`: match anything up to the first space (`[^ ]+`), and name it `remote_addr`.
+4. `\s`: match a space
+5. `-`: match a hyphen
+6. and so on...
+
+We edit `nginx.yaml` and add it, like this:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+```
+
+Let's test it with a sample line (instead of `tail`):
+
+```bash
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
+BODY_BYTES_SENT=4172
+HTTP_REFERER=-
+HTTP_USER_AGENT=Go-http-client/1.1
+REMOTE_ADDR=1.2.3.4
+REMOTE_USER=-
+REQUEST=GET /index.html HTTP/1.1
+REQUEST_METHOD=GET
+REQUEST_URI=/index.html
+SERVER_PROTOCOL=HTTP/1.1
+STATUS=200
+TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+
+```
+
+As you can see, it extracted all the fields and converted their names to uppercase, as systemd-journal expects them.
+
+### Prefixing field names
+
+To make sure the fields are unique for nginx and do not interfere with other applications, we should prefix them with `NGINX_`:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+
+prefix: 'NGINX_' # <<< we added this
+```
+
+And let's try it:
+
+```bash
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
+NGINX_BODY_BYTES_SENT=4172
+NGINX_HTTP_REFERER=-
+NGINX_HTTP_USER_AGENT=Go-http-client/1.1
+NGINX_REMOTE_ADDR=1.2.3.4
+NGINX_REMOTE_USER=-
+NGINX_REQUEST=GET /index.html HTTP/1.1
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/index.html
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+
+```
+
+### Renaming fields
+
+Now, all fields start with `NGINX_` but we want `NGINX_REQUEST` to be the `MESSAGE` of the log line, as we will see it by default in `journalctl` and the Netdata dashboard. Let's rename it:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+
+prefix: 'NGINX_'
+
+rename:                                # <<< we added this
+  - new_key: MESSAGE                   # <<< we added this
+    old_key: NGINX_REQUEST             # <<< we added this
+```
+
+Let's test it:
+
+```bash
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
+MESSAGE=GET /index.html HTTP/1.1 # <<< renamed !
+NGINX_BODY_BYTES_SENT=4172
+NGINX_HTTP_REFERER=-
+NGINX_HTTP_USER_AGENT=Go-http-client/1.1
+NGINX_REMOTE_ADDR=1.2.3.4
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/index.html
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+
+```
+
+### Injecting new fields
+
+To have a complete message in journals we need 3 fields: `MESSAGE`, `PRIORITY` and `SYSLOG_IDENTIFIER`. We have already added `MESSAGE` by renaming `NGINX_REQUEST`. We can also inject a `SYSLOG_IDENTIFIER` and `PRIORITY`.
+
+Ideally, we would want the 5xx errors to be red in our `journalctl` output and the dashboard. To achieve that we need to set the `PRIORITY` field to the right log level. Log priorities are numeric and follow the `syslog` priorities. Checking `/usr/include/sys/syslog.h` we can see these:
+
+```c
+#define LOG_EMERG 0 /* system is unusable */
+#define LOG_ALERT 1 /* action must be taken immediately */
+#define LOG_CRIT 2 /* critical conditions */
+#define LOG_ERR 3 /* error conditions */
+#define LOG_WARNING 4 /* warning conditions */
+#define LOG_NOTICE 5 /* normal but significant condition */
+#define LOG_INFO 6 /* informational */
+#define LOG_DEBUG 7 /* debug-level messages */
+```
+
+Avoid setting the priority to 0 (`LOG_EMERG`), because such messages are broadcast to your terminals (the journal uses `wall` to let you know of such events). A good priority for errors is 3 (red) or 4 (yellow).
+
+To set the `PRIORITY` field in the output, we can use `NGINX_STATUS`. We will do this in 2 steps: a) inject the priority field as a copy of `NGINX_STATUS`, and then b) use a pattern on its value to rewrite it to the priority level we want.
+
+First, let's inject `SYSLOG_IDENTIFIER` and `PRIORITY`:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+
+prefix: 'NGINX_'
+
+rename:
+  - new_key: MESSAGE
+    old_key: NGINX_REQUEST
+
+inject:                                # <<< we added this
+  - key: PRIORITY                      # <<< we added this
+    value: '${NGINX_STATUS}'           # <<< we added this
+
+  - key: SYSLOG_IDENTIFIER             # <<< we added this
+    value: 'nginx-log'                 # <<< we added this
+
+Let's see what this does:
+
+```bash
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
+MESSAGE=GET /index.html HTTP/1.1
+NGINX_BODY_BYTES_SENT=4172
+NGINX_HTTP_REFERER=-
+NGINX_HTTP_USER_AGENT=Go-http-client/1.1
+NGINX_REMOTE_ADDR=1.2.3.4
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/index.html
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+PRIORITY=200 # <<< PRIORITY added
+SYSLOG_IDENTIFIER=nginx-log # <<< SYSLOG_IDENTIFIER added
+
+```
+
+### Rewriting field values
+
+Now we need to rewrite `PRIORITY` to the right syslog level based on its value (`NGINX_STATUS`). We will assign the priority 6 (info) when the status is 1xx, 2xx, 3xx, priority 5 (notice) when status is 4xx, priority 3 (error) when status is 5xx and anything else will go to priority 4 (warning). Let's do it:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+
+prefix: 'NGINX_'
+
+rename:
+  - new_key: MESSAGE
+    old_key: NGINX_REQUEST
+
+inject:
+  - key: PRIORITY
+    value: '${NGINX_STATUS}'
+
+  - key: SYSLOG_IDENTIFIER
+    value: 'nginx-log'
+
+rewrite:                               # <<< we added this
+  - key: PRIORITY                      # <<< we added this
+    match: '^[123]'                    # <<< we added this
+    value: 6                           # <<< we added this
+
+  - key: PRIORITY                      # <<< we added this
+    match: '^4'                        # <<< we added this
+    value: 5                           # <<< we added this
+
+  - key: PRIORITY                      # <<< we added this
+    match: '^5'                        # <<< we added this
+    value: 3                           # <<< we added this
+
+  - key: PRIORITY                      # <<< we added this
+    match: '.*'                        # <<< we added this
+    value: 4                           # <<< we added this
+
+Rewrite rules are processed in order. By default, the first rule that matches a field stops further rewrite processing for that field. This is why the last rule, which matches everything, changes the priority to 4 only when none of the earlier rules matched.
+
+Let's test it:
+
+```bash
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
+MESSAGE=GET /index.html HTTP/1.1
+NGINX_BODY_BYTES_SENT=4172
+NGINX_HTTP_REFERER=-
+NGINX_HTTP_USER_AGENT=Go-http-client/1.1
+NGINX_REMOTE_ADDR=1.2.3.4
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/index.html
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+PRIORITY=6 # <<< PRIORITY rewritten here
+SYSLOG_IDENTIFIER=nginx-log
+
+```
+
+Rewrite rules are powerful. They can contain named groups, like the main pattern, to extract sub-fields from a value, which you can then use in variable substitution. You can use rewrite rules to anonymize URLs, e.g. to remove customer IDs or transaction details from them.
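+
+For example, a rule along these lines could mask a numeric customer ID in the request URI. This is only a sketch: the `/customers/<id>` URL layout and the group names `before` and `after` are assumptions made for illustration.
+
+```yaml
+rewrite:
+  - key: NGINX_REQUEST_URI
+    # keep what surrounds the ID in named groups, drop the ID itself
+    match: '^(?<before>/customers/)\d+(?<after>.*)$'
+    value: '${before}CUSTOMER_ID${after}'
+```
+
+With such a rule, a URI like `/customers/12345/orders` should end up as `/customers/CUSTOMER_ID/orders`, while URIs that do not match the pattern should be left as they are.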
+
+### Sending logs to systemd-journal
+
+Now the message is ready to be sent to a systemd-journal. For this we use `systemd-cat-native`. This command can send such messages to a journal running on the localhost, a local journal namespace, or a `systemd-journal-remote` running on another server. By just appending `| systemd-cat-native` to the command, the message will be sent to the local journal.
+
+
+```bash
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml | systemd-cat-native
+# no output
+
+# let's find the message
+# journalctl -r -o verbose SYSLOG_IDENTIFIER=nginx-log
+Wed 2023-12-06 13:23:07.083299 EET [s=5290f0133f25407aaa1e2c451c0e4756;i=57194;b=0dfa96ecc2094cecaa8ec0efcb93b865;m=b133308867;t=60bd59346a289;x=5c1bdacf2b9c4bbd]
+ PRIORITY=6
+ _UID=0
+ _GID=0
+ _CAP_EFFECTIVE=1ffffffffff
+ _SELINUX_CONTEXT=unconfined
+ _BOOT_ID=0dfa96ecc2094cecaa8ec0efcb93b865
+ _MACHINE_ID=355c8eca894d462bbe4c9422caf7a8bb
+ _HOSTNAME=lab-logtest-src
+ _RUNTIME_SCOPE=system
+ _TRANSPORT=journal
+ MESSAGE=GET /index.html HTTP/1.1
+ NGINX_BODY_BYTES_SENT=4172
+ NGINX_HTTP_REFERER=-
+ NGINX_HTTP_USER_AGENT=Go-http-client/1.1
+ NGINX_REMOTE_ADDR=1.2.3.4
+ NGINX_REMOTE_USER=-
+ NGINX_REQUEST_METHOD=GET
+ NGINX_REQUEST_URI=/index.html
+ NGINX_SERVER_PROTOCOL=HTTP/1.1
+ NGINX_STATUS=200
+ NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+ SYSLOG_IDENTIFIER=nginx-log
+ _PID=114343
+ _COMM=systemd-cat-nat
+ _AUDIT_SESSION=253
+ _AUDIT_LOGINUID=1000
+ _SYSTEMD_CGROUP=/user.slice/user-1000.slice/session-253.scope
+ _SYSTEMD_SESSION=253
+ _SYSTEMD_OWNER_UID=1000
+ _SYSTEMD_UNIT=session-253.scope
+ _SYSTEMD_SLICE=user-1000.slice
+ _SYSTEMD_USER_SLICE=-.slice
+ _SYSTEMD_INVOCATION_ID=c59e33ead8c24880b027e317b89f9f76
+ _SOURCE_REALTIME_TIMESTAMP=1701861787083299
+
+```
+
+So, the log line, with all its fields parsed, ended up in systemd-journal. Now we can send all the nginx logs to systemd-journal like this:
+
+```bash
+tail -F /var/log/nginx/access.log |\
+ log2journal -f nginx.yaml |\
+ systemd-cat-native
+```
+
+## Best practices
+
+**Create a systemd service unit**: Add the above commands to a systemd unit file. When you run it in a systemd unit file you will be able to start/stop it and also see its status. Furthermore you can use the `LogNamespace=` directive of systemd service units to isolate your nginx logs from the logs of the rest of the system. Here is how to do it:
+
+Create the file `/etc/systemd/system/nginx-logs.service` (change `/path/to/nginx.yaml` to the right path):
+
+```
+[Unit]
+Description=NGINX Log to Systemd Journal
+After=network.target
+
+[Service]
+ExecStart=/bin/sh -c 'tail -F /var/log/nginx/access.log | log2journal -f /path/to/nginx.yaml | systemd-cat-native'
+LogNamespace=nginx-logs
+Restart=always
+RestartSec=3
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Reload systemd to grab this file:
+
+```bash
+sudo systemctl daemon-reload
+```
+
+Enable and start the service:
+
+```bash
+sudo systemctl enable nginx-logs.service
+sudo systemctl start nginx-logs.service
+```
+
+To see the logs of the namespace, use:
+
+```bash
+journalctl -f --namespace=nginx-logs
+```
+
+Netdata will automatically pick up the new namespace and present it in the list of sources on the dashboard.
+
+You can also instruct `systemd-cat-native` to log to a remote system, sending the logs to a `systemd-journal-remote` instance running on another server. Check [the manual of systemd-cat-native](https://github.com/netdata/netdata/blob/master/libnetdata/log/systemd-cat-native.md).
+
+
+## Performance
+
+`log2journal` and `systemd-cat-native` have been designed to process hundreds of thousands of log lines per second. They both utilize high performance indexing hashtables to speed up lookups, and queues that dynamically adapt to the number of log lines offered, offering a smooth and fast experience under all conditions.
+
+In our tests, the combined CPU utilization of `log2journal` and `systemd-cat-native` versus `promtail` with a similar configuration was 1 to 5. In other words, `log2journal` and `systemd-cat-native` combined were about 5 times more CPU-efficient than `promtail`.
+
+### PCRE2 patterns
+
+The key factor that influences the performance of a log processing pipeline built with these tools is the quality of the PCRE2 patterns used. Poorly written PCRE2 patterns can make processing significantly slower or more CPU-intensive.
+
+The pattern `.*` in particular seems to have the biggest impact on CPU consumption, especially when it appears multiple times in the same pattern.
+
+Usually we use `.*` to indicate that we need to match everything up to a character, e.g. `.* ` to match up to a space. By replacing it with `[^ ]+` (meaning: match at least a character up to a space), the regular expression engine can be a lot more efficient, reducing the overall CPU utilization significantly.
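+
+A sketch of the difference, for a hypothetical input with three space-separated tokens per line (the group names are illustrative only):
+
+```yaml
+# Slower alternative, shown commented out: every `.*` first consumes the
+# rest of the line and then has to backtrack to find the next space.
+# pattern: '^(?<client>.*) (?<method>.*) (?<uri>.*)$'
+
+# Usually cheaper: every group stops at the first space it meets.
+pattern: '^(?<client>[^ ]+) (?<method>[^ ]+) (?<uri>[^ ]+)$'
+```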
+
+### Performance of systemd journals
+
+The ingestion pipeline of logs, from `tail` to `systemd-journald` or `systemd-journal-remote` is very efficient in all aspects. CPU utilization is better than any other system we tested and RAM usage is independent of the number of fields indexed, making systemd-journal one of the most efficient log management engines for ingesting high volumes of structured logs.
+
+High field cardinality does not have a noticeable impact on systemd-journal. The number of fields indexed and the number of unique values per field have a linear and predictable effect on the resource utilization of `systemd-journald` and `systemd-journal-remote`. This is unlike other log management solutions, like Loki, whose RAM requirements grow exponentially as the cardinality increases, making it impractical for them to index the amount of information systemd journals can index.
+
+However, the number of fields added to journals influences the overall disk footprint. Fewer fields mean more log entries per journal file, a smaller overall disk footprint and faster queries.
+
+systemd-journal files are primarily designed for security and reliability. This comes at the cost of disk footprint. The internal structure of journal files is such that, in case of corruption, data loss is kept to a minimum. To achieve this, certain data within the files needs to be aligned at predefined boundaries, so that the non-corrupted parts of a journal file can still be recovered.
+
+Although systemd-journald employs several techniques to optimize disk footprint, like deduplication of log entries, shared indexes for fields and their values, and compression of long log entries, the disk footprint of journal files is generally 10x larger than that of other solutions, like Loki.
+
+This can be improved by storing journal files in a compressed filesystem. In our tests, a compressed filesystem can save up to 75% of the space required by journal files. The journal files will still be bigger than the overall disk footprint of other solutions, but the flexibility (index any number of fields), reliability (minimal potential data loss) and security (tampering protection and sealing) features of systemd-journal justify the difference.
+
+If you are centralizing logs to a remote system with a version of systemd prior to 254, `systemd-journal-remote` creates very small files (32MB). This increases the duplication of information across the files and, with it, the overall disk footprint. systemd versions 254+ added options to `systemd-journal-remote` to control the maximum size per file, which can significantly reduce this duplication.
+
+Another limitation of the `systemd-journald` ecosystem is the uncompressed transmission of logs across systems. `systemd-journal-remote`, up to version 254 that we tested, accepts encrypted but uncompressed data. This means that when centralizing logs to a logs server, the bandwidth required is higher than with other log management solutions.
+
+## Security Considerations
+
+`log2journal` and `systemd-cat-native` are used to convert log files to structured logs in the systemd-journald ecosystem.
+
+Systemd-journal is a log management solution designed primarily for security and reliability. When configured properly, it can reliably and securely store your logs, ensuring they will be available and unchanged for as long as you need them.
+
+When sending logs to a remote system, `systemd-cat-native` can be configured the same way `systemd-journal-upload` is configured, using HTTPS and private keys to encrypt and secure their transmission over the network.
+
+When dealing with sensitive logs, organizations usually follow 2 strategies:
+
+1. Anonymize the logs before storing them, so that the stored logs do not have any sensitive information.
+2. Store the logs in full, including sensitive information, and carefully control who has access to them and how.
+
+Netdata can help in both cases.
+
+If you want to anonymize the logs before storing them, use rewriting rules at the `log2journal` phase to remove sensitive information from them. This usually means matching the sensitive part and replacing it with `XXX`, `CUSTOMER_ID` or `CREDIT_CARD_NUMBER`, so that the log entries stored in journal files do not include any such sensitive information.
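+
+When more than one sensitive part may appear in the same field, keep in mind that rewriting stops at the first matching rule unless `stop: no` is set. The following sketch chains two rules on the same key. The query parameter names `customer_id` and `card` and the group names `head` and `tail` are hypothetical, and the sketch assumes that the second rule sees the value produced by the first one:
+
+```yaml
+rewrite:
+  # mask the customer ID, then keep processing rules for this key
+  - key: NGINX_REQUEST_URI
+    match: '^(?<head>.*[?&]customer_id=)[^&]*(?<tail>.*)$'
+    value: '${head}CUSTOMER_ID${tail}'
+    stop: no
+
+  # mask the card number; processing for this key stops here on a match
+  - key: NGINX_REQUEST_URI
+    match: '^(?<head>.*[?&]card=)[^&]*(?<tail>.*)$'
+    value: '${head}CREDIT_CARD_NUMBER${tail}'
+```
+
+It is worth running a few sample lines through `log2journal -f` with such a file before sending anything to the journal, to confirm that nothing sensitive survives.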
+
+If, on the other hand, your organization prefers to keep the full logs and control who has access to them and how, use Netdata Cloud to assign roles to your team members and control which roles can access the journal logs in your environment.
+
+## `log2journal` options
+
+```
+
+Netdata log2journal v1.43.0-341-gdac4df856
+
+Convert logs to systemd Journal Export Format.
+
+ - JSON logs: extracts all JSON fields.
+ - logfmt logs: extracts all logfmt fields.
+ - free-form logs: uses PCRE2 patterns to extracts fields.
+
+Usage: ./log2journal [OPTIONS] PATTERN|json
+
+Options:
+
+ --file /path/to/file.yaml or -f /path/to/file.yaml
+ Read yaml configuration file for instructions.
+
+ --config CONFIG_NAME or -c CONFIG_NAME
+ Run with the internal YAML configuration named CONFIG_NAME.
+ Available internal YAML configs:
+
+ nginx-combined nginx-json default
+
+--------------------------------------------------------------------------------
+ INPUT PROCESSING
+
+ PATTERN
+ PATTERN should be a valid PCRE2 regular expression.
+ RE2 regular expressions (like the ones usually used in Go applications),
+ are usually valid PCRE2 patterns too.
+ Sub-expressions without named groups are evaluated, but their matches are
+ not added to the output.
+
+ - JSON mode
+ JSON mode is enabled when the pattern is set to: json
+ Field names are extracted from the JSON logs and are converted to the
+ format expected by Journal Export Format (all caps, only _ is allowed).
+
+ - logfmt mode
+ logfmt mode is enabled when the pattern is set to: logfmt
+ Field names are extracted from the logfmt logs and are converted to the
+ format expected by Journal Export Format (all caps, only _ is allowed).
+
+ All keys extracted from the input, are transliterated to match Journal
+ semantics (capital A-Z, digits 0-9, underscore).
+
+ In a YAML file:
+ ```yaml
+ pattern: 'PCRE2 pattern | json | logfmt'
+ ```
+
+--------------------------------------------------------------------------------
+ GLOBALS
+
+ --prefix PREFIX
+ Prefix all fields with PREFIX. The PREFIX is added before any other
+ processing, so that the extracted keys have to be matched with the PREFIX in
+ them. PREFIX is NOT transliterated and it is assumed to be systemd-journal
+ friendly.
+
+ In a YAML file:
+ ```yaml
+ prefix: 'PREFIX_' # prepend all keys with this prefix.
+ ```
+
+ --filename-key KEY
+ Add a field with KEY as the key and the current filename as value.
+ Automatically detects filenames when piped after 'tail -F',
+ and tail matches multiple filenames.
+ To inject the filename when tailing a single file, use --inject.
+
+ In a YAML file:
+ ```yaml
+ filename:
+ key: KEY
+ ```
+
+--------------------------------------------------------------------------------
+ RENAMING OF KEYS
+
+ --rename NEW=OLD
+ Rename fields. OLD has been transliterated and PREFIX has been added.
+ NEW is assumed to be systemd journal friendly.
+
+ Up to 512 renaming rules are allowed.
+
+ In a YAML file:
+ ```yaml
+ rename:
+ - new_key: KEY1
+ old_key: KEY2 # transliterated with PREFIX added
+ - new_key: KEY3
+ old_key: KEY4 # transliterated with PREFIX added
+ # add as many as required
+ ```
+
+--------------------------------------------------------------------------------
+ INJECTING NEW KEYS
+
+ --inject KEY=VALUE
+ Inject constant fields to the output (both matched and unmatched logs).
+ --inject entries are added to unmatched lines too, when their key is
+ not used in --inject-unmatched (--inject-unmatched override --inject).
+ VALUE can use variable like ${OTHER_KEY} to be replaced with the values
+ of other keys available.
+
+ Up to 512 fields can be injected.
+
+ In a YAML file:
+ ```yaml
+ inject:
+ - key: KEY1
+ value: 'VALUE1'
+ - key: KEY2
+ value: '${KEY3}${KEY4}' # gets the values of KEY3 and KEY4
+ # add as many as required
+ ```
+
+--------------------------------------------------------------------------------
+ REWRITING KEY VALUES
+
+ --rewrite KEY=/MATCH/REPLACE[/OPTIONS]
+ Apply a rewrite rule to the values of a specific key.
+ The first character after KEY= is the separator, which should also
+ be used between the MATCH, REPLACE and OPTIONS.
+
+ OPTIONS can be a comma separated list of `non-empty`, `dont-stop` and
+ `inject`.
+
+ When `non-empty` is given, MATCH is expected to be a variable
+ substitution using `${KEY1}${KEY2}`. Once the substitution is completed
+ the rule is matching the KEY only if the result is not empty.
+ When `non-empty` is not set, the MATCH string is expected to be a PCRE2
+ regular expression to be checked against the KEY value. This PCRE2
+ pattern may include named groups to extract parts of the KEY's value.
+
+ REPLACE supports variable substitution like `${variable}` against MATCH
+ named groups (when MATCH is a PCRE2 pattern) and `${KEY}` against the
+ keys defined so far.
+
+ Example:
+ --rewrite DATE=/^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/
+ ${day}/${month}/${year}
+ The above will rewrite dates in the format YYYY-MM-DD to DD/MM/YYYY.
+
+ Only one rewrite rule is applied per key; the sequence of rewrites for a
+ given key stops once a rule matches it. This allows providing a sequence
+ of independent rewriting rules for the same key, matching the different
+ values the key may get, and also a catch-all rewrite rule at the end,
+ for setting the key value if no other rule matched it. A rewrite rule
+ allows further rewrite rules to be processed when its OPTIONS include
+ the keyword `dont-stop`.
+
+ Up to 512 rewriting rules are allowed.
+
+ In a YAML file:
+ ```yaml
+ rewrite:
+ # the order of these rules is important - processed top to bottom
+ - key: KEY1
+ match: 'PCRE2 PATTERN WITH NAMED GROUPS'
+ value: 'all match fields and input keys as ${VARIABLE}'
+ inject: BOOLEAN # yes = inject the field, don't just rewrite it
+ stop: BOOLEAN # no = continue processing, don't stop if matched
+ - key: KEY2
+ non_empty: '${KEY3}${KEY4}' # match only if this evaluates to non empty
+ value: 'all input keys as ${VARIABLE}'
+ inject: BOOLEAN # yes = inject the field, don't just rewrite it
+ stop: BOOLEAN # no = continue processing, don't stop if matched
+ # add as many rewrites as required
+ ```
+
+ By default rewrite rules are applied only on fields already defined.
+ This allows shipping YAML files that include more rewrites than are
+ required for a specific input file.
+ Rewrite rules however allow injecting new fields when OPTIONS include
+ the keyword `inject` or in YAML `inject: yes` is given.
+
+ MATCH on the command line can be empty to define an unconditional rule.
+ Similarly, `match` and `non_empty` can be omitted in the YAML file.
+--------------------------------------------------------------------------------
+ UNMATCHED LINES
+
+ --unmatched-key KEY
+ Include unmatched log entries in the output with KEY as the field name.
+ Use this to include unmatched entries in the output stream.
+ Usually it should be set to --unmatched-key=MESSAGE so that the
+ unmatched entry will appear as the log message in the journals.
+ Use --inject-unmatched to inject additional fields into unmatched lines.
+
+ In a YAML file:
+ ```yaml
+ unmatched:
+ key: MESSAGE # inject the error log as MESSAGE
+ ```
+
+ --inject-unmatched LINE
+ Inject lines into the output for each unmatched log entry.
+ Usually, --inject-unmatched=PRIORITY=3 is needed to mark the unmatched
+ lines as errors, so that they can easily be spotted in the journals.
+
+ Up to 512 such lines can be injected.
+
+ In a YAML file:
+ ```yaml
+ unmatched:
+ key: MESSAGE # inject the error log as MESSAGE
+ inject:
+ - key: KEY1
+ value: 'VALUE1'
+ # add as many constants as required
+ ```
+
+--------------------------------------------------------------------------------
+ FILTERING
+
+ --include PATTERN
+ Include only keys matching the PCRE2 PATTERN.
+ Useful when parsing JSON or logfmt logs, to include only the keys given.
+ The keys are matched after the PREFIX has been added to them.
+
+ --exclude PATTERN
+ Exclude the keys matching the PCRE2 PATTERN.
+ Useful when parsing JSON or logfmt logs, to exclude some of the keys given.
+ The keys are matched after the PREFIX has been added to them.
+
+ When both include and exclude patterns are set and both match a key,
+ exclude wins and the key will not be added: as in a pipeline, we first
+ include it and then exclude it.
+
+ In a YAML file:
+ ```yaml
+ filter:
+ include: 'PCRE2 PATTERN MATCHING KEY NAMES TO INCLUDE'
+ exclude: 'PCRE2 PATTERN MATCHING KEY NAMES TO EXCLUDE'
+ ```
+
+--------------------------------------------------------------------------------
+ OTHER
+
+ -h, or --help
+ Display this help and exit.
+
+ --show-config
+ Show the configuration in YAML format before starting the job.
+ This is also an easy way to convert command line parameters to YAML.
+
+The program accepts all parameters as both --option=value and --option value.
+
+The maximum log line length accepted is 1048576 characters.
+
+PIPELINE AND SEQUENCE OF PROCESSING
+
+This is a simple diagram of the pipeline taking place:
+
+ +---------------------------------------------------+
+ | INPUT |
+ | read one log line at a time |
+ +---------------------------------------------------+
+ v v v v v v
+ +---------------------------------------------------+
+ | EXTRACT FIELDS AND VALUES |
+ | JSON, logfmt, or pattern based |
+ | (apply optional PREFIX - all keys use capitals) |
+ +---------------------------------------------------+
+ v v v v v v
+ +---------------------------------------------------+
+ | RENAME FIELDS |
+ | change the names of the fields |
+ +---------------------------------------------------+
+ v v v v v v
+ +---------------------------------------------------+
+ | INJECT NEW FIELDS |
+ | constants, or other field values as variables |
+ +---------------------------------------------------+
+ v v v v v v
+ +---------------------------------------------------+
+ | REWRITE FIELD VALUES |
+ | pipeline multiple rewriting rules to alter |
+ | the values of the fields |
+ +---------------------------------------------------+
+ v v v v v v
+ +---------------------------------------------------+
+ | FILTER FIELDS |
+ | use include and exclude patterns on the field |
+ | names, to select which fields are sent to journal |
+ +---------------------------------------------------+
+ v v v v v v
+ +---------------------------------------------------+
+ | OUTPUT |
+ | generate Journal Export Format |
+ +---------------------------------------------------+
+
+--------------------------------------------------------------------------------
+JOURNAL FIELDS RULES (enforced by systemd-journald)
+
+ - field names can be up to 64 characters
+ - the only allowed field characters are A-Z, 0-9 and underscore
+ - the first character of fields cannot be a digit
+ - protected journal fields start with underscore:
+ * they are accepted by systemd-journal-remote
+ * they are NOT accepted by a local systemd-journald
+
+ For best results, always include these fields:
+
+ MESSAGE=TEXT
+ The MESSAGE is the body of the log entry.
+ This field is what we usually see in our logs.
+
+ PRIORITY=NUMBER
+ PRIORITY sets the severity of the log entry.
+ 0=emerg, 1=alert, 2=crit, 3=err, 4=warn, 5=notice, 6=info, 7=debug
+ - Emergency events (0) are usually broadcast to all terminals.
+ - Emergency, alert, critical, and error (0-3) are usually colored red.
+ - Warning (4) entries are usually colored yellow.
+ - Notice (5) entries are usually bold or have a brighter white color.
+ - Info (6) entries are the default.
+ - Debug (7) entries are usually grayed or dimmed.
+
+ SYSLOG_IDENTIFIER=NAME
+ SYSLOG_IDENTIFIER sets the name of the application.
+ Use something descriptive, like: SYSLOG_IDENTIFIER=nginx-logs
+
+You can find the most common fields at 'man systemd.journal-fields'.
+
+```
+
+`log2journal` supports YAML configuration files, like the ones found [in this directory](https://github.com/netdata/netdata/tree/master/collectors/log2journal/log2journal.d).
+
+## `systemd-cat-native` options
+
+Read [the manual of systemd-cat-native](https://github.com/netdata/netdata/blob/master/libnetdata/log/systemd-cat-native.md).
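Review note: the journal field rules listed in the help text above (at most 64 characters, only A-Z, 0-9 and underscore, no leading digit, leading underscore reserved for protected fields) can be checked mechanically. The following is a minimal sketch in C, not code from this commit. `example_journal_key_is_valid` and `EXAMPLE_JOURNAL_KEY_MAX` are hypothetical names introduced for illustration, and the sketch assumes the key is destined for a local systemd-journald, so it also rejects protected names.

```c
#include <stdbool.h>
#include <stddef.h>

// Assumption for this sketch: the 64-character limit quoted in the help text.
#define EXAMPLE_JOURNAL_KEY_MAX 64

// Hypothetical helper, not part of log2journal: returns true when `key`
// satisfies the rules listed in the help text for a local systemd-journald.
static bool example_journal_key_is_valid(const char *key) {
    if (key == NULL || key[0] == '\0')
        return false;

    // A digit cannot lead, and a leading underscore marks a protected
    // field, which a local systemd-journald does not accept.
    if ((key[0] >= '0' && key[0] <= '9') || key[0] == '_')
        return false;

    size_t len = 0;
    for (const char *s = key; *s; s++, len++) {
        bool upper = (*s >= 'A' && *s <= 'Z');
        bool digit = (*s >= '0' && *s <= '9');
        if (!upper && !digit && *s != '_')
            return false;
    }

    return len <= EXAMPLE_JOURNAL_KEY_MAX;
}
```

Under these assumptions `MESSAGE` and `SYSLOG_IDENTIFIER` pass, while `message`, `1KEY` and `_PID` are rejected. A caller that ships to systemd-journal-remote would likely want to relax the leading-underscore check.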
diff --git a/collectors/log2journal/log2journal-help.c b/collectors/log2journal/log2journal-help.c
new file mode 100644
index 00000000..21be948e
--- /dev/null
+++ b/collectors/log2journal/log2journal-help.c
@@ -0,0 +1,377 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "log2journal.h"
+
+static void config_dir_print_available(void) {
+ const char *path = LOG2JOURNAL_CONFIG_PATH;
+ DIR *dir;
+ struct dirent *entry;
+
+ dir = opendir(path);
+
+ if (dir == NULL) {
+ log2stderr(" >>> Cannot open directory:\n %s", path);
+ return;
+ }
+
+ size_t column_width = 80;
+ size_t current_columns = 7; // Start with 7 spaces for the first line
+
+ while ((entry = readdir(dir))) {
+ if (entry->d_type == DT_REG) { // Check if it's a regular file
+ const char *file_name = entry->d_name;
+ size_t len = strlen(file_name);
+ if (len >= 5 && strcmp(file_name + len - 5, ".yaml") == 0) {
+ // Remove the ".yaml" extension
+ len -= 5;
+ if (current_columns == 7) {
+ printf("       "); // Print 7 spaces at the beginning of a new line
+ }
+ if (current_columns + len + 1 > column_width) {
+ // Start a new line if the current line is full
+ printf("\n       "); // Print newline and 7 spaces
+ current_columns = 7;
+ }
+ printf("%.*s ", (int)len, file_name); // Print the filename without extension
+ current_columns += len + 1; // Add filename length and a space
+ }
+ }
+ }
+
+ closedir(dir);
+ printf("\n"); // Add a newline at the end
+}
+
+void log_job_command_line_help(const char *name) {
+ printf("\n");
+ printf("Netdata log2journal " PACKAGE_VERSION "\n");
+ printf("\n");
+ printf("Convert logs to systemd Journal Export Format.\n");
+ printf("\n");
+ printf(" - JSON logs: extracts all JSON fields.\n");
+ printf(" - logfmt logs: extracts all logfmt fields.\n");
+ printf(" - free-form logs: uses PCRE2 patterns to extract fields.\n");
+ printf("\n");
+ printf("Usage: %s [OPTIONS] PATTERN|json|logfmt\n", name);
+ printf("\n");
+ printf("Options:\n");
+ printf("\n");
+#ifdef HAVE_LIBYAML
+ printf(" --file /path/to/file.yaml or -f /path/to/file.yaml\n");
+ printf(" Read yaml configuration file for instructions.\n");
+ printf("\n");
+ printf(" --config CONFIG_NAME or -c CONFIG_NAME\n");
+ printf(" Run with the internal YAML configuration named CONFIG_NAME.\n");
+ printf(" Available internal YAML configs:\n");
+ printf("\n");
+ config_dir_print_available();
+ printf("\n");
+#else
+ printf(" IMPORTANT:\n");
+ printf(" YAML configuration parsing is not compiled in this binary.\n");
+ printf("\n");
+#endif
+ printf("--------------------------------------------------------------------------------\n");
+ printf(" INPUT PROCESSING\n");
+ printf("\n");
+ printf(" PATTERN\n");
+ printf(" PATTERN should be a valid PCRE2 regular expression.\n");
+ printf(" RE2 regular expressions (like the ones usually used in Go applications),\n");
+ printf(" are usually valid PCRE2 patterns too.\n");
+ printf(" Sub-expressions without named groups are evaluated, but their matches are\n");
+ printf(" not added to the output.\n");
+ printf("\n");
+ printf(" - JSON mode\n");
+ printf(" JSON mode is enabled when the pattern is set to: json\n");
+ printf(" Field names are extracted from the JSON logs and are converted to the\n");
+ printf(" format expected by Journal Export Format (all caps, only _ is allowed).\n");
+ printf("\n");
+ printf(" - logfmt mode\n");
+ printf(" logfmt mode is enabled when the pattern is set to: logfmt\n");
+ printf(" Field names are extracted from the logfmt logs and are converted to the\n");
+ printf(" format expected by Journal Export Format (all caps, only _ is allowed).\n");
+ printf("\n");
+ printf(" All keys extracted from the input, are transliterated to match Journal\n");
+ printf(" semantics (capital A-Z, digits 0-9, underscore).\n");
+ printf("\n");
+ printf(" In a YAML file:\n");
+ printf(" ```yaml\n");
+ printf(" pattern: 'PCRE2 pattern | json | logfmt'\n");
+ printf(" ```\n");
+ printf("\n");
+ printf("--------------------------------------------------------------------------------\n");
+ printf(" GLOBALS\n");
+ printf("\n");
+ printf(" --prefix PREFIX\n");
+ printf(" Prefix all fields with PREFIX. The PREFIX is added before any other\n");
+ printf(" processing, so that the extracted keys have to be matched with the PREFIX in\n");
+ printf(" them. PREFIX is NOT transliterated and it is assumed to be systemd-journal\n");
+ printf(" friendly.\n");
+ printf("\n");
+ printf(" In a YAML file:\n");
+ printf(" ```yaml\n");
+ printf(" prefix: 'PREFIX_' # prepend all keys with this prefix.\n");
+ printf(" ```\n");
+ printf("\n");
+ printf(" --filename-key KEY\n");
+ printf(" Add a field with KEY as the key and the current filename as value.\n");
+ printf(" Automatically detects filenames when piped after 'tail -F',\n");
+ printf(" and tail matches multiple filenames.\n");
+ printf(" To inject the filename when tailing a single file, use --inject.\n");
+ printf("\n");
+ printf(" In a YAML file:\n");
+ printf(" ```yaml\n");
+ printf(" filename:\n");
+ printf(" key: KEY\n");
+ printf(" ```\n");
+ printf("\n");
+ printf("--------------------------------------------------------------------------------\n");
+ printf(" RENAMING OF KEYS\n");
+ printf("\n");
+ printf(" --rename NEW=OLD\n");
+ printf(" Rename fields. OLD has been transliterated and PREFIX has been added.\n");
+ printf(" NEW is assumed to be systemd journal friendly.\n");
+ printf("\n");
+ printf(" Up to %d renaming rules are allowed.\n", MAX_RENAMES);
+ printf("\n");
+ printf(" In a YAML file:\n");
+ printf(" ```yaml\n");
+ printf(" rename:\n");
+ printf(" - new_key: KEY1\n");
+ printf(" old_key: KEY2 # transliterated with PREFIX added\n");
+ printf(" - new_key: KEY3\n");
+ printf(" old_key: KEY4 # transliterated with PREFIX added\n");
+ printf(" # add as many as required\n");
+ printf(" ```\n");
+ printf("\n");
+ printf("--------------------------------------------------------------------------------\n");
+ printf(" INJECTING NEW KEYS\n");
+ printf("\n");
+ printf(" --inject KEY=VALUE\n");
+ printf(" Inject constant fields to the output (both matched and unmatched logs).\n");
+ printf(" --inject entries are added to unmatched lines too, when their key is\n");
+ printf(" not used in --inject-unmatched (--inject-unmatched overrides --inject).\n");
+ printf(" VALUE can use variables like ${OTHER_KEY} to be replaced with the values\n");
+ printf(" of other keys available.\n");
+ printf("\n");
+ printf(" Up to %d fields can be injected.\n", MAX_INJECTIONS);
+ printf("\n");
+ printf(" In a YAML file:\n");
+ printf(" ```yaml\n");
+ printf(" inject:\n");
+ printf(" - key: KEY1\n");
+ printf(" value: 'VALUE1'\n");
+ printf(" - key: KEY2\n");
+ printf(" value: '${KEY3}${KEY4}' # gets the values of KEY3 and KEY4\n");
+ printf(" # add as many as required\n");
+ printf(" ```\n");
+ printf("\n");
+ printf("--------------------------------------------------------------------------------\n");
+ printf(" REWRITING KEY VALUES\n");
+ printf("\n");
+ printf(" --rewrite KEY=/MATCH/REPLACE[/OPTIONS]\n");
+ printf(" Apply a rewrite rule to the values of a specific key.\n");
+ printf(" The first character after KEY= is the separator, which should also\n");
+ printf(" be used between the MATCH, REPLACE and OPTIONS.\n");
+ printf("\n");
+ printf(" OPTIONS can be a comma separated list of `non-empty`, `dont-stop` and\n");
+ printf(" `inject`.\n");
+ printf("\n");
+ printf(" When `non-empty` is given, MATCH is expected to be a variable\n");
+ printf(" substitution using `${KEY1}${KEY2}`. Once the substitution is completed\n");
+ printf(" the rule matches the KEY only if the result is not empty.\n");
+ printf(" When `non-empty` is not set, the MATCH string is expected to be a PCRE2\n");
+ printf(" regular expression to be checked against the KEY value. This PCRE2\n");
+ printf(" pattern may include named groups to extract parts of the KEY's value.\n");
+ printf("\n");
+ printf(" REPLACE supports variable substitution like `${variable}` against MATCH\n");
+ printf(" named groups (when MATCH is a PCRE2 pattern) and `${KEY}` against the\n");
+ printf(" keys defined so far.\n");
+ printf("\n");
+ printf(" Example:\n");
+ printf(" --rewrite DATE=/^(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})$/\n");
+ printf(" ${day}/${month}/${year}\n");
+ printf(" The above will rewrite dates in the format YYYY-MM-DD to DD/MM/YYYY.\n");
+ printf("\n");
+ printf(" Only one rewrite rule is applied per key; the sequence of rewrites for a\n");
+ printf(" given key stops once a rule matches it. This allows providing a sequence\n");
+ printf(" of independent rewriting rules for the same key, matching the different\n");
+ printf(" values the key may get, and also a catch-all rewrite rule at the end,\n");
+ printf(" for setting the key value if no other rule matched it. A rewrite rule\n");
+ printf(" allows further rewrite rules to be processed when its OPTIONS include\n");
+ printf(" the keyword `dont-stop`.\n");
+ printf("\n");
+ printf(" Up to %d rewriting rules are allowed.\n", MAX_REWRITES);
+ printf("\n");
+ printf(" In a YAML file:\n");
+ printf(" ```yaml\n");
+ printf(" rewrite:\n");
+ printf(" # the order of these rules is important - processed top to bottom\n");
+ printf(" - key: KEY1\n");
+ printf(" match: 'PCRE2 PATTERN WITH NAMED GROUPS'\n");
+ printf(" value: 'all match fields and input keys as ${VARIABLE}'\n");
+ printf(" inject: BOOLEAN # yes = inject the field, don't just rewrite it\n");
+ printf(" stop: BOOLEAN # no = continue processing, don't stop if matched\n");
+ printf(" - key: KEY2\n");
+ printf(" non_empty: '${KEY3}${KEY4}' # match only if this evaluates to non empty\n");
+ printf(" value: 'all input keys as ${VARIABLE}'\n");
+ printf(" inject: BOOLEAN # yes = inject the field, don't just rewrite it\n");
+ printf(" stop: BOOLEAN # no = continue processing, don't stop if matched\n");
+ printf(" # add as many rewrites as required\n");
+ printf(" ```\n");
+ printf("\n");
+ printf(" By default rewrite rules are applied only on fields already defined.\n");
+ printf(" This allows shipping YAML files that include more rewrites than are\n");
+ printf(" required for a specific input file.\n");
+ printf(" Rewrite rules however allow injecting new fields when OPTIONS include\n");
+ printf(" the keyword `inject` or in YAML `inject: yes` is given.\n");
+ printf("\n");
+ printf(" MATCH on the command line can be empty to define an unconditional rule.\n");
+ printf(" Similarly, `match` and `non_empty` can be omitted in the YAML file.\n");
+ printf("\n");
+ printf("--------------------------------------------------------------------------------\n");
+ printf(" UNMATCHED LINES\n");
+ printf("\n");
+ printf(" --unmatched-key KEY\n");
+ printf(" Include unmatched log entries in the output with KEY as the field name.\n");
+ printf(" Use this to include unmatched entries in the output stream.\n");
+ printf(" Usually it should be set to --unmatched-key=MESSAGE so that the\n");
+ printf(" unmatched entry will appear as the log message in the journals.\n");
+ printf(" Use --inject-unmatched to inject additional fields into unmatched lines.\n");
+ printf("\n");
+ printf(" In a YAML file:\n");
+ printf(" ```yaml\n");
+ printf(" unmatched:\n");
+ printf(" key: MESSAGE # inject the error log as MESSAGE\n");
+ printf(" ```\n");
+ printf("\n");
+ printf(" --inject-unmatched LINE\n");
+ printf(" Inject lines into the output for each unmatched log entry.\n");
+ printf(" Usually, --inject-unmatched=PRIORITY=3 is needed to mark the unmatched\n");
+ printf(" lines as errors, so that they can easily be spotted in the journals.\n");
+ printf("\n");
+ printf(" Up to %d such lines can be injected.\n", MAX_INJECTIONS);
+ printf("\n");
+ printf(" In a YAML file:\n");
+ printf(" ```yaml\n");
+ printf(" unmatched:\n");
+ printf(" key: MESSAGE # inject the error log as MESSAGE\n");
+ printf(" inject:\n");
+ printf(" - key: KEY1\n");
+ printf(" value: 'VALUE1'\n");
+ printf(" # add as many constants as required\n");
+ printf(" ```\n");
+ printf("\n");
+ printf("--------------------------------------------------------------------------------\n");
+ printf(" FILTERING\n");
+ printf("\n");
+ printf(" --include PATTERN\n");
+ printf(" Include only keys matching the PCRE2 PATTERN.\n");
+ printf(" Useful when parsing JSON or logfmt logs, to include only the keys given.\n");
+ printf(" The keys are matched after the PREFIX has been added to them.\n");
+ printf("\n");
+ printf(" --exclude PATTERN\n");
+ printf(" Exclude the keys matching the PCRE2 PATTERN.\n");
+ printf(" Useful when parsing JSON or logfmt logs, to exclude some of the keys given.\n");
+ printf(" The keys are matched after the PREFIX has been added to them.\n");
+ printf("\n");
+ printf(" When both include and exclude patterns are set and both match a key,\n");
+ printf(" exclude wins and the key will not be added: as in a pipeline, we first\n");
+ printf(" include it and then exclude it.\n");
+ printf("\n");
+ printf(" In a YAML file:\n");
+ printf(" ```yaml\n");
+ printf(" filter:\n");
+ printf(" include: 'PCRE2 PATTERN MATCHING KEY NAMES TO INCLUDE'\n");
+ printf(" exclude: 'PCRE2 PATTERN MATCHING KEY NAMES TO EXCLUDE'\n");
+ printf(" ```\n");
+ printf("\n");
+ printf("--------------------------------------------------------------------------------\n");
+ printf(" OTHER\n");
+ printf("\n");
+ printf(" -h, or --help\n");
+ printf(" Display this help and exit.\n");
+ printf("\n");
+ printf(" --show-config\n");
+ printf(" Show the configuration in YAML format before starting the job.\n");
+ printf(" This is also an easy way to convert command line parameters to YAML.\n");
+ printf("\n");
+ printf("The program accepts all parameters as both --option=value and --option value.\n");
+ printf("\n");
+ printf("The maximum log line length accepted is %d characters.\n", MAX_LINE_LENGTH);
+ printf("\n");
+ printf("PIPELINE AND SEQUENCE OF PROCESSING\n");
+ printf("\n");
+ printf("This is a simple diagram of the pipeline taking place:\n");
+ printf(" \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" | INPUT | \n");
+ printf(" | read one log line at a time | \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" v v v v v v \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" | EXTRACT FIELDS AND VALUES | \n");
+ printf(" | JSON, logfmt, or pattern based | \n");
+ printf(" | (apply optional PREFIX - all keys use capitals) | \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" v v v v v v \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" | RENAME FIELDS | \n");
+ printf(" | change the names of the fields | \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" v v v v v v \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" | INJECT NEW FIELDS | \n");
+ printf(" | constants, or other field values as variables | \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" v v v v v v \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" | REWRITE FIELD VALUES | \n");
+ printf(" | pipeline multiple rewriting rules to alter | \n");
+ printf(" | the values of the fields | \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" v v v v v v \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" | FILTER FIELDS | \n");
+ printf(" | use include and exclude patterns on the field | \n");
+ printf(" | names, to select which fields are sent to journal | \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" v v v v v v \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" | OUTPUT | \n");
+ printf(" | generate Journal Export Format | \n");
+ printf(" +---------------------------------------------------+ \n");
+ printf(" \n");
+ printf("--------------------------------------------------------------------------------\n");
+ printf("JOURNAL FIELDS RULES (enforced by systemd-journald)\n");
+ printf("\n");
+ printf(" - field names can be up to 64 characters\n");
+ printf(" - the only allowed field characters are A-Z, 0-9 and underscore\n");
+ printf(" - the first character of fields cannot be a digit\n");
+ printf(" - protected journal fields start with underscore:\n");
+ printf(" * they are accepted by systemd-journal-remote\n");
+ printf(" * they are NOT accepted by a local systemd-journald\n");
+ printf("\n");
+ printf(" For best results, always include these fields:\n");
+ printf("\n");
+ printf(" MESSAGE=TEXT\n");
+ printf(" The MESSAGE is the body of the log entry.\n");
+ printf(" This field is what we usually see in our logs.\n");
+ printf("\n");
+ printf(" PRIORITY=NUMBER\n");
+ printf(" PRIORITY sets the severity of the log entry.\n");
+ printf(" 0=emerg, 1=alert, 2=crit, 3=err, 4=warn, 5=notice, 6=info, 7=debug\n");
+ printf(" - Emergency events (0) are usually broadcast to all terminals.\n");
+ printf(" - Emergency, alert, critical, and error (0-3) are usually colored red.\n");
+ printf(" - Warning (4) entries are usually colored yellow.\n");
+ printf(" - Notice (5) entries are usually bold or have a brighter white color.\n");
+ printf(" - Info (6) entries are the default.\n");
+ printf(" - Debug (7) entries are usually grayed or dimmed.\n");
+ printf("\n");
+ printf(" SYSLOG_IDENTIFIER=NAME\n");
+ printf(" SYSLOG_IDENTIFIER sets the name of the application.\n");
+ printf(" Use something descriptive, like: SYSLOG_IDENTIFIER=nginx-logs\n");
+ printf("\n");
+ printf("You can find the most common fields at 'man systemd.journal-fields'.\n");
+ printf("\n");
+}
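Review note: the help text describes `--rewrite KEY=/MATCH/REPLACE[/OPTIONS]`, where the first character after `KEY=` is the separator. The sketch below shows one way such an argument could be split. It is an illustration only, not the parser in this commit: `struct example_rewrite_spec` and `example_rewrite_spec_split` are hypothetical names, and the sketch assumes the separator never appears escaped inside MATCH or REPLACE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical view into a --rewrite argument. The pointers reference the
// original string; nothing is copied.
struct example_rewrite_spec {
    const char *key;     size_t key_len;
    const char *match;   size_t match_len;   // may be 0 (unconditional rule)
    const char *replace; size_t replace_len;
    const char *options; size_t options_len; // 0 when OPTIONS is absent
};

// Splits "KEY=<sep>MATCH<sep>REPLACE[<sep>OPTIONS]".
// Returns false when KEY is empty, '=' is missing, or the separator
// that closes MATCH is missing.
static bool example_rewrite_spec_split(const char *arg, struct example_rewrite_spec *out) {
    const char *eq = strchr(arg, '=');
    if (!eq || eq == arg || eq[1] == '\0')
        return false;

    char sep = eq[1];                      // first character after KEY=
    const char *match = eq + 2;
    const char *match_end = strchr(match, sep);
    if (!match_end)
        return false;

    const char *replace = match_end + 1;
    const char *replace_end = strchr(replace, sep);

    out->key = arg;
    out->key_len = (size_t)(eq - arg);
    out->match = match;
    out->match_len = (size_t)(match_end - match);
    out->replace = replace;

    if (replace_end) {
        out->replace_len = (size_t)(replace_end - replace);
        out->options = replace_end + 1;
        out->options_len = strlen(out->options);
    } else {
        out->replace_len = strlen(replace);
        out->options = replace + out->replace_len; // the terminating NUL
        out->options_len = 0;
    }
    return true;
}
```

Under this splitting rule a REPLACE that contains `/` needs a different separator, for example `DATE=|PATTERN|${day}/${month}/${year}`. Whether the real parser treats such input the same way is not shown in this part of the diff.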
diff --git a/collectors/log2journal/log2journal-inject.c b/collectors/log2journal/log2journal-inject.c
new file mode 100644
index 00000000..45158066
--- /dev/null
+++ b/collectors/log2journal/log2journal-inject.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "log2journal.h"
+
+void injection_cleanup(INJECTION *inj) {
+ hashed_key_cleanup(&inj->key);
+ replace_pattern_cleanup(&inj->value);
+}
+
+static inline bool log_job_injection_replace(INJECTION *inj, const char *key, size_t key_len, const char *value, size_t value_len) {
+ if(key_len > JOURNAL_MAX_KEY_LEN)
+ log2stderr("WARNING: injection key '%.*s' is too long for journal. Will be truncated.", (int)key_len, key);
+
+ if(value_len > JOURNAL_MAX_VALUE_LEN)
+ log2stderr("WARNING: injection value of key '%.*s' is too long for journal. Will be truncated.", (int)key_len, key);
+
+ hashed_key_len_set(&inj->key, key, key_len);
+ char *v = strndupz(value, value_len);
+ bool ret = replace_pattern_set(&inj->value, v);
+ freez(v);
+
+ return ret;
+}
+
+bool log_job_injection_add(LOG_JOB *jb, const char *key, size_t key_len, const char *value, size_t value_len, bool unmatched) {
+ if (unmatched) {
+ if (jb->unmatched.injections.used >= MAX_INJECTIONS) {
+ log2stderr("Error: too many unmatched injections. You can inject up to %d lines.", MAX_INJECTIONS);
+ return false;
+ }
+ }
+ else {
+ if (jb->injections.used >= MAX_INJECTIONS) {
+ log2stderr("Error: too many injections. You can inject up to %d lines.", MAX_INJECTIONS);
+ return false;
+ }
+ }
+
+ bool ret;
+ if (unmatched) {
+ ret = log_job_injection_replace(&jb->unmatched.injections.keys[jb->unmatched.injections.used++],
+ key, key_len, value, value_len);
+ } else {
+ ret = log_job_injection_replace(&jb->injections.keys[jb->injections.used++],
+ key, key_len, value, value_len);
+ }
+
+ return ret;
+}
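Review note: injected values may reference other keys as `${KEY}`, and `log_job_injection_replace()` hands the value to `replace_pattern_set()`, whose implementation is not part of this section. As a rough illustration of that kind of substitution, here is a minimal sketch. `struct example_field`, `example_lookup` and `example_expand` are hypothetical names, and expanding unknown keys to an empty string is an assumption of the sketch, not a statement about log2journal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical key/value pair available for substitution.
struct example_field { const char *key; const char *value; };

// Looks up a key of length `len` in a small table; returns NULL if absent.
static const char *example_lookup(const struct example_field *fields, size_t n,
                                  const char *key, size_t len) {
    for (size_t i = 0; i < n; i++)
        if (strlen(fields[i].key) == len && strncmp(fields[i].key, key, len) == 0)
            return fields[i].value;
    return NULL;
}

// Expands every ${KEY} in `tmpl` into `dst` (capacity `dst_size`).
// Unknown keys expand to an empty string (assumption of this sketch).
// Returns false if the output does not fit or a "${" is never closed;
// on success `dst` is NUL-terminated.
static bool example_expand(const char *tmpl, const struct example_field *fields, size_t n,
                           char *dst, size_t dst_size) {
    size_t used = 0;
    if (dst_size == 0)
        return false;

    while (*tmpl) {
        const char *chunk;
        size_t chunk_len;

        if (tmpl[0] == '$' && tmpl[1] == '{') {
            const char *end = strchr(tmpl + 2, '}');
            if (!end)
                return false;
            chunk = example_lookup(fields, n, tmpl + 2, (size_t)(end - (tmpl + 2)));
            if (!chunk)
                chunk = "";
            chunk_len = strlen(chunk);
            tmpl = end + 1;
        } else {
            chunk = tmpl;      // copy one literal character
            chunk_len = 1;
            tmpl++;
        }

        if (used + chunk_len >= dst_size)
            return false;      // keep room for the terminating NUL
        memcpy(dst + used, chunk, chunk_len);
        used += chunk_len;
    }

    dst[used] = '\0';
    return true;
}
```

With fields `KEY3=foo` and `KEY4=bar`, the template `${KEY3}${KEY4}` from the YAML example in the help text expands to `foobar`.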
diff --git a/collectors/log2journal/log2journal-json.c b/collectors/log2journal/log2journal-json.c
new file mode 100644
index 00000000..2ca294e4
--- /dev/null
+++ b/collectors/log2journal/log2journal-json.c
@@ -0,0 +1,630 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "log2journal.h"
+
+#define JSON_ERROR_LINE_MAX 1024
+#define JSON_KEY_MAX 1024
+#define JSON_DEPTH_MAX 100
+
+struct log_json_state {
+ LOG_JOB *jb;
+
+ const char *line;
+ uint32_t pos;
+ uint32_t depth;
+ char *stack[JSON_DEPTH_MAX];
+
+ char key[JSON_KEY_MAX];
+ char msg[JSON_ERROR_LINE_MAX];
+};
+
+static inline bool json_parse_object(LOG_JSON_STATE *js);
+static inline bool json_parse_array(LOG_JSON_STATE *js);
+
+#define json_current_pos(js) (&(js)->line[(js)->pos])
+#define json_consume_char(js) (++(js)->pos)
+
+static inline void json_process_key_value(LOG_JSON_STATE *js, const char *value, size_t len) {
+ log_job_send_extracted_key_value(js->jb, js->key, value, len);
+}
+
+static inline void json_skip_spaces(LOG_JSON_STATE *js) {
+ const char *s = json_current_pos(js);
+ const char *start = s;
+
+ while(isspace((unsigned char)*s)) s++;
+
+ js->pos += s - start;
+}
+
+static inline bool json_expect_char_after_white_space(LOG_JSON_STATE *js, const char *expected) {
+ json_skip_spaces(js);
+
+ const char *s = json_current_pos(js);
+ for(const char *e = expected; *e ;e++) {
+ if (*s == *e)
+ return true;
+ }
+
+ snprintf(js->msg, sizeof(js->msg),
+ "JSON PARSER: character '%c' is not one of the expected characters (%s), at pos %zu",
+ *s ? *s : '?', expected, (size_t)js->pos);
+
+ return false;
+}
+
+static inline bool json_parse_null(LOG_JSON_STATE *js) {
+ const char *s = json_current_pos(js);
+ if (strncmp(s, "null", 4) == 0) {
+ json_process_key_value(js, "null", 4);
+ js->pos += 4;
+ return true;
+ }
+ else {
+ snprintf(js->msg, sizeof(js->msg),
+ "JSON PARSER: expected 'null', found '%.4s' at position %zu", s, (size_t)js->pos);
+ return false;
+ }
+}
+
+static inline bool json_parse_true(LOG_JSON_STATE *js) {
+ const char *s = json_current_pos(js);
+ if (strncmp(s, "true", 4) == 0) {
+ json_process_key_value(js, "true", 4);
+ js->pos += 4;
+ return true;
+ }
+ else {
+ snprintf(js->msg, sizeof(js->msg),
+ "JSON PARSER: expected 'true', found '%.4s' at position %zu", s, (size_t)js->pos);
+ return false;
+ }
+}
+
+static inline bool json_parse_false(LOG_JSON_STATE *js) {
+ const char *s = json_current_pos(js);
+ if (strncmp(s, "false", 5) == 0) {
+ json_process_key_value(js, "false", 5);
+ js->pos += 5;
+ return true;
+ }
+ else {
+ snprintf(js->msg, sizeof(js->msg),
+ "JSON PARSER: expected 'false', found '%.5s' at position %zu", s, (size_t)js->pos);
+ return false;
+ }
+}
+
+static inline bool json_parse_number(LOG_JSON_STATE *js) {
+ static __thread char value[8192];
+
+ value[0] = '\0';
+ char *d = value;
+ const char *s = json_current_pos(js);
+ size_t remaining = sizeof(value) - 1; // Reserve space for null terminator
+
+ // Optional minus sign
+ if (*s == '-') {
+ *d++ = *s++;
+ remaining--;
+ }
+
+ // Digits before decimal point
+ while (*s >= '0' && *s <= '9') {
+ if (remaining < 2) {
+ snprintf(js->msg, sizeof(js->msg), "JSON PARSER: truncated number value at pos %zu", (size_t)js->pos);
+ return false;
+ }
+ *d++ = *s++;
+ remaining--;
+ }
+
+ // Decimal point and fractional part
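+ if (*s == '.' && remaining >= 2) { // same headroom check as the digit loops
+ *d++ = *s++;
+ remaining--;
+
+ while (*s >= '0' && *s <= '9') {
+ if (remaining < 2) {
+ snprintf(js->msg, sizeof(js->msg), "JSON PARSER: truncated fractional part at pos %zu", (size_t)js->pos);
+ return false;
+ }
+ *d++ = *s++;
+ remaining--;
+ }
+ }
+
+ // Exponent part
+ if ((*s == 'e' || *s == 'E') && remaining >= 3) { // headroom for 'e' and an optional sign, so 'remaining' cannot underflow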
+ *d++ = *s++;
+ remaining--;
+
+ // Optional sign in exponent
+ if (*s == '+' || *s == '-') {
+ *d++ = *s++;
+ remaining--;
+ }
+
+ while (*s >= '0' && *s <= '9') {
+ if (remaining < 2) {
+ snprintf(js->msg, sizeof(js->msg), "JSON PARSER: truncated exponent at pos %zu", (size_t)js->pos);
+ return false;
+ }
+ *d++ = *s++;
+ remaining--;
+ }
+ }
+
+ *d = '\0';
+ js->pos += d - value;
+
+ if (d > value) {
+ json_process_key_value(js, value, d - value);
+ return true;
+ } else {
+ snprintf(js->msg, sizeof(js->msg), "JSON PARSER: invalid number format at pos %zu", (size_t)js->pos);
+ return false;
+ }
+}
+
+static inline bool encode_utf8(unsigned codepoint, char **d, size_t *remaining) {
+ if (codepoint <= 0x7F) {
+ // 1-byte sequence
+ if (*remaining < 2) return false; // +1 for the null
+ *(*d)++ = (char)codepoint;
+ (*remaining)--;
+ }
+ else if (codepoint <= 0x7FF) {
+ // 2-byte sequence
+ if (*remaining < 3) return false; // +1 for the null
+ *(*d)++ = (char)(0xC0 | ((codepoint >> 6) & 0x1F));
+ *(*d)++ = (char)(0x80 | (codepoint & 0x3F));
+ (*remaining) -= 2;
+ }
+ else if (codepoint <= 0xFFFF) {
+ // 3-byte sequence
+ if (*remaining < 4) return false; // +1 for the null
+ *(*d)++ = (char)(0xE0 | ((codepoint >> 12) & 0x0F));
+ *(*d)++ = (char)(0x80 | ((codepoint >> 6) & 0x3F));
+ *(*d)++ = (char)(0x80 | (codepoint & 0x3F));
+ (*remaining) -= 3;
+ }
+ else if (codepoint <= 0x10FFFF) {
+ // 4-byte sequence
+ if (*remaining < 5) return false; // +1 for the null
+ *(*d)++ = (char)(0xF0 | ((codepoint >> 18) & 0x07));
+ *(*d)++ = (char)(0x80 | ((codepoint >> 12) & 0x3F));
+ *(*d)++ = (char)(0x80 | ((codepoint >> 6) & 0x3F));
+ *(*d)++ = (char)(0x80 | (codepoint & 0x3F));
+ (*remaining) -= 4;
+ }
+ else
+ // Invalid code point
+ return false;
+
+ return true;
+}
+
+size_t parse_surrogate(const char *s, char *d, size_t *remaining) {
+ if (s[0] != '\\' || (s[1] != 'u' && s[1] != 'U')) {
+ return 0; // Not a valid Unicode escape sequence
+ }
+
+ char hex[9] = {0}; // Buffer for the hexadecimal value
+ unsigned codepoint;
+
+ if (s[1] == 'u') {
+ // Handle \uXXXX
+ if (!isxdigit(s[2]) || !isxdigit(s[3]) || !isxdigit(s[4]) || !isxdigit(s[5])) {
+ return 0; // Not a valid \uXXXX sequence
+ }
+
+ hex[0] = s[2];
+ hex[1] = s[3];
+ hex[2] = s[4];
+ hex[3] = s[5];
+ codepoint = (unsigned)strtoul(hex, NULL, 16);
+
+ if (codepoint >= 0xD800 && codepoint <= 0xDBFF) {
+ // Possible start of surrogate pair
+ if (s[6] == '\\' && s[7] == 'u' && isxdigit(s[8]) && isxdigit(s[9]) &&
+ isxdigit(s[10]) && isxdigit(s[11])) {
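+ // Parse exactly four hex digits; strtoul() on &s[8] would also consume any hex digits that follow the escape
+ char low_hex[5] = { s[8], s[9], s[10], s[11], '\0' };
+ unsigned low_surrogate = (unsigned)strtoul(low_hex, NULL, 16);
+ if (low_surrogate < 0xDC00 || low_surrogate > 0xDFFF)
+ return 0; // Invalid low surrogate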
+ codepoint = 0x10000 + ((codepoint - 0xD800) << 10) + (low_surrogate - 0xDC00);
+ return encode_utf8(codepoint, &d, remaining) ? 12 : 0; // \uXXXX\uXXXX
+ }
+ }
+
+ // Single \uXXXX
+ return encode_utf8(codepoint, &d, remaining) ? 6 : 0;
+ }
+ else {
+ // Handle \UXXXXXXXX
+ for (int i = 2; i < 10; i++) {
+ if (!isxdigit(s[i])) {
+ return 0; // Not a valid \UXXXXXXXX sequence
+ }
+ hex[i - 2] = s[i];
+ }
+ codepoint = (unsigned)strtoul(hex, NULL, 16);
+ return encode_utf8(codepoint, &d, remaining) ? 10 : 0; // \UXXXXXXXX
+ }
+}
+
+static inline void copy_newline(LOG_JSON_STATE *js __maybe_unused, char **d, size_t *remaining) {
+ if(*remaining > 3) {
+ *(*d)++ = '\\';
+ *(*d)++ = 'n';
+ (*remaining) -= 2;
+ }
+}
+
+static inline void copy_tab(LOG_JSON_STATE *js __maybe_unused, char **d, size_t *remaining) {
+ if(*remaining > 3) {
+ *(*d)++ = '\\';
+ *(*d)++ = 't';
+ (*remaining) -= 2;
+ }
+}
+
+static inline bool json_parse_string(LOG_JSON_STATE *js) {
+ static __thread char value[JOURNAL_MAX_VALUE_LEN];
+
+ if(!json_expect_char_after_white_space(js, "\""))
+ return false;
+
+ json_consume_char(js);
+
+ value[0] = '\0';
+ char *d = value;
+ const char *s = json_current_pos(js);
+ size_t remaining = sizeof(value);
+
+ while (*s && *s != '"') {
+ char c;
+
+ if (*s == '\\' && s[1] != '\0') { // a lone trailing backslash is copied as-is, so s never moves past the terminator
+ s++;
+
+ switch (*s) {
+ case 'n':
+ copy_newline(js, &d, &remaining);
+ s++;
+ continue;
+
+ case 't':
+ copy_tab(js, &d, &remaining);
+ s++;
+ continue;
+
+ case 'f':
+ case 'b':
+ case 'r':
+ c = ' ';
+ s++;
+ break;
+
+ case 'u': {
+ size_t old_remaining = remaining;
+ size_t consumed = parse_surrogate(s - 1, d, &remaining);
+ if (consumed > 0) {
+ s += consumed - 1; // -1 because we already incremented s after '\\'
+ d += old_remaining - remaining;
+ continue;
+ }
+ else {
+ *d++ = '\\';
+ remaining--;
+ c = *s++;
+ }
+ }
+ break;
+
+ default:
+ c = *s++;
+ break;
+ }
+ }
+ else
+ c = *s++;
+
+ if(remaining < 2) {
+ snprintf(js->msg, sizeof(js->msg),
+ "JSON PARSER: truncated string value at pos %zu", js->pos);
+ return false;
+ }
+ else {
+ *d++ = c;
+ remaining--;
+ }
+ }
+ *d = '\0';
+ js->pos += s - json_current_pos(js);
+
+ if(!json_expect_char_after_white_space(js, "\""))
+ return false;
+
+ json_consume_char(js);
+
+ if(d > value)
+ json_process_key_value(js, value, d - value);
+
+ return true;
+}
+
+static inline bool json_parse_key_and_push(LOG_JSON_STATE *js) {
+ if (!json_expect_char_after_white_space(js, "\""))
+ return false;
+
+ if(js->depth >= JSON_DEPTH_MAX - 1) {
+ snprintf(js->msg, sizeof(js->msg),
+ "JSON PARSER: object too deep, at pos %zu", js->pos);
+ return false;
+ }
+
+ json_consume_char(js);
+
+ char *d = js->stack[js->depth];
+ if(js->depth)
+ *d++ = '_';
+
+ size_t remaining = sizeof(js->key) - (d - js->key);
+
+ const char *s = json_current_pos(js);
+ char last_c = '\0';
+ while(*s && *s != '\"') {
+ char c;
+
+ if (*s == '\\' && s[1] != '\0') {
+ s++;
+ c = (char)((*s == 'u') ? '_' : journal_key_characters_map[(unsigned char)*s]);
+ s += (*s == 'u' && s[1] && s[2] && s[3] && s[4]) ? 5 : 1; // never skip past the end of the line
+ }
+ else
+ c = journal_key_characters_map[(unsigned char)*s++];
+
+ if(c == '_' && last_c == '_')
+ continue;
+ else {
+ if(remaining < 2) {
+ snprintf(js->msg, sizeof(js->msg),
+ "JSON PARSER: key buffer full - keys are too long, at pos %zu", js->pos);
+ return false;
+ }
+ *d++ = c;
+ remaining--;
+ }
+
+ last_c = c;
+ }
+ *d = '\0';
+ js->pos += s - json_current_pos(js);
+
+ if (!json_expect_char_after_white_space(js, "\""))
+ return false;
+
+ json_consume_char(js);
+
+ js->stack[++js->depth] = d;
+
+ return true;
+}
+
+static inline bool json_key_pop(LOG_JSON_STATE *js) {
+ if(js->depth <= 0) {
+ snprintf(js->msg, sizeof(js->msg),
+ "JSON PARSER: cannot pop a key at depth %zu, at pos %zu", js->depth, js->pos);
+ return false;
+ }
+
+ char *k = js->stack[js->depth--];
+ *k = '\0';
+ return true;
+}
+
+static inline bool json_parse_value(LOG_JSON_STATE *js) {
+ if(!json_expect_char_after_white_space(js, "-.0123456789tfn\"{["))
+ return false;
+
+ const char *s = json_current_pos(js);
+ switch(*s) {
+ case '-':
+ case '0':
+ case '1':
+ case '2':
+ case '3':
+ case '4':
+ case '5':
+ case '6':
+ case '7':
+ case '8':
+ case '9':
+ return json_parse_number(js);
+
+ case 't':
+ return json_parse_true(js);
+
+ case 'f':
+ return json_parse_false(js);
+
+ case 'n':
+ return json_parse_null(js);
+
+ case '"':
+ return json_parse_string(js);
+
+ case '{':
+ return json_parse_object(js);
+
+ case '[':
+ return json_parse_array(js);
+ }
+
+ snprintf(js->msg, sizeof(js->msg),
+ "JSON PARSER: unexpected character at pos %zu", js->pos);
+ return false;
+}
+
+static inline bool json_key_index_and_push(LOG_JSON_STATE *js, size_t index) {
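+ if(js->depth >= JSON_DEPTH_MAX - 1) { snprintf(js->msg, sizeof(js->msg), "JSON PARSER: array too deep, at pos %zu", js->pos); return false; }
+ char *d = js->stack[js->depth];
+ if(js->depth > 0)
+ *d++ = '_';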
+
+ // Convert index to string manually
+ char temp[32];
+ char *t = temp + sizeof(temp) - 1; // Start at the end of the buffer
+ *t = '\0';
+
+ do {
+ *--t = (char)((index % 10) + '0');
+ index /= 10;
+ } while (index > 0);
+
+ size_t remaining = sizeof(js->key) - (d - js->key);
+
+ // Append the index to the key
+ while (*t) {
+ if(remaining < 2) {
+ snprintf(js->msg, sizeof(js->msg),
+ "JSON PARSER: key buffer full - keys are too long, at pos %zu", js->pos);
+ return false;
+ }
+
+ *d++ = *t++;
+ remaining--;
+ }
+
+ *d = '\0'; // Null-terminate the key
+ js->stack[++js->depth] = d;
+
+ return true;
+}
+
+static inline bool json_parse_array(LOG_JSON_STATE *js) {
+ if(!json_expect_char_after_white_space(js, "["))
+ return false;
+
+ json_consume_char(js);
+
+ size_t index = 0;
+ do {
+ if(!json_key_index_and_push(js, index))
+ return false;
+
+ if(!json_parse_value(js))
+ return false;
+
+ json_key_pop(js);
+
+ if(!json_expect_char_after_white_space(js, ",]"))
+ return false;
+
+ const char *s = json_current_pos(js);
+ json_consume_char(js);
+ if(*s == ',') {
+ index++;
+ continue;
+ }
+ else // ]
+ break;
+
+ } while(true);
+
+ return true;
+}
+
+static inline bool json_parse_object(LOG_JSON_STATE *js) {
+ if(!json_expect_char_after_white_space(js, "{"))
+ return false;
+
+ json_consume_char(js);
+
+ do {
+ if (!json_expect_char_after_white_space(js, "\""))
+ return false;
+
+ if(!json_parse_key_and_push(js))
+ return false;
+
+ if(!json_expect_char_after_white_space(js, ":"))
+ return false;
+
+ json_consume_char(js);
+
+ if(!json_parse_value(js))
+ return false;
+
+ json_key_pop(js);
+
+ if(!json_expect_char_after_white_space(js, ",}"))
+ return false;
+
+ const char *s = json_current_pos(js);
+ json_consume_char(js);
+ if(*s == ',')
+ continue;
+ else // }
+ break;
+
+ } while(true);
+
+ return true;
+}
+
+LOG_JSON_STATE *json_parser_create(LOG_JOB *jb) {
+ LOG_JSON_STATE *js = mallocz(sizeof(LOG_JSON_STATE));
+ memset(js, 0, sizeof(LOG_JSON_STATE));
+ js->jb = jb;
+
+ if(jb->prefix)
+ copy_to_buffer(js->key, sizeof(js->key), js->jb->prefix, strlen(js->jb->prefix));
+
+ js->stack[0] = &js->key[strlen(js->key)];
+
+ return js;
+}
+
+void json_parser_destroy(LOG_JSON_STATE *js) {
+ if(js)
+ freez(js);
+}
+
+const char *json_parser_error(LOG_JSON_STATE *js) {
+ return js->msg;
+}
+
+bool json_parse_document(LOG_JSON_STATE *js, const char *txt) {
+ js->line = txt;
+ js->pos = 0;
+ js->msg[0] = '\0';
+ js->stack[0][0] = '\0';
+ js->depth = 0;
+
+ if(!json_parse_object(js))
+ return false;
+
+ json_skip_spaces(js);
+ const char *s = json_current_pos(js);
+
+ if(*s) {
+ snprintf(js->msg, sizeof(js->msg),
+ "JSON PARSER: excess characters found after document is finished, at pos %zu", js->pos);
+ return false;
+ }
+
+ return true;
+}
+
+void json_test(void) {
+ LOG_JOB jb = { .prefix = "NGINX_" };
+ LOG_JSON_STATE *json = json_parser_create(&jb);
+
+ json_parse_document(json, "{\"value\":\"\\u\\u039A\\u03B1\\u03BB\\u03B7\\u03BC\\u03AD\\u03C1\\u03B1\"}");
+
+ json_parser_destroy(json);
+}
diff --git a/collectors/log2journal/log2journal-logfmt.c b/collectors/log2journal/log2journal-logfmt.c
new file mode 100644
index 00000000..5966cce9
--- /dev/null
+++ b/collectors/log2journal/log2journal-logfmt.c
@@ -0,0 +1,226 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "log2journal.h"
+
+#define LOGFMT_ERROR_LINE_MAX 1024
+#define LOGFMT_KEY_MAX 1024
+
+struct logfmt_state {
+ LOG_JOB *jb;
+
+ const char *line;
+ uint32_t pos;
+ uint32_t key_start;
+
+ char key[LOGFMT_KEY_MAX];
+ char msg[LOGFMT_ERROR_LINE_MAX];
+};
+
+#define logfmt_current_pos(lfs) &(lfs)->line[(lfs)->pos]
+#define logfmt_consume_char(lfs) ++(lfs)->pos
+
+static inline void logfmt_process_key_value(LOGFMT_STATE *lfs, const char *value, size_t len) {
+ log_job_send_extracted_key_value(lfs->jb, lfs->key, value, len);
+}
+
+static inline void logfmt_skip_spaces(LOGFMT_STATE *lfs) {
+ const char *s = logfmt_current_pos(lfs);
+ const char *start = s;
+
+ while(isspace((unsigned char)*s)) s++;
+
+ lfs->pos += s - start;
+}
+
+static inline void copy_newline(LOGFMT_STATE *lfs __maybe_unused, char **d, size_t *remaining) {
+ if(*remaining > 3) {
+ *(*d)++ = '\\';
+ *(*d)++ = 'n';
+ (*remaining) -= 2;
+ }
+}
+
+static inline void copy_tab(LOGFMT_STATE *lfs __maybe_unused, char **d, size_t *remaining) {
+ if(*remaining > 3) {
+ *(*d)++ = '\\';
+ *(*d)++ = 't';
+ (*remaining) -= 2;
+ }
+}
+
+static inline bool logftm_parse_value(LOGFMT_STATE *lfs) {
+ static __thread char value[JOURNAL_MAX_VALUE_LEN];
+
+ char quote = '\0';
+ const char *s = logfmt_current_pos(lfs);
+ if(*s == '\"' || *s == '\'') {
+ quote = *s;
+ logfmt_consume_char(lfs);
+ }
+
+ value[0] = '\0';
+ char *d = value;
+ s = logfmt_current_pos(lfs);
+ size_t remaining = sizeof(value);
+
+ char end_char = (char)(quote == '\0' ? ' ' : quote);
+ while (*s && *s != end_char) {
+ char c;
+
+ if (*s == '\\' && s[1] != '\0') { // a lone trailing backslash is copied as-is, so s never moves past the terminator
+ s++;
+
+ switch (*s) {
+ case 'n':
+ copy_newline(lfs, &d, &remaining);
+ s++;
+ continue;
+
+ case 't':
+ copy_tab(lfs, &d, &remaining);
+ s++;
+ continue;
+
+ case 'f':
+ case 'b':
+ case 'r':
+ c = ' ';
+ s++;
+ break;
+
+ default:
+ c = *s++;
+ break;
+ }
+ }
+ else
+ c = *s++;
+
+ if(remaining < 2) {
+ snprintf(lfs->msg, sizeof(lfs->msg),
+ "LOGFMT PARSER: truncated string value at pos %zu", (size_t)lfs->pos);
+ return false;
+ }
+ else {
+ *d++ = c;
+ remaining--;
+ }
+ }
+ *d = '\0';
+ lfs->pos += s - logfmt_current_pos(lfs);
+
+ s = logfmt_current_pos(lfs);
+
+ if(quote != '\0') {
+ if (*s != quote) {
+ snprintf(lfs->msg, sizeof(lfs->msg),
+ "LOGFMT PARSER: missing quote at pos %zu: '%s'",
+ (size_t)lfs->pos, s);
+ return false;
+ }
+ else
+ logfmt_consume_char(lfs);
+ }
+
+ if(d > value)
+ logfmt_process_key_value(lfs, value, d - value);
+
+ return true;
+}
+
+static inline bool logfmt_parse_key(LOGFMT_STATE *lfs) {
+ logfmt_skip_spaces(lfs);
+
+ char *d = &lfs->key[lfs->key_start];
+
+ size_t remaining = sizeof(lfs->key) - (d - lfs->key);
+
+ const char *s = logfmt_current_pos(lfs);
+ char last_c = '\0';
+ while(*s && *s != '=') {
+ char c;
+
+ if (*s == '\\' && s[1] != '\0')
+ s++;
+
+ c = journal_key_characters_map[(unsigned char)*s++];
+
+ if(c == '_' && last_c == '_')
+ continue;
+ else {
+ if(remaining < 2) {
+ snprintf(lfs->msg, sizeof(lfs->msg),
+ "LOGFMT PARSER: key buffer full - keys are too long, at pos %zu", (size_t)lfs->pos);
+ return false;
+ }
+ *d++ = c;
+ remaining--;
+ }
+
+ last_c = c;
+ }
+ *d = '\0';
+ lfs->pos += s - logfmt_current_pos(lfs);
+
+ s = logfmt_current_pos(lfs);
+ if(*s != '=') {
+ snprintf(lfs->msg, sizeof(lfs->msg),
+ "LOGFMT PARSER: key is missing the equal sign, at pos %zu", (size_t)lfs->pos);
+ return false;
+ }
+
+ logfmt_consume_char(lfs);
+
+ return true;
+}
+
+LOGFMT_STATE *logfmt_parser_create(LOG_JOB *jb) {
+ LOGFMT_STATE *lfs = mallocz(sizeof(LOGFMT_STATE));
+ memset(lfs, 0, sizeof(LOGFMT_STATE));
+ lfs->jb = jb;
+
+ if(jb->prefix)
+ lfs->key_start = copy_to_buffer(lfs->key, sizeof(lfs->key), lfs->jb->prefix, strlen(lfs->jb->prefix));
+
+ return lfs;
+}
+
+void logfmt_parser_destroy(LOGFMT_STATE *lfs) {
+ if(lfs)
+ freez(lfs);
+}
+
+const char *logfmt_parser_error(LOGFMT_STATE *lfs) {
+ return lfs->msg;
+}
+
+bool logfmt_parse_document(LOGFMT_STATE *lfs, const char *txt) {
+ lfs->line = txt;
+ lfs->pos = 0;
+ lfs->msg[0] = '\0';
+
+ const char *s;
+ do {
+ if(!logfmt_parse_key(lfs))
+ return false;
+
+ if(!logftm_parse_value(lfs))
+ return false;
+
+ logfmt_skip_spaces(lfs);
+
+ s = logfmt_current_pos(lfs);
+ } while(*s);
+
+ return true;
+}
+
+
+void logfmt_test(void) {
+ LOG_JOB jb = { .prefix = "NGINX_" };
+ LOGFMT_STATE *logfmt = logfmt_parser_create(&jb);
+
+ logfmt_parse_document(logfmt, "x=1 y=2 z=\"3 \\ 4\" 5 ");
+
+ logfmt_parser_destroy(logfmt);
+}
diff --git a/collectors/log2journal/log2journal-params.c b/collectors/log2journal/log2journal-params.c
new file mode 100644
index 00000000..a7bb3e26
--- /dev/null
+++ b/collectors/log2journal/log2journal-params.c
@@ -0,0 +1,404 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "log2journal.h"
+
+// ----------------------------------------------------------------------------
+
+void log_job_init(LOG_JOB *jb) {
+ memset(jb, 0, sizeof(*jb));
+ simple_hashtable_init_KEY(&jb->hashtable, 32);
+ hashed_key_set(&jb->line.key, "LINE");
+}
+
+static void simple_hashtable_cleanup_allocated_keys(SIMPLE_HASHTABLE_KEY *ht) {
+ SIMPLE_HASHTABLE_FOREACH_READ_ONLY(ht, sl, _KEY) {
+ HASHED_KEY *k = SIMPLE_HASHTABLE_FOREACH_READ_ONLY_VALUE(sl);
+ if(k && k->flags & HK_HASHTABLE_ALLOCATED) {
+ // the order of these statements is important!
+ simple_hashtable_del_slot_KEY(ht, sl); // remove any references to k
+ hashed_key_cleanup(k); // cleanup the internals of k
+ freez(k); // free k
+ }
+ }
+}
+
+void log_job_cleanup(LOG_JOB *jb) {
+ hashed_key_cleanup(&jb->line.key);
+
+ if(jb->prefix) {
+ freez((void *) jb->prefix);
+ jb->prefix = NULL;
+ }
+
+ if(jb->pattern) {
+ freez((void *) jb->pattern);
+ jb->pattern = NULL;
+ }
+
+ for(size_t i = 0; i < jb->injections.used ;i++)
+ injection_cleanup(&jb->injections.keys[i]);
+
+ for(size_t i = 0; i < jb->unmatched.injections.used ;i++)
+ injection_cleanup(&jb->unmatched.injections.keys[i]);
+
+ for(size_t i = 0; i < jb->renames.used ;i++)
+ rename_cleanup(&jb->renames.array[i]);
+
+ for(size_t i = 0; i < jb->rewrites.used; i++)
+ rewrite_cleanup(&jb->rewrites.array[i]);
+
+ txt_cleanup(&jb->rewrites.tmp);
+ txt_cleanup(&jb->filename.current);
+
+ simple_hashtable_cleanup_allocated_keys(&jb->hashtable);
+ simple_hashtable_destroy_KEY(&jb->hashtable);
+
+ // remove references to everything else, to reveal them in valgrind
+ memset(jb, 0, sizeof(*jb));
+}
+
+// ----------------------------------------------------------------------------
+
+bool log_job_filename_key_set(LOG_JOB *jb, const char *key, size_t key_len) {
+ if(!key || !*key) {
+ log2stderr("filename key cannot be empty.");
+ return false;
+ }
+
+ hashed_key_len_set(&jb->filename.key, key, key_len);
+
+ return true;
+}
+
+bool log_job_key_prefix_set(LOG_JOB *jb, const char *prefix, size_t prefix_len) {
+ if(!prefix || !*prefix) {
+ log2stderr("prefix cannot be empty.");
+ return false;
+ }
+
+ if(jb->prefix)
+ freez((char*)jb->prefix);
+
+ jb->prefix = strndupz(prefix, prefix_len);
+
+ return true;
+}
+
+bool log_job_pattern_set(LOG_JOB *jb, const char *pattern, size_t pattern_len) {
+ if(!pattern || !*pattern) {
+ log2stderr("pattern cannot be empty.");
+ return false;
+ }
+
+ if(jb->pattern)
+ freez((char*)jb->pattern);
+
+ jb->pattern = strndupz(pattern, pattern_len);
+
+ return true;
+}
+
+bool log_job_include_pattern_set(LOG_JOB *jb, const char *pattern, size_t pattern_len) {
+ if(jb->filter.include.re) {
+ log2stderr("FILTER INCLUDE: there is already an include filter set");
+ return false;
+ }
+
+ if(!search_pattern_set(&jb->filter.include, pattern, pattern_len)) {
+ log2stderr("FILTER INCLUDE: failed: %s", jb->filter.include.error.txt);
+ return false;
+ }
+
+ return true;
+}
+
+bool log_job_exclude_pattern_set(LOG_JOB *jb, const char *pattern, size_t pattern_len) {
+ if(jb->filter.exclude.re) {
+ log2stderr("FILTER EXCLUDE: there is already an exclude filter set");
+ return false;
+ }
+
+ if(!search_pattern_set(&jb->filter.exclude, pattern, pattern_len)) {
+ log2stderr("FILTER EXCLUDE: failed: %s", jb->filter.exclude.error.txt);
+ return false;
+ }
+
+ return true;
+}
+
+// ----------------------------------------------------------------------------
+
+static bool parse_rename(LOG_JOB *jb, const char *param) {
+ // Search for '=' in param
+ const char *equal_sign = strchr(param, '=');
+ if (!equal_sign || equal_sign == param) {
+ log2stderr("Error: Invalid rename format, '=' not found in %s", param);
+ return false;
+ }
+
+ const char *new_key = param;
+ size_t new_key_len = equal_sign - new_key;
+
+ const char *old_key = equal_sign + 1;
+ size_t old_key_len = strlen(old_key);
+
+ return log_job_rename_add(jb, new_key, new_key_len, old_key, old_key_len);
+}
+
+static bool is_symbol(char c) {
+ return !isalpha(c) && !isdigit(c) && !iscntrl(c);
+}
+
+struct {
+ const char *keyword;
+ int action;
+ RW_FLAGS flag;
+} rewrite_flags[] = {
+ {"match", 1, RW_MATCH_PCRE2},
+ {"match", 0, RW_MATCH_NON_EMPTY},
+
+ {"regex", 1, RW_MATCH_PCRE2},
+ {"regex", 0, RW_MATCH_NON_EMPTY},
+
+ {"pcre2", 1, RW_MATCH_PCRE2},
+ {"pcre2", 0, RW_MATCH_NON_EMPTY},
+
+ {"non_empty", 1, RW_MATCH_NON_EMPTY},
+ {"non_empty", 0, RW_MATCH_PCRE2},
+
+ {"non-empty", 1, RW_MATCH_NON_EMPTY},
+ {"non-empty", 0, RW_MATCH_PCRE2},
+
+ {"not_empty", 1, RW_MATCH_NON_EMPTY},
+ {"not_empty", 0, RW_MATCH_PCRE2},
+
+ {"not-empty", 1, RW_MATCH_NON_EMPTY},
+ {"not-empty", 0, RW_MATCH_PCRE2},
+
+ {"stop", 0, RW_DONT_STOP},
+ {"no-stop", 1, RW_DONT_STOP},
+ {"no_stop", 1, RW_DONT_STOP},
+ {"dont-stop", 1, RW_DONT_STOP},
+ {"dont_stop", 1, RW_DONT_STOP},
+ {"continue", 1, RW_DONT_STOP},
+ {"inject", 1, RW_INJECT},
+ {"existing", 0, RW_INJECT},
+};
+
+RW_FLAGS parse_rewrite_flags(const char *options) {
+ RW_FLAGS flags = RW_MATCH_PCRE2; // Default option
+
+ // Tokenize the input options using ","
+ char *token;
+ char *optionsCopy = strdup(options); // Make a copy to avoid modifying the original
+ token = strtok(optionsCopy, ",");
+
+ while (token != NULL) {
+ // Find the keyword-action mapping
+ bool found = false;
+
+ for (size_t i = 0; i < sizeof(rewrite_flags) / sizeof(rewrite_flags[0]); i++) {
+ if (strcmp(token, rewrite_flags[i].keyword) == 0) {
+ if (rewrite_flags[i].action == 1) {
+ flags |= rewrite_flags[i].flag; // Set the flag
+ } else {
+ flags &= ~rewrite_flags[i].flag; // Unset the flag
+ }
+
+ found = true;
+ }
+ }
+
+ if(!found)
+ log2stderr("Warning: rewrite option '%s' is not understood.", token);
+
+ // Get the next token
+ token = strtok(NULL, ",");
+ }
+
+ free(optionsCopy); // Free the copied string
+
+ return flags;
+}
+
+
+static bool parse_rewrite(LOG_JOB *jb, const char *param) {
+ // Search for '=' in param
+ const char *equal_sign = strchr(param, '=');
+ if (!equal_sign || equal_sign == param) {
+ log2stderr("Error: Invalid rewrite format, '=' not found in %s", param);
+ return false;
+ }
+
+ // Get the next character as the separator
+ char separator = *(equal_sign + 1);
+ if (!separator || !is_symbol(separator)) {
+ log2stderr("Error: rewrite separator not found after '=', or is not one of /\\|-# in: %s", param);
+ return false;
+ }
+
+ // Find the next occurrence of the separator
+ const char *second_separator = strchr(equal_sign + 2, separator);
+ if (!second_separator) {
+ log2stderr("Error: rewrite second separator not found in: %s", param);
+ return false;
+ }
+
+ // Check if the search pattern is empty
+ if (equal_sign + 1 == second_separator) {
+ log2stderr("Error: rewrite search pattern is empty in: %s", param);
+ return false;
+ }
+
+ // Check if the replacement pattern is empty
+ if (*(second_separator + 1) == '\0') {
+ log2stderr("Error: rewrite replacement pattern is empty in: %s", param);
+ return false;
+ }
+
+ RW_FLAGS flags = RW_MATCH_PCRE2;
+ const char *third_separator = strchr(second_separator + 1, separator);
+ if(third_separator)
+ flags = parse_rewrite_flags(third_separator + 1);
+
+ // Extract key, search pattern, and replacement pattern
+ char *key = strndupz(param, equal_sign - param);
+ char *search_pattern = strndupz(equal_sign + 2, second_separator - (equal_sign + 2));
+ char *replace_pattern = third_separator ? strndupz(second_separator + 1, third_separator - (second_separator + 1)) : strdupz(second_separator + 1);
+
+ if(!*search_pattern)
+ flags &= ~RW_MATCH_PCRE2;
+
+ bool ret = log_job_rewrite_add(jb, key, flags, search_pattern, replace_pattern);
+
+ freez(key);
+ freez(search_pattern);
+ freez(replace_pattern);
+
+ return ret;
+}
+
+static bool parse_inject(LOG_JOB *jb, const char *value, bool unmatched) {
+ const char *equal = strchr(value, '=');
+ if (!equal) {
+ log2stderr("Error: injection '%s' does not have an equal sign.", value);
+ return false;
+ }
+
+ const char *key = value;
+ const char *val = equal + 1;
+ log_job_injection_add(jb, key, equal - key, val, strlen(val), unmatched);
+
+ return true;
+}
+
+bool log_job_command_line_parse_parameters(LOG_JOB *jb, int argc, char **argv) {
+ for (int i = 1; i < argc; i++) {
+ char *arg = argv[i];
+ if (strcmp(arg, "--help") == 0 || strcmp(arg, "-h") == 0) {
+ log_job_command_line_help(argv[0]);
+ exit(0);
+ }
+#if defined(NETDATA_DEV_MODE) || defined(NETDATA_INTERNAL_CHECKS)
+ else if(strcmp(arg, "--test") == 0) {
+ // logfmt_test();
+ json_test();
+ exit(1);
+ }
+#endif
+ else if (strcmp(arg, "--show-config") == 0) {
+ jb->show_config = true;
+ }
+ else {
+ char buffer[1024];
+ char *param = NULL;
+ char *value = NULL;
+
+ char *equal_sign = strchr(arg, '=');
+ if (equal_sign) {
+ copy_to_buffer(buffer, sizeof(buffer), arg, equal_sign - arg);
+ param = buffer;
+ value = equal_sign + 1;
+ }
+ else {
+ param = arg;
+ if (i + 1 < argc) {
+ value = argv[++i];
+ }
+ else {
+ if (!jb->pattern) {
+ log_job_pattern_set(jb, arg, strlen(arg));
+ continue;
+ } else {
+ log2stderr("Error: Multiple patterns detected. Specify only one pattern. The first is '%s', the second is '%s'", jb->pattern, arg);
+ return false;
+ }
+ }
+ }
+
+ if (strcmp(param, "--filename-key") == 0) {
+ if(!log_job_filename_key_set(jb, value, value ? strlen(value) : 0))
+ return false;
+ }
+ else if (strcmp(param, "--prefix") == 0) {
+ if(!log_job_key_prefix_set(jb, value, value ? strlen(value) : 0))
+ return false;
+ }
+#ifdef HAVE_LIBYAML
+ else if (strcmp(param, "-f") == 0 || strcmp(param, "--file") == 0) {
+ if (!yaml_parse_file(value, jb))
+ return false;
+ }
+ else if (strcmp(param, "-c") == 0 || strcmp(param, "--config") == 0) {
+ if (!yaml_parse_config(value, jb))
+ return false;
+ }
+#endif
+ else if (strcmp(param, "--unmatched-key") == 0)
+ hashed_key_set(&jb->unmatched.key, value);
+ else if (strcmp(param, "--inject") == 0) {
+ if (!parse_inject(jb, value, false))
+ return false;
+ }
+ else if (strcmp(param, "--inject-unmatched") == 0) {
+ if (!parse_inject(jb, value, true))
+ return false;
+ }
+ else if (strcmp(param, "--rewrite") == 0) {
+ if (!parse_rewrite(jb, value))
+ return false;
+ }
+ else if (strcmp(param, "--rename") == 0) {
+ if (!parse_rename(jb, value))
+ return false;
+ }
+ else if (strcmp(param, "--include") == 0) {
+ if (!log_job_include_pattern_set(jb, value, strlen(value)))
+ return false;
+ }
+ else if (strcmp(param, "--exclude") == 0) {
+ if (!log_job_exclude_pattern_set(jb, value, strlen(value)))
+ return false;
+ }
+ else {
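+ if (!equal_sign) i--; // un-consume the lookahead value only if one was taken; otherwise this argument would be parsed twice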
+ if (!jb->pattern) {
+ log_job_pattern_set(jb, arg, strlen(arg));
+ continue;
+ } else {
+ log2stderr("Error: Multiple patterns detected. Specify only one pattern. The first is '%s', the second is '%s'", jb->pattern, arg);
+ return false;
+ }
+ }
+ }
+ }
+
+ // Check if a pattern is set and exactly one pattern is specified
+ if (!jb->pattern) {
+ log2stderr("Warning: pattern not specified. Try the default config with: -c default");
+ log_job_command_line_help(argv[0]);
+ return false;
+ }
+
+ return true;
+}
diff --git a/collectors/log2journal/log2journal-pattern.c b/collectors/log2journal/log2journal-pattern.c
new file mode 100644
index 00000000..4b7e9026
--- /dev/null
+++ b/collectors/log2journal/log2journal-pattern.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "log2journal.h"
+
+void search_pattern_cleanup(SEARCH_PATTERN *sp) {
+ if(sp->pattern) {
+ freez((void *)sp->pattern);
+ sp->pattern = NULL;
+ }
+
+ if(sp->re) {
+ pcre2_code_free(sp->re);
+ sp->re = NULL;
+ }
+
+ if(sp->match_data) {
+ pcre2_match_data_free(sp->match_data);
+ sp->match_data = NULL;
+ }
+
+ txt_cleanup(&sp->error);
+}
+
+static void pcre2_error_message(SEARCH_PATTERN *sp, int rc, int pos) {
+ char msg[1024];
+ pcre2_get_error_in_buffer(msg, sizeof(msg), rc, pos);
+ txt_replace(&sp->error, msg, strlen(msg));
+}
+
+static inline bool compile_pcre2(SEARCH_PATTERN *sp) {
+ int error_number;
+ PCRE2_SIZE error_offset;
+ PCRE2_SPTR pattern_ptr = (PCRE2_SPTR)sp->pattern;
+
+ sp->re = pcre2_compile(pattern_ptr, PCRE2_ZERO_TERMINATED, 0, &error_number, &error_offset, NULL);
+ if (!sp->re) {
+ pcre2_error_message(sp, error_number, (int) error_offset);
+ return false;
+ }
+
+ return true;
+}
+
+bool search_pattern_set(SEARCH_PATTERN *sp, const char *search_pattern, size_t search_pattern_len) {
+ search_pattern_cleanup(sp);
+
+ sp->pattern = strndupz(search_pattern, search_pattern_len);
+ if (!compile_pcre2(sp))
+ return false;
+
+ sp->match_data = pcre2_match_data_create_from_pattern(sp->re, NULL);
+
+ return true;
+}
diff --git a/collectors/log2journal/log2journal-pcre2.c b/collectors/log2journal/log2journal-pcre2.c
new file mode 100644
index 00000000..185e6910
--- /dev/null
+++ b/collectors/log2journal/log2journal-pcre2.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "log2journal.h"
+
+#define PCRE2_ERROR_LINE_MAX 1024
+#define PCRE2_KEY_MAX 1024
+
+struct pcre2_state {
+ LOG_JOB *jb;
+
+ const char *line;
+ uint32_t pos;
+ uint32_t key_start;
+
+ pcre2_code *re;
+ pcre2_match_data *match_data;
+
+ char key[PCRE2_KEY_MAX];
+ char msg[PCRE2_ERROR_LINE_MAX];
+};
+
+static inline void copy_and_convert_key(PCRE2_STATE *pcre2, const char *key) {
+ char *d = &pcre2->key[pcre2->key_start];
+ size_t remaining = sizeof(pcre2->key) - pcre2->key_start;
+
+ while(remaining >= 2 && *key) {
+ *d = journal_key_characters_map[(unsigned char) (*key)];
+ remaining--;
+ key++;
+ d++;
+ }
+
+ *d = '\0';
+}
+
+static inline void jb_traverse_pcre2_named_groups_and_send_keys(PCRE2_STATE *pcre2, pcre2_code *re, pcre2_match_data *match_data, char *line) {
+ PCRE2_SIZE *ovector = pcre2_get_ovector_pointer(match_data);
+ uint32_t names_count;
+ pcre2_pattern_info(re, PCRE2_INFO_NAMECOUNT, &names_count);
+
+ if (names_count > 0) {
+ PCRE2_SPTR name_table;
+ pcre2_pattern_info(re, PCRE2_INFO_NAMETABLE, &name_table);
+ uint32_t name_entry_size;
+ pcre2_pattern_info(re, PCRE2_INFO_NAMEENTRYSIZE, &name_entry_size);
+
+ const unsigned char *table_ptr = name_table;
+ for (uint32_t i = 0; i < names_count; i++) {
+ int n = (table_ptr[0] << 8) | table_ptr[1];
+ const char *group_name = (const char *)(table_ptr + 2);
+
+ PCRE2_SIZE start_offset = ovector[2 * n];
+ PCRE2_SIZE end_offset = ovector[2 * n + 1];
+ PCRE2_SIZE group_length = end_offset - start_offset;
+
+ copy_and_convert_key(pcre2, group_name);
+ log_job_send_extracted_key_value(pcre2->jb, pcre2->key, line + start_offset, group_length);
+
+ table_ptr += name_entry_size;
+ }
+ }
+}
+
+void pcre2_get_error_in_buffer(char *msg, size_t msg_len, int rc, int pos) {
+ int l;
+
+ if(pos >= 0)
+ l = snprintf(msg, msg_len, "PCRE2 error %d at pos %d on: ", rc, pos);
+ else
+ l = snprintf(msg, msg_len, "PCRE2 error %d on: ", rc);
+
+ pcre2_get_error_message(rc, (PCRE2_UCHAR *)&msg[l], msg_len - l);
+}
+
+static void pcre2_error_message(PCRE2_STATE *pcre2, int rc, int pos) {
+ pcre2_get_error_in_buffer(pcre2->msg, sizeof(pcre2->msg), rc, pos);
+}
+
+bool pcre2_has_error(PCRE2_STATE *pcre2) {
+ return !pcre2->re || pcre2->msg[0];
+}
+
+PCRE2_STATE *pcre2_parser_create(LOG_JOB *jb) {
+ PCRE2_STATE *pcre2 = mallocz(sizeof(PCRE2_STATE));
+ memset(pcre2, 0, sizeof(PCRE2_STATE));
+ pcre2->jb = jb;
+
+ if(jb->prefix)
+ pcre2->key_start = copy_to_buffer(pcre2->key, sizeof(pcre2->key), pcre2->jb->prefix, strlen(pcre2->jb->prefix));
+
+ int rc;
+ PCRE2_SIZE pos;
+ pcre2->re = pcre2_compile((PCRE2_SPTR)jb->pattern, PCRE2_ZERO_TERMINATED, 0, &rc, &pos, NULL);
+ if (!pcre2->re) {
+ pcre2_error_message(pcre2, rc, pos);
+ return pcre2;
+ }
+
+ pcre2->match_data = pcre2_match_data_create_from_pattern(pcre2->re, NULL);
+
+ return pcre2;
+}
+
+void pcre2_parser_destroy(PCRE2_STATE *pcre2) {
+ if(pcre2)
+ freez(pcre2);
+}
+
+const char *pcre2_parser_error(PCRE2_STATE *pcre2) {
+ return pcre2->msg;
+}
+
+bool pcre2_parse_document(PCRE2_STATE *pcre2, const char *txt, size_t len) {
+ pcre2->line = txt;
+ pcre2->pos = 0;
+ pcre2->msg[0] = '\0';
+
+ if(!len)
+ len = strlen(txt);
+
+ int rc = pcre2_match(pcre2->re, (PCRE2_SPTR)pcre2->line, len, 0, 0, pcre2->match_data, NULL);
+ if(rc < 0) {
+ pcre2_error_message(pcre2, rc, -1);
+ return false;
+ }
+
+ jb_traverse_pcre2_named_groups_and_send_keys(pcre2, pcre2->re, pcre2->match_data, (char *)pcre2->line);
+
+ return true;
+}
+
+void pcre2_test(void) {
+ LOG_JOB jb = { .prefix = "NGINX_" };
+ PCRE2_STATE *pcre2 = pcre2_parser_create(&jb);
+
+ pcre2_parse_document(pcre2, "{\"value\":\"\\u\\u039A\\u03B1\\u03BB\\u03B7\\u03BC\\u03AD\\u03C1\\u03B1\"}", 0);
+
+ pcre2_parser_destroy(pcre2);
+}
diff --git a/collectors/log2journal/log2journal-rename.c b/collectors/log2journal/log2journal-rename.c
new file mode 100644
index 00000000..c6975779
--- /dev/null
+++ b/collectors/log2journal/log2journal-rename.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "log2journal.h"
+
+void rename_cleanup(RENAME *rn) {
+ hashed_key_cleanup(&rn->new_key);
+ hashed_key_cleanup(&rn->old_key);
+}
+
+bool log_job_rename_add(LOG_JOB *jb, const char *new_key, size_t new_key_len, const char *old_key, size_t old_key_len) {
+ if(jb->renames.used >= MAX_RENAMES) {
+ log2stderr("Error: too many renames. You can rename up to %d fields.", MAX_RENAMES);
+ return false;
+ }
+
+ RENAME *rn = &jb->renames.array[jb->renames.used++];
+ hashed_key_len_set(&rn->new_key, new_key, new_key_len);
+ hashed_key_len_set(&rn->old_key, old_key, old_key_len);
+
+ return true;
+}
diff --git a/collectors/log2journal/log2journal-replace.c b/collectors/log2journal/log2journal-replace.c
new file mode 100644
index 00000000..429d615d
--- /dev/null
+++ b/collectors/log2journal/log2journal-replace.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "log2journal.h"
+
+void replace_node_free(REPLACE_NODE *rpn) {
+ hashed_key_cleanup(&rpn->name);
+ rpn->next = NULL;
+ freez(rpn);
+}
+
+void replace_pattern_cleanup(REPLACE_PATTERN *rp) {
+ if(rp->pattern) {
+ freez((void *)rp->pattern);
+ rp->pattern = NULL;
+ }
+
+ while(rp->nodes) {
+ REPLACE_NODE *rpn = rp->nodes;
+ rp->nodes = rpn->next;
+ replace_node_free(rpn);
+ }
+}
+
+static REPLACE_NODE *replace_pattern_add_node(REPLACE_NODE **head, bool is_variable, const char *text) {
+ REPLACE_NODE *new_node = callocz(1, sizeof(REPLACE_NODE));
+ if (!new_node)
+ return NULL;
+
+ hashed_key_set(&new_node->name, text);
+ new_node->is_variable = is_variable;
+ new_node->next = NULL;
+
+ if (*head == NULL)
+ *head = new_node;
+
+ else {
+ REPLACE_NODE *current = *head;
+
+ // append it
+ while (current->next != NULL)
+ current = current->next;
+
+ current->next = new_node;
+ }
+
+ return new_node;
+}
+
+bool replace_pattern_set(REPLACE_PATTERN *rp, const char *pattern) {
+ replace_pattern_cleanup(rp);
+
+ rp->pattern = strdupz(pattern);
+ const char *current = rp->pattern;
+
+ while (*current != '\0') {
+ if (*current == '$' && *(current + 1) == '{') {
+ // Start of a variable
+ const char *end = strchr(current, '}');
+ if (!end) {
+ log2stderr("Error: Missing closing brace in replacement pattern: %s", rp->pattern);
+ return false;
+ }
+
+ size_t name_length = end - current - 2; // Length of the variable name
+ char *variable_name = strndupz(current + 2, name_length);
+ if (!variable_name) {
+ log2stderr("Error: Memory allocation failed for variable name.");
+ return false;
+ }
+
+ REPLACE_NODE *node = replace_pattern_add_node(&(rp->nodes), true, variable_name);
+ freez(variable_name); // the node keeps its own copy of the name
+ if (!node) {
+ log2stderr("Error: Failed to add replacement node for variable.");
+ return false;
+ }
+
+ current = end + 1; // Move past the variable
+ }
+ else {
+ // Start of literal text
+ const char *start = current;
+ while (*current != '\0' && !(*current == '$' && *(current + 1) == '{')) {
+ current++;
+ }
+
+ size_t text_length = current - start;
+ char *text = strndupz(start, text_length);
+ if (!text) {
+ log2stderr("Error: Memory allocation failed for literal text.");
+ return false;
+ }
+
+ REPLACE_NODE *node = replace_pattern_add_node(&(rp->nodes), false, text);
+ freez(text); // the node keeps its own copy of the text
+ if (!node) {
+ log2stderr("Error: Failed to add replacement node for text.");
+ return false;
+ }
+ }
+ }
+
+ for(REPLACE_NODE *node = rp->nodes; node; node = node->next) {
+ if(node->is_variable) {
+ rp->has_variables = true;
+ break;
+ }
+ }
+
+ return true;
+}
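The `${name}` splitting that `replace_pattern_set()` performs can be tried in isolation. The following is a minimal sketch, not the code from this patch: `segment_t` and `split_pattern()` are hypothetical names introduced for illustration, and the segments point into the pattern instead of being copied into `REPLACE_NODE` entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical segment type for this sketch; the patch stores REPLACE_NODE entries instead.
typedef struct {
    const char *start;  // points into the pattern, not NUL-terminated
    size_t len;         // number of bytes in this segment
    bool is_variable;   // true for the NAME inside ${NAME}
} segment_t;

// Splits pattern into literal and ${variable} segments.
// Returns the number of segments written, or -1 when a closing brace
// is missing or when max segments are not enough.
static int split_pattern(const char *pattern, segment_t *out, int max) {
    assert(pattern && out);

    int used = 0;
    const char *cur = pattern;

    while (*cur) {
        if (used >= max)
            return -1;

        if (cur[0] == '$' && cur[1] == '{') {
            // variable: everything between "${" and the next '}'
            const char *end = strchr(cur, '}');
            if (!end)
                return -1;

            out[used++] = (segment_t){ .start = cur + 2, .len = (size_t)(end - cur - 2), .is_variable = true };
            cur = end + 1;
        }
        else {
            // literal: everything up to the next "${" or the end of the string
            const char *start = cur;
            while (*cur && !(cur[0] == '$' && cur[1] == '{'))
                cur++;

            out[used++] = (segment_t){ .start = start, .len = (size_t)(cur - start), .is_variable = false };
        }
    }

    return used;
}
```

For example, `split_pattern("GET ${path} -> ${status}", segs, 8)` yields four segments: the literal `GET `, the variable `path`, the literal ` -> ` and the variable `status`. A pattern such as `broken ${name` returns -1, mirroring the missing closing brace error in the patch.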
diff --git a/collectors/log2journal/log2journal-rewrite.c b/collectors/log2journal/log2journal-rewrite.c
new file mode 100644
index 00000000..112391bf
--- /dev/null
+++ b/collectors/log2journal/log2journal-rewrite.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "log2journal.h"
+
+void rewrite_cleanup(REWRITE *rw) {
+ hashed_key_cleanup(&rw->key);
+
+ if(rw->flags & RW_MATCH_PCRE2)
+ search_pattern_cleanup(&rw->match_pcre2);
+ else if(rw->flags & RW_MATCH_NON_EMPTY)
+ replace_pattern_cleanup(&rw->match_non_empty);
+
+ replace_pattern_cleanup(&rw->value);
+ rw->flags = RW_NONE;
+}
+
+bool log_job_rewrite_add(LOG_JOB *jb, const char *key, RW_FLAGS flags, const char *search_pattern, const char *replace_pattern) {
+ if(jb->rewrites.used >= MAX_REWRITES) {
+ log2stderr("Error: too many rewrites. You can add up to %d rewrite rules.", MAX_REWRITES);
+ return false;
+ }
+
+ if((flags & (RW_MATCH_PCRE2|RW_MATCH_NON_EMPTY)) && (!search_pattern || !*search_pattern)) {
+ log2stderr("Error: rewrite for key '%s' does not specify a search pattern.", key);
+ return false;
+ }
+
+ REWRITE *rw = &jb->rewrites.array[jb->rewrites.used++];
+ rw->flags = flags;
+
+ hashed_key_set(&rw->key, key);
+
+ if((flags & RW_MATCH_PCRE2) && !search_pattern_set(&rw->match_pcre2, search_pattern, strlen(search_pattern))) {
+ rewrite_cleanup(rw);
+ jb->rewrites.used--;
+ return false;
+ }
+ else if((flags & RW_MATCH_NON_EMPTY) && !replace_pattern_set(&rw->match_non_empty, search_pattern)) {
+ rewrite_cleanup(rw);
+ jb->rewrites.used--;
+ return false;
+ }
+
+ if(replace_pattern && *replace_pattern && !replace_pattern_set(&rw->value, replace_pattern)) {
+ rewrite_cleanup(rw);
+ jb->rewrites.used--;
+ return false;
+ }
+
+ return true;
+}
diff --git a/collectors/log2journal/log2journal-yaml.c b/collectors/log2journal/log2journal-yaml.c
new file mode 100644
index 00000000..862e7bf4
--- /dev/null
+++ b/collectors/log2journal/log2journal-yaml.c
@@ -0,0 +1,964 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "log2journal.h"
+
+// ----------------------------------------------------------------------------
+// yaml configuration file
+
+#ifdef HAVE_LIBYAML
+
+static const char *yaml_event_name(yaml_event_type_t type) {
+ switch (type) {
+ case YAML_NO_EVENT:
+ return "YAML_NO_EVENT";
+
+ case YAML_SCALAR_EVENT:
+ return "YAML_SCALAR_EVENT";
+
+ case YAML_ALIAS_EVENT:
+ return "YAML_ALIAS_EVENT";
+
+ case YAML_MAPPING_START_EVENT:
+ return "YAML_MAPPING_START_EVENT";
+
+ case YAML_MAPPING_END_EVENT:
+ return "YAML_MAPPING_END_EVENT";
+
+ case YAML_SEQUENCE_START_EVENT:
+ return "YAML_SEQUENCE_START_EVENT";
+
+ case YAML_SEQUENCE_END_EVENT:
+ return "YAML_SEQUENCE_END_EVENT";
+
+ case YAML_STREAM_START_EVENT:
+ return "YAML_STREAM_START_EVENT";
+
+ case YAML_STREAM_END_EVENT:
+ return "YAML_STREAM_END_EVENT";
+
+ case YAML_DOCUMENT_START_EVENT:
+ return "YAML_DOCUMENT_START_EVENT";
+
+ case YAML_DOCUMENT_END_EVENT:
+ return "YAML_DOCUMENT_END_EVENT";
+
+ default:
+ return "UNKNOWN";
+ }
+}
+
+#define yaml_error(parser, event, fmt, args...) yaml_error_with_trace(parser, event, __LINE__, __FUNCTION__, __FILE__, fmt, ##args)
+static void yaml_error_with_trace(yaml_parser_t *parser, yaml_event_t *event, size_t line, const char *function, const char *file, const char *format, ...) __attribute__ ((format(__printf__, 6, 7)));
+static void yaml_error_with_trace(yaml_parser_t *parser, yaml_event_t *event, size_t line, const char *function, const char *file, const char *format, ...) {
+ char buf[1024] = ""; // Initialize buf to an empty string
+ const char *type = "";
+
+ if(event) {
+ type = yaml_event_name(event->type);
+
+ switch (event->type) {
+ case YAML_SCALAR_EVENT:
+ copy_to_buffer(buf, sizeof(buf), (char *)event->data.scalar.value, event->data.scalar.length);
+ break;
+
+ case YAML_ALIAS_EVENT:
+ snprintf(buf, sizeof(buf), "%s", event->data.alias.anchor);
+ break;
+
+ default:
+ break;
+ }
+ }
+
+ fprintf(stderr, "YAML %zu@%s, %s(): (line %d, column %d, %s%s%s): ",
+ line, file, function,
+ (int)(parser->mark.line + 1), (int)(parser->mark.column + 1),
+ type, buf[0]? ", near ": "", buf);
+
+ va_list args;
+ va_start(args, format);
+ vfprintf(stderr, format, args);
+ va_end(args);
+ fprintf(stderr, "\n");
+}
+
+#define yaml_parse(parser, event) yaml_parse_with_trace(parser, event, __LINE__, __FUNCTION__, __FILE__)
+static bool yaml_parse_with_trace(yaml_parser_t *parser, yaml_event_t *event, size_t line __maybe_unused, const char *function __maybe_unused, const char *file __maybe_unused) {
+ if (!yaml_parser_parse(parser, event)) {
+ yaml_error(parser, NULL, "YAML parser error %d", parser->error);
+ return false;
+ }
+
+// fprintf(stderr, ">>> %s >>> %.*s\n",
+// yaml_event_name(event->type),
+// event->type == YAML_SCALAR_EVENT ? event->data.scalar.length : 0,
+// event->type == YAML_SCALAR_EVENT ? (char *)event->data.scalar.value : "");
+
+ return true;
+}
+
+#define yaml_parse_expect_event(parser, type) yaml_parse_expect_event_with_trace(parser, type, __LINE__, __FUNCTION__, __FILE__)
+static bool yaml_parse_expect_event_with_trace(yaml_parser_t *parser, yaml_event_type_t type, size_t line, const char *function, const char *file) {
+ yaml_event_t event;
+ if (!yaml_parse(parser, &event))
+ return false;
+
+ bool ret = true;
+ if(event.type != type) {
+ yaml_error_with_trace(parser, &event, line, function, file, "unexpected event - expecting: %s", yaml_event_name(type));
+ ret = false;
+ }
+// else
+// fprintf(stderr, "OK (%zu@%s, %s()\n", line, file, function);
+
+ yaml_event_delete(&event);
+ return ret;
+}
+
+#define yaml_scalar_matches(event, s, len) yaml_scalar_matches_with_trace(event, s, len, __LINE__, __FUNCTION__, __FILE__)
+static bool yaml_scalar_matches_with_trace(yaml_event_t *event, const char *s, size_t len, size_t line __maybe_unused, const char *function __maybe_unused, const char *file __maybe_unused) {
+ if(event->type != YAML_SCALAR_EVENT)
+ return false;
+
+ if(len != event->data.scalar.length)
+ return false;
+// else
+// fprintf(stderr, "OK (%zu@%s, %s()\n", line, file, function);
+
+ return strcmp((char *)event->data.scalar.value, s) == 0;
+}
+
+// ----------------------------------------------------------------------------
+
+static size_t yaml_parse_filename_injection(yaml_parser_t *parser, LOG_JOB *jb) {
+ yaml_event_t event;
+ size_t errors = 0;
+
+ if(!yaml_parse_expect_event(parser, YAML_MAPPING_START_EVENT))
+ return 1;
+
+ if (!yaml_parse(parser, &event))
+ return 1;
+
+ if (yaml_scalar_matches(&event, "key", strlen("key"))) {
+ yaml_event_t sub_event;
+ if (!yaml_parse(parser, &sub_event))
+ errors++;
+
+ else {
+ if (sub_event.type == YAML_SCALAR_EVENT) {
+ if(!log_job_filename_key_set(jb, (char *) sub_event.data.scalar.value,
+ sub_event.data.scalar.length))
+ errors++;
+ }
+
+ else {
+ yaml_error(parser, &sub_event, "expected the filename as %s", yaml_event_name(YAML_SCALAR_EVENT));
+ errors++;
+ }
+
+ yaml_event_delete(&sub_event);
+ }
+ }
+
+ if(!yaml_parse_expect_event(parser, YAML_MAPPING_END_EVENT))
+ errors++;
+
+ yaml_event_delete(&event);
+ return errors;
+}
+
+static size_t yaml_parse_filters(yaml_parser_t *parser, LOG_JOB *jb) {
+ if(!yaml_parse_expect_event(parser, YAML_MAPPING_START_EVENT))
+ return 1;
+
+ size_t errors = 0;
+ bool finished = false;
+
+ while(!errors && !finished) {
+ yaml_event_t event;
+
+ if(!yaml_parse(parser, &event))
+ return 1;
+
+ if(event.type == YAML_SCALAR_EVENT) {
+ if(yaml_scalar_matches(&event, "include", strlen("include"))) {
+ yaml_event_t sub_event;
+ if(!yaml_parse(parser, &sub_event))
+ errors++;
+
+ else {
+ if(sub_event.type == YAML_SCALAR_EVENT) {
+ if(!log_job_include_pattern_set(jb, (char *) sub_event.data.scalar.value,
+ sub_event.data.scalar.length))
+ errors++;
+ }
+
+ else {
+ yaml_error(parser, &sub_event, "expected the include as %s",
+ yaml_event_name(YAML_SCALAR_EVENT));
+ errors++;
+ }
+
+ yaml_event_delete(&sub_event);
+ }
+ }
+ else if(yaml_scalar_matches(&event, "exclude", strlen("exclude"))) {
+ yaml_event_t sub_event;
+ if(!yaml_parse(parser, &sub_event))
+ errors++;
+
+ else {
+ if(sub_event.type == YAML_SCALAR_EVENT) {
+ if(!log_job_exclude_pattern_set(jb,(char *) sub_event.data.scalar.value,
+ sub_event.data.scalar.length))
+ errors++;
+ }
+
+ else {
+ yaml_error(parser, &sub_event, "expected the exclude as %s",
+ yaml_event_name(YAML_SCALAR_EVENT));
+ errors++;
+ }
+
+ yaml_event_delete(&sub_event);
+ }
+ }
+ }
+ else if(event.type == YAML_MAPPING_END_EVENT)
+ finished = true;
+ else {
+ yaml_error(parser, &event, "expected %s or %s",
+ yaml_event_name(YAML_SCALAR_EVENT),
+ yaml_event_name(YAML_MAPPING_END_EVENT));
+ errors++;
+ }
+
+ yaml_event_delete(&event);
+ }
+
+ return errors;
+}
+
+static size_t yaml_parse_prefix(yaml_parser_t *parser, LOG_JOB *jb) {
+ yaml_event_t event;
+ size_t errors = 0;
+
+ if (!yaml_parse(parser, &event))
+ return 1;
+
+ if (event.type == YAML_SCALAR_EVENT) {
+ if(!log_job_key_prefix_set(jb, (char *) event.data.scalar.value, event.data.scalar.length))
+ errors++;
+ }
+
+ yaml_event_delete(&event);
+ return errors;
+}
+
+static size_t yaml_parse_constant_field_injection(yaml_parser_t *parser, LOG_JOB *jb, bool unmatched) {
+ yaml_event_t event;
+ if (!yaml_parse(parser, &event) || event.type != YAML_SCALAR_EVENT) {
+ yaml_error(parser, &event, "Expected scalar for constant field injection key");
+ yaml_event_delete(&event);
+ return 1; // one error: the caller adds the return value to its error count
+ }
+
+ char *key = strndupz((char *)event.data.scalar.value, event.data.scalar.length);
+ char *value = NULL;
+ bool ret = false;
+
+ yaml_event_delete(&event);
+
+ if (!yaml_parse(parser, &event) || event.type != YAML_SCALAR_EVENT) {
+ yaml_error(parser, &event, "Expected scalar for constant field injection value");
+ goto cleanup;
+ }
+
+ if(!yaml_scalar_matches(&event, "value", strlen("value"))) {
+ yaml_error(parser, &event, "Expected scalar 'value'");
+ goto cleanup;
+ }
+ yaml_event_delete(&event); // free the 'value' key scalar before the event is reused
+ if (!yaml_parse(parser, &event) || event.type != YAML_SCALAR_EVENT) {
+ yaml_error(parser, &event, "Expected scalar for constant field injection value");
+ goto cleanup;
+ }
+
+ value = strndupz((char *)event.data.scalar.value, event.data.scalar.length);
+
+ // ret reflects whether the injection was added, so a failure is counted as an error
+ if(!log_job_injection_add(jb, key, strlen(key), value, strlen(value), unmatched))
+ ret = false;
+ else
+ ret = true;
+
+
+cleanup:
+ yaml_event_delete(&event);
+ freez(key);
+ freez(value);
+ return !ret ? 1 : 0;
+}
+
+static bool yaml_parse_injection_mapping(yaml_parser_t *parser, LOG_JOB *jb, bool unmatched) {
+ yaml_event_t event;
+ size_t errors = 0;
+ bool finished = false;
+
+ while (!errors && !finished) {
+ if (!yaml_parse(parser, &event)) {
+ errors++;
+ continue;
+ }
+
+ switch (event.type) {
+ case YAML_SCALAR_EVENT:
+ if (yaml_scalar_matches(&event, "key", strlen("key"))) {
+ errors += yaml_parse_constant_field_injection(parser, jb, unmatched);
+ } else {
+ yaml_error(parser, &event, "Unexpected scalar in injection mapping");
+ errors++;
+ }
+ break;
+
+ case YAML_MAPPING_END_EVENT:
+ finished = true;
+ break;
+
+ default:
+ yaml_error(parser, &event, "Unexpected event in injection mapping");
+ errors++;
+ break;
+ }
+
+ yaml_event_delete(&event);
+ }
+
+ return errors == 0;
+}
+
+static size_t yaml_parse_injections(yaml_parser_t *parser, LOG_JOB *jb, bool unmatched) {
+ yaml_event_t event;
+ size_t errors = 0;
+ bool finished = false;
+
+ if (!yaml_parse_expect_event(parser, YAML_SEQUENCE_START_EVENT))
+ return 1;
+
+ while (!errors && !finished) {
+ if (!yaml_parse(parser, &event)) {
+ errors++;
+ continue;
+ }
+
+ switch (event.type) {
+ case YAML_MAPPING_START_EVENT:
+ if (!yaml_parse_injection_mapping(parser, jb, unmatched))
+ errors++;
+ break;
+
+ case YAML_SEQUENCE_END_EVENT:
+ finished = true;
+ break;
+
+ default:
+ yaml_error(parser, &event, "Unexpected event in injections sequence");
+ errors++;
+ break;
+ }
+
+ yaml_event_delete(&event);
+ }
+
+ return errors;
+}
+
+static size_t yaml_parse_unmatched(yaml_parser_t *parser, LOG_JOB *jb) {
+ size_t errors = 0;
+ bool finished = false;
+
+ if (!yaml_parse_expect_event(parser, YAML_MAPPING_START_EVENT))
+ return 1;
+
+ while (!errors && !finished) {
+ yaml_event_t event;
+ if (!yaml_parse(parser, &event)) {
+ errors++;
+ continue;
+ }
+
+ switch (event.type) {
+ case YAML_SCALAR_EVENT:
+ if (yaml_scalar_matches(&event, "key", strlen("key"))) {
+ yaml_event_t sub_event;
+ if (!yaml_parse(parser, &sub_event)) {
+ errors++;
+ } else {
+ if (sub_event.type == YAML_SCALAR_EVENT) {
+ hashed_key_len_set(&jb->unmatched.key, (char *)sub_event.data.scalar.value, sub_event.data.scalar.length);
+ } else {
+ yaml_error(parser, &sub_event, "expected a scalar value for 'key'");
+ errors++;
+ }
+ yaml_event_delete(&sub_event);
+ }
+ } else if (yaml_scalar_matches(&event, "inject", strlen("inject"))) {
+ errors += yaml_parse_injections(parser, jb, true);
+ } else {
+ yaml_error(parser, &event, "Unexpected scalar in unmatched section");
+ errors++;
+ }
+ break;
+
+ case YAML_MAPPING_END_EVENT:
+ finished = true;
+ break;
+
+ default:
+ yaml_error(parser, &event, "Unexpected event in unmatched section");
+ errors++;
+ break;
+ }
+
+ yaml_event_delete(&event);
+ }
+
+ return errors;
+}
+
+static size_t yaml_parse_rewrites(yaml_parser_t *parser, LOG_JOB *jb) {
+ size_t errors = 0;
+
+ if (!yaml_parse_expect_event(parser, YAML_SEQUENCE_START_EVENT))
+ return 1;
+
+ bool finished = false;
+ while (!errors && !finished) {
+ yaml_event_t event;
+ if (!yaml_parse(parser, &event)) {
+ errors++;
+ continue;
+ }
+
+ switch (event.type) {
+ case YAML_MAPPING_START_EVENT:
+ {
+ RW_FLAGS flags = RW_NONE;
+ char *key = NULL;
+ char *search_pattern = NULL;
+ char *replace_pattern = NULL;
+
+ bool mapping_finished = false;
+ while (!errors && !mapping_finished) {
+ yaml_event_t sub_event;
+ if (!yaml_parse(parser, &sub_event)) {
+ errors++;
+ continue;
+ }
+
+ switch (sub_event.type) {
+ case YAML_SCALAR_EVENT:
+ if (yaml_scalar_matches(&sub_event, "key", strlen("key"))) {
+ if (!yaml_parse(parser, &sub_event) || sub_event.type != YAML_SCALAR_EVENT) {
+ yaml_error(parser, &sub_event, "Expected scalar for rewrite key");
+ errors++;
+ } else {
+ key = strndupz((char *)sub_event.data.scalar.value, sub_event.data.scalar.length);
+ yaml_event_delete(&sub_event);
+ }
+ } else if (yaml_scalar_matches(&sub_event, "match", strlen("match"))) {
+ if (!yaml_parse(parser, &sub_event) || sub_event.type != YAML_SCALAR_EVENT) {
+ yaml_error(parser, &sub_event, "Expected scalar for rewrite match PCRE2 pattern");
+ errors++;
+ }
+ else {
+ if(search_pattern)
+ freez(search_pattern);
+ flags |= RW_MATCH_PCRE2;
+ flags &= ~RW_MATCH_NON_EMPTY;
+ search_pattern = strndupz((char *)sub_event.data.scalar.value, sub_event.data.scalar.length);
+ yaml_event_delete(&sub_event);
+ }
+ } else if (yaml_scalar_matches(&sub_event, "not_empty", strlen("not_empty"))) {
+ if (!yaml_parse(parser, &sub_event) || sub_event.type != YAML_SCALAR_EVENT) {
+ yaml_error(parser, &sub_event, "Expected scalar for rewrite not empty condition");
+ errors++;
+ }
+ else {
+ if(search_pattern)
+ freez(search_pattern);
+ flags |= RW_MATCH_NON_EMPTY;
+ flags &= ~RW_MATCH_PCRE2;
+ search_pattern = strndupz((char *)sub_event.data.scalar.value, sub_event.data.scalar.length);
+ yaml_event_delete(&sub_event);
+ }
+ } else if (yaml_scalar_matches(&sub_event, "value", strlen("value"))) {
+ if (!yaml_parse(parser, &sub_event) || sub_event.type != YAML_SCALAR_EVENT) {
+ yaml_error(parser, &sub_event, "Expected scalar for rewrite value");
+ errors++;
+ } else {
+ replace_pattern = strndupz((char *)sub_event.data.scalar.value, sub_event.data.scalar.length);
+ yaml_event_delete(&sub_event);
+ }
+ } else if (yaml_scalar_matches(&sub_event, "stop", strlen("stop"))) {
+ if (!yaml_parse(parser, &sub_event) || sub_event.type != YAML_SCALAR_EVENT) {
+ yaml_error(parser, &sub_event, "Expected scalar for rewrite stop boolean");
+ errors++;
+ } else {
+ if(strncmp((char*)sub_event.data.scalar.value, "no", 2) == 0 ||
+ strncmp((char*)sub_event.data.scalar.value, "false", 5) == 0)
+ flags |= RW_DONT_STOP;
+ else
+ flags &= ~RW_DONT_STOP;
+
+ yaml_event_delete(&sub_event);
+ }
+ } else if (yaml_scalar_matches(&sub_event, "inject", strlen("inject"))) {
+ if (!yaml_parse(parser, &sub_event) || sub_event.type != YAML_SCALAR_EVENT) {
+ yaml_error(parser, &sub_event, "Expected scalar for rewrite inject boolean");
+ errors++;
+ } else {
+ if(strncmp((char*)sub_event.data.scalar.value, "yes", 3) == 0 ||
+ strncmp((char*)sub_event.data.scalar.value, "true", 4) == 0)
+ flags |= RW_INJECT;
+ else
+ flags &= ~RW_INJECT;
+
+ yaml_event_delete(&sub_event);
+ }
+ } else {
+ yaml_error(parser, &sub_event, "Unexpected scalar in rewrite mapping");
+ errors++;
+ }
+ break;
+
+ case YAML_MAPPING_END_EVENT:
+ if(key) {
+ if (!log_job_rewrite_add(jb, key, flags, search_pattern, replace_pattern))
+ errors++;
+ }
+
+ freez(key);
+ key = NULL;
+
+ freez(search_pattern);
+ search_pattern = NULL;
+
+ freez(replace_pattern);
+ replace_pattern = NULL;
+
+ flags = RW_NONE;
+
+ mapping_finished = true;
+ break;
+
+ default:
+ yaml_error(parser, &sub_event, "Unexpected event in rewrite mapping");
+ errors++;
+ break;
+ }
+
+ yaml_event_delete(&sub_event);
+ }
+ }
+ break;
+
+ case YAML_SEQUENCE_END_EVENT:
+ finished = true;
+ break;
+
+ default:
+ yaml_error(parser, &event, "Unexpected event in rewrites sequence");
+ errors++;
+ break;
+ }
+
+ yaml_event_delete(&event);
+ }
+
+ return errors;
+}
+
+static size_t yaml_parse_renames(yaml_parser_t *parser, LOG_JOB *jb) {
+ size_t errors = 0;
+
+ if (!yaml_parse_expect_event(parser, YAML_SEQUENCE_START_EVENT))
+ return 1;
+
+ bool finished = false;
+ while (!errors && !finished) {
+ yaml_event_t event;
+ if (!yaml_parse(parser, &event)) {
+ errors++;
+ continue;
+ }
+
+ switch (event.type) {
+ case YAML_MAPPING_START_EVENT:
+ {
+ struct key_rename rn = { 0 };
+
+ bool mapping_finished = false;
+ while (!errors && !mapping_finished) {
+ yaml_event_t sub_event;
+ if (!yaml_parse(parser, &sub_event)) {
+ errors++;
+ continue;
+ }
+
+ switch (sub_event.type) {
+ case YAML_SCALAR_EVENT:
+ if (yaml_scalar_matches(&sub_event, "new_key", strlen("new_key"))) {
+ if (!yaml_parse(parser, &sub_event) || sub_event.type != YAML_SCALAR_EVENT) {
+ yaml_error(parser, &sub_event, "Expected scalar for rename new_key");
+ errors++;
+ } else {
+ hashed_key_len_set(&rn.new_key, (char *)sub_event.data.scalar.value, sub_event.data.scalar.length);
+ yaml_event_delete(&sub_event);
+ }
+ } else if (yaml_scalar_matches(&sub_event, "old_key", strlen("old_key"))) {
+ if (!yaml_parse(parser, &sub_event) || sub_event.type != YAML_SCALAR_EVENT) {
+ yaml_error(parser, &sub_event, "Expected scalar for rename old_key");
+ errors++;
+ } else {
+ hashed_key_len_set(&rn.old_key, (char *)sub_event.data.scalar.value, sub_event.data.scalar.length);
+ yaml_event_delete(&sub_event);
+ }
+ } else {
+ yaml_error(parser, &sub_event, "Unexpected scalar in rename mapping");
+ errors++;
+ }
+ break;
+
+ case YAML_MAPPING_END_EVENT:
+ if(rn.old_key.key && rn.new_key.key) {
+ if (!log_job_rename_add(jb, rn.new_key.key, rn.new_key.len,
+ rn.old_key.key, rn.old_key.len))
+ errors++;
+ }
+ rename_cleanup(&rn);
+
+ mapping_finished = true;
+ break;
+
+ default:
+ yaml_error(parser, &sub_event, "Unexpected event in rename mapping");
+ errors++;
+ break;
+ }
+
+ yaml_event_delete(&sub_event);
+ }
+ }
+ break;
+
+ case YAML_SEQUENCE_END_EVENT:
+ finished = true;
+ break;
+
+ default:
+ yaml_error(parser, &event, "Unexpected event in renames sequence");
+ errors++;
+ break;
+ }
+
+ yaml_event_delete(&event);
+ }
+
+ return errors;
+}
+
+static size_t yaml_parse_pattern(yaml_parser_t *parser, LOG_JOB *jb) {
+ yaml_event_t event;
+ size_t errors = 0;
+
+ if (!yaml_parse(parser, &event))
+ return 1;
+
+ if(event.type == YAML_SCALAR_EVENT)
+ log_job_pattern_set(jb, (char *) event.data.scalar.value, event.data.scalar.length);
+ else {
+ yaml_error(parser, &event, "unexpected event type");
+ errors++;
+ }
+
+ yaml_event_delete(&event);
+ return errors;
+}
+
+static size_t yaml_parse_initialized(yaml_parser_t *parser, LOG_JOB *jb) {
+ size_t errors = 0;
+
+ if(!yaml_parse_expect_event(parser, YAML_STREAM_START_EVENT)) {
+ errors++;
+ goto cleanup;
+ }
+
+ if(!yaml_parse_expect_event(parser, YAML_DOCUMENT_START_EVENT)) {
+ errors++;
+ goto cleanup;
+ }
+
+ if(!yaml_parse_expect_event(parser, YAML_MAPPING_START_EVENT)) {
+ errors++;
+ goto cleanup;
+ }
+
+ bool finished = false;
+ while (!errors && !finished) {
+ yaml_event_t event;
+ if(!yaml_parse(parser, &event)) {
+ errors++;
+ continue;
+ }
+
+ switch(event.type) {
+ default:
+ yaml_error(parser, &event, "unexpected type");
+ errors++;
+ break;
+
+ case YAML_MAPPING_END_EVENT:
+ finished = true;
+ break;
+
+ case YAML_SCALAR_EVENT:
+ if (yaml_scalar_matches(&event, "pattern", strlen("pattern")))
+ errors += yaml_parse_pattern(parser, jb);
+
+ else if (yaml_scalar_matches(&event, "prefix", strlen("prefix")))
+ errors += yaml_parse_prefix(parser, jb);
+
+ else if (yaml_scalar_matches(&event, "filename", strlen("filename")))
+ errors += yaml_parse_filename_injection(parser, jb);
+
+ else if (yaml_scalar_matches(&event, "filter", strlen("filter")))
+ errors += yaml_parse_filters(parser, jb);
+
+ else if (yaml_scalar_matches(&event, "inject", strlen("inject")))
+ errors += yaml_parse_injections(parser, jb, false);
+
+ else if (yaml_scalar_matches(&event, "unmatched", strlen("unmatched")))
+ errors += yaml_parse_unmatched(parser, jb);
+
+ else if (yaml_scalar_matches(&event, "rewrite", strlen("rewrite")))
+ errors += yaml_parse_rewrites(parser, jb);
+
+ else if (yaml_scalar_matches(&event, "rename", strlen("rename")))
+ errors += yaml_parse_renames(parser, jb);
+
+ else {
+ yaml_error(parser, &event, "unexpected scalar");
+ errors++;
+ }
+ break;
+ }
+
+ yaml_event_delete(&event);
+ }
+
+ if(!errors && !yaml_parse_expect_event(parser, YAML_DOCUMENT_END_EVENT)) {
+ errors++;
+ goto cleanup;
+ }
+
+ if(!errors && !yaml_parse_expect_event(parser, YAML_STREAM_END_EVENT)) {
+ errors++;
+ goto cleanup;
+ }
+
+cleanup:
+ return errors;
+}
+
+bool yaml_parse_file(const char *config_file_path, LOG_JOB *jb) {
+ if(!config_file_path || !*config_file_path) {
+ log2stderr("yaml configuration filename cannot be empty.");
+ return false;
+ }
+
+ FILE *fp = fopen(config_file_path, "r");
+ if (!fp) {
+ log2stderr("Error opening config file: %s", config_file_path);
+ return false;
+ }
+
+ yaml_parser_t parser;
+ yaml_parser_initialize(&parser);
+ yaml_parser_set_input_file(&parser, fp);
+
+ size_t errors = yaml_parse_initialized(&parser, jb);
+
+ yaml_parser_delete(&parser);
+ fclose(fp);
+ return errors == 0;
+}
+
+bool yaml_parse_config(const char *config_name, LOG_JOB *jb) {
+ char filename[FILENAME_MAX + 1];
+
+ snprintf(filename, sizeof(filename), "%s/%s.yaml", LOG2JOURNAL_CONFIG_PATH, config_name);
+ return yaml_parse_file(filename, jb);
+}
+
+#endif // HAVE_LIBYAML
+
+// ----------------------------------------------------------------------------
+// printing yaml
+
+static void yaml_print_multiline_value(const char *s, size_t depth) {
+ if (!s)
+ s = "";
+
+ do {
+ const char* next = strchr(s, '\n');
+ if(next) next++;
+
+ size_t len = next ? (size_t)(next - s) : strlen(s);
+ char buf[len + 1];
+ copy_to_buffer(buf, sizeof(buf), s, len);
+
+ fprintf(stderr, "%*s%s%s",
+ (int)(depth * 2), "", // pad with depth * 2 spaces
+ buf, next ? "" : "\n");
+
+ s = next;
+ } while(s && *s);
+}
+
+static bool needs_quotes_in_yaml(const char *str) {
+ // Lookup table for special YAML characters
+ static bool special_chars[256] = { false };
+ static bool table_initialized = false;
+
+ if (!table_initialized) {
+ // Initialize the lookup table
+ const char *special_chars_str = ":{}[],&*!|>'\"%@`^";
+ for (const char *c = special_chars_str; *c; ++c) {
+ special_chars[(unsigned char)*c] = true;
+ }
+ table_initialized = true;
+ }
+
+ while (*str) {
+ if (special_chars[(unsigned char)*str]) {
+ return true;
+ }
+ str++;
+ }
+ return false;
+}
+
+static void yaml_print_node(const char *key, const char *value, size_t depth, bool dash) {
+ if(depth > 10) depth = 10;
+ const char *quote = "'";
+
+ const char *second_line = NULL;
+ if(value && strchr(value, '\n')) {
+ second_line = value;
+ value = "|";
+ quote = "";
+ }
+ else if(!value || !needs_quotes_in_yaml(value))
+ quote = "";
+
+ fprintf(stderr, "%*s%s%s%s%s%s%s\n",
+ (int)(depth * 2), "", dash ? "- ": "", // pad with depth * 2 spaces
+ key ? key : "", key ? ": " : "",
+ quote, value ? value : "", quote);
+
+ if(second_line) {
+ yaml_print_multiline_value(second_line, depth + 1);
+ }
+}
+
+void log_job_configuration_to_yaml(LOG_JOB *jb) {
+ if(jb->pattern)
+ yaml_print_node("pattern", jb->pattern, 0, false);
+
+ if(jb->prefix) {
+ fprintf(stderr, "\n");
+ yaml_print_node("prefix", jb->prefix, 0, false);
+ }
+
+ if(jb->filename.key.key) {
+ fprintf(stderr, "\n");
+ yaml_print_node("filename", NULL, 0, false);
+ yaml_print_node("key", jb->filename.key.key, 1, false);
+ }
+
+ if(jb->filter.include.pattern || jb->filter.exclude.pattern) {
+ fprintf(stderr, "\n");
+ yaml_print_node("filter", NULL, 0, false);
+
+ if(jb->filter.include.pattern)
+ yaml_print_node("include", jb->filter.include.pattern, 1, false);
+
+ if(jb->filter.exclude.pattern)
+ yaml_print_node("exclude", jb->filter.exclude.pattern, 1, false);
+ }
+
+ if(jb->renames.used) {
+ fprintf(stderr, "\n");
+ yaml_print_node("rename", NULL, 0, false);
+
+ for(size_t i = 0; i < jb->renames.used ;i++) {
+ yaml_print_node("new_key", jb->renames.array[i].new_key.key, 1, true);
+ yaml_print_node("old_key", jb->renames.array[i].old_key.key, 2, false);
+ }
+ }
+
+ if(jb->injections.used) {
+ fprintf(stderr, "\n");
+ yaml_print_node("inject", NULL, 0, false);
+
+ for (size_t i = 0; i < jb->injections.used; i++) {
+ yaml_print_node("key", jb->injections.keys[i].key.key, 1, true);
+ yaml_print_node("value", jb->injections.keys[i].value.pattern, 2, false);
+ }
+ }
+
+ if(jb->rewrites.used) {
+ fprintf(stderr, "\n");
+ yaml_print_node("rewrite", NULL, 0, false);
+
+ for(size_t i = 0; i < jb->rewrites.used ;i++) {
+ REWRITE *rw = &jb->rewrites.array[i];
+
+ yaml_print_node("key", rw->key.key, 1, true);
+
+ if(rw->flags & RW_MATCH_PCRE2)
+ yaml_print_node("match", rw->match_pcre2.pattern, 2, false);
+
+ else if(rw->flags & RW_MATCH_NON_EMPTY)
+ yaml_print_node("not_empty", rw->match_non_empty.pattern, 2, false);
+
+ yaml_print_node("value", rw->value.pattern, 2, false);
+
+ if(rw->flags & RW_INJECT)
+ yaml_print_node("inject", "yes", 2, false);
+
+ if(rw->flags & RW_DONT_STOP)
+ yaml_print_node("stop", "no", 2, false);
+ }
+ }
+
+ if(jb->unmatched.key.key || jb->unmatched.injections.used) {
+ fprintf(stderr, "\n");
+ yaml_print_node("unmatched", NULL, 0, false);
+
+ if(jb->unmatched.key.key)
+ yaml_print_node("key", jb->unmatched.key.key, 1, false);
+
+ if(jb->unmatched.injections.used) {
+ fprintf(stderr, "\n");
+ yaml_print_node("inject", NULL, 1, false);
+
+ for (size_t i = 0; i < jb->unmatched.injections.used; i++) {
+ yaml_print_node("key", jb->unmatched.injections.keys[i].key.key, 2, true);
+ yaml_print_node("value", jb->unmatched.injections.keys[i].value.pattern, 3, false);
+ }
+ }
+ }
+}
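The indentation and quoting rules used by `yaml_print_node()` can be sketched against a buffer instead of `stderr`. This is an illustrative sketch under assumptions: `needs_quotes()` and `format_yaml_node()` are hypothetical helpers, the special-character set is copied from `needs_quotes_in_yaml()`, and the padding is produced with `%*s` and an empty string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Same special-character set as needs_quotes_in_yaml() in the patch.
static bool needs_quotes(const char *s) {
    return strpbrk(s, ":{}[],&*!|>'\"%@`^") != NULL;
}

// Formats "key: value" at the given depth (2 spaces per level) into dst,
// optionally prefixed with "- " for sequence items.
// Returns the snprintf() result, i.e. the length the complete line needs.
static int format_yaml_node(char *dst, size_t size, const char *key,
                            const char *value, int depth, bool dash) {
    assert(dst && key && value && depth >= 0);

    // Like the patch, this does not escape a single quote inside the value.
    const char *quote = needs_quotes(value) ? "'" : "";

    return snprintf(dst, size, "%*s%s%s: %s%s%s",
                    depth * 2, "",          // depth * 2 spaces of padding
                    dash ? "- " : "",
                    key, quote, value, quote);
}
```

With this sketch, a value such as `^[1-3][0-9]{2}$` is wrapped in single quotes because it contains `^`, `[` and `{`, while `NGINX_STATUS` is emitted unquoted.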
diff --git a/collectors/log2journal/log2journal.c b/collectors/log2journal/log2journal.c
new file mode 100644
index 00000000..c3204939
--- /dev/null
+++ b/collectors/log2journal/log2journal.c
@@ -0,0 +1,569 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "log2journal.h"
+
+// ----------------------------------------------------------------------------
+
+const char journal_key_characters_map[256] = {
+ // control characters
+ [0] = '\0', [1] = '_', [2] = '_', [3] = '_', [4] = '_', [5] = '_', [6] = '_', [7] = '_',
+ [8] = '_', [9] = '_', [10] = '_', [11] = '_', [12] = '_', [13] = '_', [14] = '_', [15] = '_',
+ [16] = '_', [17] = '_', [18] = '_', [19] = '_', [20] = '_', [21] = '_', [22] = '_', [23] = '_',
+ [24] = '_', [25] = '_', [26] = '_', [27] = '_', [28] = '_', [29] = '_', [30] = '_', [31] = '_',
+
+ // symbols
+ [' '] = '_', ['!'] = '_', ['"'] = '_', ['#'] = '_', ['$'] = '_', ['%'] = '_', ['&'] = '_', ['\''] = '_',
+ ['('] = '_', [')'] = '_', ['*'] = '_', ['+'] = '_', [','] = '_', ['-'] = '_', ['.'] = '_', ['/'] = '_',
+
+ // numbers
+ ['0'] = '0', ['1'] = '1', ['2'] = '2', ['3'] = '3', ['4'] = '4', ['5'] = '5', ['6'] = '6', ['7'] = '7',
+ ['8'] = '8', ['9'] = '9',
+
+ // symbols
+ [':'] = '_', [';'] = '_', ['<'] = '_', ['='] = '_', ['>'] = '_', ['?'] = '_', ['@'] = '_',
+
+ // capitals
+ ['A'] = 'A', ['B'] = 'B', ['C'] = 'C', ['D'] = 'D', ['E'] = 'E', ['F'] = 'F', ['G'] = 'G', ['H'] = 'H',
+ ['I'] = 'I', ['J'] = 'J', ['K'] = 'K', ['L'] = 'L', ['M'] = 'M', ['N'] = 'N', ['O'] = 'O', ['P'] = 'P',
+ ['Q'] = 'Q', ['R'] = 'R', ['S'] = 'S', ['T'] = 'T', ['U'] = 'U', ['V'] = 'V', ['W'] = 'W', ['X'] = 'X',
+ ['Y'] = 'Y', ['Z'] = 'Z',
+
+ // symbols
+ ['['] = '_', ['\\'] = '_', [']'] = '_', ['^'] = '_', ['_'] = '_', ['`'] = '_',
+
+ // lower to upper
+ ['a'] = 'A', ['b'] = 'B', ['c'] = 'C', ['d'] = 'D', ['e'] = 'E', ['f'] = 'F', ['g'] = 'G', ['h'] = 'H',
+ ['i'] = 'I', ['j'] = 'J', ['k'] = 'K', ['l'] = 'L', ['m'] = 'M', ['n'] = 'N', ['o'] = 'O', ['p'] = 'P',
+ ['q'] = 'Q', ['r'] = 'R', ['s'] = 'S', ['t'] = 'T', ['u'] = 'U', ['v'] = 'V', ['w'] = 'W', ['x'] = 'X',
+ ['y'] = 'Y', ['z'] = 'Z',
+
+ // symbols
+ ['{'] = '_', ['|'] = '_', ['}'] = '_', ['~'] = '_', [127] = '_', // Delete (DEL)
+
+ // Extended ASCII characters (128-255) set to underscore
+ [128] = '_', [129] = '_', [130] = '_', [131] = '_', [132] = '_', [133] = '_', [134] = '_', [135] = '_',
+ [136] = '_', [137] = '_', [138] = '_', [139] = '_', [140] = '_', [141] = '_', [142] = '_', [143] = '_',
+ [144] = '_', [145] = '_', [146] = '_', [147] = '_', [148] = '_', [149] = '_', [150] = '_', [151] = '_',
+ [152] = '_', [153] = '_', [154] = '_', [155] = '_', [156] = '_', [157] = '_', [158] = '_', [159] = '_',
+ [160] = '_', [161] = '_', [162] = '_', [163] = '_', [164] = '_', [165] = '_', [166] = '_', [167] = '_',
+ [168] = '_', [169] = '_', [170] = '_', [171] = '_', [172] = '_', [173] = '_', [174] = '_', [175] = '_',
+ [176] = '_', [177] = '_', [178] = '_', [179] = '_', [180] = '_', [181] = '_', [182] = '_', [183] = '_',
+ [184] = '_', [185] = '_', [186] = '_', [187] = '_', [188] = '_', [189] = '_', [190] = '_', [191] = '_',
+ [192] = '_', [193] = '_', [194] = '_', [195] = '_', [196] = '_', [197] = '_', [198] = '_', [199] = '_',
+ [200] = '_', [201] = '_', [202] = '_', [203] = '_', [204] = '_', [205] = '_', [206] = '_', [207] = '_',
+ [208] = '_', [209] = '_', [210] = '_', [211] = '_', [212] = '_', [213] = '_', [214] = '_', [215] = '_',
+ [216] = '_', [217] = '_', [218] = '_', [219] = '_', [220] = '_', [221] = '_', [222] = '_', [223] = '_',
+ [224] = '_', [225] = '_', [226] = '_', [227] = '_', [228] = '_', [229] = '_', [230] = '_', [231] = '_',
+ [232] = '_', [233] = '_', [234] = '_', [235] = '_', [236] = '_', [237] = '_', [238] = '_', [239] = '_',
+ [240] = '_', [241] = '_', [242] = '_', [243] = '_', [244] = '_', [245] = '_', [246] = '_', [247] = '_',
+ [248] = '_', [249] = '_', [250] = '_', [251] = '_', [252] = '_', [253] = '_', [254] = '_', [255] = '_',
+};
+
+// ----------------------------------------------------------------------------
+
+static inline HASHED_KEY *get_key_from_hashtable(LOG_JOB *jb, HASHED_KEY *k) {
+ if(k->flags & HK_HASHTABLE_ALLOCATED)
+ return k;
+
+ if(!k->hashtable_ptr) {
+ HASHED_KEY *ht_key;
+ SIMPLE_HASHTABLE_SLOT_KEY *slot = simple_hashtable_get_slot_KEY(&jb->hashtable, k->hash, true);
+ if((ht_key = SIMPLE_HASHTABLE_SLOT_DATA(slot))) {
+ if(!(ht_key->flags & HK_COLLISION_CHECKED)) {
+ ht_key->flags |= HK_COLLISION_CHECKED;
+
+ if(strcmp(ht_key->key, k->key) != 0)
+ log2stderr("Hashtable collision detected on key '%s' (hash %lx) and '%s' (hash %lx). "
+ "Please file a bug report.", ht_key->key, (unsigned long) ht_key->hash, k->key
+ , (unsigned long) k->hash
+ );
+ }
+ }
+ else {
+ ht_key = callocz(1, sizeof(HASHED_KEY));
+ ht_key->key = strdupz(k->key);
+ ht_key->len = k->len;
+ ht_key->hash = k->hash;
+ ht_key->flags = HK_HASHTABLE_ALLOCATED;
+
+ simple_hashtable_set_slot_KEY(&jb->hashtable, slot, ht_key->hash, ht_key);
+ }
+
+ k->hashtable_ptr = ht_key;
+ }
+
+ return k->hashtable_ptr;
+}
+
+static inline HASHED_KEY *get_key_from_hashtable_with_char_ptr(LOG_JOB *jb, const char *key) {
+ HASHED_KEY find = {
+ .key = key,
+ .len = strlen(key),
+ };
+ find.hash = XXH3_64bits(key, find.len);
+
+ return get_key_from_hashtable(jb, &find);
+}
+
+// ----------------------------------------------------------------------------
+
+static inline void validate_key(LOG_JOB *jb __maybe_unused, HASHED_KEY *k) {
+ if(k->len > JOURNAL_MAX_KEY_LEN)
+ log2stderr("WARNING: key '%s' has length %zu, which is more than %zu, the max systemd-journal allows",
+ k->key, (size_t)k->len, (size_t)JOURNAL_MAX_KEY_LEN);
+
+ for(size_t i = 0; i < k->len ;i++) {
+ char c = k->key[i];
+
+ if((c < 'A' || c > 'Z') && !isdigit((unsigned char)c) && c != '_') {
+ log2stderr("WARNING: key '%s' contains characters that are not allowed by systemd-journal.", k->key);
+ break;
+ }
+ }
+
+ if(isdigit((unsigned char)k->key[0]))
+ log2stderr("WARNING: key '%s' starts with a digit and may not be accepted by systemd-journal.", k->key);
+
+ if(k->key[0] == '_')
+ log2stderr("WARNING: key '%s' starts with an underscore, which makes it a systemd-journal trusted field. "
+ "Such fields are accepted by systemd-journal-remote, but not by systemd-journald.", k->key);
+}
+
+// ----------------------------------------------------------------------------
+
+static inline size_t replace_evaluate_to_buffer(LOG_JOB *jb, HASHED_KEY *k __maybe_unused, REPLACE_PATTERN *rp, char *dst, size_t dst_size) {
+ size_t remaining = dst_size;
+ char *copy_to = dst;
+
+ for(REPLACE_NODE *node = rp->nodes; node != NULL && remaining > 1; node = node->next) {
+ if(node->is_variable) {
+ if(hashed_keys_match(&node->name, &jb->line.key)) {
+ size_t copied = copy_to_buffer(copy_to, remaining, jb->line.trimmed, jb->line.trimmed_len);
+ copy_to += copied;
+ remaining -= copied;
+ }
+ else {
+ HASHED_KEY *ktmp = get_key_from_hashtable_with_char_ptr(jb, node->name.key);
+ if(ktmp->value.len) {
+ size_t copied = copy_to_buffer(copy_to, remaining, ktmp->value.txt, ktmp->value.len);
+ copy_to += copied;
+ remaining -= copied;
+ }
+ }
+ }
+ else {
+ size_t copied = copy_to_buffer(copy_to, remaining, node->name.key, node->name.len);
+ copy_to += copied;
+ remaining -= copied;
+ }
+ }
+
+ return copy_to - dst;
+}
+
+static inline void replace_evaluate(LOG_JOB *jb, HASHED_KEY *k, REPLACE_PATTERN *rp) {
+ HASHED_KEY *ht_key = get_key_from_hashtable(jb, k);
+
+ // set it to empty value
+ k->value.len = 0;
+
+ for(REPLACE_NODE *node = rp->nodes; node != NULL; node = node->next) {
+ if(node->is_variable) {
+ if(hashed_keys_match(&node->name, &jb->line.key))
+ txt_expand_and_append(&ht_key->value, jb->line.trimmed, jb->line.trimmed_len);
+
+ else {
+ HASHED_KEY *ktmp = get_key_from_hashtable_with_char_ptr(jb, node->name.key);
+ if(ktmp->value.len)
+ txt_expand_and_append(&ht_key->value, ktmp->value.txt, ktmp->value.len);
+ }
+ }
+ else
+ txt_expand_and_append(&ht_key->value, node->name.key, node->name.len);
+ }
+}
+
+static inline void replace_evaluate_from_pcre2(LOG_JOB *jb, HASHED_KEY *k, REPLACE_PATTERN *rp, SEARCH_PATTERN *sp) {
+ assert(k->flags & HK_HASHTABLE_ALLOCATED);
+
+ // set the temporary TEXT to zero length
+ jb->rewrites.tmp.len = 0;
+
+ PCRE2_SIZE *ovector = pcre2_get_ovector_pointer(sp->match_data);
+
+ // Iterate through the linked list of replacement nodes
+ for(REPLACE_NODE *node = rp->nodes; node != NULL; node = node->next) {
+ if(node->is_variable) {
+ int group_number = pcre2_substring_number_from_name(
+ sp->re, (PCRE2_SPTR) node->name.key);
+
+ if(group_number >= 0) {
+ PCRE2_SIZE start_offset = ovector[2 * group_number];
+ PCRE2_SIZE end_offset = ovector[2 * group_number + 1];
+ PCRE2_SIZE length = end_offset - start_offset;
+
+ txt_expand_and_append(&jb->rewrites.tmp, k->value.txt + start_offset, length);
+ }
+ else {
+ if(hashed_keys_match(&node->name, &jb->line.key))
+ txt_expand_and_append(&jb->rewrites.tmp, jb->line.trimmed, jb->line.trimmed_len);
+
+ else {
+ HASHED_KEY *ktmp = get_key_from_hashtable_with_char_ptr(jb, node->name.key);
+ if(ktmp->value.len)
+ txt_expand_and_append(&jb->rewrites.tmp, ktmp->value.txt, ktmp->value.len);
+ }
+ }
+ }
+ else {
+ txt_expand_and_append(&jb->rewrites.tmp, node->name.key, node->name.len);
+ }
+ }
+
+ // swap the values of the temporary TEXT and the key value
+ TEXT tmp = k->value;
+ k->value = jb->rewrites.tmp;
+ jb->rewrites.tmp = tmp;
+}
+
+static inline bool rewrite_conditions_satisfied(LOG_JOB *jb, HASHED_KEY *k, REWRITE *rw) {
+ assert(k->flags & HK_HASHTABLE_ALLOCATED);
+
+ if(rw->flags & RW_MATCH_PCRE2) {
+ return search_pattern_matches(&rw->match_pcre2, k->value.txt, k->value.len);
+ }
+ else if(rw->flags & RW_MATCH_NON_EMPTY) {
+ char buffer[2]; // we don't need a big buffer - we just check if anything is written
+ if(replace_evaluate_to_buffer(jb, k, &rw->match_non_empty, buffer, sizeof(buffer)))
+ // it copied something
+ return true;
+ else
+ // it copied nothing
+ return false;
+ }
+ else
+ // no conditions
+ return true;
+}
+
+// ----------------------------------------------------------------------------
+
+static inline HASHED_KEY *rename_key(LOG_JOB *jb, HASHED_KEY *k) {
+ if(!(k->flags & HK_RENAMES_CHECKED) || k->flags & HK_HAS_RENAMES) {
+ k->flags |= HK_RENAMES_CHECKED;
+
+ for(size_t i = 0; i < jb->renames.used; i++) {
+ RENAME *rn = &jb->renames.array[i];
+
+ if(hashed_keys_match(&rn->old_key, k)) {
+ k->flags |= HK_HAS_RENAMES;
+
+ return get_key_from_hashtable(jb, &rn->new_key);
+ }
+ }
+ }
+
+ return k;
+}
+
+// ----------------------------------------------------------------------------
+
+static inline void send_key_value_constant(LOG_JOB *jb __maybe_unused, HASHED_KEY *key, const char *value, size_t len) {
+ HASHED_KEY *ht_key = get_key_from_hashtable(jb, key);
+
+ txt_replace(&ht_key->value, value, len);
+ ht_key->flags |= HK_VALUE_FROM_LOG;
+
+ // fprintf(stderr, "SET %s=%.*s\n", ht_key->key, (int)ht_key->value.len, ht_key->value.txt);
+}
+
+static inline void send_key_value_error(LOG_JOB *jb, HASHED_KEY *key, const char *format, ...) __attribute__ ((format(__printf__, 3, 4)));
+static inline void send_key_value_error(LOG_JOB *jb, HASHED_KEY *key, const char *format, ...) {
+ HASHED_KEY *ht_key = get_key_from_hashtable(jb, key);
+
+ printf("%s=", ht_key->key);
+ va_list args;
+ va_start(args, format);
+ vprintf(format, args);
+ va_end(args);
+ printf("\n");
+}
+
+inline void log_job_send_extracted_key_value(LOG_JOB *jb, const char *key, const char *value, size_t len) {
+ HASHED_KEY *ht_key = get_key_from_hashtable_with_char_ptr(jb, key);
+ HASHED_KEY *nk = rename_key(jb, ht_key);
+ txt_replace(&nk->value, value, len);
+ ht_key->flags |= HK_VALUE_FROM_LOG;
+
+// fprintf(stderr, "SET %s=%.*s\n", ht_key->key, (int)ht_key->value.len, ht_key->value.txt);
+}
+
+static inline void log_job_process_rewrites(LOG_JOB *jb) {
+ for(size_t i = 0; i < jb->rewrites.used ;i++) {
+ REWRITE *rw = &jb->rewrites.array[i];
+
+ HASHED_KEY *k = get_key_from_hashtable(jb, &rw->key);
+
+ if(!(rw->flags & RW_INJECT) && !(k->flags & HK_VALUE_FROM_LOG) && !k->value.len)
+ continue;
+
+ if(!(k->flags & HK_VALUE_REWRITTEN) && rewrite_conditions_satisfied(jb, k, rw)) {
+ if(rw->flags & RW_MATCH_PCRE2)
+ replace_evaluate_from_pcre2(jb, k, &rw->value, &rw->match_pcre2);
+ else
+ replace_evaluate(jb, k, &rw->value);
+
+ if(!(rw->flags & RW_DONT_STOP))
+ k->flags |= HK_VALUE_REWRITTEN;
+
+// fprintf(stderr, "REWRITE %s=%.*s\n", k->key, (int)k->value.len, k->value.txt);
+ }
+ }
+}
+
+static inline void send_all_fields(LOG_JOB *jb) {
+ SIMPLE_HASHTABLE_SORTED_FOREACH_READ_ONLY(&jb->hashtable, kptr, HASHED_KEY, _KEY) {
+ HASHED_KEY *k = SIMPLE_HASHTABLE_SORTED_FOREACH_READ_ONLY_VALUE(kptr);
+
+ if(k->value.len) {
+ // the key exists and has some value
+
+ if(!(k->flags & HK_FILTERED)) {
+ k->flags |= HK_FILTERED;
+
+ bool included = jb->filter.include.re ? search_pattern_matches(&jb->filter.include, k->key, k->len) : true;
+ bool excluded = jb->filter.exclude.re ? search_pattern_matches(&jb->filter.exclude, k->key, k->len) : false;
+
+ if(included && !excluded)
+ k->flags |= HK_FILTERED_INCLUDED;
+ else
+ k->flags &= ~HK_FILTERED_INCLUDED;
+
+ // log a warning if the key does not comply with journal standards
+ validate_key(jb, k);
+ }
+
+ if(k->flags & HK_FILTERED_INCLUDED)
+ printf("%s=%.*s\n", k->key, (int)k->value.len, k->value.txt);
+
+ // reset it for the next round
+ k->value.txt[0] = '\0';
+ k->value.len = 0;
+ }
+
+ k->flags &= ~(HK_VALUE_REWRITTEN | HK_VALUE_FROM_LOG);
+ }
+}
+
+// ----------------------------------------------------------------------------
+// injection of constant fields
+
+static void select_which_injections_should_be_injected_on_unmatched(LOG_JOB *jb) {
+ // mark all injections to be added to unmatched logs
+ for(size_t i = 0; i < jb->injections.used ; i++)
+ jb->injections.keys[i].on_unmatched = true;
+
+ if(jb->injections.used && jb->unmatched.injections.used) {
+ // we have both injections and injections on unmatched
+
+ // we find all the injections that are also configured as injections on unmatched,
+ // and we disable them, so that the output will not have the same key twice
+
+ for(size_t i = 0; i < jb->injections.used ;i++) {
+ for(size_t u = 0; u < jb->unmatched.injections.used ; u++) {
+ if(strcmp(jb->injections.keys[i].key.key, jb->unmatched.injections.keys[u].key.key) == 0)
+ jb->injections.keys[i].on_unmatched = false;
+ }
+ }
+ }
+}
+
+
+static inline void jb_finalize_injections(LOG_JOB *jb, bool line_is_matched) {
+ for (size_t j = 0; j < jb->injections.used; j++) {
+ if(!line_is_matched && !jb->injections.keys[j].on_unmatched)
+ continue;
+
+ INJECTION *inj = &jb->injections.keys[j];
+
+ replace_evaluate(jb, &inj->key, &inj->value);
+ }
+}
+
+// ----------------------------------------------------------------------------
+// filename injection
+
+static inline void jb_inject_filename(LOG_JOB *jb) {
+ if (jb->filename.key.key && jb->filename.current.len)
+ send_key_value_constant(jb, &jb->filename.key, jb->filename.current.txt, jb->filename.current.len);
+}
+
+static inline bool jb_switched_filename(LOG_JOB *jb, const char *line, size_t len) {
+ // IMPORTANT:
+ // Return TRUE when the caller should skip this line (because it is ours).
+ // Unfortunately, we have to consume empty lines too.
+
+ // IMPORTANT:
+ // the filename inside the line is not NUL terminated; the line may have more data after it.
+
+ if (!len) {
+ jb->filename.last_line_was_empty = true;
+ return true;
+ }
+
+ // Check if it's a log file change line
+ if (jb->filename.last_line_was_empty && line[0] == '=' && strncmp(line, "==> ", 4) == 0) {
+ const char *start = line + 4;
+ while (*start == ' ') start++;
+ const char *end = strstr(start, " <=="); // search after the opening marker, so that end >= start
+ if (*start != '\n' && *start != '\0' && end) {
+ txt_replace(&jb->filename.current, start, end - start);
+ return true;
+ }
+ }
+
+ jb->filename.last_line_was_empty = false;
+ return false;
+}
+
+static inline bool jb_send_unmatched_line(LOG_JOB *jb, const char *line) {
+ if (!jb->unmatched.key.key)
+ return false;
+
+ // we are sending errors to systemd-journal
+ send_key_value_error(jb, &jb->unmatched.key, "Parsing error on: %s", line);
+
+ for (size_t j = 0; j < jb->unmatched.injections.used; j++) {
+ INJECTION *inj = &jb->unmatched.injections.keys[j];
+
+ replace_evaluate(jb, &inj->key, &inj->value);
+ }
+
+ return true;
+}
+
+// ----------------------------------------------------------------------------
+// running a job
+
+static char *get_next_line(LOG_JOB *jb __maybe_unused, char *buffer, size_t size, size_t *line_length) {
+ if(!fgets(buffer, (int)size, stdin)) {
+ *line_length = 0;
+ return NULL;
+ }
+
+ char *line = buffer;
+ size_t len = strlen(line);
+
+ // remove trailing newlines and spaces
+ while(len > 1 && (line[len - 1] == '\n' || isspace((unsigned char)line[len - 1])))
+ line[--len] = '\0';
+
+ // skip leading spaces
+ while(isspace((unsigned char)*line)) {
+ line++;
+ len--;
+ }
+
+ *line_length = len;
+ return line;
+}
+
+int log_job_run(LOG_JOB *jb) {
+ select_which_injections_should_be_injected_on_unmatched(jb);
+
+ PCRE2_STATE *pcre2 = NULL;
+ LOG_JSON_STATE *json = NULL;
+ LOGFMT_STATE *logfmt = NULL;
+
+ if(strcmp(jb->pattern, "json") == 0) {
+ json = json_parser_create(jb);
+ // never fails
+ }
+ else if(strcmp(jb->pattern, "logfmt") == 0) {
+ logfmt = logfmt_parser_create(jb);
+ // never fails
+ }
+ else if(strcmp(jb->pattern, "none") != 0) {
+ pcre2 = pcre2_parser_create(jb);
+ if(pcre2_has_error(pcre2)) {
+ log2stderr("%s", pcre2_parser_error(pcre2));
+ pcre2_parser_destroy(pcre2);
+ return 1;
+ }
+ }
+
+ jb->line.buffer = mallocz(MAX_LINE_LENGTH + 1);
+ jb->line.size = MAX_LINE_LENGTH + 1;
+ jb->line.trimmed_len = 0;
+ jb->line.trimmed = jb->line.buffer;
+
+ while ((jb->line.trimmed = get_next_line(jb, (char *)jb->line.buffer, jb->line.size, &jb->line.trimmed_len))) {
+ const char *line = jb->line.trimmed;
+ size_t len = jb->line.trimmed_len;
+
+ if(jb_switched_filename(jb, line, len))
+ continue;
+
+ bool line_is_matched = true;
+
+ if(json)
+ line_is_matched = json_parse_document(json, line);
+ else if(logfmt)
+ line_is_matched = logfmt_parse_document(logfmt, line);
+ else if(pcre2)
+ line_is_matched = pcre2_parse_document(pcre2, line, len);
+
+ if(!line_is_matched) {
+ if(json)
+ log2stderr("%s", json_parser_error(json));
+ else if(logfmt)
+ log2stderr("%s", logfmt_parser_error(logfmt));
+ else if(pcre2)
+ log2stderr("%s", pcre2_parser_error(pcre2));
+
+ if(!jb_send_unmatched_line(jb, line))
+ // just logging to stderr, not sending unmatched lines
+ continue;
+ }
+
+ jb_inject_filename(jb);
+ jb_finalize_injections(jb, line_is_matched);
+
+ log_job_process_rewrites(jb);
+ send_all_fields(jb);
+ printf("\n");
+ fflush(stdout);
+ }
+
+ if(json)
+ json_parser_destroy(json);
+
+ else if(logfmt)
+ logfmt_parser_destroy(logfmt);
+
+ else if(pcre2)
+ pcre2_parser_destroy(pcre2);
+
+ freez((void *)jb->line.buffer);
+
+ return 0;
+}
+
+// ----------------------------------------------------------------------------
+
+int main(int argc, char *argv[]) {
+ LOG_JOB log_job;
+
+ log_job_init(&log_job);
+
+ if(!log_job_command_line_parse_parameters(&log_job, argc, argv))
+ exit(1);
+
+ if(log_job.show_config)
+ log_job_configuration_to_yaml(&log_job);
+
+ int ret = log_job_run(&log_job);
+
+ log_job_cleanup(&log_job);
+ return ret;
+}
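For readers following `jb_switched_filename()` above, here is a minimal standalone sketch of recognizing a `tail` header line of the form `==> FILENAME <==`. The name `parse_tail_header` and its parameters are hypothetical and not part of log2journal; the sketch copies the filename into a caller-provided buffer and looks for the closing ` <==` only after the opening `==> `.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not part of log2journal): if `line` looks like a
// `tail` header of the form "==> FILENAME <==", copy FILENAME into `dst`
// and return true. Otherwise leave `dst` untouched and return false.
static bool parse_tail_header(const char *line, char *dst, size_t dst_size) {
    assert(line && dst && dst_size > 0);

    // the header must start with the opening marker
    if (strncmp(line, "==> ", 4) != 0)
        return false;

    // skip any extra spaces after the opening marker
    const char *start = line + 4;
    while (*start == ' ')
        start++;

    // look for the closing marker only after the filename starts
    const char *end = strstr(start, " <==");
    if (!end || end == start)
        return false;

    size_t len = (size_t)(end - start);
    if (len >= dst_size)
        len = dst_size - 1; // truncate rather than overflow

    memcpy(dst, start, len);
    dst[len] = '\0';
    return true;
}
```

A caller would typically try this only on a line that follows an empty line, as the function above does, because `tail` prints a blank line before each header.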
diff --git a/collectors/log2journal/log2journal.d/default.yaml b/collectors/log2journal/log2journal.d/default.yaml
new file mode 100644
index 00000000..d41efc4a
--- /dev/null
+++ b/collectors/log2journal/log2journal.d/default.yaml
@@ -0,0 +1,15 @@
+pattern: none
+
+filename:
+ key: LOG_FILENAME
+
+inject:
+ - key: MESSAGE
+ value: '${LINE}' # a special variable that resolves to the whole line read from the log
+
+ - key: PRIORITY
+ value: 6 # Valid PRIORITIES: 0=emerg, 1=alert, 2=crit, 3=error, 4=warn, 5=notice, 6=info, 7=debug
+
+ - key: SYSLOG_IDENTIFIER
+ value: log2journal # the name of the application sending the logs
+
diff --git a/collectors/log2journal/log2journal.d/nginx-combined.yaml b/collectors/log2journal/log2journal.d/nginx-combined.yaml
new file mode 100644
index 00000000..003c774d
--- /dev/null
+++ b/collectors/log2journal/log2journal.d/nginx-combined.yaml
@@ -0,0 +1,91 @@
+# Netdata log2journal Configuration
+# The following parses nginx log files using the combined format.
+
+# The PCRE2 pattern to match log entries and give names to the fields.
+# The journal will have these names, so follow their rules. You can
+# initiate an extended PCRE2 pattern by starting the pattern with (?x)
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<NGINX_REMOTE_ADDR>[^ ]+) \s - \s # NGINX_REMOTE_ADDR
+ (?<NGINX_REMOTE_USER>[^ ]+) \s # NGINX_REMOTE_USER
+ \[
+ (?<NGINX_TIME_LOCAL>[^\]]+) # NGINX_TIME_LOCAL
+ \]
+ \s+ "
+ (?<NGINX_REQUEST>
+ (?<NGINX_REQUEST_METHOD>[A-Z]+) \s+ # NGINX_REQUEST_METHOD
+ (?<NGINX_REQUEST_URI>[^ ]+) \s+
+ (?<NGINX_SERVER_PROTOCOL>[^"]+)
+ )
+ " \s+
+ (?<NGINX_STATUS>\d+) \s+ # NGINX_STATUS
+ (?<NGINX_BODY_BYTES_SENT>\d+) \s+ # NGINX_BODY_BYTES_SENT
+ "(?<NGINX_HTTP_REFERER>[^"]*)" \s+ # NGINX_HTTP_REFERER
+ "(?<NGINX_HTTP_USER_AGENT>[^"]*)" # NGINX_HTTP_USER_AGENT
+
+# When log2journal can detect the filename of each log entry (tail gives it
+# only when it tails multiple files), this key will be used to send the
+# filename to the journals.
+filename:
+ key: NGINX_LOG_FILENAME
+
+rename:
+ - new_key: MESSAGE
+ old_key: NGINX_REQUEST
+
+# Inject constant fields into the journal logs.
+inject:
+ - key: SYSLOG_IDENTIFIER
+ value: nginx-log
+
+ # Inject PRIORITY as a duplicate of NGINX_STATUS
+ - key: PRIORITY
+ value: '${NGINX_STATUS}'
+
+ # Inject NGINX_STATUS_FAMILY as a duplicate of NGINX_STATUS
+ - key: NGINX_STATUS_FAMILY
+ value: '${NGINX_STATUS}'
+
+# Rewrite the value of fields (including the duplicated ones).
+# The search pattern can have named groups, and the replace pattern can use
+# them as ${name}.
+rewrite:
+ # PRIORITY is a duplicate of NGINX_STATUS
+ # Valid PRIORITIES: 0=emerg, 1=alert, 2=crit, 3=error, 4=warn, 5=notice, 6=info, 7=debug
+ - key: PRIORITY
+ match: '^[123]'
+ value: 6
+
+ - key: PRIORITY
+ match: '^4'
+ value: 5
+
+ - key: PRIORITY
+ match: '^5'
+ value: 3
+
+ - key: PRIORITY
+ match: '.*'
+ value: 4
+
+ # NGINX_STATUS_FAMILY is a duplicate of NGINX_STATUS
+ - key: NGINX_STATUS_FAMILY
+ match: '^(?<first_digit>[1-5])'
+ value: '${first_digit}xx'
+
+ - key: NGINX_STATUS_FAMILY
+ match: '.*'
+ value: 'UNKNOWN'
+
+# Control what to do when input logs do not match the main PCRE2 pattern.
+unmatched:
+ # The journal key to log the PCRE2 error message to.
+ # Set this to MESSAGE, so you can see the error in the log.
+ key: MESSAGE
+
+ # Inject static fields to the unmatched entries.
+ # Set PRIORITY=1 (alert) to help you spot unmatched entries in the logs.
+ inject:
+ - key: PRIORITY
+ value: 1
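The PRIORITY rewrite rules above are tried in order, and `log_job_process_rewrites()` marks a key as rewritten after the first rule that matches, so the final `.*` rule acts as a fallback. A minimal sketch of that first-match-wins behaviour, without PCRE2, might look like this. The names `priority_rule` and `status_to_priority` are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical illustration of the PRIORITY rules in nginx-combined.yaml.
// Each rule matches on the first character of the HTTP status.
typedef struct {
    const char *first_chars; // status must start with one of these; NULL matches anything
    int priority;            // syslog priority to emit
} priority_rule;

static const priority_rule priority_rules[] = {
    { "123", 6 }, // 1xx, 2xx, 3xx -> info
    { "4",   5 }, // 4xx -> notice
    { "5",   3 }, // 5xx -> error
    { NULL,  4 }, // anything else -> warning
};

static int status_to_priority(const char *status) {
    assert(status != NULL);

    for (size_t i = 0; i < sizeof(priority_rules) / sizeof(priority_rules[0]); i++) {
        const char *set = priority_rules[i].first_chars;

        // a catch-all rule always matches
        if (!set)
            return priority_rules[i].priority;

        // the first matching rule wins; later rules are not consulted
        if (status[0] != '\0' && strchr(set, status[0]))
            return priority_rules[i].priority;
    }

    return 4; // not reached while the last rule is a catch-all
}
```

Reordering the table changes the result, which is why the catch-all rule has to stay last, both here and in the YAML.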
diff --git a/collectors/log2journal/log2journal.d/nginx-json.yaml b/collectors/log2journal/log2journal.d/nginx-json.yaml
new file mode 100644
index 00000000..7fdc4be5
--- /dev/null
+++ b/collectors/log2journal/log2journal.d/nginx-json.yaml
@@ -0,0 +1,164 @@
+# For all nginx variables, check this:
+# https://nginx.org/en/docs/http/ngx_http_core_module.html#var_connection_requests
+
+pattern: json
+
+prefix: NGINX_
+
+# When log2journal can detect the filename of each log entry (tail gives it
+# only when it tails multiple files), this key will be used to send the
+# filename to the journals.
+filename:
+ key: NGINX_LOG_FILENAME
+
+filter:
+ exclude: '^(NGINX_BINARY_REMOTE_ADDR)$'
+
+rename:
+ - new_key: MESSAGE
+ old_key: NGINX_REQUEST
+
+ # args is an alias for query_string
+ - new_key: NGINX_QUERY_STRING
+ old_key: NGINX_ARGS
+
+ # document_uri is an alias for uri
+ - new_key: NGINX_URI
+ old_key: NGINX_DOCUMENT_URI
+
+ # is_args states if the request had a query string or not
+ - new_key: NGINX_HAS_QUERY_STRING
+ old_key: NGINX_IS_ARGS
+
+ # msec is the timestamp in seconds, with fractional digits for milliseconds
+ - new_key: NGINX_TIMESTAMP_SEC
+ old_key: NGINX_MSEC
+
+ # nginx_version is already prefixed with nginx, let's remove one of them
+ - new_key: NGINX_VERSION
+ old_key: NGINX_NGINX_VERSION
+
+ # pipe states if the request was pipelined or not
+ - new_key: NGINX_PIPELINED
+ old_key: NGINX_PIPE
+
+ # rename numeric TLVs to their names
+ - new_key: NGINX_PROXY_PROTOCOL_TLV_ALPN
+ old_key: NGINX_PROXY_PROTOCOL_TLV_0X01
+ - new_key: NGINX_PROXY_PROTOCOL_TLV_AUTHORITY
+ old_key: NGINX_PROXY_PROTOCOL_TLV_0X02
+ - new_key: NGINX_PROXY_PROTOCOL_TLV_UNIQUE_ID
+ old_key: NGINX_PROXY_PROTOCOL_TLV_0X05
+ - new_key: NGINX_PROXY_PROTOCOL_TLV_SSL
+ old_key: NGINX_PROXY_PROTOCOL_TLV_0X20
+ - new_key: NGINX_PROXY_PROTOCOL_TLV_NETNS
+ old_key: NGINX_PROXY_PROTOCOL_TLV_0X30
+
+ # rename numeric SSL TLVs to their names
+ - new_key: NGINX_PROXY_PROTOCOL_TLV_SSL_VERSION
+ old_key: NGINX_PROXY_PROTOCOL_TLV_SSL_0X21
+ - new_key: NGINX_PROXY_PROTOCOL_TLV_SSL_CN
+ old_key: NGINX_PROXY_PROTOCOL_TLV_SSL_0X22
+ - new_key: NGINX_PROXY_PROTOCOL_TLV_SSL_CIPHER
+ old_key: NGINX_PROXY_PROTOCOL_TLV_SSL_0X23
+ - new_key: NGINX_PROXY_PROTOCOL_TLV_SSL_SIG_ALG
+ old_key: NGINX_PROXY_PROTOCOL_TLV_SSL_0X24
+ - new_key: NGINX_PROXY_PROTOCOL_TLV_SSL_KEY_ALG
+ old_key: NGINX_PROXY_PROTOCOL_TLV_SSL_0X25
+
+# Inject constant fields into the journal logs.
+inject:
+ - key: SYSLOG_IDENTIFIER
+ value: nginx-log
+
+ # Inject PRIORITY as a duplicate of NGINX_STATUS
+ - key: PRIORITY
+ value: '${NGINX_STATUS}'
+
+ # Inject NGINX_STATUS_FAMILY as a duplicate of NGINX_STATUS
+ - key: NGINX_STATUS_FAMILY
+ value: '${NGINX_STATUS}'
+
+
+# Rewrite the value of fields (including the duplicated ones).
+# The search pattern can have named groups, and the replace pattern can use
+# them as ${name}.
+rewrite:
+ # a ? means it has query string, everything else means it does not
+ - key: NGINX_HAS_QUERY_STRING
+ match: '^\?$'
+ value: yes
+ - key: NGINX_HAS_QUERY_STRING
+ match: '.*'
+ value: no
+
+ # 'on' means it was HTTPS, everything else means it was not
+ - key: NGINX_HTTPS
+ match: '^on$'
+ value: yes
+ - key: NGINX_HTTPS
+ match: '.*'
+ value: no
+
+ # 'p' means it was pipelined, everything else means it was not
+ - key: NGINX_PIPELINED
+ match: '^p$'
+ value: yes
+ - key: NGINX_PIPELINED
+ match: '.*'
+ value: no
+
+ # zero means client sent a certificate and it was verified, non-zero means otherwise
+ - key: NGINX_PROXY_PROTOCOL_TLV_SSL_VERIFY
+ match: '^0$'
+ value: yes
+ - key: NGINX_PROXY_PROTOCOL_TLV_SSL_VERIFY
+ match: '.*'
+ value: no
+
+ # 'OK' means request completed, everything else means it didn't
+ - key: NGINX_REQUEST_COMPLETION
+ match: '^OK$'
+ value: 'completed'
+ - key: NGINX_REQUEST_COMPLETION
+ match: '.*'
+ value: 'not completed'
+
+ # PRIORITY is a duplicate of NGINX_STATUS
+ # Valid PRIORITIES: 0=emerg, 1=alert, 2=crit, 3=error, 4=warn, 5=notice, 6=info, 7=debug
+ - key: PRIORITY
+ match: '^[123]'
+ value: 6
+
+ - key: PRIORITY
+ match: '^4'
+ value: 5
+
+ - key: PRIORITY
+ match: '^5'
+ value: 3
+
+ - key: PRIORITY
+ match: '.*'
+ value: 4
+
+ # NGINX_STATUS_FAMILY is a duplicate of NGINX_STATUS
+ - key: NGINX_STATUS_FAMILY
+ match: '^(?<first_digit>[1-5])'
+ value: '${first_digit}xx'
+
+ - key: NGINX_STATUS_FAMILY
+ match: '.*'
+ value: 'UNKNOWN'
+
+# Control what to do when input logs cannot be parsed as JSON.
+unmatched:
+ # The journal key to log the parsing error message to.
+ # Set this to MESSAGE, so you can see the error in the log.
+ key: MESSAGE
+
+ # Inject static fields to the unmatched entries.
+ # Set PRIORITY=1 (alert) to help you spot unmatched entries in the logs.
+ inject:
+ - key: PRIORITY
+ value: 1
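The `old_key` names in the rename rules above (for example `NGINX_NGINX_VERSION` or `NGINX_PROXY_PROTOCOL_TLV_0X01`) are consistent with the configured `prefix` followed by the JSON field name passed through a mapping like `journal_key_characters_map`. The sketch below illustrates that idea under stated assumptions: `journal_key_char` and `make_journal_key` are hypothetical names, and the actual key construction lives in log2journal-json.c, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical equivalent of one journal_key_characters_map lookup:
// keep A-Z and 0-9, uppercase a-z, turn everything else into '_'.
static char journal_key_char(unsigned char c) {
    if (c >= 'A' && c <= 'Z') return (char)c;
    if (c >= '0' && c <= '9') return (char)c;
    if (c >= 'a' && c <= 'z') return (char)(c - 'a' + 'A');
    return '_';
}

// Hypothetical helper: write `prefix` verbatim, then the mapped `name`,
// into `dst`, truncating if needed. Returns the length written.
static size_t make_journal_key(const char *prefix, const char *name,
                               char *dst, size_t dst_size) {
    assert(prefix && name && dst && dst_size > 0);

    size_t used = 0;

    // assumption: the prefix is already a valid journal key fragment
    for (const char *p = prefix; *p && used + 1 < dst_size; p++)
        dst[used++] = *p;

    for (const char *n = name; *n && used + 1 < dst_size; n++)
        dst[used++] = journal_key_char((unsigned char)*n);

    dst[used] = '\0';
    return used;
}
```

With the prefix `NGINX_`, a JSON field named `nginx_version` would become `NGINX_NGINX_VERSION`, which is the key the rename rule then shortens to `NGINX_VERSION`.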
diff --git a/collectors/log2journal/log2journal.h b/collectors/log2journal/log2journal.h
new file mode 100644
index 00000000..834a5b13
--- /dev/null
+++ b/collectors/log2journal/log2journal.h
@@ -0,0 +1,501 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_LOG2JOURNAL_H
+#define NETDATA_LOG2JOURNAL_H
+
+// only for PACKAGE_VERSION
+#include "config.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <dirent.h>
+#include <string.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <ctype.h>
+#include <math.h>
+#include <stdarg.h>
+#include <assert.h>
+
+// ----------------------------------------------------------------------------
+// logging
+
+// enable the compiler to check for printf like errors on our log2stderr() function
+static inline void log2stderr(const char *format, ...) __attribute__ ((format(__printf__, 1, 2)));
+static inline void log2stderr(const char *format, ...) {
+ va_list args;
+ va_start(args, format);
+ vfprintf(stderr, format, args);
+ va_end(args);
+ fprintf(stderr, "\n");
+}
+
+// ----------------------------------------------------------------------------
+// allocation functions abstraction
+
+static inline void *mallocz(size_t size) {
+ void *ptr = malloc(size);
+ if (!ptr) {
+ log2stderr("Fatal Error: Memory allocation failed. Requested size: %zu bytes.", size);
+ exit(EXIT_FAILURE);
+ }
+ return ptr;
+}
+
+static inline void *callocz(size_t elements, size_t size) {
+ void *ptr = calloc(elements, size);
+ if (!ptr) {
+ log2stderr("Fatal Error: Memory allocation failed. Requested size: %zu bytes.", elements * size);
+ exit(EXIT_FAILURE);
+ }
+ return ptr;
+}
+
+static inline void *reallocz(void *ptr, size_t size) {
+ void *new_ptr = realloc(ptr, size);
+ if (!new_ptr) {
+ log2stderr("Fatal Error: Memory reallocation failed. Requested size: %zu bytes.", size);
+ exit(EXIT_FAILURE);
+ }
+ return new_ptr;
+}
+
+static inline char *strdupz(const char *s) {
+ char *ptr = strdup(s);
+ if (!ptr) {
+ log2stderr("Fatal Error: Memory allocation failed in strdup.");
+ exit(EXIT_FAILURE);
+ }
+ return ptr;
+}
+
+static inline char *strndupz(const char *s, size_t n) {
+ char *ptr = strndup(s, n);
+ if (!ptr) {
+ log2stderr("Fatal Error: Memory allocation failed in strndup. Requested size: %zu bytes.", n);
+ exit(EXIT_FAILURE);
+ }
+ return ptr;
+}
+
+static inline void freez(void *ptr) {
+ if (ptr)
+ free(ptr);
+}
+
+// ----------------------------------------------------------------------------
+
+#define XXH_INLINE_ALL
+#include "../../libnetdata/xxhash.h"
+
+#define PCRE2_CODE_UNIT_WIDTH 8
+#include <pcre2.h>
+
+#ifdef HAVE_LIBYAML
+#include <yaml.h>
+#endif
+
+// ----------------------------------------------------------------------------
+// hashtable for HASHED_KEY
+
+// cleanup hashtable defines
+#undef SIMPLE_HASHTABLE_SORT_FUNCTION
+#undef SIMPLE_HASHTABLE_VALUE_TYPE
+#undef SIMPLE_HASHTABLE_NAME
+#undef NETDATA_SIMPLE_HASHTABLE_H
+
+struct hashed_key;
+static inline int compare_keys(struct hashed_key *k1, struct hashed_key *k2);
+#define SIMPLE_HASHTABLE_SORT_FUNCTION compare_keys
+#define SIMPLE_HASHTABLE_VALUE_TYPE struct hashed_key
+#define SIMPLE_HASHTABLE_NAME _KEY
+#include "../../libnetdata/simple_hashtable.h"
+
+// ----------------------------------------------------------------------------
+
+#define MAX_OUTPUT_KEYS 1024
+#define MAX_LINE_LENGTH (1024 * 1024)
+#define MAX_INJECTIONS (MAX_OUTPUT_KEYS / 2)
+#define MAX_REWRITES (MAX_OUTPUT_KEYS / 2)
+#define MAX_RENAMES (MAX_OUTPUT_KEYS / 2)
+
+#define JOURNAL_MAX_KEY_LEN 64 // according to systemd-journald
+#define JOURNAL_MAX_VALUE_LEN (48 * 1024) // according to systemd-journald
+
+#define LOG2JOURNAL_CONFIG_PATH LIBCONFIG_DIR "/log2journal.d"
+
+// ----------------------------------------------------------------------------
+// character conversion for journal keys
+
+extern const char journal_key_characters_map[256];
+
+// ----------------------------------------------------------------------------
+// copy to buffer, while ensuring there is no buffer overflow
+
+static inline size_t copy_to_buffer(char *dst, size_t dst_size, const char *src, size_t src_len) {
+ if(dst_size < 2) {
+ if(dst_size == 1)
+ *dst = '\0';
+
+ return 0;
+ }
+
+ if(src_len <= dst_size - 1) {
+ memcpy(dst, src, src_len);
+ dst[src_len] = '\0';
+ return src_len;
+ }
+ else {
+ memcpy(dst, src, dst_size - 1);
+ dst[dst_size - 1] = '\0';
+ return dst_size - 1;
+ }
+}
+
+// ----------------------------------------------------------------------------
+// A dynamically sized, reusable text buffer,
+// allowing us to be fast (no allocations during iterations) while having the
+// smallest possible allocations.
+
+typedef struct txt {
+ char *txt;
+ uint32_t size;
+ uint32_t len;
+} TEXT;
+
+static inline void txt_cleanup(TEXT *t) {
+ if(!t)
+ return;
+
+ if(t->txt)
+ freez(t->txt);
+
+ t->txt = NULL;
+ t->size = 0;
+ t->len = 0;
+}
+
+static inline void txt_replace(TEXT *t, const char *s, size_t len) {
+ if(!s || !*s || len == 0) {
+ s = "";
+ len = 0;
+ }
+
+ if(len + 1 <= t->size) {
+ // the existing allocation is big enough for the new value
+
+ memcpy(t->txt, s, len);
+ t->txt[len] = '\0';
+ t->len = len;
+ }
+ else {
+ // no existing value allocation, or too small for our value
+ // cleanup and increase the buffer
+
+ txt_cleanup(t);
+
+ t->txt = strndupz(s, len);
+ t->size = len + 1;
+ t->len = len;
+ }
+}
+
+static inline void txt_expand_and_append(TEXT *t, const char *s, size_t len) {
+ if(len + 1 > (t->size - t->len)) {
+ size_t new_size = t->len + len + 1;
+ if(new_size < t->size * 2)
+ new_size = t->size * 2;
+
+ t->txt = reallocz(t->txt, new_size);
+ t->size = new_size;
+ }
+
+ char *copy_to = &t->txt[t->len];
+ memcpy(copy_to, s, len);
+ copy_to[len] = '\0';
+ t->len += len;
+}
+
+// ----------------------------------------------------------------------------
+
+typedef enum __attribute__((__packed__)) {
+ HK_NONE = 0,
+
+ // permanent flags - they are set once to optimize various decisions and lookups
+
+ HK_HASHTABLE_ALLOCATED = (1 << 0), // this key object is allocated in the hashtable
+ // objects that do not have this, have a pointer to a key in the hashtable
+ // objects that have this, have a value allocated
+
+ HK_FILTERED = (1 << 1), // we checked once if this key is filtered
+ HK_FILTERED_INCLUDED = (1 << 2), // the result of the filtering was to include it in the output
+
+ HK_COLLISION_CHECKED = (1 << 3), // we checked once for a hash collision on this key
+
+ HK_RENAMES_CHECKED = (1 << 4), // we checked once if there are renames on this key
+ HK_HAS_RENAMES = (1 << 5), // and we found there is a rename rule related to it
+
+ // ephemeral flags - they are unset at the end of each log line
+
+ HK_VALUE_FROM_LOG = (1 << 14), // the value of this key has been read from the log (or from injection, duplication)
+ HK_VALUE_REWRITTEN = (1 << 15), // the value of this key has been rewritten due to one of our rewrite rules
+
+} HASHED_KEY_FLAGS;
+
+typedef struct hashed_key {
+ const char *key;
+ uint32_t len;
+ HASHED_KEY_FLAGS flags;
+ XXH64_hash_t hash;
+ union {
+ struct hashed_key *hashtable_ptr; // HK_HASHTABLE_ALLOCATED is not set
+ TEXT value; // HK_HASHTABLE_ALLOCATED is set
+ };
+} HASHED_KEY;
+
+static inline void hashed_key_cleanup(HASHED_KEY *k) {
+ if(k->key) {
+ freez((void *)k->key);
+ k->key = NULL;
+ }
+
+ if(k->flags & HK_HASHTABLE_ALLOCATED)
+ txt_cleanup(&k->value);
+ else
+ k->hashtable_ptr = NULL;
+}
+
+static inline void hashed_key_set(HASHED_KEY *k, const char *name) {
+ hashed_key_cleanup(k);
+
+ k->key = strdupz(name);
+ k->len = strlen(k->key);
+ k->hash = XXH3_64bits(k->key, k->len);
+ k->flags = HK_NONE;
+}
+
+static inline void hashed_key_len_set(HASHED_KEY *k, const char *name, size_t len) {
+ hashed_key_cleanup(k);
+
+ k->key = strndupz(name, len);
+ k->len = len;
+ k->hash = XXH3_64bits(k->key, k->len);
+ k->flags = HK_NONE;
+}
+
+static inline bool hashed_keys_match(HASHED_KEY *k1, HASHED_KEY *k2) {
+ return ((k1 == k2) || (k1->hash == k2->hash && strcmp(k1->key, k2->key) == 0));
+}
+
+static inline int compare_keys(struct hashed_key *k1, struct hashed_key *k2) {
+ return strcmp(k1->key, k2->key);
+}
+
+// ----------------------------------------------------------------------------
+
+typedef struct search_pattern {
+ const char *pattern;
+ pcre2_code *re;
+ pcre2_match_data *match_data;
+ TEXT error;
+} SEARCH_PATTERN;
+
+void search_pattern_cleanup(SEARCH_PATTERN *sp);
+bool search_pattern_set(SEARCH_PATTERN *sp, const char *search_pattern, size_t search_pattern_len);
+
+static inline bool search_pattern_matches(SEARCH_PATTERN *sp, const char *value, size_t value_len) {
+ return pcre2_match(sp->re, (PCRE2_SPTR)value, value_len, 0, 0, sp->match_data, NULL) >= 0;
+}
+
+// ----------------------------------------------------------------------------
+
+typedef struct replacement_node {
+ HASHED_KEY name;
+ bool is_variable;
+ bool logged_error;
+
+ struct replacement_node *next;
+} REPLACE_NODE;
+
+void replace_node_free(REPLACE_NODE *rpn);
+
+typedef struct replace_pattern {
+ const char *pattern;
+ REPLACE_NODE *nodes;
+ bool has_variables;
+} REPLACE_PATTERN;
+
+void replace_pattern_cleanup(REPLACE_PATTERN *rp);
+bool replace_pattern_set(REPLACE_PATTERN *rp, const char *pattern);
+
+// ----------------------------------------------------------------------------
+
+typedef struct injection {
+ bool on_unmatched;
+ HASHED_KEY key;
+ REPLACE_PATTERN value;
+} INJECTION;
+
+void injection_cleanup(INJECTION *inj);
+
+// ----------------------------------------------------------------------------
+
+typedef struct key_rename {
+ HASHED_KEY new_key;
+ HASHED_KEY old_key;
+} RENAME;
+
+void rename_cleanup(RENAME *rn);
+
+// ----------------------------------------------------------------------------
+
+typedef enum __attribute__((__packed__)) {
+ RW_NONE = 0,
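+ RW_MATCH_PCRE2 = (1 << 1), // a rewrite rule that matches using a PCRE2 search pattern
+ RW_MATCH_NON_EMPTY = (1 << 2), // a rewrite rule that matches when its replace pattern yields a non-empty value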
+ RW_DONT_STOP = (1 << 3),
+ RW_INJECT = (1 << 4),
+} RW_FLAGS;
+
+typedef struct key_rewrite {
+ RW_FLAGS flags;
+ HASHED_KEY key;
+ union {
+ SEARCH_PATTERN match_pcre2;
+ REPLACE_PATTERN match_non_empty;
+ };
+ REPLACE_PATTERN value;
+} REWRITE;
+
+void rewrite_cleanup(REWRITE *rw);
+
+// ----------------------------------------------------------------------------
+// Job configuration and runtime structures
+
+typedef struct log_job {
+ bool show_config;
+
+ const char *pattern;
+ const char *prefix;
+
+ SIMPLE_HASHTABLE_KEY hashtable;
+
+ struct {
+ const char *buffer;
+ const char *trimmed;
+ size_t trimmed_len;
+ size_t size;
+ HASHED_KEY key;
+ } line;
+
+ struct {
+ SEARCH_PATTERN include;
+ SEARCH_PATTERN exclude;
+ } filter;
+
+ struct {
+ bool last_line_was_empty;
+ HASHED_KEY key;
+ TEXT current;
+ } filename;
+
+ struct {
+ uint32_t used;
+ INJECTION keys[MAX_INJECTIONS];
+ } injections;
+
+ struct {
+ HASHED_KEY key;
+ struct {
+ uint32_t used;
+ INJECTION keys[MAX_INJECTIONS];
+ } injections;
+ } unmatched;
+
+ struct {
+ uint32_t used;
+ REWRITE array[MAX_REWRITES];
+ TEXT tmp;
+ } rewrites;
+
+ struct {
+ uint32_t used;
+ RENAME array[MAX_RENAMES];
+ } renames;
+} LOG_JOB;
+
+// initialize a log job
+void log_job_init(LOG_JOB *jb);
+
+// free all resources consumed by the log job
+void log_job_cleanup(LOG_JOB *jb);
+
+// ----------------------------------------------------------------------------
+
+// the entry point to send key value pairs to the output
+// this implements the pipeline of processing renames, rewrites and duplications
+void log_job_send_extracted_key_value(LOG_JOB *jb, const char *key, const char *value, size_t len);
+
+// ----------------------------------------------------------------------------
+// configuration related
+
+// setters for the job configuration
+bool log_job_filename_key_set(LOG_JOB *jb, const char *key, size_t key_len);
+bool log_job_key_prefix_set(LOG_JOB *jb, const char *prefix, size_t prefix_len);
+bool log_job_pattern_set(LOG_JOB *jb, const char *pattern, size_t pattern_len);
+bool log_job_injection_add(LOG_JOB *jb, const char *key, size_t key_len, const char *value, size_t value_len, bool unmatched);
+bool log_job_rewrite_add(LOG_JOB *jb, const char *key, RW_FLAGS flags, const char *search_pattern, const char *replace_pattern);
+bool log_job_rename_add(LOG_JOB *jb, const char *new_key, size_t new_key_len, const char *old_key, size_t old_key_len);
+bool log_job_include_pattern_set(LOG_JOB *jb, const char *pattern, size_t pattern_len);
+bool log_job_exclude_pattern_set(LOG_JOB *jb, const char *pattern, size_t pattern_len);
+
+// entry point to parse command line parameters
+bool log_job_command_line_parse_parameters(LOG_JOB *jb, int argc, char **argv);
+void log_job_command_line_help(const char *name);
+
+// ----------------------------------------------------------------------------
+// YAML configuration related
+
+#ifdef HAVE_LIBYAML
+bool yaml_parse_file(const char *config_file_path, LOG_JOB *jb);
+bool yaml_parse_config(const char *config_name, LOG_JOB *jb);
+#endif
+
+void log_job_configuration_to_yaml(LOG_JOB *jb);
+
+// ----------------------------------------------------------------------------
+// JSON parser
+
+typedef struct log_json_state LOG_JSON_STATE;
+LOG_JSON_STATE *json_parser_create(LOG_JOB *jb);
+void json_parser_destroy(LOG_JSON_STATE *js);
+const char *json_parser_error(LOG_JSON_STATE *js);
+bool json_parse_document(LOG_JSON_STATE *js, const char *txt);
+void json_test(void);
+
+size_t parse_surrogate(const char *s, char *d, size_t *remaining);
+
+// ----------------------------------------------------------------------------
+// logfmt parser
+
+typedef struct logfmt_state LOGFMT_STATE;
+LOGFMT_STATE *logfmt_parser_create(LOG_JOB *jb);
+void logfmt_parser_destroy(LOGFMT_STATE *lfs);
+const char *logfmt_parser_error(LOGFMT_STATE *lfs);
+bool logfmt_parse_document(LOGFMT_STATE *lfs, const char *txt);
+void logfmt_test(void);
+
+// ----------------------------------------------------------------------------
+// pcre2 parser
+
+typedef struct pcre2_state PCRE2_STATE;
+PCRE2_STATE *pcre2_parser_create(LOG_JOB *jb);
+void pcre2_parser_destroy(PCRE2_STATE *pcre2);
+const char *pcre2_parser_error(PCRE2_STATE *pcre2);
+bool pcre2_parse_document(PCRE2_STATE *pcre2, const char *txt, size_t len);
+bool pcre2_has_error(PCRE2_STATE *pcre2);
+void pcre2_test(void);
+
+void pcre2_get_error_in_buffer(char *msg, size_t msg_len, int rc, int pos);
+
+#endif //NETDATA_LOG2JOURNAL_H
diff --git a/collectors/log2journal/tests.d/default.output b/collectors/log2journal/tests.d/default.output
new file mode 100644
index 00000000..ef17cb2c
--- /dev/null
+++ b/collectors/log2journal/tests.d/default.output
@@ -0,0 +1,20 @@
+MESSAGE=key1=value01 key2=value02 key3=value03 key4=value04
+PRIORITY=6
+SYSLOG_IDENTIFIER=log2journal
+
+MESSAGE=key1=value11 key2=value12 key3=value13 key4=
+PRIORITY=6
+SYSLOG_IDENTIFIER=log2journal
+
+MESSAGE=key1=value21 key2=value22 key3=value23 key4=value24
+PRIORITY=6
+SYSLOG_IDENTIFIER=log2journal
+
+MESSAGE=key1=value31 key2=value32 key3=value33 key4=
+PRIORITY=6
+SYSLOG_IDENTIFIER=log2journal
+
+MESSAGE=key1=value41 key2=value42 key3=value43 key4=value44
+PRIORITY=6
+SYSLOG_IDENTIFIER=log2journal
+
diff --git a/collectors/log2journal/tests.d/full.output b/collectors/log2journal/tests.d/full.output
new file mode 100644
index 00000000..074092d4
--- /dev/null
+++ b/collectors/log2journal/tests.d/full.output
@@ -0,0 +1,77 @@
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<NGINX_REMOTE_ADDR>[^ ]+) \s - \s # NGINX_REMOTE_ADDR
+ (?<NGINX_REMOTE_USER>[^ ]+) \s # NGINX_REMOTE_USER
+ \[
+ (?<NGINX_TIME_LOCAL>[^\]]+) # NGINX_TIME_LOCAL
+ \]
+ \s+ "
+ (?<MESSAGE>
+ (?<NGINX_METHOD>[A-Z]+) \s+ # NGINX_METHOD
+ (?<NGINX_URL>[^ ]+) \s+
+ HTTP/(?<NGINX_HTTP_VERSION>[^"]+)
+ )
+ " \s+
+ (?<NGINX_STATUS>\d+) \s+ # NGINX_STATUS
+ (?<NGINX_BODY_BYTES_SENT>\d+) \s+ # NGINX_BODY_BYTES_SENT
+ "(?<NGINX_HTTP_REFERER>[^"]*)" \s+ # NGINX_HTTP_REFERER
+ "(?<NGINX_HTTP_USER_AGENT>[^"]*)" # NGINX_HTTP_USER_AGENT
+
+prefix: NGINX_
+
+filename:
+ key: NGINX_LOG_FILENAME
+
+filter:
+ include: '.*'
+ exclude: '.*HELLO.*WORLD.*'
+
+rename:
+ - new_key: TEST1
+ old_key: TEST2
+ - new_key: TEST3
+ old_key: TEST4
+
+inject:
+ - key: SYSLOG_IDENTIFIER
+ value: nginx-log
+ - key: SYSLOG_IDENTIFIER2
+ value: nginx-log2
+ - key: PRIORITY
+ value: '${NGINX_STATUS}'
+ - key: NGINX_STATUS_FAMILY
+ value: '${NGINX_STATUS}${NGINX_METHOD}'
+
+rewrite:
+ - key: PRIORITY
+ value: '${NGINX_STATUS}'
+ inject: yes
+ stop: no
+ - key: PRIORITY
+ match: '^[123]'
+ value: 6
+ - key: PRIORITY
+ match: '^4'
+ value: 5
+ - key: PRIORITY
+ match: '^5'
+ value: 3
+ - key: PRIORITY
+ match: '.*'
+ value: 4
+ - key: NGINX_STATUS_FAMILY
+ match: '^(?<first_digit>[1-5])'
+ value: '${first_digit}xx'
+ - key: NGINX_STATUS_FAMILY
+ match: '.*'
+ value: UNKNOWN
+
+unmatched:
+ key: MESSAGE
+
+ inject:
+ - key: PRIORITY
+ value: 1
+ - key: PRIORITY2
+ value: 2
diff --git a/collectors/log2journal/tests.d/full.yaml b/collectors/log2journal/tests.d/full.yaml
new file mode 100644
index 00000000..86cafb5a
--- /dev/null
+++ b/collectors/log2journal/tests.d/full.yaml
@@ -0,0 +1,76 @@
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<NGINX_REMOTE_ADDR>[^ ]+) \s - \s # NGINX_REMOTE_ADDR
+ (?<NGINX_REMOTE_USER>[^ ]+) \s # NGINX_REMOTE_USER
+ \[
+ (?<NGINX_TIME_LOCAL>[^\]]+) # NGINX_TIME_LOCAL
+ \]
+ \s+ "
+ (?<MESSAGE>
+ (?<NGINX_METHOD>[A-Z]+) \s+ # NGINX_METHOD
+ (?<NGINX_URL>[^ ]+) \s+
+ HTTP/(?<NGINX_HTTP_VERSION>[^"]+)
+ )
+ " \s+
+ (?<NGINX_STATUS>\d+) \s+ # NGINX_STATUS
+ (?<NGINX_BODY_BYTES_SENT>\d+) \s+ # NGINX_BODY_BYTES_SENT
+ "(?<NGINX_HTTP_REFERER>[^"]*)" \s+ # NGINX_HTTP_REFERER
+ "(?<NGINX_HTTP_USER_AGENT>[^"]*)" # NGINX_HTTP_USER_AGENT
+
+prefix: NGINX_
+
+filename:
+ key: NGINX_LOG_FILENAME
+
+filter:
+ include: '.*'
+ exclude: '.*HELLO.*WORLD.*'
+
+rename:
+ - new_key: TEST1
+ old_key: TEST2
+ - new_key: TEST3
+ old_key: TEST4
+
+inject:
+ - key: SYSLOG_IDENTIFIER
+ value: 'nginx-log'
+ - key: SYSLOG_IDENTIFIER2
+ value: 'nginx-log2'
+ - key: PRIORITY
+ value: '${NGINX_STATUS}'
+ - key: NGINX_STATUS_FAMILY
+ value: '${NGINX_STATUS}${NGINX_METHOD}'
+
+rewrite:
+ - key: "PRIORITY"
+ value: "${NGINX_STATUS}"
+ inject: yes
+ stop: no
+ - key: "PRIORITY"
+ match: "^[123]"
+ value: 6
+ - key: "PRIORITY"
+ match: "^4"
+ value: 5
+ - key: "PRIORITY"
+ match: "^5"
+ value: 3
+ - key: "PRIORITY"
+ match: ".*"
+ value: 4
+ - key: "NGINX_STATUS_FAMILY"
+ match: "^(?<first_digit>[1-5])"
+ value: "${first_digit}xx"
+ - key: "NGINX_STATUS_FAMILY"
+ match: ".*"
+ value: "UNKNOWN"
+
+unmatched:
+ key: MESSAGE
+ inject:
+ - key: PRIORITY
+ value: 1
+ - key: PRIORITY2
+ value: 2
diff --git a/collectors/log2journal/tests.d/json-exclude.output b/collectors/log2journal/tests.d/json-exclude.output
new file mode 100644
index 00000000..a8f6f83e
--- /dev/null
+++ b/collectors/log2journal/tests.d/json-exclude.output
@@ -0,0 +1,153 @@
+ARRAY2_0=1
+ARRAY2_1=-2.345
+ARRAY2_2=Array Element
+ARRAY2_3=true
+ARRAY2_4=false
+ARRAY2_5=null
+ARRAY2_6_BOOLEANFALSE=false
+ARRAY2_6_BOOLEANTRUE=true
+ARRAY2_6_FLOATNEGATIVE=-0.123
+ARRAY2_6_FLOATPOSITIVE=0.987
+ARRAY2_6_NULLVALUE=null
+ARRAY2_6_NUMERICNEGATIVE=-456
+ARRAY2_6_NUMERICPOSITIVE=123
+ARRAY2_6_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+ARRAY2_6_SCIENTIFICINTPOSITIVE=6e4
+ARRAY2_6_SCIENTIFICSMALLPOSITIVE=5e-5
+ARRAY2_6_STRING=Nested Object in Array2
+ARRAY2_7_BOOLEANFALSE=false
+ARRAY2_7_BOOLEANTRUE=true
+ARRAY2_7_FLOATNEGATIVE=-2.71828
+ARRAY2_7_FLOATPOSITIVE=3.14159
+ARRAY2_7_NULLVALUE=null
+ARRAY2_7_NUMERICNEGATIVE=-123
+ARRAY2_7_NUMERICPOSITIVE=42
+ARRAY2_7_SCIENTIFICFLOATNEGATIVE=-2.5e-3
+ARRAY2_7_SCIENTIFICINTPOSITIVE=1e5
+ARRAY2_7_SCIENTIFICSMALLPOSITIVE=1e-4
+ARRAY2_7_STRING=Array Element with Object in Array2
+BOOLEANFALSE=false
+BOOLEANTRUE=true
+FLOATNEGATIVE=-2.71828
+FLOATPOSITIVE=3.14159
+NULLVALUE=null
+NUMERICNEGATIVE=-123
+NUMERICPOSITIVE=42
+OBJECT_BOOLEANFALSE=false
+OBJECT_BOOLEANTRUE=true
+OBJECT_FLOATNEGATIVE=-0.123
+OBJECT_FLOATPOSITIVE=0.987
+OBJECT_NULLVALUE=null
+OBJECT_NUMERICNEGATIVE=-456
+OBJECT_NUMERICPOSITIVE=123
+OBJECT_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+OBJECT_SCIENTIFICINTPOSITIVE=6e4
+OBJECT_SCIENTIFICSMALLPOSITIVE=5e-5
+OBJECT_STRING=Nested Object
+SCIENTIFICFLOATNEGATIVE=-2.5e-3
+SCIENTIFICINTPOSITIVE=1e5
+SCIENTIFICSMALLPOSITIVE=1e-4
+STRING=Hello, World!
+
+ARRAY2_0=1
+ARRAY2_1=-2.345
+ARRAY2_2=Array Element
+ARRAY2_3=true
+ARRAY2_4=false
+ARRAY2_5=null
+ARRAY2_6_BOOLEANFALSE=false
+ARRAY2_6_BOOLEANTRUE=true
+ARRAY2_6_FLOATNEGATIVE=-0.123
+ARRAY2_6_FLOATPOSITIVE=0.987
+ARRAY2_6_NULLVALUE=null
+ARRAY2_6_NUMERICNEGATIVE=-456
+ARRAY2_6_NUMERICPOSITIVE=123
+ARRAY2_6_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+ARRAY2_6_SCIENTIFICINTPOSITIVE=6e4
+ARRAY2_6_SCIENTIFICSMALLPOSITIVE=5e-5
+ARRAY2_6_STRING=Nested Object in Array2
+ARRAY2_7_BOOLEANFALSE=false
+ARRAY2_7_BOOLEANTRUE=true
+ARRAY2_7_FLOATNEGATIVE=-2.71828
+ARRAY2_7_FLOATPOSITIVE=3.14159
+ARRAY2_7_NULLVALUE=null
+ARRAY2_7_NUMERICNEGATIVE=-123
+ARRAY2_7_NUMERICPOSITIVE=42
+ARRAY2_7_SCIENTIFICFLOATNEGATIVE=-2.5e-3
+ARRAY2_7_SCIENTIFICINTPOSITIVE=1e5
+ARRAY2_7_SCIENTIFICSMALLPOSITIVE=1e-4
+ARRAY2_7_STRING=Array Element with Object in Array2
+BOOLEANFALSE=false
+BOOLEANTRUE=true
+FLOATNEGATIVE=-2.71828
+FLOATPOSITIVE=3.14159
+NULLVALUE=null
+NUMERICNEGATIVE=-123
+NUMERICPOSITIVE=42
+OBJECT_BOOLEANFALSE=false
+OBJECT_BOOLEANTRUE=true
+OBJECT_FLOATNEGATIVE=-0.123
+OBJECT_FLOATPOSITIVE=0.987
+OBJECT_NULLVALUE=null
+OBJECT_NUMERICNEGATIVE=-456
+OBJECT_NUMERICPOSITIVE=123
+OBJECT_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+OBJECT_SCIENTIFICINTPOSITIVE=6e4
+OBJECT_SCIENTIFICSMALLPOSITIVE=5e-5
+OBJECT_STRING=Nested Object
+SCIENTIFICFLOATNEGATIVE=-2.5e-3
+SCIENTIFICINTPOSITIVE=1e5
+SCIENTIFICSMALLPOSITIVE=1e-4
+STRING=Hello, World!
+
+ARRAY2_0=1
+ARRAY2_1=-2.345
+ARRAY2_2=Array Element
+ARRAY2_3=true
+ARRAY2_4=false
+ARRAY2_5=null
+ARRAY2_6_BOOLEANFALSE=false
+ARRAY2_6_BOOLEANTRUE=true
+ARRAY2_6_FLOATNEGATIVE=-0.123
+ARRAY2_6_FLOATPOSITIVE=0.987
+ARRAY2_6_NULLVALUE=null
+ARRAY2_6_NUMERICNEGATIVE=-456
+ARRAY2_6_NUMERICPOSITIVE=123
+ARRAY2_6_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+ARRAY2_6_SCIENTIFICINTPOSITIVE=6e4
+ARRAY2_6_SCIENTIFICSMALLPOSITIVE=5e-5
+ARRAY2_6_STRING=Nested Object in Array2
+ARRAY2_7_BOOLEANFALSE=false
+ARRAY2_7_BOOLEANTRUE=true
+ARRAY2_7_FLOATNEGATIVE=-2.71828
+ARRAY2_7_FLOATPOSITIVE=3.14159
+ARRAY2_7_NULLVALUE=null
+ARRAY2_7_NUMERICNEGATIVE=-123
+ARRAY2_7_NUMERICPOSITIVE=42
+ARRAY2_7_SCIENTIFICFLOATNEGATIVE=-2.5e-3
+ARRAY2_7_SCIENTIFICINTPOSITIVE=1e5
+ARRAY2_7_SCIENTIFICSMALLPOSITIVE=1e-4
+ARRAY2_7_STRING=Array Element with Object in Array2
+BOOLEANFALSE=false
+BOOLEANTRUE=true
+FLOATNEGATIVE=-2.71828
+FLOATPOSITIVE=3.14159
+NULLVALUE=null
+NUMERICNEGATIVE=-123
+NUMERICPOSITIVE=42
+OBJECT_BOOLEANFALSE=false
+OBJECT_BOOLEANTRUE=true
+OBJECT_FLOATNEGATIVE=-0.123
+OBJECT_FLOATPOSITIVE=0.987
+OBJECT_NULLVALUE=null
+OBJECT_NUMERICNEGATIVE=-456
+OBJECT_NUMERICPOSITIVE=123
+OBJECT_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+OBJECT_SCIENTIFICINTPOSITIVE=6e4
+OBJECT_SCIENTIFICSMALLPOSITIVE=5e-5
+OBJECT_STRING=Nested Object
+SCIENTIFICFLOATNEGATIVE=-2.5e-3
+SCIENTIFICINTPOSITIVE=1e5
+SCIENTIFICSMALLPOSITIVE=1e-4
+STRING=Hello, World!
+
diff --git a/collectors/log2journal/tests.d/json-include.output b/collectors/log2journal/tests.d/json-include.output
new file mode 100644
index 00000000..326c58da
--- /dev/null
+++ b/collectors/log2journal/tests.d/json-include.output
@@ -0,0 +1,54 @@
+OBJECT_ARRAY_0=1
+OBJECT_ARRAY_1=-2
+OBJECT_ARRAY_2=3
+OBJECT_ARRAY_3=Nested Array
+OBJECT_ARRAY_4=true
+OBJECT_ARRAY_5=null
+OBJECT_BOOLEANFALSE=false
+OBJECT_BOOLEANTRUE=true
+OBJECT_FLOATNEGATIVE=-0.123
+OBJECT_FLOATPOSITIVE=0.987
+OBJECT_NULLVALUE=null
+OBJECT_NUMERICNEGATIVE=-456
+OBJECT_NUMERICPOSITIVE=123
+OBJECT_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+OBJECT_SCIENTIFICINTPOSITIVE=6e4
+OBJECT_SCIENTIFICSMALLPOSITIVE=5e-5
+OBJECT_STRING=Nested Object
+
+OBJECT_ARRAY_0=1
+OBJECT_ARRAY_1=-2
+OBJECT_ARRAY_2=3
+OBJECT_ARRAY_3=Nested Array
+OBJECT_ARRAY_4=true
+OBJECT_ARRAY_5=null
+OBJECT_BOOLEANFALSE=false
+OBJECT_BOOLEANTRUE=true
+OBJECT_FLOATNEGATIVE=-0.123
+OBJECT_FLOATPOSITIVE=0.987
+OBJECT_NULLVALUE=null
+OBJECT_NUMERICNEGATIVE=-456
+OBJECT_NUMERICPOSITIVE=123
+OBJECT_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+OBJECT_SCIENTIFICINTPOSITIVE=6e4
+OBJECT_SCIENTIFICSMALLPOSITIVE=5e-5
+OBJECT_STRING=Nested Object
+
+OBJECT_ARRAY_0=1
+OBJECT_ARRAY_1=-2
+OBJECT_ARRAY_2=3
+OBJECT_ARRAY_3=Nested Array
+OBJECT_ARRAY_4=true
+OBJECT_ARRAY_5=null
+OBJECT_BOOLEANFALSE=false
+OBJECT_BOOLEANTRUE=true
+OBJECT_FLOATNEGATIVE=-0.123
+OBJECT_FLOATPOSITIVE=0.987
+OBJECT_NULLVALUE=null
+OBJECT_NUMERICNEGATIVE=-456
+OBJECT_NUMERICPOSITIVE=123
+OBJECT_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+OBJECT_SCIENTIFICINTPOSITIVE=6e4
+OBJECT_SCIENTIFICSMALLPOSITIVE=5e-5
+OBJECT_STRING=Nested Object
+
diff --git a/collectors/log2journal/tests.d/json.log b/collectors/log2journal/tests.d/json.log
new file mode 100644
index 00000000..3f133496
--- /dev/null
+++ b/collectors/log2journal/tests.d/json.log
@@ -0,0 +1,3 @@
+{ "numericPositive": 42, "numericNegative": -123, "floatPositive": 3.14159, "floatNegative": -2.71828, "scientificIntPositive": 1e5, "scientificFloatNegative": -2.5e-3, "scientificSmallPositive": 1e-4, "booleanTrue": true, "booleanFalse": false, "string": "Hello, World!", "nullValue": null, "object": { "numericPositive": 123, "numericNegative": -456, "floatPositive": 0.987, "floatNegative": -0.123, "scientificIntPositive": 6e4, "scientificFloatNegative": -1.5e-2, "scientificSmallPositive": 5e-5, "booleanTrue": true, "booleanFalse": false, "string": "Nested Object", "nullValue": null, "array": [1, -2, 3, "Nested Array", true, null] }, "array": [ 1, -2.345, "Array Element", true, false, null, { "numericPositive": 987, "numericNegative": -654, "string": "Nested Object in Array", "array": [null, false, true] }, { "numericPositive": 42, "numericNegative": -123, "floatPositive": 3.14159, "floatNegative": -2.71828, "scientificIntPositive": 1e5, "scientificFloatNegative": -2.5e-3, "scientificSmallPositive": 1e-4, "booleanTrue": true, "booleanFalse": false, "string": "Array Element with Object", "nullValue": null, "array": [1, -2, 3, "Nested Array in Object", true, null] } ], "array2": [ 1, -2.345, "Array Element", true, false, null, { "numericPositive": 123, "numericNegative": -456, "floatPositive": 0.987, "floatNegative": -0.123, "scientificIntPositive": 6e4, "scientificFloatNegative": -1.5e-2, "scientificSmallPositive": 5e-5, "booleanTrue": true, "booleanFalse": false, "string": "Nested Object in Array2", "nullValue": null, "array": [1, -2, 3, "Nested Array in Object2", true, null] }, { "numericPositive": 42, "numericNegative": -123, "floatPositive": 3.14159, "floatNegative": -2.71828, "scientificIntPositive": 1e5, "scientificFloatNegative": -2.5e-3, "scientificSmallPositive": 1e-4, "booleanTrue": true, "booleanFalse": false, "string": "Array Element with Object in Array2", "nullValue": null, "array": [1, -2, 3, "Nested Array in Object2", true, null]}]}
+{ "numericPositive": 42, "numericNegative": -123, "floatPositive": 3.14159, "floatNegative": -2.71828, "scientificIntPositive": 1e5, "scientificFloatNegative": -2.5e-3, "scientificSmallPositive": 1e-4, "booleanTrue": true, "booleanFalse": false, "string": "Hello, World!", "nullValue": null, "object": { "numericPositive": 123, "numericNegative": -456, "floatPositive": 0.987, "floatNegative": -0.123, "scientificIntPositive": 6e4, "scientificFloatNegative": -1.5e-2, "scientificSmallPositive": 5e-5, "booleanTrue": true, "booleanFalse": false, "string": "Nested Object", "nullValue": null, "array": [1, -2, 3, "Nested Array", true, null] }, "array": [ 1, -2.345, "Array Element", true, false, null, { "numericPositive": 987, "numericNegative": -654, "string": "Nested Object in Array", "array": [null, false, true] }, { "numericPositive": 42, "numericNegative": -123, "floatPositive": 3.14159, "floatNegative": -2.71828, "scientificIntPositive": 1e5, "scientificFloatNegative": -2.5e-3, "scientificSmallPositive": 1e-4, "booleanTrue": true, "booleanFalse": false, "string": "Array Element with Object", "nullValue": null, "array": [1, -2, 3, "Nested Array in Object", true, null] } ], "array2": [ 1, -2.345, "Array Element", true, false, null, { "numericPositive": 123, "numericNegative": -456, "floatPositive": 0.987, "floatNegative": -0.123, "scientificIntPositive": 6e4, "scientificFloatNegative": -1.5e-2, "scientificSmallPositive": 5e-5, "booleanTrue": true, "booleanFalse": false, "string": "Nested Object in Array2", "nullValue": null, "array": [1, -2, 3, "Nested Array in Object2", true, null] }, { "numericPositive": 42, "numericNegative": -123, "floatPositive": 3.14159, "floatNegative": -2.71828, "scientificIntPositive": 1e5, "scientificFloatNegative": -2.5e-3, "scientificSmallPositive": 1e-4, "booleanTrue": true, "booleanFalse": false, "string": "Array Element with Object in Array2", "nullValue": null, "array": [1, -2, 3, "Nested Array in Object2", true, null]}]}
+{ "numericPositive": 42, "numericNegative": -123, "floatPositive": 3.14159, "floatNegative": -2.71828, "scientificIntPositive": 1e5, "scientificFloatNegative": -2.5e-3, "scientificSmallPositive": 1e-4, "booleanTrue": true, "booleanFalse": false, "string": "Hello, World!", "nullValue": null, "object": { "numericPositive": 123, "numericNegative": -456, "floatPositive": 0.987, "floatNegative": -0.123, "scientificIntPositive": 6e4, "scientificFloatNegative": -1.5e-2, "scientificSmallPositive": 5e-5, "booleanTrue": true, "booleanFalse": false, "string": "Nested Object", "nullValue": null, "array": [1, -2, 3, "Nested Array", true, null] }, "array": [ 1, -2.345, "Array Element", true, false, null, { "numericPositive": 987, "numericNegative": -654, "string": "Nested Object in Array", "array": [null, false, true] }, { "numericPositive": 42, "numericNegative": -123, "floatPositive": 3.14159, "floatNegative": -2.71828, "scientificIntPositive": 1e5, "scientificFloatNegative": -2.5e-3, "scientificSmallPositive": 1e-4, "booleanTrue": true, "booleanFalse": false, "string": "Array Element with Object", "nullValue": null, "array": [1, -2, 3, "Nested Array in Object", true, null] } ], "array2": [ 1, -2.345, "Array Element", true, false, null, { "numericPositive": 123, "numericNegative": -456, "floatPositive": 0.987, "floatNegative": -0.123, "scientificIntPositive": 6e4, "scientificFloatNegative": -1.5e-2, "scientificSmallPositive": 5e-5, "booleanTrue": true, "booleanFalse": false, "string": "Nested Object in Array2", "nullValue": null, "array": [1, -2, 3, "Nested Array in Object2", true, null] }, { "numericPositive": 42, "numericNegative": -123, "floatPositive": 3.14159, "floatNegative": -2.71828, "scientificIntPositive": 1e5, "scientificFloatNegative": -2.5e-3, "scientificSmallPositive": 1e-4, "booleanTrue": true, "booleanFalse": false, "string": "Array Element with Object in Array2", "nullValue": null, "array": [1, -2, 3, "Nested Array in Object2", true, null]}]}
diff --git a/collectors/log2journal/tests.d/json.output b/collectors/log2journal/tests.d/json.output
new file mode 100644
index 00000000..83499cc5
--- /dev/null
+++ b/collectors/log2journal/tests.d/json.output
@@ -0,0 +1,294 @@
+ARRAY2_0=1
+ARRAY2_1=-2.345
+ARRAY2_2=Array Element
+ARRAY2_3=true
+ARRAY2_4=false
+ARRAY2_5=null
+ARRAY2_6_ARRAY_0=1
+ARRAY2_6_ARRAY_1=-2
+ARRAY2_6_ARRAY_2=3
+ARRAY2_6_ARRAY_3=Nested Array in Object2
+ARRAY2_6_ARRAY_4=true
+ARRAY2_6_ARRAY_5=null
+ARRAY2_6_BOOLEANFALSE=false
+ARRAY2_6_BOOLEANTRUE=true
+ARRAY2_6_FLOATNEGATIVE=-0.123
+ARRAY2_6_FLOATPOSITIVE=0.987
+ARRAY2_6_NULLVALUE=null
+ARRAY2_6_NUMERICNEGATIVE=-456
+ARRAY2_6_NUMERICPOSITIVE=123
+ARRAY2_6_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+ARRAY2_6_SCIENTIFICINTPOSITIVE=6e4
+ARRAY2_6_SCIENTIFICSMALLPOSITIVE=5e-5
+ARRAY2_6_STRING=Nested Object in Array2
+ARRAY2_7_ARRAY_0=1
+ARRAY2_7_ARRAY_1=-2
+ARRAY2_7_ARRAY_2=3
+ARRAY2_7_ARRAY_3=Nested Array in Object2
+ARRAY2_7_ARRAY_4=true
+ARRAY2_7_ARRAY_5=null
+ARRAY2_7_BOOLEANFALSE=false
+ARRAY2_7_BOOLEANTRUE=true
+ARRAY2_7_FLOATNEGATIVE=-2.71828
+ARRAY2_7_FLOATPOSITIVE=3.14159
+ARRAY2_7_NULLVALUE=null
+ARRAY2_7_NUMERICNEGATIVE=-123
+ARRAY2_7_NUMERICPOSITIVE=42
+ARRAY2_7_SCIENTIFICFLOATNEGATIVE=-2.5e-3
+ARRAY2_7_SCIENTIFICINTPOSITIVE=1e5
+ARRAY2_7_SCIENTIFICSMALLPOSITIVE=1e-4
+ARRAY2_7_STRING=Array Element with Object in Array2
+ARRAY_0=1
+ARRAY_1=-2.345
+ARRAY_2=Array Element
+ARRAY_3=true
+ARRAY_4=false
+ARRAY_5=null
+ARRAY_6_ARRAY_0=null
+ARRAY_6_ARRAY_1=false
+ARRAY_6_ARRAY_2=true
+ARRAY_6_NUMERICNEGATIVE=-654
+ARRAY_6_NUMERICPOSITIVE=987
+ARRAY_6_STRING=Nested Object in Array
+ARRAY_7_ARRAY_0=1
+ARRAY_7_ARRAY_1=-2
+ARRAY_7_ARRAY_2=3
+ARRAY_7_ARRAY_3=Nested Array in Object
+ARRAY_7_ARRAY_4=true
+ARRAY_7_ARRAY_5=null
+ARRAY_7_BOOLEANFALSE=false
+ARRAY_7_BOOLEANTRUE=true
+ARRAY_7_FLOATNEGATIVE=-2.71828
+ARRAY_7_FLOATPOSITIVE=3.14159
+ARRAY_7_NULLVALUE=null
+ARRAY_7_NUMERICNEGATIVE=-123
+ARRAY_7_NUMERICPOSITIVE=42
+ARRAY_7_SCIENTIFICFLOATNEGATIVE=-2.5e-3
+ARRAY_7_SCIENTIFICINTPOSITIVE=1e5
+ARRAY_7_SCIENTIFICSMALLPOSITIVE=1e-4
+ARRAY_7_STRING=Array Element with Object
+BOOLEANFALSE=false
+BOOLEANTRUE=true
+FLOATNEGATIVE=-2.71828
+FLOATPOSITIVE=3.14159
+NULLVALUE=null
+NUMERICNEGATIVE=-123
+NUMERICPOSITIVE=42
+OBJECT_ARRAY_0=1
+OBJECT_ARRAY_1=-2
+OBJECT_ARRAY_2=3
+OBJECT_ARRAY_3=Nested Array
+OBJECT_ARRAY_4=true
+OBJECT_ARRAY_5=null
+OBJECT_BOOLEANFALSE=false
+OBJECT_BOOLEANTRUE=true
+OBJECT_FLOATNEGATIVE=-0.123
+OBJECT_FLOATPOSITIVE=0.987
+OBJECT_NULLVALUE=null
+OBJECT_NUMERICNEGATIVE=-456
+OBJECT_NUMERICPOSITIVE=123
+OBJECT_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+OBJECT_SCIENTIFICINTPOSITIVE=6e4
+OBJECT_SCIENTIFICSMALLPOSITIVE=5e-5
+OBJECT_STRING=Nested Object
+SCIENTIFICFLOATNEGATIVE=-2.5e-3
+SCIENTIFICINTPOSITIVE=1e5
+SCIENTIFICSMALLPOSITIVE=1e-4
+STRING=Hello, World!
+
+ARRAY2_0=1
+ARRAY2_1=-2.345
+ARRAY2_2=Array Element
+ARRAY2_3=true
+ARRAY2_4=false
+ARRAY2_5=null
+ARRAY2_6_ARRAY_0=1
+ARRAY2_6_ARRAY_1=-2
+ARRAY2_6_ARRAY_2=3
+ARRAY2_6_ARRAY_3=Nested Array in Object2
+ARRAY2_6_ARRAY_4=true
+ARRAY2_6_ARRAY_5=null
+ARRAY2_6_BOOLEANFALSE=false
+ARRAY2_6_BOOLEANTRUE=true
+ARRAY2_6_FLOATNEGATIVE=-0.123
+ARRAY2_6_FLOATPOSITIVE=0.987
+ARRAY2_6_NULLVALUE=null
+ARRAY2_6_NUMERICNEGATIVE=-456
+ARRAY2_6_NUMERICPOSITIVE=123
+ARRAY2_6_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+ARRAY2_6_SCIENTIFICINTPOSITIVE=6e4
+ARRAY2_6_SCIENTIFICSMALLPOSITIVE=5e-5
+ARRAY2_6_STRING=Nested Object in Array2
+ARRAY2_7_ARRAY_0=1
+ARRAY2_7_ARRAY_1=-2
+ARRAY2_7_ARRAY_2=3
+ARRAY2_7_ARRAY_3=Nested Array in Object2
+ARRAY2_7_ARRAY_4=true
+ARRAY2_7_ARRAY_5=null
+ARRAY2_7_BOOLEANFALSE=false
+ARRAY2_7_BOOLEANTRUE=true
+ARRAY2_7_FLOATNEGATIVE=-2.71828
+ARRAY2_7_FLOATPOSITIVE=3.14159
+ARRAY2_7_NULLVALUE=null
+ARRAY2_7_NUMERICNEGATIVE=-123
+ARRAY2_7_NUMERICPOSITIVE=42
+ARRAY2_7_SCIENTIFICFLOATNEGATIVE=-2.5e-3
+ARRAY2_7_SCIENTIFICINTPOSITIVE=1e5
+ARRAY2_7_SCIENTIFICSMALLPOSITIVE=1e-4
+ARRAY2_7_STRING=Array Element with Object in Array2
+ARRAY_0=1
+ARRAY_1=-2.345
+ARRAY_2=Array Element
+ARRAY_3=true
+ARRAY_4=false
+ARRAY_5=null
+ARRAY_6_ARRAY_0=null
+ARRAY_6_ARRAY_1=false
+ARRAY_6_ARRAY_2=true
+ARRAY_6_NUMERICNEGATIVE=-654
+ARRAY_6_NUMERICPOSITIVE=987
+ARRAY_6_STRING=Nested Object in Array
+ARRAY_7_ARRAY_0=1
+ARRAY_7_ARRAY_1=-2
+ARRAY_7_ARRAY_2=3
+ARRAY_7_ARRAY_3=Nested Array in Object
+ARRAY_7_ARRAY_4=true
+ARRAY_7_ARRAY_5=null
+ARRAY_7_BOOLEANFALSE=false
+ARRAY_7_BOOLEANTRUE=true
+ARRAY_7_FLOATNEGATIVE=-2.71828
+ARRAY_7_FLOATPOSITIVE=3.14159
+ARRAY_7_NULLVALUE=null
+ARRAY_7_NUMERICNEGATIVE=-123
+ARRAY_7_NUMERICPOSITIVE=42
+ARRAY_7_SCIENTIFICFLOATNEGATIVE=-2.5e-3
+ARRAY_7_SCIENTIFICINTPOSITIVE=1e5
+ARRAY_7_SCIENTIFICSMALLPOSITIVE=1e-4
+ARRAY_7_STRING=Array Element with Object
+BOOLEANFALSE=false
+BOOLEANTRUE=true
+FLOATNEGATIVE=-2.71828
+FLOATPOSITIVE=3.14159
+NULLVALUE=null
+NUMERICNEGATIVE=-123
+NUMERICPOSITIVE=42
+OBJECT_ARRAY_0=1
+OBJECT_ARRAY_1=-2
+OBJECT_ARRAY_2=3
+OBJECT_ARRAY_3=Nested Array
+OBJECT_ARRAY_4=true
+OBJECT_ARRAY_5=null
+OBJECT_BOOLEANFALSE=false
+OBJECT_BOOLEANTRUE=true
+OBJECT_FLOATNEGATIVE=-0.123
+OBJECT_FLOATPOSITIVE=0.987
+OBJECT_NULLVALUE=null
+OBJECT_NUMERICNEGATIVE=-456
+OBJECT_NUMERICPOSITIVE=123
+OBJECT_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+OBJECT_SCIENTIFICINTPOSITIVE=6e4
+OBJECT_SCIENTIFICSMALLPOSITIVE=5e-5
+OBJECT_STRING=Nested Object
+SCIENTIFICFLOATNEGATIVE=-2.5e-3
+SCIENTIFICINTPOSITIVE=1e5
+SCIENTIFICSMALLPOSITIVE=1e-4
+STRING=Hello, World!
+
+ARRAY2_0=1
+ARRAY2_1=-2.345
+ARRAY2_2=Array Element
+ARRAY2_3=true
+ARRAY2_4=false
+ARRAY2_5=null
+ARRAY2_6_ARRAY_0=1
+ARRAY2_6_ARRAY_1=-2
+ARRAY2_6_ARRAY_2=3
+ARRAY2_6_ARRAY_3=Nested Array in Object2
+ARRAY2_6_ARRAY_4=true
+ARRAY2_6_ARRAY_5=null
+ARRAY2_6_BOOLEANFALSE=false
+ARRAY2_6_BOOLEANTRUE=true
+ARRAY2_6_FLOATNEGATIVE=-0.123
+ARRAY2_6_FLOATPOSITIVE=0.987
+ARRAY2_6_NULLVALUE=null
+ARRAY2_6_NUMERICNEGATIVE=-456
+ARRAY2_6_NUMERICPOSITIVE=123
+ARRAY2_6_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+ARRAY2_6_SCIENTIFICINTPOSITIVE=6e4
+ARRAY2_6_SCIENTIFICSMALLPOSITIVE=5e-5
+ARRAY2_6_STRING=Nested Object in Array2
+ARRAY2_7_ARRAY_0=1
+ARRAY2_7_ARRAY_1=-2
+ARRAY2_7_ARRAY_2=3
+ARRAY2_7_ARRAY_3=Nested Array in Object2
+ARRAY2_7_ARRAY_4=true
+ARRAY2_7_ARRAY_5=null
+ARRAY2_7_BOOLEANFALSE=false
+ARRAY2_7_BOOLEANTRUE=true
+ARRAY2_7_FLOATNEGATIVE=-2.71828
+ARRAY2_7_FLOATPOSITIVE=3.14159
+ARRAY2_7_NULLVALUE=null
+ARRAY2_7_NUMERICNEGATIVE=-123
+ARRAY2_7_NUMERICPOSITIVE=42
+ARRAY2_7_SCIENTIFICFLOATNEGATIVE=-2.5e-3
+ARRAY2_7_SCIENTIFICINTPOSITIVE=1e5
+ARRAY2_7_SCIENTIFICSMALLPOSITIVE=1e-4
+ARRAY2_7_STRING=Array Element with Object in Array2
+ARRAY_0=1
+ARRAY_1=-2.345
+ARRAY_2=Array Element
+ARRAY_3=true
+ARRAY_4=false
+ARRAY_5=null
+ARRAY_6_ARRAY_0=null
+ARRAY_6_ARRAY_1=false
+ARRAY_6_ARRAY_2=true
+ARRAY_6_NUMERICNEGATIVE=-654
+ARRAY_6_NUMERICPOSITIVE=987
+ARRAY_6_STRING=Nested Object in Array
+ARRAY_7_ARRAY_0=1
+ARRAY_7_ARRAY_1=-2
+ARRAY_7_ARRAY_2=3
+ARRAY_7_ARRAY_3=Nested Array in Object
+ARRAY_7_ARRAY_4=true
+ARRAY_7_ARRAY_5=null
+ARRAY_7_BOOLEANFALSE=false
+ARRAY_7_BOOLEANTRUE=true
+ARRAY_7_FLOATNEGATIVE=-2.71828
+ARRAY_7_FLOATPOSITIVE=3.14159
+ARRAY_7_NULLVALUE=null
+ARRAY_7_NUMERICNEGATIVE=-123
+ARRAY_7_NUMERICPOSITIVE=42
+ARRAY_7_SCIENTIFICFLOATNEGATIVE=-2.5e-3
+ARRAY_7_SCIENTIFICINTPOSITIVE=1e5
+ARRAY_7_SCIENTIFICSMALLPOSITIVE=1e-4
+ARRAY_7_STRING=Array Element with Object
+BOOLEANFALSE=false
+BOOLEANTRUE=true
+FLOATNEGATIVE=-2.71828
+FLOATPOSITIVE=3.14159
+NULLVALUE=null
+NUMERICNEGATIVE=-123
+NUMERICPOSITIVE=42
+OBJECT_ARRAY_0=1
+OBJECT_ARRAY_1=-2
+OBJECT_ARRAY_2=3
+OBJECT_ARRAY_3=Nested Array
+OBJECT_ARRAY_4=true
+OBJECT_ARRAY_5=null
+OBJECT_BOOLEANFALSE=false
+OBJECT_BOOLEANTRUE=true
+OBJECT_FLOATNEGATIVE=-0.123
+OBJECT_FLOATPOSITIVE=0.987
+OBJECT_NULLVALUE=null
+OBJECT_NUMERICNEGATIVE=-456
+OBJECT_NUMERICPOSITIVE=123
+OBJECT_SCIENTIFICFLOATNEGATIVE=-1.5e-2
+OBJECT_SCIENTIFICINTPOSITIVE=6e4
+OBJECT_SCIENTIFICSMALLPOSITIVE=5e-5
+OBJECT_STRING=Nested Object
+SCIENTIFICFLOATNEGATIVE=-2.5e-3
+SCIENTIFICINTPOSITIVE=1e5
+SCIENTIFICSMALLPOSITIVE=1e-4
+STRING=Hello, World!
+
diff --git a/collectors/log2journal/tests.d/logfmt.log b/collectors/log2journal/tests.d/logfmt.log
new file mode 100644
index 00000000..e55a83bb
--- /dev/null
+++ b/collectors/log2journal/tests.d/logfmt.log
@@ -0,0 +1,5 @@
+key1=value01 key2=value02 key3=value03 key4=value04
+key1=value11 key2=value12 key3=value13 key4=
+key1=value21 key2=value22 key3=value23 key4=value24
+key1=value31 key2=value32 key3=value33 key4=
+key1=value41 key2=value42 key3=value43 key4=value44
diff --git a/collectors/log2journal/tests.d/logfmt.output b/collectors/log2journal/tests.d/logfmt.output
new file mode 100644
index 00000000..4291c966
--- /dev/null
+++ b/collectors/log2journal/tests.d/logfmt.output
@@ -0,0 +1,37 @@
+INJECTED=Key INJECTED had value 'value01 - value02' and now has this, but only on the first row of the log.
+KEY1=value01
+KEY2=value02
+KEY3=value03
+KEY4=value04
+SIMPLE_INJECTION=An unset variable looks like '', while the value of KEY2 is 'value02'
+YET_ANOTHER_INJECTION=value01 - value02 - Key INJECTED had value 'value01 - value02' and now has this, but only on the first row of the log. - this should work because inject is yes
+
+INJECTED=value11 - value12
+KEY1=value11
+KEY2=value12
+KEY3=value13
+SIMPLE_INJECTION=An unset variable looks like '', while the value of KEY2 is 'value12'
+YET_ANOTHER_INJECTION=value11 - value12 - value11 - value12 - this should work because inject is yes
+
+INJECTED=KEY4 has the value 'value24'; it is not empty, so INJECTED has been rewritten.
+KEY1=value21
+KEY2=value22
+KEY3=value23
+KEY4=value24
+SIMPLE_INJECTION=An unset variable looks like '', while the value of KEY2 is 'value22'
+YET_ANOTHER_INJECTION=value21 - value22 - KEY4 has the value 'value24'; it is not empty, so INJECTED has been rewritten. - this should work because inject is yes
+
+INJECTED=value31 - value32
+KEY1=value31
+KEY2=value32
+KEY3=value33
+YET_ANOTHER_INJECTION=value31 - value32 - value31 - value32 - this should work because inject is yes
+
+INJECTED=KEY4 has the value 'value44'; it is not empty, so INJECTED has been rewritten.
+KEY1=value41
+KEY2=value42
+KEY3=value43
+KEY4=value44
+SIMPLE_INJECTION=An unset variable looks like '', while the value of KEY2 is 'value42'
+YET_ANOTHER_INJECTION=value41 - value42 - KEY4 has the value 'value44'; it is not empty, so INJECTED has been rewritten. - this should work because inject is yes
+
diff --git a/collectors/log2journal/tests.d/logfmt.yaml b/collectors/log2journal/tests.d/logfmt.yaml
new file mode 100644
index 00000000..91e93a71
--- /dev/null
+++ b/collectors/log2journal/tests.d/logfmt.yaml
@@ -0,0 +1,34 @@
+pattern: logfmt
+
+inject:
+  - key: SIMPLE_INJECTION
+    value: "An unset variable looks like '${this}', while the value of KEY2 is '${KEY2}'"
+
+rewrite:
+  - key: INJECTED
+    value: "${KEY1} - ${KEY2}"
+    inject: yes
+    stop: no
+
+  - key: INJECTED
+    match: '^value01'
+    value: "Key INJECTED had value '${INJECTED}' and now has this, but only on the first row of the log."
+
+  - key: INJECTED
+    not_empty: "${KEY4}"
+    value: "KEY4 has the value '${KEY4}'; it is not empty, so INJECTED has been rewritten."
+
+  - key: INJECTED
+    match: '^KEY4 has the value'
+    value: "This value should not appear in the logs, because the previous one matched and stopped the pipeline."
+
+  - key: ANOTHER_INJECTION
+    value: "${KEY1} - ${KEY2} - ${INJECTED} - should not work because inject is not true and ANOTHER_INJECTION is not in the log file."
+
+  - key: YET_ANOTHER_INJECTION
+    value: "${KEY1} - ${KEY2} - ${INJECTED} - this should work because inject is yes"
+    inject: yes
+
+  - key: SIMPLE_INJECTION
+    match: "KEY2 is 'value32'"
+    value: "" # empty, so SIMPLE_INJECTION should not be available on row 3
diff --git a/collectors/log2journal/tests.d/nginx-combined.log b/collectors/log2journal/tests.d/nginx-combined.log
new file mode 100644
index 00000000..b0faa81e
--- /dev/null
+++ b/collectors/log2journal/tests.d/nginx-combined.log
@@ -0,0 +1,14 @@
+2a02:169:1210::2000 - - [30/Nov/2023:19:35:27 +0000] "GET /api/v1/data?chart=system.net&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775349 HTTP/1.1" 200 4844 "http://192.168.69.5:19999/" "Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36"
+2a02:169:1210::2000 - - [30/Nov/2023:19:35:27 +0000] "OPTIONS /api/v1/data?chart=netdata.clients&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775358 HTTP/1.1" 200 29 "http://192.168.69.5:19999/" "Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36"
+2a02:169:1210::2000 - - [30/Nov/2023:19:35:27 +0000] "OPTIONS /api/v1/data?chart=netdata.net&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&dimensions=out&_=1701372775359 HTTP/1.1" 200 29 "http://192.168.69.5:19999/" "Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36"
+2a02:169:1210::2000 - - [30/Nov/2023:19:35:27 +0000] "OPTIONS /api/v1/data?chart=netdata.requests&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775357 HTTP/1.1" 200 29 "http://192.168.69.5:19999/" "Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36"
+127.0.0.1 - - [30/Nov/2023:19:35:28 +0000] "GET /stub_status HTTP/1.1" 200 120 "-" "Go-http-client/1.1"
+2a02:169:1210::2000 - - [30/Nov/2023:19:35:28 +0000] "GET /api/v1/data?chart=netdata.net&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&dimensions=out&_=1701372775359 HTTP/1.1" 200 1918 "http://192.168.69.5:19999/" "Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36"
+2a02:169:1210::2000 - - [30/Nov/2023:19:35:28 +0000] "GET /api/v1/data?chart=netdata.requests&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775357 HTTP/1.1" 200 1632 "http://192.168.69.5:19999/" "Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36"
+2a02:169:1210::2000 - - [30/Nov/2023:19:35:28 +0000] "GET /api/v1/data?chart=netdata.clients&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775358 HTTP/1.1" 200 588 "http://192.168.69.5:19999/" "Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36"
+2a02:169:1210::2000 - - [30/Nov/2023:19:35:28 +0000] "OPTIONS /api/v1/data?chart=system.cpu&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775360 HTTP/1.1" 200 29 "http://192.168.69.5:19999/" "Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36"
+2a02:169:1210::2000 - - [30/Nov/2023:19:35:28 +0000] "OPTIONS /api/v1/data?chart=netdata.net&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&dimensions=in&_=1701372775361 HTTP/1.1" 200 29 "http://192.168.69.5:19999/" "Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36"
+2a02:169:1210::2000 - - [30/Nov/2023:19:35:28 +0000] "GET /api/v1/data?chart=system.cpu&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775360 HTTP/1.1" 200 6085 "http://192.168.69.5:19999/" "Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36"
+2a02:169:1210::2000 - - [30/Nov/2023:19:35:28 +0000] "GET /api/v1/data?chart=netdata.net&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&dimensions=in&_=1701372775361 HTTP/1.1" 200 1918 "http://192.168.69.5:19999/" "Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36"
+2a02:169:1210::2000 - - [30/Nov/2023:19:35:28 +0000] "OPTIONS /api/v1/data?chart=system.io&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775362 HTTP/1.1" 200 29 "http://192.168.69.5:19999/" "Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36"
+2a02:169:1210::2000 - - [30/Nov/2023:19:35:28 +0000] "GET /api/v1/data?chart=system.io&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775362 HTTP/1.1" 200 3503 "http://192.168.69.5:19999/" "Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36"
diff --git a/collectors/log2journal/tests.d/nginx-combined.output b/collectors/log2journal/tests.d/nginx-combined.output
new file mode 100644
index 00000000..07fd1101
--- /dev/null
+++ b/collectors/log2journal/tests.d/nginx-combined.output
@@ -0,0 +1,210 @@
+MESSAGE=GET /api/v1/data?chart=system.net&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775349 HTTP/1.1
+NGINX_BODY_BYTES_SENT=4844
+NGINX_HTTP_REFERER=http://192.168.69.5:19999/
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
+NGINX_REMOTE_ADDR=2a02:169:1210::2000
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/api/v1/data?chart=system.net&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775349
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:27 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=OPTIONS /api/v1/data?chart=netdata.clients&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775358 HTTP/1.1
+NGINX_BODY_BYTES_SENT=29
+NGINX_HTTP_REFERER=http://192.168.69.5:19999/
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
+NGINX_REMOTE_ADDR=2a02:169:1210::2000
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=OPTIONS
+NGINX_REQUEST_URI=/api/v1/data?chart=netdata.clients&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775358
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:27 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=OPTIONS /api/v1/data?chart=netdata.net&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&dimensions=out&_=1701372775359 HTTP/1.1
+NGINX_BODY_BYTES_SENT=29
+NGINX_HTTP_REFERER=http://192.168.69.5:19999/
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
+NGINX_REMOTE_ADDR=2a02:169:1210::2000
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=OPTIONS
+NGINX_REQUEST_URI=/api/v1/data?chart=netdata.net&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&dimensions=out&_=1701372775359
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:27 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=OPTIONS /api/v1/data?chart=netdata.requests&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775357 HTTP/1.1
+NGINX_BODY_BYTES_SENT=29
+NGINX_HTTP_REFERER=http://192.168.69.5:19999/
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
+NGINX_REMOTE_ADDR=2a02:169:1210::2000
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=OPTIONS
+NGINX_REQUEST_URI=/api/v1/data?chart=netdata.requests&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775357
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:27 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=GET /stub_status HTTP/1.1
+NGINX_BODY_BYTES_SENT=120
+NGINX_HTTP_REFERER=-
+NGINX_HTTP_USER_AGENT=Go-http-client/1.1
+NGINX_REMOTE_ADDR=127.0.0.1
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/stub_status
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:28 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=GET /api/v1/data?chart=netdata.net&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&dimensions=out&_=1701372775359 HTTP/1.1
+NGINX_BODY_BYTES_SENT=1918
+NGINX_HTTP_REFERER=http://192.168.69.5:19999/
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
+NGINX_REMOTE_ADDR=2a02:169:1210::2000
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/api/v1/data?chart=netdata.net&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&dimensions=out&_=1701372775359
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:28 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=GET /api/v1/data?chart=netdata.requests&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775357 HTTP/1.1
+NGINX_BODY_BYTES_SENT=1632
+NGINX_HTTP_REFERER=http://192.168.69.5:19999/
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
+NGINX_REMOTE_ADDR=2a02:169:1210::2000
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/api/v1/data?chart=netdata.requests&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775357
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:28 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=GET /api/v1/data?chart=netdata.clients&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775358 HTTP/1.1
+NGINX_BODY_BYTES_SENT=588
+NGINX_HTTP_REFERER=http://192.168.69.5:19999/
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
+NGINX_REMOTE_ADDR=2a02:169:1210::2000
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/api/v1/data?chart=netdata.clients&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775358
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:28 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=OPTIONS /api/v1/data?chart=system.cpu&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775360 HTTP/1.1
+NGINX_BODY_BYTES_SENT=29
+NGINX_HTTP_REFERER=http://192.168.69.5:19999/
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
+NGINX_REMOTE_ADDR=2a02:169:1210::2000
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=OPTIONS
+NGINX_REQUEST_URI=/api/v1/data?chart=system.cpu&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775360
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:28 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=OPTIONS /api/v1/data?chart=netdata.net&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&dimensions=in&_=1701372775361 HTTP/1.1
+NGINX_BODY_BYTES_SENT=29
+NGINX_HTTP_REFERER=http://192.168.69.5:19999/
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
+NGINX_REMOTE_ADDR=2a02:169:1210::2000
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=OPTIONS
+NGINX_REQUEST_URI=/api/v1/data?chart=netdata.net&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&dimensions=in&_=1701372775361
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:28 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=GET /api/v1/data?chart=system.cpu&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775360 HTTP/1.1
+NGINX_BODY_BYTES_SENT=6085
+NGINX_HTTP_REFERER=http://192.168.69.5:19999/
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
+NGINX_REMOTE_ADDR=2a02:169:1210::2000
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/api/v1/data?chart=system.cpu&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775360
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:28 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=GET /api/v1/data?chart=netdata.net&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&dimensions=in&_=1701372775361 HTTP/1.1
+NGINX_BODY_BYTES_SENT=1918
+NGINX_HTTP_REFERER=http://192.168.69.5:19999/
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
+NGINX_REMOTE_ADDR=2a02:169:1210::2000
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/api/v1/data?chart=netdata.net&format=array&points=300&group=average&gtime=0&options=absolute%7Cjsonwrap%7Cnonzero&after=-300&dimensions=in&_=1701372775361
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:28 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=OPTIONS /api/v1/data?chart=system.io&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775362 HTTP/1.1
+NGINX_BODY_BYTES_SENT=29
+NGINX_HTTP_REFERER=http://192.168.69.5:19999/
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
+NGINX_REMOTE_ADDR=2a02:169:1210::2000
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=OPTIONS
+NGINX_REQUEST_URI=/api/v1/data?chart=system.io&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775362
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:28 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=GET /api/v1/data?chart=system.io&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775362 HTTP/1.1
+NGINX_BODY_BYTES_SENT=3503
+NGINX_HTTP_REFERER=http://192.168.69.5:19999/
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
+NGINX_REMOTE_ADDR=2a02:169:1210::2000
+NGINX_REMOTE_USER=-
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/api/v1/data?chart=system.io&format=json&points=267&group=average&gtime=0&options=ms%7Cflip%7Cjsonwrap%7Cnonzero&after=-300&_=1701372775362
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIME_LOCAL=30/Nov/2023:19:35:28 +0000
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
diff --git a/collectors/log2journal/tests.d/nginx-json.log b/collectors/log2journal/tests.d/nginx-json.log
new file mode 100644
index 00000000..7e2b5d5f
--- /dev/null
+++ b/collectors/log2journal/tests.d/nginx-json.log
@@ -0,0 +1,9 @@
+{"msec":"1644997905.123","connection":12345,"connection_requests":5,"pid":9876,"request_id":"8f3ebc1e38fbb92f","request_length":345,"remote_addr":"192.168.1.100","remote_user":"john_doe","remote_port":54321,"time_local":"19/Feb/2023:14:15:05 +0000","request":"GET /index.html HTTP/1.1","request_uri":"/index.html?param=value","args":"param=value","status":200,"body_bytes_sent":5432,"bytes_sent":6543,"http_referer":"https://example.com","http_user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64)","http_x_forwarded_for":"192.168.1.50, 10.0.0.1","host":"example.com","request_time":0.123,"upstream":"10.0.0.2:8080","upstream_connect_time":0.045,"upstream_header_time":0.020,"upstream_response_time":0.058,"upstream_response_length":7890,"upstream_cache_status":"MISS","ssl_protocol":"TLSv1.2","ssl_cipher":"AES256-SHA256","scheme":"https","request_method":"GET","server_protocol":"HTTP/1.1","pipe":".","gzip_ratio":"2.1","http_cf_ray":"abc123def456","geoip_country_code":"US"}
+{"msec":"1644997910.789","connection":54321,"connection_requests":10,"pid":5432,"request_id":"4a7bca5e19d3f8e7","request_length":432,"remote_addr":"10.0.0.3","remote_user":"","remote_port":12345,"time_local":"19/Feb/2023:14:15:10 +0000","request":"POST /api/update HTTP/1.1","request_uri":"/api/update","args":"","status":204,"body_bytes_sent":0,"bytes_sent":123,"http_referer":"","http_user_agent":"curl/7.68.0","http_x_forwarded_for":"","host":"api.example.com","request_time":0.032,"upstream":"backend-server-1:8080","upstream_connect_time":0.012,"upstream_header_time":0.020,"upstream_response_time":0.010,"upstream_response_length":0,"upstream_cache_status":"","ssl_protocol":"","ssl_cipher":"","scheme":"http","request_method":"POST","server_protocol":"HTTP/1.1","pipe":"p","gzip_ratio":"","http_cf_ray":"","geoip_country_code":""}
+{"msec":"1644997920.456","connection":98765,"connection_requests":15,"pid":1234,"request_id":"63f8ad2c3e1b4090","request_length":567,"remote_addr":"2001:0db8:85a3:0000:0000:8a2e:0370:7334","remote_user":"alice","remote_port":6789,"time_local":"19/Feb/2023:14:15:20 +0000","request":"GET /page?param1=value1&param2=value2 HTTP/2.0","request_uri":"/page?param1=value1&param2=value2","args":"param1=value1&param2=value2","status":404,"body_bytes_sent":0,"bytes_sent":0,"http_referer":"","http_user_agent":"Mozilla/5.0 (Linux; Android 10; Pixel 3)","http_x_forwarded_for":"","host":"example.org","request_time":0.045,"upstream":"","upstream_connect_time":0.0,"upstream_header_time":0.0,"upstream_response_time":0.0,"upstream_response_length":0,"upstream_cache_status":"","ssl_protocol":"","ssl_cipher":"","scheme":"https","request_method":"GET","server_protocol":"HTTP/2.0","pipe":".","gzip_ratio":"","http_cf_ray":"","geoip_country_code":"GB"}
+{"msec":"1644997930.987","connection":123,"connection_requests":3,"pid":5678,"request_id":"9e632a5b24c18f76","request_length":234,"remote_addr":"192.168.0.1","remote_user":"jane_doe","remote_port":9876,"time_local":"19/Feb/2023:14:15:30 +0000","request":"PUT /api/update HTTP/1.1","request_uri":"/api/update","args":"","status":500,"body_bytes_sent":543,"bytes_sent":876,"http_referer":"https://example.com/page","http_user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64)","http_x_forwarded_for":"","host":"api.example.com","request_time":0.123,"upstream":"backend-server-2:8080","upstream_connect_time":0.045,"upstream_header_time":0.020,"upstream_response_time":0.058,"upstream_response_length":7890,"upstream_cache_status":"HIT","ssl_protocol":"TLSv1.2","ssl_cipher":"AES256-SHA256","scheme":"https","request_method":"PUT","server_protocol":"HTTP/1.1","pipe":"p","gzip_ratio":"1.8","http_cf_ray":"xyz789abc123","geoip_country_code":"CA"}
+{"msec":"1644997940.234","connection":9876,"connection_requests":8,"pid":4321,"request_id":"1b6c59c8aef7d24a","request_length":456,"remote_addr":"203.0.113.1","remote_user":"","remote_port":5432,"time_local":"19/Feb/2023:14:15:40 +0000","request":"DELETE /api/resource HTTP/2.0","request_uri":"/api/resource","args":"","status":204,"body_bytes_sent":0,"bytes_sent":123,"http_referer":"","http_user_agent":"curl/7.68.0","http_x_forwarded_for":"","host":"api.example.com","request_time":0.032,"upstream":"backend-server-1:8080","upstream_connect_time":0.012,"upstream_header_time":0.020,"upstream_response_time":0.010,"upstream_response_length":0,"upstream_cache_status":"","ssl_protocol":"","ssl_cipher":"","scheme":"http","request_method":"DELETE","server_protocol":"HTTP/2.0","pipe":".","gzip_ratio":"","http_cf_ray":"","geoip_country_code":""}
+{"msec":"1644997950.789","connection":5432,"connection_requests":12,"pid":6543,"request_id":"72692d781d0b8a4f","request_length":789,"remote_addr":"198.51.100.2","remote_user":"bob","remote_port":8765,"time_local":"19/Feb/2023:14:15:50 +0000","request":"GET /profile?user=bob HTTP/1.1","request_uri":"/profile?user=bob","args":"user=bob","status":200,"body_bytes_sent":1234,"bytes_sent":2345,"http_referer":"","http_user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64)","http_x_forwarded_for":"","host":"example.com","request_time":0.065,"upstream":"10.0.0.2:8080","upstream_connect_time":0.045,"upstream_header_time":0.020,"upstream_response_time":0.058,"upstream_response_length":7890,"upstream_cache_status":"MISS","ssl_protocol":"TLSv1.3","ssl_cipher":"AES128-GCM-SHA256","scheme":"https","request_method":"GET","server_protocol":"HTTP/1.1","pipe":"p","gzip_ratio":"","http_cf_ray":"","geoip_country_code":"US"}
+{"msec":"1644997960.321","connection":65432,"connection_requests":7,"pid":7890,"request_id":"c3e158d41e75a9d7","request_length":321,"remote_addr":"203.0.113.2","remote_user":"","remote_port":9876,"time_local":"19/Feb/2023:14:15:60 +0000","request":"GET /dashboard HTTP/2.0","request_uri":"/dashboard","args":"","status":301,"body_bytes_sent":0,"bytes_sent":123,"http_referer":"","http_user_agent":"Mozilla/5.0 (Linux; Android 10; Pixel 3)","http_x_forwarded_for":"","host":"dashboard.example.org","request_time":0.032,"upstream":"","upstream_connect_time":0.0,"upstream_header_time":0.0,"upstream_response_time":0.0,"upstream_response_length":0,"upstream_cache_status":"","ssl_protocol":"","ssl_cipher":"","scheme":"https","request_method":"GET","server_protocol":"HTTP/2.0","pipe":".","gzip_ratio":"","http_cf_ray":"","geoip_country_code":""}
+{"msec":"1644997970.555","connection":8765,"connection_requests":9,"pid":8765,"request_id":"f9f6e8235de54af4","request_length":654,"remote_addr":"10.0.0.4","remote_user":"","remote_port":12345,"time_local":"19/Feb/2023:14:15:70 +0000","request":"POST /submit-form HTTP/1.1","request_uri":"/submit-form","args":"","status":201,"body_bytes_sent":876,"bytes_sent":987,"http_referer":"","http_user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64)","http_x_forwarded_for":"","host":"example.com","request_time":0.045,"upstream":"backend-server-3:8080","upstream_connect_time":0.012,"upstream_header_time":0.020,"upstream_response_time":0.010,"upstream_response_length":0,"upstream_cache_status":"","ssl_protocol":"","ssl_cipher":"","scheme":"http","request_method":"POST","server_protocol":"HTTP/1.1","pipe":"p","gzip_ratio":"","http_cf_ray":"","geoip_country_code":""}
+{"msec":"1644997980.987","connection":23456,"connection_requests":6,"pid":3456,"request_id":"2ec3e8859e7a406c","request_length":432,"remote_addr":"198.51.100.3","remote_user":"mary","remote_port":5678,"time_local":"19/Feb/2023:14:15:80 +0000","request":"GET /contact HTTP/1.1","request_uri":"/contact","args":"","status":404,"body_bytes_sent":0,"bytes_sent":0,"http_referer":"","http_user_agent":"Mozilla/5.0 (Linux; Android 10; Pixel 3)","http_x_forwarded_for":"","host":"example.org","request_time":0.032,"upstream":"","upstream_connect_time":0.0,"upstream_header_time":0.0,"upstream_response_time":0.0,"upstream_response_length":0,"upstream_cache_status":"","ssl_protocol":"","ssl_cipher":"","scheme":"https","request_method":"GET","server_protocol":"HTTP/1.1","pipe":".","gzip_ratio":"","http_cf_ray":"","geoip_country_code":"FR"}
diff --git a/collectors/log2journal/tests.d/nginx-json.output b/collectors/log2journal/tests.d/nginx-json.output
new file mode 100644
index 00000000..e7db9dcb
--- /dev/null
+++ b/collectors/log2journal/tests.d/nginx-json.output
@@ -0,0 +1,296 @@
+MESSAGE=GET /index.html HTTP/1.1
+NGINX_BODY_BYTES_SENT=5432
+NGINX_BYTES_SENT=6543
+NGINX_CONNECTION=12345
+NGINX_CONNECTION_REQUESTS=5
+NGINX_GEOIP_COUNTRY_CODE=US
+NGINX_GZIP_RATIO=2.1
+NGINX_HOST=example.com
+NGINX_HTTP_CF_RAY=abc123def456
+NGINX_HTTP_REFERER=https://example.com
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64)
+NGINX_HTTP_X_FORWARDED_FOR=192.168.1.50, 10.0.0.1
+NGINX_PID=9876
+NGINX_PIPELINED=no
+NGINX_QUERY_STRING=param=value
+NGINX_REMOTE_ADDR=192.168.1.100
+NGINX_REMOTE_PORT=54321
+NGINX_REMOTE_USER=john_doe
+NGINX_REQUEST_ID=8f3ebc1e38fbb92f
+NGINX_REQUEST_LENGTH=345
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_TIME=0.123
+NGINX_REQUEST_URI=/index.html?param=value
+NGINX_SCHEME=https
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_SSL_CIPHER=AES256-SHA256
+NGINX_SSL_PROTOCOL=TLSv1.2
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIMESTAMP_SEC=1644997905.123
+NGINX_TIME_LOCAL=19/Feb/2023:14:15:05 +0000
+NGINX_UPSTREAM=10.0.0.2:8080
+NGINX_UPSTREAM_CACHE_STATUS=MISS
+NGINX_UPSTREAM_CONNECT_TIME=0.045
+NGINX_UPSTREAM_HEADER_TIME=0.020
+NGINX_UPSTREAM_RESPONSE_LENGTH=7890
+NGINX_UPSTREAM_RESPONSE_TIME=0.058
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=POST /api/update HTTP/1.1
+NGINX_BODY_BYTES_SENT=0
+NGINX_BYTES_SENT=123
+NGINX_CONNECTION=54321
+NGINX_CONNECTION_REQUESTS=10
+NGINX_HOST=api.example.com
+NGINX_HTTP_USER_AGENT=curl/7.68.0
+NGINX_PID=5432
+NGINX_PIPELINED=yes
+NGINX_REMOTE_ADDR=10.0.0.3
+NGINX_REMOTE_PORT=12345
+NGINX_REQUEST_ID=4a7bca5e19d3f8e7
+NGINX_REQUEST_LENGTH=432
+NGINX_REQUEST_METHOD=POST
+NGINX_REQUEST_TIME=0.032
+NGINX_REQUEST_URI=/api/update
+NGINX_SCHEME=http
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=204
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIMESTAMP_SEC=1644997910.789
+NGINX_TIME_LOCAL=19/Feb/2023:14:15:10 +0000
+NGINX_UPSTREAM=backend-server-1:8080
+NGINX_UPSTREAM_CONNECT_TIME=0.012
+NGINX_UPSTREAM_HEADER_TIME=0.020
+NGINX_UPSTREAM_RESPONSE_LENGTH=0
+NGINX_UPSTREAM_RESPONSE_TIME=0.010
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=GET /page?param1=value1&param2=value2 HTTP/2.0
+NGINX_BODY_BYTES_SENT=0
+NGINX_BYTES_SENT=0
+NGINX_CONNECTION=98765
+NGINX_CONNECTION_REQUESTS=15
+NGINX_GEOIP_COUNTRY_CODE=GB
+NGINX_HOST=example.org
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (Linux; Android 10; Pixel 3)
+NGINX_PID=1234
+NGINX_PIPELINED=no
+NGINX_QUERY_STRING=param1=value1&param2=value2
+NGINX_REMOTE_ADDR=2001:0db8:85a3:0000:0000:8a2e:0370:7334
+NGINX_REMOTE_PORT=6789
+NGINX_REMOTE_USER=alice
+NGINX_REQUEST_ID=63f8ad2c3e1b4090
+NGINX_REQUEST_LENGTH=567
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_TIME=0.045
+NGINX_REQUEST_URI=/page?param1=value1&param2=value2
+NGINX_SCHEME=https
+NGINX_SERVER_PROTOCOL=HTTP/2.0
+NGINX_STATUS=404
+NGINX_STATUS_FAMILY=4xx
+NGINX_TIMESTAMP_SEC=1644997920.456
+NGINX_TIME_LOCAL=19/Feb/2023:14:15:20 +0000
+NGINX_UPSTREAM_CONNECT_TIME=0.0
+NGINX_UPSTREAM_HEADER_TIME=0.0
+NGINX_UPSTREAM_RESPONSE_LENGTH=0
+NGINX_UPSTREAM_RESPONSE_TIME=0.0
+PRIORITY=5
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=PUT /api/update HTTP/1.1
+NGINX_BODY_BYTES_SENT=543
+NGINX_BYTES_SENT=876
+NGINX_CONNECTION=123
+NGINX_CONNECTION_REQUESTS=3
+NGINX_GEOIP_COUNTRY_CODE=CA
+NGINX_GZIP_RATIO=1.8
+NGINX_HOST=api.example.com
+NGINX_HTTP_CF_RAY=xyz789abc123
+NGINX_HTTP_REFERER=https://example.com/page
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64)
+NGINX_PID=5678
+NGINX_PIPELINED=yes
+NGINX_REMOTE_ADDR=192.168.0.1
+NGINX_REMOTE_PORT=9876
+NGINX_REMOTE_USER=jane_doe
+NGINX_REQUEST_ID=9e632a5b24c18f76
+NGINX_REQUEST_LENGTH=234
+NGINX_REQUEST_METHOD=PUT
+NGINX_REQUEST_TIME=0.123
+NGINX_REQUEST_URI=/api/update
+NGINX_SCHEME=https
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_SSL_CIPHER=AES256-SHA256
+NGINX_SSL_PROTOCOL=TLSv1.2
+NGINX_STATUS=500
+NGINX_STATUS_FAMILY=5xx
+NGINX_TIMESTAMP_SEC=1644997930.987
+NGINX_TIME_LOCAL=19/Feb/2023:14:15:30 +0000
+NGINX_UPSTREAM=backend-server-2:8080
+NGINX_UPSTREAM_CACHE_STATUS=HIT
+NGINX_UPSTREAM_CONNECT_TIME=0.045
+NGINX_UPSTREAM_HEADER_TIME=0.020
+NGINX_UPSTREAM_RESPONSE_LENGTH=7890
+NGINX_UPSTREAM_RESPONSE_TIME=0.058
+PRIORITY=3
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=DELETE /api/resource HTTP/2.0
+NGINX_BODY_BYTES_SENT=0
+NGINX_BYTES_SENT=123
+NGINX_CONNECTION=9876
+NGINX_CONNECTION_REQUESTS=8
+NGINX_HOST=api.example.com
+NGINX_HTTP_USER_AGENT=curl/7.68.0
+NGINX_PID=4321
+NGINX_PIPELINED=no
+NGINX_REMOTE_ADDR=203.0.113.1
+NGINX_REMOTE_PORT=5432
+NGINX_REQUEST_ID=1b6c59c8aef7d24a
+NGINX_REQUEST_LENGTH=456
+NGINX_REQUEST_METHOD=DELETE
+NGINX_REQUEST_TIME=0.032
+NGINX_REQUEST_URI=/api/resource
+NGINX_SCHEME=http
+NGINX_SERVER_PROTOCOL=HTTP/2.0
+NGINX_STATUS=204
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIMESTAMP_SEC=1644997940.234
+NGINX_TIME_LOCAL=19/Feb/2023:14:15:40 +0000
+NGINX_UPSTREAM=backend-server-1:8080
+NGINX_UPSTREAM_CONNECT_TIME=0.012
+NGINX_UPSTREAM_HEADER_TIME=0.020
+NGINX_UPSTREAM_RESPONSE_LENGTH=0
+NGINX_UPSTREAM_RESPONSE_TIME=0.010
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=GET /profile?user=bob HTTP/1.1
+NGINX_BODY_BYTES_SENT=1234
+NGINX_BYTES_SENT=2345
+NGINX_CONNECTION=5432
+NGINX_CONNECTION_REQUESTS=12
+NGINX_GEOIP_COUNTRY_CODE=US
+NGINX_HOST=example.com
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64)
+NGINX_PID=6543
+NGINX_PIPELINED=yes
+NGINX_QUERY_STRING=user=bob
+NGINX_REMOTE_ADDR=198.51.100.2
+NGINX_REMOTE_PORT=8765
+NGINX_REMOTE_USER=bob
+NGINX_REQUEST_ID=72692d781d0b8a4f
+NGINX_REQUEST_LENGTH=789
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_TIME=0.065
+NGINX_REQUEST_URI=/profile?user=bob
+NGINX_SCHEME=https
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_SSL_CIPHER=AES128-GCM-SHA256
+NGINX_SSL_PROTOCOL=TLSv1.3
+NGINX_STATUS=200
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIMESTAMP_SEC=1644997950.789
+NGINX_TIME_LOCAL=19/Feb/2023:14:15:50 +0000
+NGINX_UPSTREAM=10.0.0.2:8080
+NGINX_UPSTREAM_CACHE_STATUS=MISS
+NGINX_UPSTREAM_CONNECT_TIME=0.045
+NGINX_UPSTREAM_HEADER_TIME=0.020
+NGINX_UPSTREAM_RESPONSE_LENGTH=7890
+NGINX_UPSTREAM_RESPONSE_TIME=0.058
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=GET /dashboard HTTP/2.0
+NGINX_BODY_BYTES_SENT=0
+NGINX_BYTES_SENT=123
+NGINX_CONNECTION=65432
+NGINX_CONNECTION_REQUESTS=7
+NGINX_HOST=dashboard.example.org
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (Linux; Android 10; Pixel 3)
+NGINX_PID=7890
+NGINX_PIPELINED=no
+NGINX_REMOTE_ADDR=203.0.113.2
+NGINX_REMOTE_PORT=9876
+NGINX_REQUEST_ID=c3e158d41e75a9d7
+NGINX_REQUEST_LENGTH=321
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_TIME=0.032
+NGINX_REQUEST_URI=/dashboard
+NGINX_SCHEME=https
+NGINX_SERVER_PROTOCOL=HTTP/2.0
+NGINX_STATUS=301
+NGINX_STATUS_FAMILY=3xx
+NGINX_TIMESTAMP_SEC=1644997960.321
+NGINX_TIME_LOCAL=19/Feb/2023:14:15:60 +0000
+NGINX_UPSTREAM_CONNECT_TIME=0.0
+NGINX_UPSTREAM_HEADER_TIME=0.0
+NGINX_UPSTREAM_RESPONSE_LENGTH=0
+NGINX_UPSTREAM_RESPONSE_TIME=0.0
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=POST /submit-form HTTP/1.1
+NGINX_BODY_BYTES_SENT=876
+NGINX_BYTES_SENT=987
+NGINX_CONNECTION=8765
+NGINX_CONNECTION_REQUESTS=9
+NGINX_HOST=example.com
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64)
+NGINX_PID=8765
+NGINX_PIPELINED=yes
+NGINX_REMOTE_ADDR=10.0.0.4
+NGINX_REMOTE_PORT=12345
+NGINX_REQUEST_ID=f9f6e8235de54af4
+NGINX_REQUEST_LENGTH=654
+NGINX_REQUEST_METHOD=POST
+NGINX_REQUEST_TIME=0.045
+NGINX_REQUEST_URI=/submit-form
+NGINX_SCHEME=http
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=201
+NGINX_STATUS_FAMILY=2xx
+NGINX_TIMESTAMP_SEC=1644997970.555
+NGINX_TIME_LOCAL=19/Feb/2023:14:15:70 +0000
+NGINX_UPSTREAM=backend-server-3:8080
+NGINX_UPSTREAM_CONNECT_TIME=0.012
+NGINX_UPSTREAM_HEADER_TIME=0.020
+NGINX_UPSTREAM_RESPONSE_LENGTH=0
+NGINX_UPSTREAM_RESPONSE_TIME=0.010
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log
+
+MESSAGE=GET /contact HTTP/1.1
+NGINX_BODY_BYTES_SENT=0
+NGINX_BYTES_SENT=0
+NGINX_CONNECTION=23456
+NGINX_CONNECTION_REQUESTS=6
+NGINX_GEOIP_COUNTRY_CODE=FR
+NGINX_HOST=example.org
+NGINX_HTTP_USER_AGENT=Mozilla/5.0 (Linux; Android 10; Pixel 3)
+NGINX_PID=3456
+NGINX_PIPELINED=no
+NGINX_REMOTE_ADDR=198.51.100.3
+NGINX_REMOTE_PORT=5678
+NGINX_REMOTE_USER=mary
+NGINX_REQUEST_ID=2ec3e8859e7a406c
+NGINX_REQUEST_LENGTH=432
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_TIME=0.032
+NGINX_REQUEST_URI=/contact
+NGINX_SCHEME=https
+NGINX_SERVER_PROTOCOL=HTTP/1.1
+NGINX_STATUS=404
+NGINX_STATUS_FAMILY=4xx
+NGINX_TIMESTAMP_SEC=1644997980.987
+NGINX_TIME_LOCAL=19/Feb/2023:14:15:80 +0000
+NGINX_UPSTREAM_CONNECT_TIME=0.0
+NGINX_UPSTREAM_HEADER_TIME=0.0
+NGINX_UPSTREAM_RESPONSE_LENGTH=0
+NGINX_UPSTREAM_RESPONSE_TIME=0.0
+PRIORITY=5
+SYSLOG_IDENTIFIER=nginx-log
+
diff --git a/collectors/log2journal/tests.sh b/collectors/log2journal/tests.sh
new file mode 100755
index 00000000..40243886
--- /dev/null
+++ b/collectors/log2journal/tests.sh
@@ -0,0 +1,148 @@
+#!/usr/bin/env bash
+
+if [ -f "${PWD}/log2journal" ]; then
+ log2journal_bin="${PWD}/log2journal"
+else
+ log2journal_bin="$(command -v log2journal)"
+fi
+
+[ -z "${log2journal_bin}" ] && echo >&2 "Cannot find log2journal binary" && exit 1
+echo >&2 "Using: ${log2journal_bin}"
+
+script_dir=$(dirname "$(readlink -f "$0")")
+tests="${script_dir}/tests.d"
+
+if [ ! -d "${tests}" ]; then
+ echo >&2 "tests directory '${tests}' is not found."
+ exit 1
+fi
+
+# Create a random directory name in /tmp
+tmp=$(mktemp -d /tmp/script_temp.XXXXXXXXXX)
+
+# Function to clean up the temporary directory on exit
+cleanup() {
+ echo "Cleaning up..."
+ rm -rf "$tmp"
+}
+
+# Register the cleanup function to run on script exit
+trap cleanup EXIT
+
+# Change to the temporary directory
+cd "$tmp" || exit 1
+
+# -----------------------------------------------------------------------------
+
+test_log2journal_config() {
+ local in="${1}"
+ local out="${2}"
+ shift 2
+
+ [ -f output ] && rm output
+
+ printf >&2 "running: "
+ printf >&2 "%q " "${log2journal_bin}" "${@}"
+ printf >&2 "\n"
+
+ "${log2journal_bin}" <"${in}" "${@}" >output 2>&1
+ ret=$?
+
+ [ $ret -ne 0 ] && echo >&2 "${log2journal_bin} exited with code: $ret" && cat output && exit 1
+
+ diff --ignore-all-space "${out}" output
+ [ $? -ne 0 ] && echo >&2 "${log2journal_bin} output does not match!" && exit 1
+
+ echo >&2 "OK"
+ echo >&2
+
+ return 0
+}
+
+# test yaml parsing
+echo >&2
+echo >&2 "Testing full yaml config parsing..."
+test_log2journal_config /dev/null "${tests}/full.output" -f "${tests}/full.yaml" --show-config || exit 1
+
+echo >&2 "Testing command line parsing..."
+test_log2journal_config /dev/null "${tests}/full.output" --show-config \
+ --prefix=NGINX_ \
+ --filename-key NGINX_LOG_FILENAME \
+ --inject SYSLOG_IDENTIFIER=nginx-log \
+ --inject=SYSLOG_IDENTIFIER2=nginx-log2 \
+ --inject 'PRIORITY=${NGINX_STATUS}' \
+ --inject='NGINX_STATUS_FAMILY=${NGINX_STATUS}${NGINX_METHOD}' \
+ --rewrite 'PRIORITY=//${NGINX_STATUS}/inject,dont-stop' \
+ --rewrite "PRIORITY=/^[123]/6" \
+ --rewrite='PRIORITY=|^4|5' \
+ '--rewrite=PRIORITY=-^5-3' \
+ --rewrite "PRIORITY=;.*;4" \
+ --rewrite 'NGINX_STATUS_FAMILY=|^(?<first_digit>[1-5])|${first_digit}xx' \
+ --rewrite 'NGINX_STATUS_FAMILY=|.*|UNKNOWN' \
+ --rename TEST1=TEST2 \
+ --rename=TEST3=TEST4 \
+ --unmatched-key MESSAGE \
+ --inject-unmatched PRIORITY=1 \
+ --inject-unmatched=PRIORITY2=2 \
+ --include=".*" \
+ --exclude ".*HELLO.*WORLD.*" \
+ '(?x) # Enable PCRE2 extended mode
+ ^
+ (?<NGINX_REMOTE_ADDR>[^ ]+) \s - \s # NGINX_REMOTE_ADDR
+ (?<NGINX_REMOTE_USER>[^ ]+) \s # NGINX_REMOTE_USER
+ \[
+ (?<NGINX_TIME_LOCAL>[^\]]+) # NGINX_TIME_LOCAL
+ \]
+ \s+ "
+ (?<MESSAGE>
+ (?<NGINX_METHOD>[A-Z]+) \s+ # NGINX_METHOD
+ (?<NGINX_URL>[^ ]+) \s+
+ HTTP/(?<NGINX_HTTP_VERSION>[^"]+)
+ )
+ " \s+
+ (?<NGINX_STATUS>\d+) \s+ # NGINX_STATUS
+ (?<NGINX_BODY_BYTES_SENT>\d+) \s+ # NGINX_BODY_BYTES_SENT
+ "(?<NGINX_HTTP_REFERER>[^"]*)" \s+ # NGINX_HTTP_REFERER
+ "(?<NGINX_HTTP_USER_AGENT>[^"]*)" # NGINX_HTTP_USER_AGENT' \
+ || exit 1
+
+# -----------------------------------------------------------------------------
+
+test_log2journal() {
+ local n="${1}"
+ local in="${2}"
+ local out="${3}"
+ shift 3
+
+ printf >&2 "running test No ${n}: "
+ printf >&2 "%q " "${log2journal_bin}" "${@}"
+ printf >&2 "\n"
+ echo >&2 "using as input : ${in}"
+ echo >&2 "expecting output: ${out}"
+
+ [ -f output ] && rm output
+
+ "${log2journal_bin}" <"${in}" "${@}" >output 2>&1
+ ret=$?
+
+ [ $ret -ne 0 ] && echo >&2 "${log2journal_bin} exited with code: $ret" && cat output && exit 1
+
+ diff "${out}" output
+ [ $? -ne 0 ] && echo >&2 "${log2journal_bin} output does not match! - here is what we got:" && cat output && exit 1
+
+ echo >&2 "OK"
+ echo >&2
+
+ return 0
+}
+
+echo >&2
+echo >&2 "Testing parsing and output..."
+
+test_log2journal 1 "${tests}/json.log" "${tests}/json.output" json
+test_log2journal 2 "${tests}/json.log" "${tests}/json-include.output" json --include "OBJECT"
+test_log2journal 3 "${tests}/json.log" "${tests}/json-exclude.output" json --exclude "ARRAY[^2]"
+test_log2journal 4 "${tests}/nginx-json.log" "${tests}/nginx-json.output" -f "${script_dir}/log2journal.d/nginx-json.yaml"
+test_log2journal 5 "${tests}/nginx-combined.log" "${tests}/nginx-combined.output" -f "${script_dir}/log2journal.d/nginx-combined.yaml"
+test_log2journal 6 "${tests}/logfmt.log" "${tests}/logfmt.output" -f "${tests}/logfmt.yaml"
+test_log2journal 7 "${tests}/logfmt.log" "${tests}/default.output" -f "${script_dir}/log2journal.d/default.yaml"