author    Daniel Baumann <daniel.baumann@progress-linux.org>  2024-11-25 17:33:56 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org>  2024-11-25 17:34:10 +0000
commit    83ba6762cc43d9db581b979bb5e3445669e46cc2 (patch)
tree      2e69833b43f791ed253a7a20318b767ebe56cdb8 /src/collectors/log2journal/README.md
parent    Releasing debian version 1.47.5-1. (diff)
Merging upstream version 2.0.3+dfsg (Closes: #923993, #1042533, #1045145).
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/collectors/log2journal/README.md')
-rw-r--r--  src/collectors/log2journal/README.md | 37
1 file changed, 17 insertions(+), 20 deletions(-)
diff --git a/src/collectors/log2journal/README.md b/src/collectors/log2journal/README.md
index 9807b33ee..d9764d5d5 100644
--- a/src/collectors/log2journal/README.md
+++ b/src/collectors/log2journal/README.md
@@ -1,4 +1,3 @@
-
# log2journal
`log2journal` and `systemd-cat-native` can be used to convert a structured log file, such as the ones generated by web servers, into `systemd-journal` entries.
@@ -11,7 +10,6 @@ The result is like this: nginx logs into systemd-journal:
![image](https://github.com/netdata/netdata/assets/2662304/16b471ff-c5a1-4fcc-bcd5-83551e089f6c)
-
The overall process looks like this:
```bash
@@ -23,7 +21,8 @@ tail -F /var/log/nginx/*.log |\ # outputs log lines
These are the steps:
1. `tail -F /var/log/nginx/*.log`<br/>this command will tail all `*.log` files in `/var/log/nginx/`. We use `-F` instead of `-f` to ensure that files will still be tailed after log rotation.
-2. `log2joural` is a Netdata program. It reads log entries and extracts fields, according to the PCRE2 pattern it accepts. It can also apply some basic operations on the fields, like injecting new fields or duplicating existing ones or rewriting their values. The output of `log2journal` is in Systemd Journal Export Format, and it looks like this:
+2. `log2journal` is a Netdata program. It reads log entries and extracts fields, according to the PCRE2 pattern it accepts. It can also apply some basic operations on the fields, like injecting new fields or duplicating existing ones or rewriting their values. The output of `log2journal` is in Systemd Journal Export Format, and it looks like this:
+
```bash
KEY1=VALUE1 # << start of the first log line
KEY2=VALUE2
@@ -31,8 +30,8 @@ These are the steps:
KEY1=VALUE1 # << start of the second log line
KEY2=VALUE2
```
-3. `systemd-cat-native` is a Netdata program. I can send the logs to a local `systemd-journald` (journal namespaces supported), or to a remote `systemd-journal-remote`.
+3. `systemd-cat-native` is a Netdata program. It can send the logs to a local `systemd-journald` (journal namespaces supported), or to a remote `systemd-journal-remote`.
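
The record layout shown in step 2 can be sketched in a few lines of Python (a simplified illustration of the Journal Export Format; it ignores the length-prefixed binary-value encoding the full format also supports, and `export_format` is a name chosen here, not part of any tool):

```python
# Sketch: serialize records as Systemd Journal Export Format text.
# Each record is a series of KEY=VALUE lines; an empty line ends a record.
def export_format(records):
    out = []
    for record in records:
        for key, value in record.items():
            out.append(f"{key}={value}")
        out.append("")  # blank line terminates the record
    return "\n".join(out)

print(export_format([{"KEY1": "VALUE1", "KEY2": "VALUE2"}]))
```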
## Processing pipeline
@@ -44,19 +43,19 @@ The sequence of processing in Netdata's `log2journal` is designed to methodicall
2. **Extract Fields and Values**<br/>
Based on the input format (JSON, logfmt, or custom pattern), it extracts fields and their values from each log line. In the case of JSON and logfmt, it automatically extracts all fields. For custom patterns, it uses PCRE2 regular expressions, and fields are extracted based on sub-expressions defined in the pattern.
-3. **Transliteration**<br/>
+3. **Transliteration**<br/>
Extracted fields are transliterated to the limited character set accepted by systemd-journal: capitals A-Z, digits 0-9, underscores.
4. **Apply Optional Prefix**<br/>
If a prefix is specified, it is added to all keys. This happens before any other processing so that all subsequent matches and manipulations take the prefix into account.
-5. **Rename Fields**<br/>
+5. **Rename Fields**<br/>
Renames fields as specified in the configuration. This is used to change the names of the fields to match desired or required naming conventions.
6. **Inject New Fields**<br/>
New fields are injected into the log data. This can include constants or values derived from other fields, using variable substitution.
-7. **Rewrite Field Values**<br/>
+7. **Rewrite Field Values**<br/>
Applies rewriting rules to alter the values of the fields. This can involve complex transformations, including regular expressions and variable substitutions. The rewrite rules can also inject new fields into the data.
8. **Filter Fields**<br/>
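
The transliteration in step 3 can be sketched in Python (an assumption-laden sketch: we simply upper-case the key and map every character outside A-Z and 0-9 to an underscore; the real tool may handle corner cases differently, and `transliterate` is a name invented here):

```python
import re

# Sketch of step 3: map a raw field name to the character set
# systemd-journal accepts (capitals A-Z, digits 0-9, underscores).
def transliterate(key):
    return re.sub(r"[^A-Z0-9]", "_", key.upper())

print(transliterate("remote-addr"))  # REMOTE_ADDR
```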
@@ -81,7 +80,7 @@ We have an nginx server logging in this standard combined log format:
First, let's find the right pattern for `log2journal`. We ask ChatGPT:
-```
+```text
My nginx log uses this log format:
log_format access '$remote_addr - $remote_user [$time_local] '
@@ -122,11 +121,11 @@ ChatGPT replies with this:
Let's see what the above says:
1. `(?x)`: enable PCRE2 extended mode. In this mode spaces and newlines in the pattern are ignored. To match a space you have to use `\s`. This mode allows us to split the pattern into multiple lines and add comments to it.
-1. `^`: match the beginning of the line
-2. `(?<remote_addr[^ ]+)`: match anything up to the first space (`[^ ]+`), and name it `remote_addr`.
-3. `\s`: match a space
-4. `-`: match a hyphen
-5. and so on...
+2. `^`: match the beginning of the line
+3. `(?<remote_addr>[^ ]+)`: match anything up to the first space (`[^ ]+`), and name it `remote_addr`.
+4. `\s`: match a space
+5. `-`: match a hyphen
+6. and so on...
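
The same named-group extraction can be tried in plain Python (note: Python's `re` requires the `(?P<name>...)` spelling where PCRE2 also accepts `(?<name>...)`; the pattern below is trimmed to the first two fields of the combined format for brevity):

```python
import re

# Extended-mode pattern covering only the first fields of the combined
# log format; whitespace in the pattern is ignored because of (?x).
pattern = re.compile(r"""(?x)
    ^
    (?P<remote_addr>[^ ]+) \s   # client address, up to the first space
    -                      \s   # literal hyphen
    (?P<remote_user>[^ ]+)      # remote user (often "-")
""")

m = pattern.match('1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET / HTTP/1.1" 200 4172')
print(m.group("remote_addr"))  # 1.2.3.4
```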
We edit `nginx.yaml` and add it, like this:
@@ -427,7 +426,6 @@ Rewrite rules are powerful. You can have named groups in them, like in the main
Now the message is ready to be sent to a systemd-journal. For this we use `systemd-cat-native`. This command can send such messages to a journal running on the localhost, a local journal namespace, or a `systemd-journal-remote` running on another server. By just appending `| systemd-cat-native` to the command, the message will be sent to the local journal.
-
```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml | systemd-cat-native
# no output
@@ -486,7 +484,7 @@ tail -F /var/log/nginx/access.log |\
Create the file `/etc/systemd/system/nginx-logs.service` (change `/path/to/nginx.yaml` to the right path):
-```
+```text
[Unit]
Description=NGINX Log to Systemd Journal
After=network.target
@@ -524,7 +522,6 @@ Netdata will automatically pick the new namespace and present it at the list of
You can also instruct `systemd-cat-native` to log to a remote system, sending the logs to a `systemd-journal-remote` instance running on another server. Check [the manual of systemd-cat-native](/src/libnetdata/log/systemd-cat-native.md).
-
## Performance
`log2journal` and `systemd-cat-native` have been designed to process hundreds of thousands of log lines per second. They both utilize high-performance indexing hashtables to speed up lookups, and queues that dynamically adapt to the number of log lines offered, providing a smooth and fast experience under all conditions.
@@ -537,15 +534,15 @@ The key characteristic that can influence the performance of a logs processing p
The pattern `.*` seems to have the biggest impact on CPU consumption, especially when multiple `.*` appear in the same pattern.
-Usually we use `.*` to indicate that we need to match everything up to a character, e.g. `.* ` to match up to a space. By replacing it with `[^ ]+` (meaning: match at least a character up to a space), the regular expression engine can be a lot more efficient, reducing the overall CPU utilization significantly.
+Usually we use `.*` to match everything up to some delimiter, e.g. up to a space. By replacing it with `[^ ]+` (meaning: match one or more characters that are not a space), the regular expression engine can be a lot more efficient, reducing the overall CPU utilization significantly.
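
The difference can be sketched with Python's `re` (a hypothetical micro-illustration, not a statement about log2journal's internals; we use the lazy `.*?` variant so both patterns capture the same first field):

```python
import re

line = '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172'

# Lazy dot-star: the engine repeatedly extends the match and retries
# the following space, one character at a time.
lazy = re.match(r"(.*?) ", line).group(1)

# Negated class: consumes the non-space characters in a single pass,
# with no backtracking bookkeeping.
fast = re.match(r"([^ ]+) ", line).group(1)

print(lazy, fast)  # both are 1.2.3.4
```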
### Performance of systemd journals
The ingestion pipeline of logs, from `tail` to `systemd-journald` or `systemd-journal-remote` is very efficient in all aspects. CPU utilization is better than any other system we tested and RAM usage is independent of the number of fields indexed, making systemd-journal one of the most efficient log management engines for ingesting high volumes of structured logs.
-High fields cardinality does not have a noticable impact on systemd-journal. The amount of fields indexed and the amount of unique values per field, have a linear and predictable result in the resource utilization of `systemd-journald` and `systemd-journal-remote`. This is unlike other logs management solutions, like Loki, that their RAM requirements grow exponentially as the cardinality increases, making it impractical for them to index the amount of information systemd journals can index.
+High field cardinality does not have a noticeable impact on systemd-journal. The number of fields indexed and the number of unique values per field have a linear and predictable effect on the resource utilization of `systemd-journald` and `systemd-journal-remote`. This is unlike other log management solutions, such as Loki, whose RAM requirements grow exponentially as cardinality increases, making it impractical for them to index the amount of information systemd journals can index.
-However, the number of fields added to journals influences the overall disk footprint. Less fields means more log entries per journal file, smaller overall disk footprint and faster queries.
+However, the number of fields added to journals influences the overall disk footprint. Fewer fields mean more log entries per journal file, a smaller overall disk footprint, and faster queries.
systemd-journal files are primarily designed for security and reliability. This comes at the cost of disk footprint. The internal structure of journal files is such that, in case of corruption, minimal data loss will occur. To achieve such a unique characteristic, certain data within the files need to be aligned at predefined boundaries, so that if there is a corruption, non-corrupted parts of the journal file can be recovered.
@@ -578,7 +575,7 @@ If on other hand your organization prefers to maintain the full logs and control
## `log2journal` options
-```
+```text
Netdata log2journal v1.43.0-341-gdac4df856