# log2journal

`log2journal` and `systemd-cat-native` can be used to convert a structured log file, such as the ones generated by web servers, into `systemd-journal` entries.

By combining these tools you can create advanced log processing pipelines, sending any kind of structured text logs to systemd-journald. This is a simple but powerful and efficient way to handle log processing.

The process involves the usual piping of shell commands, to get and process the log files in real time.

The result looks like this: nginx logs in systemd-journal:

![image](https://github.com/netdata/netdata/assets/2662304/16b471ff-c5a1-4fcc-bcd5-83551e089f6c)


The overall process looks like this:

```bash
tail -F /var/log/nginx/*.log |     # outputs log lines
  log2journal 'PATTERN' |          # outputs Journal Export Format
  systemd-cat-native               # send to local/remote journald
```

These are the steps:

1. `tail -F /var/log/nginx/*.log`<br/>This command tails all `*.log` files in `/var/log/nginx/`. We use `-F` instead of `-f` to ensure that files will still be tailed after log rotation.
2. `log2journal` is a Netdata program. It reads log entries and extracts fields, according to the PCRE2 pattern given to it. It can also apply some basic operations on the fields, like injecting new fields, duplicating existing ones, or rewriting their values. The output of `log2journal` is in Systemd Journal Export Format, and it looks like this:
    ```bash
    KEY1=VALUE1 # << start of the first log line
    KEY2=VALUE2
                # << log lines separator
    KEY1=VALUE1 # << start of the second log line
    KEY2=VALUE2
    ```
3. `systemd-cat-native` is a Netdata program. It can send the logs to a local `systemd-journald` (journal namespaces supported), or to a remote `systemd-journal-remote`. A minimal end-to-end example is shown below.
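Before building the full pipeline, you can sanity-check the chain from a terminal. Below is a minimal sketch, assuming `log2journal` is in your `PATH`; it uses the built-in JSON mode (documented under the options below), so no pattern is needed:

```bash
# parse one JSON log line and print the resulting
# Journal Export Format to the terminal (journald is not involved yet)
echo '{"level":"info","msg":"hello world"}' | log2journal json
```

Appending `| systemd-cat-native` to the same command would deliver the entry to the local journal instead of printing it.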


## Processing pipeline

The sequence of processing in Netdata's `log2journal` is designed to methodically transform and prepare log data for export in the systemd Journal Export Format. This transformation occurs through a pipeline of stages, each with a specific role in processing the log entries. Here's a description of each stage in the sequence:

1. **Input**<br/>
  The tool reads one log line at a time from the input source. It supports different input formats such as JSON, logfmt, and free-form logs defined by PCRE2 patterns.

2. **Extract Fields and Values**<br/>
  Based on the input format (JSON, logfmt, or custom pattern), it extracts fields and their values from each log line. In the case of JSON and logfmt, it automatically extracts all fields. For custom patterns, it uses PCRE2 regular expressions, and fields are extracted based on sub-expressions defined in the pattern.

3. **Transliteration**<br/> 
  Extracted fields are transliterated to the limited character set accepted by systemd-journal: capitals A-Z, digits 0-9, underscores.

4. **Apply Optional Prefix**<br/>
  If a prefix is specified, it is added to all keys. This happens before any other processing so that all subsequent matches and manipulations take the prefix into account.

5. **Rename Fields**<br/> 
  Renames fields as specified in the configuration. This is used to change the names of the fields to match desired or required naming conventions.

6. **Inject New Fields**<br/>
  New fields are injected into the log data. This can include constants or values derived from other fields, using variable substitution.

7. **Rewrite Field Values**<br/> 
  Applies rewriting rules to alter the values of the fields. This can involve complex transformations, including regular expressions and variable substitutions. The rewrite rules can also inject new fields into the data.

8. **Filter Fields**<br/>
  Fields are filtered based on include and exclude patterns. This stage selects which fields are to be sent to the journal, allowing for selective logging.

9. **Output**<br/>
  Finally, the processed log data is output in the Journal Export Format. This format is compatible with systemd's journaling system and can be sent to local or remote systemd journal systems, by piping the output of `log2journal` to `systemd-cat-native`.

This pipeline ensures a flexible and comprehensive approach to log processing, allowing for a wide range of modifications and customizations to fit various logging requirements. Each stage builds upon the previous one, enabling complex log transformations and enrichments before the data is exported to the systemd journal.
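You can inspect the exact pipeline a configuration will run before feeding it any logs. A sketch using the documented `--show-config` option and the `nginx-combined` configuration that ships with `log2journal`; redirecting stdin from `/dev/null` makes the job itself a no-op:

```bash
# print the effective configuration in YAML, then exit
# immediately because there is no input to process
log2journal -c nginx-combined --show-config < /dev/null
```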

## Real-life example

We have an nginx server logging in this standard combined log format:

```bash
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
```

### Extracting fields with a pattern

First, let's find the right pattern for `log2journal`. We ask ChatGPT:

```
My nginx log uses this log format:

log_format access '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

I want to use `log2journal` to convert this log for systemd-journal.
`log2journal` accepts a PCRE2 regular expression, using the named groups
in the pattern as the journal fields to extract from the logs.

Please give me the PCRE2 pattern to extract all the fields from my nginx
log files.
```

ChatGPT replies with this:

```regexp
  (?x) # Enable PCRE2 extended mode
  ^
  (?<remote_addr>[^ ]+) \s - \s
  (?<remote_user>[^ ]+) \s
  \[
    (?<time_local>[^\]]+)
  \]
  \s+ "
  (?<request>
    (?<request_method>[A-Z]+) \s+
    (?<request_uri>[^ ]+) \s+
    (?<server_protocol>[^"]+)
  )
  " \s+
  (?<status>\d+) \s+
  (?<body_bytes_sent>\d+) \s+
  "(?<http_referer>[^"]*)" \s+
  "(?<http_user_agent>[^"]*)"
```

Let's see what the above says:

1. `(?x)`: enable PCRE2 extended mode. In this mode spaces and newlines in the pattern are ignored. To match a space you have to use `\s`. This mode allows us to split the pattern into multiple lines and add comments to it.
2. `^`: match the beginning of the line.
3. `(?<remote_addr>[^ ]+)`: match anything up to the first space (`[^ ]+`), and name it `remote_addr`.
4. `\s`: match a space.
5. `-`: match a hyphen.
6. and so on...
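You do not need a YAML file just to experiment: the pattern can also be passed directly as a command line argument. A sketch extracting only the first three fields (a single-line pattern is used, because quoting a multi-line extended-mode pattern in a shell is fiddly):

```bash
# extract the first three fields; the rest of the line is ignored
echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET / HTTP/1.1" 200 4172 "-" "-"' |
  log2journal '^(?<remote_addr>[^ ]+) - (?<remote_user>[^ ]+) \[(?<time_local>[^\]]+)\]'
```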

We edit `nginx.yaml` and add it, like this:

```yaml
pattern: |
  (?x) # Enable PCRE2 extended mode
  ^
  (?<remote_addr>[^ ]+) \s - \s
  (?<remote_user>[^ ]+) \s
  \[
    (?<time_local>[^\]]+)
  \]
  \s+ "
  (?<request>
    (?<request_method>[A-Z]+) \s+
    (?<request_uri>[^ ]+) \s+
    (?<server_protocol>[^"]+)
  )
  " \s+
  (?<status>\d+) \s+
  (?<body_bytes_sent>\d+) \s+
  "(?<http_referer>[^"]*)" \s+
  "(?<http_user_agent>[^"]*)"
```

Let's test it with a sample line (instead of `tail`):

```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
BODY_BYTES_SENT=4172
HTTP_REFERER=-
HTTP_USER_AGENT=Go-http-client/1.1
REMOTE_ADDR=1.2.3.4
REMOTE_USER=-
REQUEST=GET /index.html HTTP/1.1
REQUEST_METHOD=GET
REQUEST_URI=/index.html
SERVER_PROTOCOL=HTTP/1.1
STATUS=200
TIME_LOCAL=19/Nov/2023:00:24:43 +0000

```

As you can see, it extracted all the fields and converted their names to capitals, as systemd-journal expects them.

### Prefixing field names

To make sure the fields are unique for nginx and do not interfere with other applications, we should prefix them with `NGINX_`:

```yaml
pattern: |
  (?x) # Enable PCRE2 extended mode
  ^
  (?<remote_addr>[^ ]+) \s - \s
  (?<remote_user>[^ ]+) \s
  \[
    (?<time_local>[^\]]+)
  \]
  \s+ "
  (?<request>
    (?<request_method>[A-Z]+) \s+
    (?<request_uri>[^ ]+) \s+
    (?<server_protocol>[^"]+)
  )
  " \s+
  (?<status>\d+) \s+
  (?<body_bytes_sent>\d+) \s+
  "(?<http_referer>[^"]*)" \s+
  "(?<http_user_agent>[^"]*)"

prefix: 'NGINX_'                    # <<< we added this
```

And let's try it:

```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST=GET /index.html HTTP/1.1
NGINX_REQUEST_METHOD=GET
NGINX_REQUEST_URI=/index.html
NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000

```
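The same result can be achieved without the `prefix:` line in the YAML file, using the documented `--prefix` option; a sketch, assuming the `nginx.yaml` from the previous step (without `prefix:`):

```bash
# add the prefix on the command line instead of in the YAML file
echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' |
  log2journal -f nginx.yaml --prefix 'NGINX_'
```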

### Renaming fields

Now all fields start with `NGINX_`, but we want `NGINX_REQUEST` to be the `MESSAGE` of the log line, since this is what `journalctl` and the Netdata dashboard show by default. Let's rename it:

```yaml
pattern: |
  (?x) # Enable PCRE2 extended mode
  ^
  (?<remote_addr>[^ ]+) \s - \s
  (?<remote_user>[^ ]+) \s
  \[
    (?<time_local>[^\]]+)
  \]
  \s+ "
  (?<request>
    (?<request_method>[A-Z]+) \s+
    (?<request_uri>[^ ]+) \s+
    (?<server_protocol>[^"]+)
  )
  " \s+
  (?<status>\d+) \s+
  (?<body_bytes_sent>\d+) \s+
  "(?<http_referer>[^"]*)" \s+
  "(?<http_user_agent>[^"]*)"

prefix: 'NGINX_'

rename:                         # <<< we added this
  - new_key: MESSAGE            # <<< we added this
    old_key: NGINX_REQUEST      # <<< we added this
```

Let's test it:

```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
MESSAGE=GET /index.html HTTP/1.1          # <<< renamed !
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_METHOD=GET
NGINX_REQUEST_URI=/index.html
NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000

```
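The same rename can be expressed with the documented `--rename NEW=OLD` option; a sketch, assuming the YAML file does not already contain the `rename:` section:

```bash
# rename on the command line instead of in the YAML file
echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' |
  log2journal -f nginx.yaml --rename MESSAGE=NGINX_REQUEST
```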

### Injecting new fields

To have a complete message in journals we need 3 fields: `MESSAGE`, `PRIORITY` and `SYSLOG_IDENTIFIER`. We have already added `MESSAGE` by renaming `NGINX_REQUEST`. We can also inject a `SYSLOG_IDENTIFIER` and `PRIORITY`.

Ideally, we would want the 5xx errors to be red in our `journalctl` output and the dashboard. To achieve that we need to set the `PRIORITY` field to the right log level. Log priorities are numeric and follow the `syslog` priorities. Checking `/usr/include/sys/syslog.h` we can see these:

```c
#define LOG_EMERG       0       /* system is unusable */
#define LOG_ALERT       1       /* action must be taken immediately */
#define LOG_CRIT        2       /* critical conditions */
#define LOG_ERR         3       /* error conditions */
#define LOG_WARNING     4       /* warning conditions */
#define LOG_NOTICE      5       /* normal but significant condition */
#define LOG_INFO        6       /* informational */
#define LOG_DEBUG       7       /* debug-level messages */
```

Avoid setting the priority to 0 (`LOG_EMERG`), because such messages are broadcast to your terminals (the journal uses `wall` to let you know of such events). A good priority for errors is 3 (red) or 4 (yellow).

To set the PRIORITY field in the output, we can use `NGINX_STATUS`. We will do this in 2 steps: a) inject the `PRIORITY` field as a copy of `NGINX_STATUS`, and then b) use a pattern on its value to rewrite it to the priority level we want.

First, let's inject `SYSLOG_IDENTIFIER` and `PRIORITY`:

```yaml
pattern: |
  (?x) # Enable PCRE2 extended mode
  ^
  (?<remote_addr>[^ ]+) \s - \s
  (?<remote_user>[^ ]+) \s
  \[
    (?<time_local>[^\]]+)
  \]
  \s+ "
  (?<request>
    (?<request_method>[A-Z]+) \s+
    (?<request_uri>[^ ]+) \s+
    (?<server_protocol>[^"]+)
  )
  " \s+
  (?<status>\d+) \s+
  (?<body_bytes_sent>\d+) \s+
  "(?<http_referer>[^"]*)" \s+
  "(?<http_user_agent>[^"]*)"

prefix: 'NGINX_'

rename:                         
  - new_key: MESSAGE            
    old_key: NGINX_REQUEST      

inject:                         # <<< we added this
  - key: PRIORITY               # <<< we added this
    value: '${NGINX_STATUS}'    # <<< we added this
    
  - key: SYSLOG_IDENTIFIER      # <<< we added this
    value: 'nginx-log'          # <<< we added this
```

Let's see what this does:

```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_METHOD=GET
NGINX_REQUEST_URI=/index.html
NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
PRIORITY=200                         # <<< PRIORITY added
SYSLOG_IDENTIFIER=nginx-log          # <<< SYSLOG_IDENTIFIER added

```
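Injections can also be given with the documented `--inject KEY=VALUE` option, which supports the same `${OTHER_KEY}` variable substitution; a sketch, assuming the YAML file does not already contain the `inject:` section (note the single quotes, so the shell does not expand `${NGINX_STATUS}`):

```bash
# inject the two fields on the command line instead of in the YAML file
echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' |
  log2journal -f nginx.yaml --inject SYSLOG_IDENTIFIER=nginx-log --inject 'PRIORITY=${NGINX_STATUS}'
```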

### Rewriting field values

Now we need to rewrite `PRIORITY` to the right syslog level based on its value (the copied `NGINX_STATUS`). We will assign priority 6 (info) when the status is 1xx, 2xx or 3xx, priority 5 (notice) when it is 4xx, priority 3 (error) when it is 5xx, and priority 4 (warning) for anything else. Let's do it:

```yaml
pattern: |
  (?x) # Enable PCRE2 extended mode
  ^
  (?<remote_addr>[^ ]+) \s - \s
  (?<remote_user>[^ ]+) \s
  \[
    (?<time_local>[^\]]+)
  \]
  \s+ "
  (?<request>
    (?<request_method>[A-Z]+) \s+
    (?<request_uri>[^ ]+) \s+
    (?<server_protocol>[^"]+)
  )
  " \s+
  (?<status>\d+) \s+
  (?<body_bytes_sent>\d+) \s+
  "(?<http_referer>[^"]*)" \s+
  "(?<http_user_agent>[^"]*)"

prefix: 'NGINX_'

rename:                         
  - new_key: MESSAGE            
    old_key: NGINX_REQUEST      

inject:                         
  - key: PRIORITY               
    value: '${NGINX_STATUS}'    

  - key: SYSLOG_IDENTIFIER      
    value: 'nginx-log'          

rewrite:                        # <<< we added this
  - key: PRIORITY               # <<< we added this
    match: '^[123]'             # <<< we added this
    value: 6                    # <<< we added this

  - key: PRIORITY               # <<< we added this
    match: '^4'                 # <<< we added this
    value: 5                    # <<< we added this

  - key: PRIORITY               # <<< we added this
    match: '^5'                 # <<< we added this
    value: 3                    # <<< we added this

  - key: PRIORITY               # <<< we added this
    match: '.*'                 # <<< we added this
    value: 4                    # <<< we added this
```

Rewrite rules are processed in order, and by default the first rule that matches a field stops further processing for that field. This is why the last rule, which matches everything, does not always change the priority to 4.

Let's test it:

```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_METHOD=GET
NGINX_REQUEST_URI=/index.html
NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
PRIORITY=6                           # <<< PRIORITY rewritten here
SYSLOG_IDENTIFIER=nginx-log

```

Rewrite rules are powerful. They can have named groups, like the main pattern, to extract sub-fields from a value, which you can then use in variable substitution. You can use rewrite rules to anonymize URLs, e.g. to remove customer IDs or transaction details from them, as in the sketch below.
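A sketch of such an anonymization rule, using the documented command line form `--rewrite KEY=/MATCH/REPLACE` (the character right after `KEY=` is the separator; `|` is used here because the pattern contains slashes). The `/customers/<id>/` URL layout is a hypothetical example:

```bash
# replace numeric customer IDs in the request URI with XXX, keeping
# the parts before and after the ID via the named groups pre and post
tail -F /var/log/nginx/access.log |\
  log2journal -f nginx.yaml \
    --rewrite 'NGINX_REQUEST_URI=|^(?<pre>/customers/)[0-9]+(?<post>.*)|${pre}XXX${post}' |\
  systemd-cat-native
```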

### Sending logs to systemd-journal

Now the message is ready to be sent to systemd-journal. For this we use `systemd-cat-native`. This command can send such messages to a journal running on localhost, a local journal namespace, or a `systemd-journal-remote` running on another server. By simply appending `| systemd-cat-native` to the command, the message will be sent to the local journal.


```bash
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml | systemd-cat-native 
# no output

# let's find the message
# journalctl -r -o verbose SYSLOG_IDENTIFIER=nginx-log
Wed 2023-12-06 13:23:07.083299 EET [s=5290f0133f25407aaa1e2c451c0e4756;i=57194;b=0dfa96ecc2094cecaa8ec0efcb93b865;m=b133308867;t=60bd59346a289;x=5c1bdacf2b9c4bbd]
    PRIORITY=6
    _UID=0
    _GID=0
    _CAP_EFFECTIVE=1ffffffffff
    _SELINUX_CONTEXT=unconfined
    _BOOT_ID=0dfa96ecc2094cecaa8ec0efcb93b865
    _MACHINE_ID=355c8eca894d462bbe4c9422caf7a8bb
    _HOSTNAME=lab-logtest-src
    _RUNTIME_SCOPE=system
    _TRANSPORT=journal
    MESSAGE=GET /index.html HTTP/1.1
    NGINX_BODY_BYTES_SENT=4172
    NGINX_HTTP_REFERER=-
    NGINX_HTTP_USER_AGENT=Go-http-client/1.1
    NGINX_REMOTE_ADDR=1.2.3.4
    NGINX_REMOTE_USER=-
    NGINX_REQUEST_METHOD=GET
    NGINX_REQUEST_URI=/index.html
    NGINX_SERVER_PROTOCOL=HTTP/1.1
    NGINX_STATUS=200
    NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
    SYSLOG_IDENTIFIER=nginx-log
    _PID=114343
    _COMM=systemd-cat-nat
    _AUDIT_SESSION=253
    _AUDIT_LOGINUID=1000
    _SYSTEMD_CGROUP=/user.slice/user-1000.slice/session-253.scope
    _SYSTEMD_SESSION=253
    _SYSTEMD_OWNER_UID=1000
    _SYSTEMD_UNIT=session-253.scope
    _SYSTEMD_SLICE=user-1000.slice
    _SYSTEMD_USER_SLICE=-.slice
    _SYSTEMD_INVOCATION_ID=c59e33ead8c24880b027e317b89f9f76
    _SOURCE_REALTIME_TIMESTAMP=1701861787083299

```

So, the log line, with all its fields parsed, ended up in systemd-journal. Now we can send all the nginx logs to systemd-journal like this:

```bash
tail -F /var/log/nginx/access.log |\
   log2journal -f nginx.yaml |\
   systemd-cat-native
```
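In production you usually also want lines that do not match the pattern to reach the journal, flagged as errors, instead of being silently dropped. A sketch using the documented `--unmatched-key` and `--inject-unmatched` options:

```bash
tail -F /var/log/nginx/access.log |\
   log2journal -f nginx.yaml --unmatched-key=MESSAGE --inject-unmatched=PRIORITY=3 |\
   systemd-cat-native
```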

## Best practices

**Create a systemd service unit**: Add the above commands to a systemd unit file. When you run the pipeline as a systemd service you will be able to start/stop it and see its status. Furthermore, you can use the `LogNamespace=` directive of systemd service units to isolate your nginx logs from the logs of the rest of the system. Here is how to do it:

Create the file `/etc/systemd/system/nginx-logs.service` (change `/path/to/nginx.yaml` to the right path):

```
[Unit]
Description=NGINX Log to Systemd Journal
After=network.target

[Service]
ExecStart=/bin/sh -c 'tail -F /var/log/nginx/access.log | log2journal -f /path/to/nginx.yaml | systemd-cat-native'
LogNamespace=nginx-logs
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
```

Reload systemd to pick up the new unit file:

```bash
sudo systemctl daemon-reload
```

Enable and start the service:

```bash
sudo systemctl enable nginx-logs.service
sudo systemctl start nginx-logs.service
```

To see the logs of the namespace, use:

```bash
journalctl -f --namespace=nginx-logs
```

Netdata will automatically pick up the new namespace and present it in the list of sources on the dashboard.

You can also instruct `systemd-cat-native` to log to a remote system, sending the logs to a `systemd-journal-remote` instance running on another server. Check [the manual of systemd-cat-native](https://github.com/netdata/netdata/blob/master/libnetdata/log/systemd-cat-native.md).


## Performance

`log2journal` and `systemd-cat-native` have been designed to process hundreds of thousands of log lines per second. They both utilize high-performance indexing hashtables to speed up lookups, and queues that dynamically adapt to the number of log lines offered, providing a smooth and fast experience under all conditions.

In our tests, the combined CPU utilization of `log2journal` and `systemd-cat-native` versus `promtail` with a similar configuration is 1 to 5. In other words, `log2journal` and `systemd-cat-native` combined are 5 times faster than `promtail`.

### PCRE2 patterns

The key characteristic that influences the performance of a log processing pipeline using these tools is the quality of the PCRE2 patterns used. Poorly written PCRE2 patterns can make processing significantly slower and more CPU intensive.

The pattern `.*` has the biggest impact on CPU consumption, especially when multiple `.*` appear in the same pattern.

Usually we use `.*` to indicate that we need to match everything up to a character, e.g. `.* ` to match up to a space. By replacing it with `[^ ]+` (meaning: match one or more characters that are not a space), the regular expression engine can be a lot more efficient, reducing the overall CPU utilization significantly.
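For example, the two patterns below extract the same two fields, but the second avoids the backtracking caused by greedy wildcards (an illustrative sketch; the actual speed-up depends on the input):

```bash
sample='1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET / HTTP/1.1" 200 4172 "-" "-"'

# slow: greedy wildcards force the engine to backtrack on every line
echo "$sample" | log2journal '^(?<remote_addr>.*) - (?<remote_user>.*) \['

# fast: negated character classes stop exactly at the delimiter
echo "$sample" | log2journal '^(?<remote_addr>[^ ]+) - (?<remote_user>[^ ]+) \['
```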

### Performance of systemd journals

The log ingestion pipeline, from `tail` to `systemd-journald` or `systemd-journal-remote`, is very efficient in all aspects. CPU utilization is better than that of any other system we tested, and RAM usage is independent of the number of fields indexed, making systemd-journal one of the most efficient log management engines for ingesting high volumes of structured logs.

High field cardinality does not have a noticeable impact on systemd-journal. The number of fields indexed and the number of unique values per field have a linear and predictable effect on the resource utilization of `systemd-journald` and `systemd-journal-remote`. This is unlike other log management solutions, like Loki, whose RAM requirements grow exponentially as the cardinality increases, making it impractical for them to index the amount of information systemd journals can index.

However, the number of fields added to journals influences the overall disk footprint. Fewer fields mean more log entries per journal file, a smaller overall disk footprint, and faster queries.

systemd-journal files are primarily designed for security and reliability. This comes at the cost of disk footprint. The internal structure of journal files is such that, in case of corruption, minimal data loss will occur. To achieve this, certain data within the files need to be aligned at predefined boundaries, so that the non-corrupted parts of a damaged journal file can still be recovered.

Although systemd-journald employs several techniques to optimize disk footprint, like deduplication of log entries, shared indexes for fields and their values, compression of long log entries, etc., the disk footprint of journal files is generally about 10x larger compared to other monitoring solutions, like Loki.

This can be improved by storing journal files on a compressed filesystem. In our tests, a compressed filesystem can save up to 75% of the space required by journal files. The journal files will still be bigger than the overall disk footprint of other solutions, but the flexibility (index any number of fields), reliability (minimal potential data loss) and security (tampering protection and sealing) features of systemd-journal justify the difference.
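A sketch of one way to do this, assuming a spare block device (`/dev/sdb1` is a placeholder) formatted with btrfs, which supports transparent zstd compression, and a journald configured for persistent storage in `/var/log/journal`:

```bash
# mount the compressed filesystem where journald stores its files,
# then restart journald so it reopens its journal files there
mount -o compress=zstd /dev/sdb1 /var/log/journal
systemctl restart systemd-journald
```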

When using systemd versions prior to 254 to centralize logs to a remote system, `systemd-journal-remote` creates very small files (32MB). This results in increased duplication of information across the files, increasing the overall disk footprint. systemd version 254 added options to `systemd-journal-remote` to control the maximum size per file, which can significantly reduce this duplication.
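An illustrative sketch for systemd 254 and later; the `MaxFileSize=` setting name is taken from the systemd 254 release notes, so verify it against `journal-remote.conf(5)` on your system before relying on it:

```bash
# raise the per-file size limit of systemd-journal-remote
# (setting name per systemd 254 release notes -- verify locally)
cat >> /etc/systemd/journal-remote.conf <<'EOF'
[Remote]
MaxFileSize=256M
EOF
systemctl restart systemd-journal-remote
```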

Another limitation of the `systemd-journald` ecosystem is the uncompressed transmission of logs across systems. `systemd-journal-remote`, up to version 254 that we tested, accepts encrypted but uncompressed data. This means that when centralizing logs to a logs server, the bandwidth required will be higher compared to other log management solutions.

## Security Considerations

`log2journal` and `systemd-cat-native` are used to convert log files to structured logs in the systemd-journald ecosystem.

Systemd-journal is a log management solution designed primarily for security and reliability. When configured properly, it can reliably and securely store your logs, ensuring they will be available and unchanged for as long as you need them.

When sending logs to a remote system, `systemd-cat-native` can be configured the same way `systemd-journal-upload` is configured, using HTTPS and private keys to encrypt and secure their transmission over the network.

When dealing with sensitive logs, organizations usually follow one of two strategies:

1. Anonymize the logs before storing them, so that the stored logs do not have any sensitive information.
2. Store the logs in full, including sensitive information, and carefully control who and how has access to them.

Netdata can help in both cases.

If you want to anonymize the logs before storing them, use rewriting rules at the `log2journal` phase to remove sensitive information from them. This usually means matching the sensitive part and replacing it with `XXX`, `CUSTOMER_ID` or `CREDIT_CARD_NUMBER`, so that the log entries stored in journal files will not include any such sensitive information.

If, on the other hand, your organization prefers to maintain the full logs and control who has access to them and how, use Netdata Cloud to assign roles to your team members and control which roles can access the journal logs in your environment.

## `log2journal` options

```

Netdata log2journal v1.43.0-341-gdac4df856

Convert logs to systemd Journal Export Format.

 - JSON logs: extracts all JSON fields.
 - logfmt logs: extracts all logfmt fields.
 - free-form logs: uses PCRE2 patterns to extract fields.

Usage: ./log2journal [OPTIONS] PATTERN|json

Options:

  --file /path/to/file.yaml or -f /path/to/file.yaml
       Read yaml configuration file for instructions.

  --config CONFIG_NAME or -c CONFIG_NAME
       Run with the internal YAML configuration named CONFIG_NAME.
       Available internal YAML configs:

       nginx-combined nginx-json default 

--------------------------------------------------------------------------------
  INPUT PROCESSING

  PATTERN
       PATTERN should be a valid PCRE2 regular expression.
       RE2 regular expressions (like the ones usually used in Go applications)
       are usually valid PCRE2 patterns too.
       Sub-expressions without named groups are evaluated, but their matches are
       not added to the output.

     - JSON mode
       JSON mode is enabled when the pattern is set to: json
       Field names are extracted from the JSON logs and are converted to the
       format expected by Journal Export Format (all caps, only _ is allowed).

     - logfmt mode
       logfmt mode is enabled when the pattern is set to: logfmt
       Field names are extracted from the logfmt logs and are converted to the
       format expected by Journal Export Format (all caps, only _ is allowed).

       All keys extracted from the input are transliterated to match Journal
       semantics (capital A-Z, digits 0-9, underscore).

       In a YAML file:
       ```yaml
       pattern: 'PCRE2 pattern | json | logfmt'
       ```

--------------------------------------------------------------------------------
  GLOBALS

  --prefix PREFIX
       Prefix all fields with PREFIX. The PREFIX is added before any other
       processing, so that the extracted keys have to be matched with the PREFIX in
       them. PREFIX is NOT transliterated and it is assumed to be systemd-journal
       friendly.

       In a YAML file:
       ```yaml
       prefix: 'PREFIX_' # prepend all keys with this prefix.
       ```

  --filename-key KEY
       Add a field with KEY as the key and the current filename as value.
       Automatically detects filenames when piped after 'tail -F',
       and tail matches multiple filenames.
       To inject the filename when tailing a single file, use --inject.

       In a YAML file:
       ```yaml
       filename:
         key: KEY
       ```

--------------------------------------------------------------------------------
  RENAMING OF KEYS

  --rename NEW=OLD
       Rename fields. OLD has been transliterated and PREFIX has been added.
       NEW is assumed to be systemd journal friendly.

       Up to 512 renaming rules are allowed.

       In a YAML file:
       ```yaml
       rename:
         - new_key: KEY1
           old_key: KEY2 # transliterated with PREFIX added
         - new_key: KEY3
           old_key: KEY4 # transliterated with PREFIX added
         # add as many as required
       ```

--------------------------------------------------------------------------------
  INJECTING NEW KEYS

  --inject KEY=VALUE
       Inject constant fields to the output (both matched and unmatched logs).
       --inject entries are added to unmatched lines too, when their key is
       not used in --inject-unmatched (--inject-unmatched overrides --inject).
       VALUE can use variable like ${OTHER_KEY} to be replaced with the values
       of other keys available.

       Up to 512 fields can be injected.

       In a YAML file:
       ```yaml
       inject:
         - key: KEY1
           value: 'VALUE1'
         - key: KEY2
           value: '${KEY3}${KEY4}' # gets the values of KEY3 and KEY4
         # add as many as required
       ```

--------------------------------------------------------------------------------
  REWRITING KEY VALUES

  --rewrite KEY=/MATCH/REPLACE[/OPTIONS]
       Apply a rewrite rule to the values of a specific key.
       The first character after KEY= is the separator, which should also
       be used between the MATCH, REPLACE and OPTIONS.

       OPTIONS can be a comma separated list of `non-empty`, `dont-stop` and
       `inject`.

       When `non-empty` is given, MATCH is expected to be a variable
       substitution using `${KEY1}${KEY2}`. Once the substitution is completed
       the rule matches the KEY only if the result is not empty.
       When `non-empty` is not set, the MATCH string is expected to be a PCRE2
       regular expression to be checked against the KEY value. This PCRE2
       pattern may include named groups to extract parts of the KEY's value.

       REPLACE supports variable substitution like `${variable}` against MATCH
       named groups (when MATCH is a PCRE2 pattern) and `${KEY}` against the
       keys defined so far.

       Example:
              --rewrite DATE=/^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/
                             ${day}/${month}/${year}
       The above will rewrite dates in the format YYYY-MM-DD to DD/MM/YYYY.

       Only one rewrite rule is applied per key; the sequence of rewrites for a
       given key stops once a rule matches it. This allows providing a sequence
       of independent rewriting rules for the same key, matching the different
       values the key may get, and also provide a catch-all rewrite rule at the
       end, for setting the key value if no other rule matched it. The rewrite
       rule can allow processing more rewrite rules when OPTIONS includes
       the keyword 'dont-stop'.

       Up to 512 rewriting rules are allowed.

       In a YAML file:
       ```yaml
       rewrite:
         # the order of these rules is important - processed top to bottom
         - key: KEY1
           match: 'PCRE2 PATTERN WITH NAMED GROUPS'
           value: 'all match fields and input keys as ${VARIABLE}'
           inject: BOOLEAN # yes = inject the field, don't just rewrite it
           stop: BOOLEAN # no = continue processing, don't stop if matched
         - key: KEY2
           non_empty: '${KEY3}${KEY4}' # match only if this evaluates to non empty
           value: 'all input keys as ${VARIABLE}'
           inject: BOOLEAN # yes = inject the field, don't just rewrite it
           stop: BOOLEAN # no = continue processing, don't stop if matched
         # add as many rewrites as required
       ```

       By default rewrite rules are applied only on fields already defined.
       This allows shipping YAML files that include more rewrites than are
       required for a specific input file.
       Rewrite rules however allow injecting new fields when OPTIONS include
       the keyword `inject` or in YAML `inject: yes` is given.

       MATCH on the command line can be empty to define an unconditional rule.
       Similarly, `match` and `non_empty` can be omitted in the YAML file.
--------------------------------------------------------------------------------
  UNMATCHED LINES

  --unmatched-key KEY
       Include unmatched log entries in the output with KEY as the field name.
       Use this to include unmatched entries to the output stream.
       Usually it should be set to --unmatched-key=MESSAGE so that the
       unmatched entry will appear as the log message in the journals.
       Use --inject-unmatched to inject additional fields to unmatched lines.

       In a YAML file:
       ```yaml
       unmatched:
         key: MESSAGE  # inject the error log as MESSAGE
       ```

  --inject-unmatched LINE
       Inject lines into the output for each unmatched log entry.
       Usually, --inject-unmatched=PRIORITY=3 is needed to mark the unmatched
       lines as errors, so that they can easily be spotted in the journals.

       Up to 512 such lines can be injected.

       In a YAML file:
       ```yaml
       unmatched:
         key: MESSAGE  # inject the error log as MESSAGE
         inject:
           - key: KEY1
             value: 'VALUE1'
           # add as many constants as required
       ```

--------------------------------------------------------------------------------
  FILTERING

  --include PATTERN
       Include only keys matching the PCRE2 PATTERN.
       Useful when parsing JSON or logfmt logs, to include only the keys given.
       The keys are matched after the PREFIX has been added to them.

  --exclude PATTERN
       Exclude the keys matching the PCRE2 PATTERN.
       Useful when parsing JSON or logfmt logs, to exclude some of the keys given.
       The keys are matched after the PREFIX has been added to them.

       When both include and exclude patterns are set and both match a key,
       exclude wins and the key will not be added; it works like a pipeline,
       where keys are first included and then excluded.

       In a YAML file:
       ```yaml
       filter:
         include: 'PCRE2 PATTERN MATCHING KEY NAMES TO INCLUDE'
         exclude: 'PCRE2 PATTERN MATCHING KEY NAMES TO EXCLUDE'
       ```

--------------------------------------------------------------------------------
  OTHER

  -h, or --help
       Display this help and exit.

  --show-config
       Show the configuration in YAML format before starting the job.
       This is also an easy way to convert command line parameters to YAML.

The program accepts all parameters as both --option=value and --option value.

The maximum log line length accepted is 1048576 characters.

PIPELINE AND SEQUENCE OF PROCESSING

This is a simple diagram of the pipeline taking place:
                                                                 
          +---------------------------------------------------+  
          |                       INPUT                       |  
          |             read one log line at a time           |  
          +---------------------------------------------------+  
                          v   v   v   v   v   v                  
          +---------------------------------------------------+  
          |             EXTRACT FIELDS AND VALUES             |  
          |            JSON, logfmt, or pattern based         |  
          |  (apply optional PREFIX - all keys use capitals)  |  
          +---------------------------------------------------+  
                          v   v   v   v   v   v                  
          +---------------------------------------------------+  
          |                   RENAME FIELDS                   |  
          |           change the names of the fields          |  
          +---------------------------------------------------+  
                          v   v   v   v   v   v                  
          +---------------------------------------------------+  
          |                 INJECT NEW FIELDS                 |  
          |   constants, or other field values as variables   |  
          +---------------------------------------------------+  
                          v   v   v   v   v   v                  
          +---------------------------------------------------+  
          |                REWRITE FIELD VALUES               |  
          |     pipeline multiple rewriting rules to alter    |  
          |               the values of the fields            |  
          +---------------------------------------------------+  
                          v   v   v   v   v   v                  
          +---------------------------------------------------+  
          |                   FILTER FIELDS                   |  
          |  use include and exclude patterns on the field    |  
          | names, to select which fields are sent to journal |  
          +---------------------------------------------------+  
                          v   v   v   v   v   v                  
          +---------------------------------------------------+  
          |                       OUTPUT                      |  
          |           generate Journal Export Format          |  
          +---------------------------------------------------+  
                                                                 
--------------------------------------------------------------------------------
JOURNAL FIELDS RULES (enforced by systemd-journald)

     - field names can be up to 64 characters
     - the only allowed field characters are A-Z, 0-9 and underscore
     - the first character of fields cannot be a digit
     - protected journal fields start with underscore:
       * they are accepted by systemd-journal-remote
       * they are NOT accepted by a local systemd-journald

     For best results, always include these fields:

      MESSAGE=TEXT
      The MESSAGE is the body of the log entry.
      This field is what we usually see in our logs.

      PRIORITY=NUMBER
      PRIORITY sets the severity of the log entry.
      0=emerg, 1=alert, 2=crit, 3=err, 4=warn, 5=notice, 6=info, 7=debug
      - Emergency events (0) are usually broadcast to all terminals.
      - Emergency, alert, critical, and error (0-3) are usually colored red.
      - Warning (4) entries are usually colored yellow.
      - Notice (5) entries are usually bold or have a brighter white color.
      - Info (6) entries are the default.
      - Debug (7) entries are usually grayed or dimmed.

      SYSLOG_IDENTIFIER=NAME
       SYSLOG_IDENTIFIER sets the name of the application.
      Use something descriptive, like: SYSLOG_IDENTIFIER=nginx-logs

You can find the most common fields at 'man systemd.journal-fields'.

```

`log2journal` supports YAML configuration files, like the ones found [in this directory](https://github.com/netdata/netdata/tree/master/collectors/log2journal/log2journal.d).

## `systemd-cat-native` options

Read [the manual of systemd-cat-native](https://github.com/netdata/netdata/blob/master/libnetdata/log/systemd-cat-native.md).