************
Faup
************

Description
===========

These functions parse any variable containing a URL, hostname, DNS query or similar value, and extract:

- the *scheme*
- the *credentials* (if present)
- the *tld* (with support for second-level TLDs)
- the *domain* (with and without the tld)
- the *subdomain*
- the *full hostname*
- the *port* (if present)
- the *resource* path (if present)
- the *query string* parameters (if present)
- the *fragment* (if present)

HowTo
-----

The module functions are simple to use and fall into two groups:

* ``faup()`` parses the entire URL and returns all parts in a single JSON object
* ``faup_<field>()`` parses the entire URL but returns only the (potential) value of the requested field, as a string

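
Either group of functions can be applied to the same value. The sketch below assumes a hypothetical property ``$!dns_query`` that holds a bare hostname with no scheme, as might be extracted from a DNS log. The property and variable names are illustrative only, and the exact values returned for such an input are not shown here.

.. code-block:: none

   # $!dns_query is a hypothetical property holding a bare hostname,
   # e.g. "www.rsyslog.com" taken from a DNS query log
   # Full parse: $.parsed receives a JSON object
   set $.parsed = faup($!dns_query);
   # Single field: $.host_only receives a plain string
   set $.host_only = faup_host($!dns_query);
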
Examples
^^^^^^^^

Using the ``faup()`` function
"""""""""""""""""""""""""""""""

``faup()`` is the simplest function to use: pass a value or variable (of any type) as its only parameter, and it returns a JSON object containing every element of the URL.

*example code:*

.. code-block:: none

   set $!url = "https://user:pass@www.rsyslog.com:443/doc/v8-stable/rainerscript/functions/mo-faup.html?param=value#faup";
   set $.faup = faup($!url);

*$.faup will contain:*

.. code-block:: none

   {
     "scheme": "https",
     "credential": "user:pass",
     "subdomain": "www",
     "domain": "rsyslog.com",
     "domain_without_tld": "rsyslog",
     "host": "www.rsyslog.com",
     "tld": "com",
     "port": "443",
     "resource_path": "\/doc\/v8-stable\/rainerscript\/functions\/mo-faup.html",
     "query_string": "?param=value",
     "fragment": "#faup"
   }

.. note::

   This is a regular rsyslog variable, so every sub-key can be accessed with ``$.faup!domain``, ``$.faup!resource_path``, etc.

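
The sub-keys of the parsed object can be used directly in conditions and assignments. The following is a minimal sketch; ``$!url_host`` is a hypothetical property name chosen for illustration.

.. code-block:: none

   set $!url = "https://user:pass@www.rsyslog.com:443/doc/v8-stable/rainerscript/functions/mo-faup.html?param=value#faup";
   set $.faup = faup($!url);

   # Sub-keys are read like members of any other JSON variable
   if $.faup!scheme == "https" then {
       # $!url_host is a hypothetical output property
       set $!url_host = $.faup!host;
   }
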
Using the ``faup_<field>()`` functions
""""""""""""""""""""""""""""""""""""""""

The field functions are even simpler: for each field returned by ``faup()``, there is a corresponding function that returns only that field.
For example, to retrieve only the domain without the TLD, the example above can be modified as follows:

*example code:*

.. code-block:: none

   set $!url = "https://user:pass@www.rsyslog.com:443/doc/v8-stable/rainerscript/functions/mo-faup.html?param=value#faup";
   set $.faup = faup_domain_without_tld($!url);

*$.faup will contain:*

.. code-block:: none

   rsyslog

.. note::

   The returned value is not a JSON object but a plain string.

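
Applying the naming rule above to each key of the JSON output gives one function per field. The list below is a sketch inferred from those keys (for example, ``credential`` becomes ``faup_credential()``). It is not copied from an exhaustive function reference, and the local variable names are illustrative.

.. code-block:: none

   set $!url = "https://user:pass@www.rsyslog.com:443/doc/v8-stable/rainerscript/functions/mo-faup.html?param=value#faup";
   # One call per field; each result is a plain string
   set $.scheme = faup_scheme($!url);
   set $.credential = faup_credential($!url);
   set $.subdomain = faup_subdomain($!url);
   set $.domain = faup_domain($!url);
   set $.domain_without_tld = faup_domain_without_tld($!url);
   set $.host = faup_host($!url);
   set $.tld = faup_tld($!url);
   set $.port = faup_port($!url);
   set $.resource_path = faup_resource_path($!url);
   set $.query_string = faup_query_string($!url);
   set $.fragment = faup_fragment($!url);

Each field function parses the entire URL. When several fields are needed, calling ``faup()`` once and reading its sub-keys is therefore likely to be cheaper.
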
Requirements
============

This module relies on the `faup <https://github.com/stricaud/faup>`_ library.
The library needs to be compiled (see the link above for build instructions) and installed on both the build and runtime machines.

.. warning::

   Even if faup is statically linked into rsyslog, the library still needs an additional file to work properly: ``mozilla.tlds``, which libfaup stores in ``/usr/local/share/faup``.
   This file allows second-level TLDs to be matched correctly, so that a URL such as ``www.rsyslog.co.uk`` is parsed into ``<rsyslog:domain>.<co.uk:tld>`` and not ``<rsyslog:subdomain>.<co:domain>.<uk:tld>``.

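
One way to check that ``mozilla.tlds`` is being picked up is to parse a host with a second-level TLD and inspect the result. This check assumes the behavior described in the warning above. The expected values in the comments follow from that description rather than from a recorded run.

.. code-block:: none

   set $!url = "https://www.rsyslog.co.uk/";
   set $.tld = faup_tld($!url);
   # With mozilla.tlds available, $.tld is expected to be "co.uk";
   # a value of "uk" suggests the TLD file was not found
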
Motivations
===========

These functions address a growing problem encountered in rsyslog when using modules to enrich logs. Some mechanisms (such as lookup tables or external module calls) require "strict" URL or hostname formats, but incoming values are often not formatted that way, which results in lookup failures or misses.
Parsing the values with these functions first provides stable inputs to the lookups and modules used for enrichment.
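
The following sketch illustrates this use case. The domain is extracted before it is used as a lookup key, so URLs that differ only in scheme, path or subdomain map to the same table entry. The table name ``domain_reputation``, its file path and the ``$!reputation`` property are hypothetical. ``$!url`` is assumed to be set earlier, and the table file must follow the usual rsyslog lookup table format.

.. code-block:: none

   # Hypothetical lookup table keyed by domain, declared once at top level
   lookup_table(name="domain_reputation" file="/etc/rsyslog.d/domain_reputation.json")

   # Normalize the raw URL to its domain before using it as the key
   set $.domain = faup_domain($!url);
   set $!reputation = lookup("domain_reputation", $.domain);
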