= Regular Expressions

.Syntax
====
[source,unlang]
----
(<subject> =~ /<pattern>/)
(<subject> =~ /<pattern>/[imsux])

(<subject> !~ /<pattern>/)
(<subject> !~ /<pattern>/[imsux])
----
====

== Matching
The regular expression operators perform regular expression matching
on the data. The `<subject>` field can be an attribute reference or data,
as with the other xref:condition/cmp.adoc[comparison] operators.  The `/<pattern>/`
field must be a valid regular expression.

The `=~` operator evaluates to `true` when the `<subject>` matches the
`/<pattern>/`.  Otherwise, it evaluates to `false`.

The `!~` operator evaluates to `true` when the `<subject>` does not match the
`/<pattern>/`.  Otherwise, it evaluates to `false`.

The regular expression comparison is performed on the _string representation_
of the left side of the comparison.  That is, if the left side is an
xref:type/numb.adoc[integer], the regular expression will behave as if the
value `0` was the literal string `"0"`.  Similarly, if the left side is an
xref:attr.adoc[&Attribute-Name], then the regular expression will behave as if
the attribute was printed to a string, and the match was performed on the
resulting string.
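
As a sketch of this behavior, the following condition matches an integer attribute against a pattern. `NAS-Port` is used here only as an illustrative integer attribute, and the value range is an assumption made for the example.

.Matching the string representation of an integer attribute
====
[source,unlang]
----
# NAS-Port is an integer, but the match is performed on its printed form,
# so this pattern matches the values 100 through 199.
if (&NAS-Port =~ /^1[0-9][0-9]$/) {
    ...
}
----
====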

.Checking if the `User-Name` attribute contains a realm of example.com
====
[source,unlang]
----
if (&User-Name =~ /@example\.com$/) {
    ...
}
----
====
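
The `!~` operator can be used in the same way to act on subjects that do not match. The following is a minimal sketch; rejecting the request is an assumption made for illustration, not a recommended policy.

.Rejecting users whose `User-Name` does not end in the realm example.com
====
[source,unlang]
----
if (&User-Name !~ /@example\.com$/) {
    reject
}
----
====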

== Dialects

The syntax of the regular expression is defined by the regular
expression library available on the local system.

FreeRADIUS currently supports:

* link:https://www.pcre.org/original/doc/html/[libpcre] and
link:https://www.pcre.org/current/doc/html/[libpcre2], both of which
provide
link:https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions[Perl
Compatible Regular Expressions].
* Regex support provided by the local libc implementation, usually
link:http://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended[POSIX regular expressions].

[TIP]
====
Use the output of `radiusd -Xxv` to determine which regular expression library is in use.

....
...
Debug :   regex-pcre               : no
Debug :   regex-pcre2              : yes
Debug :   regex-posix              : no
Debug :   regex-posix-extended     : no
Debug :   regex-binsafe            : yes
...
Debug :   pcre2                    : 10.33 (2019-04-16) - retrieved at build time
....
====

[WARNING]
====
Depending on the regular expression library or libc implementation the server
was built against, the pattern matching function available may not be binary
safe (see `regex-binsafe` in the output of `radiusd -Xxv`).

If a binary safe regex match function is not available, and a match is
attempted against a subject that contains one or more `NUL` ('\0') bytes, the
match will be aborted, any condition that uses the result will evaluate to false,
and a warning will be emitted.
====

== Flags

The regular expression `/<pattern>/` may be followed by one or more flag
characters. As with the pattern syntax, the available flags depend on the
regular expression library the server was built with.  Multiple flags may be
specified per `/<pattern>/`.

.Flags and their uses

[options="header"]
|=====
| Flag Character | Available with | Effect
| `i`            | All            | Enable case-insensitive matching.
| `m`            | All            | '^' and '$' match newlines within the subject.
| `s`            | libpcre[2]     | '.' matches anything, including newlines.
| `u`            | libpcre[2]     | Treats subjects as UTF-8.  Invalid UTF-8
                                    sequences will result in the match failing.
| `x`            | libpcre[2]     | Allows comments in expressions by ignoring
                                    whitespace, and text between '#' and the next
                                    newline character.
|=====
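
As a brief illustration, flags are appended directly after the closing `/` of the pattern. The sketch below uses the `i` flag, which the table above lists as available with all libraries; the realm is illustrative.

.Matching a realm regardless of case
====
[source,unlang]
----
# Matches "user@example.com", "user@EXAMPLE.COM", "user@Example.Com", ...
if (&User-Name =~ /@example\.com$/i) {
    ...
}
----
====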

== Subcapture groups

When the `=~` or `!~` operators are used, parentheses in the regular
expression define subcapture groups, each of which contains part of the
subject string.

The special expansion `%{0}` expands to the portion of the subject that
matched. The expansions +
`%{1}`..`%{32}` expand to the contents of any subcapture groups.

When using libpcre[2], named capture groups may also be accessed using the
built-in expansion +
`%{regex:<named capture group>}`.

Please see the xref:xlat/builtin.adoc#_0_32[xlat documentation] for
more information on regular expression matching.

.Extracting the 'user' portion of a realm qualified string
====
[source,unlang]
----
if (&User-Name =~ /^(.*)@example\.com$/) {
    update reply {
        Reply-Message := "Hello %{1}"
    }
}
----
====
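
Named capture groups can be used in a similar way. The following sketch assumes the server was built with libpcre or libpcre2, and uses the PCRE `(?<name>...)` syntax; the group names `user` and `realm` are chosen only for illustration.

.Extracting the 'user' and 'realm' portions using named capture groups
====
[source,unlang]
----
# Requires libpcre[2]; the group names are hypothetical.
if (&User-Name =~ /^(?<user>[^@]+)@(?<realm>[^@]+)$/) {
    update reply {
        Reply-Message := "Hello %{regex:user} from %{regex:realm}"
    }
}
----
====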

== Pre-Compiled vs Runtime Compiled

When the server starts, any regular expression comparisons it finds will be
pre-compiled and, if support is available, JIT'd (converted to machine code)
to ensure fast execution.
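
A pattern that contains no string expansions, as in the sketch below, is one the server can compile when it starts. The realm names are illustrative.

.A pre-compiled regular expression
====
[source,unlang]
----
# The pattern contains no %{...} expansions, so it is fixed at startup.
if (&User-Name =~ /@example\.(com|org)$/) {
    ...
}
----
====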

If a pattern contains a xref:xlat/index.adoc[string expansion], the pattern
cannot be compiled on startup, and will be compiled at runtime each time the
expression is evaluated. The server will also turn off JITing for runtime
compiled expressions, as the overhead is greater than the time that would be
saved during evaluation.

.A runtime compiled regular expression
====
[source,unlang]
----
if (&User-Name =~ /^@%{Tmp-String-0}$/) {
    ...
}
----
====

To ensure optimal performance you should limit the number of patterns
containing xref:xlat/index.adoc[string expansions], and if using PCRE, combine
multiple expressions operating on the same subject into a single expression
using the PCRE alternation '|' operator.

.Using multiple string expansions and the PCRE alternation operator
====
[source,unlang]
----
if (&User-Name =~ /^@(%{Tmp-String-0}|%{Tmp-String-1})$/) {
    ...
}
----
====


// Licenced under CC-by-NC 4.0.
// Copyright (C) 2020 Network RADIUS SAS.
// Copyright (C) 2019 Arran Cudbard-Bell <a.cudbardb@freeradius.org>
// Development of this documentation was sponsored by Network RADIUS SAS.