
Configurable Module Fail Over
=============================

Before configurable module failover, we had this kind of entry in
``radiusd.conf``:

::

  #---
  authorize {
    preprocess
    files
  }
  #---

This entry instructed the ``authorize`` section to first process the
request through the ``preprocess`` module, and if that returned success,
to process it through the ``files`` module.  If that sequence returned
success, then the ``authorize`` stage itself would return success.
Processing was strictly linear, and if one module failed, the whole
section would fail immediately.

Configurable failover provides more flexibility. It takes advantage
of the tree structure of radiusd.conf to support a configuration
language that allows you to ``group`` modules that should work together
in ways other than simple lists.  You can control the flow of any
stage (e.g. ``authorize``) to fit your needs, without touching C code,
just by altering radiusd.conf.

This configurable fail-over has a convenient short-hand, too.
Administrators commonly want to say things like "try SQL1, if it's
down, try SQL2, otherwise drop the request."

For example:

::

  #---
  modules {
    sql sql1 {
      # configuration to connect to SQL database one
    }
    sql sql2 {
      # configuration to connect to SQL database two
    }
    always handled {
      rcode = handled
    }
  }

  #  Handle accounting packets
  accounting {
      detail			# always log to detail, stopping if it fails
      redundant {
        sql1			# try module sql1
        sql2			# if that's down, try module sql2
	handled			# otherwise drop the request as
				# it's been ``handled`` by the ``always``
				# module (see doc/rlm_always)
      }
  }
  #---

The ``redundant`` section is a configuration directive which tells the
server to process the second module if the first one fails.  Any
number of modules can be listed in a ``redundant`` section.  The server
will process each in turn, until one of the modules succeeds.  It will
then stop processing the ``redundant`` list.
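
As a sketch of a longer list, the following assumes a third SQL instance,
hypothetically named ``sql3``, has been defined in the ``modules`` section
in the same way as ``sql1`` and ``sql2``:

::

  #---
  accounting {
      detail
      redundant {
        sql1        # try module sql1
        sql2        # if sql1 fails, try module sql2
        sql3        # if sql2 also fails, try module sql3 (hypothetical)
      }
  }
  #---

There is no ``handled`` entry here, so if all three modules fail, the
``redundant`` section as a whole fails too.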

Rewriting results for single modules
------------------------------------

Normally, when a module fails, the entire section (``authorize``,
``accounting``, etc.) stops being processed.  In some cases, we may want
to permit "soft failures".  That is, we may want to tell the server
that it is "ok" for a module to fail, and that the failure should not
be treated as a fatal error.

In this case, the module is treated as a "section", rather than just
as a single line in ``radiusd.conf``.  The configuration entries for
that section are interpreted by the configurable fail-over code, and are
not treated as configuration information for the module itself.

For example, the ``detail`` module normally returns ``fail`` if it is
unable to write its information to the ``detail`` file.  As a test, we
can configure the server so that it continues processing the request,
even if the ``detail`` module fails.  The following example shows how:

::

  #--
  #  Handle accounting packets
  accounting {
      detail {
        fail = 1
      }
      redundant {
        sql1
        sql2
	handled
      }
  }
  #--

The ``fail = 1`` entry tells the server to remember the ``fail`` code,
with priority ``1``, and to continue processing.  The normal configuration
is ``fail = return``, which means "if the detail module fails, stop
processing the accounting section".
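
For comparison, the default behavior can be spelled out explicitly.  Going
by the description above, this sketch should behave the same as listing
``detail`` on a line by itself:

::

  #--
  accounting {
      detail {
        fail = return    # the default: stop the section if detail fails
      }
  }
  #--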

Fail-over configuration entries
-------------------------------

Modules normally return one of the following codes as their result:

+-----------+-----------------------------------------------------+
|Code	    | Meaning                                             |
+===========+=====================================================+
|notfound   | the user was not found                              |
+-----------+-----------------------------------------------------+
|noop	    | the module did nothing                              |
+-----------+-----------------------------------------------------+
|ok	    | the module succeeded                                |
+-----------+-----------------------------------------------------+
|updated    | the module updated information in the request       |
+-----------+-----------------------------------------------------+
|fail       | the module failed                                   |
+-----------+-----------------------------------------------------+
|reject     | the module rejected the user                        |
+-----------+-----------------------------------------------------+
|userlock   | the user was locked out                             |
+-----------+-----------------------------------------------------+
|invalid    | the user's configuration entry was invalid          |
+-----------+-----------------------------------------------------+
|handled    | the module has done everything to handle the request|
+-----------+-----------------------------------------------------+

In a configurable fail-over section, each of these codes may be
listed, with a value.  If the code is not listed, or a configurable
fail-over section is not defined, then values that make sense for the
requested ``group`` (group, redundant, load-balance, etc) are used.

The special code ``default`` can be used to set all return codes to
the specified value.  This value will be used with a lower priority
than ones that are explicitly set.

The values for each code may be one of three things:

+---------+---------------------------------------------------------------+
|Value	  | Meaning                                                       |
+=========+===============================================================+
|<number> | Priority for this return code.                                |
+---------+---------------------------------------------------------------+
|return	  | Stop processing this configurable fail-over list.             |
+---------+---------------------------------------------------------------+
|reject	  | Stop processing this configurable fail-over list and          |
|         | immediately return a reject.                                  |
+---------+---------------------------------------------------------------+
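
As an illustration of the three kinds of value, the following fragment uses
one of each.  The policy itself is only an assumption made for the example,
not a recommended configuration:

::

  files {
    notfound = 1         # <number>: remember notfound with priority 1, keep going
    fail     = return    # return: stop processing this list
    invalid  = reject    # reject: stop and immediately return a reject
  }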

The ``<number>`` used for a value may be any decimal number between 1
and 99999.  The number is used when processing a list of modules, to
determine which code is returned from the list.  For example, if
``module1`` returns ``fail`` with priority ``1``, and a later ``module2``
returns ``ok`` with priority ``3``, the return code from the list of
modules will be ``ok``, because it has higher priority than ``fail``.

This configurability allows the administrator to permit some modules
to fail, so long as a later module succeeds.
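
The priority example above can be sketched as a configuration.  The names
``module1`` and ``module2`` are the placeholders used in the text, not real
modules:

::

  group {
    module1 {
      fail = 1      # remember fail with priority 1, then continue
    }
    module2 {
      ok = 3        # remember ok with priority 3
    }
  }                 # ok (3) outranks fail (1), so the group returns ok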


More Complex Configurations
---------------------------

The ``authorize`` section is normally a list of module names.  We can
create sub-lists by using the section name ``group``.  The ``redundant``
section above is just a short-hand for ``group``, with a set of default
return codes that differ from the normal "stop processing the list on
failure".

For example, we can configure two detail modules, and allow either
to fail, so long as one of them succeeds.

::

  #--
  #  Handle accounting packets
  accounting {
      group {
        detail1 {
          fail = 1		# remember ``fail`` with priority 1
	  ok = return		# if we succeed, don't do ``detail2``
        }
	detail2 {
	  fail = 1		# remember ``fail`` with priority 1
	  ok = return		# if we succeed, return ``ok``
				# if ``detail1`` returned ``fail``
	}
      }			# returns ``fail`` only if BOTH modules returned ``fail``
      redundant {
        sql1
        sql2
	handled
      }
  }
  #--

This configuration says:

	- log to ``detail1``, and stop processing the ``group`` list if ``detail1`` returned OK.

	- If ``detail1`` returned ``fail``, then continue, but remember the ``fail`` code, with priority 1.

	- If ``detail2`` fails, then remember ``fail`` with priority 1.

	- If ``detail2`` returned ``ok``, return ``ok`` from the ``group``.

The return code from the ``group`` is the return code which was either
forced to return (e.g. ``ok`` for ``detail1``), or the highest priority
return code found by processing the list.

This process can be extended to any number of modules listed in a
``group`` section.
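
As a sketch of that extension, assume a third instance, hypothetically
named ``detail3``, is configured like the other two.  Each entry repeats
the same two rules:

::

  #--
  accounting {
      group {
        detail1 {
          fail = 1        # remember fail, try the next module
          ok = return     # stop as soon as one module succeeds
        }
        detail2 {
          fail = 1
          ok = return
        }
        detail3 {
          fail = 1
          ok = return
        }
      }                   # returns fail only if all three returned fail
  }
  #--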


Virtual Modules
---------------

Some configurations may require using the same list of modules, in
the same order, in multiple sections.  For those systems, the
configuration can be simplified through the use of ``virtual`` modules.
These modules are configured as named sub-sections of the
``instantiate`` section, as follows:

::

	instantiate {
		...

		redundant sql1_or_2 {
			sql1
			sql2
		}
	}

The name ``sql1_or_2`` can then be used in any other section, such as
``authorize`` or ``accounting``.  The result will be *exactly* as if that
section was placed at the location of the ``sql1_or_2`` reference.

These virtual modules are full-fledged objects in and of themselves.
One virtual module can refer to another virtual module, and they can
contain ``if`` conditions, or any other configuration permitted in a
section.
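
For example, with the ``sql1_or_2`` definition above in place, the name
could be referenced like any other module.  The surrounding entries are
only illustrative:

::

  authorize {
      preprocess
      sql1_or_2       # expands to the redundant { sql1 sql2 } section
  }

  accounting {
      detail
      sql1_or_2       # same list, same order, defined only once
  }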


Redundancy and Load-Balancing
-----------------------------

See ``man unlang`` or ``doc/load-balance`` for information on simple
redundancy (fail-over) and load balancing.
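
As a rough sketch only (the documents named above are the authoritative
reference), a ``load-balance`` section is written the same way as a
``redundant`` section.  This example reuses the ``sql1`` and ``sql2``
instances from earlier:

::

  accounting {
      detail
      load-balance {
        sql1
        sql2
      }
  }

Here each request should be handed to one of the two modules, rather than
always trying ``sql1`` first.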


The Gory Details
-----------------

The fundamental object is called a MODCALLABLE, because it is something that
can be passed a specific radius request and returns one of the RLM_MODULE_*
results. It is a function - if you can accept the fact that pieces of
radiusd.conf are functions. There are two kinds of MODCALLABLEs: GROUPs and
SINGLEs.

A SINGLE is a reference to a module instance that was set up in the modules{}
section of radiusd.conf, like ``preprocess`` or ``sql1``. When a SINGLE is
called, the corresponding function in the rlm is invoked, and whichever
RLM_MODULE_* it returns becomes the RESULT of the SINGLE.

A GROUP is a section of radiusd.conf that includes some MODCALLABLEs.
Examples of GROUPs above include ``authorize{...}``, which implements the C
function module_authorize, and ``redundant{...}``, which contains two SINGLEs
that refer to a couple of redundant databases. Note that a GROUP can contain
other GROUPs - ``Auth-Type SQL{...}`` is also a GROUP, which implements the C
function module_authenticate when Auth-Type is set to SQL.

Now here's the fun part - what happens when a GROUP is called? It simply runs
through all of its children in order, and calls each one, whether it is
another GROUP or a SINGLE. It then looks at the RESULT of that child, and
takes some ACTION, which is basically either "return that RESULT immediately"
or "keep going". In the first example, any "bad" RESULT from the preprocess
module causes an immediate return, and any "good" RESULT causes the
authorize{...} GROUP to proceed to the files module.

We can see the exact rules by writing them out the long way:

::

  authorize {
    preprocess {
      notfound = 1
      noop     = 2
      ok       = 3
      updated  = 4
      fail     = return
      reject   = return
      userlock = return
      invalid  = return
      handled  = return
    }
    files {
      notfound = 1
      noop     = 2
      ok       = 3
      updated  = 4
      fail     = return
      reject   = return
      userlock = return
      invalid  = return
      handled  = return
    }
  }

This is the same as the first example, with the behavior explicitly
spelled out. Each SINGLE becomes its own section, containing a list of
RESULTs that it may return and what ACTION should follow from them. So
preprocess is called, and if it returns for example RLM_MODULE_REJECT,
then the reject=return rule is applied, and the authorize{...} GROUP
itself immediately returns RLM_MODULE_REJECT.

If preprocess returns RLM_MODULE_NOOP, the corresponding ACTION is ``2``. An
integer ACTION serves two purposes - first, it tells the parent GROUP to go
on to the next module. Second, it is a hint as to how desirable this RESULT
is as a candidate for the GROUP's own RESULT. So files is called... suppose
it returns RLM_MODULE_NOTFOUND. The ACTION for notfound inside the files{...}
block is ``1``. We have now reached the end of the authorize{...} GROUP and we
look at the RESULTs we accumulated along the way - there is a noop with
preference level 2, and a notfound with preference level 1, so the
authorize{...} GROUP as a whole returns RLM_MODULE_NOOP, which makes sense
because to say the user was not found at all would be a lie, since preprocess
apparently found him, or else it would have returned RLM_MODULE_NOTFOUND too.

We could use the ``default`` code to simplify the above example a
little.  The following two configurations are identical:

::

  files {
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
    default  = return
  }


When ``default`` is listed first, later definitions override its
return code:

::

  files {
    default  = return
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
  }

[Take a deep breath - the worst is over]

That RESULT preference/desirability stuff is pretty complex, but my hope is
that it will be complex enough to handle the needs of everyone's real-world
imperfect systems, while staying out of sight most of the time since the
defaults will be right for the most common configurations.

So where does redundant{...} fit in with all that? Well, redundant{...} is
simply a group that changes the default ACTIONs to something like

::

  fail = 1
  everythingelse = return

so that when one module fails, we keep trying until we find one that doesn't
fail, then return whatever it returned. And at the end, if they all failed,
the redundant GROUP as a whole returns RLM_MODULE_FAIL, just as you'd want it
to (I hope).

There are two other kinds of grouping: ``group{...}`` which does not have any
specialized default ACTIONs, and ``append{...}``, which should be used when you
have separate but similarly structured databases that are guaranteed not to
overlap.

That's all that really needs to be said. But now a few random notes:

GROUPs may have RESULT=ACTION
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It would look like this:

::

  authorize {
    preprocess
    redundant {
      sql1
      sql2
      notfound = return
    }
    files
  }

which would prevent ``files`` from being called if neither of the SQL
instances could find the user.

redundant{...} and append{...} are just shortcuts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You could write:

::

    group {
      sql1 {
        fail     = 1
        notfound = 2
        noop     = return
        ok       = return
        updated  = return
        reject   = return
        userlock = return
        invalid  = return
        handled  = return
      }
      sql2 {
        fail     = 1
        notfound = 2
        noop     = return
        ok       = return
        updated  = return
        reject   = return
        userlock = return
        invalid  = return
        handled  = return
      }
    }

instead of:

::

    redundant {
      sql1
      sql2
    }

but the latter is just a whole lot easier to read.

``authenticate{...}`` is not a GROUP
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``authenticate{...}`` is not a GROUP, even though it contains a list of
``Auth-Type`` GROUPs, because its semantics are totally different: it uses
``Auth-Type`` to decide which of its members to call, and their order is
irrelevant.
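
A minimal sketch of such a section, assuming ``pap`` and ``chap`` module
instances are configured, might look like this:

::

  authenticate {
      Auth-Type PAP {
        pap         # called only when Auth-Type is set to PAP
      }
      Auth-Type CHAP {
        chap        # called only when Auth-Type is set to CHAP
      }
  }

Swapping the two ``Auth-Type`` blocks would make no difference, since the
member to call is selected by ``Auth-Type`` and not by position.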

The default rules are context-sensitive
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For ``authorize``, the defaults are
what you saw above - notfound, noop, ok, and updated are considered
success, and anything else has an ACTION of ``return``. For authenticate, the
default is to return on success *or* reject, and only try the second and
following items if the first one fails. You can read all the default ACTIONs
in modcall.c (int defaultactions[][][]), or just trust me. They do the right
thing.

There are some rules that can't be implemented in this language
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Examples are things like ``notfound = 1-reject``, ``noop = 2-ok``,
``ok = 3-ok``, etc. But I don't feel justified adding that complexity in
the first draft. There are already enough things here that may never see
real-world usage, like ``append{...}``.

-- Pac. 9/18/2000