
Auto-generated CLI commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to have less code to maintain, it should be possible to write a
tool that auto-generates CLI commands based on the FRR YANG models. As a
matter of fact, there are already a number of NETCONF-based CLIs that do
exactly that (e.g. `Clixon <https://github.com/clicon/clixon>`__,
ConfD’s CLI).

The problem, however, is that there isn’t an exact one-to-one mapping
between the existing CLI commands and the corresponding YANG nodes from
the native models. As an example, ripd’s
``timers basic (5-2147483647) (5-2147483647) (5-2147483647)`` command
changes three YANG leaves at the same time. In order to auto-generate
CLI commands and retain their original form, it’s necessary to add
annotations to the YANG modules to specify what the commands should look
like. Without YANG annotations, the CLI auto-generator will generate a
command for each YANG leaf, (leaf-)list and presence-container. The
ripd ``timers basic`` command, for instance, would become three
different commands, which would be undesirable.

   This Tail-f®
   `document <http://info.tail-f.com/hubfs/Whitepapers/Tail-f_ConfD-CLI__Cfg_Mode_App_Note_Rev%20C.pdf>`__
   shows how to customize ConfD auto-generated CLI commands using YANG
   annotations.

The good news is that *libyang* allows users to create plugins to
implement their own YANG extensions, which can be used to implement CLI
annotations. If done properly, a CLI generator can save FRR developers
from writing and maintaining hundreds if not thousands of DEFPYs!
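The idea behind such an annotation can be illustrated with a plain C table, without an actual libyang extension plugin. The sketch below is hypothetical: ``struct cli_annotation``, ``cli_expand()`` and the relative XPaths (assumed to correspond to the frr-ripd timers leaves) are made up for illustration and are not part of FRR or libyang. It shows how a single ``timers basic`` invocation could be expanded into the three leaf changes that a generator without annotations would have exposed as three separate commands.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical CLI annotation: one command template mapped to several
 * YANG leaves (relative XPaths assumed to match the frr-ripd module). */
struct cli_annotation {
	const char *command;   /* CLI syntax shown to the user */
	size_t nleaves;        /* number of arguments, one per leaf */
	const char *leaves[4]; /* leaf set by each positional argument */
};

static const struct cli_annotation rip_timers_basic = {
	.command = "timers basic (5-2147483647) (5-2147483647) (5-2147483647)",
	.nleaves = 3,
	.leaves = {
		"./timers/update-interval",
		"./timers/holddown-interval",
		"./timers/flush-interval",
	},
};

/* Expand one invocation of an annotated command into per-leaf changes,
 * written as "xpath=value" strings. Returns the number of changes, or 0
 * if the argument count does not match the annotation. */
static size_t cli_expand(const struct cli_annotation *ann,
			 const char *const args[], size_t nargs,
			 char out[][128], size_t max_out)
{
	if (nargs != ann->nleaves || nargs > max_out)
		return 0;
	for (size_t i = 0; i < nargs; i++)
		snprintf(out[i], 128, "%s=%s", ann->leaves[i], args[i]);
	return nargs;
}
```

For example, expanding the arguments ``30 180 240`` yields three changes, the first being ``./timers/update-interval=30``, all of which would be applied as part of the same configuration transaction.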

CLI on a separate program
~~~~~~~~~~~~~~~~~~~~~~~~~

The flexible design of the northbound architecture opens the door to
moving the CLI to a separate program in the long term. Some advantages
of doing so would be:

* Treat the CLI as just another northbound client, instead of having CLI
  commands embedded in the binaries of all FRR daemons.
* Improved robustness: bugs in CLI commands (e.g. null-pointer
  dereferences) or in the CLI code itself wouldn’t affect the FRR
  daemons.
* Foster innovation by allowing other CLI programs to be implemented,
  possibly using higher level programming languages.

The problem, however, is that the northbound retrofitting process will
initially convert only the CLI configuration commands and EXEC commands.
Retrofitting the “show” commands is a completely different story and
shouldn’t happen anytime soon. This is expected to hinder progress
towards moving the CLI to a separate program.

Proposed feature: confirmed commits
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Confirmed commits allow the user to request an automatic rollback to the
previous configuration if the commit operation is not confirmed within a
given number of minutes. This is particularly useful when the user is
accessing the CLI through the network (e.g. using SSH) and a
configuration change might cause an unexpected loss of connectivity
between the user and the router (e.g. misconfiguration of a routing
protocol). By using a confirmed commit, the user can rest assured that
connectivity will be restored after the given timeout expires, avoiding
the need to access the router physically to fix the problem.

Example of how this feature could be provided in the CLI:
``commit confirmed [minutes <1-60>]``. The ability to do confirmed
commits should also be exposed in the northbound API so that the
northbound plugins can also take advantage of it (in the case of the
Sysrepo and ConfD plugins, confirmed commits are implemented externally
in the *netopeer2-server* and *confd* daemons, respectively).
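The timeout logic behind a confirmed commit is small. The following sketch assumes that the northbound layer remembers the ID of the transaction that was running before the commit and checks a deadline periodically; ``struct confirmed_commit`` and the ``cc_*()`` functions are hypothetical names, not an existing FRR API, and confirming the change with a subsequent plain ``commit`` is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical confirmed-commit state kept by the northbound layer. */
struct confirmed_commit {
	bool pending;         /* a confirmed commit awaits confirmation */
	uint32_t rollback_id; /* transaction to restore on timeout */
	time_t deadline;      /* absolute time when the rollback fires */
};

/* Start a confirmed commit: remember the transaction that was running
 * before the commit and arm the timeout (1-60 minutes, as in the CLI). */
static bool cc_start(struct confirmed_commit *cc, uint32_t previous_id,
		     unsigned int minutes, time_t now)
{
	if (minutes < 1 || minutes > 60)
		return false;
	cc->pending = true;
	cc->rollback_id = previous_id;
	cc->deadline = now + (time_t)minutes * 60;
	return true;
}

/* Confirming before the deadline keeps the new configuration. */
static void cc_confirm(struct confirmed_commit *cc)
{
	cc->pending = false;
}

/* Called periodically. Returns true and sets *rollback_to when the
 * timeout expired without confirmation; the caller then rolls back. */
static bool cc_expired(struct confirmed_commit *cc, time_t now,
		       uint32_t *rollback_to)
{
	if (!cc->pending || now < cc->deadline)
		return false;
	cc->pending = false;
	*rollback_to = cc->rollback_id;
	return true;
}
```

In a daemon, ``cc_expired()`` would more naturally be driven by a timer event than polled, and a ``true`` result would trigger a rollback to the transaction stored in ``rollback_to``.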

Proposed feature: enable/disable configuration commands/sections
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since the ``lyd_node`` data structure from *libyang* can hold private
data, it should be possible to mark configuration commands or sections
as active or inactive. This would allow CLI users to disable parts of
the running configuration without actually removing the associated
commands, and then re-enable the disabled configuration commands or
sections later when necessary. Example:

::

   ripd(config)# show configuration running
   Configuration:
   [snip]
   !
   router rip
    default-metric 2
    distance 80
    network eth0
    network eth1
   !
   end
   ripd(config)# disable router rip
   ripd(config)# commit
   % Configuration committed successfully (Transaction ID #7).

   ripd(config)# show configuration running
   Configuration:
   [snip]
   !
   !router rip
    !default-metric 2
    !distance 80
    !network eth0
    !network eth1
   !
   end
   ripd(config)# enable router rip
   ripd(config)# commit
   % Configuration committed successfully (Transaction ID #8).

   ripd(config)# show configuration running
   [snip]
   frr defaults traditional
   !
   router rip
    default-metric 2
    distance 80
    network eth0
    network eth1
   !
   end

This capability could be useful on a number of occasions, like disabling
configuration commands that are no longer necessary (e.g. ACLs) but that
might be needed again later. Another example is allowing users to
disable a configuration section for testing purposes, and then re-enable
it easily without needing to copy and paste any command.
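The rendering side of this proposal can be sketched with a stand-in tree instead of real libyang nodes. In the sketch below, the hypothetical ``disabled`` flag plays the role of the per-node private data mentioned above, and a disabled node comments out its whole subtree with a leading ``!``, reproducing the ``!router rip`` output from the example. None of these names exist in FRR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a configuration tree node; in FRR this role would be
 * played by libyang's lyd_node and its private data. */
struct cfg_node {
	const char *line;       /* CLI line rendered for this node */
	bool disabled;          /* hypothetical "inactive" marker */
	struct cfg_node *child; /* first child */
	struct cfg_node *next;  /* next sibling */
};

/* Append the rendered tree to buf (which must be NUL-terminated). A
 * disabled node comments out itself and its whole subtree. */
static void render(const struct cfg_node *n, bool parent_disabled,
		   int depth, char *buf, size_t len)
{
	for (; n; n = n->next) {
		bool off = parent_disabled || n->disabled;
		size_t used = strlen(buf);

		/* one space of indentation per level, as in the example */
		snprintf(buf + used, len - used, "%*s%s%s\n", depth, "",
			 off ? "!" : "", n->line);
		render(n->child, off, depth + 1, buf, len);
	}
}
```

Keeping the flag in the data tree, rather than deleting the nodes, is what allows ``enable router rip`` to restore the section exactly as it was.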

Configuration reloads
~~~~~~~~~~~~~~~~~~~~~

Given the limitations of the previous northbound architecture, the FRR
daemons didn’t have the ability to reload their configuration files by
themselves. The SIGHUP handler of most daemons would only re-read the
configuration file and merge it into the running configuration. In most
cases, however, what is desired is to replace the running configuration
with the updated configuration file. The *frr-reload.py* script was
written to work around this problem, and it does so reasonably well up
to a point. The problem with the *frr-reload.py* script is that it’s
full of special cases here and there, which makes it fragile and
unreliable.
Maintaining the script is also an additional burden for FRR developers,
few of whom are familiar with its code or know when it needs to be
updated to account for a new feature.

In the new northbound architecture, reloading the configuration file can
be easily implemented using a configuration transaction. Once the FRR
northbound retrofitting process is complete, all daemons should have the
ability to reload their configuration files upon receiving the SIGHUP
signal, or when the ``configuration load [...] replace`` command is
used. Once that point is reached, the *frr-reload.py* script will no
longer be necessary and should be removed from the FRR repository.
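At its core, a replace operation amounts to computing the difference between the running configuration and the contents of the configuration file, then applying that difference as a single transaction. The sketch below assumes a flattened configuration made of XPath/value pairs and uses a naive quadratic comparison; ``struct leaf``, ``struct change`` and ``config_diff()`` are illustrative names only, and a real implementation would diff the libyang data trees instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened configuration: one XPath/value pair per leaf. */
struct leaf {
	const char *xpath;
	const char *value;
};

enum change_op { CHANGE_CREATE, CHANGE_MODIFY, CHANGE_DESTROY };

struct change {
	enum change_op op;
	const char *xpath;
	const char *value; /* NULL for CHANGE_DESTROY */
};

static const struct leaf *find_leaf(const struct leaf *cfg, size_t n,
				    const char *xpath)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(cfg[i].xpath, xpath) == 0)
			return &cfg[i];
	return NULL;
}

/* Compute the changes that turn 'running' into 'file'. 'out' must have
 * room for nrun + nfile entries. Applying all changes in one transaction
 * gives replace (not merge) semantics. */
static size_t config_diff(const struct leaf *running, size_t nrun,
			  const struct leaf *file, size_t nfile,
			  struct change *out)
{
	size_t n = 0;

	for (size_t i = 0; i < nfile; i++) {
		const struct leaf *old = find_leaf(running, nrun, file[i].xpath);

		if (!old)
			out[n++] = (struct change){CHANGE_CREATE, file[i].xpath,
						   file[i].value};
		else if (strcmp(old->value, file[i].value) != 0)
			out[n++] = (struct change){CHANGE_MODIFY, file[i].xpath,
						   file[i].value};
	}
	/* leaves missing from the file are destroyed, unlike a merge */
	for (size_t i = 0; i < nrun; i++)
		if (!find_leaf(file, nfile, running[i].xpath))
			out[n++] = (struct change){CHANGE_DESTROY,
						   running[i].xpath, NULL};
	return n;
}
```

The destroy pass is the part that a merge-only SIGHUP handler lacks: configuration removed from the file is also removed from the running configuration.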

Configuration changes coming from the kernel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This
`post <http://discuss.tail-f.com/t/who-should-not-set-configuration-once-a-system-is-up-and-running/111>`__
from the Tail-f® forum describes the problem of letting systems
configure themselves behind the users’ back. Here are some selected
snippets from it:

   Traditionally, northbound interface users are the ones in charge of
   providing configuration data for systems.

   In some systems, we see a deviation from this traditional practice;
   allowing systems to configure “themselves” behind the scenes (or
   behind the users back).

   While there might be a business case for such a practice, this kind
   of configuration remains “dangerous” from northbound users
   perspective and makes systems hard to predict and even harder to
   debug. (…)

   With the advent of transactional Network configuration, this practice
   can not work anymore. The fact that systems are given the right to
   change configuration is a key here in breaking transactional
   configuration in a Network.

FRR is immune to some of the problems described in the aforementioned
post. Management clients can configure interfaces that don’t yet exist,
and once an interface is deleted from the kernel, its configuration is
retained in FRR.

There are, however, some cases where information learned from the kernel
(e.g. using netlink) can affect the running configuration of all FRR
daemons. Examples: interface rename events, VRF rename events, interface
being moved to a different VRF, etc. In these cases, since these events
can’t be ignored, the best we can do is to send YANG notifications to
the management clients to inform them about the configuration changes.
The management clients should then be prepared to handle such
notifications and react accordingly.
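The notification path described above could look roughly like the following sketch, in which management clients subscribe to configuration-change notifications and the netlink handler reports an interface rename after applying it. The subscriber table, the callback type and the ``interface-rename`` event name are assumptions made for illustration; they do not correspond to an existing FRR API or YANG notification.

```c
#include <stddef.h>

/* Hypothetical notification describing a configuration change caused by
 * a kernel event (the event name below is an assumption). */
struct cfg_notification {
	const char *event;
	const char *old_name;
	const char *new_name;
};

typedef void (*cfg_notification_cb)(const struct cfg_notification *n,
				    void *arg);

#define MAX_SUBSCRIBERS 8

static struct {
	cfg_notification_cb cb;
	void *arg;
} subscribers[MAX_SUBSCRIBERS];
static size_t nsubscribers;

/* A management client (e.g. a NETCONF plugin) registers its callback. */
static int cfg_notification_subscribe(cfg_notification_cb cb, void *arg)
{
	if (nsubscribers == MAX_SUBSCRIBERS)
		return -1;
	subscribers[nsubscribers].cb = cb;
	subscribers[nsubscribers].arg = arg;
	nsubscribers++;
	return 0;
}

/* Called by the netlink handler after an interface rename has been
 * applied: the event cannot be rejected, only reported to the clients. */
static void cfg_notify_interface_rename(const char *old_name,
					const char *new_name)
{
	struct cfg_notification n = {"interface-rename", old_name, new_name};

	for (size_t i = 0; i < nsubscribers; i++)
		subscribers[i].cb(&n, subscribers[i].arg);
}

/* Example client callback: remembers the most recent new name. */
static void example_client_cb(const struct cfg_notification *n, void *arg)
{
	const char **last_name = arg;

	*last_name = n->new_name;
}
```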

Interfaces and VRFs
~~~~~~~~~~~~~~~~~~~

As of now, zebra doesn’t have the ability to create VRFs or virtual
interfaces in the kernel. The ``vrf`` and ``interface`` commands merely
create pre-provisioned VRFs and interfaces that are activated only when
the corresponding information is learned from the kernel. When
configuring FRR using an external management client, like a NETCONF
client, it might be desirable to actually create functional VRFs and
virtual interfaces (e.g. VLAN subinterfaces, bridges, etc.) that are
installed in the kernel using OS-specific APIs (e.g. netlink, routing
socket, etc.). Work needs to be done in this area to make this possible.

Shared configuration objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One of the existing problems in FRR is that it’s hard to ensure that all
daemons are in sync with respect to the shared configuration objects
(e.g. interfaces, VRFs, route-maps, ACLs, etc). When a route-map is
configured using *vtysh*, the same command is sent to all relevant
daemons (the daemons that implement route-maps), which ensures
synchronization among them. The problem arises when a daemon starts
after the route-maps have been created. In this case, that daemon
wouldn’t be aware of the previously configured route-maps (unlike the
other daemons), which can lead to a lot of confusion and unexpected
problems.

With the new northbound architecture, configuration objects can be
manipulated using higher level abstractions, which opens more
possibilities to solve this decades-long problem. As an example, one
solution would be to make the FRR daemons fetch the shared configuration
objects from zebra using the ZAPI interface during initialization. The
shared configuration objects could be requested using a list of XPath
expressions in the ``ZEBRA_HELLO`` message, to which zebra would respond
by sending the shared configuration objects encoded in the JSON format.
This solution, however, doesn’t address the case where zebra starts or
restarts after the other FRR daemons. Another solution would be to store
the shared configuration objects in the northbound SQL database and make
all daemons fetch these objects from there. So far no work has been done
in this area, as more investigation is needed.
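Assuming the ``ZEBRA_HELLO`` extension described above, zebra's side of the exchange could be sketched as selecting the shared objects that match the XPaths sent by a daemon. The sketch is hypothetical: the XPaths are illustrative and may not match FRR's YANG modules exactly, simple prefix matching stands in for real XPath evaluation, and the JSON encoding of the reply is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshot of zebra's shared configuration objects, each
 * identified by the XPath of its list entry (paths are illustrative). */
static const char *const shared_objects[] = {
	"/frr-interface:lib/interface[name='eth0']",
	"/frr-vrf:lib/vrf[name='red']",
	"/frr-route-map:lib/route-map[name='RM-IN']",
};

static bool xpath_has_prefix(const char *xpath, const char *prefix)
{
	return strncmp(xpath, prefix, strlen(prefix)) == 0;
}

/* Select the objects that match any of the XPaths requested by a daemon
 * in its hello message. Prefix matching is a stand-in for real XPath
 * evaluation; JSON encoding of the reply is omitted. */
static size_t select_shared_objects(const char *const *requested,
				    size_t nreq, const char **out,
				    size_t max_out)
{
	size_t nobj = sizeof(shared_objects) / sizeof(shared_objects[0]);
	size_t n = 0;

	for (size_t i = 0; i < nobj && n < max_out; i++) {
		for (size_t j = 0; j < nreq; j++) {
			if (xpath_has_prefix(shared_objects[i], requested[j])) {
				out[n++] = shared_objects[i];
				break; /* report each object only once */
			}
		}
	}
	return n;
}
```

A daemon implementing route-maps but not VRFs, for instance, would only list the route-map subtree in its request and receive nothing else.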

vtysh support
~~~~~~~~~~~~~

As explained in the *Transactional CLI* page, none of the commands
introduced by the transactional CLI are available in *vtysh* yet. This
needs to be addressed in the short term. Some challenges for doing that
work include:

* How to display configurations (running, candidates and rollbacks) in
  a smarter way? The implementation of the ``show running-config``
  command in *vtysh* is not something that should be followed as an
  example. A better idea would be to fetch the desired configuration
  from all daemons (encoded in JSON, for example), merge them all into a
  single ``lyd_node`` variable and then display the combined
  configurations from this variable (the configuration merges would
  transparently take care of combining the shared configuration
  objects). In order to be able to manipulate the JSON configurations,
  *vtysh* will need to load the YANG modules from all daemons at
  startup (this might have a minimal impact on startup time). The only
  issue with this approach is that the ``cli_show()`` callbacks from all
  daemons are embedded in their binaries and thus not accessible
  externally. It might be necessary to compile these callbacks into a
  separate shared library so that they are accessible to *vtysh* too.
  Other than that, displaying the combined configurations in the
  JSON/XML formats should be straightforward.
* With the current design, transaction IDs are per-daemon and not
  global across all FRR daemons. This means that the same transaction
  ID can represent different transactions on different daemons. Given
  this observation, how should the ``rollback configuration`` command be
  implemented in *vtysh*? The easy solution would be to add a
  ``daemon WORD`` argument to specify the context of the rollback, but
  per-daemon rollbacks would certainly be confusing and convoluted for
  end users. A better idea would be to attack the root of the problem:
  change configuration transactions to be global instead of per-daemon.
  This involves a bigger change in the northbound architecture, and
  would have implications on how transactions are stored in the SQL
  database (daemon-specific and shared configuration objects would need
  to have their own tables or columns).
* Loading configuration files in the JSON or XML formats will be
  tricky, as *vtysh* will need to know which sections of the
  configuration should be sent to which daemons. *vtysh* will either
  need to fetch the YANG modules implemented by all daemons at runtime
  or obtain this information at compile-time somehow.
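The merge step from the first item can be illustrated without libyang. The sketch below treats each daemon's configuration as a list of XPaths and keeps a single copy of any object reported by several daemons, which is the effect expected from merging the data trees into one ``lyd_node``. ``struct daemon_config`` and ``merge_configs()`` are hypothetical, and merging real data trees would also reconcile the leaf values inside each object.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-daemon configuration dump: a list of XPaths. */
struct daemon_config {
	const char *daemon;
	const char *const *xpaths;
	size_t n;
};

/* Merge the configurations of several daemons into one list, keeping a
 * single copy of shared objects (same XPath reported by several
 * daemons). Order of first appearance is preserved. */
static size_t merge_configs(const struct daemon_config *cfgs, size_t ncfg,
			    const char **out, size_t max_out)
{
	size_t n = 0;

	for (size_t d = 0; d < ncfg; d++) {
		for (size_t i = 0; i < cfgs[d].n; i++) {
			const char *xp = cfgs[d].xpaths[i];
			size_t k;

			/* skip objects already contributed by another daemon */
			for (k = 0; k < n; k++)
				if (strcmp(out[k], xp) == 0)
					break;
			if (k == n && n < max_out)
				out[n++] = xp;
		}
	}
	return n;
}
```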

Detecting type mismatches at compile-time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As described in the *Retrofitting Configuration Commands* page, the
northbound configuration callbacks detect type mismatches at runtime
when fetching data from the ``dnode`` parameter (which represents the
configuration node being created, modified, deleted or moved). When a
type mismatch is detected, the program aborts and displays a backtrace
showing where the problem happened. It would be desirable to detect such
type mismatches at compile-time instead, since the earlier problems are
detected, the sooner they are fixed.

One possible solution to this problem would be to auto-generate C
structures from the YANG models and provide a function that converts a
libyang’s ``lyd_node`` variable to a C structure containing the same
information. The northbound callbacks could then fetch configuration
data from this C structure, which would naturally lead to type
mismatches being detected at compile time. One of the challenges of
doing this would be the handling of YANG lists and leaf-lists. It would
be necessary to use dynamic data structures like hashes or rb-trees to
hold all elements of the lists and leaf-lists, and the process of
converting a ``lyd_node`` to an auto-generated C structure could be
expensive. At this point it’s unclear if it’s worth adding more
complexity to the northbound architecture to solve this specific
problem.
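To make the idea concrete, here is a hypothetical example of what generated code could look like for two leaves of the ripd configuration. The structure name, the field types (assumed to follow the YANG types) and the converter are illustrative only, and ``struct leaf_value`` stands in for a libyang ``lyd_node`` so that the sketch compiles without libyang. Once callbacks read ``cfg->default_metric`` as a ``uint8_t`` field, misusing it (for example, passing it where a string is expected) is diagnosed by the compiler instead of causing an abort at runtime. The sketch deliberately avoids YANG lists, which would need the dynamic data structures discussed above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generated structure for part of the ripd instance. */
struct ripd_instance_cfg {
	bool has_default_metric;
	uint8_t default_metric;   /* assumed uint8 in YANG */
	uint8_t distance_default; /* assumed uint8 in YANG */
};

/* Stand-in for a lyd_node: a leaf name and its canonical text value. */
struct leaf_value {
	const char *name;
	const char *value;
};

/* Hypothetical generated converter: all string parsing happens here,
 * in one place, instead of in every northbound callback. */
static void ripd_instance_from_leaves(const struct leaf_value *leaves,
				      size_t n, struct ripd_instance_cfg *cfg)
{
	memset(cfg, 0, sizeof(*cfg));
	for (size_t i = 0; i < n; i++) {
		unsigned long v = strtoul(leaves[i].value, NULL, 10);

		if (strcmp(leaves[i].name, "default-metric") == 0) {
			cfg->default_metric = (uint8_t)v;
			cfg->has_default_metric = true;
		} else if (strcmp(leaves[i].name, "distance/default") == 0) {
			cfg->distance_default = (uint8_t)v;
		}
	}
}

/* Example consumer: the compiler checks how default_metric is used.
 * RIP metrics range from 1 to 16, as in "default-metric (1-16)". */
static bool metric_is_valid(const struct ripd_instance_cfg *cfg)
{
	return !cfg->has_default_metric ||
	       (cfg->default_metric >= 1 && cfg->default_metric <= 16);
}
```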