1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
|
=========================
NXP SJA1105 switch driver
=========================
Overview
========
The NXP SJA1105 is a family of 6 devices:
- SJA1105E: First generation, no TTEthernet
- SJA1105T: First generation, TTEthernet
- SJA1105P: Second generation, no TTEthernet, no SGMII
- SJA1105Q: Second generation, TTEthernet, no SGMII
- SJA1105R: Second generation, no TTEthernet, SGMII
- SJA1105S: Second generation, TTEthernet, SGMII
These are SPI-managed automotive switches, with all ports being gigabit
capable, and supporting MII/RMII/RGMII and optionally SGMII on one port.
Being automotive parts, their configuration interface is geared towards
set-and-forget use, with minimal dynamic interaction at runtime. They
require a static configuration to be composed by software and packed
with CRC and table headers, and sent over SPI.
The static configuration is composed of several configuration tables. Each
table takes a number of entries. Some configuration tables can be (partially)
reconfigured at runtime, some not. Some tables are mandatory, some not:
============================= ================== =============================
Table Mandatory Reconfigurable
============================= ================== =============================
Schedule no no
Schedule entry points if Scheduling no
VL Lookup no no
VL Policing if VL Lookup no
VL Forwarding if VL Lookup no
L2 Lookup no no
L2 Policing yes no
VLAN Lookup yes yes
L2 Forwarding yes partially (fully on P/Q/R/S)
MAC Config yes partially (fully on P/Q/R/S)
Schedule Params if Scheduling no
Schedule Entry Points Params if Scheduling no
VL Forwarding Params if VL Forwarding no
L2 Lookup Params no partially (fully on P/Q/R/S)
L2 Forwarding Params yes no
Clock Sync Params no no
AVB Params no no
General Params yes partially
Retagging no yes
xMII Params yes no
SGMII no yes
============================= ================== =============================
Also the configuration is write-only (software cannot read it back from the
switch except for very few exceptions).
The driver creates a static configuration at probe time, and keeps it at
all times in memory, as a shadow for the hardware state. When required to
change a hardware setting, the static configuration is also updated.
If that changed setting can be transmitted to the switch through the dynamic
reconfiguration interface, it is; otherwise the switch is reset and
reprogrammed with the updated static configuration.
Traffic support
===============
The switches do not have hardware support for DSA tags, except for "slow
protocols" for switch control as STP and PTP. For these, the switches have two
programmable filters for link-local destination MACs.
These are used to trap BPDUs and PTP traffic to the master netdevice, and are
further used to support STP and 1588 ordinary clock/boundary clock
functionality. For frames trapped to the CPU, source port and switch ID
information is encoded by the hardware into the frames.
But by leveraging ``CONFIG_NET_DSA_TAG_8021Q`` (a software-defined DSA tagging
format based on VLANs), general-purpose traffic termination through the network
stack can be supported under certain circumstances.
Depending on VLAN awareness state, the following operating modes are possible
with the switch:
- Mode 1 (VLAN-unaware): a port is in this mode when it is used as a standalone
net device, or when it is enslaved to a bridge with ``vlan_filtering=0``.
- Mode 2 (fully VLAN-aware): a port is in this mode when it is enslaved to a
bridge with ``vlan_filtering=1``. Access to the entire VLAN range is given to
the user through ``bridge vlan`` commands, but general-purpose (anything
other than STP, PTP etc) traffic termination is not possible through the
switch net devices. The other packets can be still by user space processed
through the DSA master interface (similar to ``DSA_TAG_PROTO_NONE``).
- Mode 3 (best-effort VLAN-aware): a port is in this mode when enslaved to a
bridge with ``vlan_filtering=1``, and the devlink property of its parent
switch named ``best_effort_vlan_filtering`` is set to ``true``. When
configured like this, the range of usable VIDs is reduced (0 to 1023 and 3072
to 4094), so is the number of usable VIDs (maximum of 7 non-pvid VLANs per
port*), and shared VLAN learning is performed (FDB lookup is done only by
DMAC, not also by VID).
To summarize, in each mode, the following types of traffic are supported over
the switch net devices:
+-------------+-----------+--------------+------------+
| | Mode 1 | Mode 2 | Mode 3 |
+=============+===========+==============+============+
| Regular | Yes | No | Yes |
| traffic | | (use master) | |
+-------------+-----------+--------------+------------+
| Management | Yes | Yes | Yes |
| traffic | | | |
| (BPDU, PTP) | | | |
+-------------+-----------+--------------+------------+
To configure the switch to operate in Mode 3, the following steps can be
followed::
ip link add dev br0 type bridge
# swp2 operates in Mode 1 now
ip link set dev swp2 master br0
# swp2 temporarily moves to Mode 2
ip link set dev br0 type bridge vlan_filtering 1
[ 61.204770] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
[ 61.239944] sja1105 spi0.1: Disabled switch tagging
# swp3 now operates in Mode 3
devlink dev param set spi/spi0.1 name best_effort_vlan_filtering value true cmode runtime
[ 64.682927] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
[ 64.711925] sja1105 spi0.1: Enabled switch tagging
# Cannot use VLANs in range 1024-3071 while in Mode 3.
bridge vlan add dev swp2 vid 1025 untagged pvid
RTNETLINK answers: Operation not permitted
bridge vlan add dev swp2 vid 100
bridge vlan add dev swp2 vid 101 untagged
bridge vlan
port vlan ids
swp5 1 PVID Egress Untagged
swp2 1 PVID Egress Untagged
100
101 Egress Untagged
swp3 1 PVID Egress Untagged
swp4 1 PVID Egress Untagged
br0 1 PVID Egress Untagged
bridge vlan add dev swp2 vid 102
bridge vlan add dev swp2 vid 103
bridge vlan add dev swp2 vid 104
bridge vlan add dev swp2 vid 105
bridge vlan add dev swp2 vid 106
bridge vlan add dev swp2 vid 107
# Cannot use mode than 7 VLANs per port while in Mode 3.
[ 3885.216832] sja1105 spi0.1: No more free subvlans
\* "maximum of 7 non-pvid VLANs per port": Decoding VLAN-tagged packets on the
CPU in mode 3 is possible through VLAN retagging of packets that go from the
switch to the CPU. In cross-chip topologies, the port that goes to the CPU
might also go to other switches. In that case, those other switches will see
only a retagged packet (which only has meaning for the CPU). So if they are
interested in this VLAN, they need to apply retagging in the reverse direction,
to recover the original value from it. This consumes extra hardware resources
for this switch. There is a maximum of 32 entries in the Retagging Table of
each switch device.
As an example, consider this cross-chip topology::
+-------------------------------------------------+
| Host SoC |
| +-------------------------+ |
| | DSA master for embedded | |
| | switch (non-sja1105) | |
| +--------+-------------------------+--------+ |
| | embedded L2 switch | |
| | | |
| | +--------------+ +--------------+ | |
| | |DSA master for| |DSA master for| | |
| | | SJA1105 1 | | SJA1105 2 | | |
+--+---+--------------+-----+--------------+---+--+
+-----------------------+ +-----------------------+
| SJA1105 switch 1 | | SJA1105 switch 2 |
+-----+-----+-----+-----+ +-----+-----+-----+-----+
|sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3|
+-----+-----+-----+-----+ +-----+-----+-----+-----+
To reach the CPU, SJA1105 switch 1 (spi/spi2.1) uses the same port as is uses
to reach SJA1105 switch 2 (spi/spi2.2), which would be port 4 (not drawn).
Similarly for SJA1105 switch 2.
Also consider the following commands, that add VLAN 100 to every sja1105 user
port::
devlink dev param set spi/spi2.1 name best_effort_vlan_filtering value true cmode runtime
devlink dev param set spi/spi2.2 name best_effort_vlan_filtering value true cmode runtime
ip link add dev br0 type bridge
for port in sw1p0 sw1p1 sw1p2 sw1p3 \
sw2p0 sw2p1 sw2p2 sw2p3; do
ip link set dev $port master br0
done
ip link set dev br0 type bridge vlan_filtering 1
for port in sw1p0 sw1p1 sw1p2 sw1p3 \
sw2p0 sw2p1 sw2p2; do
bridge vlan add dev $port vid 100
done
ip link add link br0 name br0.100 type vlan id 100 && ip link set dev br0.100 up
ip addr add 192.168.100.3/24 dev br0.100
bridge vlan add dev br0 vid 100 self
bridge vlan
port vlan ids
sw1p0 1 PVID Egress Untagged
100
sw1p1 1 PVID Egress Untagged
100
sw1p2 1 PVID Egress Untagged
100
sw1p3 1 PVID Egress Untagged
100
sw2p0 1 PVID Egress Untagged
100
sw2p1 1 PVID Egress Untagged
100
sw2p2 1 PVID Egress Untagged
100
sw2p3 1 PVID Egress Untagged
br0 1 PVID Egress Untagged
100
SJA1105 switch 1 consumes 1 retagging entry for each VLAN on each user port
towards the CPU. It also consumes 1 retagging entry for each non-pvid VLAN that
it is also interested in, which is configured on any port of any neighbor
switch.
In this case, SJA1105 switch 1 consumes a total of 11 retagging entries, as
follows:
- 8 retagging entries for VLANs 1 and 100 installed on its user ports
(``sw1p0`` - ``sw1p3``)
- 3 retagging entries for VLAN 100 installed on the user ports of SJA1105
switch 2 (``sw2p0`` - ``sw2p2``), because it also has ports that are
interested in it. The VLAN 1 is a pvid on SJA1105 switch 2 and does not need
reverse retagging.
SJA1105 switch 2 also consumes 11 retagging entries, but organized as follows:
- 7 retagging entries for the bridge VLANs on its user ports (``sw2p0`` -
``sw2p3``).
- 4 retagging entries for VLAN 100 installed on the user ports of SJA1105
switch 1 (``sw1p0`` - ``sw1p3``).
Switching features
==================
The driver supports the configuration of L2 forwarding rules in hardware for
port bridging. The forwarding, broadcast and flooding domain between ports can
be restricted through two methods: either at the L2 forwarding level (isolate
one bridge's ports from another's) or at the VLAN port membership level
(isolate ports within the same bridge). The final forwarding decision taken by
the hardware is a logical AND of these two sets of rules.
The hardware tags all traffic internally with a port-based VLAN (pvid), or it
decodes the VLAN information from the 802.1Q tag. Advanced VLAN classification
is not possible. Once attributed a VLAN tag, frames are checked against the
port's membership rules and dropped at ingress if they don't match any VLAN.
This behavior is available when switch ports are enslaved to a bridge with
``vlan_filtering 1``.
Normally the hardware is not configurable with respect to VLAN awareness, but
by changing what TPID the switch searches 802.1Q tags for, the semantics of a
bridge with ``vlan_filtering 0`` can be kept (accept all traffic, tagged or
untagged), and therefore this mode is also supported.
Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but
all bridges should have the same level of VLAN awareness (either both have
``vlan_filtering`` 0, or both 1). Also an inevitable limitation of the fact
that VLAN awareness is global at the switch level is that once a bridge with
``vlan_filtering`` enslaves at least one switch port, the other un-bridged
ports are no longer available for standalone traffic termination.
Topology and loop detection through STP is supported.
L2 FDB manipulation (add/delete/dump) is currently possible for the first
generation devices. Aging time of FDB entries, as well as enabling fully static
management (no address learning and no flooding of unknown traffic) is not yet
configurable in the driver.
A special comment about bridging with other netdevices (illustrated with an
example):
A board has eth0, eth1, swp0@eth1, swp1@eth1, swp2@eth1, swp3@eth1.
The switch ports (swp0-3) are under br0.
It is desired that eth0 is turned into another switched port that communicates
with swp0-3.
If br0 has vlan_filtering 0, then eth0 can simply be added to br0 with the
intended results.
If br0 has vlan_filtering 1, then a new br1 interface needs to be created that
enslaves eth0 and eth1 (the DSA master of the switch ports). This is because in
this mode, the switch ports beneath br0 are not capable of regular traffic, and
are only used as a conduit for switchdev operations.
Offloads
========
Time-aware scheduling
---------------------
The switch supports a variation of the enhancements for scheduled traffic
specified in IEEE 802.1Q-2018 (formerly 802.1Qbv). This means it can be used to
ensure deterministic latency for priority traffic that is sent in-band with its
gate-open event in the network schedule.
This capability can be managed through the tc-taprio offload ('flags 2'). The
difference compared to the software implementation of taprio is that the latter
would only be able to shape traffic originated from the CPU, but not
autonomously forwarded flows.
The device has 8 traffic classes, and maps incoming frames to one of them based
on the VLAN PCP bits (if no VLAN is present, the port-based default is used).
As described in the previous sections, depending on the value of
``vlan_filtering``, the EtherType recognized by the switch as being VLAN can
either be the typical 0x8100 or a custom value used internally by the driver
for tagging. Therefore, the switch ignores the VLAN PCP if used in standalone
or bridge mode with ``vlan_filtering=0``, as it will not recognize the 0x8100
EtherType. In these modes, injecting into a particular TX queue can only be
done by the DSA net devices, which populate the PCP field of the tagging header
on egress. Using ``vlan_filtering=1``, the behavior is the other way around:
offloaded flows can be steered to TX queues based on the VLAN PCP, but the DSA
net devices are no longer able to do that. To inject frames into a hardware TX
queue with VLAN awareness active, it is necessary to create a VLAN
sub-interface on the DSA master port, and send normal (0x8100) VLAN-tagged
towards the switch, with the VLAN PCP bits set appropriately.
Management traffic (having DMAC 01-80-C2-xx-xx-xx or 01-19-1B-xx-xx-xx) is the
notable exception: the switch always treats it with a fixed priority and
disregards any VLAN PCP bits even if present. The traffic class for management
traffic has a value of 7 (highest priority) at the moment, which is not
configurable in the driver.
Below is an example of configuring a 500 us cyclic schedule on egress port
``swp5``. The traffic class gate for management traffic (7) is open for 100 us,
and the gates for all other traffic classes are open for 400 us::
#!/bin/bash
set -e -u -o pipefail
NSEC_PER_SEC="1000000000"
gatemask() {
local tc_list="$1"
local mask=0
for tc in ${tc_list}; do
mask=$((${mask} | (1 << ${tc})))
done
printf "%02x" ${mask}
}
if ! systemctl is-active --quiet ptp4l; then
echo "Please start the ptp4l service"
exit
fi
now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
# Phase-align the base time to the start of the next second.
sec=$(echo "${now}" | gawk -F. '{ print $1; }')
base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"
tc qdisc add dev swp5 parent root handle 100 taprio \
num_tc 8 \
map 0 1 2 3 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time ${base_time} \
sched-entry S $(gatemask 7) 100000 \
sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
flags 2
It is possible to apply the tc-taprio offload on multiple egress ports. There
are hardware restrictions related to the fact that no gate event may trigger
simultaneously on two ports. The driver checks the consistency of the schedules
against this restriction and errors out when appropriate. Schedule analysis is
needed to avoid this, which is outside the scope of the document.
Routing actions (redirect, trap, drop)
--------------------------------------
The switch is able to offload flow-based redirection of packets to a set of
destination ports specified by the user. Internally, this is implemented by
making use of Virtual Links, a TTEthernet concept.
The driver supports 2 types of keys for Virtual Links:
- VLAN-aware virtual links: these match on destination MAC address, VLAN ID and
VLAN PCP.
- VLAN-unaware virtual links: these match on destination MAC address only.
The VLAN awareness state of the bridge (vlan_filtering) cannot be changed while
there are virtual link rules installed.
Composing multiple actions inside the same rule is supported. When only routing
actions are requested, the driver creates a "non-critical" virtual link. When
the action list also contains tc-gate (more details below), the virtual link
becomes "time-critical" (draws frame buffers from a reserved memory partition,
etc).
The 3 routing actions that are supported are "trap", "drop" and "redirect".
Example 1: send frames received on swp2 with a DA of 42:be:24:9b:76:20 to the
CPU and to swp3. This type of key (DA only) when the port's VLAN awareness
state is off::
tc qdisc add dev swp2 clsact
tc filter add dev swp2 ingress flower skip_sw dst_mac 42:be:24:9b:76:20 \
action mirred egress redirect dev swp3 \
action trap
Example 2: drop frames received on swp2 with a DA of 42:be:24:9b:76:20, a VID
of 100 and a PCP of 0::
tc filter add dev swp2 ingress protocol 802.1Q flower skip_sw \
dst_mac 42:be:24:9b:76:20 vlan_id 100 vlan_prio 0 action drop
Time-based ingress policing
---------------------------
The TTEthernet hardware abilities of the switch can be constrained to act
similarly to the Per-Stream Filtering and Policing (PSFP) clause specified in
IEEE 802.1Q-2018 (formerly 802.1Qci). This means it can be used to perform
tight timing-based admission control for up to 1024 flows (identified by a
tuple composed of destination MAC address, VLAN ID and VLAN PCP). Packets which
are received outside their expected reception window are dropped.
This capability can be managed through the offload of the tc-gate action. As
routing actions are intrinsic to virtual links in TTEthernet (which performs
explicit routing of time-critical traffic and does not leave that in the hands
of the FDB, flooding etc), the tc-gate action may never appear alone when
asking sja1105 to offload it. One (or more) redirect or trap actions must also
follow along.
Example: create a tc-taprio schedule that is phase-aligned with a tc-gate
schedule (the clocks must be synchronized by a 1588 application stack, which is
outside the scope of this document). No packet delivered by the sender will be
dropped. Note that the reception window is larger than the transmission window
(and much more so, in this example) to compensate for the packet propagation
delay of the link (which can be determined by the 1588 application stack).
Receiver (sja1105)::
tc qdisc add dev swp2 clsact
now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate base-time ${base_time} \
sched-entry OPEN 60000 -1 -1 \
sched-entry CLOSE 40000 -1 -1 \
action trap
Sender::
now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc qdisc add dev eno0 parent root taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time ${base_time} \
sched-entry S 01 50000 \
sched-entry S 00 50000 \
flags 2
The engine used to schedule the ingress gate operations is the same that the
one used for the tc-taprio offload. Therefore, the restrictions regarding the
fact that no two gate actions (either tc-gate or tc-taprio gates) may fire at
the same time (during the same 200 ns slot) still apply.
To come in handy, it is possible to share time-triggered virtual links across
more than 1 ingress port, via flow blocks. In this case, the restriction of
firing at the same time does not apply because there is a single schedule in
the system, that of the shared virtual link::
tc qdisc add dev swp2 ingress_block 1 clsact
tc qdisc add dev swp3 ingress_block 1 clsact
tc filter add block 1 flower skip_sw dst_mac 42:be:24:9b:76:20 \
action gate index 2 \
base-time 0 \
sched-entry OPEN 50000000 -1 -1 \
sched-entry CLOSE 50000000 -1 -1 \
action trap
Hardware statistics for each flow are also available ("pkts" counts the number
of dropped frames, which is a sum of frames dropped due to timing violations,
lack of destination ports and MTU enforcement checks). Byte-level counters are
not available.
Device Tree bindings and board design
=====================================
This section references ``Documentation/devicetree/bindings/net/dsa/sja1105.txt``
and aims to showcase some potential switch caveats.
RMII PHY role and out-of-band signaling
---------------------------------------
In the RMII spec, the 50 MHz clock signals are either driven by the MAC or by
an external oscillator (but not by the PHY).
But the spec is rather loose and devices go outside it in several ways.
Some PHYs go against the spec and may provide an output pin where they source
the 50 MHz clock themselves, in an attempt to be helpful.
On the other hand, the SJA1105 is only binary configurable - when in the RMII
MAC role it will also attempt to drive the clock signal. To prevent this from
happening it must be put in RMII PHY role.
But doing so has some unintended consequences.
In the RMII spec, the PHY can transmit extra out-of-band signals via RXD[1:0].
These are practically some extra code words (/J/ and /K/) sent prior to the
preamble of each frame. The MAC does not have this out-of-band signaling
mechanism defined by the RMII spec.
So when the SJA1105 port is put in PHY role to avoid having 2 drivers on the
clock signal, inevitably an RMII PHY-to-PHY connection is created. The SJA1105
emulates a PHY interface fully and generates the /J/ and /K/ symbols prior to
frame preambles, which the real PHY is not expected to understand. So the PHY
simply encodes the extra symbols received from the SJA1105-as-PHY onto the
100Base-Tx wire.
On the other side of the wire, some link partners might discard these extra
symbols, while others might choke on them and discard the entire Ethernet
frames that follow along. This looks like packet loss with some link partners
but not with others.
The take-away is that in RMII mode, the SJA1105 must be let to drive the
reference clock if connected to a PHY.
RGMII fixed-link and internal delays
------------------------------------
As mentioned in the bindings document, the second generation of devices has
tunable delay lines as part of the MAC, which can be used to establish the
correct RGMII timing budget.
When powered up, these can shift the Rx and Tx clocks with a phase difference
between 73.8 and 101.7 degrees.
The catch is that the delay lines need to lock onto a clock signal with a
stable frequency. This means that there must be at least 2 microseconds of
silence between the clock at the old vs at the new frequency. Otherwise the
lock is lost and the delay lines must be reset (powered down and back up).
In RGMII the clock frequency changes with link speed (125 MHz at 1000 Mbps, 25
MHz at 100 Mbps and 2.5 MHz at 10 Mbps), and link speed might change during the
AN process.
In the situation where the switch port is connected through an RGMII fixed-link
to a link partner whose link state life cycle is outside the control of Linux
(such as a different SoC), then the delay lines would remain unlocked (and
inactive) until there is manual intervention (ifdown/ifup on the switch port).
The take-away is that in RGMII mode, the switch's internal delays are only
reliable if the link partner never changes link speeds, or if it does, it does
so in a way that is coordinated with the switch port (practically, both ends of
the fixed-link are under control of the same Linux system).
As to why would a fixed-link interface ever change link speeds: there are
Ethernet controllers out there which come out of reset in 100 Mbps mode, and
their driver inevitably needs to change the speed and clock frequency if it's
required to work at gigabit.
MDIO bus and PHY management
---------------------------
The SJA1105 does not have an MDIO bus and does not perform in-band AN either.
Therefore there is no link state notification coming from the switch device.
A board would need to hook up the PHYs connected to the switch to any other
MDIO bus available to Linux within the system (e.g. to the DSA master's MDIO
bus). Link state management then works by the driver manually keeping in sync
(over SPI commands) the MAC link speed with the settings negotiated by the PHY.
|