diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-27 10:05:51 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-27 10:05:51 +0000 |
commit | 5d1646d90e1f2cceb9f0828f4b28318cd0ec7744 (patch) | |
tree | a94efe259b9009378be6d90eb30d2b019d95c194 /Documentation/networking/dsa | |
parent | Initial commit. (diff) | |
download | linux-upstream/5.10.209.tar.xz linux-upstream/5.10.209.zip |
Adding upstream version 5.10.209.upstream/5.10.209upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'Documentation/networking/dsa')
-rw-r--r-- | Documentation/networking/dsa/b53.rst | 183 | ||||
-rw-r--r-- | Documentation/networking/dsa/bcm_sf2.rst | 115 | ||||
-rw-r--r-- | Documentation/networking/dsa/configuration.rst | 292 | ||||
-rw-r--r-- | Documentation/networking/dsa/dsa.rst | 587 | ||||
-rw-r--r-- | Documentation/networking/dsa/index.rst | 13 | ||||
-rw-r--r-- | Documentation/networking/dsa/lan9303.rst | 37 | ||||
-rw-r--r-- | Documentation/networking/dsa/sja1105.rst | 581 |
7 files changed, 1808 insertions, 0 deletions
diff --git a/Documentation/networking/dsa/b53.rst b/Documentation/networking/dsa/b53.rst new file mode 100644 index 000000000..b41637cdb --- /dev/null +++ b/Documentation/networking/dsa/b53.rst @@ -0,0 +1,183 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================================== +Broadcom RoboSwitch Ethernet switch driver +========================================== + +The Broadcom RoboSwitch Ethernet switch family is used in quite a range of +xDSL router, cable modems and other multimedia devices. + +The actual implementation supports the devices BCM5325E, BCM5365, BCM539x, +BCM53115 and BCM53125 as well as BCM63XX. + +Implementation details +====================== + +The driver is located in ``drivers/net/dsa/b53/`` and is implemented as a +DSA driver; see ``Documentation/networking/dsa/dsa.rst`` for details on the +subsystem and what it provides. + +The switch is, if possible, configured to enable a Broadcom specific 4-bytes +switch tag which gets inserted by the switch for every packet forwarded to the +CPU interface, conversely, the CPU network interface should insert a similar +tag for packets entering the CPU port. The tag format is described in +``net/dsa/tag_brcm.c``. + +The configuration of the device depends on whether or not tagging is +supported. + +The interface names and example network configuration are used according the +configuration described in the :ref:`dsa-config-showcases`. + +Configuration with tagging support +---------------------------------- + +The tagging based configuration is desired. It is not specific to the b53 +DSA driver and will work like all DSA drivers which supports tagging. + +See :ref:`dsa-tagged-configuration`. + +Configuration without tagging support +------------------------------------- + +Older models (5325, 5365) support a different tag format that is not supported +yet. 539x and 531x5 require managed mode and some special handling, which is +also not yet supported. The tagging support is disabled in these cases and the +switch need a different configuration. + +The configuration slightly differ from the :ref:`dsa-vlan-configuration`. + +The b53 tags the CPU port in all VLANs, since otherwise any PVID untagged +VLAN programming would basically change the CPU port's default PVID and make +it untagged, undesirable. + +In difference to the configuration described in :ref:`dsa-vlan-configuration` +the default VLAN 1 has to be removed from the slave interface configuration in +single port and gateway configuration, while there is no need to add an extra +VLAN configuration in the bridge showcase. + +single port +~~~~~~~~~~~ +The configuration can only be set up via VLAN tagging and bridge setup. +By default packages are tagged with vid 1: + +.. code-block:: sh + + # tag traffic on CPU port + ip link add link eth0 name eth0.1 type vlan id 1 + ip link add link eth0 name eth0.2 type vlan id 2 + ip link add link eth0 name eth0.3 type vlan id 3 + + # The master interface needs to be brought up before the slave ports. + ip link set eth0 up + ip link set eth0.1 up + ip link set eth0.2 up + ip link set eth0.3 up + + # bring up the slave interfaces + ip link set wan up + ip link set lan1 up + ip link set lan2 up + + # create bridge + ip link add name br0 type bridge + + # activate VLAN filtering + ip link set dev br0 type bridge vlan_filtering 1 + + # add ports to bridges + ip link set dev wan master br0 + ip link set dev lan1 master br0 + ip link set dev lan2 master br0 + + # tag traffic on ports + bridge vlan add dev lan1 vid 2 pvid untagged + bridge vlan del dev lan1 vid 1 + bridge vlan add dev lan2 vid 3 pvid untagged + bridge vlan del dev lan2 vid 1 + + # configure the VLANs + ip addr add 192.0.2.1/30 dev eth0.1 + ip addr add 192.0.2.5/30 dev eth0.2 + ip addr add 192.0.2.9/30 dev eth0.3 + + # bring up the bridge devices + ip link set br0 up + + +bridge +~~~~~~ + +.. code-block:: sh + + # tag traffic on CPU port + ip link add link eth0 name eth0.1 type vlan id 1 + + # The master interface needs to be brought up before the slave ports. + ip link set eth0 up + ip link set eth0.1 up + + # bring up the slave interfaces + ip link set wan up + ip link set lan1 up + ip link set lan2 up + + # create bridge + ip link add name br0 type bridge + + # activate VLAN filtering + ip link set dev br0 type bridge vlan_filtering 1 + + # add ports to bridge + ip link set dev wan master br0 + ip link set dev lan1 master br0 + ip link set dev lan2 master br0 + ip link set eth0.1 master br0 + + # configure the bridge + ip addr add 192.0.2.129/25 dev br0 + + # bring up the bridge + ip link set dev br0 up + +gateway +~~~~~~~ + +.. code-block:: sh + + # tag traffic on CPU port + ip link add link eth0 name eth0.1 type vlan id 1 + ip link add link eth0 name eth0.2 type vlan id 2 + + # The master interface needs to be brought up before the slave ports. + ip link set eth0 up + ip link set eth0.1 up + ip link set eth0.2 up + + # bring up the slave interfaces + ip link set wan up + ip link set lan1 up + ip link set lan2 up + + # create bridge + ip link add name br0 type bridge + + # activate VLAN filtering + ip link set dev br0 type bridge vlan_filtering 1 + + # add ports to bridges + ip link set dev wan master br0 + ip link set eth0.1 master br0 + ip link set dev lan1 master br0 + ip link set dev lan2 master br0 + + # tag traffic on ports + bridge vlan add dev wan vid 2 pvid untagged + bridge vlan del dev wan vid 1 + + # configure the VLANs + ip addr add 192.0.2.1/30 dev eth0.2 + ip addr add 192.0.2.129/25 dev br0 + + # bring up the bridge devices + ip link set br0 up diff --git a/Documentation/networking/dsa/bcm_sf2.rst b/Documentation/networking/dsa/bcm_sf2.rst new file mode 100644 index 000000000..dee234039 --- /dev/null +++ b/Documentation/networking/dsa/bcm_sf2.rst @@ -0,0 +1,115 @@ +============================================= +Broadcom Starfighter 2 Ethernet switch driver +============================================= + +Broadcom's Starfighter 2 Ethernet switch hardware block is commonly found and +deployed in the following products: + +- xDSL gateways such as BCM63138 +- streaming/multimedia Set Top Box such as BCM7445 +- Cable Modem/residential gateways such as BCM7145/BCM3390 + +The switch is typically deployed in a configuration involving between 5 to 13 +ports, offering a range of built-in and customizable interfaces: + +- single integrated Gigabit PHY +- quad integrated Gigabit PHY +- quad external Gigabit PHY w/ MDIO multiplexer +- integrated MoCA PHY +- several external MII/RevMII/GMII/RGMII interfaces + +The switch also supports specific congestion control features which allow MoCA +fail-over not to lose packets during a MoCA role re-election, as well as out of +band back-pressure to the host CPU network interface when downstream interfaces +are connected at a lower speed. + +The switch hardware block is typically interfaced using MMIO accesses and +contains a bunch of sub-blocks/registers: + +- ``SWITCH_CORE``: common switch registers +- ``SWITCH_REG``: external interfaces switch register +- ``SWITCH_MDIO``: external MDIO bus controller (there is another one in SWITCH_CORE, + which is used for indirect PHY accesses) +- ``SWITCH_INDIR_RW``: 64-bits wide register helper block +- ``SWITCH_INTRL2_0/1``: Level-2 interrupt controllers +- ``SWITCH_ACB``: Admission control block +- ``SWITCH_FCB``: Fail-over control block + +Implementation details +====================== + +The driver is located in ``drivers/net/dsa/bcm_sf2.c`` and is implemented as a DSA +driver; see ``Documentation/networking/dsa/dsa.rst`` for details on the subsystem +and what it provides. + +The SF2 switch is configured to enable a Broadcom specific 4-bytes switch tag +which gets inserted by the switch for every packet forwarded to the CPU +interface, conversely, the CPU network interface should insert a similar tag for +packets entering the CPU port. The tag format is described in +``net/dsa/tag_brcm.c``. + +Overall, the SF2 driver is a fairly regular DSA driver; there are a few +specifics covered below. + +Device Tree probing +------------------- + +The DSA platform device driver is probed using a specific compatible string +provided in ``net/dsa/dsa.c``. The reason for that is because the DSA subsystem gets +registered as a platform device driver currently. DSA will provide the needed +device_node pointers which are then accessible by the switch driver setup +function to setup resources such as register ranges and interrupts. This +currently works very well because none of the of_* functions utilized by the +driver require a struct device to be bound to a struct device_node, but things +may change in the future. + +MDIO indirect accesses +---------------------- + +Due to a limitation in how Broadcom switches have been designed, external +Broadcom switches connected to a SF2 require the use of the DSA slave MDIO bus +in order to properly configure them. By default, the SF2 pseudo-PHY address, and +an external switch pseudo-PHY address will both be snooping for incoming MDIO +transactions, since they are at the same address (30), resulting in some kind of +"double" programming. Using DSA, and setting ``ds->phys_mii_mask`` accordingly, we +selectively divert reads and writes towards external Broadcom switches +pseudo-PHY addresses. Newer revisions of the SF2 hardware have introduced a +configurable pseudo-PHY address which circumvents the initial design limitation. + +Multimedia over CoAxial (MoCA) interfaces +----------------------------------------- + +MoCA interfaces are fairly specific and require the use of a firmware blob which +gets loaded onto the MoCA processor(s) for packet processing. The switch +hardware contains logic which will assert/de-assert link states accordingly for +the MoCA interface whenever the MoCA coaxial cable gets disconnected or the +firmware gets reloaded. The SF2 driver relies on such events to properly set its +MoCA interface carrier state and properly report this to the networking stack. + +The MoCA interfaces are supported using the PHY library's fixed PHY/emulated PHY +device and the switch driver registers a ``fixed_link_update`` callback for such +PHYs which reflects the link state obtained from the interrupt handler. + + +Power Management +---------------- + +Whenever possible, the SF2 driver tries to minimize the overall switch power +consumption by applying a combination of: + +- turning off internal buffers/memories +- disabling packet processing logic +- putting integrated PHYs in IDDQ/low-power +- reducing the switch core clock based on the active port count +- enabling and advertising EEE +- turning off RGMII data processing logic when the link goes down + +Wake-on-LAN +----------- + +Wake-on-LAN is currently implemented by utilizing the host processor Ethernet +MAC controller wake-on logic. Whenever Wake-on-LAN is requested, an intersection +between the user request and the supported host Ethernet interface WoL +capabilities is done and the intersection result gets configured. During +system-wide suspend/resume, only ports not participating in Wake-on-LAN are +disabled. diff --git a/Documentation/networking/dsa/configuration.rst b/Documentation/networking/dsa/configuration.rst new file mode 100644 index 000000000..11bd5e610 --- /dev/null +++ b/Documentation/networking/dsa/configuration.rst @@ -0,0 +1,292 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================================= +DSA switch configuration from userspace +======================================= + +The DSA switch configuration is not integrated into the main userspace +network configuration suites by now and has to be performed manualy. + +.. _dsa-config-showcases: + +Configuration showcases +----------------------- + +To configure a DSA switch a couple of commands need to be executed. In this +documentation some common configuration scenarios are handled as showcases: + +*single port* + Every switch port acts as a different configurable Ethernet port + +*bridge* + Every switch port is part of one configurable Ethernet bridge + +*gateway* + Every switch port except one upstream port is part of a configurable + Ethernet bridge. + The upstream port acts as different configurable Ethernet port. + +All configurations are performed with tools from iproute2, which is available +at https://www.kernel.org/pub/linux/utils/net/iproute2/ + +Through DSA every port of a switch is handled like a normal linux Ethernet +interface. The CPU port is the switch port connected to an Ethernet MAC chip. +The corresponding linux Ethernet interface is called the master interface. +All other corresponding linux interfaces are called slave interfaces. + +The slave interfaces depend on the master interface. They can only brought up, +when the master interface is up. + +In this documentation the following Ethernet interfaces are used: + +*eth0* + the master interface + +*lan1* + a slave interface + +*lan2* + another slave interface + +*lan3* + a third slave interface + +*wan* + A slave interface dedicated for upstream traffic + +Further Ethernet interfaces can be configured similar. +The configured IPs and networks are: + +*single port* + * lan1: 192.0.2.1/30 (192.0.2.0 - 192.0.2.3) + * lan2: 192.0.2.5/30 (192.0.2.4 - 192.0.2.7) + * lan3: 192.0.2.9/30 (192.0.2.8 - 192.0.2.11) + +*bridge* + * br0: 192.0.2.129/25 (192.0.2.128 - 192.0.2.255) + +*gateway* + * br0: 192.0.2.129/25 (192.0.2.128 - 192.0.2.255) + * wan: 192.0.2.1/30 (192.0.2.0 - 192.0.2.3) + +.. _dsa-tagged-configuration: + +Configuration with tagging support +---------------------------------- + +The tagging based configuration is desired and supported by the majority of +DSA switches. These switches are capable to tag incoming and outgoing traffic +without using a VLAN based configuration. + +single port +~~~~~~~~~~~ + +.. code-block:: sh + + # configure each interface + ip addr add 192.0.2.1/30 dev lan1 + ip addr add 192.0.2.5/30 dev lan2 + ip addr add 192.0.2.9/30 dev lan3 + + # The master interface needs to be brought up before the slave ports. + ip link set eth0 up + + # bring up the slave interfaces + ip link set lan1 up + ip link set lan2 up + ip link set lan3 up + +bridge +~~~~~~ + +.. code-block:: sh + + # The master interface needs to be brought up before the slave ports. + ip link set eth0 up + + # bring up the slave interfaces + ip link set lan1 up + ip link set lan2 up + ip link set lan3 up + + # create bridge + ip link add name br0 type bridge + + # add ports to bridge + ip link set dev lan1 master br0 + ip link set dev lan2 master br0 + ip link set dev lan3 master br0 + + # configure the bridge + ip addr add 192.0.2.129/25 dev br0 + + # bring up the bridge + ip link set dev br0 up + +gateway +~~~~~~~ + +.. code-block:: sh + + # The master interface needs to be brought up before the slave ports. + ip link set eth0 up + + # bring up the slave interfaces + ip link set wan up + ip link set lan1 up + ip link set lan2 up + + # configure the upstream port + ip addr add 192.0.2.1/30 dev wan + + # create bridge + ip link add name br0 type bridge + + # add ports to bridge + ip link set dev lan1 master br0 + ip link set dev lan2 master br0 + + # configure the bridge + ip addr add 192.0.2.129/25 dev br0 + + # bring up the bridge + ip link set dev br0 up + +.. _dsa-vlan-configuration: + +Configuration without tagging support +------------------------------------- + +A minority of switches are not capable to use a taging protocol +(DSA_TAG_PROTO_NONE). These switches can be configured by a VLAN based +configuration. + +single port +~~~~~~~~~~~ +The configuration can only be set up via VLAN tagging and bridge setup. + +.. code-block:: sh + + # tag traffic on CPU port + ip link add link eth0 name eth0.1 type vlan id 1 + ip link add link eth0 name eth0.2 type vlan id 2 + ip link add link eth0 name eth0.3 type vlan id 3 + + # The master interface needs to be brought up before the slave ports. + ip link set eth0 up + ip link set eth0.1 up + ip link set eth0.2 up + ip link set eth0.3 up + + # bring up the slave interfaces + ip link set lan1 up + ip link set lan2 up + ip link set lan3 up + + # create bridge + ip link add name br0 type bridge + + # activate VLAN filtering + ip link set dev br0 type bridge vlan_filtering 1 + + # add ports to bridges + ip link set dev lan1 master br0 + ip link set dev lan2 master br0 + ip link set dev lan3 master br0 + + # tag traffic on ports + bridge vlan add dev lan1 vid 1 pvid untagged + bridge vlan add dev lan2 vid 2 pvid untagged + bridge vlan add dev lan3 vid 3 pvid untagged + + # configure the VLANs + ip addr add 192.0.2.1/30 dev eth0.1 + ip addr add 192.0.2.5/30 dev eth0.2 + ip addr add 192.0.2.9/30 dev eth0.3 + + # bring up the bridge devices + ip link set br0 up + + +bridge +~~~~~~ + +.. code-block:: sh + + # tag traffic on CPU port + ip link add link eth0 name eth0.1 type vlan id 1 + + # The master interface needs to be brought up before the slave ports. + ip link set eth0 up + ip link set eth0.1 up + + # bring up the slave interfaces + ip link set lan1 up + ip link set lan2 up + ip link set lan3 up + + # create bridge + ip link add name br0 type bridge + + # activate VLAN filtering + ip link set dev br0 type bridge vlan_filtering 1 + + # add ports to bridge + ip link set dev lan1 master br0 + ip link set dev lan2 master br0 + ip link set dev lan3 master br0 + ip link set eth0.1 master br0 + + # tag traffic on ports + bridge vlan add dev lan1 vid 1 pvid untagged + bridge vlan add dev lan2 vid 1 pvid untagged + bridge vlan add dev lan3 vid 1 pvid untagged + + # configure the bridge + ip addr add 192.0.2.129/25 dev br0 + + # bring up the bridge + ip link set dev br0 up + +gateway +~~~~~~~ + +.. code-block:: sh + + # tag traffic on CPU port + ip link add link eth0 name eth0.1 type vlan id 1 + ip link add link eth0 name eth0.2 type vlan id 2 + + # The master interface needs to be brought up before the slave ports. + ip link set eth0 up + ip link set eth0.1 up + ip link set eth0.2 up + + # bring up the slave interfaces + ip link set wan up + ip link set lan1 up + ip link set lan2 up + + # create bridge + ip link add name br0 type bridge + + # activate VLAN filtering + ip link set dev br0 type bridge vlan_filtering 1 + + # add ports to bridges + ip link set dev wan master br0 + ip link set eth0.1 master br0 + ip link set dev lan1 master br0 + ip link set dev lan2 master br0 + + # tag traffic on ports + bridge vlan add dev lan1 vid 1 pvid untagged + bridge vlan add dev lan2 vid 1 pvid untagged + bridge vlan add dev wan vid 2 pvid untagged + + # configure the VLANs + ip addr add 192.0.2.1/30 dev eth0.2 + ip addr add 192.0.2.129/25 dev br0 + + # bring up the bridge devices + ip link set br0 up diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst new file mode 100644 index 000000000..a8d15dd2b --- /dev/null +++ b/Documentation/networking/dsa/dsa.rst @@ -0,0 +1,587 @@ +============ +Architecture +============ + +This document describes the **Distributed Switch Architecture (DSA)** subsystem +design principles, limitations, interactions with other subsystems, and how to +develop drivers for this subsystem as well as a TODO for developers interested +in joining the effort. + +Design principles +================= + +The Distributed Switch Architecture is a subsystem which was primarily designed +to support Marvell Ethernet switches (MV88E6xxx, a.k.a Linkstreet product line) +using Linux, but has since evolved to support other vendors as well. + +The original philosophy behind this design was to be able to use unmodified +Linux tools such as bridge, iproute2, ifconfig to work transparently whether +they configured/queried a switch port network device or a regular network +device. + +An Ethernet switch is typically comprised of multiple front-panel ports, and one +or more CPU or management port. The DSA subsystem currently relies on the +presence of a management port connected to an Ethernet controller capable of +receiving Ethernet frames from the switch. This is a very common setup for all +kinds of Ethernet switches found in Small Home and Office products: routers, +gateways, or even top-of-the rack switches. This host Ethernet controller will +be later referred to as "master" and "cpu" in DSA terminology and code. + +The D in DSA stands for Distributed, because the subsystem has been designed +with the ability to configure and manage cascaded switches on top of each other +using upstream and downstream Ethernet links between switches. These specific +ports are referred to as "dsa" ports in DSA terminology and code. A collection +of multiple switches connected to each other is called a "switch tree". + +For each front-panel port, DSA will create specialized network devices which are +used as controlling and data-flowing endpoints for use by the Linux networking +stack. These specialized network interfaces are referred to as "slave" network +interfaces in DSA terminology and code. + +The ideal case for using DSA is when an Ethernet switch supports a "switch tag" +which is a hardware feature making the switch insert a specific tag for each +Ethernet frames it received to/from specific ports to help the management +interface figure out: + +- what port is this frame coming from +- what was the reason why this frame got forwarded +- how to send CPU originated traffic to specific ports + +The subsystem does support switches not capable of inserting/stripping tags, but +the features might be slightly limited in that case (traffic separation relies +on Port-based VLAN IDs). + +Note that DSA does not currently create network interfaces for the "cpu" and +"dsa" ports because: + +- the "cpu" port is the Ethernet switch facing side of the management + controller, and as such, would create a duplication of feature, since you + would get two interfaces for the same conduit: master netdev, and "cpu" netdev + +- the "dsa" port(s) are just conduits between two or more switches, and as such + cannot really be used as proper network interfaces either, only the + downstream, or the top-most upstream interface makes sense with that model + +Switch tagging protocols +------------------------ + +DSA currently supports 5 different tagging protocols, and a tag-less mode as +well. The different protocols are implemented in: + +- ``net/dsa/tag_trailer.c``: Marvell's 4 trailer tag mode (legacy) +- ``net/dsa/tag_dsa.c``: Marvell's original DSA tag +- ``net/dsa/tag_edsa.c``: Marvell's enhanced DSA tag +- ``net/dsa/tag_brcm.c``: Broadcom's 4 bytes tag +- ``net/dsa/tag_qca.c``: Qualcomm's 2 bytes tag + +The exact format of the tag protocol is vendor specific, but in general, they +all contain something which: + +- identifies which port the Ethernet frame came from/should be sent to +- provides a reason why this frame was forwarded to the management interface + +Master network devices +---------------------- + +Master network devices are regular, unmodified Linux network device drivers for +the CPU/management Ethernet interface. Such a driver might occasionally need to +know whether DSA is enabled (e.g.: to enable/disable specific offload features), +but the DSA subsystem has been proven to work with industry standard drivers: +``e1000e,`` ``mv643xx_eth`` etc. without having to introduce modifications to these +drivers. Such network devices are also often referred to as conduit network +devices since they act as a pipe between the host processor and the hardware +Ethernet switch. + +Networking stack hooks +---------------------- + +When a master netdev is used with DSA, a small hook is placed in the +networking stack is in order to have the DSA subsystem process the Ethernet +switch specific tagging protocol. DSA accomplishes this by registering a +specific (and fake) Ethernet type (later becoming ``skb->protocol``) with the +networking stack, this is also known as a ``ptype`` or ``packet_type``. A typical +Ethernet Frame receive sequence looks like this: + +Master network device (e.g.: e1000e): + +1. Receive interrupt fires: + + - receive function is invoked + - basic packet processing is done: getting length, status etc. + - packet is prepared to be processed by the Ethernet layer by calling + ``eth_type_trans`` + +2. net/ethernet/eth.c:: + + eth_type_trans(skb, dev) + if (dev->dsa_ptr != NULL) + -> skb->protocol = ETH_P_XDSA + +3. drivers/net/ethernet/\*:: + + netif_receive_skb(skb) + -> iterate over registered packet_type + -> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv() + +4. net/dsa/dsa.c:: + + -> dsa_switch_rcv() + -> invoke switch tag specific protocol handler in 'net/dsa/tag_*.c' + +5. net/dsa/tag_*.c: + + - inspect and strip switch tag protocol to determine originating port + - locate per-port network device + - invoke ``eth_type_trans()`` with the DSA slave network device + - invoked ``netif_receive_skb()`` + +Past this point, the DSA slave network devices get delivered regular Ethernet +frames that can be processed by the networking stack. + +Slave network devices +--------------------- + +Slave network devices created by DSA are stacked on top of their master network +device, each of these network interfaces will be responsible for being a +controlling and data-flowing end-point for each front-panel port of the switch. +These interfaces are specialized in order to: + +- insert/remove the switch tag protocol (if it exists) when sending traffic + to/from specific switch ports +- query the switch for ethtool operations: statistics, link state, + Wake-on-LAN, register dumps... +- external/internal PHY management: link, auto-negotiation etc. + +These slave network devices have custom net_device_ops and ethtool_ops function +pointers which allow DSA to introduce a level of layering between the networking +stack/ethtool, and the switch driver implementation. + +Upon frame transmission from these slave network devices, DSA will look up which +switch tagging protocol is currently registered with these network devices, and +invoke a specific transmit routine which takes care of adding the relevant +switch tag in the Ethernet frames. + +These frames are then queued for transmission using the master network device +``ndo_start_xmit()`` function, since they contain the appropriate switch tag, the +Ethernet switch will be able to process these incoming frames from the +management interface and delivers these frames to the physical switch port. + +Graphical representation +------------------------ + +Summarized, this is basically how DSA looks like from a network device +perspective:: + + + |--------------------------- + | CPU network device (eth0)| + ---------------------------- + | <tag added by switch | + | | + | | + | tag added by CPU> | + |--------------------------------------------| + | Switch driver | + |--------------------------------------------| + || || || + |-------| |-------| |-------| + | sw0p0 | | sw0p1 | | sw0p2 | + |-------| |-------| |-------| + + + +Slave MDIO bus +-------------- + +In order to be able to read to/from a switch PHY built into it, DSA creates a +slave MDIO bus which allows a specific switch driver to divert and intercept +MDIO reads/writes towards specific PHY addresses. In most MDIO-connected +switches, these functions would utilize direct or indirect PHY addressing mode +to return standard MII registers from the switch builtin PHYs, allowing the PHY +library and/or to return link status, link partner pages, auto-negotiation +results etc.. + +For Ethernet switches which have both external and internal MDIO busses, the +slave MII bus can be utilized to mux/demux MDIO reads and writes towards either +internal or external MDIO devices this switch might be connected to: internal +PHYs, external PHYs, or even external switches. + +Data structures +--------------- + +DSA data structures are defined in ``include/net/dsa.h`` as well as +``net/dsa/dsa_priv.h``: + +- ``dsa_chip_data``: platform data configuration for a given switch device, + this structure describes a switch device's parent device, its address, as + well as various properties of its ports: names/labels, and finally a routing + table indication (when cascading switches) + +- ``dsa_platform_data``: platform device configuration data which can reference + a collection of dsa_chip_data structure if multiples switches are cascaded, + the master network device this switch tree is attached to needs to be + referenced + +- ``dsa_switch_tree``: structure assigned to the master network device under + ``dsa_ptr``, this structure references a dsa_platform_data structure as well as + the tagging protocol supported by the switch tree, and which receive/transmit + function hooks should be invoked, information about the directly attached + switch is also provided: CPU port. Finally, a collection of dsa_switch are + referenced to address individual switches in the tree. + +- ``dsa_switch``: structure describing a switch device in the tree, referencing + a ``dsa_switch_tree`` as a backpointer, slave network devices, master network + device, and a reference to the backing``dsa_switch_ops`` + +- ``dsa_switch_ops``: structure referencing function pointers, see below for a + full description. + +Design limitations +================== + +Limits on the number of devices and ports +----------------------------------------- + +DSA currently limits the number of maximum switches within a tree to 4 +(``DSA_MAX_SWITCHES``), and the number of ports per switch to 12 (``DSA_MAX_PORTS``). +These limits could be extended to support larger configurations would this need +arise. + +Lack of CPU/DSA network devices +------------------------------- + +DSA does not currently create slave network devices for the CPU or DSA ports, as +described before. This might be an issue in the following cases: + +- inability to fetch switch CPU port statistics counters using ethtool, which + can make it harder to debug MDIO switch connected using xMII interfaces + +- inability to configure the CPU port link parameters based on the Ethernet + controller capabilities attached to it: http://patchwork.ozlabs.org/patch/509806/ + +- inability to configure specific VLAN IDs / trunking VLANs between switches + when using a cascaded setup + +Common pitfalls using DSA setups +-------------------------------- + +Once a master network device is configured to use DSA (dev->dsa_ptr becomes +non-NULL), and the switch behind it expects a tagging protocol, this network +interface can only exclusively be used as a conduit interface. Sending packets +directly through this interface (e.g.: opening a socket using this interface) +will not make us go through the switch tagging protocol transmit function, so +the Ethernet switch on the other end, expecting a tag will typically drop this +frame. + +Slave network devices check that the master network device is UP before allowing +you to administratively bring UP these slave network devices. A common +configuration mistake is forgetting to bring UP the master network device first. + +Interactions with other subsystems +================================== + +DSA currently leverages the following subsystems: + +- MDIO/PHY library: ``drivers/net/phy/phy.c``, ``mdio_bus.c`` +- Switchdev:``net/switchdev/*`` +- Device Tree for various of_* functions + +MDIO/PHY library +---------------- + +Slave network devices exposed by DSA may or may not be interfacing with PHY +devices (``struct phy_device`` as defined in ``include/linux/phy.h)``, but the DSA +subsystem deals with all possible combinations: + +- internal PHY devices, built into the Ethernet switch hardware +- external PHY devices, connected via an internal or external MDIO bus +- internal PHY devices, connected via an internal MDIO bus +- special, non-autonegotiated or non MDIO-managed PHY devices: SFPs, MoCA; a.k.a + fixed PHYs + +The PHY configuration is done by the ``dsa_slave_phy_setup()`` function and the +logic basically looks like this: + +- if Device Tree is used, the PHY device is looked up using the standard + "phy-handle" property, if found, this PHY device is created and registered + using ``of_phy_connect()`` + +- if Device Tree is used, and the PHY device is "fixed", that is, conforms to + the definition of a non-MDIO managed PHY as defined in + ``Documentation/devicetree/bindings/net/fixed-link.txt``, the PHY is registered + and connected transparently using the special fixed MDIO bus driver + +- finally, if the PHY is built into the switch, as is very common with + standalone switch packages, the PHY is probed using the slave MII bus created + by DSA + + +SWITCHDEV +--------- + +DSA directly utilizes SWITCHDEV when interfacing with the bridge layer, and +more specifically with its VLAN filtering portion when configuring VLANs on top +of per-port slave network devices. Since DSA primarily deals with +MDIO-connected switches, although not exclusively, SWITCHDEV's +prepare/abort/commit phases are often simplified into a prepare phase which +checks whether the operation is supported by the DSA switch driver, and a commit +phase which applies the changes. + +As of today, the only SWITCHDEV objects supported by DSA are the FDB and VLAN +objects. + +Device Tree +----------- + +DSA features a standardized binding which is documented in +``Documentation/devicetree/bindings/net/dsa/dsa.txt``. PHY/MDIO library helper +functions such as ``of_get_phy_mode()``, ``of_phy_connect()`` are also used to query +per-port PHY specific details: interface connection, MDIO bus location etc.. + +Driver development +================== + +DSA switch drivers need to implement a dsa_switch_ops structure which will +contain the various members described below. + +``register_switch_driver()`` registers this dsa_switch_ops in its internal list +of drivers to probe for. ``unregister_switch_driver()`` does the exact opposite. + +Unless requested differently by setting the priv_size member accordingly, DSA +does not allocate any driver private context space. + +Switch configuration +-------------------- + +- ``tag_protocol``: this is to indicate what kind of tagging protocol is supported, + should be a valid value from the ``dsa_tag_protocol`` enum + +- ``probe``: probe routine which will be invoked by the DSA platform device upon + registration to test for the presence/absence of a switch device. For MDIO + devices, it is recommended to issue a read towards internal registers using + the switch pseudo-PHY and return whether this is a supported device. For other + buses, return a non-NULL string + +- ``setup``: setup function for the switch, this function is responsible for setting + up the ``dsa_switch_ops`` private structure with all it needs: register maps, + interrupts, mutexes, locks etc.. This function is also expected to properly + configure the switch to separate all network interfaces from each other, that + is, they should be isolated by the switch hardware itself, typically by creating + a Port-based VLAN ID for each port and allowing only the CPU port and the + specific port to be in the forwarding vector. Ports that are unused by the + platform should be disabled. Past this function, the switch is expected to be + fully configured and ready to serve any kind of request. It is recommended + to issue a software reset of the switch during this setup function in order to + avoid relying on what a previous software agent such as a bootloader/firmware + may have previously configured. + +PHY devices and link management +------------------------------- + +- ``get_phy_flags``: Some switches are interfaced to various kinds of Ethernet PHYs, + if the PHY library PHY driver needs to know about information it cannot obtain + on its own (e.g.: coming from switch memory mapped registers), this function + should return a 32-bits bitmask of "flags", that is private between the switch + driver and the Ethernet PHY driver in ``drivers/net/phy/\*``. + +- ``phy_read``: Function invoked by the DSA slave MDIO bus when attempting to read + the switch port MDIO registers. If unavailable, return 0xffff for each read. + For builtin switch Ethernet PHYs, this function should allow reading the link + status, auto-negotiation results, link partner pages etc.. + +- ``phy_write``: Function invoked by the DSA slave MDIO bus when attempting to write + to the switch port MDIO registers. If unavailable return a negative error + code. + +- ``adjust_link``: Function invoked by the PHY library when a slave network device + is attached to a PHY device. This function is responsible for appropriately + configuring the switch port link parameters: speed, duplex, pause based on + what the ``phy_device`` is providing. + +- ``fixed_link_update``: Function invoked by the PHY library, and specifically by + the fixed PHY driver asking the switch driver for link parameters that could + not be auto-negotiated, or obtained by reading the PHY registers through MDIO. + This is particularly useful for specific kinds of hardware such as QSGMII, + MoCA or other kinds of non-MDIO managed PHYs where out of band link + information is obtained + +Ethtool operations +------------------ + +- ``get_strings``: ethtool function used to query the driver's strings, will + typically return statistics strings, private flags strings etc. + +- ``get_ethtool_stats``: ethtool function used to query per-port statistics and + return their values. DSA overlays slave network devices general statistics: + RX/TX counters from the network device, with switch driver specific statistics + per port + +- ``get_sset_count``: ethtool function used to query the number of statistics items + +- ``get_wol``: ethtool function used to obtain Wake-on-LAN settings per-port, this + function may, for certain implementations also query the master network device + Wake-on-LAN settings if this interface needs to participate in Wake-on-LAN + +- ``set_wol``: ethtool function used to configure Wake-on-LAN settings per-port, + direct counterpart to set_wol with similar restrictions + +- ``set_eee``: ethtool function which is used to configure a switch port EEE (Green + Ethernet) settings, can optionally invoke the PHY library to enable EEE at the + PHY level if relevant. This function should enable EEE at the switch port MAC + controller and data-processing logic + +- ``get_eee``: ethtool function which is used to query a switch port EEE settings, + this function should return the EEE state of the switch port MAC controller + and data-processing logic as well as query the PHY for its currently configured + EEE settings + +- ``get_eeprom_len``: ethtool function returning for a given switch the EEPROM + length/size in bytes + +- ``get_eeprom``: ethtool function returning for a given switch the EEPROM contents + +- ``set_eeprom``: ethtool function writing specified data to a given switch EEPROM + +- ``get_regs_len``: ethtool function returning the register length for a given + switch + +- ``get_regs``: ethtool function returning the Ethernet switch internal register + contents. This function might require user-land code in ethtool to + pretty-print register values and registers + +Power management +---------------- + +- ``suspend``: function invoked by the DSA platform device when the system goes to + suspend, should quiesce all Ethernet switch activities, but keep ports + participating in Wake-on-LAN active as well as additional wake-up logic if + supported + +- ``resume``: function invoked by the DSA platform device when the system resumes, + should resume all Ethernet switch activities and re-configure the switch to be + in a fully active state + +- ``port_enable``: function invoked by the DSA slave network device ndo_open + function when a port is administratively brought up, this function should be + fully enabling a given switch port. DSA takes care of marking the port with + ``BR_STATE_BLOCKING`` if the port is a bridge member, or ``BR_STATE_FORWARDING`` if it + was not, and propagating these changes down to the hardware + +- ``port_disable``: function invoked by the DSA slave network device ndo_close + function when a port is administratively brought down, this function should be + fully disabling a given switch port. DSA takes care of marking the port with + ``BR_STATE_DISABLED`` and propagating changes to the hardware if this port is + disabled while being a bridge member + +Bridge layer +------------ + +- ``port_bridge_join``: bridge layer function invoked when a given switch port is + added to a bridge, this function should be doing the necessary at the switch + level to permit the joining port from being added to the relevant logical + domain for it to ingress/egress traffic with other members of the bridge. + +- ``port_bridge_leave``: bridge layer function invoked when a given switch port is + removed from a bridge, this function should be doing the necessary at the + switch level to deny the leaving port from ingress/egress traffic from the + remaining bridge members. When the port leaves the bridge, it should be aged + out at the switch hardware for the switch to (re) learn MAC addresses behind + this port. + +- ``port_stp_state_set``: bridge layer function invoked when a given switch port STP + state is computed by the bridge layer and should be propagated to switch + hardware to forward/block/learn traffic. The switch driver is responsible for + computing a STP state change based on current and asked parameters and perform + the relevant ageing based on the intersection results + +Bridge VLAN filtering +--------------------- + +- ``port_vlan_filtering``: bridge layer function invoked when the bridge gets + configured for turning on or off VLAN filtering. If nothing specific needs to + be done at the hardware level, this callback does not need to be implemented. + When VLAN filtering is turned on, the hardware must be programmed with + rejecting 802.1Q frames which have VLAN IDs outside of the programmed allowed + VLAN ID map/rules. If there is no PVID programmed into the switch port, + untagged frames must be rejected as well. When turned off the switch must + accept any 802.1Q frames irrespective of their VLAN ID, and untagged frames are + allowed. + +- ``port_vlan_prepare``: bridge layer function invoked when the bridge prepares the + configuration of a VLAN on the given port. If the operation is not supported + by the hardware, this function should return ``-EOPNOTSUPP`` to inform the bridge + code to fallback to a software implementation. No hardware setup must be done + in this function. See port_vlan_add for this and details. + +- ``port_vlan_add``: bridge layer function invoked when a VLAN is configured + (tagged or untagged) for the given switch port + +- ``port_vlan_del``: bridge layer function invoked when a VLAN is removed from the + given switch port + +- ``port_vlan_dump``: bridge layer function invoked with a switchdev callback + function that the driver has to call for each VLAN the given port is a member + of. A switchdev object is used to carry the VID and bridge flags. + +- ``port_fdb_add``: bridge layer function invoked when the bridge wants to install a + Forwarding Database entry, the switch hardware should be programmed with the + specified address in the specified VLAN Id in the forwarding database + associated with this VLAN ID. If the operation is not supported, this + function should return ``-EOPNOTSUPP`` to inform the bridge code to fallback to + a software implementation. + +.. note:: VLAN ID 0 corresponds to the port private database, which, in the context + of DSA, would be its port-based VLAN, used by the associated bridge device. + +- ``port_fdb_del``: bridge layer function invoked when the bridge wants to remove a + Forwarding Database entry, the switch hardware should be programmed to delete + the specified MAC address from the specified VLAN ID if it was mapped into + this port forwarding database + +- ``port_fdb_dump``: bridge layer function invoked with a switchdev callback + function that the driver has to call for each MAC address known to be behind + the given port. A switchdev object is used to carry the VID and FDB info. + +- ``port_mdb_prepare``: bridge layer function invoked when the bridge prepares the + installation of a multicast database entry. If the operation is not supported, + this function should return ``-EOPNOTSUPP`` to inform the bridge code to fallback + to a software implementation. No hardware setup must be done in this function. + See ``port_fdb_add`` for this and details. + +- ``port_mdb_add``: bridge layer function invoked when the bridge wants to install + a multicast database entry, the switch hardware should be programmed with the + specified address in the specified VLAN ID in the forwarding database + associated with this VLAN ID. + +.. note:: VLAN ID 0 corresponds to the port private database, which, in the context + of DSA, would be its port-based VLAN, used by the associated bridge device. + +- ``port_mdb_del``: bridge layer function invoked when the bridge wants to remove a + multicast database entry, the switch hardware should be programmed to delete + the specified MAC address from the specified VLAN ID if it was mapped into + this port forwarding database. + +- ``port_mdb_dump``: bridge layer function invoked with a switchdev callback + function that the driver has to call for each MAC address known to be behind + the given port. A switchdev object is used to carry the VID and MDB info. + +TODO +==== + +Making SWITCHDEV and DSA converge towards an unified codebase +------------------------------------------------------------- + +SWITCHDEV properly takes care of abstracting the networking stack with offload +capable hardware, but does not enforce a strict switch device driver model. On +the other DSA enforces a fairly strict device driver model, and deals with most +of the switch specific. At some point we should envision a merger between these +two subsystems and get the best of both worlds. + +Other hanging fruits +-------------------- + +- making the number of ports fully dynamic and not dependent on ``DSA_MAX_PORTS`` +- allowing more than one CPU/management interface: + http://comments.gmane.org/gmane.linux.network/365657 +- porting more drivers from other vendors: + http://comments.gmane.org/gmane.linux.network/365510 diff --git a/Documentation/networking/dsa/index.rst b/Documentation/networking/dsa/index.rst new file mode 100644 index 000000000..ee631e2d6 --- /dev/null +++ b/Documentation/networking/dsa/index.rst @@ -0,0 +1,13 @@ +=============================== +Distributed Switch Architecture +=============================== + +.. toctree:: + :maxdepth: 1 + + dsa + b53 + bcm_sf2 + lan9303 + sja1105 + configuration diff --git a/Documentation/networking/dsa/lan9303.rst b/Documentation/networking/dsa/lan9303.rst new file mode 100644 index 000000000..e3c820db2 --- /dev/null +++ b/Documentation/networking/dsa/lan9303.rst @@ -0,0 +1,37 @@ +============================== +LAN9303 Ethernet switch driver +============================== + +The LAN9303 is a three port 10/100 Mbps ethernet switch with integrated phys for +the two external ethernet ports. The third port is an RMII/MII interface to a +host master network interface (e.g. fixed link). + + +Driver details +============== + +The driver is implemented as a DSA driver, see ``Documentation/networking/dsa/dsa.rst``. + +See ``Documentation/devicetree/bindings/net/dsa/lan9303.txt`` for device tree +binding. + +The LAN9303 can be managed both via MDIO and I2C, both supported by this driver. + +At startup the driver configures the device to provide two separate network +interfaces (which is the default state of a DSA device). Due to HW limitations, +no HW MAC learning takes place in this mode. + +When both user ports are joined to the same bridge, the normal HW MAC learning +is enabled. This means that unicast traffic is forwarded in HW. Broadcast and +multicast is flooded in HW. STP is also supported in this mode. The driver +support fdb/mdb operations as well, meaning IGMP snooping is supported. + +If one of the user ports leave the bridge, the ports goes back to the initial +separated operation. + + +Driver limitations +================== + + - Support for VLAN filtering is not implemented + - The HW does not support VLAN-specific fdb entries diff --git a/Documentation/networking/dsa/sja1105.rst b/Documentation/networking/dsa/sja1105.rst new file mode 100644 index 000000000..7395a33ba --- /dev/null +++ b/Documentation/networking/dsa/sja1105.rst @@ -0,0 +1,581 @@ +========================= +NXP SJA1105 switch driver +========================= + +Overview +======== + +The NXP SJA1105 is a family of 6 devices: + +- SJA1105E: First generation, no TTEthernet +- SJA1105T: First generation, TTEthernet +- SJA1105P: Second generation, no TTEthernet, no SGMII +- SJA1105Q: Second generation, TTEthernet, no SGMII +- SJA1105R: Second generation, no TTEthernet, SGMII +- SJA1105S: Second generation, TTEthernet, SGMII + +These are SPI-managed automotive switches, with all ports being gigabit +capable, and supporting MII/RMII/RGMII and optionally SGMII on one port. + +Being automotive parts, their configuration interface is geared towards +set-and-forget use, with minimal dynamic interaction at runtime. They +require a static configuration to be composed by software and packed +with CRC and table headers, and sent over SPI. + +The static configuration is composed of several configuration tables. Each +table takes a number of entries. Some configuration tables can be (partially) +reconfigured at runtime, some not. Some tables are mandatory, some not: + +============================= ================== ============================= +Table Mandatory Reconfigurable +============================= ================== ============================= +Schedule no no +Schedule entry points if Scheduling no +VL Lookup no no +VL Policing if VL Lookup no +VL Forwarding if VL Lookup no +L2 Lookup no no +L2 Policing yes no +VLAN Lookup yes yes +L2 Forwarding yes partially (fully on P/Q/R/S) +MAC Config yes partially (fully on P/Q/R/S) +Schedule Params if Scheduling no +Schedule Entry Points Params if Scheduling no +VL Forwarding Params if VL Forwarding no +L2 Lookup Params no partially (fully on P/Q/R/S) +L2 Forwarding Params yes no +Clock Sync Params no no +AVB Params no no +General Params yes partially +Retagging no yes +xMII Params yes no +SGMII no yes +============================= ================== ============================= + + +Also the configuration is write-only (software cannot read it back from the +switch except for very few exceptions). + +The driver creates a static configuration at probe time, and keeps it at +all times in memory, as a shadow for the hardware state. When required to +change a hardware setting, the static configuration is also updated. +If that changed setting can be transmitted to the switch through the dynamic +reconfiguration interface, it is; otherwise the switch is reset and +reprogrammed with the updated static configuration. + +Traffic support +=============== + +The switches do not have hardware support for DSA tags, except for "slow +protocols" for switch control as STP and PTP. For these, the switches have two +programmable filters for link-local destination MACs. +These are used to trap BPDUs and PTP traffic to the master netdevice, and are +further used to support STP and 1588 ordinary clock/boundary clock +functionality. For frames trapped to the CPU, source port and switch ID +information is encoded by the hardware into the frames. + +But by leveraging ``CONFIG_NET_DSA_TAG_8021Q`` (a software-defined DSA tagging +format based on VLANs), general-purpose traffic termination through the network +stack can be supported under certain circumstances. + +Depending on VLAN awareness state, the following operating modes are possible +with the switch: + +- Mode 1 (VLAN-unaware): a port is in this mode when it is used as a standalone + net device, or when it is enslaved to a bridge with ``vlan_filtering=0``. +- Mode 2 (fully VLAN-aware): a port is in this mode when it is enslaved to a + bridge with ``vlan_filtering=1``. Access to the entire VLAN range is given to + the user through ``bridge vlan`` commands, but general-purpose (anything + other than STP, PTP etc) traffic termination is not possible through the + switch net devices. The other packets can be still by user space processed + through the DSA master interface (similar to ``DSA_TAG_PROTO_NONE``). +- Mode 3 (best-effort VLAN-aware): a port is in this mode when enslaved to a + bridge with ``vlan_filtering=1``, and the devlink property of its parent + switch named ``best_effort_vlan_filtering`` is set to ``true``. When + configured like this, the range of usable VIDs is reduced (0 to 1023 and 3072 + to 4094), so is the number of usable VIDs (maximum of 7 non-pvid VLANs per + port*), and shared VLAN learning is performed (FDB lookup is done only by + DMAC, not also by VID). + +To summarize, in each mode, the following types of traffic are supported over +the switch net devices: + ++-------------+-----------+--------------+------------+ +| | Mode 1 | Mode 2 | Mode 3 | ++=============+===========+==============+============+ +| Regular | Yes | No | Yes | +| traffic | | (use master) | | ++-------------+-----------+--------------+------------+ +| Management | Yes | Yes | Yes | +| traffic | | | | +| (BPDU, PTP) | | | | ++-------------+-----------+--------------+------------+ + +To configure the switch to operate in Mode 3, the following steps can be +followed:: + + ip link add dev br0 type bridge + # swp2 operates in Mode 1 now + ip link set dev swp2 master br0 + # swp2 temporarily moves to Mode 2 + ip link set dev br0 type bridge vlan_filtering 1 + [ 61.204770] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering + [ 61.239944] sja1105 spi0.1: Disabled switch tagging + # swp3 now operates in Mode 3 + devlink dev param set spi/spi0.1 name best_effort_vlan_filtering value true cmode runtime + [ 64.682927] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering + [ 64.711925] sja1105 spi0.1: Enabled switch tagging + # Cannot use VLANs in range 1024-3071 while in Mode 3. + bridge vlan add dev swp2 vid 1025 untagged pvid + RTNETLINK answers: Operation not permitted + bridge vlan add dev swp2 vid 100 + bridge vlan add dev swp2 vid 101 untagged + bridge vlan + port vlan ids + swp5 1 PVID Egress Untagged + + swp2 1 PVID Egress Untagged + 100 + 101 Egress Untagged + + swp3 1 PVID Egress Untagged + + swp4 1 PVID Egress Untagged + + br0 1 PVID Egress Untagged + bridge vlan add dev swp2 vid 102 + bridge vlan add dev swp2 vid 103 + bridge vlan add dev swp2 vid 104 + bridge vlan add dev swp2 vid 105 + bridge vlan add dev swp2 vid 106 + bridge vlan add dev swp2 vid 107 + # Cannot use mode than 7 VLANs per port while in Mode 3. + [ 3885.216832] sja1105 spi0.1: No more free subvlans + +\* "maximum of 7 non-pvid VLANs per port": Decoding VLAN-tagged packets on the +CPU in mode 3 is possible through VLAN retagging of packets that go from the +switch to the CPU. In cross-chip topologies, the port that goes to the CPU +might also go to other switches. In that case, those other switches will see +only a retagged packet (which only has meaning for the CPU). So if they are +interested in this VLAN, they need to apply retagging in the reverse direction, +to recover the original value from it. This consumes extra hardware resources +for this switch. There is a maximum of 32 entries in the Retagging Table of +each switch device. + +As an example, consider this cross-chip topology:: + + +-------------------------------------------------+ + | Host SoC | + | +-------------------------+ | + | | DSA master for embedded | | + | | switch (non-sja1105) | | + | +--------+-------------------------+--------+ | + | | embedded L2 switch | | + | | | | + | | +--------------+ +--------------+ | | + | | |DSA master for| |DSA master for| | | + | | | SJA1105 1 | | SJA1105 2 | | | + +--+---+--------------+-----+--------------+---+--+ + + +-----------------------+ +-----------------------+ + | SJA1105 switch 1 | | SJA1105 switch 2 | + +-----+-----+-----+-----+ +-----+-----+-----+-----+ + |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| + +-----+-----+-----+-----+ +-----+-----+-----+-----+ + +To reach the CPU, SJA1105 switch 1 (spi/spi2.1) uses the same port as is uses +to reach SJA1105 switch 2 (spi/spi2.2), which would be port 4 (not drawn). +Similarly for SJA1105 switch 2. + +Also consider the following commands, that add VLAN 100 to every sja1105 user +port:: + + devlink dev param set spi/spi2.1 name best_effort_vlan_filtering value true cmode runtime + devlink dev param set spi/spi2.2 name best_effort_vlan_filtering value true cmode runtime + ip link add dev br0 type bridge + for port in sw1p0 sw1p1 sw1p2 sw1p3 \ + sw2p0 sw2p1 sw2p2 sw2p3; do + ip link set dev $port master br0 + done + ip link set dev br0 type bridge vlan_filtering 1 + for port in sw1p0 sw1p1 sw1p2 sw1p3 \ + sw2p0 sw2p1 sw2p2; do + bridge vlan add dev $port vid 100 + done + ip link add link br0 name br0.100 type vlan id 100 && ip link set dev br0.100 up + ip addr add 192.168.100.3/24 dev br0.100 + bridge vlan add dev br0 vid 100 self + + bridge vlan + port vlan ids + sw1p0 1 PVID Egress Untagged + 100 + + sw1p1 1 PVID Egress Untagged + 100 + + sw1p2 1 PVID Egress Untagged + 100 + + sw1p3 1 PVID Egress Untagged + 100 + + sw2p0 1 PVID Egress Untagged + 100 + + sw2p1 1 PVID Egress Untagged + 100 + + sw2p2 1 PVID Egress Untagged + 100 + + sw2p3 1 PVID Egress Untagged + + br0 1 PVID Egress Untagged + 100 + +SJA1105 switch 1 consumes 1 retagging entry for each VLAN on each user port +towards the CPU. It also consumes 1 retagging entry for each non-pvid VLAN that +it is also interested in, which is configured on any port of any neighbor +switch. + +In this case, SJA1105 switch 1 consumes a total of 11 retagging entries, as +follows: + +- 8 retagging entries for VLANs 1 and 100 installed on its user ports + (``sw1p0`` - ``sw1p3``) +- 3 retagging entries for VLAN 100 installed on the user ports of SJA1105 + switch 2 (``sw2p0`` - ``sw2p2``), because it also has ports that are + interested in it. The VLAN 1 is a pvid on SJA1105 switch 2 and does not need + reverse retagging. + +SJA1105 switch 2 also consumes 11 retagging entries, but organized as follows: + +- 7 retagging entries for the bridge VLANs on its user ports (``sw2p0`` - + ``sw2p3``). +- 4 retagging entries for VLAN 100 installed on the user ports of SJA1105 + switch 1 (``sw1p0`` - ``sw1p3``). + +Switching features +================== + +The driver supports the configuration of L2 forwarding rules in hardware for +port bridging. The forwarding, broadcast and flooding domain between ports can +be restricted through two methods: either at the L2 forwarding level (isolate +one bridge's ports from another's) or at the VLAN port membership level +(isolate ports within the same bridge). The final forwarding decision taken by +the hardware is a logical AND of these two sets of rules. + +The hardware tags all traffic internally with a port-based VLAN (pvid), or it +decodes the VLAN information from the 802.1Q tag. Advanced VLAN classification +is not possible. Once attributed a VLAN tag, frames are checked against the +port's membership rules and dropped at ingress if they don't match any VLAN. +This behavior is available when switch ports are enslaved to a bridge with +``vlan_filtering 1``. + +Normally the hardware is not configurable with respect to VLAN awareness, but +by changing what TPID the switch searches 802.1Q tags for, the semantics of a +bridge with ``vlan_filtering 0`` can be kept (accept all traffic, tagged or +untagged), and therefore this mode is also supported. + +Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but +all bridges should have the same level of VLAN awareness (either both have +``vlan_filtering`` 0, or both 1). Also an inevitable limitation of the fact +that VLAN awareness is global at the switch level is that once a bridge with +``vlan_filtering`` enslaves at least one switch port, the other un-bridged +ports are no longer available for standalone traffic termination. + +Topology and loop detection through STP is supported. + +L2 FDB manipulation (add/delete/dump) is currently possible for the first +generation devices. Aging time of FDB entries, as well as enabling fully static +management (no address learning and no flooding of unknown traffic) is not yet +configurable in the driver. + +A special comment about bridging with other netdevices (illustrated with an +example): + +A board has eth0, eth1, swp0@eth1, swp1@eth1, swp2@eth1, swp3@eth1. +The switch ports (swp0-3) are under br0. +It is desired that eth0 is turned into another switched port that communicates +with swp0-3. + +If br0 has vlan_filtering 0, then eth0 can simply be added to br0 with the +intended results. +If br0 has vlan_filtering 1, then a new br1 interface needs to be created that +enslaves eth0 and eth1 (the DSA master of the switch ports). This is because in +this mode, the switch ports beneath br0 are not capable of regular traffic, and +are only used as a conduit for switchdev operations. + +Offloads +======== + +Time-aware scheduling +--------------------- + +The switch supports a variation of the enhancements for scheduled traffic +specified in IEEE 802.1Q-2018 (formerly 802.1Qbv). This means it can be used to +ensure deterministic latency for priority traffic that is sent in-band with its +gate-open event in the network schedule. + +This capability can be managed through the tc-taprio offload ('flags 2'). The +difference compared to the software implementation of taprio is that the latter +would only be able to shape traffic originated from the CPU, but not +autonomously forwarded flows. + +The device has 8 traffic classes, and maps incoming frames to one of them based +on the VLAN PCP bits (if no VLAN is present, the port-based default is used). +As described in the previous sections, depending on the value of +``vlan_filtering``, the EtherType recognized by the switch as being VLAN can +either be the typical 0x8100 or a custom value used internally by the driver +for tagging. Therefore, the switch ignores the VLAN PCP if used in standalone +or bridge mode with ``vlan_filtering=0``, as it will not recognize the 0x8100 +EtherType. In these modes, injecting into a particular TX queue can only be +done by the DSA net devices, which populate the PCP field of the tagging header +on egress. Using ``vlan_filtering=1``, the behavior is the other way around: +offloaded flows can be steered to TX queues based on the VLAN PCP, but the DSA +net devices are no longer able to do that. To inject frames into a hardware TX +queue with VLAN awareness active, it is necessary to create a VLAN +sub-interface on the DSA master port, and send normal (0x8100) VLAN-tagged +towards the switch, with the VLAN PCP bits set appropriately. + +Management traffic (having DMAC 01-80-C2-xx-xx-xx or 01-19-1B-xx-xx-xx) is the +notable exception: the switch always treats it with a fixed priority and +disregards any VLAN PCP bits even if present. The traffic class for management +traffic has a value of 7 (highest priority) at the moment, which is not +configurable in the driver. + +Below is an example of configuring a 500 us cyclic schedule on egress port +``swp5``. The traffic class gate for management traffic (7) is open for 100 us, +and the gates for all other traffic classes are open for 400 us:: + + #!/bin/bash + + set -e -u -o pipefail + + NSEC_PER_SEC="1000000000" + + gatemask() { + local tc_list="$1" + local mask=0 + + for tc in ${tc_list}; do + mask=$((${mask} | (1 << ${tc}))) + done + + printf "%02x" ${mask} + } + + if ! systemctl is-active --quiet ptp4l; then + echo "Please start the ptp4l service" + exit + fi + + now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }') + # Phase-align the base time to the start of the next second. + sec=$(echo "${now}" | gawk -F. '{ print $1; }') + base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))" + + tc qdisc add dev swp5 parent root handle 100 taprio \ + num_tc 8 \ + map 0 1 2 3 5 6 7 \ + queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ + base-time ${base_time} \ + sched-entry S $(gatemask 7) 100000 \ + sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \ + flags 2 + +It is possible to apply the tc-taprio offload on multiple egress ports. There +are hardware restrictions related to the fact that no gate event may trigger +simultaneously on two ports. The driver checks the consistency of the schedules +against this restriction and errors out when appropriate. Schedule analysis is +needed to avoid this, which is outside the scope of the document. + +Routing actions (redirect, trap, drop) +-------------------------------------- + +The switch is able to offload flow-based redirection of packets to a set of +destination ports specified by the user. Internally, this is implemented by +making use of Virtual Links, a TTEthernet concept. + +The driver supports 2 types of keys for Virtual Links: + +- VLAN-aware virtual links: these match on destination MAC address, VLAN ID and + VLAN PCP. +- VLAN-unaware virtual links: these match on destination MAC address only. + +The VLAN awareness state of the bridge (vlan_filtering) cannot be changed while +there are virtual link rules installed. + +Composing multiple actions inside the same rule is supported. When only routing +actions are requested, the driver creates a "non-critical" virtual link. When +the action list also contains tc-gate (more details below), the virtual link +becomes "time-critical" (draws frame buffers from a reserved memory partition, +etc). + +The 3 routing actions that are supported are "trap", "drop" and "redirect". + +Example 1: send frames received on swp2 with a DA of 42:be:24:9b:76:20 to the +CPU and to swp3. This type of key (DA only) when the port's VLAN awareness +state is off:: + + tc qdisc add dev swp2 clsact + tc filter add dev swp2 ingress flower skip_sw dst_mac 42:be:24:9b:76:20 \ + action mirred egress redirect dev swp3 \ + action trap + +Example 2: drop frames received on swp2 with a DA of 42:be:24:9b:76:20, a VID +of 100 and a PCP of 0:: + + tc filter add dev swp2 ingress protocol 802.1Q flower skip_sw \ + dst_mac 42:be:24:9b:76:20 vlan_id 100 vlan_prio 0 action drop + +Time-based ingress policing +--------------------------- + +The TTEthernet hardware abilities of the switch can be constrained to act +similarly to the Per-Stream Filtering and Policing (PSFP) clause specified in +IEEE 802.1Q-2018 (formerly 802.1Qci). This means it can be used to perform +tight timing-based admission control for up to 1024 flows (identified by a +tuple composed of destination MAC address, VLAN ID and VLAN PCP). Packets which +are received outside their expected reception window are dropped. + +This capability can be managed through the offload of the tc-gate action. As +routing actions are intrinsic to virtual links in TTEthernet (which performs +explicit routing of time-critical traffic and does not leave that in the hands +of the FDB, flooding etc), the tc-gate action may never appear alone when +asking sja1105 to offload it. One (or more) redirect or trap actions must also +follow along. + +Example: create a tc-taprio schedule that is phase-aligned with a tc-gate +schedule (the clocks must be synchronized by a 1588 application stack, which is +outside the scope of this document). No packet delivered by the sender will be +dropped. Note that the reception window is larger than the transmission window +(and much more so, in this example) to compensate for the packet propagation +delay of the link (which can be determined by the 1588 application stack). + +Receiver (sja1105):: + + tc qdisc add dev swp2 clsact + now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \ + sec=$(echo $now | awk -F. '{print $1}') && \ + base_time="$(((sec + 2) * 1000000000))" && \ + echo "base time ${base_time}" + tc filter add dev swp2 ingress flower skip_sw \ + dst_mac 42:be:24:9b:76:20 \ + action gate base-time ${base_time} \ + sched-entry OPEN 60000 -1 -1 \ + sched-entry CLOSE 40000 -1 -1 \ + action trap + +Sender:: + + now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \ + sec=$(echo $now | awk -F. '{print $1}') && \ + base_time="$(((sec + 2) * 1000000000))" && \ + echo "base time ${base_time}" + tc qdisc add dev eno0 parent root taprio \ + num_tc 8 \ + map 0 1 2 3 4 5 6 7 \ + queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ + base-time ${base_time} \ + sched-entry S 01 50000 \ + sched-entry S 00 50000 \ + flags 2 + +The engine used to schedule the ingress gate operations is the same that the +one used for the tc-taprio offload. Therefore, the restrictions regarding the +fact that no two gate actions (either tc-gate or tc-taprio gates) may fire at +the same time (during the same 200 ns slot) still apply. + +To come in handy, it is possible to share time-triggered virtual links across +more than 1 ingress port, via flow blocks. In this case, the restriction of +firing at the same time does not apply because there is a single schedule in +the system, that of the shared virtual link:: + + tc qdisc add dev swp2 ingress_block 1 clsact + tc qdisc add dev swp3 ingress_block 1 clsact + tc filter add block 1 flower skip_sw dst_mac 42:be:24:9b:76:20 \ + action gate index 2 \ + base-time 0 \ + sched-entry OPEN 50000000 -1 -1 \ + sched-entry CLOSE 50000000 -1 -1 \ + action trap + +Hardware statistics for each flow are also available ("pkts" counts the number +of dropped frames, which is a sum of frames dropped due to timing violations, +lack of destination ports and MTU enforcement checks). Byte-level counters are +not available. + +Device Tree bindings and board design +===================================== + +This section references ``Documentation/devicetree/bindings/net/dsa/sja1105.txt`` +and aims to showcase some potential switch caveats. + +RMII PHY role and out-of-band signaling +--------------------------------------- + +In the RMII spec, the 50 MHz clock signals are either driven by the MAC or by +an external oscillator (but not by the PHY). +But the spec is rather loose and devices go outside it in several ways. +Some PHYs go against the spec and may provide an output pin where they source +the 50 MHz clock themselves, in an attempt to be helpful. +On the other hand, the SJA1105 is only binary configurable - when in the RMII +MAC role it will also attempt to drive the clock signal. To prevent this from +happening it must be put in RMII PHY role. +But doing so has some unintended consequences. +In the RMII spec, the PHY can transmit extra out-of-band signals via RXD[1:0]. +These are practically some extra code words (/J/ and /K/) sent prior to the +preamble of each frame. The MAC does not have this out-of-band signaling +mechanism defined by the RMII spec. +So when the SJA1105 port is put in PHY role to avoid having 2 drivers on the +clock signal, inevitably an RMII PHY-to-PHY connection is created. The SJA1105 +emulates a PHY interface fully and generates the /J/ and /K/ symbols prior to +frame preambles, which the real PHY is not expected to understand. So the PHY +simply encodes the extra symbols received from the SJA1105-as-PHY onto the +100Base-Tx wire. +On the other side of the wire, some link partners might discard these extra +symbols, while others might choke on them and discard the entire Ethernet +frames that follow along. This looks like packet loss with some link partners +but not with others. +The take-away is that in RMII mode, the SJA1105 must be let to drive the +reference clock if connected to a PHY. + +RGMII fixed-link and internal delays +------------------------------------ + +As mentioned in the bindings document, the second generation of devices has +tunable delay lines as part of the MAC, which can be used to establish the +correct RGMII timing budget. +When powered up, these can shift the Rx and Tx clocks with a phase difference +between 73.8 and 101.7 degrees. +The catch is that the delay lines need to lock onto a clock signal with a +stable frequency. This means that there must be at least 2 microseconds of +silence between the clock at the old vs at the new frequency. Otherwise the +lock is lost and the delay lines must be reset (powered down and back up). +In RGMII the clock frequency changes with link speed (125 MHz at 1000 Mbps, 25 +MHz at 100 Mbps and 2.5 MHz at 10 Mbps), and link speed might change during the +AN process. +In the situation where the switch port is connected through an RGMII fixed-link +to a link partner whose link state life cycle is outside the control of Linux +(such as a different SoC), then the delay lines would remain unlocked (and +inactive) until there is manual intervention (ifdown/ifup on the switch port). +The take-away is that in RGMII mode, the switch's internal delays are only +reliable if the link partner never changes link speeds, or if it does, it does +so in a way that is coordinated with the switch port (practically, both ends of +the fixed-link are under control of the same Linux system). +As to why would a fixed-link interface ever change link speeds: there are +Ethernet controllers out there which come out of reset in 100 Mbps mode, and +their driver inevitably needs to change the speed and clock frequency if it's +required to work at gigabit. + +MDIO bus and PHY management +--------------------------- + +The SJA1105 does not have an MDIO bus and does not perform in-band AN either. +Therefore there is no link state notification coming from the switch device. +A board would need to hook up the PHYs connected to the switch to any other +MDIO bus available to Linux within the system (e.g. to the DSA master's MDIO +bus). Link state management then works by the driver manually keeping in sync +(over SPI commands) the MAC link speed with the settings negotiated by the PHY. |