summaryrefslogtreecommitdiffstats
path: root/Documentation/networking/bridge.rst
blob: ef8b73e157b268c242cd6ab75e42b1208ac17694 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
.. SPDX-License-Identifier: GPL-2.0

=================
Ethernet Bridging
=================

Introduction
============

The IEEE 802.1Q-2022 (Bridges and Bridged Networks) standard defines the
operation of bridges in computer networks. A bridge, in the context of this
standard, is a device that connects two or more network segments and operates
at the data link layer (Layer 2) of the OSI (Open Systems Interconnection)
model. The purpose of a bridge is to filter and forward frames between
different segments based on the destination MAC (Media Access Control) address.

Bridge kAPI
===========

Here are some core structures of bridge code. Note that the kAPI is *unstable*,
and can be changed at any time.

.. kernel-doc:: net/bridge/br_private.h
   :identifiers: net_bridge_vlan

Bridge uAPI
===========

Modern Linux bridge uAPI is accessed via Netlink interface. You can find
below files where the bridge and bridge port netlink attributes are defined.

Bridge netlink attributes
-------------------------

.. kernel-doc:: include/uapi/linux/if_link.h
   :doc: Bridge enum definition

Bridge port netlink attributes
------------------------------

.. kernel-doc:: include/uapi/linux/if_link.h
   :doc: Bridge port enum definition

Bridge sysfs
------------

The sysfs interface is deprecated and should not be extended if new
options are added.

STP
===

The STP (Spanning Tree Protocol) implementation in the Linux bridge driver
is a critical feature that helps prevent loops and broadcast storms in
Ethernet networks by identifying and disabling redundant links. In a Linux
bridge context, STP is crucial for network stability and availability.

STP is a Layer 2 protocol that operates at the Data Link Layer of the OSI
model. It was originally developed as IEEE 802.1D and has since evolved into
multiple versions, including Rapid Spanning Tree Protocol (RSTP) and
`Multiple Spanning Tree Protocol (MSTP)
<https://lore.kernel.org/netdev/20220316150857.2442916-1-tobias@waldekranz.com/>`_.

The 802.1D-2004 removed the original Spanning Tree Protocol, instead
incorporating the Rapid Spanning Tree Protocol (RSTP). By 2014, all the
functionality defined by IEEE 802.1D has been incorporated into either
IEEE 802.1Q (Bridges and Bridged Networks) or IEEE 802.1AC (MAC Service
Definition). 802.1D has been officially withdrawn in 2022.

Bridge Ports and STP States
---------------------------

In the context of STP, bridge ports can be in one of the following states:
  * Blocking: The port is disabled for data traffic and only listens for
    BPDUs (Bridge Protocol Data Units) from other devices to determine the
    network topology.
  * Listening: The port begins to participate in the STP process and listens
    for BPDUs.
  * Learning: The port continues to listen for BPDUs and begins to learn MAC
    addresses from incoming frames but does not forward data frames.
  * Forwarding: The port is fully operational and forwards both BPDUs and
    data frames.
  * Disabled: The port is administratively disabled and does not participate
    in the STP process. The data frames forwarding are also disabled.

Root Bridge and Convergence
---------------------------

In the context of networking and Ethernet bridging in Linux, the root bridge
is a designated switch in a bridged network that serves as a reference point
for the spanning tree algorithm to create a loop-free topology.

Here's how the STP works and root bridge is chosen:
  1. Bridge Priority: Each bridge running a spanning tree protocol, has a
     configurable Bridge Priority value. The lower the value, the higher the
     priority. By default, the Bridge Priority is set to a standard value
     (e.g., 32768).
  2. Bridge ID: The Bridge ID is composed of two components: Bridge Priority
     and the MAC address of the bridge. It uniquely identifies each bridge
     in the network. The Bridge ID is used to compare the priorities of
     different bridges.
  3. Bridge Election: When the network starts, all bridges initially assume
     that they are the root bridge. They start advertising Bridge Protocol
     Data Units (BPDU) to their neighbors, containing their Bridge ID and
     other information.
  4. BPDU Comparison: Bridges exchange BPDUs to determine the root bridge.
     Each bridge examines the received BPDUs, including the Bridge Priority
     and Bridge ID, to determine if it should adjust its own priorities.
     The bridge with the lowest Bridge ID will become the root bridge.
  5. Root Bridge Announcement: Once the root bridge is determined, it sends
     BPDUs with information about the root bridge to all other bridges in the
     network. This information is used by other bridges to calculate the
     shortest path to the root bridge and, in doing so, create a loop-free
     topology.
  6. Forwarding Ports: After the root bridge is selected and the spanning tree
     topology is established, each bridge determines which of its ports should
     be in the forwarding state (used for data traffic) and which should be in
     the blocking state (used to prevent loops). The root bridge's ports are
     all in the forwarding state. while other bridges have some ports in the
     blocking state to avoid loops.
  7. Root Ports: After the root bridge is selected and the spanning tree
     topology is established, each non-root bridge processes incoming
     BPDUs and determines which of its ports provides the shortest path to the
     root bridge based on the information in the received BPDUs. This port is
     designated as the root port. And it is in the Forwarding state, allowing
     it to actively forward network traffic.
  8. Designated ports: A designated port is the port through which the non-root
     bridge will forward traffic towards the designated segment. Designated ports
     are placed in the Forwarding state. All other ports on the non-root
     bridge that are not designated for specific segments are placed in the
     Blocking state to prevent network loops.

STP ensures network convergence by calculating the shortest path and disabling
redundant links. When network topology changes occur (e.g., a link failure),
STP recalculates the network topology to restore connectivity while avoiding loops.

Proper configuration of STP parameters, such as the bridge priority, can
influence network performance, path selection and which bridge becomes the
Root Bridge.

User space STP helper
---------------------

The user space STP helper *bridge-stp* is a program to control whether to use
user mode spanning tree. The ``/sbin/bridge-stp <bridge> <start|stop>`` is
called by the kernel when STP is enabled/disabled on a bridge
(via ``brctl stp <bridge> <on|off>`` or ``ip link set <bridge> type bridge
stp_state <0|1>``).  The kernel enables user_stp mode if that command returns
0, or enables kernel_stp mode if that command returns any other value.

VLAN
====

A LAN (Local Area Network) is a network that covers a small geographic area,
typically within a single building or a campus. LANs are used to connect
computers, servers, printers, and other networked devices within a localized
area. LANs can be wired (using Ethernet cables) or wireless (using Wi-Fi).

A VLAN (Virtual Local Area Network) is a logical segmentation of a physical
network into multiple isolated broadcast domains. VLANs are used to divide
a single physical LAN into multiple virtual LANs, allowing different groups of
devices to communicate as if they were on separate physical networks.

Typically there are two VLAN implementations, IEEE 802.1Q and IEEE 802.1ad
(also known as QinQ). IEEE 802.1Q is a standard for VLAN tagging in Ethernet
networks. It allows network administrators to create logical VLANs on a
physical network and tag Ethernet frames with VLAN information, which is
called *VLAN-tagged frames*. IEEE 802.1ad, commonly known as QinQ or Double
VLAN, is an extension of the IEEE 802.1Q standard. QinQ allows for the
stacking of multiple VLAN tags within a single Ethernet frame. The Linux
bridge supports both the IEEE 802.1Q and `802.1AD
<https://lore.kernel.org/netdev/1402401565-15423-1-git-send-email-makita.toshiaki@lab.ntt.co.jp/>`_
protocol for VLAN tagging.

`VLAN filtering <https://lore.kernel.org/netdev/1360792820-14116-1-git-send-email-vyasevic@redhat.com/>`_
on a bridge is disabled by default. After enabling VLAN filtering on a bridge,
it will start forwarding frames to appropriate destinations based on their
destination MAC address and VLAN tag (both must match).

Multicast
=========

The Linux bridge driver has multicast support allowing it to process Internet
Group Management Protocol (IGMP) or Multicast Listener Discovery (MLD)
messages, and to efficiently forward multicast data packets. The bridge
driver supports IGMPv2/IGMPv3 and MLDv1/MLDv2.

Multicast snooping
------------------

Multicast snooping is a networking technology that allows network switches
to intelligently manage multicast traffic within a local area network (LAN).

The switch maintains a multicast group table, which records the association
between multicast group addresses and the ports where hosts have joined these
groups. The group table is dynamically updated based on the IGMP/MLD messages
received. With the multicast group information gathered through snooping, the
switch optimizes the forwarding of multicast traffic. Instead of blindly
broadcasting the multicast traffic to all ports, it sends the multicast
traffic based on the destination MAC address only to ports which have
subscribed the respective destination multicast group.

When created, the Linux bridge devices have multicast snooping enabled by
default. It maintains a Multicast forwarding database (MDB) which keeps track
of port and group relationships.

IGMPv3/MLDv2 EHT support
------------------------

The Linux bridge supports IGMPv3/MLDv2 EHT (Explicit Host Tracking), which
was added by `474ddb37fa3a ("net: bridge: multicast: add EHT allow/block handling")
<https://lore.kernel.org/netdev/20210120145203.1109140-1-razor@blackwall.org/>`_

The explicit host tracking enables the device to keep track of each
individual host that is joined to a particular group or channel. The main
benefit of the explicit host tracking in IGMP is to allow minimal leave
latencies when a host leaves a multicast group or channel.

The length of time between a host wanting to leave and a device stopping
traffic forwarding is called the IGMP leave latency. A device configured
with IGMPv3 or MLDv2 and explicit tracking can immediately stop forwarding
traffic if the last host to request to receive traffic from the device
indicates that it no longer wants to receive traffic. The leave latency
is thus bound only by the packet transmission latencies in the multiaccess
network and the processing time in the device.

Other multicast features
------------------------

The Linux bridge also supports `per-VLAN multicast snooping
<https://lore.kernel.org/netdev/20210719170637.435541-1-razor@blackwall.org/>`_,
which is disabled by default but can be enabled. And `Multicast Router Discovery
<https://lore.kernel.org/netdev/20190121062628.2710-1-linus.luessing@c0d3.blue/>`_,
which help identify the location of multicast routers.

Switchdev
=========

Linux Bridge Switchdev is a feature in the Linux kernel that extends the
capabilities of the traditional Linux bridge to work more efficiently with
hardware switches that support switchdev. With Linux Bridge Switchdev, certain
networking functions like forwarding, filtering, and learning of Ethernet
frames can be offloaded to a hardware switch. This offloading reduces the
burden on the Linux kernel and CPU, leading to improved network performance
and lower latency.

To use Linux Bridge Switchdev, you need hardware switches that support the
switchdev interface. This means that the switch hardware needs to have the
necessary drivers and functionality to work in conjunction with the Linux
kernel.

Please see the :ref:`switchdev` document for more details.

Netfilter
=========

The bridge netfilter module is a legacy feature that allows to filter bridged
packets with iptables and ip6tables. Its use is discouraged. Users should
consider using nftables for packet filtering.

The older ebtables tool is more feature-limited compared to nftables, but
just like nftables it doesn't need this module either to function.

The br_netfilter module intercepts packets entering the bridge, performs
minimal sanity tests on ipv4 and ipv6 packets and then pretends that
these packets are being routed, not bridged. br_netfilter then calls
the ip and ipv6 netfilter hooks from the bridge layer, i.e. ip(6)tables
rulesets will also see these packets.

br_netfilter is also the reason for the iptables *physdev* match:
This match is the only way to reliably tell routed and bridged packets
apart in an iptables ruleset.

Note that ebtables and nftables will work fine without the br_netfilter module.
iptables/ip6tables/arptables do not work for bridged traffic because they
plug in the routing stack. nftables rules in ip/ip6/inet/arp families won't
see traffic that is forwarded by a bridge either, but that's very much how it
should be.

Historically the feature set of ebtables was very limited (it still is),
this module was added to pretend packets are routed and invoke the ipv4/ipv6
netfilter hooks from the bridge so users had access to the more feature-rich
iptables matching capabilities (including conntrack). nftables doesn't have
this limitation, pretty much all features work regardless of the protocol family.

So, br_netfilter is only needed if users, for some reason, need to use
ip(6)tables to filter packets forwarded by the bridge, or NAT bridged
traffic. For pure link layer filtering, this module isn't needed.

Other Features
==============

The Linux bridge also supports `IEEE 802.11 Proxy ARP
<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=958501163ddd6ea22a98f94fa0e7ce6d4734e5c4>`_,
`Media Redundancy Protocol (MRP)
<https://lore.kernel.org/netdev/20200426132208.3232-1-horatiu.vultur@microchip.com/>`_,
`Media Redundancy Protocol (MRP) LC mode
<https://lore.kernel.org/r/20201124082525.273820-1-horatiu.vultur@microchip.com>`_,
`IEEE 802.1X port authentication
<https://lore.kernel.org/netdev/20220218155148.2329797-1-schultz.hans+netdev@gmail.com/>`_,
and `MAC Authentication Bypass (MAB)
<https://lore.kernel.org/netdev/20221101193922.2125323-2-idosch@nvidia.com/>`_.

FAQ
===

What does a bridge do?
----------------------

A bridge transparently forwards traffic between multiple network interfaces.
In plain English this means that a bridge connects two or more physical
Ethernet networks, to form one larger (logical) Ethernet network.

Is it L3 protocol independent?
------------------------------

Yes. The bridge sees all frames, but it *uses* only L2 headers/information.
As such, the bridging functionality is protocol independent, and there should
be no trouble forwarding IPX, NetBEUI, IP, IPv6, etc.

Contact Info
============

The code is currently maintained by Roopa Prabhu <roopa@nvidia.com> and
Nikolay Aleksandrov <razor@blackwall.org>. Bridge bugs and enhancements
are discussed on the linux-netdev mailing list netdev@vger.kernel.org and
bridge@lists.linux.dev.

The list is open to anyone interested: http://vger.kernel.org/vger-lists.html#netdev

External Links
==============

The old Documentation for Linux bridging is on:
https://wiki.linuxfoundation.org/networking/bridge