summaryrefslogtreecommitdiffstats
path: root/Documentation/networking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/af_xdp.rst33
-rw-r--r--Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst11
-rw-r--r--Documentation/networking/devlink/devlink-info.rst5
-rw-r--r--Documentation/networking/devlink/devlink-port.rst33
-rw-r--r--Documentation/networking/devlink/devlink-region.rst2
-rw-r--r--Documentation/networking/devlink/hns3.rst5
-rw-r--r--Documentation/networking/devlink/ice.rst47
-rw-r--r--Documentation/networking/devlink/nfp.rst5
-rw-r--r--Documentation/networking/dns_resolver.rst4
-rw-r--r--Documentation/networking/ethtool-netlink.rst30
-rw-r--r--Documentation/networking/filter.rst4
-rw-r--r--Documentation/networking/index.rst1
-rw-r--r--Documentation/networking/nf_conntrack-sysctl.rst4
-rw-r--r--Documentation/networking/pse-pd/index.rst10
-rw-r--r--Documentation/networking/pse-pd/introduction.rst73
-rw-r--r--Documentation/networking/pse-pd/pse-pi.rst301
-rw-r--r--Documentation/networking/xfrm_proc.rst6
-rw-r--r--Documentation/networking/xsk-tx-metadata.rst16
18 files changed, 557 insertions, 33 deletions
diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index 72da7057e4..dceeb0d763 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -329,24 +329,23 @@ XDP_SHARED_UMEM option and provide the initial socket's fd in the
sxdp_shared_umem_fd field as you registered the UMEM on that
socket. These two sockets will now share one and the same UMEM.
-In this case, it is possible to use the NIC's packet steering
-capabilities to steer the packets to the right queue. This is not
-possible in the previous example as there is only one queue shared
-among sockets, so the NIC cannot do this steering as it can only steer
-between queues.
-
-In libxdp (or libbpf prior to version 1.0), you need to use the
-xsk_socket__create_shared() API as it takes a reference to a FILL ring
-and a COMPLETION ring that will be created for you and bound to the
-shared UMEM. You can use this function for all the sockets you create,
-or you can use it for the second and following ones and use
-xsk_socket__create() for the first one. Both methods yield the same
-result.
+There is no need to supply an XDP program like the one in the previous
+case where sockets were bound to the same queue id and
+device. Instead, use the NIC's packet steering capabilities to steer
+the packets to the right queue. In the previous example, there is only
+one queue shared among sockets, so the NIC cannot do this steering. It
+can only steer between queues.
+
+In libbpf, you need to use the xsk_socket__create_shared() API as it
+takes a reference to a FILL ring and a COMPLETION ring that will be
+created for you and bound to the shared UMEM. You can use this
+function for all the sockets you create, or you can use it for the
+second and following ones and use xsk_socket__create() for the first
+one. Both methods yield the same result.
Note that a UMEM can be shared between sockets on the same queue id
and device, as well as between queues on the same device and between
-devices at the same time. It is also possible to redirect to any
-socket as long as it is bound to the same umem with XDP_SHARED_UMEM.
+devices at the same time.
XDP_USE_NEED_WAKEUP bind flag
-----------------------------
@@ -823,10 +822,6 @@ A: The short answer is no, that is not supported at the moment. The
switch, or other distribution mechanism, in your NIC to direct
traffic to the correct queue id and socket.
- Note that if you are using the XDP_SHARED_UMEM option, it is
- possible to switch traffic between any socket bound to the same
- umem.
-
Q: My packets are sometimes corrupted. What is wrong?
A: Care has to be taken not to feed the same buffer in the UMEM into
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index f69ee1ebee..fed821ef9b 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -300,6 +300,11 @@ the software port.
in the beginning of the queue. This is a normal condition.
- Informative
+ * - `tx[i]_timestamps`
+ - Transmitted packets that were hardware timestamped at the device's DMA
+ layer.
+ - Informative
+
* - `tx[i]_added_vlan_packets`
- The number of packets sent where vlan tag insertion was offloaded to the
hardware.
@@ -702,6 +707,12 @@ the software port.
the device typically ensures not posting the CQE.
- Error
+ * - `ptp_cq[i]_lost_cqe`
+ - Number of times a CQE is expected to not be delivered on the PTP
+ timestamping CQE by the device due to a time delta elapsing. If such a
+ CQE is somehow delivered, `ptp_cq[i]_late_cqe` is incremented.
+ - Error
+
.. [#ring_global] The corresponding ring and global counters do not share the
same name (i.e. do not follow the common naming scheme).
diff --git a/Documentation/networking/devlink/devlink-info.rst b/Documentation/networking/devlink/devlink-info.rst
index 1242b0e682..23073bc219 100644
--- a/Documentation/networking/devlink/devlink-info.rst
+++ b/Documentation/networking/devlink/devlink-info.rst
@@ -146,6 +146,11 @@ board.manufacture
An identifier of the company or the facility which produced the part.
+board.part_number
+-----------------
+
+Part number of the board and its components.
+
fw
--
diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index 562f46b412..9d22d41a7c 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -134,6 +134,9 @@ Users may also set the IPsec crypto capability of the function using
Users may also set the IPsec packet capability of the function using
`devlink port function set ipsec_packet` command.
+Users may also set the maximum IO event queues of the function
+using `devlink port function set max_io_eqs` command.
+
Function attributes
===================
@@ -295,6 +298,36 @@ policy is processed in software by the kernel.
function:
hw_addr 00:00:00:00:00:00 ipsec_packet enabled
+Maximum IO events queues setup
+------------------------------
+When user sets maximum number of IO event queues for a SF or
+a VF, such function driver is limited to consume only enforced
+number of IO event queues.
+
+IO event queues deliver events related to IO queues, including network
+device transmit and receive queues (txq and rxq) and RDMA Queue Pairs (QPs).
+For example, the number of netdevice channels and RDMA device completion
+vectors are derived from the function's IO event queues. Usually, the number
+of interrupt vectors consumed by the driver is limited by the number of IO
+event queues per device, as each of the IO event queues is connected to an
+interrupt vector.
+
+- Get maximum IO event queues of the VF device::
+
+ $ devlink port show pci/0000:06:00.0/2
+ pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+ function:
+ hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10
+
+- Set maximum IO event queues of the VF device::
+
+ $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32
+
+ $ devlink port show pci/0000:06:00.0/2
+ pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+ function:
+ hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32
+
Subfunction
============
diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
index 9232cd7da3..5d0b68f752 100644
--- a/Documentation/networking/devlink/devlink-region.rst
+++ b/Documentation/networking/devlink/devlink-region.rst
@@ -49,7 +49,7 @@ example usage
$ devlink region show [ DEV/REGION ]
$ devlink region del DEV/REGION snapshot SNAPSHOT_ID
$ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
- $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] address ADDRESS length length
+ $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] address ADDRESS length LENGTH
# Show all of the exposed regions with region sizes:
$ devlink region show
diff --git a/Documentation/networking/devlink/hns3.rst b/Documentation/networking/devlink/hns3.rst
index 4562a6e478..72bc1b9f37 100644
--- a/Documentation/networking/devlink/hns3.rst
+++ b/Documentation/networking/devlink/hns3.rst
@@ -23,3 +23,8 @@ The ``hns3`` driver reports the following versions
* - ``fw``
- running
- Used to represent the firmware version.
+ * - ``fw.scc``
+ - running
+ - Used to represent the Soft Congestion Control (SSC) firmware version.
+ SCC is a firmware component which provides multiple RDMA congestion
+ control algorithms, including DCQCN.
diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index 7f30ebd5de..830c043542 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -21,6 +21,53 @@ Parameters
* - ``enable_iwarp``
- runtime
- mutually exclusive with ``enable_roce``
+ * - ``tx_scheduling_layers``
+ - permanent
+ - The ice hardware uses hierarchical scheduling for Tx with a fixed
+ number of layers in the scheduling tree. Each of them are decision
+ points. Root node represents a port, while all the leaves represent
+ the queues. This way of configuring the Tx scheduler allows features
+ like DCB or devlink-rate (documented below) to configure how much
+ bandwidth is given to any given queue or group of queues, enabling
+ fine-grained control because scheduling parameters can be configured
+ at any given layer of the tree.
+
+ The default 9-layer tree topology was deemed best for most workloads,
+ as it gives an optimal ratio of performance to configurability. However,
+ for some specific cases, this 9-layer topology might not be desired.
+ One example would be sending traffic to queues that are not a multiple
+ of 8. Because the maximum radix is limited to 8 in 9-layer topology,
+ the 9th queue has a different parent than the rest, and it's given
+ more bandwidth credits. This causes a problem when the system is
+ sending traffic to 9 queues:
+
+ | tx_queue_0_packets: 24163396
+ | tx_queue_1_packets: 24164623
+ | tx_queue_2_packets: 24163188
+ | tx_queue_3_packets: 24163701
+ | tx_queue_4_packets: 24163683
+ | tx_queue_5_packets: 24164668
+ | tx_queue_6_packets: 23327200
+ | tx_queue_7_packets: 24163853
+ | tx_queue_8_packets: 91101417 < Too much traffic is sent from 9th
+
+ To address this need, you can switch to a 5-layer topology, which
+ changes the maximum topology radix to 512. With this enhancement,
+ the performance characteristic is equal as all queues can be assigned
+ to the same parent in the tree. The obvious drawback of this solution
+ is a lower configuration depth of the tree.
+
+ Use the ``tx_scheduling_layer`` parameter with the devlink command
+ to change the transmit scheduler topology. To use 5-layer topology,
+ use a value of 5. For example:
+ $ devlink dev param set pci/0000:16:00.0 name tx_scheduling_layers
+ value 5 cmode permanent
+ Use a value of 9 to set it back to the default value.
+
+ You must do PCI slot powercycle for the selected topology to take effect.
+
+ To verify that value has been set:
+ $ devlink dev param show pci/0000:16:00.0 name tx_scheduling_layers
Info versions
=============
diff --git a/Documentation/networking/devlink/nfp.rst b/Documentation/networking/devlink/nfp.rst
index a1717db0df..3093642bda 100644
--- a/Documentation/networking/devlink/nfp.rst
+++ b/Documentation/networking/devlink/nfp.rst
@@ -32,7 +32,7 @@ The ``nfp`` driver reports the following versions
- Description
* - ``board.id``
- fixed
- - Part number identifying the board design
+ - Identifier of the board design
* - ``board.rev``
- fixed
- Revision of the board design
@@ -42,6 +42,9 @@ The ``nfp`` driver reports the following versions
* - ``board.model``
- fixed
- Model name of the board design
+ * - ``board.part_number``
+ - fixed
+ - Part number of the board and its components
* - ``fw.bundle_id``
- stored, running
- Firmware bundle id
diff --git a/Documentation/networking/dns_resolver.rst b/Documentation/networking/dns_resolver.rst
index add4d59a99..c0364f7070 100644
--- a/Documentation/networking/dns_resolver.rst
+++ b/Documentation/networking/dns_resolver.rst
@@ -118,7 +118,7 @@ Keys of dns_resolver type can be read from userspace using keyctl_read() or
Mechanism
=========
-The dnsresolver module registers a key type called "dns_resolver". Keys of
+The dns_resolver module registers a key type called "dns_resolver". Keys of
this type are used to transport and cache DNS lookup results from userspace.
When dns_query() is invoked, it calls request_key() to search the local
@@ -152,4 +152,4 @@ Debugging
Debugging messages can be turned on dynamically by writing a 1 into the
following file::
- /sys/module/dnsresolver/parameters/debug
+ /sys/module/dns_resolver/parameters/debug
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index d583d9abf2..0d8c487be3 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -1237,12 +1237,21 @@ Kernel response contents:
``ETHTOOL_A_TSINFO_TX_TYPES`` bitset supported Tx types
``ETHTOOL_A_TSINFO_RX_FILTERS`` bitset supported Rx filters
``ETHTOOL_A_TSINFO_PHC_INDEX`` u32 PTP hw clock index
+ ``ETHTOOL_A_TSINFO_STATS`` nested HW timestamping statistics
===================================== ====== ==========================
``ETHTOOL_A_TSINFO_PHC_INDEX`` is absent if there is no associated PHC (there
is no special value for this case). The bitset attributes are omitted if they
would be empty (no bit set).
+Additional hardware timestamping statistics response contents:
+
+ ===================================== ====== ===================================
+ ``ETHTOOL_A_TS_STAT_TX_PKTS`` uint Packets with Tx HW timestamps
+ ``ETHTOOL_A_TS_STAT_TX_LOST`` uint Tx HW timestamp not arrived count
+ ``ETHTOOL_A_TS_STAT_TX_ERR`` uint HW error request Tx timestamp count
+ ===================================== ====== ===================================
+
CABLE_TEST
==========
@@ -1717,6 +1726,10 @@ Kernel response contents:
PSE functions
``ETHTOOL_A_PODL_PSE_PW_D_STATUS`` u32 power detection status of the
PoDL PSE.
+ ``ETHTOOL_A_C33_PSE_ADMIN_STATE`` u32 Operational state of the PoE
+ PSE functions.
+ ``ETHTOOL_A_C33_PSE_PW_D_STATUS`` u32 power detection status of the
+ PoE PSE.
====================================== ====== =============================
When set, the optional ``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` attribute identifies
@@ -1728,6 +1741,12 @@ aPoDLPSEAdminState. Possible values are:
.. kernel-doc:: include/uapi/linux/ethtool.h
:identifiers: ethtool_podl_pse_admin_state
+The same goes for ``ETHTOOL_A_C33_PSE_ADMIN_STATE`` implementing
+``IEEE 802.3-2022`` 30.9.1.1.2 aPSEAdminState.
+
+.. kernel-doc:: include/uapi/linux/ethtool.h
+ :identifiers: ethtool_c33_pse_admin_state
+
When set, the optional ``ETHTOOL_A_PODL_PSE_PW_D_STATUS`` attribute identifies
the power detection status of the PoDL PSE. The status depend on internal PSE
state machine and automatic PD classification support. This option is
@@ -1737,6 +1756,12 @@ Possible values are:
.. kernel-doc:: include/uapi/linux/ethtool.h
:identifiers: ethtool_podl_pse_pw_d_status
+The same goes for ``ETHTOOL_A_C33_PSE_ADMIN_PW_D_STATUS`` implementing
+``IEEE 802.3-2022`` 30.9.1.1.5 aPSEPowerDetectionStatus.
+
+.. kernel-doc:: include/uapi/linux/ethtool.h
+ :identifiers: ethtool_c33_pse_pw_d_status
+
PSE_SET
=======
@@ -1747,6 +1772,7 @@ Request contents:
====================================== ====== =============================
``ETHTOOL_A_PSE_HEADER`` nested request header
``ETHTOOL_A_PODL_PSE_ADMIN_CONTROL`` u32 Control PoDL PSE Admin state
+ ``ETHTOOL_A_C33_PSE_ADMIN_CONTROL`` u32 Control PSE Admin state
====================================== ====== =============================
When set, the optional ``ETHTOOL_A_PODL_PSE_ADMIN_CONTROL`` attribute is used
@@ -1754,6 +1780,9 @@ to control PoDL PSE Admin functions. This option is implementing
``IEEE 802.3-2018`` 30.15.1.2.1 acPoDLPSEAdminControl. See
``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` for supported values.
+The same goes for ``ETHTOOL_A_C33_PSE_ADMIN_CONTROL`` implementing
+``IEEE 802.3-2022`` 30.9.1.2.1 acPSEAdminControl.
+
RSS_GET
=======
@@ -1771,6 +1800,7 @@ Kernel response contents:
===================================== ====== ==========================
``ETHTOOL_A_RSS_HEADER`` nested reply header
+ ``ETHTOOL_A_RSS_CONTEXT`` u32 context number
``ETHTOOL_A_RSS_HFUNC`` u32 RSS hash func
``ETHTOOL_A_RSS_INDIR`` binary Indir table bytes
``ETHTOOL_A_RSS_HKEY`` binary Hash key bytes
diff --git a/Documentation/networking/filter.rst b/Documentation/networking/filter.rst
index 7d8c538049..8eb9a5d40f 100644
--- a/Documentation/networking/filter.rst
+++ b/Documentation/networking/filter.rst
@@ -513,7 +513,7 @@ JIT compiler
------------
The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC,
-PowerPC, ARM, ARM64, MIPS, RISC-V and s390 and can be enabled through
+PowerPC, ARM, ARM64, MIPS, RISC-V, s390, and ARC and can be enabled through
CONFIG_BPF_JIT. The JIT compiler is transparently invoked for each
attached filter from user space or for internal kernel users if it has
been previously enabled by root::
@@ -650,7 +650,7 @@ before a conversion to the new layout is being done behind the scenes!
Currently, the classic BPF format is being used for JITing on most
32-bit architectures, whereas x86-64, aarch64, s390x, powerpc64,
-sparc64, arm32, riscv64, riscv32, loongarch64 perform JIT compilation
+sparc64, arm32, riscv64, riscv32, loongarch64, arc perform JIT compilation
from eBPF instruction set.
Testing
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 473d72c36d..7664c0bfe4 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -93,6 +93,7 @@ Contents:
plip
ppp_generic
proc_net_tcp
+ pse-pd/index
radiotap-headers
rds
regulatory
diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst
index c383a394c6..238b66d0e0 100644
--- a/Documentation/networking/nf_conntrack-sysctl.rst
+++ b/Documentation/networking/nf_conntrack-sysctl.rst
@@ -222,11 +222,11 @@ nf_flowtable_tcp_timeout - INTEGER (seconds)
Control offload timeout for tcp connections.
TCP connections may be offloaded from nf conntrack to nf flow table.
- Once aged, the connection is returned to nf conntrack with tcp pickup timeout.
+ Once aged, the connection is returned to nf conntrack.
nf_flowtable_udp_timeout - INTEGER (seconds)
default 30
Control offload timeout for udp connections.
UDP connections may be offloaded from nf conntrack to nf flow table.
- Once aged, the connection is returned to nf conntrack with udp pickup timeout.
+ Once aged, the connection is returned to nf conntrack.
diff --git a/Documentation/networking/pse-pd/index.rst b/Documentation/networking/pse-pd/index.rst
new file mode 100644
index 0000000000..de28a5aee3
--- /dev/null
+++ b/Documentation/networking/pse-pd/index.rst
@@ -0,0 +1,10 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Power Sourcing Equipment (PSE) Documentation
+============================================
+
+.. toctree::
+ :maxdepth: 2
+
+ introduction
+ pse-pi
diff --git a/Documentation/networking/pse-pd/introduction.rst b/Documentation/networking/pse-pd/introduction.rst
new file mode 100644
index 0000000000..e3d3faaef7
--- /dev/null
+++ b/Documentation/networking/pse-pd/introduction.rst
@@ -0,0 +1,73 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Power Sourcing Equipment (PSE) in IEEE 802.3 Standard
+=====================================================
+
+Overview
+--------
+
+Power Sourcing Equipment (PSE) is essential in networks for delivering power
+along with data over Ethernet cables. It usually refers to devices like
+switches and hubs that supply power to Powered Devices (PDs) such as IP
+cameras, VoIP phones, and wireless access points.
+
+PSE vs. PoDL PSE
+----------------
+
+PSE in the IEEE 802.3 standard generally refers to equipment that provides
+power alongside data over Ethernet cables, typically associated with Power over
+Ethernet (PoE).
+
+PoDL PSE, or Power over Data Lines PSE, specifically denotes PSEs operating
+with single balanced twisted-pair PHYs, as per Clause 104 of IEEE 802.3. PoDL
+is significant in contexts like automotive and industrial controls where power
+and data delivery over a single pair is advantageous.
+
+IEEE 802.3-2018 Addendums and Related Clauses
+---------------------------------------------
+
+Key addenda to the IEEE 802.3-2018 standard relevant to power delivery over
+Ethernet are as follows:
+
+- **802.3af (Approved in 2003-06-12)**: Known as PoE in the market, detailed in
+ Clause 33, delivering up to 15.4W of power.
+- **802.3at (Approved in 2009-09-11)**: Marketed as PoE+, enhancing PoE as
+ covered in Clause 33, increasing power delivery to up to 30W.
+- **802.3bt (Approved in 2018-09-27)**: Known as 4PPoE in the market, outlined
+ in Clause 33. Type 3 delivers up to 60W, and Type 4 up to 100W.
+- **802.3bu (Approved in 2016-12-07)**: Formerly referred to as PoDL, detailed
+ in Clause 104. Introduces Classes 0 - 9. Class 9 PoDL PSE delivers up to ~65W
+
+Kernel Naming Convention Recommendations
+----------------------------------------
+
+For clarity and consistency within the Linux kernel's networking subsystem, the
+following naming conventions are recommended:
+
+- For general PSE (PoE) code, use "c33_pse" key words. For example:
+ ``enum ethtool_c33_pse_admin_state c33_admin_control;``.
+ This aligns with Clause 33, encompassing various PoE forms.
+
+- For PoDL PSE - specific code, use "podl_pse". For example:
+ ``enum ethtool_podl_pse_admin_state podl_admin_control;`` to differentiate
+ PoDL PSE settings according to Clause 104.
+
+Summary of Clause 33: Data Terminal Equipment (DTE) Power via Media Dependent Interface (MDI)
+---------------------------------------------------------------------------------------------
+
+Clause 33 of the IEEE 802.3 standard defines the functional and electrical
+characteristics of Powered Device (PD) and Power Sourcing Equipment (PSE).
+These entities enable power delivery using the same generic cabling as for data
+transmission, integrating power with data communication for devices such as
+10BASE-T, 100BASE-TX, or 1000BASE-T.
+
+Summary of Clause 104: Power over Data Lines (PoDL) of Single Balanced Twisted-Pair Ethernet
+--------------------------------------------------------------------------------------------
+
+Clause 104 of the IEEE 802.3 standard delineates the functional and electrical
+characteristics of PoDL Powered Devices (PDs) and PoDL Power Sourcing Equipment
+(PSEs). These are designed for use with single balanced twisted-pair Ethernet
+Physical Layers. In this clause, 'PSE' refers specifically to PoDL PSE, and
+'PD' to PoDL PD. The key intent is to provide devices with a unified interface
+for both data and the power required to process this data over a single
+balanced twisted-pair Ethernet connection.
diff --git a/Documentation/networking/pse-pd/pse-pi.rst b/Documentation/networking/pse-pd/pse-pi.rst
new file mode 100644
index 0000000000..5cad14fedc
--- /dev/null
+++ b/Documentation/networking/pse-pd/pse-pi.rst
@@ -0,0 +1,301 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+PSE Power Interface (PSE PI) Documentation
+==========================================
+
+The Power Sourcing Equipment Power Interface (PSE PI) plays a pivotal role in
+the architecture of Power over Ethernet (PoE) systems. It is essentially a
+blueprint that outlines how one or multiple power sources are connected to the
+eight-pin modular jack, commonly known as the Ethernet RJ45 port. This
+connection scheme is crucial for enabling the delivery of power alongside data
+over Ethernet cables.
+
+Documentation and Standards
+---------------------------
+
+The IEEE 802.3 standard provides detailed documentation on the PSE PI.
+Specifically:
+
+- Section "33.2.3 PI pin assignments" covers the pin assignments for PoE
+ systems that utilize two pairs for power delivery.
+- Section "145.2.4 PSE PI" addresses the configuration for PoE systems that
+ deliver power over all four pairs of an Ethernet cable.
+
+PSE PI and Single Pair Ethernet
+-------------------------------
+
+Single Pair Ethernet (SPE) represents a different approach to Ethernet
+connectivity, utilizing just one pair of conductors for both data and power
+transmission. Unlike the configurations detailed in the PSE PI for standard
+Ethernet, which can involve multiple power sourcing arrangements across four or
+two pairs of wires, SPE operates on a simpler model due to its single-pair
+design. As a result, the complexities of choosing between alternative pin
+assignments for power delivery, as described in the PSE PI for multi-pair
+Ethernet, are not applicable to SPE.
+
+Understanding PSE PI
+--------------------
+
+The Power Sourcing Equipment Power Interface (PSE PI) is a framework defining
+how Power Sourcing Equipment (PSE) delivers power to Powered Devices (PDs) over
+Ethernet cables. It details two main configurations for power delivery, known
+as Alternative A and Alternative B, which are distinguished not only by their
+method of power transmission but also by the implications for polarity and data
+transmission direction.
+
+Alternative A and B Overview
+----------------------------
+
+- **Alternative A:** Utilizes RJ45 conductors 1, 2, 3 and 6. In either case of
+ networks 10/100BaseT or 1G/2G/5G/10GBaseT, the pairs used are carrying data.
+ The power delivery's polarity in this alternative can vary based on the MDI
+ (Medium Dependent Interface) or MDI-X (Medium Dependent Interface Crossover)
+ configuration.
+
+- **Alternative B:** Utilizes RJ45 conductors 4, 5, 7 and 8. In case of
+ 10/100BaseT network the pairs used are spare pairs without data and are less
+ influenced by data transmission direction. This is not the case for
+ 1G/2G/5G/10GBaseT network. Alternative B includes two configurations with
+ different polarities, known as variant X and variant S, to accommodate
+ different network requirements and device specifications.
+
+Table 145-3 PSE Pinout Alternatives
+-----------------------------------
+
+The following table outlines the pin configurations for both Alternative A and
+Alternative B.
+
++------------+-------------------+-----------------+-----------------+-----------------+
+| Conductor | Alternative A | Alternative A | Alternative B | Alternative B |
+| | (MDI-X) | (MDI) | (X) | (S) |
++============+===================+=================+=================+=================+
+| 1 | Negative V | Positive V | - | - |
++------------+-------------------+-----------------+-----------------+-----------------+
+| 2 | Negative V | Positive V | - | - |
++------------+-------------------+-----------------+-----------------+-----------------+
+| 3 | Positive V | Negative V | - | - |
++------------+-------------------+-----------------+-----------------+-----------------+
+| 4 | - | - | Negative V | Positive V |
++------------+-------------------+-----------------+-----------------+-----------------+
+| 5 | - | - | Negative V | Positive V |
++------------+-------------------+-----------------+-----------------+-----------------+
+| 6 | Positive V | Negative V | - | - |
++------------+-------------------+-----------------+-----------------+-----------------+
+| 7 | - | - | Positive V | Negative V |
++------------+-------------------+-----------------+-----------------+-----------------+
+| 8 | - | - | Positive V | Negative V |
++------------+-------------------+-----------------+-----------------+-----------------+
+
+.. note::
+ - "Positive V" and "Negative V" indicate the voltage polarity for each pin.
+ - "-" indicates that the pin is not used for power delivery in that
+ specific configuration.
+
+PSE PI compatibilities
+----------------------
+
+The following table outlines the compatibility between the pinout alternative
+and the 1000/2.5G/5G/10GBaseT in the PSE 2 pairs connection.
+
++---------+---------------+---------------------+-----------------------+
+| Variant | Alternative | Power Feeding Type | Compatibility with |
+| | (A/B) | (Direct/Phantom) | 1000/2.5G/5G/10GBaseT |
++=========+===============+=====================+=======================+
+| 1 | A | Phantom | Yes |
++---------+---------------+---------------------+-----------------------+
+| 2 | B | Phantom | Yes |
++---------+---------------+---------------------+-----------------------+
+| 3 | B | Direct | No |
++---------+---------------+---------------------+-----------------------+
+
+.. note::
+ - "Direct" indicate a variant where the power is injected directly to pairs
+ without using magnetics in case of spare pairs.
+ - "Phantom" indicate power path over coils/magnetics as it is done for
+ Alternative A variant.
+
+In case of PSE 4 pairs, a PSE supporting only 10/100BaseT (which mean Direct
+Power on pinout Alternative B) is not compatible with a 4 pairs
+1000/2.5G/5G/10GBaseT.
+
+PSE Power Interface (PSE PI) Connection Diagram
+-----------------------------------------------
+
+The diagram below illustrates the connection architecture between the RJ45
+port, the Ethernet PHY (Physical Layer), and the PSE PI (Power Sourcing
+Equipment Power Interface), demonstrating how power and data are delivered
+simultaneously through an Ethernet cable. The RJ45 port serves as the physical
+interface for these connections, with each of its eight pins connected to both
+the Ethernet PHY for data transmission and the PSE PI for power delivery.
+
+.. code-block::
+
+ +--------------------------+
+ | |
+ | RJ45 Port |
+ | |
+ +--+--+--+--+--+--+--+--+--+ +-------------+
+ 1| 2| 3| 4| 5| 6| 7| 8| | |
+ | | | | | | | o-------------------+ |
+ | | | | | | o--|-------------------+ +<--- PSE 1
+ | | | | | o--|--|-------------------+ |
+ | | | | o--|--|--|-------------------+ |
+ | | | o--|--|--|--|-------------------+ PSE PI |
+ | | o--|--|--|--|--|-------------------+ |
+ | o--|--|--|--|--|--|-------------------+ +<--- PSE 2 (optional)
+ o--|--|--|--|--|--|--|-------------------+ |
+ | | | | | | | | | |
+ +--+--+--+--+--+--+--+--+--+ +-------------+
+ | |
+ | Ethernet PHY |
+ | |
+ +--------------------------+
+
+Simple PSE PI Configuration for Alternative A
+---------------------------------------------
+
+The diagram below illustrates a straightforward PSE PI (Power Sourcing
+Equipment Power Interface) configuration designed to support the Alternative A
+setup for Power over Ethernet (PoE). This implementation is tailored to provide
+power delivery through the data-carrying pairs of an Ethernet cable, suitable
+for either MDI or MDI-X configurations, albeit supporting one variation at a
+time.
+
+.. code-block::
+
+ +-------------+
+ | PSE PI |
+ 8 -----+ +-------------+
+ 7 -----+ Rail 1 |
+ 6 -----+------+----------------------+
+ 5 -----+ | |
+ 4 -----+ | Rail 2 | PSE 1
+ 3 -----+------/ +------------+
+ 2 -----+--+-------------/ |
+ 1 -----+--/ +-------------+
+ |
+ +-------------+
+
+In this configuration:
+
+- Pins 1 and 2, as well as pins 3 and 6, are utilized for power delivery in
+ addition to data transmission. This aligns with the standard wiring for
+ 10/100BaseT Ethernet networks where these pairs are used for data.
+- Rail 1 and Rail 2 represent the positive and negative voltage rails, with
+ Rail 1 connected to pins 1 and 2, and Rail 2 connected to pins 3 and 6.
+ More advanced PSE PI configurations may include integrated or external
+ switches to change the polarity of the voltage rails, allowing for
+ compatibility with both MDI and MDI-X configurations.
+
+More complex PSE PI configurations may include additional components, to support
+Alternative B, or to provide additional features such as power management, or
+additional power delivery capabilities such as 2-pair or 4-pair power delivery.
+
+.. code-block::
+
+ +-------------+
+ | PSE PI |
+ | +---+
+ 8 -----+--------+ | +-------------+
+ 7 -----+--------+ | Rail 1 |
+ 6 -----+--------+ +-----------------+
+ 5 -----+--------+ | |
+ 4 -----+--------+ | Rail 2 | PSE 1
+ 3 -----+--------+ +----------------+
+ 2 -----+--------+ | |
+ 1 -----+--------+ | +-------------+
+ | +---+
+ +-------------+
+
+Device Tree Configuration: Describing PSE PI Configurations
+-----------------------------------------------------------
+
+The necessity for a separate PSE PI node in the device tree is influenced by
+the intricacy of the Power over Ethernet (PoE) system's setup. Here are
+descriptions of both simple and complex PSE PI configurations to illustrate
+this decision-making process:
+
+**Simple PSE PI Configuration:**
+In a straightforward scenario, the PSE PI setup involves a direct, one-to-one
+connection between a single PSE controller and an Ethernet port. This setup
+typically supports basic PoE functionality without the need for dynamic
+configuration or management of multiple power delivery modes. For such simple
+configurations, detailing the PSE PI within the existing PSE controller's node
+may suffice, as the system does not encompass additional complexity that
+warrants a separate node. The primary focus here is on the clear and direct
+association of power delivery to a specific Ethernet port.
+
+**Complex PSE PI Configuration:**
+Contrastingly, a complex PSE PI setup may encompass multiple PSE controllers or
+auxiliary circuits that collectively manage power delivery to one Ethernet
+port. Such configurations might support a range of PoE standards and require
+the capability to dynamically configure power delivery based on the operational
+mode (e.g., PoE2 versus PoE4) or specific requirements of connected devices. In
+these instances, a dedicated PSE PI node becomes essential for accurately
+documenting the system architecture. This node would serve to detail the
+interactions between different PSE controllers, the support for various PoE
+modes, and any additional logic required to coordinate power delivery across
+the network infrastructure.
+
+**Guidance:**
+
+For simple PSE setups, including PSE PI information in the PSE controller node
+might suffice due to the straightforward nature of these systems. However,
+complex configurations, involving multiple components or advanced PoE features,
+benefit from a dedicated PSE PI node. This method adheres to IEEE 802.3
+specifications, improving documentation clarity and ensuring accurate
+representation of the PoE system's complexity.
+
+PSE PI Node: Essential Information
+----------------------------------
+
+The PSE PI (Power Sourcing Equipment Power Interface) node in a device tree can
+include several key pieces of information critical for defining the power
+delivery capabilities and configurations of a PoE (Power over Ethernet) system.
+Below is a list of such information, along with explanations for their
+necessity and reasons why they might not be found within a PSE controller node:
+
+1. **Powered Pairs Configuration**
+
+ - *Description:* Identifies the pairs used for power delivery in the
+ Ethernet cable.
+ - *Necessity:* Essential to ensure the correct pairs are powered according
+ to the board's design.
+ - *PSE Controller Node:* Typically lacks details on physical pair usage,
+ focusing on power regulation.
+
+2. **Polarity of Powered Pairs**
+
+ - *Description:* Specifies the polarity (positive or negative) for each
+ powered pair.
+ - *Necessity:* Critical for safe and effective power transmission to PDs.
+ - *PSE Controller Node:* Polarity management may exceed the standard
+ functionalities of PSE controllers.
+
+3. **PSE Cells Association**
+
+ - *Description:* Details the association of PSE cells with Ethernet ports or
+ pairs in multi-cell configurations.
+ - *Necessity:* Allows for optimized power resource allocation in complex
+ systems.
+ - *PSE Controller Node:* Controllers may not manage cell associations
+ directly, focusing instead on power flow regulation.
+
+4. **Support for PoE Standards**
+
+ - *Description:* Lists the PoE standards and configurations supported by the
+ system.
+ - *Necessity:* Ensures system compatibility with various PDs and adherence
+ to industry standards.
+ - *PSE Controller Node:* Specific capabilities may depend on the overall PSE
+ PI design rather than the controller alone. Multiple PSE cells per PI
+ do not necessarily imply support for multiple PoE standards.
+
+5. **Protection Mechanisms**
+
+ - *Description:* Outlines additional protection mechanisms, such as
+ overcurrent protection and thermal management.
+ - *Necessity:* Provides extra safety and stability, complementing PSE
+ controller protections.
+ - *PSE Controller Node:* Some protections may be implemented via
+ board-specific hardware or algorithms external to the controller.
diff --git a/Documentation/networking/xfrm_proc.rst b/Documentation/networking/xfrm_proc.rst
index 0a771c5a73..973d1571ac 100644
--- a/Documentation/networking/xfrm_proc.rst
+++ b/Documentation/networking/xfrm_proc.rst
@@ -73,6 +73,9 @@ XfrmAcquireError:
XfrmFwdHdrError:
Forward routing of a packet is not allowed
+XfrmInStateDirError:
+ State direction mismatch (lookup found an output state on the input path, expected input or no direction)
+
Outbound errors
~~~~~~~~~~~~~~~
XfrmOutError:
@@ -111,3 +114,6 @@ XfrmOutPolError:
XfrmOutStateInvalid:
State is invalid, perhaps expired
+
+XfrmOutStateDirError:
+ State direction mismatch (lookup found an input state on the output path, expected output or no direction)
diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst
index bd033fe95c..e76b0cfc32 100644
--- a/Documentation/networking/xsk-tx-metadata.rst
+++ b/Documentation/networking/xsk-tx-metadata.rst
@@ -11,12 +11,16 @@ metadata on the receive side.
General Design
==============
-The headroom for the metadata is reserved via ``tx_metadata_len`` in
-``struct xdp_umem_reg``. The metadata length is therefore the same for
-every socket that shares the same umem. The metadata layout is a fixed UAPI,
-refer to ``union xsk_tx_metadata`` in ``include/uapi/linux/if_xdp.h``.
-Thus, generally, the ``tx_metadata_len`` field above should contain
-``sizeof(union xsk_tx_metadata)``.
+The headroom for the metadata is reserved via ``tx_metadata_len`` and
+``XDP_UMEM_TX_METADATA_LEN`` flag in ``struct xdp_umem_reg``. The metadata
+length is therefore the same for every socket that shares the same umem.
+The metadata layout is a fixed UAPI, refer to ``union xsk_tx_metadata`` in
+``include/uapi/linux/if_xdp.h``. Thus, generally, the ``tx_metadata_len``
+field above should contain ``sizeof(union xsk_tx_metadata)``.
+
+Note that in the original implementation the ``XDP_UMEM_TX_METADATA_LEN``
+flag was not required. Applications might attempt to create a umem
+with a flag first and if it fails, do another attempt without a flag.
The headroom and the metadata itself should be located right before
``xdp_desc->addr`` in the umem frame. Within a frame, the metadata