diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-27 10:05:51 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-27 10:05:51 +0000 |
commit | 5d1646d90e1f2cceb9f0828f4b28318cd0ec7744 (patch) | |
tree | a94efe259b9009378be6d90eb30d2b019d95c194 /Documentation/networking/devlink | |
parent | Initial commit. (diff) | |
download | linux-5d1646d90e1f2cceb9f0828f4b28318cd0ec7744.tar.xz linux-5d1646d90e1f2cceb9f0828f4b28318cd0ec7744.zip |
Adding upstream version 5.10.209.upstream/5.10.209upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'Documentation/networking/devlink')
22 files changed, 2466 insertions, 0 deletions
diff --git a/Documentation/networking/devlink/bnxt.rst b/Documentation/networking/devlink/bnxt.rst new file mode 100644 index 000000000..3dfd84ccb --- /dev/null +++ b/Documentation/networking/devlink/bnxt.rst @@ -0,0 +1,80 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==================== +bnxt devlink support +==================== + +This document describes the devlink features implemented by the ``bnxt`` +device driver. + +Parameters +========== + +.. list-table:: Generic parameters implemented + + * - Name + - Mode + * - ``enable_sriov`` + - Permanent + * - ``ignore_ari`` + - Permanent + * - ``msix_vec_per_pf_max`` + - Permanent + * - ``msix_vec_per_pf_min`` + - Permanent + +The ``bnxt`` driver also implements the following driver-specific +parameters. + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``gre_ver_check`` + - Boolean + - Permanent + - Generic Routing Encapsulation (GRE) version check will be enabled in + the device. If disabled, the device will skip the version check for + incoming packets. + +Info versions +============= + +The ``bnxt_en`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``board.id`` + - fixed + - Part number identifying the board design + * - ``asic.id`` + - fixed + - ASIC design identifier + * - ``asic.rev`` + - fixed + - ASIC design revision + * - ``fw.psid`` + - stored, running + - Firmware parameter set version of the board + * - ``fw`` + - stored, running + - Overall board firmware version + * - ``fw.mgmt`` + - stored, running + - NIC hardware resource management firmware version + * - ``fw.mgmt.api`` + - running + - Minimum firmware interface spec version supported between driver and firmware + * - ``fw.nsci`` + - stored, running + - General platform management firmware version + * - ``fw.roce`` + - stored, running + - RoCE management firmware version diff --git a/Documentation/networking/devlink/devlink-dpipe.rst b/Documentation/networking/devlink/devlink-dpipe.rst new file mode 100644 index 000000000..468fe1001 --- /dev/null +++ b/Documentation/networking/devlink/devlink-dpipe.rst @@ -0,0 +1,252 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============= +Devlink DPIPE +============= + +Background +========== + +While performing the hardware offloading process, much of the hardware +specifics cannot be presented. These details are useful for debugging, and +``devlink-dpipe`` provides a standardized way to provide visibility into the +offloading process. + +For example, the routing longest prefix match (LPM) algorithm used by the +Linux kernel may differ from the hardware implementation. The pipeline debug +API (DPIPE) is aimed at providing the user visibility into the ASIC's +pipeline in a generic way. + +The hardware offload process is expected to be done in a way that the user +should not be able to distinguish between the hardware vs. software +implementation. In this process, hardware specifics are neglected. In +reality those details can have lots of meaning and should be exposed in some +standard way. + +This problem is made even more complex when one wishes to offload the +control path of the whole networking stack to a switch ASIC. Due to +differences in the hardware and software models some processes cannot be +represented correctly. + +One example is the kernel's LPM algorithm which in many cases differs +greatly to the hardware implementation. The configuration API is the same, +but one cannot rely on the Forward Information Base (FIB) to look like the +Level Path Compression trie (LPC-trie) in hardware. + +In many situations trying to analyze systems failure solely based on the +kernel's dump may not be enough. By combining this data with complementary +information about the underlying hardware, this debugging can be made +easier; additionally, the information can be useful when debugging +performance issues. + +Overview +======== + +The ``devlink-dpipe`` interface closes this gap. The hardware's pipeline is +modeled as a graph of match/action tables. Each table represents a specific +hardware block. This model is not new, first being used by the P4 language. + +Traditionally it has been used as an alternative model for hardware +configuration, but the ``devlink-dpipe`` interface uses it for visibility +purposes as a standard complementary tool. The system's view from +``devlink-dpipe`` should change according to the changes done by the +standard configuration tools. + +For example, it’s quiet common to implement Access Control Lists (ACL) +using Ternary Content Addressable Memory (TCAM). The TCAM memory can be +divided into TCAM regions. Complex TC filters can have multiple rules with +different priorities and different lookup keys. On the other hand hardware +TCAM regions have a predefined lookup key. Offloading the TC filter rules +using TCAM engine can result in multiple TCAM regions being interconnected +in a chain (which may affect the data path latency). In response to a new TC +filter new tables should be created describing those regions. + +Model +===== + +The ``DPIPE`` model introduces several objects: + + * headers + * tables + * entries + +A ``header`` describes packet formats and provides names for fields within +the packet. A ``table`` describes hardware blocks. An ``entry`` describes +the actual content of a specific table. + +The hardware pipeline is not port specific, but rather describes the whole +ASIC. Thus it is tied to the top of the ``devlink`` infrastructure. + +Drivers can register and unregister tables at run time, in order to support +dynamic behavior. This dynamic behavior is mandatory for describing hardware +blocks like TCAM regions which can be allocated and freed dynamically. + +``devlink-dpipe`` generally is not intended for configuration. The exception +is hardware counting for a specific table. + +The following commands are used to obtain the ``dpipe`` objects from +userspace: + + * ``table_get``: Receive a table's description. + * ``headers_get``: Receive a device's supported headers. + * ``entries_get``: Receive a table's current entries. + * ``counters_set``: Enable or disable counters on a table. + +Table +----- + +The driver should implement the following operations for each table: + + * ``matches_dump``: Dump the supported matches. + * ``actions_dump``: Dump the supported actions. + * ``entries_dump``: Dump the actual content of the table. + * ``counters_set_update``: Synchronize hardware with counters enabled or + disabled. + +Header/Field +------------ + +In a similar way to P4 headers and fields are used to describe a table's +behavior. There is a slight difference between the standard protocol headers +and specific ASIC metadata. The protocol headers should be declared in the +``devlink`` core API. On the other hand ASIC meta data is driver specific +and should be defined in the driver. Additionally, each driver-specific +devlink documentation file should document the driver-specific ``dpipe`` +headers it implements. The headers and fields are identified by enumeration. + +In order to provide further visibility some ASIC metadata fields could be +mapped to kernel objects. For example, internal router interface indexes can +be directly mapped to the net device ifindex. FIB table indexes used by +different Virtual Routing and Forwarding (VRF) tables can be mapped to +internal routing table indexes. + +Match +----- + +Matches are kept primitive and close to hardware operation. Match types like +LPM are not supported due to the fact that this is exactly a process we wish +to describe in full detail. Example of matches: + + * ``field_exact``: Exact match on a specific field. + * ``field_exact_mask``: Exact match on a specific field after masking. + * ``field_range``: Match on a specific range. + +The id's of the header and the field should be specified in order to +identify the specific field. Furthermore, the header index should be +specified in order to distinguish multiple headers of the same type in a +packet (tunneling). + +Action +------ + +Similar to match, the actions are kept primitive and close to hardware +operation. For example: + + * ``field_modify``: Modify the field value. + * ``field_inc``: Increment the field value. + * ``push_header``: Add a header. + * ``pop_header``: Remove a header. + +Entry +----- + +Entries of a specific table can be dumped on demand. Each eentry is +identified with an index and its properties are described by a list of +match/action values and specific counter. By dumping the tables content the +interactions between tables can be resolved. + +Abstraction Example +=================== + +The following is an example of the abstraction model of the L3 part of +Mellanox Spectrum ASIC. The blocks are described in the order they appear in +the pipeline. The table sizes in the following examples are not real +hardware sizes and are provided for demonstration purposes. + +LPM +--- + +The LPM algorithm can be implemented as a list of hash tables. Each hash +table contains routes with the same prefix length. The root of the list is +/32, and in case of a miss the hardware will continue to the next hash +table. The depth of the search will affect the data path latency. + +In case of a hit the entry contains information about the next stage of the +pipeline which resolves the MAC address. The next stage can be either local +host table for directly connected routes, or adjacency table for next-hops. +The ``meta.lpm_prefix`` field is used to connect two LPM tables. + +.. code:: + + table lpm_prefix_16 { + size: 4096, + counters_enabled: true, + match: { meta.vr_id: exact, + ipv4.dst_addr: exact_mask, + ipv6.dst_addr: exact_mask, + meta.lpm_prefix: exact }, + action: { meta.adj_index: set, + meta.adj_group_size: set, + meta.rif_port: set, + meta.lpm_prefix: set }, + } + +Local Host +---------- + +In the case of local routes the LPM lookup already resolves the egress +router interface (RIF), yet the exact MAC address is not known. The local +host table is a hash table combining the output interface id with +destination IP address as a key. The result is the MAC address. + +.. code:: + + table local_host { + size: 4096, + counters_enabled: true, + match: { meta.rif_port: exact, + ipv4.dst_addr: exact}, + action: { ethernet.daddr: set } + } + +Adjacency +--------- + +In case of remote routes this table does the ECMP. The LPM lookup results in +ECMP group size and index that serves as a global offset into this table. +Concurrently a hash of the packet is generated. Based on the ECMP group size +and the packet's hash a local offset is generated. Multiple LPM entries can +point to the same adjacency group. + +.. code:: + + table adjacency { + size: 4096, + counters_enabled: true, + match: { meta.adj_index: exact, + meta.adj_group_size: exact, + meta.packet_hash_index: exact }, + action: { ethernet.daddr: set, + meta.erif: set } + } + +ERIF +---- + +In case the egress RIF and destination MAC have been resolved by previous +tables this table does multiple operations like TTL decrease and MTU check. +Then the decision of forward/drop is taken and the port L3 statistics are +updated based on the packet's type (broadcast, unicast, multicast). + +.. code:: + + table erif { + size: 800, + counters_enabled: true, + match: { meta.rif_port: exact, + meta.is_l3_unicast: exact, + meta.is_l3_broadcast: exact, + meta.is_l3_multicast, exact }, + action: { meta.l3_drop: set, + meta.l3_forward: set } + } diff --git a/Documentation/networking/devlink/devlink-flash.rst b/Documentation/networking/devlink/devlink-flash.rst new file mode 100644 index 000000000..603e732f0 --- /dev/null +++ b/Documentation/networking/devlink/devlink-flash.rst @@ -0,0 +1,121 @@ +.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) + +.. _devlink_flash: + +============= +Devlink Flash +============= + +The ``devlink-flash`` API allows updating device firmware. It replaces the +older ``ethtool-flash`` mechanism, and doesn't require taking any +networking locks in the kernel to perform the flash update. Example use:: + + $ devlink dev flash pci/0000:05:00.0 file flash-boot.bin + +Note that the file name is a path relative to the firmware loading path +(usually ``/lib/firmware/``). Drivers may send status updates to inform +user space about the progress of the update operation. + +Overwrite Mask +============== + +The ``devlink-flash`` command allows optionally specifying a mask indicating +how the device should handle subsections of flash components when updating. +This mask indicates the set of sections which are allowed to be overwritten. + +.. list-table:: List of overwrite mask bits + :widths: 5 95 + + * - Name + - Description + * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` + - Indicates that the device should overwrite settings in the components + being updated with the settings found in the provided image. + * - ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` + - Indicates that the device should overwrite identifiers in the + components being updated with the identifiers found in the provided + image. This includes MAC addresses, serial IDs, and similar device + identifiers. + +Multiple overwrite bits may be combined and requested together. If no bits +are provided, it is expected that the device only update firmware binaries +in the components being updated. Settings and identifiers are expected to be +preserved across the update. A device may not support every combination and +the driver for such a device must reject any combination which cannot be +faithfully implemented. + +Firmware Loading +================ + +Devices which require firmware to operate usually store it in non-volatile +memory on the board, e.g. flash. Some devices store only basic firmware on +the board, and the driver loads the rest from disk during probing. +``devlink-info`` allows users to query firmware information (loaded +components and versions). + +In other cases the device can both store the image on the board, load from +disk, or automatically flash a new image from disk. The ``fw_load_policy`` +devlink parameter can be used to control this behavior +(:ref:`Documentation/networking/devlink/devlink-params.rst <devlink_params_generic>`). + +On-disk firmware files are usually stored in ``/lib/firmware/``. + +Firmware Version Management +=========================== + +Drivers are expected to implement ``devlink-flash`` and ``devlink-info`` +functionality, which together allow for implementing vendor-independent +automated firmware update facilities. + +``devlink-info`` exposes the ``driver`` name and three version groups +(``fixed``, ``running``, ``stored``). + +The ``driver`` attribute and ``fixed`` group identify the specific device +design, e.g. for looking up applicable firmware updates. This is why +``serial_number`` is not part of the ``fixed`` versions (even though it +is fixed) - ``fixed`` versions should identify the design, not a single +device. + +``running`` and ``stored`` firmware versions identify the firmware running +on the device, and firmware which will be activated after reboot or device +reset. + +The firmware update agent is supposed to be able to follow this simple +algorithm to update firmware contents, regardless of the device vendor: + +.. code-block:: sh + + # Get unique HW design identifier + $hw_id = devlink-dev-info['fixed'] + + # Find out which FW flash we want to use for this NIC + $want_flash_vers = some-db-backed.lookup($hw_id, 'flash') + + # Update flash if necessary + if $want_flash_vers != devlink-dev-info['stored']: + $file = some-db-backed.download($hw_id, 'flash') + devlink-dev-flash($file) + + # Find out the expected overall firmware versions + $want_fw_vers = some-db-backed.lookup($hw_id, 'all') + + # Update on-disk file if necessary + if $want_fw_vers != devlink-dev-info['running']: + $file = some-db-backed.download($hw_id, 'disk') + write($file, '/lib/firmware/') + + # Try device reset, if available + if $want_fw_vers != devlink-dev-info['running']: + devlink-reset() + + # Reboot, if reset wasn't enough + if $want_fw_vers != devlink-dev-info['running']: + reboot() + +Note that each reference to ``devlink-dev-info`` in this pseudo-code +is expected to fetch up-to-date information from the kernel. + +For the convenience of identifying firmware files some vendors add +``bundle_id`` information to the firmware versions. This meta-version covers +multiple per-component versions and can be used e.g. in firmware file names +(all component versions could get rather long.) diff --git a/Documentation/networking/devlink/devlink-health.rst b/Documentation/networking/devlink/devlink-health.rst new file mode 100644 index 000000000..0c99b11f0 --- /dev/null +++ b/Documentation/networking/devlink/devlink-health.rst @@ -0,0 +1,114 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============== +Devlink Health +============== + +Background +========== + +The ``devlink`` health mechanism is targeted for Real Time Alerting, in +order to know when something bad happened to a PCI device. + + * Provide alert debug information. + * Self healing. + * If problem needs vendor support, provide a way to gather all needed + debugging information. + +Overview +======== + +The main idea is to unify and centralize driver health reports in the +generic ``devlink`` instance and allow the user to set different +attributes of the health reporting and recovery procedures. + +The ``devlink`` health reporter: +Device driver creates a "health reporter" per each error/health type. +Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error) +or unknown (driver specific). +For each registered health reporter a driver can issue error/health reports +asynchronously. All health reports handling is done by ``devlink``. +Device driver can provide specific callbacks for each "health reporter", e.g.: + + * Recovery procedures + * Diagnostics procedures + * Object dump procedures + * OOB initial parameters + +Different parts of the driver can register different types of health reporters +with different handlers. + +Actions +======= + +Once an error is reported, devlink health will perform the following actions: + + * A log is being send to the kernel trace events buffer + * Health status and statistics are being updated for the reporter instance + * Object dump is being taken and saved at the reporter instance (as long as + there is no other dump which is already stored) + * Auto recovery attempt is being done. Depends on: + - Auto-recovery configuration + - Grace period vs. time passed since last recover + +User Interface +============== + +User can access/change each reporter's parameters and driver specific callbacks +via ``devlink``, e.g per error type (per health reporter): + + * Configure reporter's generic parameters (like: disable/enable auto recovery) + * Invoke recovery procedure + * Run diagnostics + * Object dump + +.. list-table:: List of devlink health interfaces + :widths: 10 90 + + * - Name + - Description + * - ``DEVLINK_CMD_HEALTH_REPORTER_GET`` + - Retrieves status and configuration info per DEV and reporter. + * - ``DEVLINK_CMD_HEALTH_REPORTER_SET`` + - Allows reporter-related configuration setting. + * - ``DEVLINK_CMD_HEALTH_REPORTER_RECOVER`` + - Triggers a reporter's recovery procedure. + * - ``DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE`` + - Retrieves diagnostics data from a reporter on a device. + * - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET`` + - Retrieves the last stored dump. Devlink health + saves a single dump. If an dump is not already stored by the devlink + for this reporter, devlink generates a new dump. + dump output is defined by the reporter. + * - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR`` + - Clears the last saved dump file for the specified reporter. + +The following diagram provides a general overview of ``devlink-health``:: + + netlink + +--------------------------+ + | | + | + | + | | | + +--------------------------+ + |request for ops + |(diagnose, + mlx5_core devlink |recover, + |dump) + +--------+ +--------------------------+ + | | | reporter| | + | | | +---------v----------+ | + | | ops execution | | | | + | <----------------------------------+ | | + | | | | | | + | | | + ^------------------+ | + | | | | request for ops | + | | | | (recover, dump) | + | | | | | + | | | +-+------------------+ | + | | health report | | health handler | | + | +-------------------------------> | | + | | | +--------------------+ | + | | health reporter create | | + | +----------------------------> | + +--------+ +--------------------------+ diff --git a/Documentation/networking/devlink/devlink-info.rst b/Documentation/networking/devlink/devlink-info.rst new file mode 100644 index 000000000..7572bf6de --- /dev/null +++ b/Documentation/networking/devlink/devlink-info.rst @@ -0,0 +1,210 @@ +.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) + +============ +Devlink Info +============ + +The ``devlink-info`` mechanism enables device drivers to report device +(hardware and firmware) information in a standard, extensible fashion. + +The original motivation for the ``devlink-info`` API was twofold: + + - making it possible to automate device and firmware management in a fleet + of machines in a vendor-independent fashion (see also + :ref:`Documentation/networking/devlink/devlink-flash.rst <devlink_flash>`); + - name the per component FW versions (as opposed to the crowded ethtool + version string). + +``devlink-info`` supports reporting multiple types of objects. Reporting driver +versions is generally discouraged - here, and via any other Linux API. + +.. list-table:: List of top level info objects + :widths: 5 95 + + * - Name + - Description + * - ``driver`` + - Name of the currently used device driver, also available through sysfs. + + * - ``serial_number`` + - Serial number of the device. + + This is usually the serial number of the ASIC, also often available + in PCI config space of the device in the *Device Serial Number* + capability. + + The serial number should be unique per physical device. + Sometimes the serial number of the device is only 48 bits long (the + length of the Ethernet MAC address), and since PCI DSN is 64 bits long + devices pad or encode additional information into the serial number. + One example is adding port ID or PCI interface ID in the extra two bytes. + Drivers should make sure to strip or normalize any such padding + or interface ID, and report only the part of the serial number + which uniquely identifies the hardware. In other words serial number + reported for two ports of the same device or on two hosts of + a multi-host device should be identical. + + * - ``board.serial_number`` + - Board serial number of the device. + + This is usually the serial number of the board, often available in + PCI *Vital Product Data*. + + * - ``fixed`` + - Group for hardware identifiers, and versions of components + which are not field-updatable. + + Versions in this section identify the device design. For example, + component identifiers or the board version reported in the PCI VPD. + Data in ``devlink-info`` should be broken into the smallest logical + components, e.g. PCI VPD may concatenate various information + to form the Part Number string, while in ``devlink-info`` all parts + should be reported as separate items. + + This group must not contain any frequently changing identifiers, + such as serial numbers. See + :ref:`Documentation/networking/devlink/devlink-flash.rst <devlink_flash>` + to understand why. + + * - ``running`` + - Group for information about currently running software/firmware. + These versions often only update after a reboot, sometimes device reset. + + * - ``stored`` + - Group for software/firmware versions in device flash. + + Stored values must update to reflect changes in the flash even + if reboot has not yet occurred. If device is not capable of updating + ``stored`` versions when new software is flashed, it must not report + them. + +Each version can be reported at most once in each version group. Firmware +components stored on the flash should feature in both the ``running`` and +``stored`` sections, if device is capable of reporting ``stored`` versions +(see :ref:`Documentation/networking/devlink/devlink-flash.rst <devlink_flash>`). +In case software/firmware components are loaded from the disk (e.g. +``/lib/firmware``) only the running version should be reported via +the kernel API. + +Generic Versions +================ + +It is expected that drivers use the following generic names for exporting +version information. If a generic name for a given component doesn't exist yet, +driver authors should consult existing driver-specific versions and attempt +reuse. As last resort, if a component is truly unique, using driver-specific +names is allowed, but these should be documented in the driver-specific file. + +All versions should try to use the following terminology: + +.. list-table:: List of common version suffixes + :widths: 10 90 + + * - Name + - Description + * - ``id``, ``revision`` + - Identifiers of designs and revision, mostly used for hardware versions. + + * - ``api`` + - Version of API between components. API items are usually of limited + value to the user, and can be inferred from other versions by the vendor, + so adding API versions is generally discouraged as noise. + + * - ``bundle_id`` + - Identifier of a distribution package which was flashed onto the device. + This is an attribute of a firmware package which covers multiple versions + for ease of managing firmware images (see + :ref:`Documentation/networking/devlink/devlink-flash.rst <devlink_flash>`). + + ``bundle_id`` can appear in both ``running`` and ``stored`` versions, + but it must not be reported if any of the components covered by the + ``bundle_id`` was changed and no longer matches the version from + the bundle. + +board.id +-------- + +Unique identifier of the board design. + +board.rev +--------- + +Board design revision. + +asic.id +------- + +ASIC design identifier. + +asic.rev +-------- + +ASIC design revision/stepping. + +board.manufacture +----------------- + +An identifier of the company or the facility which produced the part. + +fw +-- + +Overall firmware version, often representing the collection of +fw.mgmt, fw.app, etc. + +fw.mgmt +------- + +Control unit firmware version. This firmware is responsible for house +keeping tasks, PHY control etc. but not the packet-by-packet data path +operation. + +fw.mgmt.api +----------- + +Firmware interface specification version of the software interfaces between +driver and firmware. + +fw.app +------ + +Data path microcode controlling high-speed packet processing. + +fw.undi +------- + +UNDI software, may include the UEFI driver, firmware or both. + +fw.ncsi +------- + +Version of the software responsible for supporting/handling the +Network Controller Sideband Interface. + +fw.psid +------- + +Unique identifier of the firmware parameter set. These are usually +parameters of a particular board, defined at manufacturing time. + +fw.roce +------- + +RoCE firmware version which is responsible for handling roce +management. + +fw.bundle_id +------------ + +Unique identifier of the entire firmware bundle. + +Future work +=========== + +The following extensions could be useful: + + - on-disk firmware file names - drivers list the file names of firmware they + may need to load onto devices via the ``MODULE_FIRMWARE()`` macro. These, + however, are per module, rather than per device. It'd be useful to list + the names of firmware files the driver will try to load for a given device, + in order of priority. diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst new file mode 100644 index 000000000..54c9f107c --- /dev/null +++ b/Documentation/networking/devlink/devlink-params.rst @@ -0,0 +1,116 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============== +Devlink Params +============== + +``devlink`` provides capability for a driver to expose device parameters for low +level device functionality. Since devlink can operate at the device-wide +level, it can be used to provide configuration that may affect multiple +ports on a single device. + +This document describes a number of generic parameters that are supported +across multiple drivers. Each driver is also free to add their own +parameters. Each driver must document the specific parameters they support, +whether generic or not. + +Configuration modes +=================== + +Parameters may be set in different configuration modes. + +.. list-table:: Possible configuration modes + :widths: 5 90 + + * - Name + - Description + * - ``runtime`` + - set while the driver is running, and takes effect immediately. No + reset is required. + * - ``driverinit`` + - applied while the driver initializes. Requires the user to restart + the driver using the ``devlink`` reload command. + * - ``permanent`` + - written to the device's non-volatile memory. A hard reset is required + for it to take effect. + +Reloading +--------- + +In order for ``driverinit`` parameters to take effect, the driver must +support reloading via the ``devlink-reload`` command. This command will +request a reload of the device driver. + +.. _devlink_params_generic: + +Generic configuration parameters +================================ +The following is a list of generic configuration parameters that drivers may +add. Use of generic parameters is preferred over each driver creating their +own name. + +.. list-table:: List of generic parameters + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``enable_sriov`` + - Boolean + - Enable Single Root I/O Virtualization (SRIOV) in the device. + * - ``ignore_ari`` + - Boolean + - Ignore Alternative Routing-ID Interpretation (ARI) capability. If + enabled, the adapter will ignore ARI capability even when the + platform has support enabled. The device will create the same number + of partitions as when the platform does not support ARI. + * - ``msix_vec_per_pf_max`` + - u32 + - Provides the maximum number of MSI-X interrupts that a device can + create. Value is the same across all physical functions (PFs) in the + device. + * - ``msix_vec_per_pf_min`` + - u32 + - Provides the minimum number of MSI-X interrupts required for the + device to initialize. Value is the same across all physical functions + (PFs) in the device. + * - ``fw_load_policy`` + - u8 + - Control the device's firmware loading policy. + - ``DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DRIVER`` (0) + Load firmware version preferred by the driver. + - ``DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH`` (1) + Load firmware currently stored in flash. + - ``DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DISK`` (2) + Load firmware currently available on host's disk. + * - ``reset_dev_on_drv_probe`` + - u8 + - Controls the device's reset policy on driver probe. + - ``DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_UNKNOWN`` (0) + Unknown or invalid value. + - ``DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_ALWAYS`` (1) + Always reset device on driver probe. + - ``DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_NEVER`` (2) + Never reset device on driver probe. + - ``DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_DISK`` (3) + Reset the device only if firmware can be found in the filesystem. + * - ``enable_roce`` + - Boolean + - Enable handling of RoCE traffic in the device. + * - ``internal_err_reset`` + - Boolean + - When enabled, the device driver will reset the device on internal + errors. + * - ``max_macs`` + - u32 + - Specifies the maximum number of MAC addresses per ethernet port of + this device. + * - ``region_snapshot_enable`` + - Boolean + - Enable capture of ``devlink-region`` snapshots. + * - ``enable_remote_dev_reset`` + - Boolean + - Enable device reset by remote host. When cleared, the device driver + will NACK any attempt of other host to reset the device. This parameter + is useful for setups where a device is shared by different hosts, such + as multi-host setup. diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst new file mode 100644 index 000000000..3654c3e96 --- /dev/null +++ b/Documentation/networking/devlink/devlink-region.rst @@ -0,0 +1,70 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============== +Devlink Region +============== + +``devlink`` regions enable access to driver defined address regions using +devlink. + +Each device can create and register its own supported address regions. The +region can then be accessed via the devlink region interface. + +Region snapshots are collected by the driver, and can be accessed via read +or dump commands. This allows future analysis on the created snapshots. +Regions may optionally support triggering snapshots on demand. + +Snapshot identifiers are scoped to the devlink instance, not a region. +All snapshots with the same snapshot id within a devlink instance +correspond to the same event. + +The major benefit to creating a region is to provide access to internal +address regions that are otherwise inaccessible to the user. + +Regions may also be used to provide an additional way to debug complex error +states, but see also :doc:`devlink-health` + +Regions may optionally support capturing a snapshot on demand via the +``DEVLINK_CMD_REGION_NEW`` netlink message. A driver wishing to allow +requested snapshots must implement the ``.snapshot`` callback for the region +in its ``devlink_region_ops`` structure. If snapshot id is not set in +the ``DEVLINK_CMD_REGION_NEW`` request kernel will allocate one and send +the snapshot information to user space. + +example usage +------------- + +.. code:: shell + + $ devlink region help + $ devlink region show [ DEV/REGION ] + $ devlink region del DEV/REGION snapshot SNAPSHOT_ID + $ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ] + $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] address ADDRESS length length + + # Show all of the exposed regions with region sizes: + $ devlink region show + pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2] + pci/0000:00:05.0/fw-health: size 64 snapshot [1 2] + + # Delete a snapshot using: + $ devlink region del pci/0000:00:05.0/cr-space snapshot 1 + + # Request an immediate snapshot, if supported by the region + $ devlink region new pci/0000:00:05.0/cr-space + pci/0000:00:05.0/cr-space: snapshot 5 + + # Dump a snapshot: + $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1 + 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 + 0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8 + 0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc + 0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5 + + # Read a specific part of a snapshot: + $ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16 + 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 + +As regions are likely very device or driver specific, no generic regions are +defined. See the driver-specific documentation files for information on the +specific regions a driver supports. diff --git a/Documentation/networking/devlink/devlink-reload.rst b/Documentation/networking/devlink/devlink-reload.rst new file mode 100644 index 000000000..505d22da0 --- /dev/null +++ b/Documentation/networking/devlink/devlink-reload.rst @@ -0,0 +1,81 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============== +Devlink Reload +============== + +``devlink-reload`` provides mechanism to reinit driver entities, applying +``devlink-params`` and ``devlink-resources`` new values. It also provides +mechanism to activate firmware. + +Reload Actions +============== + +User may select a reload action. +By default ``driver_reinit`` action is selected. + +.. list-table:: Possible reload actions + :widths: 5 90 + + * - Name + - Description + * - ``driver-reinit`` + - Devlink driver entities re-initialization, including applying + new values to devlink entities which are used during driver + load such as ``devlink-params`` in configuration mode + ``driverinit`` or ``devlink-resources`` + * - ``fw_activate`` + - Firmware activate. Activates new firmware if such image is stored and + pending activation. If no limitation specified this action may involve + firmware reset. If no new image pending this action will reload current + firmware image. + +Note that even though user asks for a specific action, the driver +implementation might require to perform another action alongside with +it. For example, some driver do not support driver reinitialization +being performed without fw activation. Therefore, the devlink reload +command returns the list of actions which were actrually performed. + +Reload Limits +============= + +By default reload actions are not limited and driver implementation may +include reset or downtime as needed to perform the actions. + +However, some drivers support action limits, which limit the action +implementation to specific constraints. + +.. list-table:: Possible reload limits + :widths: 5 90 + + * - Name + - Description + * - ``no_reset`` + - No reset allowed, no down time allowed, no link flap and no + configuration is lost. + +Change Namespace +================ + +The netns option allows user to be able to move devlink instances into +namespaces during devlink reload operation. +By default all devlink instances are created in init_net and stay there. + +example usage +------------- + +.. code:: shell + + $ devlink dev reload help + $ devlink dev reload DEV [ netns { PID | NAME | ID } ] [ action { driver_reinit | fw_activate } ] [ limit no_reset ] + + # Run reload command for devlink driver entities re-initialization: + $ devlink dev reload pci/0000:82:00.0 action driver_reinit + reload_actions_performed: + driver_reinit + + # Run reload command to activate firmware: + # Note that mlx5 driver reloads the driver while activating firmware + $ devlink dev reload pci/0000:82:00.0 action fw_activate + reload_actions_performed: + driver_reinit fw_activate diff --git a/Documentation/networking/devlink/devlink-resource.rst b/Documentation/networking/devlink/devlink-resource.rst new file mode 100644 index 000000000..93e92d2f0 --- /dev/null +++ b/Documentation/networking/devlink/devlink-resource.rst @@ -0,0 +1,62 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================ +Devlink Resource +================ + +``devlink`` provides the ability for drivers to register resources, which +can allow administrators to see the device restrictions for a given +resource, as well as how much of the given resource is currently +in use. Additionally, these resources can optionally have configurable size. +This could enable the administrator to limit the number of resources that +are used. + +For example, the ``netdevsim`` driver enables ``/IPv4/fib`` and +``/IPv4/fib-rules`` as resources to limit the number of IPv4 FIB entries and +rules for a given device. + +Resource Ids +============ + +Each resource is represented by an id, and contains information about its +current size and related sub resources. To access a sub resource, you +specify the path of the resource. For example ``/IPv4/fib`` is the id for +the ``fib`` sub-resource under the ``IPv4`` resource. + +example usage +------------- + +The resources exposed by the driver can be observed, for example: + +.. code:: shell + + $devlink resource show pci/0000:03:00.0 + pci/0000:03:00.0: + name kvd size 245760 unit entry + resources: + name linear size 98304 occ 0 unit entry size_min 0 size_max 147456 size_gran 128 + name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128 + name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128 + +Some resource's size can be changed. Examples: + +.. code:: shell + + $devlink resource set pci/0000:03:00.0 path /kvd/hash_single size 73088 + $devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 74368 + +The changes do not apply immediately, this can be validated by the 'size_new' +attribute, which represents the pending change in size. For example: + +.. code:: shell + + $devlink resource show pci/0000:03:00.0 + pci/0000:03:00.0: + name kvd size 245760 unit entry size_valid false + resources: + name linear size 98304 size_new 147456 occ 0 unit entry size_min 0 size_max 147456 size_gran 128 + name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128 + name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128 + +Note that changes in resource size may require a device reload to properly +take effect. diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst new file mode 100644 index 000000000..ef719ceac --- /dev/null +++ b/Documentation/networking/devlink/devlink-trap.rst @@ -0,0 +1,617 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============ +Devlink Trap +============ + +Background +========== + +Devices capable of offloading the kernel's datapath and perform functions such +as bridging and routing must also be able to send specific packets to the +kernel (i.e., the CPU) for processing. + +For example, a device acting as a multicast-aware bridge must be able to send +IGMP membership reports to the kernel for processing by the bridge module. +Without processing such packets, the bridge module could never populate its +MDB. + +As another example, consider a device acting as router which has received an IP +packet with a TTL of 1. Upon routing the packet the device must send it to the +kernel so that it will route it as well and generate an ICMP Time Exceeded +error datagram. Without letting the kernel route such packets itself, utilities +such as ``traceroute`` could never work. + +The fundamental ability of sending certain packets to the kernel for processing +is called "packet trapping". + +Overview +======== + +The ``devlink-trap`` mechanism allows capable device drivers to register their +supported packet traps with ``devlink`` and report trapped packets to +``devlink`` for further analysis. + +Upon receiving trapped packets, ``devlink`` will perform a per-trap packets and +bytes accounting and potentially report the packet to user space via a netlink +event along with all the provided metadata (e.g., trap reason, timestamp, input +port). This is especially useful for drop traps (see :ref:`Trap-Types`) +as it allows users to obtain further visibility into packet drops that would +otherwise be invisible. + +The following diagram provides a general overview of ``devlink-trap``:: + + Netlink event: Packet w/ metadata + Or a summary of recent drops + ^ + | + Userspace | + +---------------------------------------------------+ + Kernel | + | + +-------+--------+ + | | + | drop_monitor | + | | + +-------^--------+ + | + | Non-control traps + | + +----+----+ + | | Kernel's Rx path + | devlink | (non-drop traps) + | | + +----^----+ ^ + | | + +-----------+ + | + +-------+-------+ + | | + | Device driver | + | | + +-------^-------+ + Kernel | + +---------------------------------------------------+ + Hardware | + | Trapped packet + | + +--+---+ + | | + | ASIC | + | | + +------+ + +.. _Trap-Types: + +Trap Types +========== + +The ``devlink-trap`` mechanism supports the following packet trap types: + + * ``drop``: Trapped packets were dropped by the underlying device. Packets + are only processed by ``devlink`` and not injected to the kernel's Rx path. + The trap action (see :ref:`Trap-Actions`) can be changed. + * ``exception``: Trapped packets were not forwarded as intended by the + underlying device due to an exception (e.g., TTL error, missing neighbour + entry) and trapped to the control plane for resolution. Packets are + processed by ``devlink`` and injected to the kernel's Rx path. Changing the + action of such traps is not allowed, as it can easily break the control + plane. + * ``control``: Trapped packets were trapped by the device because these are + control packets required for the correct functioning of the control plane. + For example, ARP request and IGMP query packets. Packets are injected to + the kernel's Rx path, but not reported to the kernel's drop monitor. + Changing the action of such traps is not allowed, as it can easily break + the control plane. + +.. _Trap-Actions: + +Trap Actions +============ + +The ``devlink-trap`` mechanism supports the following packet trap actions: + + * ``trap``: The sole copy of the packet is sent to the CPU. + * ``drop``: The packet is dropped by the underlying device and a copy is not + sent to the CPU. + * ``mirror``: The packet is forwarded by the underlying device and a copy is + sent to the CPU. + +Generic Packet Traps +==================== + +Generic packet traps are used to describe traps that trap well-defined packets +or packets that are trapped due to well-defined conditions (e.g., TTL error). +Such traps can be shared by multiple device drivers and their description must +be added to the following table: + +.. list-table:: List of Generic Packet Traps + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``source_mac_is_multicast`` + - ``drop`` + - Traps incoming packets that the device decided to drop because of a + multicast source MAC + * - ``vlan_tag_mismatch`` + - ``drop`` + - Traps incoming packets that the device decided to drop in case of VLAN + tag mismatch: The ingress bridge port is not configured with a PVID and + the packet is untagged or prio-tagged + * - ``ingress_vlan_filter`` + - ``drop`` + - Traps incoming packets that the device decided to drop in case they are + tagged with a VLAN that is not configured on the ingress bridge port + * - ``ingress_spanning_tree_filter`` + - ``drop`` + - Traps incoming packets that the device decided to drop in case the STP + state of the ingress bridge port is not "forwarding" + * - ``port_list_is_empty`` + - ``drop`` + - Traps packets that the device decided to drop in case they need to be + flooded (e.g., unknown unicast, unregistered multicast) and there are + no ports the packets should be flooded to + * - ``port_loopback_filter`` + - ``drop`` + - Traps packets that the device decided to drop in case after layer 2 + forwarding the only port from which they should be transmitted through + is the port from which they were received + * - ``blackhole_route`` + - ``drop`` + - Traps packets that the device decided to drop in case they hit a + blackhole route + * - ``ttl_value_is_too_small`` + - ``exception`` + - Traps unicast packets that should be forwarded by the device whose TTL + was decremented to 0 or less + * - ``tail_drop`` + - ``drop`` + - Traps packets that the device decided to drop because they could not be + enqueued to a transmission queue which is full + * - ``non_ip`` + - ``drop`` + - Traps packets that the device decided to drop because they need to + undergo a layer 3 lookup, but are not IP or MPLS packets + * - ``uc_dip_over_mc_dmac`` + - ``drop`` + - Traps packets that the device decided to drop because they need to be + routed and they have a unicast destination IP and a multicast destination + MAC + * - ``dip_is_loopback_address`` + - ``drop`` + - Traps packets that the device decided to drop because they need to be + routed and their destination IP is the loopback address (i.e., 127.0.0.0/8 + and ::1/128) + * - ``sip_is_mc`` + - ``drop`` + - Traps packets that the device decided to drop because they need to be + routed and their source IP is multicast (i.e., 224.0.0.0/8 and ff::/8) + * - ``sip_is_loopback_address`` + - ``drop`` + - Traps packets that the device decided to drop because they need to be + routed and their source IP is the loopback address (i.e., 127.0.0.0/8 and ::1/128) + * - ``ip_header_corrupted`` + - ``drop`` + - Traps packets that the device decided to drop because they need to be + routed and their IP header is corrupted: wrong checksum, wrong IP version + or too short Internet Header Length (IHL) + * - ``ipv4_sip_is_limited_bc`` + - ``drop`` + - Traps packets that the device decided to drop because they need to be + routed and their source IP is limited broadcast (i.e., 255.255.255.255/32) + * - ``ipv6_mc_dip_reserved_scope`` + - ``drop`` + - Traps IPv6 packets that the device decided to drop because they need to + be routed and their IPv6 multicast destination IP has a reserved scope + (i.e., ffx0::/16) + * - ``ipv6_mc_dip_interface_local_scope`` + - ``drop`` + - Traps IPv6 packets that the device decided to drop because they need to + be routed and their IPv6 multicast destination IP has an interface-local scope + (i.e., ffx1::/16) + * - ``mtu_value_is_too_small`` + - ``exception`` + - Traps packets that should have been routed by the device, but were bigger + than the MTU of the egress interface + * - ``unresolved_neigh`` + - ``exception`` + - Traps packets that did not have a matching IP neighbour after routing + * - ``mc_reverse_path_forwarding`` + - ``exception`` + - Traps multicast IP packets that failed reverse-path forwarding (RPF) + check during multicast routing + * - ``reject_route`` + - ``exception`` + - Traps packets that hit reject routes (i.e., "unreachable", "prohibit") + * - ``ipv4_lpm_miss`` + - ``exception`` + - Traps unicast IPv4 packets that did not match any route + * - ``ipv6_lpm_miss`` + - ``exception`` + - Traps unicast IPv6 packets that did not match any route + * - ``non_routable_packet`` + - ``drop`` + - Traps packets that the device decided to drop because they are not + supposed to be routed. For example, IGMP queries can be flooded by the + device in layer 2 and reach the router. Such packets should not be + routed and instead dropped + * - ``decap_error`` + - ``exception`` + - Traps NVE and IPinIP packets that the device decided to drop because of + failure during decapsulation (e.g., packet being too short, reserved + bits set in VXLAN header) + * - ``overlay_smac_is_mc`` + - ``drop`` + - Traps NVE packets that the device decided to drop because their overlay + source MAC is multicast + * - ``ingress_flow_action_drop`` + - ``drop`` + - Traps packets dropped during processing of ingress flow action drop + * - ``egress_flow_action_drop`` + - ``drop`` + - Traps packets dropped during processing of egress flow action drop + * - ``stp`` + - ``control`` + - Traps STP packets + * - ``lacp`` + - ``control`` + - Traps LACP packets + * - ``lldp`` + - ``control`` + - Traps LLDP packets + * - ``igmp_query`` + - ``control`` + - Traps IGMP Membership Query packets + * - ``igmp_v1_report`` + - ``control`` + - Traps IGMP Version 1 Membership Report packets + * - ``igmp_v2_report`` + - ``control`` + - Traps IGMP Version 2 Membership Report packets + * - ``igmp_v3_report`` + - ``control`` + - Traps IGMP Version 3 Membership Report packets + * - ``igmp_v2_leave`` + - ``control`` + - Traps IGMP Version 2 Leave Group packets + * - ``mld_query`` + - ``control`` + - Traps MLD Multicast Listener Query packets + * - ``mld_v1_report`` + - ``control`` + - Traps MLD Version 1 Multicast Listener Report packets + * - ``mld_v2_report`` + - ``control`` + - Traps MLD Version 2 Multicast Listener Report packets + * - ``mld_v1_done`` + - ``control`` + - Traps MLD Version 1 Multicast Listener Done packets + * - ``ipv4_dhcp`` + - ``control`` + - Traps IPv4 DHCP packets + * - ``ipv6_dhcp`` + - ``control`` + - Traps IPv6 DHCP packets + * - ``arp_request`` + - ``control`` + - Traps ARP request packets + * - ``arp_response`` + - ``control`` + - Traps ARP response packets + * - ``arp_overlay`` + - ``control`` + - Traps NVE-decapsulated ARP packets that reached the overlay network. + This is required, for example, when the address that needs to be + resolved is a local address + * - ``ipv6_neigh_solicit`` + - ``control`` + - Traps IPv6 Neighbour Solicitation packets + * - ``ipv6_neigh_advert`` + - ``control`` + - Traps IPv6 Neighbour Advertisement packets + * - ``ipv4_bfd`` + - ``control`` + - Traps IPv4 BFD packets + * - ``ipv6_bfd`` + - ``control`` + - Traps IPv6 BFD packets + * - ``ipv4_ospf`` + - ``control`` + - Traps IPv4 OSPF packets + * - ``ipv6_ospf`` + - ``control`` + - Traps IPv6 OSPF packets + * - ``ipv4_bgp`` + - ``control`` + - Traps IPv4 BGP packets + * - ``ipv6_bgp`` + - ``control`` + - Traps IPv6 BGP packets + * - ``ipv4_vrrp`` + - ``control`` + - Traps IPv4 VRRP packets + * - ``ipv6_vrrp`` + - ``control`` + - Traps IPv6 VRRP packets + * - ``ipv4_pim`` + - ``control`` + - Traps IPv4 PIM packets + * - ``ipv6_pim`` + - ``control`` + - Traps IPv6 PIM packets + * - ``uc_loopback`` + - ``control`` + - Traps unicast packets that need to be routed through the same layer 3 + interface from which they were received. Such packets are routed by the + kernel, but also cause it to potentially generate ICMP redirect packets + * - ``local_route`` + - ``control`` + - Traps unicast packets that hit a local route and need to be locally + delivered + * - ``external_route`` + - ``control`` + - Traps packets that should be routed through an external interface (e.g., + management interface) that does not belong to the same device (e.g., + switch ASIC) as the ingress interface + * - ``ipv6_uc_dip_link_local_scope`` + - ``control`` + - Traps unicast IPv6 packets that need to be routed and have a destination + IP address with a link-local scope (i.e., fe80::/10). The trap allows + device drivers to avoid programming link-local routes, but still receive + packets for local delivery + * - ``ipv6_dip_all_nodes`` + - ``control`` + - Traps IPv6 packets that their destination IP address is the "All Nodes + Address" (i.e., ff02::1) + * - ``ipv6_dip_all_routers`` + - ``control`` + - Traps IPv6 packets that their destination IP address is the "All Routers + Address" (i.e., ff02::2) + * - ``ipv6_router_solicit`` + - ``control`` + - Traps IPv6 Router Solicitation packets + * - ``ipv6_router_advert`` + - ``control`` + - Traps IPv6 Router Advertisement packets + * - ``ipv6_redirect`` + - ``control`` + - Traps IPv6 Redirect Message packets + * - ``ipv4_router_alert`` + - ``control`` + - Traps IPv4 packets that need to be routed and include the Router Alert + option. Such packets need to be locally delivered to raw sockets that + have the IP_ROUTER_ALERT socket option set + * - ``ipv6_router_alert`` + - ``control`` + - Traps IPv6 packets that need to be routed and include the Router Alert + option in their Hop-by-Hop extension header. Such packets need to be + locally delivered to raw sockets that have the IPV6_ROUTER_ALERT socket + option set + * - ``ptp_event`` + - ``control`` + - Traps PTP time-critical event messages (Sync, Delay_req, Pdelay_Req and + Pdelay_Resp) + * - ``ptp_general`` + - ``control`` + - Traps PTP general messages (Announce, Follow_Up, Delay_Resp, + Pdelay_Resp_Follow_Up, management and signaling) + * - ``flow_action_sample`` + - ``control`` + - Traps packets sampled during processing of flow action sample (e.g., via + tc's sample action) + * - ``flow_action_trap`` + - ``control`` + - Traps packets logged during processing of flow action trap (e.g., via + tc's trap action) + * - ``early_drop`` + - ``drop`` + - Traps packets dropped due to the RED (Random Early Detection) algorithm + (i.e., early drops) + * - ``vxlan_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the VXLAN header parsing which + might be because of packet truncation or the I flag is not set. + * - ``llc_snap_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the LLC+SNAP header parsing + * - ``vlan_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the VLAN header parsing. Could + include unexpected packet truncation. + * - ``pppoe_ppp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the PPPoE+PPP header parsing. + This could include finding a session ID of 0xFFFF (which is reserved and + not for use), a PPPoE length which is larger than the frame received or + any common error on this type of header + * - ``mpls_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the MPLS header parsing which + could include unexpected header truncation + * - ``arp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the ARP header parsing + * - ``ip_1_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the first IP header parsing. + This packet trap could include packets which do not pass an IP checksum + check, a header length check (a minimum of 20 bytes), which might suffer + from packet truncation thus the total length field exceeds the received + packet length etc + * - ``ip_n_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the parsing of the last IP + header (the inner one in case of an IP over IP tunnel). The same common + error checking is performed here as for the ip_1_parsing trap + * - ``gre_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the GRE header parsing + * - ``udp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the UDP header parsing. + This packet trap could include checksum errorrs, an improper UDP + length detected (smaller than 8 bytes) or detection of header + truncation. + * - ``tcp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the TCP header parsing. + This could include TCP checksum errors, improper combination of SYN, FIN + and/or RESET etc. + * - ``ipsec_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the IPSEC header parsing + * - ``sctp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the SCTP header parsing. + This would mean that port number 0 was used or that the header is + truncated. + * - ``dccp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the DCCP header parsing + * - ``gtp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the GTP header parsing + * - ``esp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the ESP header parsing + +Driver-specific Packet Traps +============================ + +Device drivers can register driver-specific packet traps, but these must be +clearly documented. Such traps can correspond to device-specific exceptions and +help debug packet drops caused by these exceptions. The following list includes +links to the description of driver-specific traps registered by various device +drivers: + + * :doc:`netdevsim` + * :doc:`mlxsw` + +.. _Generic-Packet-Trap-Groups: + +Generic Packet Trap Groups +========================== + +Generic packet trap groups are used to aggregate logically related packet +traps. These groups allow the user to batch operations such as setting the trap +action of all member traps. In addition, ``devlink-trap`` can report aggregated +per-group packets and bytes statistics, in case per-trap statistics are too +narrow. The description of these groups must be added to the following table: + +.. list-table:: List of Generic Packet Trap Groups + :widths: 10 90 + + * - Name + - Description + * - ``l2_drops`` + - Contains packet traps for packets that were dropped by the device during + layer 2 forwarding (i.e., bridge) + * - ``l3_drops`` + - Contains packet traps for packets that were dropped by the device during + layer 3 forwarding + * - ``l3_exceptions`` + - Contains packet traps for packets that hit an exception (e.g., TTL + error) during layer 3 forwarding + * - ``buffer_drops`` + - Contains packet traps for packets that were dropped by the device due to + an enqueue decision + * - ``tunnel_drops`` + - Contains packet traps for packets that were dropped by the device during + tunnel encapsulation / decapsulation + * - ``acl_drops`` + - Contains packet traps for packets that were dropped by the device during + ACL processing + * - ``stp`` + - Contains packet traps for STP packets + * - ``lacp`` + - Contains packet traps for LACP packets + * - ``lldp`` + - Contains packet traps for LLDP packets + * - ``mc_snooping`` + - Contains packet traps for IGMP and MLD packets required for multicast + snooping + * - ``dhcp`` + - Contains packet traps for DHCP packets + * - ``neigh_discovery`` + - Contains packet traps for neighbour discovery packets (e.g., ARP, IPv6 + ND) + * - ``bfd`` + - Contains packet traps for BFD packets + * - ``ospf`` + - Contains packet traps for OSPF packets + * - ``bgp`` + - Contains packet traps for BGP packets + * - ``vrrp`` + - Contains packet traps for VRRP packets + * - ``pim`` + - Contains packet traps for PIM packets + * - ``uc_loopback`` + - Contains a packet trap for unicast loopback packets (i.e., + ``uc_loopback``). This trap is singled-out because in cases such as + one-armed router it will be constantly triggered. To limit the impact on + the CPU usage, a packet trap policer with a low rate can be bound to the + group without affecting other traps + * - ``local_delivery`` + - Contains packet traps for packets that should be locally delivered after + routing, but do not match more specific packet traps (e.g., + ``ipv4_bgp``) + * - ``external_delivery`` + - Contains packet traps for packets that should be routed through an + external interface (e.g., management interface) that does not belong to + the same device (e.g., switch ASIC) as the ingress interface + * - ``ipv6`` + - Contains packet traps for various IPv6 control packets (e.g., Router + Advertisements) + * - ``ptp_event`` + - Contains packet traps for PTP time-critical event messages (Sync, + Delay_req, Pdelay_Req and Pdelay_Resp) + * - ``ptp_general`` + - Contains packet traps for PTP general messages (Announce, Follow_Up, + Delay_Resp, Pdelay_Resp_Follow_Up, management and signaling) + * - ``acl_sample`` + - Contains packet traps for packets that were sampled by the device during + ACL processing + * - ``acl_trap`` + - Contains packet traps for packets that were trapped (logged) by the + device during ACL processing + * - ``parser_error_drops`` + - Contains packet traps for packets that were marked by the device during + parsing as erroneous + +Packet Trap Policers +==================== + +As previously explained, the underlying device can trap certain packets to the +CPU for processing. In most cases, the underlying device is capable of handling +packet rates that are several orders of magnitude higher compared to those that +can be handled by the CPU. + +Therefore, in order to prevent the underlying device from overwhelming the CPU, +devices usually include packet trap policers that are able to police the +trapped packets to rates that can be handled by the CPU. + +The ``devlink-trap`` mechanism allows capable device drivers to register their +supported packet trap policers with ``devlink``. The device driver can choose +to associate these policers with supported packet trap groups (see +:ref:`Generic-Packet-Trap-Groups`) during its initialization, thereby exposing +its default control plane policy to user space. + +Device drivers should allow user space to change the parameters of the policers +(e.g., rate, burst size) as well as the association between the policers and +trap groups by implementing the relevant callbacks. + +If possible, device drivers should implement a callback that allows user space +to retrieve the number of packets that were dropped by the policer because its +configured policy was violated. + +Testing +======= + +See ``tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh`` for a +test covering the core infrastructure. Test cases should be added for any new +functionality. + +Device drivers should focus their tests on device-specific functionality, such +as the triggering of supported packet traps. diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst new file mode 100644 index 000000000..a432dc419 --- /dev/null +++ b/Documentation/networking/devlink/ice.rst @@ -0,0 +1,195 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================== +ice devlink support +=================== + +This document describes the devlink features implemented by the ``ice`` +device driver. + +Info versions +============= + +The ``ice`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 5 90 + + * - Name + - Type + - Example + - Description + * - ``board.id`` + - fixed + - K65390-000 + - The Product Board Assembly (PBA) identifier of the board. + * - ``fw.mgmt`` + - running + - 2.1.7 + - 3-digit version number of the management firmware that controls the + PHY, link, etc. + * - ``fw.mgmt.api`` + - running + - 1.5 + - 2-digit version number of the API exported over the AdminQ by the + management firmware. Used by the driver to identify what commands + are supported. + * - ``fw.mgmt.build`` + - running + - 0x305d955f + - Unique identifier of the source for the management firmware. + * - ``fw.undi`` + - running + - 1.2581.0 + - Version of the Option ROM containing the UEFI driver. The version is + reported in ``major.minor.patch`` format. The major version is + incremented whenever a major breaking change occurs, or when the + minor version would overflow. The minor version is incremented for + non-breaking changes and reset to 1 when the major version is + incremented. The patch version is normally 0 but is incremented when + a fix is delivered as a patch against an older base Option ROM. + * - ``fw.psid.api`` + - running + - 0.80 + - Version defining the format of the flash contents. + * - ``fw.bundle_id`` + - running + - 0x80002ec0 + - Unique identifier of the firmware image file that was loaded onto + the device. Also referred to as the EETRACK identifier of the NVM. + * - ``fw.app.name`` + - running + - ICE OS Default Package + - The name of the DDP package that is active in the device. The DDP + package is loaded by the driver during initialization. Each + variation of the DDP package has a unique name. + * - ``fw.app`` + - running + - 1.3.1.0 + - The version of the DDP package that is active in the device. Note + that both the name (as reported by ``fw.app.name``) and version are + required to uniquely identify the package. + * - ``fw.app.bundle_id`` + - running + - 0xc0000001 + - Unique identifier for the DDP package loaded in the device. Also + referred to as the DDP Track ID. Can be used to uniquely identify + the specific DDP package. + * - ``fw.netlist`` + - running + - 1.1.2000-6.7.0 + - The version of the netlist module. This module defines the device's + Ethernet capabilities and default settings, and is used by the + management firmware as part of managing link and device + connectivity. + * - ``fw.netlist.build`` + - running + - 0xee16ced7 + - The first 4 bytes of the hash of the netlist module contents. + +Flash Update +============ + +The ``ice`` driver implements support for flash update using the +``devlink-flash`` interface. It supports updating the device flash using a +combined flash image that contains the ``fw.mgmt``, ``fw.undi``, and +``fw.netlist`` components. + +.. list-table:: List of supported overwrite modes + :widths: 5 95 + + * - Bits + - Behavior + * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` + - Do not preserve settings stored in the flash components being + updated. This includes overwriting the port configuration that + determines the number of physical functions the device will + initialize with. + * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` and ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` + - Do not preserve either settings or identifiers. Overwrite everything + in the flash with the contents from the provided image, without + performing any preservation. This includes overwriting device + identifying fields such as the MAC address, VPD area, and device + serial number. It is expected that this combination be used with an + image customized for the specific device. + +The ice hardware does not support overwriting only identifiers while +preserving settings, and thus ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` on its +own will be rejected. If no overwrite mask is provided, the firmware will be +instructed to preserve all settings and identifying fields when updating. + +Regions +======= + +The ``ice`` driver implements the following regions for accessing internal +device data. + +.. list-table:: regions implemented + :widths: 15 85 + + * - Name + - Description + * - ``nvm-flash`` + - The contents of the entire flash chip, sometimes referred to as + the device's Non Volatile Memory. + * - ``device-caps`` + - The contents of the device firmware's capabilities buffer. Useful to + determine the current state and configuration of the device. + +Users can request an immediate capture of a snapshot via the +``DEVLINK_CMD_REGION_NEW`` + +.. code:: shell + + $ devlink region new pci/0000:01:00.0/nvm-flash snapshot 1 + $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1 + + $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1 + 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 + 0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8 + 0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc + 0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5 + + $ devlink region read pci/0000:01:00.0/nvm-flash snapshot 1 address 0 length 16 + 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 + + $ devlink region delete pci/0000:01:00.0/nvm-flash snapshot 1 + + $ devlink region new pci/0000:01:00.0/device-caps snapshot 1 + $ devlink region dump pci/0000:01:00.0/device-caps snapshot 1 + 0000000000000000 01 00 01 00 00 00 00 00 01 00 00 00 00 00 00 00 + 0000000000000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000020 02 00 02 01 32 03 00 00 0a 00 00 00 25 00 00 00 + 0000000000000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000040 04 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000060 05 00 01 00 03 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000080 06 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 00000000000000a0 08 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 + 00000000000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 00000000000000c0 12 00 01 00 01 00 00 00 01 00 01 00 00 00 00 00 + 00000000000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 00000000000000e0 13 00 01 00 00 01 00 00 00 00 00 00 00 00 00 00 + 00000000000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000100 14 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000120 15 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000140 16 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000160 17 00 01 00 06 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000180 18 00 01 00 01 00 00 00 01 00 00 00 08 00 00 00 + 0000000000000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 00000000000001a0 22 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 + 00000000000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 00000000000001c0 40 00 01 00 00 08 00 00 08 00 00 00 00 00 00 00 + 00000000000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 00000000000001e0 41 00 01 00 00 08 00 00 00 00 00 00 00 00 00 00 + 00000000000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0000000000000200 42 00 01 00 00 08 00 00 00 00 00 00 00 00 00 00 + 0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + + $ devlink region delete pci/0000:01:00.0/device-caps snapshot 1 diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst new file mode 100644 index 000000000..d82874760 --- /dev/null +++ b/Documentation/networking/devlink/index.rst @@ -0,0 +1,46 @@ +Linux Devlink Documentation +=========================== + +devlink is an API to expose device information and resources not directly +related to any device class, such as chip-wide/switch-ASIC-wide configuration. + +Interface documentation +----------------------- + +The following pages describe various interfaces available through devlink in +general. + +.. toctree:: + :maxdepth: 1 + + devlink-dpipe + devlink-health + devlink-info + devlink-flash + devlink-params + devlink-region + devlink-resource + devlink-reload + devlink-trap + +Driver-specific documentation +----------------------------- + +Each driver that implements ``devlink`` is expected to document what +parameters, info versions, and other features it supports. + +.. toctree:: + :maxdepth: 1 + + bnxt + ionic + ice + mlx4 + mlx5 + mlxsw + mv88e6xxx + netdevsim + nfp + sja1105 + qed + ti-cpsw-switch diff --git a/Documentation/networking/devlink/ionic.rst b/Documentation/networking/devlink/ionic.rst new file mode 100644 index 000000000..48da9c92d --- /dev/null +++ b/Documentation/networking/devlink/ionic.rst @@ -0,0 +1,29 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================== +ionic devlink support +===================== + +This document describes the devlink features implemented by the ``ionic`` +device driver. + +Info versions +============= + +The ``ionic`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``fw`` + - running + - Version of firmware running on the device + * - ``asic.id`` + - fixed + - The ASIC type for this device + * - ``asic.rev`` + - fixed + - The revision of the ASIC for this device diff --git a/Documentation/networking/devlink/mlx4.rst b/Documentation/networking/devlink/mlx4.rst new file mode 100644 index 000000000..7b2d17ea5 --- /dev/null +++ b/Documentation/networking/devlink/mlx4.rst @@ -0,0 +1,56 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==================== +mlx4 devlink support +==================== + +This document describes the devlink features implemented by the ``mlx4`` +device driver. + +Parameters +========== + +.. list-table:: Generic parameters implemented + + * - Name + - Mode + * - ``internal_err_reset`` + - driverinit, runtime + * - ``max_macs`` + - driverinit + * - ``region_snapshot_enable`` + - driverinit, runtime + +The ``mlx4`` driver also implements the following driver-specific +parameters. + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``enable_64b_cqe_eqe`` + - Boolean + - driverinit + - Enable 64 byte CQEs/EQEs, if the FW supports it. + * - ``enable_4k_uar`` + - Boolean + - driverinit + - Enable using the 4k UAR. + +The ``mlx4`` driver supports reloading via ``DEVLINK_CMD_RELOAD`` + +Regions +======= + +The ``mlx4`` driver supports dumping the firmware PCI crspace and health +buffer during a critical firmware issue. + +In case a firmware command times out, firmware getting stuck, or a non zero +value on the catastrophic buffer, a snapshot will be taken by the driver. + +The ``cr-space`` region will contain the firmware PCI crspace contents. The +``fw-health`` region will contain the device firmware's health buffer. +Snapshots for both of these regions are taken on the same event triggers. diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst new file mode 100644 index 000000000..4e4b97f79 --- /dev/null +++ b/Documentation/networking/devlink/mlx5.rst @@ -0,0 +1,65 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==================== +mlx5 devlink support +==================== + +This document describes the devlink features implemented by the ``mlx5`` +device driver. + +Parameters +========== + +.. list-table:: Generic parameters implemented + + * - Name + - Mode + * - ``enable_roce`` + - driverinit + +The ``mlx5`` driver also implements the following driver-specific +parameters. + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``flow_steering_mode`` + - string + - runtime + - Controls the flow steering mode of the driver + + * ``dmfs`` Device managed flow steering. In DMFS mode, the HW + steering entities are created and managed through firmware. + * ``smfs`` Software managed flow steering. In SMFS mode, the HW + steering entities are created and manage through the driver without + firmware intervention. + * - ``fdb_large_groups`` + - u32 + - driverinit + - Control the number of large groups (size > 1) in the FDB table. + + * The default value is 15, and the range is between 1 and 1024. + +The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD`` + +Info versions +============= + +The ``mlx5`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``fw.psid`` + - fixed + - Used to represent the board id of the device. + * - ``fw.version`` + - stored, running + - Three digit major.minor.subminor firmware version number. diff --git a/Documentation/networking/devlink/mlxsw.rst b/Documentation/networking/devlink/mlxsw.rst new file mode 100644 index 000000000..cf857cb4b --- /dev/null +++ b/Documentation/networking/devlink/mlxsw.rst @@ -0,0 +1,81 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================== +mlxsw devlink support +===================== + +This document describes the devlink features implemented by the ``mlxsw`` +device driver. + +Parameters +========== + +.. list-table:: Generic parameters implemented + + * - Name + - Mode + * - ``fw_load_policy`` + - driverinit + +The ``mlxsw`` driver also implements the following driver-specific +parameters. + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``acl_region_rehash_interval`` + - u32 + - runtime + - Sets an interval for periodic ACL region rehashes. The value is + specified in milliseconds, with a minimum of ``3000``. The value of + ``0`` disables periodic work entirely. The first rehash will be run + immediately after the value is set. + +The ``mlxsw`` driver supports reloading via ``DEVLINK_CMD_RELOAD`` + +Info versions +============= + +The ``mlxsw`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``hw.revision`` + - fixed + - The hardware revision for this board + * - ``fw.psid`` + - fixed + - Firmware PSID + * - ``fw.version`` + - running + - Three digit firmware version + +Driver-specific Traps +===================== + +.. list-table:: List of Driver-specific Traps Registered by ``mlxsw`` + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``irif_disabled`` + - ``drop`` + - Traps packets that the device decided to drop because they need to be + routed from a disabled router interface (RIF). This can happen during + RIF dismantle, when the RIF is first disabled before being removed + completely + * - ``erif_disabled`` + - ``drop`` + - Traps packets that the device decided to drop because they need to be + routed through a disabled router interface (RIF). This can happen during + RIF dismantle, when the RIF is first disabled before being removed + completely diff --git a/Documentation/networking/devlink/mv88e6xxx.rst b/Documentation/networking/devlink/mv88e6xxx.rst new file mode 100644 index 000000000..c621212a4 --- /dev/null +++ b/Documentation/networking/devlink/mv88e6xxx.rst @@ -0,0 +1,28 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================= +mv88e6xxx devlink support +========================= + +This document describes the devlink features implemented by the ``mv88e6xxx`` +device driver. + +Parameters +========== + +The ``mv88e6xxx`` driver implements the following driver-specific parameters. + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``ATU_hash`` + - u8 + - runtime + - Select one of four possible hashing algorithms for MAC addresses in + the Address Translation Unit. A value of 3 may work better than the + default of 1 when many MAC addresses have the same OUI. Only the + values 0 to 3 are valid for this parameter. diff --git a/Documentation/networking/devlink/netdevsim.rst b/Documentation/networking/devlink/netdevsim.rst new file mode 100644 index 000000000..2a266b7e7 --- /dev/null +++ b/Documentation/networking/devlink/netdevsim.rst @@ -0,0 +1,72 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================= +netdevsim devlink support +========================= + +This document describes the ``devlink`` features supported by the +``netdevsim`` device driver. + +Parameters +========== + +.. list-table:: Generic parameters implemented + + * - Name + - Mode + * - ``max_macs`` + - driverinit + +The ``netdevsim`` driver also implements the following driver-specific +parameters. + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``test1`` + - Boolean + - driverinit + - Test parameter used to show how a driver-specific devlink parameter + can be implemented. + +The ``netdevsim`` driver supports reloading via ``DEVLINK_CMD_RELOAD`` + +Regions +======= + +The ``netdevsim`` driver exposes a ``dummy`` region as an example of how the +devlink-region interfaces work. A snapshot is taken whenever the +``take_snapshot`` debugfs file is written to. + +Resources +========= + +The ``netdevsim`` driver exposes resources to control the number of FIB +entries and FIB rule entries that the driver will allow. + +.. code:: shell + + $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 96 + $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib-rules size 16 + $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib size 64 + $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib-rules size 16 + $ devlink dev reload netdevsim/netdevsim0 + +Driver-specific Traps +===================== + +.. list-table:: List of Driver-specific Traps Registered by ``netdevsim`` + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``fid_miss`` + - ``exception`` + - When a packet enters the device it is classified to a filtering + indentifier (FID) based on the ingress port and VLAN. This trap is used + to trap packets for which a FID could not be found diff --git a/Documentation/networking/devlink/nfp.rst b/Documentation/networking/devlink/nfp.rst new file mode 100644 index 000000000..a1717db0d --- /dev/null +++ b/Documentation/networking/devlink/nfp.rst @@ -0,0 +1,65 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================== +nfp devlink support +=================== + +This document describes the devlink features implemented by the ``nfp`` +device driver. + +Parameters +========== + +.. list-table:: Generic parameters implemented + + * - Name + - Mode + * - ``fw_load_policy`` + - permanent + * - ``reset_dev_on_drv_probe`` + - permanent + +Info versions +============= + +The ``nfp`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``board.id`` + - fixed + - Part number identifying the board design + * - ``board.rev`` + - fixed + - Revision of the board design + * - ``board.manufacture`` + - fixed + - Vendor of the board design + * - ``board.model`` + - fixed + - Model name of the board design + * - ``fw.bundle_id`` + - stored, running + - Firmware bundle id + * - ``fw.mgmt`` + - stored, running + - Version of the management firmware + * - ``fw.cpld`` + - stored, running + - The CPLD firmware component version + * - ``fw.app`` + - stored, running + - The APP firmware component version + * - ``fw.undi`` + - stored, running + - The UNDI firmware component version + * - ``fw.ncsi`` + - stored, running + - The NSCI firmware component version + * - ``chip.init`` + - stored, running + - The CFGR firmware component version diff --git a/Documentation/networking/devlink/qed.rst b/Documentation/networking/devlink/qed.rst new file mode 100644 index 000000000..805c6f636 --- /dev/null +++ b/Documentation/networking/devlink/qed.rst @@ -0,0 +1,26 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================== +qed devlink support +=================== + +This document describes the devlink features implemented by the ``qed`` core +device driver. + +Parameters +========== + +The ``qed`` driver implements the following driver-specific parameters. + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``iwarp_cmt`` + - Boolean + - runtime + - Enable iWARP functionality for 100g devices. Note that this impacts + L2 performance, and is therefore not enabled by default. diff --git a/Documentation/networking/devlink/sja1105.rst b/Documentation/networking/devlink/sja1105.rst new file mode 100644 index 000000000..e2679c274 --- /dev/null +++ b/Documentation/networking/devlink/sja1105.rst @@ -0,0 +1,49 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================= +sja1105 devlink support +======================= + +This document describes the devlink features implemented +by the ``sja1105`` device driver. + +Parameters +========== + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``best_effort_vlan_filtering`` + - Boolean + - runtime + - Allow plain ETH_P_8021Q headers to be used as DSA tags. + + Benefits: + + - Can terminate untagged traffic over switch net + devices even when enslaved to a bridge with + vlan_filtering=1. + - Can terminate VLAN-tagged traffic over switch net + devices even when enslaved to a bridge with + vlan_filtering=1, with some constraints (no more than + 7 non-pvid VLANs per user port). + - Can do QoS based on VLAN PCP and VLAN membership + admission control for autonomously forwarded frames + (regardless of whether they can be terminated on the + CPU or not). + + Drawbacks: + + - User cannot use VLANs in range 1024-3071. If the + switch receives frames with such VIDs, it will + misinterpret them as DSA tags. + - Switch uses Shared VLAN Learning (FDB lookup uses + only DMAC as key). + - When VLANs span cross-chip topologies, the total + number of permitted VLANs may be less than 7 per + port, due to a maximum number of 32 VLAN retagging + rules per switch. diff --git a/Documentation/networking/devlink/ti-cpsw-switch.rst b/Documentation/networking/devlink/ti-cpsw-switch.rst new file mode 100644 index 000000000..dc399e32a --- /dev/null +++ b/Documentation/networking/devlink/ti-cpsw-switch.rst @@ -0,0 +1,31 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================== +ti-cpsw-switch devlink support +============================== + +This document describes the devlink features implemented by the ``ti-cpsw-switch`` +device driver. + +Parameters +========== + +The ``ti-cpsw-switch`` driver implements the following driver-specific +parameters. + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``ale_bypass`` + - Boolean + - runtime + - Enables ALE_CONTROL(4).BYPASS mode for debugging purposes. In this + mode, all packets will be sent to the host port only. + * - ``switch_mode`` + - Boolean + - runtime + - Enable switch mode |