author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-21 11:54:28 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-21 11:54:28 +0000 |
commit | e6918187568dbd01842d8d1d2c808ce16a894239 (patch) | |
tree | 64f88b554b444a49f656b6c656111a145cbbaa28 /src/spdk/dpdk/doc/guides/nics/i40e.rst | |
parent | Initial commit. (diff) | |
Adding upstream version 18.2.2.
upstream/18.2.2
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/spdk/dpdk/doc/guides/nics/i40e.rst')
-rw-r--r-- | src/spdk/dpdk/doc/guides/nics/i40e.rst | 821 |
1 files changed, 821 insertions, 0 deletions
diff --git a/src/spdk/dpdk/doc/guides/nics/i40e.rst b/src/spdk/dpdk/doc/guides/nics/i40e.rst new file mode 100644 index 000000000..00c3042d5 --- /dev/null +++ b/src/spdk/dpdk/doc/guides/nics/i40e.rst @@ -0,0 +1,821 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(c) 2016 Intel Corporation. + +I40E Poll Mode Driver +====================== + +The i40e PMD (librte_pmd_i40e) provides poll mode driver support for +10/25/40 Gbps Intel® Ethernet 700 Series Network Adapters based on +the Intel Ethernet Controller X710/XL710/XXV710 and Intel Ethernet +Connection X722 (only support part of features). + + +Features +-------- + +Features of the i40e PMD are: + +- Multiple queues for TX and RX +- Receiver Side Scaling (RSS) +- MAC/VLAN filtering +- Packet type information +- Flow director +- Cloud filter +- Checksum offload +- VLAN/QinQ stripping and inserting +- TSO offload +- Promiscuous mode +- Multicast mode +- Port hardware statistics +- Jumbo frames +- Link state information +- Link flow control +- Mirror on port, VLAN and VSI +- Interrupt mode for RX +- Scattered and gather for TX and RX +- Vector Poll mode driver +- DCB +- VMDQ +- SR-IOV VF +- Hot plug +- IEEE1588/802.1AS timestamping +- VF Daemon (VFD) - EXPERIMENTAL +- Dynamic Device Personalization (DDP) +- Queue region configuration +- Virtual Function Port Representors +- Malicious Device Drive event catch and notify +- Generic flow API + +Prerequisites +------------- + +- Identifying your adapter using `Intel Support + <http://www.intel.com/support>`_ and get the latest NVM/FW images. + +- Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to setup the basic DPDK environment. + +- To get better performance on Intel platforms, please follow the "How to get best performance with NICs on Intel platforms" + section of the :ref:`Getting Started Guide for Linux <linux_gsg>`. 
+ +- Upgrade the NVM/FW version following the `Intel® Ethernet NVM Update Tool Quick Usage Guide for Linux + <https://www-ssl.intel.com/content/www/us/en/embedded/products/networking/nvm-update-tool-quick-linux-usage-guide.html>`_ and `Intel® Ethernet NVM Update Tool: Quick Usage Guide for EFI <https://www.intel.com/content/www/us/en/embedded/products/networking/nvm-update-tool-quick-efi-usage-guide.html>`_ if needed. + +- For information about supported media, please refer to this document: `Intel® Ethernet Controller X710/XXV710/XL710 Feature Support Matrix + <http://www.intel.com/content/dam/www/public/us/en/documents/release-notes/xl710-ethernet-controller-feature-matrix.pdf>`_. + + .. Note:: + + * Some adapters based on the Intel(R) Ethernet Controller 700 Series only + support Intel Ethernet Optics modules. On these adapters, other modules are not + supported and will not function. + + * For connections based on Intel(R) Ethernet Controller 700 Series, + support is dependent on your system board. Please see your vendor for details. + + * In all cases Intel recommends using Intel Ethernet Optics; other modules + may function but are not validated by Intel. Contact Intel for supported media types. + +Recommended Matching List +------------------------- + +It is highly recommended to upgrade the i40e kernel driver and firmware to +avoid the compatibility issues with i40e PMD. Here is the suggested matching +list which has been tested and verified. The detailed information can refer +to chapter Tested Platforms/Tested NICs in release notes. 
+ +For X710/XL710/XXV710, + + +--------------+-----------------------+------------------+ + | DPDK version | Kernel driver version | Firmware version | + +==============+=======================+==================+ + | 20.05 | 2.11.27 | 7.30 | + +--------------+-----------------------+------------------+ + | 20.02 | 2.10.19 | 7.20 | + +--------------+-----------------------+------------------+ + | 19.11 | 2.9.21 | 7.00 | + +--------------+-----------------------+------------------+ + | 19.08 | 2.8.43 | 7.00 | + +--------------+-----------------------+------------------+ + | 19.05 | 2.7.29 | 6.80 | + +--------------+-----------------------+------------------+ + | 19.02 | 2.7.26 | 6.80 | + +--------------+-----------------------+------------------+ + | 18.11 | 2.4.6 | 6.01 | + +--------------+-----------------------+------------------+ + | 18.08 | 2.4.6 | 6.01 | + +--------------+-----------------------+------------------+ + | 18.05 | 2.4.6 | 6.01 | + +--------------+-----------------------+------------------+ + | 18.02 | 2.4.3 | 6.01 | + +--------------+-----------------------+------------------+ + | 17.11 | 2.1.26 | 6.01 | + +--------------+-----------------------+------------------+ + | 17.08 | 2.0.19 | 6.01 | + +--------------+-----------------------+------------------+ + | 17.05 | 1.5.23 | 5.05 | + +--------------+-----------------------+------------------+ + | 17.02 | 1.5.23 | 5.05 | + +--------------+-----------------------+------------------+ + | 16.11 | 1.5.23 | 5.05 | + +--------------+-----------------------+------------------+ + | 16.07 | 1.4.25 | 5.04 | + +--------------+-----------------------+------------------+ + | 16.04 | 1.4.25 | 5.02 | + +--------------+-----------------------+------------------+ + + +For X722, + + +--------------+-----------------------+------------------+ + | DPDK version | Kernel driver version | Firmware version | + +==============+=======================+==================+ + | 20.05 | 2.11.27 | 4.11 | + 
+--------------+-----------------------+------------------+ + | 20.02 | 2.10.19 | 4.11 | + +--------------+-----------------------+------------------+ + | 19.11 | 2.9.21 | 4.10 | + +--------------+-----------------------+------------------+ + | 19.08 | 2.9.21 | 4.10 | + +--------------+-----------------------+------------------+ + | 19.05 | 2.7.29 | 3.33 | + +--------------+-----------------------+------------------+ + | 19.02 | 2.7.26 | 3.33 | + +--------------+-----------------------+------------------+ + | 18.11 | 2.4.6 | 3.33 | + +--------------+-----------------------+------------------+ + + +Pre-Installation Configuration +------------------------------ + +Config File Options +~~~~~~~~~~~~~~~~~~~ + +The following options can be modified in the ``config`` file. +Please note that enabling debugging options may affect system performance. + +- ``CONFIG_RTE_LIBRTE_I40E_PMD`` (default ``y``) + + Toggle compilation of the ``librte_pmd_i40e`` driver. + +- ``CONFIG_RTE_LIBRTE_I40E_DEBUG_*`` (default ``n``) + + Toggle display of generic debugging messages. + +- ``CONFIG_RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC`` (default ``y``) + + Toggle bulk allocation for RX. + +- ``CONFIG_RTE_LIBRTE_I40E_INC_VECTOR`` (default ``n``) + + Toggle the use of Vector PMD instead of normal RX/TX path. + To enable vPMD for RX, bulk allocation for Rx must be allowed. + +- ``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC`` (default ``n``) + + Toggle to use a 16-byte RX descriptor, by default the RX descriptor is 32 byte. + +- ``CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF`` (default ``64``) + + Number of queues reserved for PF. + +- ``CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM`` (default ``4``) + + Number of queues reserved for each VMDQ Pool. + +Runtime Config Options +~~~~~~~~~~~~~~~~~~~~~~ + +- ``Reserved number of Queues per VF`` (default ``4``) + + The number of reserved queue per VF is determined by its host PF. 
If the + PCI address of an i40e PF is aaaa:bb.cc, the number of reserved queues per + VF can be configured with EAL parameter like -w aaaa:bb.cc,queue-num-per-vf=n. + The value n can be 1, 2, 4, 8 or 16. If no such parameter is configured, the + number of reserved queues per VF is 4 by default. If VF request more than + reserved queues per VF, PF will able to allocate max to 16 queues after a VF + reset. + + +- ``Support multiple driver`` (default ``disable``) + + There was a multiple driver support issue during use of 700 series Ethernet + Adapter with both Linux kernel and DPDK PMD. To fix this issue, ``devargs`` + parameter ``support-multi-driver`` is introduced, for example:: + + -w 84:00.0,support-multi-driver=1 + + With the above configuration, DPDK PMD will not change global registers, and + will switch PF interrupt from IntN to Int0 to avoid interrupt conflict between + DPDK and Linux Kernel. + +- ``Support VF Port Representor`` (default ``not enabled``) + + The i40e PF PMD supports the creation of VF port representors for the control + and monitoring of i40e virtual function devices. Each port representor + corresponds to a single virtual function of that device. Using the ``devargs`` + option ``representor`` the user can specify which virtual functions to create + port representors for on initialization of the PF PMD by passing the VF IDs of + the VFs which are required.:: + + -w DBDF,representor=[0,1,4] + + Currently hot-plugging of representor ports is not supported so all required + representors must be specified on the creation of the PF. + +- ``Use latest supported vector`` (default ``disable``) + + Latest supported vector path may not always get the best perf so vector path was + recommended to use only on later platform. But users may want the latest vector path + since it can get better perf in some real work loading cases. 
So ``devargs`` param + ``use-latest-supported-vec`` is introduced, for example:: + + -w 84:00.0,use-latest-supported-vec=1 + +- ``Enable validation for VF message`` (default ``not enabled``) + + The PF counts messages from each VF. If in any period of seconds the message + statistic from a VF exceeds maximal limitation, the PF will ignore any new message + from that VF for some seconds. + Format -- "maximal-message@period-seconds:ignore-seconds" + For example:: + + -w 84:00.0,vf_msg_cfg=80@120:180 + +Vector RX Pre-conditions +~~~~~~~~~~~~~~~~~~~~~~~~ +For Vector RX it is assumed that the number of descriptor rings will be a power +of 2. With this pre-condition, the ring pointer can easily scroll back to the +head after hitting the tail without a conditional check. In addition Vector RX +can use this assumption to do a bit mask using ``ring_size - 1``. + +Driver compilation and testing +------------------------------ + +Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` +for details. + + +SR-IOV: Prerequisites and sample Application Notes +-------------------------------------------------- + +#. Load the kernel module: + + .. code-block:: console + + modprobe i40e + + Check the output in dmesg: + + .. code-block:: console + + i40e 0000:83:00.1 ens802f0: renamed from eth0 + +#. Bring up the PF ports: + + .. code-block:: console + + ifconfig ens802f0 up + +#. Create VF device(s): + + Echo the number of VFs to be created into the ``sriov_numvfs`` sysfs entry + of the parent PF. + + Example: + + .. code-block:: console + + echo 2 > /sys/devices/pci0000:00/0000:00:03.0/0000:81:00.0/sriov_numvfs + + +#. Assign VF MAC address: + + Assign MAC address to the VF using iproute2 utility. The syntax is: + + .. code-block:: console + + ip link set <PF netdev id> vf <VF id> mac <macaddr> + + Example: + + .. code-block:: console + + ip link set ens802f0 vf 0 mac a0:b0:c0:d0:e0:f0 + +#. Assign VF to VM, and bring up the VM. 
+ Please see the documentation for the *I40E/IXGBE/IGB Virtual Function Driver*. + +#. Running testpmd: + + Follow instructions available in the document + :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` + to run testpmd. + + Example output: + + .. code-block:: console + + ... + EAL: PCI device 0000:83:00.0 on NUMA socket 1 + EAL: probe driver: 8086:1572 rte_i40e_pmd + EAL: PCI memory mapped at 0x7f7f80000000 + EAL: PCI memory mapped at 0x7f7f80800000 + PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 8000208a + Interactive-mode selected + Configuring Port 0 (socket 0) + ... + + PMD: i40e_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are + satisfied.Rx Burst Bulk Alloc function will be used on port=0, queue=0. + + ... + Port 0: 68:05:CA:26:85:84 + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Done + + testpmd> + + +Sample Application Notes +------------------------ + +Vlan filter +~~~~~~~~~~~ + +Vlan filter only works when Promiscuous mode is off. + +To start ``testpmd``, and add vlan 10 to port 0: + +.. code-block:: console + + ./app/testpmd -l 0-15 -n 4 -- -i --forward-mode=mac + ... + + testpmd> set promisc 0 off + testpmd> rx_vlan add 10 0 + + +Flow Director +~~~~~~~~~~~~~ + +The Flow Director works in receive mode to identify specific flows or sets of flows and route them to specific queues. +The Flow Director filters can match the different fields for different type of packet: flow type, specific input set per flow type and the flexible payload. 
+ +The default input set of each flow type is:: + + ipv4-other : src_ip_address, dst_ip_address + ipv4-frag : src_ip_address, dst_ip_address + ipv4-tcp : src_ip_address, dst_ip_address, src_port, dst_port + ipv4-udp : src_ip_address, dst_ip_address, src_port, dst_port + ipv4-sctp : src_ip_address, dst_ip_address, src_port, dst_port, + verification_tag + ipv6-other : src_ip_address, dst_ip_address + ipv6-frag : src_ip_address, dst_ip_address + ipv6-tcp : src_ip_address, dst_ip_address, src_port, dst_port + ipv6-udp : src_ip_address, dst_ip_address, src_port, dst_port + ipv6-sctp : src_ip_address, dst_ip_address, src_port, dst_port, + verification_tag + l2_payload : ether_type + +The flex payload is selected from offset 0 to 15 of packet's payload by default, while it is masked out from matching. + +Start ``testpmd`` with ``--disable-rss`` and ``--pkt-filter-mode=perfect``: + +.. code-block:: console + + ./app/testpmd -l 0-15 -n 4 -- -i --disable-rss --pkt-filter-mode=perfect \ + --rxq=8 --txq=8 --nb-cores=8 --nb-ports=1 + +Add a rule to direct ``ipv4-udp`` packet whose ``dst_ip=2.2.2.5, src_ip=2.2.2.3, src_port=32, dst_port=32`` to queue 1: + +.. code-block:: console + + testpmd> flow_director_filter 0 mode IP add flow ipv4-udp \ + src 2.2.2.3 32 dst 2.2.2.5 32 vlan 0 flexbytes () \ + fwd pf queue 1 fd_id 1 + +Check the flow director status: + +.. 
code-block:: console + + testpmd> show port fdir 0 + + ######################## FDIR infos for port 0 #################### + MODE: PERFECT + SUPPORTED FLOW TYPE: ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other + ipv6-frag ipv6-tcp ipv6-udp ipv6-sctp ipv6-other + l2_payload + FLEX PAYLOAD INFO: + max_len: 16 payload_limit: 480 + payload_unit: 2 payload_seg: 3 + bitmask_unit: 2 bitmask_num: 2 + MASK: + vlan_tci: 0x0000, + src_ipv4: 0x00000000, + dst_ipv4: 0x00000000, + src_port: 0x0000, + dst_port: 0x0000 + src_ipv6: 0x00000000,0x00000000,0x00000000,0x00000000, + dst_ipv6: 0x00000000,0x00000000,0x00000000,0x00000000 + FLEX PAYLOAD SRC OFFSET: + L2_PAYLOAD: 0 1 2 3 4 5 6 ... + L3_PAYLOAD: 0 1 2 3 4 5 6 ... + L4_PAYLOAD: 0 1 2 3 4 5 6 ... + FLEX MASK CFG: + ipv4-udp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv4-tcp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv4-sctp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv4-other: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv4-frag: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-udp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-tcp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-sctp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-other: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-frag: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + l2_payload: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + guarant_count: 1 best_count: 0 + guarant_space: 512 best_space: 7168 + collision: 0 free: 0 + maxhash: 0 maxlen: 0 + add: 0 remove: 0 + f_add: 0 f_remove: 0 + + +Delete all flow director rules on a port: + +.. code-block:: console + + testpmd> flush_flow_director 0 + +Floating VEB +~~~~~~~~~~~~~ + +The Intel® Ethernet 700 Series support a feature called +"Floating VEB". 
+ +A Virtual Ethernet Bridge (VEB) is an IEEE Edge Virtual Bridging (EVB) term +for functionality that allows local switching between virtual endpoints within +a physical endpoint and also with an external bridge/network. + +A "Floating" VEB doesn't have an uplink connection to the outside world so all +switching is done internally and remains within the host. As such, this +feature provides security benefits. + +In addition, a Floating VEB overcomes a limitation of normal VEBs where they +cannot forward packets when the physical link is down. Floating VEBs don't need +to connect to the NIC port so they can still forward traffic from VF to VF +even when the physical link is down. + +Therefore, with this feature enabled VFs can be limited to communicating with +each other but not an outside network, and they can do so even when there is +no physical uplink on the associated NIC port. + +To enable this feature, the user should pass a ``devargs`` parameter to the +EAL, for example:: + + -w 84:00.0,enable_floating_veb=1 + +In this configuration the PMD will use the floating VEB feature for all the +VFs created by this PF device. + +Alternatively, the user can specify which VFs need to connect to this floating +VEB using the ``floating_veb_list`` argument:: + + -w 84:00.0,enable_floating_veb=1,floating_veb_list=1;3-4 + +In this example ``VF1``, ``VF3`` and ``VF4`` connect to the floating VEB, +while other VFs connect to the normal VEB. + +The current implementation only supports one floating VEB and one regular +VEB. VFs can connect to a floating VEB or a regular VEB according to the +configuration passed on the EAL command line. + +The floating VEB functionality requires a NIC firmware version of 5.0 +or greater. 
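The two ``devargs`` forms described above can be generated programmatically. The following is a minimal Python sketch, not part of DPDK: the helper name ``floating_veb_devargs`` is hypothetical, and it only assembles the ``enable_floating_veb`` and ``floating_veb_list`` strings in the format shown in this section (VF IDs separated by ``;``, consecutive IDs written as ``a-b``).

```python
def floating_veb_devargs(pci_addr, vf_ids=None):
    """Build EAL arguments for the floating VEB feature (hypothetical helper).

    pci_addr: PCI address of the PF, e.g. "84:00.0".
    vf_ids:   optional iterable of VF IDs to attach to the floating VEB.
              If omitted, only enable_floating_veb=1 is emitted, which the
              text above describes as applying to all VFs of the PF.
    """
    arg = f"{pci_addr},enable_floating_veb=1"
    if vf_ids:
        ids = sorted(set(vf_ids))
        ranges = []
        start = prev = ids[0]
        for vf in ids[1:]:
            if vf == prev + 1:
                # Extend the current run of consecutive VF IDs.
                prev = vf
                continue
            ranges.append((start, prev))
            start = prev = vf
        ranges.append((start, prev))
        parts = [str(a) if a == b else f"{a}-{b}" for a, b in ranges]
        arg += ",floating_veb_list=" + ";".join(parts)
    # Returned as an argv-style list so that ';' is never seen by a shell.
    return ["-w", arg]


print(floating_veb_devargs("84:00.0", [1, 3, 4]))
```

Returning a list rather than a single string is a deliberate choice: when the arguments are typed into a shell by hand, the ``;`` in ``floating_veb_list`` has to be quoted, whereas passing a list to something like ``subprocess.run`` avoids shell interpretation entirely.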
+ +Dynamic Device Personalization (DDP) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The Intel® Ethernet 700 Series except for the Intel Ethernet Connection +X722 support a feature called "Dynamic Device Personalization (DDP)", +which is used to configure hardware by downloading a profile to support +protocols/filters which are not supported by default. The DDP +functionality requires a NIC firmware version of 6.0 or greater. + +Current implementation supports GTP-C/GTP-U/PPPoE/PPPoL2TP/ESP, +steering can be used with rte_flow API. + +GTPv1 package is released, and it can be downloaded from +https://downloadcenter.intel.com/download/27587. + +PPPoE package is released, and it can be downloaded from +https://downloadcenter.intel.com/download/28040. + +ESP-AH package is released, and it can be downloaded from +https://downloadcenter.intel.com/download/29446. + +Load a profile which supports GTP and store backup profile: + +.. code-block:: console + + testpmd> ddp add 0 ./gtp.pkgo,./backup.pkgo + +Delete a GTP profile and restore backup profile: + +.. code-block:: console + + testpmd> ddp del 0 ./backup.pkgo + +Get loaded DDP package info list: + +.. code-block:: console + + testpmd> ddp get list 0 + +Display information about a GTP profile: + +.. code-block:: console + + testpmd> ddp get info ./gtp.pkgo + +Input set configuration +~~~~~~~~~~~~~~~~~~~~~~~ +Input set for any PCTYPE can be configured with user defined configuration, +For example, to use only 48bit prefix for IPv6 src address for IPv6 TCP RSS: + +.. 
code-block:: console + + testpmd> port config 0 pctype 43 hash_inset clear all + testpmd> port config 0 pctype 43 hash_inset set field 13 + testpmd> port config 0 pctype 43 hash_inset set field 14 + testpmd> port config 0 pctype 43 hash_inset set field 15 + +Queue region configuration +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The Intel® Ethernet 700 Series supports a feature of queue regions +configuration for RSS in the PF, so that different traffic classes or +different packet classification types can be separated to different +queues in different queue regions. There is an API for configuration +of queue regions in RSS with a command line. It can parse the parameters +of the region index, queue number, queue start index, user priority, traffic +classes and so on. Depending on commands from the command line, it will call +i40e private APIs and start the process of setting or flushing the queue +region configuration. As this feature is specific for i40e only private +APIs are used. These new ``test_pmd`` commands are as shown below. For +details please refer to :doc:`../testpmd_app_ug/index`. + +.. code-block:: console + + testpmd> set port (port_id) queue-region region_id (value) \ + queue_start_index (value) queue_num (value) + testpmd> set port (port_id) queue-region region_id (value) flowtype (value) + testpmd> set port (port_id) queue-region UP (value) region_id (value) + testpmd> set port (port_id) queue-region flush (on|off) + testpmd> show port (port_id) queue-region + +Generic flow API +~~~~~~~~~~~~~~~~~~~ + +- ``RSS Flow`` + + RSS Flow supports to set hash input set, hash function, enable hash + and configure queue region. + For example: + Configure queue region as queue 0, 1, 2, 3. + + .. code-block:: console + + testpmd> flow create 0 ingress pattern end actions rss types end \ + queues 0 1 2 3 end / end + + Enable hash and set input set for ipv4-tcp. + + .. 
code-block:: console + + testpmd> flow create 0 ingress pattern eth / ipv4 / tcp / end \ + actions rss types ipv4-tcp l3-src-only end queues end / end + + Set symmetric hash enable for flow type ipv4-tcp. + + .. code-block:: console + + testpmd> flow create 0 ingress pattern eth / ipv4 / tcp / end \ + actions rss types ipv4-tcp end queues end func symmetric_toeplitz / end + + Set hash function as simple xor. + + .. code-block:: console + + testpmd> flow create 0 ingress pattern end actions rss types end \ + queues end func simple_xor / end + +Limitations or Known issues +--------------------------- + +MPLS packet classification +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For firmware versions prior to 5.0, MPLS packets are not recognized by the NIC. +The L2 Payload flow type in flow director can be used to classify MPLS packet +by using a command in testpmd like: + + testpmd> flow_director_filter 0 mode IP add flow l2_payload ether \ + 0x8847 flexbytes () fwd pf queue <N> fd_id <M> + +With the NIC firmware version 5.0 or greater, some limited MPLS support +is added: Native MPLS (MPLS in Ethernet) skip is implemented, while no +new packet type, no classification or offload are possible. With this change, +L2 Payload flow type in flow director cannot be used to classify MPLS packet +as with previous firmware versions. Meanwhile, the Ethertype filter can be +used to classify MPLS packet by using a command in testpmd like: + + testpmd> ethertype_filter 0 add mac_ignr 00:00:00:00:00:00 ethertype \ + 0x8847 fwd queue <M> + +16 Byte RX Descriptor setting on DPDK VF +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Currently the VF's RX descriptor mode is decided by PF. There's no PF-VF +interface for VF to request the RX descriptor mode, also no interface to notify +VF its own RX descriptor mode. +For all available versions of the i40e driver, these drivers don't support 16 +byte RX descriptor. 
If the Linux i40e kernel driver is used as the host driver,
+while the DPDK i40e PMD is used as the VF driver, DPDK cannot choose the 16 byte receive
+descriptor. The reason is that the RX descriptor is already set to 32 byte by
+the i40e kernel driver. That is to say, the user should keep
+``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n`` in the config file.
+In the future, if the Linux i40e driver supports the 16 byte RX descriptor, the user
+should make sure the DPDK VF uses the same RX descriptor mode, 16 byte or 32
+byte, as the PF driver.
+
+The same rule applies to DPDK PF + DPDK VF: the PF and VF should use the same RX
+descriptor mode, otherwise VF RX will not work.
+
+Receive packets with Ethertype 0x88A8
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Due to a FW limitation, the PF can receive packets with Ethertype 0x88A8
+only when floating VEB is disabled.
+
+Incorrect Rx statistics when packet is oversize
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a packet is over the maximum frame size, the packet is dropped.
+However, the Rx statistics returned by `rte_eth_stats_get` incorrectly
+show it as received.
+
+VF & TC max bandwidth setting
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The per VF max bandwidth and per TC max bandwidth cannot be enabled in parallel.
+The behavior is different when handling per VF and per TC max bandwidth setting.
+When enabling per VF max bandwidth, SW will check if per TC max bandwidth is
+enabled. If so, it returns failure.
+When enabling per TC max bandwidth, SW will check if per VF max bandwidth
+is enabled. If so, it disables per VF max bandwidth and continues with the per TC max
+bandwidth setting.
+
+TC TX scheduling mode setting
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are 2 TX scheduling modes for TCs, round robin and strict priority mode.
+If a TC is set to strict priority mode, it can consume unlimited bandwidth.
+This means that if the application has set a max bandwidth for that TC, it has no
+effect.
+It's suggested to set the strict priority mode for a TC that is latency
+sensitive but does not consume much bandwidth.
+
+VF performance is impacted by PCI extended tag setting
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To reach maximum NIC performance in the VF the PCI extended tag must be
+enabled. The DPDK i40e PF driver will set this feature during initialization,
+but the kernel PF driver does not. So when running traffic on a VF which is
+managed by the kernel PF driver, a significant NIC performance downgrade has
+been observed (for 64 byte packets, there is about 25% line-rate downgrade for
+a 25GbE device and about 35% for a 40GbE device).
+
+For kernel version >= 4.11, the kernel's PCI driver will enable the extended
+tag if it detects that the device supports it. So by default, this is not an
+issue. For kernels < 4.11 or when the PCI extended tag is disabled it can be
+enabled using the steps below.
+
+#. Get the current value of the PCI configuration register::
+
+      setpci -s <XX:XX.X> a8.w
+
+#. Set bit 8::
+
+      value = value | 0x100
+
+#. Set the PCI configuration register with the new value::
+
+      setpci -s <XX:XX.X> a8.w=<value>
+
+Vlan strip of VF
+~~~~~~~~~~~~~~~~
+
+The VF vlan strip function is only supported in the i40e kernel driver >= 2.1.26.
+
+DCB function
+~~~~~~~~~~~~
+
+DCB works only when RSS is enabled.
+
+Global configuration warning
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The i40e PMD sets some global registers to enable some functions or to apply some
+configuration. When different ports of the same NIC are used with the Linux kernel
+and DPDK, the port used by the Linux kernel is therefore impacted by the port used by DPDK.
+For example, register I40E_GL_SWT_L2TAGCTRL is used to control the L2 tag, and the i40e
+PMD uses I40E_GL_SWT_L2TAGCTRL to set the vlan TPID. If the TPID is set on port A
+with DPDK, the configuration also impacts port B of the same NIC running the
+kernel driver, which may not want to use that TPID.
+So the PMD reports a warning to clarify what is changed by writing a global register.
+
+High Performance of Small Packets on 40GbE NIC
+----------------------------------------------
+
+As the latest firmware image might contain fixes for performance enhancement,
+a firmware update might be needed to get high performance.
+Check the Intel support website for the latest firmware updates.
+Users should consult the release notes specific to a DPDK release to identify
+the validated firmware version for a NIC using the i40e driver.
+
+Use 16 Bytes RX Descriptor Size
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The i40e PMD supports both 16 and 32 byte RX descriptor sizes, and the 16 byte size can help with high performance for small packets.
+Set ``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC`` in the config files to use 16 byte RX descriptors.
+
+Input set requirement of each pctype for FDIR
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Each PCTYPE can only have one specific FDIR input set at one time.
+For example, creating 2 rte_flow rules with different input sets for one PCTYPE
+fails and returns the info "Conflict with the first rule's input set",
+which means the current rule's input set conflicts with the first rule's.
+Remove the first rule to change the input set of the PCTYPE.
+
+Example of getting best performance with l3fwd example
+------------------------------------------------------
+
+The following is an example of running the DPDK ``l3fwd`` sample application to get high performance with a
+server with Intel Xeon processors and Intel Ethernet CNA XL710.
+
+The example scenario is to get best performance with two Intel Ethernet CNA XL710 40GbE ports.
+See :numref:`figure_intel_perf_test_setup` for the performance test setup.
+
+.. _figure_intel_perf_test_setup:
+
+.. figure:: img/intel_perf_test_setup.*
+
+   Performance Test Setup
+
+
+1. Add two Intel Ethernet CNA XL710 to the platform, and use one port per card to get best performance.
+   The reason for using two NICs is to overcome a PCIe v3.0 limitation: a single PCIe v3.0 x8 slot cannot provide
+   80 Gbps of bandwidth for two 40GbE ports, but two separate PCIe v3.0 x8 slots can.
+   From the NIC listing below, ``82:00.0`` and ``85:00.0`` are selected as test ports::
+
+     82:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
+     85:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
+
+2. Connect the ports to the traffic generator. For high speed testing, it's best to use a hardware traffic generator.
+
+3. Check the NUMA node (socket id) of the PCI devices and get the core numbers on that socket.
+   In this case, ``82:00.0`` and ``85:00.0`` are both in socket 1, and the cores on socket 1 in the referenced platform
+   are 18-35 and 54-71.
+   Note: Don't use 2 logical cores on the same physical core (e.g. core18 has 2 logical cores, core18 and core54); instead, use 2 logical
+   cores from different physical cores (e.g. core18 and core19).
+
+4. Bind these two ports to igb_uio.
+
+5. An Intel Ethernet CNA XL710 40GbE port needs at least two queue pairs to achieve best performance, so two queues per port
+   are required, and each queue pair needs a dedicated CPU core for receiving/transmitting packets.
+
+6. The DPDK sample application ``l3fwd`` will be used for performance testing, using two ports for bi-directional forwarding.
+   Compile the ``l3fwd`` sample with the default lpm mode.
+
+7. The command line for running l3fwd would be something like the following::
+
+     ./l3fwd -l 18-21 -n 4 -w 82:00.0 -w 85:00.0 \
+             -- -p 0x3 --config '(0,0,18),(0,1,19),(1,0,20),(1,1,21)'
+
+   This means that the application uses core 18 for port 0, queue pair 0 forwarding, core 19 for port 0, queue pair 1 forwarding,
+   core 20 for port 1, queue pair 0 forwarding, and core 21 for port 1, queue pair 1 forwarding.
+
+8. Configure the traffic at a traffic generator.
+
+   * Start creating a stream on packet generator.
+
+   * Set the Ethernet II type to 0x0800.
+
+Tx bytes affected by the link status change
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For firmware versions prior to 6.01 for X710 series and 3.33 for X722 series, the tx_bytes statistics data is affected by
+the link down event. Each time the link status changes to down, tx_bytes decreases by 110 bytes.
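The arithmetic implied by this known issue can be written down as a small Python sketch. It is only an illustration under stated assumptions: the function name ``corrected_tx_bytes`` is hypothetical, the application is assumed to count link-down events itself, and the 110 byte figure is taken directly from the paragraph above and applies only to the affected firmware versions. Updating the firmware remains the actual fix.

```python
# Bytes subtracted from tx_bytes per link-down event on affected firmware,
# as stated in the text above.
TX_BYTES_LOSS_PER_LINK_DOWN = 110


def corrected_tx_bytes(reported_tx_bytes, link_down_events):
    """Estimate the real tx_bytes on affected firmware (hypothetical helper).

    reported_tx_bytes: tx_bytes value read from the port statistics.
    link_down_events:  number of link-down transitions the application
                       has counted since the statistics were last reset.
    """
    if link_down_events < 0:
        raise ValueError("link_down_events must be non-negative")
    return reported_tx_bytes + TX_BYTES_LOSS_PER_LINK_DOWN * link_down_events


# Three link-down events hide 3 * 110 = 330 bytes from the counter.
print(corrected_tx_bytes(1000, 3))
```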