Diffstat (limited to 'src/seastar/dpdk/doc/guides/nics')
87 files changed, 9314 insertions, 0 deletions
diff --git a/src/seastar/dpdk/doc/guides/nics/ark.rst b/src/seastar/dpdk/doc/guides/nics/ark.rst new file mode 100644 index 00000000..a7c2590b --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/ark.rst @@ -0,0 +1,261 @@ +.. BSD LICENSE + + Copyright (c) 2015-2017 Atomic Rules LLC + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Atomic Rules LLC nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +ARK Poll Mode Driver +==================== + +The ARK PMD is a DPDK poll-mode driver for the Atomic Rules Arkville +(ARK) family of devices. + +More information can be found at the `Atomic Rules website +<http://atomicrules.com>`_. + +Overview +-------- + +The Atomic Rules Arkville product is a DPDK- and AXI-compliant product +that marshals packets across a PCIe conduit between host DPDK mbufs and +FPGA AXI streams. + +The guiding principle of the ARK PMD, in keeping with the spirit of the overall Arkville product, +has been to take the DPDK API/ABI as a fixed specification, +then implement much of the business logic in FPGA RTL circuits. +The approach of *working backwards* from the DPDK API/ABI and having +the GPP host software *dictate*, while the FPGA hardware *copes*, +results in significant performance gains over a naive implementation. + +While this document describes the ARK PMD software, it is helpful to +understand what the FPGA hardware is and is not. The Arkville RTL +component provides a single PCIe Physical Function (PF) supporting +some number of RX/Ingress and TX/Egress Queues. The ARK PMD controls +the Arkville core through a dedicated opaque Core BAR (CBAR). +To allow users full freedom for their own FPGA application IP, +an independent FPGA Application BAR (ABAR) is provided. + +One popular way to imagine Arkville's FPGA hardware aspect is as the +FPGA PCIe-facing side of a so-called Smart NIC. The Arkville core does +not contain any MACs, and is link-speed independent, as well as +agnostic to the number of physical ports the application chooses to +use. The ARK driver exposes the familiar PMD interface to allow packet +movement to and from mbufs across multiple queues.
+ +However, FPGA RTL applications could contain a universe of added +functionality that an Arkville RTL core does not provide or +cannot anticipate. To allow for this expectation of user-defined +innovation, the ARK PMD provides a dynamic mechanism of adding +capabilities without having to modify the ARK PMD. + +The ARK PMD is intended to support all instances of the Arkville +RTL Core, regardless of configuration, FPGA vendor, or target +board. While specific capabilities, such as the number of physical +hardware queue-pairs, are negotiated, the driver is designed to +remain constant over a broad and extendable feature set. + +Intentionally, Arkville by itself DOES NOT provide common NIC +capabilities such as offload or receive-side scaling (RSS). +These capabilities would be viewed as a gate-level "tax" on +Green-box FPGA applications that do not require such functions. +Instead, they can be added as needed with essentially no +overhead to the FPGA Application. + +The ARK PMD also supports optional user extensions, through dynamic linking. +The ARK PMD user extensions are a feature of Arkville's DPDK +net/ark poll mode driver, allowing users to add their +own code to extend the net/ark functionality without +having to make source code changes to the driver. One motivation for +this capability is that while DPDK provides a rich set of functions +to interact with NIC-like capabilities (e.g. MAC addresses and statistics), +the Arkville RTL IP does not include a MAC. Users can supply their +own MAC or custom FPGA applications, which may require control from +the PMD. The user extension is the means of providing control +between the user's FPGA application and the existing DPDK features via +the PMD. + +Device Parameters +----------------- + +The ARK PMD supports device parameters that are used for packet +routing and for internal packet generation and packet checking. This +section describes the supported parameters. These features are +primarily used for diagnostics, testing, and performance verification +under the guidance of an Arkville specialist. The nominal use of +Arkville does not require any configuration using these parameters. + +"Pkt_dir" + +The Packet Director controls connectivity between Arkville's internal +hardware components. The features of the Pkt_dir are only used for +diagnostics and testing; it is not intended for nominal use. The full +set of features is not published at this level. + +Format: +Pkt_dir=0x00110F10 + +"Pkt_gen" + +The packet generator parameter takes a file as its argument. The file +contains configuration parameters that are used internally for regression +testing and are not intended to be published at this level. The +packet generator is an internal Arkville hardware component. + +Format: +Pkt_gen=./config/pg.conf + +"Pkt_chkr" + +The packet checker parameter takes a file as its argument. The file +contains configuration parameters that are used internally for regression +testing and are not intended to be published at this level. The +packet checker is an internal Arkville hardware component. + +Format: +Pkt_chkr=./config/pc.conf + + +Data Path Interface +------------------- + +Ingress RX and Egress TX operation is via the nominal DPDK API. +The driver supports single-port, multi-queue for both RX and TX. + +Refer to ``ark_ethdev.h`` for the list of supported methods to +act upon RX and TX Queues.
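The device parameter formats above are given without a full invocation. One plausible way to supply them, assuming they are passed as standard DPDK PCI device arguments appended to a whitelist (``-w``) entry (the PCI address and parameter values below are hypothetical):

.. code-block:: console

   # Hypothetical: supply Arkville device parameters as PCI devargs
   ./testpmd -l 0-3 -n 4 \
       -w 0000:01:00.0,Pkt_dir=0x00110F10,Pkt_gen=./config/pg.conf \
       -- -i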
+ +Configuration Information +------------------------- + +**DPDK Configuration Parameters** + + The following configuration options are available for the ARK PMD: + + * **CONFIG_RTE_LIBRTE_ARK_PMD** (default y): Enables or disables inclusion + of the ARK PMD driver in the DPDK compilation. + + * **CONFIG_RTE_LIBRTE_ARK_PAD_TX** (default y): When enabled, TX + packets are padded to 60 bytes to support downstream MACs. + + * **CONFIG_RTE_LIBRTE_ARK_DEBUG_RX** (default n): Enables or disables debug + logging and internal checking of RX ingress logic within the ARK PMD driver. + + * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TX** (default n): Enables or disables debug + logging and internal checking of TX egress logic within the ARK PMD driver. + + * **CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS** (default n): Enables or disables debug + logging of detailed packet and performance statistics gathered in + the PMD and FPGA. + + * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE** (default n): Enables or disables debug + logging of detailed PMD events and status. + + +Building DPDK +------------- + +See the :ref:`DPDK Getting Started Guide for Linux <linux_gsg>` for +instructions on how to build DPDK. + +By default, the ARK PMD library will be built into the DPDK library. + +For configuring and using UIO and VFIO frameworks, please also refer to :ref:`the +documentation that comes with the DPDK suite <linux_gsg>`. + +Supported ARK RTL PCIe Instances +-------------------------------- + +The ARK PMD supports the following Arkville RTL PCIe instances: + +* ``1d6c:100d`` - AR-ARKA-FX0 [Arkville 32B DPDK Data Mover] +* ``1d6c:100e`` - AR-ARKA-FX1 [Arkville 64B DPDK Data Mover] + +Supported Operating Systems +--------------------------- + +Any Linux distribution fulfilling the conditions described in the ``System Requirements`` +section of :ref:`the DPDK documentation <linux_gsg>`; also refer to the *DPDK +Release Notes*. ARM and PowerPC architectures are not supported at this time. + + +Supported Features +------------------ + +* Dynamic ARK PMD extensions +* Multiple receive and transmit queues +* Jumbo frames up to 9K +* Hardware Statistics + +Unsupported Features +-------------------- + +Features that may be part of, or become part of, the Arkville RTL IP that are +not currently supported or exposed by the ARK PMD include: + +* PCIe SR-IOV Virtual Functions (VFs) +* Arkville's Packet Generator Control and Status +* Arkville's Packet Director Control and Status +* Arkville's Packet Checker Control and Status +* Arkville's Timebase Management + +Pre-Requisites +-------------- + +#. Prepare the system as recommended by the DPDK suite. This includes environment + variables, hugepage configuration and tool-chains. + +#. Insert the igb_uio kernel module using the command 'modprobe igb_uio'. + +#. Bind the intended ARK device to the igb_uio module. + +At this point the system should be ready to run DPDK applications. Once the +application runs to completion, the ARK PMD can be detached from igb_uio if necessary. + +Usage Example +------------- + +Follow the instructions available in the document +:ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` to launch +**testpmd** with Atomic Rules ARK devices managed by librte_pmd_ark. + +Example output: + +.. code-block:: console + + [...]
+ EAL: PCI device 0000:01:00.0 on NUMA socket -1 + EAL: probe driver: 1d6c:100e rte_ark_pmd + EAL: PCI memory mapped at 0x7f9b6c400000 + PMD: eth_ark_dev_init(): Initializing 0:2:0.1 + ARKP PMD CommitID: 378f3a67 + Configuring Port 0 (socket 0) + Port 0: DC:3C:F6:00:00:01 + Checking link statuses... + Port 0 Link Up - speed 100000 Mbps - full-duplex + Done + testpmd> diff --git a/src/seastar/dpdk/doc/guides/nics/avp.rst b/src/seastar/dpdk/doc/guides/nics/avp.rst new file mode 100644 index 00000000..1fcba66c --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/avp.rst @@ -0,0 +1,111 @@ +.. BSD LICENSE + Copyright(c) 2017 Wind River Systems, Inc. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Wind River Systems, Inc. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +AVP Poll Mode Driver +==================== + +The Accelerated Virtual Port (AVP) device is a shared memory based device +only available on `virtualization platforms <http://www.windriver.com/products/titanium-cloud/>`_ +from Wind River Systems. The Wind River Systems virtualization platform +currently uses QEMU/KVM as its hypervisor and as such provides support for all +of the QEMU supported virtual and/or emulated devices (e.g., virtio, e1000, +etc.). The platform offers the virtio device type as the default device when +launching a virtual machine or creating a virtual machine port. The AVP device +is a specialized device available to customers that require increased +throughput and decreased latency to meet the demands of their performance +focused applications. + +The AVP driver binds to any AVP PCI devices that have been exported by the Wind +River Systems QEMU/KVM hypervisor. As a user of the DPDK driver API, it +supports a subset of the full Ethernet device API to enable the application to +use the standard device configuration functions and packet receive/transmit +functions. + +These devices enable optimized packet throughput by bypassing QEMU and +delivering packets directly to the virtual switch via a shared memory +mechanism.
This provides DPDK applications running in virtual machines with +significantly improved throughput and latency over other device types. + +The AVP device implementation is integrated with the QEMU/KVM live-migration +mechanism to allow applications to seamlessly migrate from one hypervisor node +to another with minimal packet loss. + + +Features and Limitations of the AVP PMD +--------------------------------------- + +The AVP PMD provides the following functionality: + +* Receive and transmit of both simple and chained mbuf packets + +* Chained mbufs may include up to 5 chained segments + +* Up to 8 receive and transmit queues per device + +* Only a single MAC address is supported + +* The MAC address cannot be modified + +* The maximum receive packet length is 9238 bytes + +* VLAN header stripping and inserting + +* Promiscuous mode + +* VM live-migration + +* PCI hotplug insertion and removal + + +Prerequisites +------------- + +The following prerequisites apply: + +* A virtual machine running in a Wind River Systems virtualization + environment and configured with at least one neutron port defined with a + vif-model set to "avp". + + +Launching a VM with an AVP type network attachment +-------------------------------------------------- + +The following example will launch a VM with three network attachments. The +first attachment will have a default vif-model of "virtio". The next two +network attachments will have a vif-model of "avp" and may be used with a DPDK +application that is built to include the AVP PMD driver. + +.. code-block:: console + + nova boot --flavor small --image my-image \ + --nic net-id=${NETWORK1_UUID} \ + --nic net-id=${NETWORK2_UUID},vif-model=avp \ + --nic net-id=${NETWORK3_UUID},vif-model=avp \ + --security-group default my-instance1 diff --git a/src/seastar/dpdk/doc/guides/nics/bnx2x.rst b/src/seastar/dpdk/doc/guides/nics/bnx2x.rst new file mode 100644 index 00000000..fbfc048e --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/bnx2x.rst @@ -0,0 +1,239 @@ +.. BSD LICENSE + Copyright (c) 2015 QLogic Corporation + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of QLogic Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +BNX2X Poll Mode Driver +====================== + +The BNX2X poll mode driver library (**librte_pmd_bnx2x**) implements support +for the **QLogic 578xx** 10/20 Gbps family of adapters as well as their virtual +functions (VF) in an SR-IOV context. It is supported on several standard Linux +distributions such as Red Hat 7.x and SLES 12. It is compile-tested on FreeBSD. + +More information can be found at `QLogic Corporation's Official Website +<http://www.qlogic.com>`_. + +Supported Features +------------------ + +The BNX2X PMD has support for: + +- Base L2 features +- Unicast/multicast filtering +- Promiscuous mode +- Port hardware statistics +- SR-IOV VF + +Non-supported Features +---------------------- + +The features not yet supported include: + +- TSS (Transmit Side Scaling) +- RSS (Receive Side Scaling) +- LRO/TSO offload +- Checksum offload +- SR-IOV PF +- RX/TX scatter gather + +Co-existence considerations +--------------------------- + +- The BCM578xx, being a CNA, can have both NIC and storage personalities. + However, coexistence with storage protocol drivers (cnic, bnx2fc and + bnx2i) is not supported on the same adapter. So the storage personality + has to be disabled on that adapter when it is used in DPDK applications. + +- In the SR-IOV case, the bnx2x PMD is used to bind to the SR-IOV VF device and + the native Linux kernel driver (bnx2x) is attached to the SR-IOV PF. + + +Supported QLogic NICs +--------------------- + +- 578xx + +Prerequisites +------------- + +- Requires firmware version **7.2.51.0**. It is included in most of the + standard Linux distributions. If it is not available, visit + `QLogic Driver Download Center <http://driverdownloads.qlogic.com>`_ + to get the required firmware. + +Pre-Installation Configuration +------------------------------ + +Config File Options +~~~~~~~~~~~~~~~~~~~ + +The following options can be modified in the ``.config`` file. Please note that +enabling debugging options may affect system performance. + +- ``CONFIG_RTE_LIBRTE_BNX2X_PMD`` (default **n**) + + Toggle compilation of the bnx2x driver. To use the bnx2x PMD, set this config parameter + to 'y'. Also, in order for the firmware binary to load, the zlib development + package must be installed. + +- ``CONFIG_RTE_LIBRTE_BNX2X_DEBUG`` (default **n**) + + Toggle display of generic debugging messages. + +- ``CONFIG_RTE_LIBRTE_BNX2X_DEBUG_INIT`` (default **n**) + + Toggle display of initialization related messages. + +- ``CONFIG_RTE_LIBRTE_BNX2X_DEBUG_TX`` (default **n**) + + Toggle display of transmit fast path run-time messages. + +- ``CONFIG_RTE_LIBRTE_BNX2X_DEBUG_RX`` (default **n**) + + Toggle display of receive fast path run-time messages. + +- ``CONFIG_RTE_LIBRTE_BNX2X_DEBUG_PERIODIC`` (default **n**) + + Toggle display of register reads and writes. + + +.. _bnx2x_driver-compilation: + +Driver compilation and testing +------------------------------ + +Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` +for details.
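As a concrete illustration of the build-time options above, the following sketch enables the bnx2x PMD in the generated ``.config`` and installs the zlib development package needed for firmware loading; the target directory and the package manager are assumptions for an x86_64 Linux gcc build on an RPM-based distribution:

.. code-block:: console

   # Enable the bnx2x PMD in the generated build configuration
   sed -i 's/CONFIG_RTE_LIBRTE_BNX2X_PMD=n/CONFIG_RTE_LIBRTE_BNX2X_PMD=y/' \
       ./x86_64-native-linuxapp-gcc/.config

   # zlib development headers are required for the firmware binary to load
   yum install zlib-devel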
+ +SR-IOV: Prerequisites and Sample Application Notes +-------------------------------------------------- + +This section provides instructions to configure SR-IOV under Linux. + +#. Verify SR-IOV and ARI capabilities are enabled on the adapter using ``lspci``: + + .. code-block:: console + + lspci -s <slot> -vvv + + Example output: + + .. code-block:: console + + [...] + Capabilities: [1b8 v1] Alternative Routing-ID Interpretation (ARI) + [...] + Capabilities: [1c0 v1] Single Root I/O Virtualization (SR-IOV) + [...] + Kernel driver in use: igb_uio + +#. Load the kernel module: + + .. code-block:: console + + modprobe bnx2x + + Example output: + + .. code-block:: console + + systemd-udevd[4848]: renamed network interface eth0 to ens5f0 + systemd-udevd[4848]: renamed network interface eth1 to ens5f1 + +#. Bring up the PF ports: + + .. code-block:: console + + ifconfig ens5f0 up + ifconfig ens5f1 up + +#. Create VF device(s): + + Echo the number of VFs to be created into the "sriov_numvfs" sysfs entry + of the parent PF. + + Example: + + .. code-block:: console + + echo 2 > /sys/devices/pci0000:00/0000:00:03.0/0000:81:00.0/sriov_numvfs + +#. Assign VF MAC address: + + Assign a MAC address to the VF using the iproute2 utility. The syntax is: + ip link set <PF iface> vf <VF id> mac <macaddr> + + Example: + + .. code-block:: console + + ip link set ens5f0 vf 0 mac 52:54:00:2f:9d:e8 + +#. PCI Passthrough: + + The VF devices may be passed through to the guest VM using virt-manager, + virsh, etc. The bnx2x PMD should be used to bind the VF devices in the guest VM + using the instructions outlined in the Application notes below. + +#. Running testpmd: + + Follow the instructions available in the document + :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` + to run testpmd. + + Example output: + + .. code-block:: console + + [...] + EAL: PCI device 0000:84:00.0 on NUMA socket 1 + EAL: probe driver: 14e4:168e rte_bnx2x_pmd + EAL: PCI memory mapped at 0x7f14f6fe5000 + EAL: PCI memory mapped at 0x7f14f67e5000 + EAL: PCI memory mapped at 0x7f15fbd9b000 + EAL: PCI device 0000:84:00.1 on NUMA socket 1 + EAL: probe driver: 14e4:168e rte_bnx2x_pmd + EAL: PCI memory mapped at 0x7f14f5fe5000 + EAL: PCI memory mapped at 0x7f14f57e5000 + EAL: PCI memory mapped at 0x7f15fbd4f000 + Interactive-mode selected + Configuring Port 0 (socket 0) + PMD: bnx2x_dev_tx_queue_setup(): fp[00] req_bd=512, thresh=512, + usable_bd=1020, total_bd=1024, + tx_pages=4 + PMD: bnx2x_dev_rx_queue_setup(): fp[00] req_bd=128, thresh=0, + usable_bd=510, total_bd=512, + rx_pages=1, cq_pages=8 + PMD: bnx2x_print_adapter_info(): + [...] + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 10000 Mbps - full-duplex + Done + testpmd> diff --git a/src/seastar/dpdk/doc/guides/nics/bnxt.rst b/src/seastar/dpdk/doc/guides/nics/bnxt.rst new file mode 100644 index 00000000..9826b350 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/bnxt.rst @@ -0,0 +1,64 @@ +.. BSD LICENSE + Copyright 2016 Broadcom Limited + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Broadcom Limited nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +BNXT Poll Mode Driver +===================== + +The bnxt poll mode library (**librte_pmd_bnxt**) implements support for: + + * **Broadcom NetXtreme-C®/NetXtreme-E® BCM5730X and BCM574XX family of + Ethernet Network Controllers** + + These adapters support standards-compliant 10/25/50/100 Gbps, 30 MPPS, + full-duplex throughput. + + Information about the NetXtreme family of adapters can be found in the + `NetXtreme® Brand section + <https://www.broadcom.com/products/ethernet-communication-and-switching?technology%5B%5D=88>`_ + of the `Broadcom website <http://www.broadcom.com/>`_. + + * **Broadcom StrataGX® BCM5871X Series of Communications Processors** + + These ARM-based processors target a broad range of networking applications + including virtual CPE (vCPE) and NFV appliances, 10G service routers and + gateways, control plane processing for Ethernet switches and network + attached storage (NAS). + + Information about the StrataGX family of adapters can be found in the + `StrataGX® BCM5871X Series section + <http://www.broadcom.com/products/enterprise-and-network-processors/processors/bcm58712>`_ + of the `Broadcom website <http://www.broadcom.com/>`_. + +Limitations +----------- + +With the current driver, allocated mbufs must be large enough to hold +the entire received frame. If the mbufs are not large enough, the +packets will be dropped. This is most limiting when jumbo frames are +used. diff --git a/src/seastar/dpdk/doc/guides/nics/build_and_test.rst b/src/seastar/dpdk/doc/guides/nics/build_and_test.rst new file mode 100644 index 00000000..2d70af88 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/build_and_test.rst @@ -0,0 +1,179 @@ +.. BSD LICENSE + Copyright(c) 2017 Cavium, Inc. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Cavium, Inc.
nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER(S) OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +.. _pmd_build_and_test: + +Compiling and testing a PMD for a NIC +===================================== + +This section demonstrates how to compile and run a Poll Mode Driver (PMD) for +the available Network Interface Cards in DPDK using TestPMD. + +TestPMD is one of the reference applications distributed with DPDK. Its main +purpose is to forward packets between Ethernet ports on a network interface and, +as such, it is the best way to test a PMD. + +Refer to the :ref:`testpmd application user guide <testpmd_ug>` for detailed +information on how to build and run testpmd. + +Driver Compilation +------------------ + +To compile a PMD for a platform, run make with the appropriate target as shown below. +Use the "make" command on Linux and "gmake" on FreeBSD. This will also build testpmd. + +To check available targets: + +.. code-block:: console + + cd <DPDK-source-directory> + make showconfigs + +Example output: + +.. code-block:: console + + arm-armv7a-linuxapp-gcc + arm64-armv8a-linuxapp-gcc + arm64-dpaa2-linuxapp-gcc + arm64-thunderx-linuxapp-gcc + arm64-xgene1-linuxapp-gcc + i686-native-linuxapp-gcc + i686-native-linuxapp-icc + ppc_64-power8-linuxapp-gcc + x86_64-native-bsdapp-clang + x86_64-native-bsdapp-gcc + x86_64-native-linuxapp-clang + x86_64-native-linuxapp-gcc + x86_64-native-linuxapp-icc + x86_x32-native-linuxapp-gcc + +To compile a PMD for the Linux x86_64 gcc target, run the following "make" command: + +.. code-block:: console + + make install T=x86_64-native-linuxapp-gcc + +Use an ARM (ThunderX, DPAA, X-Gene) or PowerPC target for the respective platform. + +For more information, refer to the :ref:`Getting Started Guide for Linux <linux_gsg>` +or :ref:`Getting Started Guide for FreeBSD <freebsd_gsg>` depending on your platform. + +Running testpmd in Linux +------------------------ + +This section demonstrates how to set up and run ``testpmd`` in Linux. + +#. Mount huge pages: + + .. code-block:: console + + mkdir /mnt/huge + mount -t hugetlbfs nodev /mnt/huge + +#. Request huge pages: + + Hugepage memory should be reserved as per application requirements. Check + the hugepage size configured in the system and calculate the number of pages + required. + + To reserve 1024 pages of 2MB: + + .. code-block:: console + + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + + .. note:: + + Check ``/proc/meminfo`` to find the system hugepage size: + + .. code-block:: console + + grep "Hugepagesize:" /proc/meminfo + + Example output: + + .. code-block:: console + + Hugepagesize: 2048 kB + +#. Load the ``igb_uio`` or ``vfio-pci`` driver: + + .. code-block:: console
+ modprobe uio + insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko + + or + + .. code-block:: console + + modprobe vfio-pci + +#. Set up VFIO permissions for regular users before binding to ``vfio-pci``: + + .. code-block:: console + + sudo chmod a+x /dev/vfio + + sudo chmod 0666 /dev/vfio/* + +#. Bind the adapters to ``igb_uio`` or ``vfio-pci`` loaded in the previous step: + + .. code-block:: console + + ./usertools/dpdk-devbind.py --bind igb_uio DEVICE1 DEVICE2 ... + + Or set up VFIO permissions for regular users and then bind to ``vfio-pci``: + + .. code-block:: console + + ./usertools/dpdk-devbind.py --bind vfio-pci DEVICE1 DEVICE2 ... + + .. note:: + + DEVICE1, DEVICE2 are specified via PCI "domain:bus:slot.func" syntax or + "bus:slot.func" syntax. + +#. Start ``testpmd`` with basic parameters: + + .. code-block:: console + + ./x86_64-native-linuxapp-gcc/app/testpmd -l 0-3 -n 4 -- -i + + Successful execution will show initialization messages from EAL, PMD and the + testpmd application. A prompt will be displayed at the end for user commands + as interactive mode (``-i``) is on. + + .. code-block:: console + + testpmd> + + Refer to the :ref:`testpmd runtime functions <testpmd_runtime>` for a list + of available commands. diff --git a/src/seastar/dpdk/doc/guides/nics/cxgbe.rst b/src/seastar/dpdk/doc/guides/nics/cxgbe.rst new file mode 100644 index 00000000..a205b43f --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/cxgbe.rst @@ -0,0 +1,525 @@ +.. BSD LICENSE + Copyright 2015 Chelsio Communications. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Chelsio Communications nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +CXGBE Poll Mode Driver +====================== + +The CXGBE PMD (**librte_pmd_cxgbe**) provides poll mode driver support +for the **Chelsio T5** 10/40 Gbps family of adapters. The CXGBE PMD has support +for the latest Linux and FreeBSD operating systems. + +More information can be found at `Chelsio Communications Official Website +<http://www.chelsio.com>`_.
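Before working through the sections below, it can help to confirm that a T5 adapter is visible on the PCI bus. A minimal check, using the Chelsio PCI vendor ID ``1425`` that also appears in the testpmd probe output later in this guide:

.. code-block:: console

   # List Chelsio (vendor ID 1425) devices on the PCI bus
   lspci -d 1425: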
+ +Features +-------- + +The CXGBE PMD has support for: + +- Multiple queues for TX and RX +- Receiver Side Steering (RSS) +- VLAN filtering +- Checksum offload +- Promiscuous mode +- All multicast mode +- Port hardware statistics +- Jumbo frames + +Limitations +----------- + +The Chelsio T5 devices provide two/four ports but expose a single PCI bus +address; thus, librte_pmd_cxgbe registers itself as a +PCI driver that allocates one Ethernet device per detected port. + +For this reason, one cannot whitelist/blacklist a single port without +whitelisting/blacklisting the other ports on the same device. + +Supported Chelsio T5 NICs +------------------------- + +- 1G NICs: T502-BT +- 10G NICs: T520-BT, T520-CR, T520-LL-CR, T520-SO-CR, T540-CR +- 40G NICs: T580-CR, T580-LP-CR, T580-SO-CR +- Other T5 NICs: T522-CR + +Prerequisites +------------- + +- Requires firmware version **1.13.32.0** or higher. Visit + `Chelsio Download Center <http://service.chelsio.com>`_ to get the latest firmware + bundled with the latest Chelsio Unified Wire package. + + For Linux, installing and loading the latest cxgb4 kernel driver from the + Chelsio Unified Wire package should get you the latest firmware. More + information can be obtained from the User Guide that is bundled with the + Chelsio Unified Wire package. + + For FreeBSD, the latest firmware obtained from the Chelsio Unified Wire + package must be manually flashed via cxgbetool, available in the FreeBSD source + repository. + + Instructions on how to manually flash the firmware are given in section + :ref:`linux-installation` for Linux and section :ref:`freebsd-installation` + for FreeBSD. + +Pre-Installation Configuration +------------------------------ + +Config File Options +~~~~~~~~~~~~~~~~~~~ + +The following options can be modified in the ``.config`` file. Please note that +enabling debugging options may affect system performance. + +- ``CONFIG_RTE_LIBRTE_CXGBE_PMD`` (default **y**) + + Toggle compilation of the librte_pmd_cxgbe driver. + +- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG`` (default **n**) + + Toggle display of generic debugging messages. + +- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG_REG`` (default **n**) + + Toggle display of registers related run-time check messages. + +- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG_MBOX`` (default **n**) + + Toggle display of firmware mailbox related run-time check messages. + +- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG_TX`` (default **n**) + + Toggle display of transmission data path run-time check messages. + +- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG_RX`` (default **n**) + + Toggle display of receiving data path run-time check messages. + +.. _driver-compilation: + +Driver compilation and testing +------------------------------ + +Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` +for details. + +Linux +----- + +.. _linux-installation: + +Linux Installation +~~~~~~~~~~~~~~~~~~ + +Steps to manually install the latest firmware from the downloaded Chelsio +Unified Wire package for the Linux operating system are as follows: + +#. Load the kernel module: + + .. code-block:: console + + modprobe cxgb4 + +#. Use ifconfig to get the interface name assigned to the Chelsio card: + + .. code-block:: console + + ifconfig -a | grep "00:07:43" + + Example output: + + .. code-block:: console + + p1p1 Link encap:Ethernet HWaddr 00:07:43:2D:EA:C0 + p1p2 Link encap:Ethernet HWaddr 00:07:43:2D:EA:C8 + +#. Install cxgbtool: + + .. code-block:: console + + cd <path_to_uwire>/tools/cxgbtool + make install + +#.
Use cxgbtool to load the firmware config file onto the card: + + .. code-block:: console + + cxgbtool p1p1 loadcfg <path_to_uwire>/src/network/firmware/t5-config.txt + +#. Use cxgbtool to load the firmware image onto the card: + + .. code-block:: console + + cxgbtool p1p1 loadfw <path_to_uwire>/src/network/firmware/t5fw-*.bin + +#. Unload and reload the kernel module: + + .. code-block:: console + + modprobe -r cxgb4 + modprobe cxgb4 + +#. Verify with ethtool: + + .. code-block:: console + + ethtool -i p1p1 | grep "firmware" + + Example output: + + .. code-block:: console + + firmware-version: 1.13.32.0, TP 0.1.4.8 + +Running testpmd +~~~~~~~~~~~~~~~ + +This section demonstrates how to launch **testpmd** with Chelsio T5 +devices managed by librte_pmd_cxgbe on the Linux operating system. + +#. Load the kernel module: + + .. code-block:: console + + modprobe cxgb4 + +#. Get the PCI bus addresses of the interfaces bound to the cxgb4 driver: + + .. code-block:: console + + dmesg | tail -2 + + Example output: + + .. code-block:: console + + cxgb4 0000:02:00.4 p1p1: renamed from eth0 + cxgb4 0000:02:00.4 p1p2: renamed from eth1 + + .. note:: + + Both the interfaces of a Chelsio T5 2-port adapter are bound to the + same PCI bus address. + +#. Unload the kernel module: + + .. code-block:: console + + modprobe -ar cxgb4 csiostor + +#. Running testpmd + + Follow the instructions available in the document + :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` + to run testpmd. + + .. note:: + + Currently, the CXGBE PMD only supports the binding of PF4 for Chelsio T5 NICs. + + Example output: + + .. code-block:: console + + [...] + EAL: PCI device 0000:02:00.4 on NUMA socket -1 + EAL: probe driver: 1425:5401 rte_cxgbe_pmd + EAL: PCI memory mapped at 0x7fd7c0200000 + EAL: PCI memory mapped at 0x7fd77cdfd000 + EAL: PCI memory mapped at 0x7fd7c10b7000 + PMD: rte_cxgbe_pmd: fw: 1.13.32.0, TP: 0.1.4.8 + PMD: rte_cxgbe_pmd: Coming up as MASTER: Initializing adapter + Interactive-mode selected + Configuring Port 0 (socket 0) + Port 0: 00:07:43:2D:EA:C0 + Configuring Port 1 (socket 0) + Port 1: 00:07:43:2D:EA:C8 + Checking link statuses... + PMD: rte_cxgbe_pmd: Port0: passive DA port module inserted + PMD: rte_cxgbe_pmd: Port1: passive DA port module inserted + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 10000 Mbps - full-duplex + Done + testpmd> + + .. note:: + + Flow control pause TX/RX is disabled by default and can be enabled via + testpmd. Refer to section :ref:`flow-control` for more details. + +FreeBSD +------- + +.. _freebsd-installation: + +FreeBSD Installation +~~~~~~~~~~~~~~~~~~~~ + +Steps to manually install the latest firmware from the downloaded Chelsio +Unified Wire package for the FreeBSD operating system are as follows: + +#. Load the kernel module: + + .. code-block:: console + + kldload if_cxgbe + +#. Use dmesg to get the t5nex instance assigned to the Chelsio card: + + .. code-block:: console + + dmesg | grep "t5nex" + + Example output: + + .. code-block:: console + + t5nex0: <Chelsio T520-CR> irq 16 at device 0.4 on pci2 + cxl0: <port 0> on t5nex0 + cxl1: <port 1> on t5nex0 + t5nex0: PCIe x8, 2 ports, 14 MSI-X interrupts, 31 eq, 13 iq + + In the example above, a Chelsio T520-CR card is bound to a t5nex0 instance. + +#. Install cxgbetool from the FreeBSD source repository: + + .. code-block:: console + + cd <path_to_FreeBSD_source>/tools/tools/cxgbetool/ + make && make install + +#. Use cxgbetool to load the firmware image onto the card: + + .. code-block:: console
+ cxgbetool t5nex0 loadfw <path_to_uwire>/src/network/firmware/t5fw-*.bin + +#. Unload and reload the kernel module: + + .. code-block:: console + + kldunload if_cxgbe + kldload if_cxgbe + +#. Verify with sysctl: + + .. code-block:: console + + sysctl -a | grep "t5nex" | grep "firmware" + + Example output: + + .. code-block:: console + + dev.t5nex.0.firmware_version: 1.13.32.0 + +Running testpmd +~~~~~~~~~~~~~~~ + +This section demonstrates how to launch **testpmd** with Chelsio T5 +devices managed by librte_pmd_cxgbe on the FreeBSD operating system. + +#. Change to the DPDK source directory where the target has been compiled in + section :ref:`driver-compilation`: + + .. code-block:: console + + cd <DPDK-source-directory> + +#. Copy the contigmem kernel module to the /boot/kernel directory: + + .. code-block:: console + + cp x86_64-native-bsdapp-clang/kmod/contigmem.ko /boot/kernel/ + +#. Add the following lines to /boot/loader.conf: + + .. code-block:: console + + # reserve 2 x 1G blocks of contiguous memory using contigmem driver + hw.contigmem.num_buffers=2 + hw.contigmem.buffer_size=1073741824 + # load contigmem module during boot process + contigmem_load="YES" + + The above lines load the contigmem kernel module during the boot process and + allocate 2 x 1G blocks of contiguous memory to be used for DPDK later on. + This is to avoid issues with potential memory fragmentation during later + system uptime, which may result in failure of allocating the contiguous + memory required for the contigmem kernel module. + +#. Restart the system and ensure the contigmem module is loaded successfully: + + .. code-block:: console + + reboot + kldstat | grep "contigmem" + + Example output: + + .. code-block:: console + + 2 1 0xffffffff817f1000 3118 contigmem.ko + +#. Repeat step 1 to ensure that you are in the DPDK source directory. + +#. Load the cxgbe kernel module: + + .. code-block:: console + + kldload if_cxgbe + +#. Get the PCI bus addresses of the interfaces bound to the t5nex driver: + + .. code-block:: console + + pciconf -l | grep "t5nex" + + Example output: + + .. code-block:: console + + t5nex0@pci0:2:0:4: class=0x020000 card=0x00001425 chip=0x54011425 rev=0x00 + + In the above example, t5nex0 is bound to the 2:0:4 bus address. + + .. note:: + + Both the interfaces of a Chelsio T5 2-port adapter are bound to the + same PCI bus address. + +#. Unload the kernel module: + + .. code-block:: console + + kldunload if_cxgbe + +#. Set the PCI bus addresses in the hw.nic_uio.bdfs kernel environment parameter: + + .. code-block:: console + + kenv hw.nic_uio.bdfs="2:0:4" + + This automatically binds 2:0:4 to the nic_uio kernel driver when it is loaded in + the next step. + + .. note:: + + Currently, the CXGBE PMD only supports the binding of PF4 for Chelsio T5 NICs. + +#. Load the nic_uio kernel driver: + + .. code-block:: console + + kldload ./x86_64-native-bsdapp-clang/kmod/nic_uio.ko + +#. Start testpmd with basic parameters: + + .. code-block:: console + + ./x86_64-native-bsdapp-clang/app/testpmd -l 0-3 -n 4 -w 0000:02:00.4 -- -i + + Example output: + + .. code-block:: console + + [...]
+ EAL: PCI device 0000:02:00.4 on NUMA socket 0 + EAL: probe driver: 1425:5401 rte_cxgbe_pmd + EAL: PCI memory mapped at 0x8007ec000 + EAL: PCI memory mapped at 0x842800000 + EAL: PCI memory mapped at 0x80086c000 + PMD: rte_cxgbe_pmd: fw: 1.13.32.0, TP: 0.1.4.8 + PMD: rte_cxgbe_pmd: Coming up as MASTER: Initializing adapter + Interactive-mode selected + Configuring Port 0 (socket 0) + Port 0: 00:07:43:2D:EA:C0 + Configuring Port 1 (socket 0) + Port 1: 00:07:43:2D:EA:C8 + Checking link statuses... + PMD: rte_cxgbe_pmd: Port0: passive DA port module inserted + PMD: rte_cxgbe_pmd: Port1: passive DA port module inserted + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 10000 Mbps - full-duplex + Done + testpmd> + +.. note:: + + Flow control pause TX/RX is disabled by default and can be enabled via + testpmd. Refer to section :ref:`flow-control` for more details. + +Sample Application Notes +------------------------ + +.. _flow-control: + +Enable/Disable Flow Control +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Flow control pause TX/RX is disabled by default and can be enabled via +testpmd as follows: + +.. code-block:: console + + testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 0 + testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 1 + +To disable again, run: + +.. code-block:: console + + testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 0 + testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 1 + +Jumbo Mode +~~~~~~~~~~ + +There are two ways to enable sending and receiving of jumbo frames via testpmd. +One method involves using the **mtu** command, which changes the MTU of an +individual port without having to stop the selected port. Another method +involves stopping all the ports first and then running the **max-pkt-len** command +to configure the MTU of all the ports with a single command. + +- To configure each port individually, run the mtu command as follows: + + .. code-block:: console + + testpmd> port config mtu 0 9000 + testpmd> port config mtu 1 9000 + +- To configure all the ports at once, stop all the ports first and run the + max-pkt-len command as follows: + + .. code-block:: console + + testpmd> port stop all + testpmd> port config all max-pkt-len 9000 diff --git a/src/seastar/dpdk/doc/guides/nics/dpaa2.rst b/src/seastar/dpdk/doc/guides/nics/dpaa2.rst new file mode 100644 index 00000000..1ca27d45 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/dpaa2.rst @@ -0,0 +1,594 @@ +.. BSD LICENSE + Copyright (C) NXP. 2016. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of NXP nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +DPAA2 Poll Mode Driver +====================== + +The DPAA2 NIC PMD (**librte_pmd_dpaa2**) provides poll mode driver +support for the inbuilt NIC found in the **NXP DPAA2** SoC family. + +More information can be found at `NXP Official Website +<http://www.nxp.com/products/microcontrollers-and-processors/arm-processors/qoriq-arm-processors:QORIQ-ARM>`_. + +NXP DPAA2 (Data Path Acceleration Architecture Gen2) +---------------------------------------------------- + +This section provides an overview of the NXP DPAA2 architecture +and how it is integrated into the DPDK. + +Contents summary + +- DPAA2 overview +- Overview of DPAA2 objects +- DPAA2 driver architecture overview + +.. _dpaa2_overview: + +DPAA2 Overview +~~~~~~~~~~~~~~ + +Reference: `FSL MC BUS in Linux Kernel <https://www.kernel.org/doc/readme/drivers-staging-fsl-mc-README.txt>`_. + +DPAA2 is a hardware architecture designed for high-speed network +packet processing. DPAA2 consists of sophisticated mechanisms for +processing Ethernet packets, queue management, buffer management, +autonomous L2 switching, virtual Ethernet bridging, and accelerator +(e.g. crypto) sharing. + +A DPAA2 hardware component called the Management Complex (or MC) manages the +DPAA2 hardware resources. The MC provides an object-based abstraction for +software drivers to use the DPAA2 hardware. + +The MC uses DPAA2 hardware resources such as queues, buffer pools, and +network ports to create functional objects/devices such as network +interfaces, an L2 switch, or accelerator instances. + +The MC provides memory-mapped I/O command interfaces (MC portals) +which DPAA2 software drivers use to operate on DPAA2 objects. + +The diagram below shows an overview of the DPAA2 resource management +architecture: + +.. code-block:: console + + +--------------------------------------+ + | OS | + | DPAA2 drivers | + | | | + +-----------------------------|--------+ + | + | (create,discover,connect + | config,use,destroy) + | + DPAA2 | + +------------------------| mc portal |-+ + | | | + | +- - - - - - - - - - - - -V- - -+ | + | | | | + | | Management Complex (MC) | | + | | | | + | +- - - - - - - - - - - - - - - -+ | + | | + | Hardware Hardware | + | Resources Objects | + | --------- ------- | + | -queues -DPRC | + | -buffer pools -DPMCP | + | -Eth MACs/ports -DPIO | + | -network interface -DPNI | + | profiles -DPMAC | + | -queue portals -DPBP | + | -MC portals ... | + | ... | + | | + +--------------------------------------+ + +The MC mediates operations such as create, discover, +connect, configuration, and destroy. Fast-path operations +on data, such as packet transmit/receive, are not mediated by +the MC and are done directly using memory mapped regions in +DPIO objects. + +Overview of DPAA2 Objects +~~~~~~~~~~~~~~~~~~~~~~~~~ + +This section provides a brief overview of some key DPAA2 objects. +A simple scenario is described illustrating the objects involved +in creating a network interface.
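Before each object is described, note that on a running Linux system the MC-managed objects can be observed directly. A short sketch, assuming the kernel fsl-mc bus driver referenced above is loaded and exposes the objects under sysfs:

.. code-block:: console

   # Enumerate DPAA2 objects discovered on the fsl-mc bus (illustrative)
   ls /sys/bus/fsl-mc/devices/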
+ +DPRC (Datapath Resource Container) + + A DPRC is a container object that holds all the other + types of DPAA2 objects. In the example diagram below there + are 8 objects of 5 types (DPMCP, DPIO, DPBP, DPNI, and DPMAC) + in the container. + +.. code-block:: console + + +---------------------------------------------------------+ + | DPRC | + | | + | +-------+ +-------+ +-------+ +-------+ +-------+ | + | | DPMCP | | DPIO | | DPBP | | DPNI | | DPMAC | | + | +-------+ +-------+ +-------+ +---+---+ +---+---+ | + | | DPMCP | | DPIO | | + | +-------+ +-------+ | + | | DPMCP | | + | +-------+ | + | | + +---------------------------------------------------------+ + +From the point of view of an OS, a DPRC behaves similarly to a plug and +play bus, like PCI. DPRC commands can be used to enumerate the contents +of the DPRC and discover the hardware objects present (including mappable +regions and interrupts). + +.. code-block:: console + + DPRC.1 (bus) + | + +--+--------+-------+-------+-------+ + | | | | | + DPMCP.1 DPIO.1 DPBP.1 DPNI.1 DPMAC.1 + DPMCP.2 DPIO.2 + DPMCP.3 + +Hardware objects can be created and destroyed dynamically, providing +the ability to hot plug/unplug objects in and out of the DPRC. + +A DPRC has a mappable MMIO region (an MC portal) that can be used +to send MC commands. It has an interrupt for status events (like +hotplug). + +All objects in a container share the same hardware "isolation context". +This means that with respect to an IOMMU the isolation granularity +is at the DPRC (container) level, not at the individual object +level. + +DPRCs can be defined statically and populated with objects +via a config file passed to the MC when firmware starts +it. There is also a Linux user space tool called "restool" +that can be used to create/destroy containers and objects +dynamically. + +DPAA2 Objects for an Ethernet Network Interface +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A typical Ethernet NIC is monolithic-- the NIC device contains TX/RX +queuing mechanisms, configuration mechanisms, buffer management, +physical ports, and interrupts. DPAA2 uses a more granular approach +utilizing multiple hardware objects. Each object provides specialized +functions. Groups of these objects are used by software to provide +Ethernet network interface functionality. This approach provides +efficient use of finite hardware resources, flexibility, and +performance advantages. + +The diagram below shows the objects needed for a simple +network interface configuration on a system with 2 CPUs. + +.. code-block:: console + + +---+---+ +---+---+ + CPU0 CPU1 + +---+---+ +---+---+ + | | + +---+---+ +---+---+ + DPIO DPIO + +---+---+ +---+---+ + \ / + \ / + \ / + +---+---+ + DPNI --- DPBP,DPMCP + +---+---+ + | + | + +---+---+ + DPMAC + +---+---+ + | + port/PHY + +The objects are described below. For each object, a brief description +is provided along with a summary of the kinds of operations the object +supports and a summary of key resources of the object (MMIO regions +and IRQs). + +DPMAC (Datapath Ethernet MAC): represents an Ethernet MAC, a +hardware device that connects to an Ethernet PHY and allows +physical transmission and reception of Ethernet frames. + +- MMIO regions: none +- IRQs: DPNI link change +- commands: set link up/down, link config, get stats, IRQ config, enable, reset + +DPNI (Datapath Network Interface): contains TX/RX queues, +network interface configuration, and RX buffer pool configuration +mechanisms. The TX/RX queues are in memory and are identified by +queue number.
+ +- MMIO regions: none +- IRQs: link state +- commands: port config, offload config, queue config, parse/classify config, IRQ config, enable, reset + +DPIO (Datapath I/O): provides interfaces to enqueue and dequeue +packets and do hardware buffer pool management operations. The DPAA2 +architecture separates the mechanism to access queues (the DPIO object) +from the queues themselves. The DPIO provides an MMIO interface to +enqueue/dequeue packets. To enqueue something, a descriptor is written +to the DPIO MMIO region, which includes the target queue number. +There will typically be one DPIO assigned to each CPU. This allows all +CPUs to simultaneously perform enqueue/dequeue operations. DPIOs are +expected to be shared by different DPAA2 drivers. + +- MMIO regions: queue operations, buffer management +- IRQs: data availability, congestion notification, buffer pool depletion +- commands: IRQ config, enable, reset + +DPBP (Datapath Buffer Pool): represents a hardware buffer +pool. + +- MMIO regions: none +- IRQs: none +- commands: enable, reset + +DPMCP (Datapath MC Portal): provides an MC command portal. +Used by drivers to send commands to the MC to manage +objects. + +- MMIO regions: MC command portal +- IRQs: command completion +- commands: IRQ config, enable, reset + +Object Connections +~~~~~~~~~~~~~~~~~~ + +Some objects have explicit relationships that must +be configured: + +- DPNI <--> DPMAC +- DPNI <--> DPNI +- DPNI <--> L2-switch-port + +A DPNI must be connected to something such as a DPMAC, +another DPNI, or an L2 switch port. The DPNI connection +is made via a DPRC command. + +.. code-block:: console + + +-------+ +-------+ + | DPNI | | DPMAC | + +---+---+ +---+---+ + | | + +==========+ + +- DPNI <--> DPBP + +A network interface requires a 'buffer pool' (DPBP object) which provides +a list of pointers to memory where received Ethernet data is to be copied. +The Ethernet driver configures the DPBPs associated with the network +interface. + +Interrupts +~~~~~~~~~~ + +All interrupts generated by DPAA2 objects are message +interrupts. At the hardware level, message interrupts +generated by devices will normally have 3 components-- +1) a non-spoofable 'device-id' expressed on the hardware +bus, 2) an address, 3) a data value. + +In the case of DPAA2 devices/objects, all objects in the +same container/DPRC share the same 'device-id'. +For ARM-based SoCs this is the same as the stream ID. + + +DPAA2 DPDK - Poll Mode Driver Overview +-------------------------------------- + +This section provides an overview of the drivers for +DPAA2-- 1) the bus driver and associated "DPAA2 infrastructure" +drivers and 2) functional object drivers (such as Ethernet). + +As described previously, a DPRC is a container that holds the other +types of DPAA2 objects. It is functionally similar to a plug-and-play +bus controller. + +Each object in the DPRC is a Linux "device" and is bound to a driver. +The diagram below shows the dpaa2 drivers involved in a networking +scenario and the objects bound to each driver. A brief description +of each driver follows. + +.. code-block:: console + + + +------------+ + | DPDK DPAA2 | + | PMD | + +------------+ +------------+ + | Ethernet |.......| Mempool | + . . . . . . . . . | (DPNI) | | (DPBP) | + . +---+---+----+ +-----+------+ + . ^ | . + . | |<enqueue, . + . | | dequeue> . + . | | . + . +---+---V----+ . + . . . . . . . . . . .| DPIO driver| . + . . | (DPIO) | . + . . +-----+------+ . + . . | QBMAN | . + . . | Driver | . + +----+------+-------+ +-----+----- | .
+    |     dpaa2 bus     |                    |                    .
+    |  VFIO fslmc-bus   |....................|.....................
+    |                   |                    |
+    |    /bus/fslmc     |                    |
+    +-------------------+                    |
+                                             |
+    ========================== HARDWARE =====|=======================
+                                           DPIO
+                                             |
+                                           DPNI---DPBP
+                                             |
+                                           DPMAC
+                                             |
+                                            PHY
+    =========================================|========================
+
+
+DPAA2 bus driver
+~~~~~~~~~~~~~~~~
+
+The DPAA2 bus driver is an rte_bus driver which scans the fsl-mc bus.
+Key functions include:
+
+- Reading the container and setting up the VFIO group
+- Scanning and parsing the various MC objects and adding them to
+  their respective device list
+
+Additionally, it provides the object driver for generic MC objects.
+
+DPIO driver
+~~~~~~~~~~~
+
+The DPIO driver is bound to DPIO objects and provides services that allow
+other drivers, such as the Ethernet driver, to enqueue and dequeue data for
+their respective objects.
+Key services include:
+
+- Data availability notifications
+- Hardware queuing operations (enqueue and dequeue of data)
+- Hardware buffer pool management
+
+To transmit a packet, the Ethernet driver puts data on a queue and
+invokes a DPIO API. For receive, the Ethernet driver registers
+a data availability notification callback. To dequeue a packet,
+a DPIO API is used.
+
+There is typically one DPIO object per physical CPU for optimum
+performance, allowing different CPUs to simultaneously enqueue
+and dequeue data.
+
+The DPIO driver operates on behalf of all active DPAA2
+drivers -- Ethernet, crypto, compression, etc.
+
+DPBP based Mempool driver
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The DPBP driver is bound to DPBP objects and provides services to
+create a hardware-offloaded packet buffer mempool.
+
+DPAA2 NIC Driver
+~~~~~~~~~~~~~~~~
+
+The Ethernet driver is bound to a DPNI and implements the
+interfaces needed to connect the DPAA2 network interface to
+the network stack.
+
+Each DPNI corresponds to a DPDK network interface.
+
+Features
+^^^^^^^^
+
+Features of the DPAA2 PMD are:
+
+- Multiple queues for TX and RX
+- Receive Side Scaling (RSS)
+- Packet type information
+- Checksum offload
+- Promiscuous mode
+
+Supported DPAA2 SoCs
+--------------------
+
+- LS2080A/LS2040A
+- LS2084A/LS2044A
+- LS2088A/LS2048A
+- LS1088A/LS1048A
+
+Prerequisites
+-------------
+
+There are three main prerequisites for executing the DPAA2 PMD on a
+DPAA2-compatible board:
+
+1. **ARM 64 Tool Chain**
+
+   For example, the `*aarch64* Linaro Toolchain <https://releases.linaro.org/components/toolchain/binaries/4.9-2017.01/aarch64-linux-gnu>`_.
+
+2. **Linux Kernel**
+
+   It can be obtained from `NXP's Github hosting <https://github.com/qoriq-open-source/linux>`_.
+
+3. **Root file system**
+
+   Any filesystem supporting *aarch64* can be used. For example, the
+   Ubuntu 15.10 (Wily) or 16.04 LTS (Xenial) userland, which can be obtained
+   from `here <http://cdimage.ubuntu.com/ubuntu-base/releases/16.04/release/ubuntu-base-16.04.1-base-arm64.tar.gz>`_.
+
+Alternatively, the DPAA2 PMD can be run using images provided
+as part of the NXP SDK. The SDK includes all of the above prerequisites
+needed to bring up a DPAA2 board.
+
+The following dependencies are not part of DPDK and must be installed
+separately:
+
+- **NXP Linux SDK**
+
+  The NXP Linux software development kit (SDK) includes support for the
+  family of QorIQ® ARM-architecture-based system-on-chip (SoC) processors
+  and corresponding boards.
+
+  It includes the Linux board support packages (BSPs) for NXP SoCs,
+  a fully operational tool chain, and kernel and board-specific modules.
+
+  The SDK and related information can be obtained from: `NXP QorIQ SDK <http://www.nxp.com/products/software-and-tools/run-time-software/linux-sdk/linux-sdk-for-qoriq-processors:SDKLINUX>`_.
+
+- **DPDK Helper Scripts**
+
+  DPAA2-based resources can be configured easily with the help of the
+  ready-made scripts provided in the DPDK helper repository.
+
+  `DPDK Helper Scripts <https://github.com/qoriq-open-source/dpdk-helper>`_.
+
+Currently supported by DPDK:
+
+- NXP SDK **2.0+**.
+- MC Firmware version **10.0.0** and higher.
+- Supported architectures: **arm64 LE**.
+
+- Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to set up the basic DPDK environment.
+
+.. note::
+
+   Some parts of the fslmc bus code (the MC flib object library routines) are
+   dual licensed (BSD & GPLv2).
+
+Pre-Installation Configuration
+------------------------------
+
+Config File Options
+~~~~~~~~~~~~~~~~~~~
+
+The following options can be modified in the ``config`` file.
+Please note that enabling debugging options may affect system performance.
+
+- ``CONFIG_RTE_LIBRTE_FSLMC_BUS`` (default ``n``)
+
+  Toggle compilation of the ``librte_bus_fslmc`` driver. By default it is
+  enabled only for the defconfig_arm64-dpaa2-* config.
+
+- ``CONFIG_RTE_LIBRTE_DPAA2_PMD`` (default ``n``)
+
+  Toggle compilation of the ``librte_pmd_dpaa2`` driver. By default it is
+  enabled only for the defconfig_arm64-dpaa2-* config.
+
+- ``CONFIG_RTE_LIBRTE_DPAA2_DEBUG_DRIVER`` (default ``n``)
+
+  Toggle display of generic debugging messages.
+
+- ``CONFIG_RTE_LIBRTE_DPAA2_USE_PHYS_IOVA`` (default ``y``)
+
+  Toggle the use of physical addresses versus virtual addresses for hardware
+  accelerators.
+
+- ``CONFIG_RTE_LIBRTE_DPAA2_DEBUG_INIT`` (default ``n``)
+
+  Toggle display of initialization related messages.
+
+- ``CONFIG_RTE_LIBRTE_DPAA2_DEBUG_RX`` (default ``n``)
+
+  Toggle display of receive fast path run-time messages.
+
+- ``CONFIG_RTE_LIBRTE_DPAA2_DEBUG_TX`` (default ``n``)
+
+  Toggle display of transmit fast path run-time messages.
+
+- ``CONFIG_RTE_LIBRTE_DPAA2_DEBUG_TX_FREE`` (default ``n``)
+
+  Toggle display of transmit fast path buffer free run-time messages.
+
+Driver compilation and testing
+------------------------------
+
+Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
+for details.
+
+#. Running testpmd:
+
+   Follow instructions available in the document
+   :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
+   to run testpmd.
+
+   Example output:
+
+   .. code-block:: console
+
+      ./arm64-dpaa2-linuxapp-gcc/testpmd -c 0xff -n 1 \
+        -- -i --portmask=0x3 --nb-cores=1 --no-flush-rx
+
+      .....
+      EAL: Registered [pci] bus.
+      EAL: Registered [fslmc] bus.
+      EAL: Detected 8 lcore(s)
+      EAL: Probing VFIO support...
+      EAL: VFIO support initialized
+      .....
+      PMD: DPAA2: Processing Container = dprc.2
+      EAL: fslmc: DPRC contains = 51 devices
+      EAL: fslmc: Bus scan completed
+      .....
+      Configuring Port 0 (socket 0)
+      Port 0: 00:00:00:00:00:01
+      Configuring Port 1 (socket 0)
+      Port 1: 00:00:00:00:00:02
+      .....
+      Checking link statuses...
+      Port 0 Link Up - speed 10000 Mbps - full-duplex
+      Port 1 Link Up - speed 10000 Mbps - full-duplex
+      Done
+      testpmd>
+
+Limitations
+-----------
+
+Platform Requirement
+~~~~~~~~~~~~~~~~~~~~
+
+DPAA2 drivers for DPDK can only work on NXP SoCs as listed in
+``Supported DPAA2 SoCs``.
+
+Maximum packet length
+~~~~~~~~~~~~~~~~~~~~~
+
+The DPAA2 SoC family supports a maximum jumbo frame size of 10240 bytes.
+The value is fixed and cannot be changed. So, even when the
+``rxmode.max_rx_pkt_len`` member of ``struct rte_eth_conf`` is set to a value
+lower than 10240, frames up to 10240 bytes can still reach the host interface.
diff --git a/src/seastar/dpdk/doc/guides/nics/e1000em.rst b/src/seastar/dpdk/doc/guides/nics/e1000em.rst
new file mode 100644
index 00000000..265b147a
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/e1000em.rst
@@ -0,0 +1,182 @@
+.. BSD LICENSE
+   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+   All rights reserved.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+
+   * Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+   * Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in
+     the documentation and/or other materials provided with the
+     distribution.
+   * Neither the name of Intel Corporation nor the names of its
+     contributors may be used to endorse or promote products derived
+     from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Driver for VM Emulated Devices
+==============================
+
+The DPDK EM poll mode driver supports the following emulated devices:
+
+* qemu-kvm emulated Intel® 82540EM Gigabit Ethernet Controller (qemu e1000 device)
+
+* VMware* emulated Intel® 82545EM Gigabit Ethernet Controller
+
+* VMware emulated Intel® 82574L Gigabit Ethernet Controller.
+
+Validated Hypervisors
+---------------------
+
+The validated hypervisors are:
+
+* KVM (Kernel Virtual Machine) with Qemu, version 0.14.0
+
+* KVM (Kernel Virtual Machine) with Qemu, version 0.15.1
+
+* VMware ESXi 5.0, Update 1
+
+Recommended Guest Operating System in Virtual Machine
+-----------------------------------------------------
+
+The recommended guest operating system in a virtualized environment is:
+
+* Fedora* 18 (64-bit)
+
+For supported kernel versions, refer to the *DPDK Release Notes*.
+
+Setting Up a KVM Virtual Machine
+--------------------------------
+
+The following describes a target environment:
+
+* Host Operating System: Fedora 14
+
+* Hypervisor: KVM (Kernel Virtual Machine) with Qemu, version 0.14.0
+
+* Guest Operating System: Fedora 14
+
+* Linux Kernel Version: Refer to the DPDK Getting Started Guide
+
+* Target Applications: testpmd
+
+The setup procedure is as follows:
+#. Download qemu-kvm-0.14.0 from
+   `http://sourceforge.net/projects/kvm/files/qemu-kvm/ <http://sourceforge.net/projects/kvm/files/qemu-kvm/>`_
+   and install it in the Host OS using the following steps:
+
+   When using a recent kernel (2.6.25+) with kvm modules included:
+
+   .. code-block:: console
+
+      tar xzf qemu-kvm-release.tar.gz
+      cd qemu-kvm-release
+      ./configure --prefix=/usr/local/kvm
+      make
+      sudo make install
+      sudo /sbin/modprobe kvm-intel
+
+   When using an older kernel or a kernel from a distribution without the kvm modules,
+   you must download (from the same link), compile and install the modules yourself:
+
+   .. code-block:: console
+
+      tar xjf kvm-kmod-release.tar.bz2
+      cd kvm-kmod-release
+      ./configure
+      make
+      sudo make install
+      sudo /sbin/modprobe kvm-intel
+
+   Note that qemu-kvm installs in the /usr/local/kvm/bin directory.
+
+   For more details about KVM configuration and usage, please refer to:
+   `http://www.linux-kvm.org/page/HOWTO1 <http://www.linux-kvm.org/page/HOWTO1>`_.
+
+#. Create a Virtual Machine and install Fedora 14 on the Virtual Machine.
+   This is referred to as the Guest Operating System (Guest OS).
+
+#. Start the Virtual Machine with at least one emulated e1000 device.
+
+   .. note::
+
+      Qemu provides several choices for the emulated network device backend.
+      Most commonly used is a TAP networking backend that uses a TAP networking device in the host.
+      For more information about Qemu supported networking backends and different options for configuring networking at Qemu,
+      please refer to:
+
+      — `http://www.linux-kvm.org/page/Networking <http://www.linux-kvm.org/page/Networking>`_
+
+      — `http://wiki.qemu.org/Documentation/Networking <http://wiki.qemu.org/Documentation/Networking>`_
+
+      — `http://qemu.weilnetz.de/qemu-doc.html <http://qemu.weilnetz.de/qemu-doc.html>`_
+
+   For example, to start a VM with two emulated e1000 devices, issue the following command:
+
+   .. code-block:: console
+
+      /usr/local/kvm/bin/qemu-system-x86_64 -cpu host -smp 4 -hda qemu1.raw -m 1024
+      -net nic,model=e1000,vlan=1,macaddr=DE:AD:1E:00:00:01
+      -net tap,vlan=1,ifname=tapvm01,script=no,downscript=no
+      -net nic,model=e1000,vlan=2,macaddr=DE:AD:1E:00:00:02
+      -net tap,vlan=2,ifname=tapvm02,script=no,downscript=no
+
+   where:
+
+   — -m = memory to assign
+
+   — -smp = number of smp cores
+
+   — -hda = virtual disk image
+
+   This command starts a new virtual machine with two emulated 82540EM devices,
+   backed by two TAP networking host interfaces, tapvm01 and tapvm02.
+
+   .. code-block:: console
+
+      # ip tuntap show
+      tapvm01: tap
+      tapvm02: tap
+
+#. Configure your TAP networking interfaces using ip/ifconfig tools.
+
+#. Log in to the guest OS and check that the expected emulated devices exist:
+
+   .. code-block:: console
+
+      # lspci -d 8086:100e
+      00:04.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
+      00:05.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
+
+#. Install the DPDK and run testpmd.
+
+Known Limitations of Emulated Devices
+-------------------------------------
+
+The following are known limitations:
+
+#. The Qemu e1000 RX path does not support multiple descriptors/buffers per packet.
+   Therefore, rte_mbuf should be big enough to hold the whole packet.
+   For example, to allow testpmd to receive jumbo frames, use the following:
+
+   .. code-block:: console
+
+      testpmd [options] -- --mbuf-size=<your-max-packet-size>
+
+#. Qemu e1000 does not validate the checksum of incoming packets.
+#. Qemu e1000 only supports one interrupt source, so the link interrupt and
+   the Rx interrupt should be exclusive.
+
+#. Qemu e1000 does not support interrupt auto-clear; the application should
+   disable the interrupt immediately after being woken up.
diff --git a/src/seastar/dpdk/doc/guides/nics/ena.rst b/src/seastar/dpdk/doc/guides/nics/ena.rst
new file mode 100644
index 00000000..d19912e9
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/ena.rst
@@ -0,0 +1,222 @@
+.. BSD LICENSE
+
+   Copyright (c) 2015-2016 Amazon.com, Inc. or its affiliates.
+   All rights reserved.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+
+   * Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+   * Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in
+     the documentation and/or other materials provided with the
+     distribution.
+   * Neither the name of Amazon.com, Inc. nor the names of its
+     contributors may be used to endorse or promote products derived
+     from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ENA Poll Mode Driver
+====================
+
+The ENA PMD is a DPDK poll-mode driver for the Amazon Elastic
+Network Adapter (ENA) family.
+
+Overview
+--------
+
+The ENA driver exposes a lightweight management interface with a
+minimal set of memory mapped registers and an extendable command set
+through an Admin Queue.
+
+The driver supports a wide range of ENA adapters, is link-speed
+independent (i.e., the same driver is used for 10GbE, 25GbE, 40GbE,
+etc.), and negotiates and supports an extendable feature set.
+
+ENA adapters allow high-speed and low-overhead Ethernet traffic
+processing by providing a dedicated Tx/Rx queue pair per CPU core.
+
+The ENA driver supports industry standard TCP/IP offload features such
+as checksum offload and TCP transmit segmentation offload (TSO).
+
+Receive-side scaling (RSS) is supported for multi-core scaling.
+
+Some of the ENA devices support a working mode called Low-latency
+Queue (LLQ), which saves several additional microseconds.
+
+Management Interface
+--------------------
+
+The ENA management interface is exposed by means of:
+
+* Device Registers
+* Admin Queue (AQ) and Admin Completion Queue (ACQ)
+
+The ENA device's memory-mapped PCIe register space (MMIO registers)
+is accessed only during driver initialization and is not involved
+in further normal device operation.
+
+The AQ is used for submitting management commands, and the
+results/responses are reported asynchronously through the ACQ.
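+
+As an illustration of this submit/complete pattern (and only as an
+illustration -- the struct names below are hypothetical, while the real
+definitions live in the ``ena_com`` layer and ``ena_admin_defs.h``), a
+command is placed on the AQ with an ID, and the driver later matches a
+completion carrying the same ID on the ACQ:
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   /* Hypothetical descriptor layouts, for illustration only. */
+   struct aq_cmd  { uint16_t cmd_id; uint16_t opcode; uint32_t payload[14]; };
+   struct acq_cpl { uint16_t cmd_id; uint8_t status; uint8_t phase; };
+
+   /* Submit one command and busy-poll the ACQ for its completion. */
+   static int aq_submit_and_wait(struct aq_cmd *aq, struct acq_cpl *acq,
+                                 uint16_t depth, uint16_t *tail,
+                                 uint16_t *head, struct aq_cmd cmd)
+   {
+       uint16_t id = (*tail)++;       /* command IDs track the AQ tail */
+
+       cmd.cmd_id = id;
+       aq[id % depth] = cmd;          /* write the descriptor into the AQ */
+       /* ... an MMIO doorbell write to notify the device goes here ... */
+
+       for (;;) {                     /* the response shows up on the ACQ */
+           struct acq_cpl cpl = acq[*head % depth];
+           if (cpl.cmd_id == id) {    /* match the response to our request */
+               (*head)++;
+               return cpl.status;     /* 0 == success in this sketch */
+           }
+           /* real code also checks a phase bit and enforces a timeout */
+       }
+   }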
+
+ENA introduces a very small set of management commands with room for
+vendor-specific extensions. Most of the management operations are
+framed in a generic Get/Set feature command.
+
+The following admin queue commands are supported:
+
+* Create I/O submission queue
+* Create I/O completion queue
+* Destroy I/O submission queue
+* Destroy I/O completion queue
+* Get feature
+* Set feature
+* Get statistics
+
+Refer to ``ena_admin_defs.h`` for the list of supported Get/Set Feature
+properties.
+
+Data Path Interface
+-------------------
+
+I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx
+SQ respectively). Each SQ has a completion queue (CQ) associated
+with it.
+
+The SQs and CQs are implemented as descriptor rings in contiguous
+physical memory.
+
+Refer to ``ena_eth_io_defs.h`` for the detailed structure of the descriptors.
+
+The driver supports multi-queue for both Tx and Rx.
+
+Configuration information
+-------------------------
+
+**DPDK Configuration Parameters**
+
+  The following configuration options are available for the ENA PMD:
+
+  * **CONFIG_RTE_LIBRTE_ENA_PMD** (default y): Enables or disables inclusion
+    of the ENA PMD driver in the DPDK compilation.
+
+  * **CONFIG_RTE_LIBRTE_ENA_DEBUG_INIT** (default y): Enables or disables debug
+    logging of device initialization within the ENA PMD driver.
+
+  * **CONFIG_RTE_LIBRTE_ENA_DEBUG_RX** (default n): Enables or disables debug
+    logging of RX logic within the ENA PMD driver.
+
+  * **CONFIG_RTE_LIBRTE_ENA_DEBUG_TX** (default n): Enables or disables debug
+    logging of TX logic within the ENA PMD driver.
+
+  * **CONFIG_RTE_LIBRTE_ENA_COM_DEBUG** (default n): Enables or disables debug
+    logging of low level Tx/Rx logic in ena_com (base) within the ENA PMD driver.
+
+**ENA Configuration Parameters**
+
+  * **Number of Queues**
+
+    This is the requested number of queues upon initialization; however, the
+    actual number of receive and transmit queues created will be the minimum
+    of the maximum number supported by the device and the number of queues
+    requested.
+
+  * **Size of Queues**
+
+    This is the requested size of the receive/transmit queues, while the
+    actual size will be the minimum of the requested size and the maximum
+    receive/transmit queue size supported by the device (see the sketch
+    after the feature lists below).
+
+Building DPDK
+-------------
+
+See the :ref:`DPDK Getting Started Guide for Linux <linux_gsg>` for
+instructions on how to build DPDK.
+
+By default the ENA PMD library will be built into the DPDK library.
+
+For configuring and using UIO and VFIO frameworks, please also refer to
+:ref:`the documentation that comes with the DPDK suite <linux_gsg>`.
+
+Supported ENA adapters
+----------------------
+
+The ENA PMD currently supports the following ENA adapters:
+
+* ``1d0f:ec20`` - ENA VF
+* ``1d0f:ec21`` - ENA VF with LLQ support
+
+Supported Operating Systems
+---------------------------
+
+Any Linux distribution fulfilling the conditions described in the ``System
+Requirements`` section of :ref:`the DPDK documentation <linux_gsg>`; also
+see the *DPDK Release Notes*.
+
+Supported features
+------------------
+
+* Jumbo frames up to 9K
+* Port Hardware Statistics
+* IPv4/TCP/UDP checksum offload
+* TSO offload
+* Multiple receive and transmit queues
+* RSS
+* Low Latency Queue for Tx
+
+Unsupported features
+--------------------
+
+The features supported by the device and not yet supported by this PMD include:
+
+* Asynchronous Event Notification Queue (AENQ)
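+
+As noted above under ``ENA Configuration Parameters``, queue counts and sizes
+are clamped to the device limits. An application can anticipate this clamping
+itself by querying the limits through the generic ethdev API; a minimal
+sketch (standard DPDK calls, not an ENA-specific interface; the ``wanted_*``
+values are illustrative):
+
+.. code-block:: c
+
+   #include <rte_common.h>
+   #include <rte_ethdev.h>
+
+   /* Query device limits and clamp the requested queue count and ring
+    * size, mirroring what the PMD will do internally anyway. */
+   static void clamp_queue_request(uint8_t port_id)
+   {
+       uint16_t wanted_queues = 8;        /* illustrative application choices */
+       uint16_t wanted_ring_size = 4096;
+       struct rte_eth_dev_info dev_info;
+
+       rte_eth_dev_info_get(port_id, &dev_info);
+
+       uint16_t nb_rxq = RTE_MIN(wanted_queues, dev_info.max_rx_queues);
+       uint16_t ring_size = RTE_MIN(wanted_ring_size,
+                                    dev_info.rx_desc_lim.nb_max);
+
+       /* nb_rxq is now safe to pass to rte_eth_dev_configure() and
+        * ring_size to rte_eth_rx_queue_setup(). */
+       (void)nb_rxq;
+       (void)ring_size;
+   }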
+Prerequisites
+-------------
+
+#. Prepare the system as recommended by the DPDK suite. This includes
+   environment variables, hugepages configuration, tool-chains and
+   configuration.
+
+#. Insert the igb_uio kernel module using the command 'modprobe igb_uio'.
+
+#. Bind the intended ENA device to the igb_uio module.
+
+At this point the system should be ready to run DPDK applications. Once the
+application runs to completion, the ENA device can be detached from igb_uio
+if necessary.
+
+Usage example
+-------------
+
+Follow instructions available in the document
+:ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` to launch
+**testpmd** with Amazon ENA devices managed by librte_pmd_ena.
+
+Example output:
+
+.. code-block:: console
+
+   [...]
+   EAL: PCI device 0000:02:00.1 on NUMA socket -1
+   EAL:   probe driver: 1d0f:ec20 rte_ena_pmd
+   EAL:   PCI memory mapped at 0x7f9b6c400000
+   PMD: eth_ena_dev_init(): Initializing 0:2:0.1
+   Interactive-mode selected
+   Configuring Port 0 (socket 0)
+   Port 0: 00:00:00:11:00:01
+   Checking link statuses...
+   Port 0 Link Up - speed 10000 Mbps - full-duplex
+   Done
+   testpmd>
diff --git a/src/seastar/dpdk/doc/guides/nics/enic.rst b/src/seastar/dpdk/doc/guides/nics/enic.rst
new file mode 100644
index 00000000..89a30158
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/enic.rst
@@ -0,0 +1,382 @@
+.. BSD LICENSE
+   Copyright (c) 2017, Cisco Systems, Inc.
+   All rights reserved.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+
+   1. Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+
+   2. Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in
+      the documentation and/or other materials provided with the
+      distribution.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+   FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+   COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+   INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+   BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+   LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+   CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+   LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+   ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+   POSSIBILITY OF SUCH DAMAGE.
+
+ENIC Poll Mode Driver
+=====================
+
+The ENIC PMD is the DPDK poll-mode driver for the Cisco Systems, Inc. VIC
+Ethernet NICs. These adapters are also referred to as vNICs below. If you are
+running or would like to run DPDK software applications on Cisco UCS servers
+using Cisco VIC adapters, the following documentation is relevant.
+
+How to obtain ENIC PMD integrated DPDK
+--------------------------------------
+
+ENIC PMD support is integrated into the DPDK suite. dpdk-<version>.tar.gz
+should be downloaded from http://dpdk.org
+
+
+Configuration information
+-------------------------
+
+- **DPDK Configuration Parameters**
+
+  The following configuration options are available for the ENIC PMD:
+
+  - **CONFIG_RTE_LIBRTE_ENIC_PMD** (default y): Enables or disables inclusion
+    of the ENIC PMD driver in the DPDK compilation.
+
+  - **CONFIG_RTE_LIBRTE_ENIC_DEBUG** (default n): Enables or disables debug
+    logging within the ENIC PMD driver.
+
+- **vNIC Configuration Parameters**
+
+  - **Number of Queues**
+
+    The maximum numbers of receive queues (RQs), work queues (WQs) and
+    completion queues (CQs) are configurable on a per vNIC basis
+    through the Cisco UCS Manager (CIMC or UCSM).
+
+    These values should be configured as follows:
+
+    - The number of WQs should be greater than or equal to the value of the
+      expected nb_tx_q parameter in the call to
+      rte_eth_dev_configure()
+
+    - The number of RQs configured in the vNIC should be greater than or
+      equal to *twice* the value of the expected nb_rx_q parameter in
+      the call to rte_eth_dev_configure(). With the addition of Rx
+      scatter, a pair of RQs on the vNIC is needed for each receive
+      queue used by DPDK, even if Rx scatter is not being used.
+      Having a vNIC with only 1 RQ is not a valid configuration, and
+      will fail with an error message.
+
+    - The number of CQs should be set so that there is one CQ for each
+      WQ, and one CQ for each pair of RQs.
+
+      For example: if the application requires 3 Rx queues and 3 Tx
+      queues, the vNIC should be configured to have at least 3 WQs, 6
+      RQs (3 pairs), and 6 CQs (3 for use by WQs + 3 for use by the 3
+      pairs of RQs).
+
+  - **Size of Queues**
+
+    Likewise, the numbers of receive and transmit descriptors are configurable
+    on a per vNIC basis via the UCS Manager and should be greater than or equal
+    to the nb_rx_desc and nb_tx_desc parameters expected to be used in the calls
+    to rte_eth_rx_queue_setup() and rte_eth_tx_queue_setup() respectively.
+    An application requesting more than the set size will be limited to that
+    size.
+
+    Unless there is a lack of resources due to creating many vNICs, it
+    is recommended that the WQ and RQ sizes be set to the maximum. This
+    gives the application the greatest amount of flexibility in its
+    queue configuration.
+
+    - *Note*: Since the introduction of Rx scatter, for performance
+      reasons, this PMD uses two RQs on the vNIC per receive queue in
+      DPDK. One RQ holds descriptors for the start of a packet, and the
+      second RQ holds the descriptors for the rest of the fragments of
+      a packet. This means that the nb_rx_desc parameter to
+      rte_eth_rx_queue_setup() can be greater than 4096. The exact
+      amount will depend on the size of the mbufs being used for
+      receives, and the MTU size.
+
+      For example: if the mbuf size is 2048, and the MTU is 9000, then
+      receiving a full size packet will take 5 descriptors, 1 from the
+      start of packet queue, and 4 from the second queue. Assuming
+      that the RQ size was set to the maximum of 4096, then the
+      application can specify up to 1024 + 4096 as the nb_rx_desc
+      parameter to rte_eth_rx_queue_setup().
+
+  - **Interrupts**
+
+    Only one interrupt per vNIC interface should be configured in the UCS
+    manager regardless of the number of receive/transmit queues. The ENIC PMD
+    uses this interrupt to get information about link status and errors
+    in the fast path.
+
+.. _enic-flow-director:
+
+Flow director support
+---------------------
+
+Advanced filtering support was added to 1300 series VIC firmware starting
+with version 2.0.13 for C-series UCS servers and version 3.1.2 for UCSM
+managed blade servers. In order to enable advanced filtering, the 'Advanced
+filter' radio button should be enabled via CIMC or UCSM, followed by a reboot
+of the server.
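+
+Once advanced filtering is enabled, filters are programmed through the
+standard DPDK filter API rather than anything ENIC-specific. A minimal
+sketch (not taken from the ENIC sources; the port, queue and address values
+are illustrative) of adding an IPv4/UDP perfect-match filter that steers
+matching packets to Rx queue 1:
+
+.. code-block:: c
+
+   #include <rte_byteorder.h>
+   #include <rte_eth_ctrl.h>
+   #include <rte_ethdev.h>
+   #include <rte_ip.h>
+
+   /* Steer UDP packets destined to 10.0.0.5:4000 to Rx queue 1. */
+   static int add_udp4_perfect_filter(uint8_t port_id)
+   {
+       struct rte_eth_fdir_filter f = { 0 };
+
+       f.soft_id = 1;
+       f.input.flow_type = RTE_ETH_FLOW_NONFRAG_IPV4_UDP;
+       f.input.flow.udp4_flow.ip.dst_ip = rte_cpu_to_be_32(IPv4(10, 0, 0, 5));
+       f.input.flow.udp4_flow.dst_port = rte_cpu_to_be_16(4000);
+       f.action.rx_queue = 1;
+       f.action.behavior = RTE_ETH_FDIR_ACCEPT;
+       f.action.report_status = RTE_ETH_FDIR_REPORT_ID;
+
+       return rte_eth_dev_filter_ctrl(port_id, RTE_ETH_FILTER_FDIR,
+                                      RTE_ETH_FILTER_ADD, &f);
+   }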
+
+With advanced filters, perfect matching of all fields of IPv4, IPv6 headers
+as well as TCP, UDP and SCTP L4 headers is available through the flow
+director. Masking of these fields for partial match is also supported.
+
+Without advanced filter support, the flow director is limited to IPv4
+perfect filtering of the 5-tuple with no masking of fields supported.
+
+SR-IOV mode utilization
+-----------------------
+
+UCS blade servers configured with dynamic vNIC connection policies in UCS
+manager are capable of supporting assigned devices on virtual machines (VMs)
+through a KVM hypervisor. Assigned devices, also known as 'passthrough'
+devices, are SR-IOV virtual functions (VFs) on the host which are exposed
+to VM instances.
+
+The Cisco Virtual Machine Fabric Extender (VM-FEX) gives the VM a dedicated
+interface on the Fabric Interconnect (FI). Layer 2 switching is done at
+the FI. This may eliminate the requirement for software switching on the
+host to route intra-host VM traffic.
+
+Please refer to `Creating a Dynamic vNIC Connection Policy
+<http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/sw/vm_fex/vmware/gui/config_guide/b_GUI_VMware_VM-FEX_UCSM_Configuration_Guide/b_GUI_VMware_VM-FEX_UCSM_Configuration_Guide_chapter_010.html#task_433E01651F69464783A68E66DA8A47A5>`_
+for information on configuring SR-IOV Adapter policies using UCS manager.
+
+Once the policies are in place and the host OS is rebooted, VFs should be
+visible on the host, e.g.:
+
+.. code-block:: console
+
+    # lspci | grep Cisco | grep Ethernet
+    0d:00.0 Ethernet controller: Cisco Systems Inc VIC Ethernet NIC (rev a2)
+    0d:00.1 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
+    0d:00.2 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
+    0d:00.3 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
+    0d:00.4 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
+    0d:00.5 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
+    0d:00.6 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
+    0d:00.7 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
+
+Enable Intel IOMMU on the host and install KVM and libvirt. A VM instance
+should be created with an assigned device. When using libvirt, this
+configuration can be done within the domain (i.e. VM) config file. For
+example, this entry maps host VF 0d:00.1 into the VM:
+
+.. code-block:: console
+
+    <interface type='hostdev' managed='yes'>
+      <mac address='52:54:00:ac:ff:b6'/>
+      <source>
+        <address type='pci' domain='0x0000' bus='0x0d' slot='0x00' function='0x1'/>
+      </source>
+    </interface>
+
+Alternatively, the configuration can be done in a separate file using the
+``network`` keyword. These methods are described in the libvirt documentation for
+`Network XML format <https://libvirt.org/formatnetwork.html>`_.
+
+When the VM instance is started, the ENIC KVM driver will bind the host VF to
+vfio, complete provisioning on the FI and bring up the link.
+
+.. note::
+
+    It is not possible to use a VF directly from the host because it is not
+    fully provisioned until the hypervisor brings up the VM that it is assigned
+    to.
+
+In the VM instance, the VF will now be visible. E.g., here the VF 00:04.0 is
+seen on the VM instance and should be available for binding to a DPDK driver:
+
+.. code-block:: console
+
+    # lspci | grep Ether
+    00:04.0 Ethernet controller: Cisco Systems Inc VIC SR-IOV VF (rev a2)
+
+Follow the normal DPDK install procedure, binding the VF to either ``igb_uio``
+or ``vfio`` in non-IOMMU mode.
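+
+For example, binding the VF shown above inside the VM might look like the
+following (a sketch assuming the ``igb_uio`` module is already built and
+loaded, and that the VF appears at 00:04.0 as in the listing above):
+
+.. code-block:: console
+
+    # bind the VF to igb_uio using the standard DPDK binding tool
+    ./usertools/dpdk-devbind.py --bind=igb_uio 00:04.0
+    ./usertools/dpdk-devbind.py --status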
+
+Please see :ref:`Limitations <enic_limitations>` for limitations in
+the use of SR-IOV.
+
+.. _enic_limitations:
+
+Limitations
+-----------
+
+- **VLAN 0 Priority Tagging**
+
+  If a vNIC is configured in TRUNK mode by the UCS manager, the adapter will
+  priority tag egress packets according to 802.1Q if they were not already
+  VLAN tagged by software. If the adapter is connected to a properly configured
+  switch, there will be no unexpected behavior.
+
+  In test setups where an Ethernet port of a Cisco adapter in TRUNK mode is
+  connected point-to-point to another adapter port or connected through a router
+  instead of a switch, all ingress packets will be VLAN tagged. Programs such
+  as l3fwd which do not account for VLAN tags in packets will misbehave. The
+  solution is to enable VLAN stripping on ingress. The following code fragment
+  is an example of how to accomplish this:
+
+.. code-block:: c
+
+     vlan_offload = rte_eth_dev_get_vlan_offload(port);
+     vlan_offload |= ETH_VLAN_STRIP_OFFLOAD;
+     rte_eth_dev_set_vlan_offload(port, vlan_offload);
+
+- Limited flow director support on 1200 series and 1300 series Cisco VIC
+  adapters with old firmware. Please see :ref:`enic-flow-director`.
+
+- Flow director features are not supported on generation 1 Cisco VIC
+  adapters (M81KR and P81E).
+
+- **SR-IOV**
+
+  - KVM hypervisor support only. VMware has not been tested.
+  - Requires VM-FEX, and so is only available on UCS managed servers connected
+    to Fabric Interconnects. It is not available on standalone C-Series servers.
+  - VF devices are not usable directly from the host. They can only be used
+    as assigned devices on VM instances.
+  - Currently, unbinding the ENIC kernel mode driver 'enic.ko' on the VM
+    instance may hang. As a workaround, enic.ko should be blacklisted or
+    removed from the boot process.
+  - uio_pci_generic cannot be used as the uio module in the VM. igb_uio or
+    vfio in non-IOMMU mode can be used.
+  - The number of RQs in UCSM dynamic vNIC configurations must be at least 2.
+  - The number of SR-IOV devices is limited to 256. Components on the target
+    system might limit this number to fewer than 256.
+
+How to build the suite
+----------------------
+
+Refer to the document :ref:`compiling and testing a PMD for a NIC
+<pmd_build_and_test>` for details.
+
+By default the ENIC PMD library will be built into the DPDK library.
+
+For configuring and using UIO and VFIO frameworks, please refer to the
+documentation that comes with the DPDK suite.
+
+Supported Cisco VIC adapters
+----------------------------
+
+ENIC PMD supports all recent generations of Cisco VIC adapters including:
+
+- VIC 1280
+- VIC 1240
+- VIC 1225
+- VIC 1285
+- VIC 1225T
+- VIC 1227
+- VIC 1227T
+- VIC 1380
+- VIC 1340
+- VIC 1385
+- VIC 1387
+
+Supported Operating Systems
+---------------------------
+
+Any Linux distribution fulfilling the conditions described in the Dependencies
+section of the DPDK documentation.
+
+Supported features
+------------------
+
+- Unicast, multicast and broadcast transmission and reception
+- Receive queue polling
+- Port Hardware Statistics
+- Hardware VLAN acceleration
+- IP checksum offload
+- Receive side VLAN stripping
+- Multiple receive and transmit queues
+- Flow Director ADD, UPDATE, DELETE and STATS operations, supporting IPv4 and IPv6
+- Promiscuous mode
+- Setting RX VLAN (supported via UCSM/CIMC only)
+- VLAN filtering (supported via UCSM/CIMC only)
+- Execution of application by unprivileged system users
+- IPV4, IPV6 and TCP RSS hashing
+- Scattered Rx
+- MTU update
+- SR-IOV on UCS managed servers connected to Fabric Interconnects
+
+Known bugs and unsupported features in this release
+---------------------------------------------------
+
+- Signature or flex byte based flow director
+- Drop feature of flow director
+- VLAN based flow director
+- Non-IPV4 flow director
+- Setting of extended VLAN
+- UDP RSS hashing
+- MTU update only works if Scattered Rx mode is disabled
+
+Prerequisites
+-------------
+
+- Prepare the system as recommended by the DPDK suite. This includes
+  environment variables, hugepages configuration, tool-chains and
+  configuration.
+- Insert the vfio-pci kernel module using the command 'modprobe vfio-pci' if
+  the user wants to use the VFIO framework.
+- Insert the uio kernel module using the command 'modprobe uio' if the user
+  wants to use the UIO framework.
+- The DPDK suite should be configured based on the user's decision to use the
+  VFIO or UIO framework.
+- If the vNIC device(s) to be used are bound to the kernel mode Ethernet
+  driver, use 'ifconfig' to bring the interface down. The dpdk-devbind.py tool
+  can then be used to unbind the device's bus id from the ENIC kernel mode
+  driver.
+- Bind the intended vNIC to vfio-pci using dpdk-devbind.py if the user wants
+  the ENIC PMD to use the VFIO framework.
+- Bind the intended vNIC to igb_uio using dpdk-devbind.py if the user wants
+  the ENIC PMD to use the UIO framework.
+
+At this point the system should be ready to run DPDK applications. Once the
+application runs to completion, the vNIC can be detached from vfio-pci or
+igb_uio if necessary.
+
+Root privilege is required to bind and unbind vNICs to/from VFIO/UIO.
+The VFIO framework allows an unprivileged user to run the applications.
+For an unprivileged user to run the applications on DPDK and ENIC PMD,
+it may be necessary to increase the maximum locked memory of the user.
+The following command can be used to do this.
+
+.. code-block:: console
+
+    sudo sh -c "ulimit -l <value in kilobytes>"
+
+The value depends on the memory configuration of the application, DPDK and
+PMD. Typically, the limit has to be raised to higher than 2 GB,
+e.g., 2621440. A persistent alternative is sketched at the end of this
+section.
+
+The compilation of any unused drivers can be disabled using the
+configuration file in the config/ directory (e.g., config/common_linuxapp).
+This helps to reduce the time taken for building the libraries and the
+initialization time of the application.
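+
+A limit raised with ``ulimit`` applies only to the shell that set it. One
+common way to make a higher memlock limit persistent for a particular user is
+via ``/etc/security/limits.conf`` (a sketch assuming a pam_limits-enabled
+system; ``dpdkuser`` is a hypothetical user name and the value is in
+kilobytes):
+
+.. code-block:: console
+
+    # persistent locked-memory limits for the hypothetical user 'dpdkuser'
+    echo "dpdkuser soft memlock 2621440" | sudo tee -a /etc/security/limits.conf
+    echo "dpdkuser hard memlock 2621440" | sudo tee -a /etc/security/limits.conf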
+ +Additional Reference +-------------------- + +- http://www.cisco.com/c/en/us/products/servers-unified-computing + +Contact Information +------------------- + +Any questions or bugs should be reported to DPDK community and to the ENIC PMD +maintainers: + +- John Daley <johndale@cisco.com> +- Nelson Escobar <neescoba@cisco.com> diff --git a/src/seastar/dpdk/doc/guides/nics/features/afpacket.ini b/src/seastar/dpdk/doc/guides/nics/features/afpacket.ini new file mode 100644 index 00000000..99f87ab6 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/afpacket.ini @@ -0,0 +1,6 @@ +; +; Supported features of the 'afpacket' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] diff --git a/src/seastar/dpdk/doc/guides/nics/features/ark.ini b/src/seastar/dpdk/doc/guides/nics/features/ark.ini new file mode 100644 index 00000000..31a35279 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/ark.ini @@ -0,0 +1,14 @@ +; +; Supported features of the 'ark' poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Queue start/stop = Y +Jumbo frame = Y +Scattered Rx = Y +Basic stats = Y +Stats per queue = Y +Linux UIO = Y +x86-64 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/avp.ini b/src/seastar/dpdk/doc/guides/nics/features/avp.ini new file mode 100644 index 00000000..ceb69939 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/avp.ini @@ -0,0 +1,16 @@ +; +; Supported features of the 'AVP' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Jumbo frame = Y +Scattered Rx = Y +Promiscuous mode = Y +Unicast MAC filter = Y +VLAN offload = Y +Basic stats = Y +Stats per queue = Y +Linux UIO = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/bnx2x.ini b/src/seastar/dpdk/doc/guides/nics/features/bnx2x.ini new file mode 100644 index 00000000..1ad8a3e8 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/bnx2x.ini @@ -0,0 +1,16 @@ +; +; Supported features of the 'bnx2x' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Link status event = Y +Promiscuous mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +Basic stats = Y +Extended stats = Y +Linux UIO = Y +x86-64 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/bnx2x_vf.ini b/src/seastar/dpdk/doc/guides/nics/features/bnx2x_vf.ini new file mode 100644 index 00000000..da9168ea --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/bnx2x_vf.ini @@ -0,0 +1,17 @@ +; +; Supported features of the 'bnx2x_vf' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Link status event = Y +Promiscuous mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +SR-IOV = Y +Basic stats = Y +Extended stats = Y +Linux UIO = Y +x86-64 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/bnxt.ini b/src/seastar/dpdk/doc/guides/nics/features/bnxt.ini new file mode 100644 index 00000000..013a9cda --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/bnxt.ini @@ -0,0 +1,16 @@ +; +; Supported features of the 'bnxt' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. 
+; +[Features] +Link status = Y +Queue start/stop = Y +Promiscuous mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +RSS reta update = Y +Basic stats = Y +Extended stats = Y +Linux UIO = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/bonding.ini b/src/seastar/dpdk/doc/guides/nics/features/bonding.ini new file mode 100644 index 00000000..c1653051 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/bonding.ini @@ -0,0 +1,6 @@ +; +; Supported features of the 'bonding' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] diff --git a/src/seastar/dpdk/doc/guides/nics/features/cxgbe.ini b/src/seastar/dpdk/doc/guides/nics/features/cxgbe.ini new file mode 100644 index 00000000..2e72a107 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/cxgbe.ini @@ -0,0 +1,31 @@ +; +; Supported features of the 'cxgbe' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Queue start/stop = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +RSS hash = Y +Flow control = Y +CRC offload = Y +VLAN offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Packet type parsing = Y +Basic stats = Y +Stats per queue = Y +EEPROM dump = Y +Registers dump = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/default.ini b/src/seastar/dpdk/doc/guides/nics/features/default.ini new file mode 100644 index 00000000..cafc6c70 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/default.ini @@ -0,0 +1,75 @@ +; +; Features of a default network driver. +; +; This file defines the features that are valid for inclusion in +; the other driver files and also the order that they appear in +; the features table in the documentation. The feature description +; string should not exceed feature_str_len defined in conf.py. +; +[Features] +Speed capabilities = +Link status = +Link status event = +Removal event = +Queue status event = +Rx interrupt = +Free Tx mbuf on demand = +Queue start/stop = +MTU update = +Jumbo frame = +Scattered Rx = +LRO = +TSO = +Promiscuous mode = +Allmulticast mode = +Unicast MAC filter = +Multicast MAC filter = +RSS hash = +RSS key update = +RSS reta update = +VMDq = +SR-IOV = +DCB = +VLAN filter = +Ethertype filter = +N-tuple filter = +SYN filter = +Tunnel filter = +Flexible filter = +Hash filter = +Flow director = +Flow control = +Flow API = +Rate limitation = +Traffic mirroring = +CRC offload = +VLAN offload = +QinQ offload = +L3 checksum offload = +L4 checksum offload = +MACsec offload = +Inner L3 checksum = +Inner L4 checksum = +Packet type parsing = +Timesync = +Rx descriptor status = +Tx descriptor status = +Basic stats = +Extended stats = +Stats per queue = +FW version = +EEPROM dump = +Registers dump = +Multiprocess aware = +BSD nic_uio = +Linux UIO = +Linux VFIO = +Other kdrv = +ARMv7 = +ARMv8 = +Power8 = +x86-32 = +x86-64 = +Usage doc = +Design doc = +Perf doc = diff --git a/src/seastar/dpdk/doc/guides/nics/features/dpaa2.ini b/src/seastar/dpdk/doc/guides/nics/features/dpaa2.ini new file mode 100644 index 00000000..d43f4046 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/dpaa2.ini @@ -0,0 +1,18 @@ +; +; Supported features of the 'dpaa2' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. 
+; +[Features] +Link status = Y +Queue start/stop = Y +MTU update = Y +Promiscuous mode = Y +RSS hash = Y +L3 checksum offload = Y +L4 checksum offload = Y +Packet type parsing = Y +Basic stats = Y +Linux VFIO = Y +ARMv8 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/e1000.ini b/src/seastar/dpdk/doc/guides/nics/features/e1000.ini new file mode 100644 index 00000000..260d46da --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/e1000.ini @@ -0,0 +1,31 @@ +; +; Supported features of the 'e1000' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Link status event = Y +Rx interrupt = Y +Free Tx mbuf on demand = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +VLAN filter = Y +Flow control = Y +CRC offload = Y +VLAN offload = Y +QinQ offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Rx descriptor status = Y +Tx descriptor status = Y +Basic stats = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/ena.ini b/src/seastar/dpdk/doc/guides/nics/features/ena.ini new file mode 100644 index 00000000..74969fd0 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/ena.ini @@ -0,0 +1,26 @@ +; +; Supported features of the 'ena' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Queue start/stop = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +SR-IOV = Y +CRC offload = Y +VLAN offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Inner L3 checksum = Y +Inner L4 checksum = Y +Basic stats = Y +Extended stats = Y +Linux UIO = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/enic.ini b/src/seastar/dpdk/doc/guides/nics/features/enic.ini new file mode 100644 index 00000000..94e7f3cb --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/enic.ini @@ -0,0 +1,32 @@ +; +; Supported features of the 'enic' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Link status event = Y +Queue start/stop = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +Promiscuous mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +RSS hash = Y +SR-IOV = Y +VLAN filter = Y +CRC offload = Y +VLAN offload = Y +Flow director = Y +L3 checksum offload = Y +L4 checksum offload = Y +Packet type parsing = Y +Basic stats = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/fm10k.ini b/src/seastar/dpdk/doc/guides/nics/features/fm10k.ini new file mode 100644 index 00000000..9e1035f3 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/fm10k.ini @@ -0,0 +1,34 @@ +; +; Supported features of the 'fm10k' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. 
+; +[Features] +Rx interrupt = Y +Queue start/stop = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +VMDq = Y +VLAN filter = Y +CRC offload = Y +VLAN offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Packet type parsing = Y +Basic stats = Y +Extended stats = Y +Stats per queue = Y +Multiprocess aware = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/fm10k_vec.ini b/src/seastar/dpdk/doc/guides/nics/features/fm10k_vec.ini new file mode 100644 index 00000000..1384ab15 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/fm10k_vec.ini @@ -0,0 +1,34 @@ +; +; Supported features of the 'fm10k_vec' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Rx interrupt = Y +Queue start/stop = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +VMDq = Y +VLAN filter = Y +CRC offload = Y +VLAN offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Packet type parsing = Y +Basic stats = Y +Extended stats = Y +Stats per queue = Y +Multiprocess aware = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/fm10k_vf.ini b/src/seastar/dpdk/doc/guides/nics/features/fm10k_vf.ini new file mode 100644 index 00000000..15de536f --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/fm10k_vf.ini @@ -0,0 +1,28 @@ +; +; Supported features of the 'fm10k_vf' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Rx interrupt = Y +Queue start/stop = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +CRC offload = Y +VLAN offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Packet type parsing = Y +Basic stats = Y +Extended stats = Y +Stats per queue = Y +Multiprocess aware = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/fm10k_vf_vec.ini b/src/seastar/dpdk/doc/guides/nics/features/fm10k_vf_vec.ini new file mode 100644 index 00000000..b32550cb --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/fm10k_vf_vec.ini @@ -0,0 +1,28 @@ +; +; Supported features of the 'fm10kvf_vec' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Rx interrupt = Y +Queue start/stop = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +CRC offload = Y +VLAN offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Packet type parsing = Y +Basic stats = Y +Extended stats = Y +Stats per queue = Y +Multiprocess aware = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/i40e.ini b/src/seastar/dpdk/doc/guides/nics/features/i40e.ini new file mode 100644 index 00000000..ecabce0b --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/i40e.ini @@ -0,0 +1,53 @@ +; +; Supported features of the 'i40e' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. 
+; +[Features] +Link status = Y +Link status event = Y +Rx interrupt = Y +Queue start/stop = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +VMDq = Y +SR-IOV = Y +DCB = Y +VLAN filter = Y +Ethertype filter = Y +Tunnel filter = Y +Hash filter = Y +Flow director = Y +Flow control = Y +Flow API = Y +Traffic mirroring = Y +CRC offload = Y +VLAN offload = Y +QinQ offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Inner L3 checksum = Y +Inner L4 checksum = Y +Packet type parsing = Y +Timesync = Y +Rx descriptor status = Y +Tx descriptor status = Y +Basic stats = Y +Extended stats = Y +FW version = Y +Multiprocess aware = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y +ARMv8 = Y +Power8 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/i40e_vec.ini b/src/seastar/dpdk/doc/guides/nics/features/i40e_vec.ini new file mode 100644 index 00000000..206f348b --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/i40e_vec.ini @@ -0,0 +1,43 @@ +; +; Supported features of the 'i40e_vec' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Link status event = Y +Rx interrupt = Y +Queue start/stop = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +VMDq = Y +SR-IOV = Y +DCB = Y +VLAN filter = Y +Ethertype filter = Y +Tunnel filter = Y +Hash filter = Y +Flow director = Y +Flow control = Y +Traffic mirroring = Y +Timesync = Y +Rx descriptor status = Y +Tx descriptor status = Y +Basic stats = Y +Extended stats = Y +Multiprocess aware = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y +ARMv8 = Y +Power8 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/i40e_vf.ini b/src/seastar/dpdk/doc/guides/nics/features/i40e_vf.ini new file mode 100644 index 00000000..46e0d9fc --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/i40e_vf.ini @@ -0,0 +1,38 @@ +; +; Supported features of the 'i40e_vf' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Rx interrupt = Y +Queue start/stop = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +VLAN filter = Y +Hash filter = Y +CRC offload = Y +VLAN offload = Y +QinQ offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Inner L3 checksum = Y +Inner L4 checksum = Y +Packet type parsing = Y +Rx descriptor status = Y +Tx descriptor status = Y +Basic stats = Y +Extended stats = Y +Multiprocess aware = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/i40e_vf_vec.ini b/src/seastar/dpdk/doc/guides/nics/features/i40e_vf_vec.ini new file mode 100644 index 00000000..c2c6c19f --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/i40e_vf_vec.ini @@ -0,0 +1,31 @@ +; +; Supported features of the 'i40evf_vec' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. 
+; +[Features] +Rx interrupt = Y +Queue start/stop = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +VLAN filter = Y +Hash filter = Y +Rx descriptor status = Y +Tx descriptor status = Y +Basic stats = Y +Extended stats = Y +Multiprocess aware = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y +ARMv8 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/igb.ini b/src/seastar/dpdk/doc/guides/nics/features/igb.ini new file mode 100644 index 00000000..11450270 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/igb.ini @@ -0,0 +1,47 @@ +; +; Supported features of the 'igb' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Link status event = Y +Rx interrupt = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +VMDq = Y +SR-IOV = Y +DCB = Y +VLAN filter = Y +Ethertype filter = Y +N-tuple filter = Y +SYN filter = Y +Flexible filter = Y +Flow control = Y +CRC offload = Y +VLAN offload = Y +QinQ offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Packet type parsing = Y +Timesync = Y +Rx descriptor status = Y +Tx descriptor status = Y +Basic stats = Y +Extended stats = Y +FW version = Y +EEPROM dump = Y +Registers dump = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/igb_vf.ini b/src/seastar/dpdk/doc/guides/nics/features/igb_vf.ini new file mode 100644 index 00000000..e641a2c9 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/igb_vf.ini @@ -0,0 +1,29 @@ +; +; Supported features of the 'igb_vf' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Rx interrupt = Y +Scattered Rx = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +VLAN filter = Y +CRC offload = Y +VLAN offload = Y +QinQ offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Packet type parsing = Y +Rx descriptor status = Y +Tx descriptor status = Y +Basic stats = Y +Extended stats = Y +Registers dump = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/ixgbe.ini b/src/seastar/dpdk/doc/guides/nics/features/ixgbe.ini new file mode 100644 index 00000000..4aa7af6d --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/ixgbe.ini @@ -0,0 +1,59 @@ +; +; Supported features of the 'ixgbe' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. 
+; +[Features] +Link status = Y +Link status event = Y +Rx interrupt = Y +Queue start/stop = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +LRO = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +VMDq = Y +SR-IOV = Y +DCB = Y +VLAN filter = Y +Ethertype filter = Y +N-tuple filter = Y +SYN filter = Y +Tunnel filter = Y +Flow director = Y +Flow control = Y +Flow API = Y +Rate limitation = Y +Traffic mirroring = Y +CRC offload = Y +VLAN offload = Y +QinQ offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +MACsec offload = Y +Inner L3 checksum = Y +Inner L4 checksum = Y +Packet type parsing = Y +Timesync = Y +Rx descriptor status = Y +Tx descriptor status = Y +Basic stats = Y +Extended stats = Y +Stats per queue = Y +FW version = Y +EEPROM dump = Y +Registers dump = Y +Multiprocess aware = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +ARMv8 = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/ixgbe_vec.ini b/src/seastar/dpdk/doc/guides/nics/features/ixgbe_vec.ini new file mode 100644 index 00000000..4da81182 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/ixgbe_vec.ini @@ -0,0 +1,48 @@ +; +; Supported features of the 'ixgbe_vec' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Link status event = Y +Rx interrupt = Y +Queue start/stop = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +LRO = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +VMDq = Y +SR-IOV = Y +DCB = Y +VLAN filter = Y +Ethertype filter = Y +N-tuple filter = Y +SYN filter = Y +Tunnel filter = Y +Flow director = Y +Flow control = Y +Rate limitation = Y +Traffic mirroring = Y +Timesync = Y +Rx descriptor status = Y +Tx descriptor status = Y +Basic stats = Y +Extended stats = Y +Stats per queue = Y +EEPROM dump = Y +Registers dump = Y +Multiprocess aware = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +ARMv8 = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/ixgbe_vf.ini b/src/seastar/dpdk/doc/guides/nics/features/ixgbe_vf.ini new file mode 100644 index 00000000..b63e32ce --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/ixgbe_vf.ini @@ -0,0 +1,39 @@ +; +; Supported features of the 'ixgbe_vf' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Rx interrupt = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +LRO = Y +TSO = Y +Allmulticast mode = Y +Unicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +VLAN filter = Y +CRC offload = Y +VLAN offload = Y +QinQ offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Inner L3 checksum = Y +Inner L4 checksum = Y +Packet type parsing = Y +Rx descriptor status = Y +Tx descriptor status = Y +Basic stats = Y +Extended stats = Y +Registers dump = Y +Multiprocess aware = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +ARMv8 = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/ixgbe_vf_vec.ini b/src/seastar/dpdk/doc/guides/nics/features/ixgbe_vf_vec.ini new file mode 100644 index 00000000..c994857e --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/ixgbe_vf_vec.ini @@ -0,0 +1,31 @@ +; +; Supported features of the 'ixgbevf_vec' network poll mode driver. 
+; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Rx interrupt = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +LRO = Y +TSO = Y +Allmulticast mode = Y +Unicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +VLAN filter = Y +Rx descriptor status = Y +Tx descriptor status = Y +Basic stats = Y +Extended stats = Y +Registers dump = Y +Multiprocess aware = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +ARMv8 = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/kni.ini b/src/seastar/dpdk/doc/guides/nics/features/kni.ini new file mode 100644 index 00000000..6deb66ae --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/kni.ini @@ -0,0 +1,7 @@ +; +; Supported features of the 'kni' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/liquidio.ini b/src/seastar/dpdk/doc/guides/nics/features/liquidio.ini new file mode 100644 index 00000000..49cc3566 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/liquidio.ini @@ -0,0 +1,28 @@ +; +; Supported features of the 'LiquidIO' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Link status event = Y +Jumbo frame = Y +Scattered Rx = Y +Allmulticast mode = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +VLAN filter = Y +CRC offload = Y +VLAN offload = P +L3 checksum offload = Y +L4 checksum offload = Y +Inner L3 checksum = Y +Inner L4 checksum = Y +Basic stats = Y +Extended stats = Y +Multiprocess aware = Y +Linux UIO = Y +Linux VFIO = Y +x86-64 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/mlx4.ini b/src/seastar/dpdk/doc/guides/nics/features/mlx4.ini new file mode 100644 index 00000000..285f0ecf --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/mlx4.ini @@ -0,0 +1,33 @@ +; +; Supported features of the 'mlx4' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Link status event = Y +Removal event = Y +Queue start/stop = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +RSS hash = Y +SR-IOV = Y +VLAN filter = Y +L3 checksum offload = Y +L4 checksum offload = Y +Inner L3 checksum = Y +Inner L4 checksum = Y +Packet type parsing = Y +Basic stats = Y +Stats per queue = Y +Multiprocess aware = Y +Other kdrv = Y +Power8 = Y +x86-32 = Y +x86-64 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/mlx5.ini b/src/seastar/dpdk/doc/guides/nics/features/mlx5.ini new file mode 100644 index 00000000..e228c412 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/mlx5.ini @@ -0,0 +1,43 @@ +; +; Supported features of the 'mlx5' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. 
+; +[Features] +Speed capabilities = Y +Link status = Y +Link status event = Y +Rx interrupt = Y +Queue start/stop = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +SR-IOV = Y +VLAN filter = Y +Flow director = Y +Flow API = Y +CRC offload = Y +VLAN offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Inner L3 checksum = Y +Inner L4 checksum = Y +Packet type parsing = Y +Rx descriptor status = Y +Tx descriptor status = Y +Basic stats = Y +Stats per queue = Y +Multiprocess aware = Y +Other kdrv = Y +Power8 = Y +x86-32 = Y +x86-64 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/nfp.ini b/src/seastar/dpdk/doc/guides/nics/features/nfp.ini new file mode 100644 index 00000000..a1281d2a --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/nfp.ini @@ -0,0 +1,29 @@ +; +; Supported features of the 'nfp' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Speed capabilities = Y +Link status = Y +Link status event = Y +Rx interrupt = Y +Queue start/stop = Y +MTU update = Y +Jumbo frame = Y +Promiscuous mode = Y +TSO = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +SR-IOV = Y +Flow control = Y +VLAN offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Basic stats = Y +Stats per queue = Y +Linux UIO = Y +Linux VFIO = Y +x86-64 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/null.ini b/src/seastar/dpdk/doc/guides/nics/features/null.ini new file mode 100644 index 00000000..3957f7ca --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/null.ini @@ -0,0 +1,6 @@ +; +; Supported features of the 'null' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] diff --git a/src/seastar/dpdk/doc/guides/nics/features/pcap.ini b/src/seastar/dpdk/doc/guides/nics/features/pcap.ini new file mode 100644 index 00000000..28e64880 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/pcap.ini @@ -0,0 +1,15 @@ +; +; Supported features of the 'pcap' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Jumbo frame = Y +Basic stats = Y +Multiprocess aware = Y +ARMv7 = Y +ARMv8 = Y +Power8 = Y +x86-32 = Y +x86-64 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/qede.ini b/src/seastar/dpdk/doc/guides/nics/features/qede.ini new file mode 100644 index 00000000..fba5dc33 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/qede.ini @@ -0,0 +1,40 @@ +; +; Supported features of the 'qede' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. 
+; +[Features] +Speed capabilities = Y +Link status = Y +Link status event = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +VLAN filter = Y +Flow control = Y +CRC offload = Y +VLAN offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Tunnel filter = Y +Inner L3 checksum = Y +Inner L4 checksum = Y +Packet type parsing = Y +Basic stats = Y +Extended stats = Y +Stats per queue = Y +Multiprocess aware = Y +Linux UIO = Y +x86-64 = Y +Usage doc = Y +N-tuple filter = Y +Flow director = Y +LRO = Y +TSO = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/qede_vf.ini b/src/seastar/dpdk/doc/guides/nics/features/qede_vf.ini new file mode 100644 index 00000000..21ec40fa --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/qede_vf.ini @@ -0,0 +1,36 @@ +; +; Supported features of the 'qede_vf' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Speed capabilities = Y +Link status = Y +Link status event = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +SR-IOV = Y +VLAN filter = Y +Flow control = Y +CRC offload = Y +VLAN offload = Y +L3 checksum offload = Y +L4 checksum offload = Y +Packet type parsing = Y +Basic stats = Y +Extended stats = Y +Stats per queue = Y +Multiprocess aware = Y +Linux UIO = Y +x86-64 = Y +LRO = Y +TSO = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/ring.ini b/src/seastar/dpdk/doc/guides/nics/features/ring.ini new file mode 100644 index 00000000..ac207ba3 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/ring.ini @@ -0,0 +1,6 @@ +; +; Supported features of the 'ring' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] diff --git a/src/seastar/dpdk/doc/guides/nics/features/sfc_efx.ini b/src/seastar/dpdk/doc/guides/nics/features/sfc_efx.ini new file mode 100644 index 00000000..7957b5e9 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/sfc_efx.ini @@ -0,0 +1,34 @@ +; +; Supported features of the 'sfc_efx' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Speed capabilities = Y +Link status = Y +Link status event = Y +Queue start/stop = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +Multicast MAC filter = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +SR-IOV = Y +Flow control = Y +Flow API = Y +VLAN offload = P +L3 checksum offload = Y +L4 checksum offload = Y +Packet type parsing = Y +Basic stats = Y +Extended stats = Y +FW version = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/szedata2.ini b/src/seastar/dpdk/doc/guides/nics/features/szedata2.ini new file mode 100644 index 00000000..624314d3 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/szedata2.ini @@ -0,0 +1,17 @@ +; +; Supported features of the 'szedata2' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. 
+; +[Features] +Link status = Y +Queue start/stop = Y +Scattered Rx = Y +Promiscuous mode = Y +Allmulticast mode = Y +Basic stats = Y +Extended stats = Y +Stats per queue = Y +Other kdrv = Y +x86-64 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/tap.ini b/src/seastar/dpdk/doc/guides/nics/features/tap.ini new file mode 100644 index 00000000..3efae758c --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/tap.ini @@ -0,0 +1,26 @@ +; +; Supported features of the 'tap' driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Link status event = Y +Jumbo frame = Y +Promiscuous mode = Y +Allmulticast mode = Y +Basic stats = Y +Flow API = Y +MTU update = Y +Multicast MAC filter = Y +Speed capabilities = Y +Unicast MAC filter = Y +Packet type parsing = Y +Flow control = Y +Other kdrv = Y +ARMv7 = Y +ARMv8 = Y +Power8 = Y +x86-32 = Y +x86-64 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/thunderx.ini b/src/seastar/dpdk/doc/guides/nics/features/thunderx.ini new file mode 100644 index 00000000..b9720be6 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/thunderx.ini @@ -0,0 +1,30 @@ +; +; Supported features of the 'thunderx' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Link status event = Y +Queue start/stop = Y +MTU update = Y +Jumbo frame = Y +Scattered Rx = Y +Promiscuous mode = Y +Allmulticast mode = Y +RSS hash = Y +RSS key update = Y +RSS reta update = Y +SR-IOV = Y +CRC offload = Y +VLAN offload = P +L3 checksum offload = Y +L4 checksum offload = Y +Packet type parsing = Y +Basic stats = Y +Stats per queue = Y +Registers dump = Y +Multiprocess aware = Y +Linux VFIO = Y +ARMv8 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/vhost.ini b/src/seastar/dpdk/doc/guides/nics/features/vhost.ini new file mode 100644 index 00000000..dffd1f49 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/vhost.ini @@ -0,0 +1,14 @@ +; +; Supported features of the 'vhost' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Link status event = Y +Free Tx mbuf on demand = Y +Queue status event = Y +Basic stats = Y +Extended stats = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/virtio.ini b/src/seastar/dpdk/doc/guides/nics/features/virtio.ini new file mode 100644 index 00000000..8e3aca1d --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/virtio.ini @@ -0,0 +1,28 @@ +; +; Supported features of the 'virtio' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Rx interrupt = Y +Queue start/stop = Y +Scattered Rx = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +VLAN filter = Y +Basic stats = Y +Stats per queue = Y +Extended stats = Y +Multiprocess aware = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +ARMv7 = Y +ARMv8 = Y +x86-32 = Y +x86-64 = Y +Usage doc = Y +MTU update = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/virtio_vec.ini b/src/seastar/dpdk/doc/guides/nics/features/virtio_vec.ini new file mode 100644 index 00000000..ec93f5c4 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/virtio_vec.ini @@ -0,0 +1,23 @@ +; +; Supported features of the 'virtio_vec' network poll mode driver. 
+; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Rx interrupt = Y +Queue start/stop = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +VLAN filter = Y +Basic stats = Y +Stats per queue = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +ARMv7 = Y +ARMv8 = Y +x86-32 = Y +x86-64 = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/vmxnet3.ini b/src/seastar/dpdk/doc/guides/nics/features/vmxnet3.ini new file mode 100644 index 00000000..ef95932a --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/vmxnet3.ini @@ -0,0 +1,28 @@ +; +; Supported features of the 'vmxnet3' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Link status = Y +Link status event = Y +Queue start/stop = Y +MTU update = Y +Jumbo frame = Y +LRO = Y +TSO = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +RSS hash = Y +VLAN filter = Y +VLAN offload = Y +L4 checksum offload = Y +Packet type parsing = Y +Basic stats = Y +Stats per queue = Y +Linux UIO = Y +Linux VFIO = Y +x86-32 = Y +x86-64 = Y +Usage doc = Y diff --git a/src/seastar/dpdk/doc/guides/nics/features/xenvirt.ini b/src/seastar/dpdk/doc/guides/nics/features/xenvirt.ini new file mode 100644 index 00000000..8ab5f465 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/features/xenvirt.ini @@ -0,0 +1,6 @@ +; +; Supported features of the 'xenvirt' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] diff --git a/src/seastar/dpdk/doc/guides/nics/fm10k.rst b/src/seastar/dpdk/doc/guides/nics/fm10k.rst new file mode 100644 index 00000000..7fc48624 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/fm10k.rst @@ -0,0 +1,202 @@ +.. BSD LICENSE + Copyright(c) 2015-2016 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+
+FM10K Poll Mode Driver
+======================
+
+The FM10K poll mode driver library provides support for the Intel FM10000
+(FM10K) family of 40GbE/100GbE adapters.
+
+FTAG Based Forwarding of FM10K
+------------------------------
+
+FTAG Based Forwarding is a unique feature of FM10K. The FM10K family of NICs
+supports the addition of a Fabric Tag (FTAG) to carry special information.
+The FTAG is placed at the beginning of the frame; it carries information
+such as where the packet comes from and goes to, as well as the VLAN tag. In FTAG based
+forwarding mode, the switch logic forwards packets according to glort (global
+resource tag) information, rather than the MAC and VLAN tables. Currently this
+feature works only on the PF.
+
+To enable this feature, the user should pass a devargs parameter to the EAL,
+for example ``-w 84:00.0,enable_ftag=1``, and the application should make sure an
+appropriate FTAG is inserted for every frame on the TX side.
+
+Vector PMD for FM10K
+--------------------
+
+Vector PMD (vPMD) uses Intel® SIMD instructions to optimize packet I/O.
+It improves load/store bandwidth efficiency of the L1 data cache by using a wider
+SSE/AVX register (1).
+The wider register gives space to hold multiple packet buffers so as to save
+on the number of instructions when bulk processing packets.
+
+There is no change to the PMD API. The RX/TX handlers are the only two entry points for
+vPMD packet I/O. They are transparently registered for RX/TX at runtime
+if all required conditions are met.
+
+1. To date, only an SSE version of the FM10K vPMD is available.
+   To ensure that vPMD is in the binary code, set
+   ``CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y`` in the config file.
+
+Some constraints apply as pre-conditions for specific optimizations on bulk
+packet transfers. The following sections explain RX and TX constraints in the
+vPMD.
+
+
+RX Constraints
+~~~~~~~~~~~~~~
+
+
+Prerequisites and Pre-conditions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For Vector RX it is assumed that the number of descriptors per ring will be a power
+of 2. With this pre-condition, the ring pointer can easily wrap back to the
+head after hitting the tail without a conditional check. In addition, Vector RX
+can use this assumption to compute a bit mask using ``ring_size - 1``.
+
+
+Features not Supported by Vector RX PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some features are not supported when trying to increase the throughput in
+vPMD. They are:
+
+* IEEE1588
+
+* Flow director
+
+* Header split
+
+* RX checksum offload
+
+Other features are supported using optional MACRO configuration. They include:
+
+* HW VLAN strip
+
+* L3/L4 packet type
+
+To enable them via ``RX_OLFLAGS``, set ``RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y``.
+
+To guarantee the constraint, the following configuration flags in ``dev_conf.rxmode``
+will be checked:
+
+* ``hw_vlan_extend``
+
+* ``hw_ip_checksum``
+
+* ``header_split``
+
+* ``fdir_conf->mode``
+
+
+RX Burst Size
+^^^^^^^^^^^^^
+
+As vPMD is focused on high throughput, it processes 4 packets at a time, so it
+requires the RX burst size to be at least 4 packets. The receive handler returns
+zero if ``nb_pkt`` < 4. If ``nb_pkt`` is not a multiple of 4, it is rounded down
+to the nearest multiple of 4.
+
+
+TX Constraint
+~~~~~~~~~~~~~
+
+Features not Supported by TX Vector PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+TX vPMD only works when ``txq_flags`` is set to ``FM10K_SIMPLE_TX_FLAG``.
+This means that it does not support multi-segment TX, VLAN offload or TX checksum
+offload. The following MACROs are used for these three features:
+
+* ``ETH_TXQ_FLAGS_NOMULTSEGS``
+
+* ``ETH_TXQ_FLAGS_NOVLANOFFL``
+
+* ``ETH_TXQ_FLAGS_NOXSUMSCTP``
+
+* ``ETH_TXQ_FLAGS_NOXSUMUDP``
+
+* ``ETH_TXQ_FLAGS_NOXSUMTCP``
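+
+As an illustrative sketch (not part of the upstream constraint list), an
+application such as ``testpmd`` can request this simple TX path by passing a
+``txq_flags`` value that ORs the five flags above. The value ``0xf01`` below is
+an assumption based on the common flag encodings of this DPDK era and should be
+checked against ``FM10K_SIMPLE_TX_FLAG`` in your tree:
+
+.. code-block:: console
+
+   ./app/testpmd -l 0-3 -n 4 -- -i --txqflags=0xf01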
+
+Limitations
+-----------
+
+
+Switch manager
+~~~~~~~~~~~~~~
+
+The Intel FM10000 family of NICs integrates a hardware switch and multiple host
+interfaces. The FM10000 PMD driver only manages host interfaces. For the
+switch component, another switch driver has to be loaded prior to the
+FM10000 PMD driver. The switch driver can be acquired from Intel support.
+Only Testpoint is validated with DPDK; the latest version that has been
+validated with DPDK is 4.1.6.
+
+CRC striping
+~~~~~~~~~~~~
+
+The FM10000 family of NICs strips the CRC from every packet coming into the
+host interface. Therefore, the CRC will be stripped even when the
+``rxmode.hw_strip_crc`` member is set to 0 in ``struct rte_eth_conf``.
+
+
+Maximum packet length
+~~~~~~~~~~~~~~~~~~~~~
+
+The FM10000 family of NICs supports a maximum jumbo frame size of 15K. The value
+is fixed and cannot be changed. So, even when the ``rxmode.max_rx_pkt_len``
+member of ``struct rte_eth_conf`` is set to a value lower than 15364, frames
+up to 15364 bytes can still reach the host interface.
+
+Statistic Polling Frequency
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The FM10000 NICs expose a set of statistics via the PCI BARs. These statistics
+are read from the hardware registers when ``rte_eth_stats_get()`` or
+``rte_eth_xstats_get()`` is called. The packet counting registers are 32 bits
+while the byte counting registers are 48 bits. As a result, the statistics must
+be polled regularly in order to ensure the consistency of the returned reads.
+
+On a PCIe Gen3 x8 link, about 50 Gbps of traffic is possible. With 64 byte packets
+this gives almost 100 million packets/second, causing a 32 bit counter to overflow
+after approximately 40 seconds. To ensure these overflows are detected and accounted
+for in the statistics, it is necessary to read the statistics regularly. Reading the
+stats every 20 seconds is suggested and will keep the statistics accurate.
+
+
+Interrupt mode
+~~~~~~~~~~~~~~
+
+The FM10000 family of NICs needs a separate interrupt for the mailbox. Therefore,
+only drivers which support multiple interrupt vectors, e.g. vfio-pci, can work
+in fm10k interrupt mode.
diff --git a/src/seastar/dpdk/doc/guides/nics/i40e.rst b/src/seastar/dpdk/doc/guides/nics/i40e.rst
new file mode 100644
index 00000000..4d3c7ca0
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/i40e.rst
@@ -0,0 +1,449 @@
+.. BSD LICENSE
+   Copyright(c) 2016 Intel Corporation. All rights reserved.
+   All rights reserved.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+
+   * Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+   * Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in
+     the documentation and/or other materials provided with the
+     distribution.
+   * Neither the name of Intel Corporation nor the names of its
+     contributors may be used to endorse or promote products derived
+     from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+I40E Poll Mode Driver
+=====================
+
+The I40E PMD (librte_pmd_i40e) provides poll mode driver support
+for the Intel X710/XL710/X722 10/40 Gbps family of adapters.
+
+
+Features
+--------
+
+Features of the I40E PMD are:
+
+- Multiple queues for TX and RX
+- Receive Side Scaling (RSS)
+- MAC/VLAN filtering
+- Packet type information
+- Flow director
+- Cloud filter
+- Checksum offload
+- VLAN/QinQ stripping and inserting
+- TSO offload
+- Promiscuous mode
+- Multicast mode
+- Port hardware statistics
+- Jumbo frames
+- Link state information
+- Link flow control
+- Mirror on port, VLAN and VSI
+- Interrupt mode for RX
+- Scatter and gather for TX and RX
+- Vector Poll mode driver
+- DCB
+- VMDQ
+- SR-IOV VF
+- Hot plug
+- IEEE1588/802.1AS timestamping
+- VF Daemon (VFD) - EXPERIMENTAL
+
+
+Prerequisites
+-------------
+
+- Identify your adapter using `Intel Support
+  <http://www.intel.com/support>`_ and get the latest NVM/FW images.
+
+- Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to set up the basic DPDK environment.
+
+- To get better performance on Intel platforms, please follow the "How to get best performance with NICs on Intel platforms"
+  section of the :ref:`Getting Started Guide for Linux <linux_gsg>`.
+
+
+Pre-Installation Configuration
+------------------------------
+
+Config File Options
+~~~~~~~~~~~~~~~~~~~
+
+The following options can be modified in the ``config`` file.
+Please note that enabling debugging options may affect system performance.
+
+- ``CONFIG_RTE_LIBRTE_I40E_PMD`` (default ``y``)
+
+  Toggle compilation of the ``librte_pmd_i40e`` driver.
+
+- ``CONFIG_RTE_LIBRTE_I40E_DEBUG_*`` (default ``n``)
+
+  Toggle display of generic debugging messages.
+
+- ``CONFIG_RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC`` (default ``y``)
+
+  Toggle bulk allocation for RX.
+
+- ``CONFIG_RTE_LIBRTE_I40E_INC_VECTOR`` (default ``n``)
+
+  Toggle the use of Vector PMD instead of the normal RX/TX path.
+  To enable vPMD for RX, bulk allocation for Rx must be allowed.
+
+- ``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC`` (default ``n``)
+
+  Toggle the use of 16-byte RX descriptors; by default the RX descriptor is 32 bytes.
+
+- ``CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF`` (default ``64``)
+
+  Number of queues reserved for PF.
+
+- ``CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF`` (default ``4``)
+
+  Number of queues reserved for each SR-IOV VF.
+
+- ``CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM`` (default ``4``)
+
+  Number of queues reserved for each VMDQ Pool.
+
+- ``CONFIG_RTE_LIBRTE_I40E_ITR_INTERVAL`` (default ``-1``)
+
+  Interrupt Throttling interval.
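+
+As a quick sketch of how these options are typically applied (assuming the
+usual ``config/common_base`` file and the ``x86_64-native-linuxapp-gcc``
+target; adjust both to your build layout), the vector PMD could be enabled
+before rebuilding as follows:
+
+.. code-block:: console
+
+   sed -i 's/CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n/CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=y/' config/common_base
+   make install T=x86_64-native-linuxapp-gcc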
+
+
+Driver compilation and testing
+------------------------------
+
+Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
+for details.
+
+
+SR-IOV: Prerequisites and Sample Application Notes
+--------------------------------------------------
+
+#. Load the kernel module:
+
+   .. code-block:: console
+
+      modprobe i40e
+
+   Check the output in dmesg:
+
+   .. code-block:: console
+
+      i40e 0000:83:00.1 ens802f0: renamed from eth0
+
+#. Bring up the PF ports:
+
+   .. code-block:: console
+
+      ifconfig ens802f0 up
+
+#. Create VF device(s):
+
+   Echo the number of VFs to be created into the ``sriov_numvfs`` sysfs entry
+   of the parent PF.
+
+   Example:
+
+   .. code-block:: console
+
+      echo 2 > /sys/devices/pci0000:00/0000:00:03.0/0000:81:00.0/sriov_numvfs
+
+
+#. Assign VF MAC address:
+
+   Assign a MAC address to the VF using the iproute2 utility. The syntax is:
+
+   .. code-block:: console
+
+      ip link set <PF netdev id> vf <VF id> mac <macaddr>
+
+   Example:
+
+   .. code-block:: console
+
+      ip link set ens802f0 vf 0 mac a0:b0:c0:d0:e0:f0
+
+#. Assign the VF to a VM, and bring up the VM.
+   Please see the documentation for the *I40E/IXGBE/IGB Virtual Function Driver*.
+
+#. Running testpmd:
+
+   Follow the instructions available in the document
+   :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
+   to run testpmd.
+
+   Example output:
+
+   .. code-block:: console
+
+      ...
+      EAL: PCI device 0000:83:00.0 on NUMA socket 1
+      EAL: probe driver: 8086:1572 rte_i40e_pmd
+      EAL: PCI memory mapped at 0x7f7f80000000
+      EAL: PCI memory mapped at 0x7f7f80800000
+      PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 8000208a
+      Interactive-mode selected
+      Configuring Port 0 (socket 0)
+      ...
+
+      PMD: i40e_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are
+      satisfied.Rx Burst Bulk Alloc function will be used on port=0, queue=0.
+
+      ...
+      Port 0: 68:05:CA:26:85:84
+      Checking link statuses...
+      Port 0 Link Up - speed 10000 Mbps - full-duplex
+      Done
+
+      testpmd>
+
+
+Sample Application Notes
+------------------------
+
+VLAN filter
+~~~~~~~~~~~
+
+VLAN filtering only works when promiscuous mode is off.
+
+To start ``testpmd`` and add VLAN 10 to port 0:
+
+.. code-block:: console
+
+    ./app/testpmd -l 0-15 -n 4 -- -i --forward-mode=mac
+    ...
+
+    testpmd> set promisc 0 off
+    testpmd> rx_vlan add 10 0
+
+
+Flow Director
+~~~~~~~~~~~~~
+
+The Flow Director works in receive mode to identify specific flows or sets of flows and route them to specific queues.
+The Flow Director filters can match different fields for different types of packets: the flow type, a specific input set per flow type and the flexible payload.
+
+The default input set of each flow type is::
+
+   ipv4-other : src_ip_address, dst_ip_address
+   ipv4-frag : src_ip_address, dst_ip_address
+   ipv4-tcp : src_ip_address, dst_ip_address, src_port, dst_port
+   ipv4-udp : src_ip_address, dst_ip_address, src_port, dst_port
+   ipv4-sctp : src_ip_address, dst_ip_address, src_port, dst_port,
+               verification_tag
+   ipv6-other : src_ip_address, dst_ip_address
+   ipv6-frag : src_ip_address, dst_ip_address
+   ipv6-tcp : src_ip_address, dst_ip_address, src_port, dst_port
+   ipv6-udp : src_ip_address, dst_ip_address, src_port, dst_port
+   ipv6-sctp : src_ip_address, dst_ip_address, src_port, dst_port,
+               verification_tag
+   l2_payload : ether_type
+
+The flex payload is selected from offset 0 to 15 of the packet's payload by default, while it is masked out from matching.
+
+Start ``testpmd`` with ``--disable-rss`` and ``--pkt-filter-mode=perfect``:
+
+..
code-block:: console + + ./app/testpmd -l 0-15 -n 4 -- -i --disable-rss --pkt-filter-mode=perfect \ + --rxq=8 --txq=8 --nb-cores=8 --nb-ports=1 + +Add a rule to direct ``ipv4-udp`` packet whose ``dst_ip=2.2.2.5, src_ip=2.2.2.3, src_port=32, dst_port=32`` to queue 1: + +.. code-block:: console + + testpmd> flow_director_filter 0 mode IP add flow ipv4-udp \ + src 2.2.2.3 32 dst 2.2.2.5 32 vlan 0 flexbytes () \ + fwd pf queue 1 fd_id 1 + +Check the flow director status: + +.. code-block:: console + + testpmd> show port fdir 0 + + ######################## FDIR infos for port 0 #################### + MODE: PERFECT + SUPPORTED FLOW TYPE: ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other + ipv6-frag ipv6-tcp ipv6-udp ipv6-sctp ipv6-other + l2_payload + FLEX PAYLOAD INFO: + max_len: 16 payload_limit: 480 + payload_unit: 2 payload_seg: 3 + bitmask_unit: 2 bitmask_num: 2 + MASK: + vlan_tci: 0x0000, + src_ipv4: 0x00000000, + dst_ipv4: 0x00000000, + src_port: 0x0000, + dst_port: 0x0000 + src_ipv6: 0x00000000,0x00000000,0x00000000,0x00000000, + dst_ipv6: 0x00000000,0x00000000,0x00000000,0x00000000 + FLEX PAYLOAD SRC OFFSET: + L2_PAYLOAD: 0 1 2 3 4 5 6 ... + L3_PAYLOAD: 0 1 2 3 4 5 6 ... + L4_PAYLOAD: 0 1 2 3 4 5 6 ... + FLEX MASK CFG: + ipv4-udp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv4-tcp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv4-sctp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv4-other: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv4-frag: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-udp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-tcp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-sctp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-other: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + ipv6-frag: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + l2_payload: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + guarant_count: 1 best_count: 0 + guarant_space: 512 best_space: 7168 + collision: 0 free: 0 + maxhash: 0 maxlen: 0 + add: 0 remove: 0 + f_add: 0 f_remove: 0 + + +Delete all flow director rules on a port: + +.. code-block:: console + + testpmd> flush_flow_director 0 + +Floating VEB +~~~~~~~~~~~~~ + +The Intel® Ethernet Controller X710 and XL710 Family support a feature called +"Floating VEB". + +A Virtual Ethernet Bridge (VEB) is an IEEE Edge Virtual Bridging (EVB) term +for functionality that allows local switching between virtual endpoints within +a physical endpoint and also with an external bridge/network. + +A "Floating" VEB doesn't have an uplink connection to the outside world so all +switching is done internally and remains within the host. As such, this +feature provides security benefits. + +In addition, a Floating VEB overcomes a limitation of normal VEBs where they +cannot forward packets when the physical link is down. Floating VEBs don't need +to connect to the NIC port so they can still forward traffic from VF to VF +even when the physical link is down. + +Therefore, with this feature enabled VFs can be limited to communicating with +each other but not an outside network, and they can do so even when there is +no physical uplink on the associated NIC port. + +To enable this feature, the user should pass a ``devargs`` parameter to the +EAL, for example:: + + -w 84:00.0,enable_floating_veb=1 + +In this configuration the PMD will use the floating VEB feature for all the +VFs created by this PF device. 
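+
+For example, ``testpmd`` could be launched with this devargs parameter as
+follows (a sketch; the PCI address and core list are placeholders to adapt
+to your system):
+
+.. code-block:: console
+
+   ./app/testpmd -l 0-3 -n 4 -w 84:00.0,enable_floating_veb=1 -- -i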
+
+Alternatively, the user can specify which VFs need to connect to this floating
+VEB using the ``floating_veb_list`` argument::
+
+    -w 84:00.0,enable_floating_veb=1,floating_veb_list=1;3-4
+
+In this example ``VF1``, ``VF3`` and ``VF4`` connect to the floating VEB,
+while other VFs connect to the normal VEB.
+
+The current implementation only supports one floating VEB and one regular
+VEB. VFs can connect to a floating VEB or a regular VEB according to the
+configuration passed on the EAL command line.
+
+The floating VEB functionality requires a NIC firmware version of 5.0
+or greater.
+
+
+Limitations or Known issues
+---------------------------
+
+MPLS packet classification on X710/XL710
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For firmware versions prior to 5.0, MPLS packets are not recognized by the NIC.
+The L2 Payload flow type in flow director can be used to classify MPLS packets
+by using a testpmd command like:
+
+   testpmd> flow_director_filter 0 mode IP add flow l2_payload ether \
+            0x8847 flexbytes () fwd pf queue <N> fd_id <M>
+
+With NIC firmware version 5.0 or greater, some limited MPLS support
+is added: a native MPLS (MPLS in Ethernet) skip is implemented, while no
+new packet type, classification or offload is possible. With this change, the
+L2 Payload flow type in flow director cannot be used to classify MPLS packets
+as with previous firmware versions. Instead, the Ethertype filter can be
+used to classify MPLS packets by using a testpmd command like:
+
+   testpmd> ethertype_filter 0 add mac_ignr 00:00:00:00:00:00 ethertype \
+            0x8847 fwd queue <M>
+
+16 Byte Descriptor cannot be used on DPDK VF
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If the Linux i40e kernel driver is used as the host driver while the DPDK i40e PMD
+is used as the VF driver, DPDK cannot choose the 16 byte receive descriptor. That
+is to say, the user should keep ``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n`` in the
+config file.
+
+Link down with i40e kernel driver after DPDK application exit
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+After the DPDK application quits and the device is bound back to the Linux i40e
+kernel driver, the link cannot be brought up by ``ifconfig <dev> up``.
+To work around this issue, ``ethtool -s <dev> autoneg on`` should be
+run first; then the link can be brought up through ``ifconfig <dev> up``.
+
+NOTE: this requires a Linux kernel i40e driver version >= 1.4.X
+
+Receive packets with Ethertype 0x88A8
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Due to an FW limitation, the PF can receive packets with Ethertype 0x88A8
+only when floating VEB is disabled.
+
+Incorrect Rx statistics when packet is oversize
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a packet is over the maximum frame size, the packet is dropped.
+However, the Rx statistics returned by `rte_eth_stats_get` incorrectly
+show it as received.
+
+VF & TC max bandwidth setting
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The per VF max bandwidth and per TC max bandwidth cannot be enabled in parallel.
+The behavior differs when handling per VF and per TC max bandwidth settings.
+When enabling per VF max bandwidth, SW will check if per TC max bandwidth is
+enabled. If so, a failure is returned.
+When enabling per TC max bandwidth, SW will check if per VF max bandwidth
+is enabled. If so, per VF max bandwidth is disabled and the per TC max
+bandwidth setting continues.
+
+TC TX scheduling mode setting
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are two TX scheduling modes for TCs: round robin and strict priority mode.
+If a TC is set to strict priority mode, it can consume unlimited bandwidth.
+This means that if the application has set the max bandwidth for that TC, the
+setting has no effect.
+It is suggested to use strict priority mode for a TC that is latency
+sensitive but does not consume much bandwidth.
diff --git a/src/seastar/dpdk/doc/guides/nics/img/console.png b/src/seastar/dpdk/doc/guides/nics/img/console.png
Binary files differ
new file mode 100644
index 00000000..99423340
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/img/console.png
diff --git a/src/seastar/dpdk/doc/guides/nics/img/fast_pkt_proc.png b/src/seastar/dpdk/doc/guides/nics/img/fast_pkt_proc.png
Binary files differ
new file mode 100644
index 00000000..48d57e5c
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/img/fast_pkt_proc.png
diff --git a/src/seastar/dpdk/doc/guides/nics/img/forward_stats.png b/src/seastar/dpdk/doc/guides/nics/img/forward_stats.png
Binary files differ
new file mode 100644
index 00000000..23e35325
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/img/forward_stats.png
diff --git a/src/seastar/dpdk/doc/guides/nics/img/host_vm_comms.png b/src/seastar/dpdk/doc/guides/nics/img/host_vm_comms.png
Binary files differ
new file mode 100644
index 00000000..4e0b3c96
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/img/host_vm_comms.png
diff --git a/src/seastar/dpdk/doc/guides/nics/img/host_vm_comms_qemu.png b/src/seastar/dpdk/doc/guides/nics/img/host_vm_comms_qemu.png
Binary files differ
new file mode 100644
index 00000000..391a4eac
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/img/host_vm_comms_qemu.png
diff --git a/src/seastar/dpdk/doc/guides/nics/img/inter_vm_comms.png b/src/seastar/dpdk/doc/guides/nics/img/inter_vm_comms.png
Binary files differ
new file mode 100644
index 00000000..6d85ece7
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/img/inter_vm_comms.png
diff --git a/src/seastar/dpdk/doc/guides/nics/img/perf_benchmark.png b/src/seastar/dpdk/doc/guides/nics/img/perf_benchmark.png
Binary files differ
new file mode 100644
index 00000000..aba818c3
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/img/perf_benchmark.png
diff --git a/src/seastar/dpdk/doc/guides/nics/img/single_port_nic.png b/src/seastar/dpdk/doc/guides/nics/img/single_port_nic.png
Binary files differ
new file mode 100644
index 00000000..8f39d73b
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/img/single_port_nic.png
diff --git a/src/seastar/dpdk/doc/guides/nics/img/vm_vm_comms.png b/src/seastar/dpdk/doc/guides/nics/img/vm_vm_comms.png
Binary files differ
new file mode 100644
index 00000000..2bf1cd27
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/img/vm_vm_comms.png
diff --git a/src/seastar/dpdk/doc/guides/nics/img/vmxnet3_int.png b/src/seastar/dpdk/doc/guides/nics/img/vmxnet3_int.png
Binary files differ
new file mode 100644
index 00000000..6541feba
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/img/vmxnet3_int.png
diff --git a/src/seastar/dpdk/doc/guides/nics/img/vswitch_vm.png b/src/seastar/dpdk/doc/guides/nics/img/vswitch_vm.png
Binary files differ
new file mode 100644
index 00000000..ac817aaa
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/img/vswitch_vm.png
diff --git a/src/seastar/dpdk/doc/guides/nics/index.rst b/src/seastar/dpdk/doc/guides/nics/index.rst
new file mode 100644
index 00000000..240d0824
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/index.rst
@@ -0,0 +1,86 @@
+.. BSD LICENSE
+   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+   All rights reserved.
+ + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Network Interface Controller Drivers +==================================== + +.. toctree:: + :maxdepth: 3 + :numbered: + + overview + build_and_test + ark + avp + bnx2x + bnxt + cxgbe + dpaa2 + e1000em + ena + enic + fm10k + i40e + ixgbe + intel_vf + kni + liquidio + mlx4 + mlx5 + nfp + qede + sfc_efx + szedata2 + tap + thunderx + virtio + vhost + vmxnet3 + pcap_ring + +**Figures** + +:numref:`figure_single_port_nic` :ref:`figure_single_port_nic` + +:numref:`figure_perf_benchmark` :ref:`figure_perf_benchmark` + +:numref:`figure_fast_pkt_proc` :ref:`figure_fast_pkt_proc` + +:numref:`figure_inter_vm_comms` :ref:`figure_inter_vm_comms` + +:numref:`figure_host_vm_comms` :ref:`figure_host_vm_comms` + +:numref:`figure_host_vm_comms_qemu` :ref:`figure_host_vm_comms_qemu` + +:numref:`figure_vmxnet3_int` :ref:`figure_vmxnet3_int` + +:numref:`figure_vswitch_vm` :ref:`figure_vswitch_vm` + +:numref:`figure_vm_vm_comms` :ref:`figure_vm_vm_comms` diff --git a/src/seastar/dpdk/doc/guides/nics/intel_vf.rst b/src/seastar/dpdk/doc/guides/nics/intel_vf.rst new file mode 100644 index 00000000..1e83bf6e --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/intel_vf.rst @@ -0,0 +1,609 @@ +.. BSD LICENSE + Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. 
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+I40E/IXGBE/IGB Virtual Function Driver
+======================================
+
+Supported Intel® Ethernet Controllers (see the *DPDK Release Notes* for details)
+support the following modes of operation in a virtualized environment:
+
+* **SR-IOV mode**: Involves direct assignment of part of the port resources to different guest operating systems
+  using the PCI-SIG Single Root I/O Virtualization (SR-IOV) standard,
+  also known as "native mode" or "pass-through" mode.
+  In this chapter, this mode is referred to as IOV mode.
+
+* **VMDq mode**: Involves central management of the networking resources by an IO Virtual Machine (IOVM) or
+  a Virtual Machine Monitor (VMM), also known as software switch acceleration mode.
+  In this chapter, this mode is referred to as the Next Generation VMDq mode.
+
+SR-IOV Mode Utilization in a DPDK Environment
+---------------------------------------------
+
+The DPDK uses the SR-IOV feature for hardware-based I/O sharing in IOV mode.
+Therefore, it is possible to partition SR-IOV capability on Ethernet controller NIC resources logically and
+expose them to a virtual machine as a separate PCI function called a "Virtual Function".
+Refer to :numref:`figure_single_port_nic`.
+
+As a result, a NIC is logically distributed among multiple virtual machines (as shown in :numref:`figure_single_port_nic`),
+while still having global data in common to share with the Physical Function and other Virtual Functions.
+The DPDK fm10kvf, i40evf, igbvf and ixgbevf Poll Mode Drivers (PMDs) serve the virtual PCI functions of the
+Intel® 82576 Gigabit Ethernet Controller, the Intel® Ethernet Controller I350 family, the Intel® 82599 10 Gigabit
+Ethernet Controller, the Intel® Fortville 10/40 Gigabit Ethernet Controller, and the PCIe host-interface of the
+Intel Ethernet Switch FM10000 Series.
+Meanwhile, the DPDK Poll Mode Driver (PMD) also supports the "Physical Function" of such NICs on the host.
+
+The DPDK PF/VF Poll Mode Driver (PMD) supports the Layer 2 switch on the Intel® 82576 Gigabit Ethernet Controller,
+Intel® Ethernet Controller I350 family, Intel® 82599 10 Gigabit Ethernet Controller,
+and Intel® Fortville 10/40 Gigabit Ethernet Controller NICs, so that the guest can choose it for inter virtual machine traffic in SR-IOV mode.
+
+For more detail on SR-IOV, please refer to the following documents:
+
+* `SR-IOV provides hardware based I/O sharing <http://www.intel.com/network/connectivity/solutions/vmdc.htm>`_
+
+* `PCI-SIG-Single Root I/O Virtualization Support on IA
+  <http://www.intel.com/content/www/us/en/pci-express/pci-sig-single-root-io-virtualization-support-in-virtualization-technology-for-connectivity-paper.html>`_
+
+* `Scalable I/O Virtualized Servers <http://www.intel.com/content/www/us/en/virtualization/server-virtualization/scalable-i-o-virtualized-servers-paper.html>`_
+
+.. _figure_single_port_nic:
+
+.. figure:: img/single_port_nic.*
+
+   Virtualization for a Single Port NIC in SR-IOV Mode
+
+
+Physical and Virtual Function Infrastructure
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following describes the Physical Function and Virtual Function infrastructure for the supported Ethernet Controller NICs.
+
+Virtual Functions operate under the respective Physical Function on the same NIC port and therefore have no access
+to the global NIC resources that are shared between other functions for the same NIC port.
+
+A Virtual Function has basic access to the queue resources and control structures of the queues assigned to it.
+For global resource access, a Virtual Function has to send a request to the Physical Function for that port,
+and the Physical Function operates on the global resources on behalf of the Virtual Function.
+For this out-of-band communication, an SR-IOV enabled NIC provides a memory buffer for each Virtual Function,
+which is called a "Mailbox".
+
+The PCIE host-interface of Intel Ethernet Switch FM10000 Series VF infrastructure
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In a virtualized environment, the programmer can enable a maximum of *64 Virtual Functions (VF)*
+globally per PCIE host-interface of the Intel Ethernet Switch FM10000 Series device.
+Each VF can have a maximum of 16 queue pairs.
+The Physical Function in the host can only be configured by the Linux* fm10k driver
+(in the case of the Linux Kernel-based Virtual Machine [KVM]); the DPDK PMD PF driver does not support it yet.
+
+For example,
+
+* Using the Linux* fm10k driver:
+
+  .. code-block:: console
+
+     rmmod fm10k (To remove the fm10k module)
+     insmod fm10k.ko max_vfs=2,2 (To enable two Virtual Functions per port)
+
+Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC.
+When you enable the four Virtual Functions with the above command, the four enabled functions have a Function#
+represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3.
+However:
+
+* Virtual Functions 0 and 2 belong to Physical Function 0
+
+* Virtual Functions 1 and 3 belong to Physical Function 1
+
+.. note::
+
+   The above is an important consideration to take into account when targeting specific packets to a selected port.
+
+Intel® X710/XL710 Gigabit Ethernet Controller VF Infrastructure
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In a virtualized environment, the programmer can enable a maximum of *128 Virtual Functions (VF)*
+globally per Intel® X710/XL710 Gigabit Ethernet Controller NIC device.
+The number of queue pairs of each VF can be configured by ``CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF`` in the ``config`` file.
+The Physical Function in the host can be configured either by the Linux* i40e driver
+(in the case of the Linux Kernel-based Virtual Machine [KVM]) or by the DPDK PMD PF driver.
+When using both the DPDK PMD PF and VF drivers, the whole NIC will be taken over by the DPDK-based application.
+
+For example,
+
+* Using the Linux* i40e driver:
+
+  .. code-block:: console
+
+     rmmod i40e (To remove the i40e module)
+     insmod i40e.ko max_vfs=2,2 (To enable two Virtual Functions per port)
+
+* Using the DPDK PMD PF i40e driver:
+
+  Kernel Params: iommu=pt, intel_iommu=on
+
+  .. code-block:: console
+
+     modprobe uio
+     insmod igb_uio
+     ./dpdk-devbind.py -b igb_uio bb:ss.f
+     echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device)
+
+  Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library.
+
+Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC.
+When you enable the four Virtual Functions with the above command, the four enabled functions have a Function#
+represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3.
+However:
+
+* Virtual Functions 0 and 2 belong to Physical Function 0
+
+* Virtual Functions 1 and 3 belong to Physical Function 1
+
+.. note::
+
+   The above is an important consideration to take into account when targeting specific packets to a selected port.
+
+   For the Intel® X710/XL710 Gigabit Ethernet Controller, queues are in pairs. One queue pair means one receive queue and
+   one transmit queue. The default number of queue pairs per VF is 4, and can be up to 16.
+
+Intel® 82599 10 Gigabit Ethernet Controller VF Infrastructure
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The programmer can enable a maximum of *63 Virtual Functions* and there must be *one Physical Function* per Intel® 82599
+10 Gigabit Ethernet Controller NIC port.
+The reason for this is that the device allows for a maximum of 128 queues per port and a virtual/physical function has to
+have at least one queue pair (RX/TX).
+The current implementation of the DPDK ixgbevf driver supports a single queue pair (RX/TX) per Virtual Function.
+The Physical Function in the host can be configured either by the Linux* ixgbe driver
+(in the case of the Linux Kernel-based Virtual Machine [KVM]) or by the DPDK PMD PF driver.
+When using both the DPDK PMD PF and VF drivers, the whole NIC will be taken over by the DPDK-based application.
+
+For example,
+
+* Using the Linux* ixgbe driver:
+
+  .. code-block:: console
+
+     rmmod ixgbe (To remove the ixgbe module)
+     insmod ixgbe max_vfs=2,2 (To enable two Virtual Functions per port)
+
+* Using the DPDK PMD PF ixgbe driver:
+
+  Kernel Params: iommu=pt, intel_iommu=on
+
+  .. code-block:: console
+
+     modprobe uio
+     insmod igb_uio
+     ./dpdk-devbind.py -b igb_uio bb:ss.f
+     echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device)
+
+  Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library.
+
+* Using the DPDK PMD PF ixgbe driver to enable VF RSS:
+
+  Follow the same steps as above to install the uio and igb_uio modules and to specify max_vfs for the PCI device, then
+  launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library.
+
+.. note::
+
+   The limitation for VF RSS on the Intel® 82599 10 Gigabit Ethernet Controller is:
+   the hash function and key are shared between the PF and all VFs, and the RETA table with 128 entries is also
+   shared between the PF and all VFs. It is therefore not possible to query the hash and RETA content per
+   VF on the guest; if needed, query them on the host for the shared RETA information.
+
+Virtual Function enumeration is performed in the following sequence by the Linux* PCI driver for a dual-port NIC.
+When you enable the four Virtual Functions with the above command, the four enabled functions have a Function#
+represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3.
+However:
+
+* Virtual Functions 0 and 2 belong to Physical Function 0
+
+* Virtual Functions 1 and 3 belong to Physical Function 1
+
+.. note::
+
+   The above is an important consideration to take into account when targeting specific packets to a selected port.
+
+Intel® 82576 Gigabit Ethernet Controller and Intel® Ethernet Controller I350 Family VF Infrastructure
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In a virtualized environment, an Intel® 82576 Gigabit Ethernet Controller serves up to eight virtual machines (VMs).
+The controller has 16 TX and 16 RX queues.
+They are generally referred to (or thought of) as queue pairs (one TX and one RX queue).
+This gives the controller 16 queue pairs.
+
+A pool is a group of queue pairs for assignment to the same VF, used for transmit and receive operations.
+The controller has eight pools, with each pool containing two queue pairs, that is, two TX and two RX queues assigned to each VF.
+
+In a virtualized environment, an Intel® Ethernet Controller I350 family device serves up to eight virtual machines (VMs) per port.
+The eight queues can be accessed by eight different VMs if configured correctly (the I350 has four 1GbE ports, each with 8 TX and 8 RX queues);
+that means one transmit and one receive queue assigned to each VF.
+
+For example,
+
+* Using Linux* igb driver:
+
+  .. code-block:: console
+
+     rmmod igb (To remove the igb module)
+     insmod igb max_vfs=2,2 (To enable two Virtual Functions per port)
+
+* Using the DPDK PMD PF igb driver:
+
+  Kernel Params: iommu=pt, intel_iommu=on
+
+  ..
 code-block:: console
+
+     modprobe uio
+     insmod igb_uio
+     ./dpdk-devbind.py -b igb_uio bb:ss.f
+     echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device)
+
+  Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library.
+
+Virtual Function enumeration is performed in the following sequence by the Linux* PCI driver for a four-port NIC.
+When you enable the eight Virtual Functions with the above command, the eight enabled functions have a Function#
+represented by (Bus#, Device#, Function#) in sequence, starting from 0 to 7.
+However:
+
+* Virtual Functions 0 and 4 belong to Physical Function 0
+
+* Virtual Functions 1 and 5 belong to Physical Function 1
+
+* Virtual Functions 2 and 6 belong to Physical Function 2
+
+* Virtual Functions 3 and 7 belong to Physical Function 3
+
+.. note::
+
+   The above is an important consideration to take into account when targeting specific packets to a selected port.
+
+Validated Hypervisors
+~~~~~~~~~~~~~~~~~~~~~
+
+The validated hypervisor is:
+
+* KVM (Kernel Virtual Machine) with Qemu, version 0.14.0
+
+However, since the hypervisor is bypassed when the Virtual Function devices are configured (the Mailbox
+interface is used instead), the solution is hypervisor-agnostic.
+Xen* and VMware* (when SR-IOV is supported) will also be able to support the DPDK with Virtual Function driver support.
+
+Expected Guest Operating System in Virtual Machine
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The expected guest operating systems in a virtualized environment are:
+
+* Fedora* 14 (64-bit)
+
+* Ubuntu* 10.04 (64-bit)
+
+For supported kernel versions, refer to the *DPDK Release Notes*.
+
+Setting Up a KVM Virtual Machine Monitor
+----------------------------------------
+
+The following describes a target environment:
+
+* Host Operating System: Fedora 14
+
+* Hypervisor: KVM (Kernel Virtual Machine) with Qemu version 0.14.0
+
+* Guest Operating System: Fedora 14
+
+* Linux Kernel Version: Refer to the *DPDK Getting Started Guide*
+
+* Target Applications: l2fwd, l3fwd-vf
+
+The setup procedure is as follows:
+
+#. Before booting the Host OS, open **BIOS setup** and enable **Intel® VT features**.
+
+#. While booting the Host OS kernel, pass the intel_iommu=on kernel command line argument using GRUB.
+   When using the DPDK PF driver on the host, also pass the iommu=pt kernel command line argument in GRUB.
+
+#. Download qemu-kvm-0.14.0 from
+   `http://sourceforge.net/projects/kvm/files/qemu-kvm/ <http://sourceforge.net/projects/kvm/files/qemu-kvm/>`_
+   and install it in the Host OS using the following steps:
+
+   When using a recent kernel (2.6.25+) with kvm modules included:
+
+   .. code-block:: console
+
+      tar xzf qemu-kvm-release.tar.gz
+      cd qemu-kvm-release
+      ./configure --prefix=/usr/local/kvm
+      make
+      sudo make install
+      sudo /sbin/modprobe kvm-intel
+
+   When using an older kernel, or a kernel from a distribution without the kvm modules,
+   you must download (from the same link), compile and install the modules yourself:
+
+   .. code-block:: console
+
+      tar xjf kvm-kmod-release.tar.bz2
+      cd kvm-kmod-release
+      ./configure
+      make
+      sudo make install
+      sudo /sbin/modprobe kvm-intel
+
+   qemu-kvm installs in the /usr/local/kvm/bin directory.
+
+   For more details about KVM configuration and usage, please refer to:
+
+   `http://www.linux-kvm.org/page/HOWTO1 <http://www.linux-kvm.org/page/HOWTO1>`_.
+
+#. Create a Virtual Machine and install Fedora 14 on the Virtual Machine.
+   This is referred to as the Guest Operating System (Guest OS).
+
+#.
 Download and install the latest ixgbe driver from:
+
+   `http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=14687 <http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=14687>`_
+
+#. In the Host OS
+
+   When using the Linux kernel ixgbe driver, unload the Linux ixgbe driver and reload it with the max_vfs=2,2 argument:
+
+   .. code-block:: console
+
+      rmmod ixgbe
+      modprobe ixgbe max_vfs=2,2
+
+   When using the DPDK PMD PF driver, insert the DPDK kernel module igb_uio and set the number of VFs via the sysfs max_vfs entry:
+
+   .. code-block:: console
+
+      modprobe uio
+      insmod igb_uio
+      ./dpdk-devbind.py -b igb_uio 02:00.0 02:00.1 0e:00.0 0e:00.1
+      echo 2 > /sys/bus/pci/devices/0000\:02\:00.0/max_vfs
+      echo 2 > /sys/bus/pci/devices/0000\:02\:00.1/max_vfs
+      echo 2 > /sys/bus/pci/devices/0000\:0e\:00.0/max_vfs
+      echo 2 > /sys/bus/pci/devices/0000\:0e\:00.1/max_vfs
+
+   .. note::
+
+      You need to explicitly specify the number of VFs for each port; the commands above
+      create two VFs for each of the four ixgbe ports.
+
+      Let us say we have a machine with four physical ixgbe ports:
+
+      0000:02:00.0
+
+      0000:02:00.1
+
+      0000:0e:00.0
+
+      0000:0e:00.1
+
+      The command above creates two VFs for device 0000:02:00.0:
+
+      .. code-block:: console
+
+         ls -alrt /sys/bus/pci/devices/0000\:02\:00.0/virt*
+         lrwxrwxrwx. 1 root root 0 Apr 13 05:40 /sys/bus/pci/devices/0000:02:00.0/virtfn1 -> ../0000:02:10.2
+         lrwxrwxrwx. 1 root root 0 Apr 13 05:40 /sys/bus/pci/devices/0000:02:00.0/virtfn0 -> ../0000:02:10.0
+
+      It also creates two VFs for device 0000:02:00.1:
+
+      .. code-block:: console
+
+         ls -alrt /sys/bus/pci/devices/0000\:02\:00.1/virt*
+         lrwxrwxrwx. 1 root root 0 Apr 13 05:51 /sys/bus/pci/devices/0000:02:00.1/virtfn1 -> ../0000:02:10.3
+         lrwxrwxrwx. 1 root root 0 Apr 13 05:51 /sys/bus/pci/devices/0000:02:00.1/virtfn0 -> ../0000:02:10.1
+
+#. List the PCI devices connected and notice that the Host OS shows two Physical Functions (traditional ports)
+   and four Virtual Functions (two for each port).
+   This is the result of the previous step.
+
+#. Insert the pci_stub module to hold the PCI devices that are freed from the default driver using the following command
+   (see http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM Section 4 for more information):
+
+   .. code-block:: console
+
+      sudo /sbin/modprobe pci-stub
+
+   Unbind the default driver from the PCI devices representing the Virtual Functions.
+   A script to perform this action is as follows:
+
+   .. code-block:: console
+
+      echo "8086 10ed" > /sys/bus/pci/drivers/pci-stub/new_id
+      echo 0000:08:10.0 > /sys/bus/pci/devices/0000:08:10.0/driver/unbind
+      echo 0000:08:10.0 > /sys/bus/pci/drivers/pci-stub/bind
+
+   where 0000:08:10.0 belongs to the Virtual Function visible in the Host OS.
+
+#. Now, start the Virtual Machine by running the following command:
+
+   .. code-block:: console
+
+      /usr/local/kvm/bin/qemu-system-x86_64 -m 4096 -smp 4 -boot c -hda lucid.qcow2 -device pci-assign,host=08:10.0
+
+   where:
+
+   — -m = memory to assign
+
+   — -smp = number of smp cores
+
+   — -boot = boot option
+
+   — -hda = virtual disk image
+
+   — -device = device to attach
+
+   .. note::
+
+      — The pci-assign,host=08:10.0 value indicates that you want to attach a PCI device
+      to a Virtual Machine and the respective (Bus:Device.Function)
+      numbers should be passed for the Virtual Function to be attached.
+
+      — qemu-kvm-0.14.0 allows a maximum of four PCI devices assigned to a VM,
+      but this is qemu-kvm version dependent; qemu-kvm-0.14.1 allows a maximum of five PCI devices.
+
+      — qemu-system-x86_64 also has a -cpu command line option that is used to select the cpu_model
+      to emulate in a Virtual Machine. Therefore, it can be used as:
+
+      .. code-block:: console
+
+         /usr/local/kvm/bin/qemu-system-x86_64 -cpu ?
+
+         (to list all available cpu_models)
+
+         /usr/local/kvm/bin/qemu-system-x86_64 -m 4096 -cpu host -smp 4 -boot c -hda lucid.qcow2 -device pci-assign,host=08:10.0
+
+         (to use the same cpu_model equivalent to the host cpu)
+
+      For more information, please refer to: `http://wiki.qemu.org/Features/CPUModels <http://wiki.qemu.org/Features/CPUModels>`_.
+
+#. Install and run the DPDK host application to take over the Physical Function. For example:
+
+   .. code-block:: console
+
+      make install T=x86_64-native-linuxapp-gcc
+      ./x86_64-native-linuxapp-gcc/app/testpmd -l 0-3 -n 4 -- -i
+
+#. Finally, access the Guest OS using vncviewer with the localhost:5900 port and check the lspci command output in the Guest OS.
+   The virtual functions will be listed as available for use.
+
+#. Configure and install the DPDK with an x86_64-native-linuxapp-gcc configuration on the Guest OS as normal,
+   that is, there is no change to the normal installation procedure.
+
+   .. code-block:: console
+
+      make config T=x86_64-native-linuxapp-gcc O=x86_64-native-linuxapp-gcc
+      cd x86_64-native-linuxapp-gcc
+      make
+
+.. note::
+
+   If you are unable to compile the DPDK and you are getting "error: CPU you selected does not support x86-64 instruction set",
+   power off the Guest OS and start the virtual machine with the correct -cpu option in the qemu-system-x86_64 command as shown in step 9.
+   You must select the best x86_64 cpu_model to emulate, or you can select the host option if available.
+
+.. note::
+
+   Run the DPDK l2fwd sample application in the Guest OS with Hugepages enabled.
+   For the expected benchmark performance, you must pin the cores from the Guest OS to the Host OS (taskset can be used to do this) and
+   you must also look at the PCI Bus layout on the board to ensure you are not running the traffic over the QPI Interface.
+
+.. note::
+
+   * The Virtual Machine Manager (the Fedora package name is virt-manager) is a utility for virtual machine management
+     that can also be used to create, start, stop and delete virtual machines.
+     If this option is used, steps 2 and 6 in the instructions provided will be different.
+
+   * virsh, a command line utility for virtual machine management,
+     can also be used to bind and unbind devices to a virtual machine in Ubuntu.
+     If this option is used, step 6 in the instructions provided will be different.
+
+   * The Virtual Machine Monitor (see :numref:`figure_perf_benchmark`) is equivalent to a Host OS with KVM installed as described in the instructions.
+
+.. _figure_perf_benchmark:
+
+.. figure:: img/perf_benchmark.*
+
+   Performance Benchmark Setup
+
+
+DPDK SR-IOV PMD PF/VF Driver Usage Model
+----------------------------------------
+
+Fast Host-based Packet Processing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Software Defined Network (SDN) trends are demanding fast host-based packet handling.
+In a virtualization environment,
+the DPDK VF PMD driver delivers the same throughput as a non-VT native environment.
+
+With such fast packet processing in the host instance, services such as filtering, QoS and
+DPI can be offloaded onto the host fast path.
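+
+The same polling model used on physical ports applies unchanged to a VF port. Below is a minimal,
+illustrative forwarding loop in C; queue 0 and echoing packets back out of the same port are
+assumptions made for brevity, not requirements of the VF PMD:
+
+.. code-block:: c
+
+   #include <rte_ethdev.h>
+   #include <rte_mbuf.h>
+
+   #define BURST_SIZE 32
+
+   /* Poll a VF port and retransmit whatever it receives. */
+   static void
+   vf_fwd_loop(uint8_t port_id)
+   {
+       struct rte_mbuf *bufs[BURST_SIZE];
+
+       for (;;) {
+           uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
+           if (nb_rx == 0)
+               continue;
+
+           /* Filtering, QoS or DPI hooks would run here. */
+           uint16_t nb_tx = rte_eth_tx_burst(port_id, 0, bufs, nb_rx);
+
+           /* Free any packets the TX queue could not accept. */
+           while (nb_tx < nb_rx)
+               rte_pktmbuf_free(bufs[nb_tx++]);
+       }
+   }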
+
+:numref:`figure_fast_pkt_proc` shows the scenario where some VMs directly communicate externally via VFs,
+while others connect to a virtual switch and share the same uplink bandwidth.
+
+.. _figure_fast_pkt_proc:
+
+.. figure:: img/fast_pkt_proc.*
+
+   Fast Host-based Packet Processing
+
+
+SR-IOV (PF/VF) Approach for Inter-VM Communication
+--------------------------------------------------
+
+Inter-VM data communication is one of the traffic bottlenecks in virtualization platforms.
+SR-IOV device assignment helps a VM to attach the real device, taking advantage of the bridge in the NIC.
+So VF-to-VF traffic within the same physical port (VM0<->VM1) has hardware acceleration.
+However, when traffic crosses physical ports (VM0<->VM2), there is no such hardware bridge.
+In this case, the DPDK PMD PF driver provides host forwarding between such VMs.
+
+:numref:`figure_inter_vm_comms` shows an example.
+In this case, an update of the MAC address lookup tables in both the NIC and the host DPDK application is required.
+
+In the NIC, a destination MAC address that belongs to a VM behind another physical port is written to the PF-specific pool.
+When a packet comes in, its destination MAC address matches that pool and the packet is forwarded to the host DPDK PMD application.
+
+In the host DPDK application, the behavior is similar to L2 forwarding,
+that is, the packet is forwarded to the correct PF pool.
+The SR-IOV NIC switch forwards the packet to a specific VM according to the destination MAC address,
+which belongs to the destination VF on the VM.
+
+.. _figure_inter_vm_comms:
+
+.. figure:: img/inter_vm_comms.*
+
+   Inter-VM Communication
diff --git a/src/seastar/dpdk/doc/guides/nics/ixgbe.rst b/src/seastar/dpdk/doc/guides/nics/ixgbe.rst
new file mode 100644
index 00000000..696ff693
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/ixgbe.rst
@@ -0,0 +1,260 @@
+.. BSD LICENSE
+    Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in
+      the documentation and/or other materials provided with the
+      distribution.
+    * Neither the name of Intel Corporation nor the names of its
+      contributors may be used to endorse or promote products derived
+      from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+IXGBE Driver
+============
+
+Vector PMD for IXGBE
+--------------------
+
+Vector PMD uses Intel® SIMD instructions to optimize packet I/O.
+It improves the load/store bandwidth efficiency of the L1 data cache by using a wider SSE/AVX register (1).
+The wider register gives space to hold multiple packet buffers so as to reduce the instruction count when processing packets in bulk.
+
+There is no change to the PMD API. The RX/TX handlers are the only two entry points for vPMD packet I/O.
+They are transparently registered at runtime for RX/TX execution if all condition checks pass.
+
+1. To date, only an SSE version of the IXGBE vPMD is available.
+   To ensure that vPMD is in the binary code, ensure that the option CONFIG_RTE_IXGBE_INC_VECTOR=y is in the configuration file.
+
+Some constraints apply as pre-conditions for specific optimizations on bulk packet transfers.
+The following sections explain RX and TX constraints in the vPMD.
+
+RX Constraints
+~~~~~~~~~~~~~~
+
+Prerequisites and Pre-conditions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following prerequisites apply:
+
+* To enable vPMD to work for RX, bulk allocation for Rx must be allowed.
+
+Ensure that the following pre-conditions are satisfied:
+
+* rxq->rx_free_thresh >= RTE_PMD_IXGBE_RX_MAX_BURST
+
+* rxq->rx_free_thresh < rxq->nb_rx_desc
+
+* (rxq->nb_rx_desc % rxq->rx_free_thresh) == 0
+
+* rxq->nb_rx_desc < (IXGBE_MAX_RING_DESC - RTE_PMD_IXGBE_RX_MAX_BURST)
+
+These conditions are checked in the code.
+
+Scattered packets are not supported in this mode.
+If an incoming packet is greater than the maximum acceptable length of one "mbuf" data size (by default, the size is 2 KB),
+vPMD for RX is disabled.
+
+By default, IXGBE_MAX_RING_DESC is set to 4096 and RTE_PMD_IXGBE_RX_MAX_BURST is set to 32.
+
+Features not Supported by RX Vector PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some features are not supported when trying to increase the throughput in vPMD.
+They are:
+
+* IEEE1588
+
+* FDIR
+
+* Header split
+
+* RX checksum offload
+
+Other features are supported using optional MACRO configuration. They include:
+
+* HW VLAN strip
+
+* HW extended dual VLAN
+
+To guarantee these constraints, the following configuration flags in dev_conf.rxmode are checked:
+
+* hw_vlan_strip
+
+* hw_vlan_extend
+
+* hw_ip_checksum
+
+* header_split
+
+* dev_conf
+
+fdir_conf->mode will also be checked.
+
+RX Burst Size
+^^^^^^^^^^^^^
+
+As vPMD is focused on high throughput, it assumes that the RX burst size is equal to or greater than 32 per burst.
+The receive handler returns zero if nb_pkt < 32 is used as the expected packet number.
+
+TX Constraint
+~~~~~~~~~~~~~
+
+Prerequisite
+^^^^^^^^^^^^
+
+The only prerequisite is related to tx_rs_thresh.
+The tx_rs_thresh value must be greater than or equal to RTE_PMD_IXGBE_TX_MAX_BURST,
+but less than or equal to RTE_IXGBE_TX_MAX_FREE_BUF_SZ.
+Consequently, by default the tx_rs_thresh value is in the range 32 to 64.
+
+Features not Supported by TX Vector PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+TX vPMD only works when txq_flags is set to IXGBE_SIMPLE_FLAGS.
+
+This means that it does not support TX multi-segment, VLAN offload or TX csum offload.
+The following MACROs are used for these three features:
+
+* ETH_TXQ_FLAGS_NOMULTSEGS
+
+* ETH_TXQ_FLAGS_NOVLANOFFL
+
+* ETH_TXQ_FLAGS_NOXSUMSCTP
+
+* ETH_TXQ_FLAGS_NOXSUMUDP
+
+* ETH_TXQ_FLAGS_NOXSUMTCP
+
+Application Programming Interface
+---------------------------------
+
+In DPDK release v16.11 an API for ixgbe specific functions has been added to the ixgbe PMD.
+The declarations for the API functions are in the header ``rte_pmd_ixgbe.h``.
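+
+For instance, assuming the per-VF administration helpers declared in that header (such as
+``rte_pmd_ixgbe_set_vf_mac_addr()``; check the header for the exact set available in your release),
+a PF application might configure a VF as sketched below. Port 0, VF 1 and the MAC address are
+placeholder values:
+
+.. code-block:: c
+
+   #include <rte_ether.h>
+   #include <rte_pmd_ixgbe.h>
+
+   static void
+   setup_vf(void)
+   {
+       struct ether_addr mac = {
+           .addr_bytes = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55},
+       };
+
+       /* Assign a MAC address to VF 1 from the PF (port 0). */
+       rte_pmd_ixgbe_set_vf_mac_addr(0, 1, &mac);
+
+       /* Enable MAC anti-spoofing for that VF. */
+       rte_pmd_ixgbe_set_vf_mac_anti_spoof(0, 1, 1);
+   }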
+
+Sample Application Notes
+------------------------
+
+l3fwd
+~~~~~
+
+When running l3fwd with vPMD, there is one thing to note.
+In the configuration, ensure that port_conf.rxmode.hw_ip_checksum=0.
+Otherwise, by default, RX vPMD is disabled.
+
+load_balancer
+~~~~~~~~~~~~~
+
+As in the case of l3fwd, set port_conf.rxmode.hw_ip_checksum=0 to enable vPMD.
+In addition, for improved performance, use -bsz "(32,32),(64,64),(32,32)" in load_balancer to avoid using the default burst size of 144.
+
+
+Limitations or Known Issues
+---------------------------
+
+Malicious Driver Detection not Supported
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The Intel x550 series NICs support a feature called MDD (Malicious
+Driver Detection) which checks the behavior of the VF driver.
+If this feature is enabled, the VF must use the advanced context descriptor
+correctly and set the CC (Check Context) bit.
+The DPDK PF does not support MDD, but the kernel PF does, so problems can arise
+in the kernel PF + DPDK VF scenario. If a user enables MDD in the kernel PF, the
+DPDK VF will not work, because the kernel PF flags the VF as malicious even
+though it is not; the VF simply does not behave as MDD requires.
+Supporting MDD would have a significant performance impact: DPDK would have to
+check whether the advanced context descriptor should be set and set it, and it
+would have to obtain the header length from the upper layer, because parsing the
+packet itself is not acceptable. This makes MDD support too expensive.
+When using kernel PF + DPDK VF on x550, please make sure to use a kernel
+PF driver that disables MDD or can disable MDD.
+
+Some kernel drivers already disable MDD by default, while others can use
+the command ``insmod ixgbe.ko MDD=0,0`` to disable MDD. Each "0" in the
+command refers to a port. For example, if there are 6 ixgbe ports, the command
+should be changed to ``insmod ixgbe.ko MDD=0,0,0,0,0,0``.
+
+
+Statistics
+~~~~~~~~~~
+
+The statistics of the ixgbe hardware must be polled regularly in order for them to
+remain consistent. Running a DPDK application without polling the statistics will
+cause registers on the hardware to count to the maximum value, and "stick" at
+that value.
+
+In order to avoid the statistic registers ever reaching the maximum value,
+read the statistics from the hardware using ``rte_eth_stats_get()`` or
+``rte_eth_xstats_get()``.
+
+The maximum time between statistics polls that ensures consistent results can
+be calculated as follows:
+
+.. code-block:: c
+
+   max_read_interval = UINT_MAX / max_packets_per_second
+   max_read_interval = 4294967295 / 14880952
+   max_read_interval = 288.6218096127183 (seconds)
+   max_read_interval = ~4 mins 48 sec.
+
+In order to ensure valid results, it is recommended to poll every 4 minutes.
+
+MTU setting
+~~~~~~~~~~~
+
+Although the user can set the MTU separately on PF and VF ports, the ixgbe NIC
+only supports one global MTU per physical port.
+So when the user sets different MTUs on PF and VF ports in one physical port,
+the real MTU for all these PF and VF ports is the largest value set.
+This behavior is based on the kernel driver behavior.
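+
+Tying the statistics guidance above to code, a minimal polling sketch is shown below.
+The one-minute interval is an arbitrary choice that stays well inside the ~4 minute window:
+
+.. code-block:: c
+
+   #include <inttypes.h>
+   #include <stdio.h>
+   #include <unistd.h>
+
+   #include <rte_ethdev.h>
+
+   /* Poll the port counters often enough that they never saturate. */
+   static void
+   poll_stats(uint8_t port_id)
+   {
+       struct rte_eth_stats stats;
+
+       for (;;) {
+           if (rte_eth_stats_get(port_id, &stats) == 0)
+               printf("ipackets=%" PRIu64 " opackets=%" PRIu64 "\n",
+                      stats.ipackets, stats.opackets);
+           sleep(60);
+       }
+   }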
+
+
+Supported Chipsets and NICs
+---------------------------
+
+- Intel 82599EB 10 Gigabit Ethernet Controller
+- Intel 82598EB 10 Gigabit Ethernet Controller
+- Intel 82599ES 10 Gigabit Ethernet Controller
+- Intel 82599EN 10 Gigabit Ethernet Controller
+- Intel Ethernet Controller X540-AT2
+- Intel Ethernet Controller X550-BT2
+- Intel Ethernet Controller X550-AT2
+- Intel Ethernet Controller X550-AT
+- Intel Ethernet Converged Network Adapter X520-SR1
+- Intel Ethernet Converged Network Adapter X520-SR2
+- Intel Ethernet Converged Network Adapter X520-LR1
+- Intel Ethernet Converged Network Adapter X520-DA1
+- Intel Ethernet Converged Network Adapter X520-DA2
+- Intel Ethernet Converged Network Adapter X520-DA4
+- Intel Ethernet Converged Network Adapter X520-QDA1
+- Intel Ethernet Converged Network Adapter X520-T2
+- Intel 10 Gigabit AF DA Dual Port Server Adapter
+- Intel 10 Gigabit AT Server Adapter
+- Intel 10 Gigabit AT2 Server Adapter
+- Intel 10 Gigabit CX4 Dual Port Server Adapter
+- Intel 10 Gigabit XF LR Server Adapter
+- Intel 10 Gigabit XF SR Dual Port Server Adapter
+- Intel 10 Gigabit XF SR Server Adapter
+- Intel Ethernet Converged Network Adapter X540-T1
+- Intel Ethernet Converged Network Adapter X540-T2
+- Intel Ethernet Converged Network Adapter X550-T1
+- Intel Ethernet Converged Network Adapter X550-T2
diff --git a/src/seastar/dpdk/doc/guides/nics/kni.rst b/src/seastar/dpdk/doc/guides/nics/kni.rst
new file mode 100644
index 00000000..77542b56
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/kni.rst
@@ -0,0 +1,197 @@
+.. BSD LICENSE
+    Copyright(c) 2017 Intel Corporation. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in
+      the documentation and/or other materials provided with the
+      distribution.
+    * Neither the name of Intel Corporation nor the names of its
+      contributors may be used to endorse or promote products derived
+      from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+KNI Poll Mode Driver
+====================
+
+The KNI PMD is a wrapper around the :ref:`librte_kni <kni>` library.
+
+This PMD enables the use of KNI without a KNI-specific application;
+any forwarding application can use the PMD interface for KNI.
+
+Sending packets to any DPDK controlled interface or sending to the
+Linux networking stack will be transparent to the DPDK application.
+
+To create a KNI device, the ``net_kni#`` device name should be used, and this
+will create a ``kni#`` Linux virtual network interface.
+
+There is no physical device backend for the virtual KNI device.
+
+Packets sent to the KNI Linux interface will be received by the DPDK
+application, and the DPDK application may forward packets to a physical NIC
+or to a virtual device (like another KNI interface or a PCAP interface).
+
+To forward any traffic from a physical NIC to the Linux networking stack,
+an application should control a physical port, create one virtual KNI port,
+and forward between the two.
+
+Using this PMD requires the KNI kernel module to be inserted.
+
+
+Usage
+-----
+
+The EAL ``--vdev`` argument can be used to create a KNI device instance, like::
+
+   testpmd --vdev=net_kni0 --vdev=net_kni1 -- -i
+
+The above command will create the ``kni0`` and ``kni1`` Linux network interfaces;
+those interfaces can be controlled by standard Linux tools.
+
+When testpmd forwarding starts, any packets sent to the ``kni0`` interface
+are forwarded to the ``kni1`` interface and vice versa.
+
+There is no hard limit on the number of interfaces that can be created.
+
+
+Default interface configuration
+-------------------------------
+
+``librte_kni`` can create Linux network interfaces with different features;
+the feature set is controlled by a configuration struct, and the KNI PMD uses
+a fixed configuration:
+
+.. code-block:: console
+
+   Interface name: kni#
+   force bind kernel thread to a core : NO
+   mbuf size: MAX_PACKET_SZ
+
+The KNI control path is not supported with the PMD, since there is no physical
+backend device by default.
+
+
+PMD arguments
+-------------
+
+``no_request_thread``: by default the PMD creates a pthread for each KNI interface
+to handle Linux network interface control commands, like ``ifconfig kni0 up``.
+
+With the ``no_request_thread`` option, the pthread is not created and control
+commands are not handled by the PMD.
+
+The request thread is enabled by default. This argument should not be used
+most of the time, unless this PMD is used with a customized DPDK application that
+handles requests itself.
+
+Argument usage::
+
+   testpmd --vdev "net_kni0,no_request_thread=1" -- -i
+
+
+PMD log messages
+----------------
+
+If the KNI kernel module (rte_kni.ko) is not inserted, the following error log is printed::
+
+   "KNI: KNI subsystem has not been initialized. Invoke rte_kni_init() first"
+
+
+PMD testing
+-----------
+
+It is possible to test the PMD quickly using the KNI kernel module's loopback feature:
+
+* Insert the KNI kernel module with loopback support:
+
+  .. code-block:: console
+
+     insmod build/kmod/rte_kni.ko lo_mode=lo_mode_fifo_skb
+
+* Start testpmd with no physical devices but two KNI virtual devices:
+
+  .. code-block:: console
+
+     ./testpmd --vdev net_kni0 --vdev net_kni1 -- -i
+
+  .. code-block:: console
+
+     ...
+     Configuring Port 0 (socket 0)
+     KNI: pci: 00:00:00 c580:b8
+     Port 0: 1A:4A:5B:7C:A2:8C
+     Configuring Port 1 (socket 0)
+     KNI: pci: 00:00:00 600:b9
+     Port 1: AE:95:21:07:93:DD
+     Checking link statuses...
+     Port 0 Link Up - speed 10000 Mbps - full-duplex
+     Port 1 Link Up - speed 10000 Mbps - full-duplex
+     Done
+     testpmd>
+
+* Observe the Linux interfaces
+
+  .. 
code-block:: console + + $ ifconfig kni0 && ifconfig kni1 + kni0: flags=4098<BROADCAST,MULTICAST> mtu 1500 + ether ae:8e:79:8e:9b:c8 txqueuelen 1000 (Ethernet) + RX packets 0 bytes 0 (0.0 B) + RX errors 0 dropped 0 overruns 0 frame 0 + TX packets 0 bytes 0 (0.0 B) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + + kni1: flags=4098<BROADCAST,MULTICAST> mtu 1500 + ether 9e:76:43:53:3e:9b txqueuelen 1000 (Ethernet) + RX packets 0 bytes 0 (0.0 B) + RX errors 0 dropped 0 overruns 0 frame 0 + TX packets 0 bytes 0 (0.0 B) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + + +* Start forwarding with tx_first: + + .. code-block:: console + + testpmd> start tx_first + +* Quit and check forwarding stats: + + .. code-block:: console + + testpmd> quit + Telling cores to stop... + Waiting for lcores to finish... + + ---------------------- Forward statistics for port 0 ---------------------- + RX-packets: 35637905 RX-dropped: 0 RX-total: 35637905 + TX-packets: 35637947 TX-dropped: 0 TX-total: 35637947 + ---------------------------------------------------------------------------- + + ---------------------- Forward statistics for port 1 ---------------------- + RX-packets: 35637915 RX-dropped: 0 RX-total: 35637915 + TX-packets: 35637937 TX-dropped: 0 TX-total: 35637937 + ---------------------------------------------------------------------------- + + +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++ + RX-packets: 71275820 RX-dropped: 0 RX-total: 71275820 + TX-packets: 71275884 TX-dropped: 0 TX-total: 71275884 + ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + diff --git a/src/seastar/dpdk/doc/guides/nics/liquidio.rst b/src/seastar/dpdk/doc/guides/nics/liquidio.rst new file mode 100644 index 00000000..f04cb16d --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/liquidio.rst @@ -0,0 +1,223 @@ +.. BSD LICENSE + Copyright(c) 2017 Cavium, Inc.. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Cavium, Inc. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER(S) OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +LiquidIO VF Poll Mode Driver +============================ + +The LiquidIO VF PMD library (librte_pmd_lio) provides poll mode driver support for +Cavium LiquidIO® II server adapter VFs. PF management and VF creation can be +done using kernel driver. + +More information can be found at `Cavium Official Website +<http://cavium.com/LiquidIO_Adapters.html>`_. + +Supported LiquidIO Adapters +----------------------------- + +- LiquidIO II CN2350 210SV/225SV +- LiquidIO II CN2360 210SV/225SV + + +Pre-Installation Configuration +------------------------------ + +The following options can be modified in the ``config`` file. +Please note that enabling debugging options may affect system performance. + +- ``CONFIG_RTE_LIBRTE_LIO_PMD`` (default ``y``) + + Toggle compilation of LiquidIO PMD. + +- ``CONFIG_RTE_LIBRTE_LIO_DEBUG_DRIVER`` (default ``n``) + + Toggle display of generic debugging messages. + +- ``CONFIG_RTE_LIBRTE_LIO_DEBUG_INIT`` (default ``n``) + + Toggle display of initialization related messages. + +- ``CONFIG_RTE_LIBRTE_LIO_DEBUG_RX`` (default ``n``) + + Toggle display of receive fast path run-time messages. + +- ``CONFIG_RTE_LIBRTE_LIO_DEBUG_TX`` (default ``n``) + + Toggle display of transmit fast path run-time messages. + +- ``CONFIG_RTE_LIBRTE_LIO_DEBUG_MBOX`` (default ``n``) + + Toggle display of mailbox messages. + +- ``CONFIG_RTE_LIBRTE_LIO_DEBUG_REGS`` (default ``n``) + + Toggle display of register reads and writes. + + +SR-IOV: Prerequisites and Sample Application Notes +-------------------------------------------------- + +This section provides instructions to configure SR-IOV with Linux OS. + +#. Verify SR-IOV and ARI capabilities are enabled on the adapter using ``lspci``: + + .. code-block:: console + + lspci -s <slot> -vvv + + Example output: + + .. code-block:: console + + [...] + Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI) + [...] + Capabilities: [178 v1] Single Root I/O Virtualization (SR-IOV) + [...] + Kernel driver in use: LiquidIO + +#. Load the kernel module: + + .. code-block:: console + + modprobe liquidio + +#. Bring up the PF ports: + + .. code-block:: console + + ifconfig p4p1 up + ifconfig p4p2 up + +#. Change PF MTU if required: + + .. code-block:: console + + ifconfig p4p1 mtu 9000 + ifconfig p4p2 mtu 9000 + +#. Create VF device(s): + + Echo number of VFs to be created into ``"sriov_numvfs"`` sysfs entry + of the parent PF. + + .. code-block:: console + + echo 1 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs + echo 1 > /sys/bus/pci/devices/0000:03:00.1/sriov_numvfs + +#. Assign VF MAC address: + + Assign MAC address to the VF using iproute2 utility. The syntax is:: + + ip link set <PF iface> vf <VF id> mac <macaddr> + + Example output: + + .. code-block:: console + + ip link set p4p1 vf 0 mac F2:A8:1B:5E:B4:66 + +#. Assign VF(s) to VM. + + The VF devices may be passed through to the guest VM using qemu or + virt-manager or virsh etc. + + Example qemu guest launch command: + + .. code-block:: console + + ./qemu-system-x86_64 -name lio-vm -machine accel=kvm \ + -cpu host -m 4096 -smp 4 \ + -drive file=<disk_file>,if=none,id=disk1,format=<type> \ + -device virtio-blk-pci,scsi=off,drive=disk1,id=virtio-disk1,bootindex=1 \ + -device vfio-pci,host=03:00.3 -device vfio-pci,host=03:08.3 + +#. Running testpmd + + Refer to the document + :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` to run + ``testpmd`` application. + + .. note:: + + Use ``igb_uio`` instead of ``vfio-pci`` in VM. + + Example output: + + .. 
code-block:: console + + [...] + EAL: PCI device 0000:03:00.3 on NUMA socket 0 + EAL: probe driver: 177d:9712 net_liovf + EAL: using IOMMU type 1 (Type 1) + PMD: net_liovf[03:00.3]INFO: DEVICE : CN23XX VF + EAL: PCI device 0000:03:08.3 on NUMA socket 0 + EAL: probe driver: 177d:9712 net_liovf + PMD: net_liovf[03:08.3]INFO: DEVICE : CN23XX VF + Interactive-mode selected + USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=171456, size=2176, socket=0 + Configuring Port 0 (socket 0) + PMD: net_liovf[03:00.3]INFO: Starting port 0 + Port 0: F2:A8:1B:5E:B4:66 + Configuring Port 1 (socket 0) + PMD: net_liovf[03:08.3]INFO: Starting port 1 + Port 1: 32:76:CC:EE:56:D7 + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 10000 Mbps - full-duplex + Done + testpmd> + + +Limitations +----------- + +VF MTU +~~~~~~ + +VF MTU is limited by PF MTU. Raise PF value before configuring VF for larger packet size. + +VLAN offload +~~~~~~~~~~~~ + +Tx VLAN insertion is not supported and consequently VLAN offload feature is +marked partial. + +Ring size +~~~~~~~~~ + +Number of descriptors for Rx/Tx ring should be in the range 128 to 512. + +CRC striping +~~~~~~~~~~~~ + +LiquidIO adapters strip ethernet FCS of every packet coming to the host +interface. So, CRC will be stripped even when the ``rxmode.hw_strip_crc`` +member is set to 0 in ``struct rte_eth_conf``. diff --git a/src/seastar/dpdk/doc/guides/nics/mlx4.rst b/src/seastar/dpdk/doc/guides/nics/mlx4.rst new file mode 100644 index 00000000..f1f26d4f --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/mlx4.rst @@ -0,0 +1,394 @@ +.. BSD LICENSE + Copyright 2012-2015 6WIND S.A. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of 6WIND S.A. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +MLX4 poll mode driver library +============================= + +The MLX4 poll mode driver library (**librte_pmd_mlx4**) implements support +for **Mellanox ConnectX-3** and **Mellanox ConnectX-3 Pro** 10/40 Gbps adapters +as well as their virtual functions (VF) in SR-IOV context. 
+ +Information and documentation about this family of adapters can be found on +the `Mellanox website <http://www.mellanox.com>`_. Help is also provided by +the `Mellanox community <http://community.mellanox.com/welcome>`_. + +There is also a `section dedicated to this poll mode driver +<http://www.mellanox.com/page/products_dyn?product_family=209&mtag=pmd_for_dpdk>`_. + +.. note:: + + Due to external dependencies, this driver is disabled by default. It must + be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX4_PMD=y`` and + recompiling DPDK. + +Implementation details +---------------------- + +Most Mellanox ConnectX-3 devices provide two ports but expose a single PCI +bus address, thus unlike most drivers, librte_pmd_mlx4 registers itself as a +PCI driver that allocates one Ethernet device per detected port. + +For this reason, one cannot white/blacklist a single port without also +white/blacklisting the others on the same device. + +Besides its dependency on libibverbs (that implies libmlx4 and associated +kernel support), librte_pmd_mlx4 relies heavily on system calls for control +operations such as querying/updating the MTU and flow control parameters. + +For security reasons and robustness, this driver only deals with virtual +memory addresses. The way resources allocations are handled by the kernel +combined with hardware specifications that allow it to handle virtual memory +addresses directly ensure that DPDK applications cannot access random +physical memory (or memory that does not belong to the current process). + +This capability allows the PMD to coexist with kernel network interfaces +which remain functional, although they stop receiving unicast packets as +long as they share the same MAC address. + +Compiling librte_pmd_mlx4 causes DPDK to be linked against libibverbs. + +Features +-------- + +- RSS, also known as RCA, is supported. In this mode the number of + configured RX queues must be a power of two. +- VLAN filtering is supported. +- Link state information is provided. +- Promiscuous mode is supported. +- All multicast mode is supported. +- Multiple MAC addresses (unicast, multicast) can be configured. +- Scattered packets are supported for TX and RX. +- Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation. +- Outer L3 (IP) TX/RX checksum offloading and validation for VXLAN frames. +- Secondary process TX is supported. + +Limitations +----------- + +- RSS hash key cannot be modified. +- RSS RETA cannot be configured +- RSS always includes L3 (IPv4/IPv6) and L4 (UDP/TCP). They cannot be + dissociated. +- Hardware counters are not implemented (they are software counters). +- Secondary process RX is not supported. + +Configuration +------------- + +Compilation options +~~~~~~~~~~~~~~~~~~~ + +These options can be modified in the ``.config`` file. + +- ``CONFIG_RTE_LIBRTE_MLX4_PMD`` (default **n**) + + Toggle compilation of librte_pmd_mlx4 itself. + +- ``CONFIG_RTE_LIBRTE_MLX4_DEBUG`` (default **n**) + + Toggle debugging code and stricter compilation flags. Enabling this option + adds additional run-time checks and debugging messages at the cost of + lower performance. + +- ``CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N`` (default **4**) + + Number of scatter/gather elements (SGEs) per work request (WR). Lowering + this number improves performance but also limits the ability to receive + scattered packets (packets that do not fit a single mbuf). The default + value is a safe tradeoff. 
+ +- ``CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE`` (default **0**) + + Amount of data to be inlined during TX operations. Improves latency but + lowers throughput. + +- ``CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE`` (default **8**) + + Maximum number of cached memory pools (MPs) per TX queue. Each MP from + which buffers are to be transmitted must be associated to memory regions + (MRs). This is a slow operation that must be cached. + + This value is always 1 for RX queues since they use a single MP. + +- ``CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS`` (default **1**) + + Toggle software counters. No counters are available if this option is + disabled since hardware counters are not supported. + +Environment variables +~~~~~~~~~~~~~~~~~~~~~ + +- ``MLX4_INLINE_RECV_SIZE`` + + A nonzero value enables inline receive for packets up to that size. May + significantly improve performance in some cases but lower it in + others. Requires careful testing. + +Run-time configuration +~~~~~~~~~~~~~~~~~~~~~~ + +- The only constraint when RSS mode is requested is to make sure the number + of RX queues is a power of two. This is a hardware requirement. + +- librte_pmd_mlx4 brings kernel network interfaces up during initialization + because it is affected by their state. Forcing them down prevents packets + reception. + +- **ethtool** operations on related kernel interfaces also affect the PMD. + +- ``port`` parameter [int] + + This parameter provides a physical port to probe and can be specified multiple + times for additional ports. All ports are probed by default if left + unspecified. + +Kernel module parameters +~~~~~~~~~~~~~~~~~~~~~~~~ + +The **mlx4_core** kernel module has several parameters that affect the +behavior and/or the performance of librte_pmd_mlx4. Some of them are described +below. + +- **num_vfs** (integer or triplet, optionally prefixed by device address + strings) + + Create the given number of VFs on the specified devices. + +- **log_num_mgm_entry_size** (integer) + + Device-managed flow steering (DMFS) is required by DPDK applications. It is + enabled by using a negative value, the last four bits of which have a + special meaning. + + - **-1**: force device-managed flow steering (DMFS). + - **-7**: configure optimized steering mode to improve performance with the + following limitation: VLAN filtering is not supported with this mode. + This is the recommended mode in case VLAN filter is not needed. + +Prerequisites +------------- + +This driver relies on external libraries and kernel drivers for resources +allocations and initialization. The following dependencies are not part of +DPDK and must be installed separately: + +- **libibverbs** + + User space verbs framework used by librte_pmd_mlx4. This library provides + a generic interface between the kernel and low-level user space drivers + such as libmlx4. + + It allows slow and privileged operations (context initialization, hardware + resources allocations) to be managed by the kernel and fast operations to + never leave user space. + +- **libmlx4** + + Low-level user space driver library for Mellanox ConnectX-3 devices, + it is automatically loaded by libibverbs. + + This library basically implements send/receive calls to the hardware + queues. + +- **Kernel modules** (mlnx-ofed-kernel) + + They provide the kernel-side verbs API and low level device drivers that + manage actual hardware initialization and resources sharing with user + space processes. 
+
+  Unlike most other PMDs, these modules must remain loaded and bound to
+  their devices:
+
+  - mlx4_core: hardware driver managing Mellanox ConnectX-3 devices.
+  - mlx4_en: Ethernet device driver that provides kernel network interfaces.
+  - mlx4_ib: InfiniBand device driver.
+  - ib_uverbs: user space driver for verbs (entry point for libibverbs).
+
+- **Firmware update**
+
+  Mellanox OFED releases include firmware updates for ConnectX-3 adapters.
+
+  Because each release provides new features, these updates must be applied to
+  match the kernel modules and libraries they come with.
+
+.. note::
+
+   Both libraries are BSD and GPL licensed. Linux kernel modules are GPL
+   licensed.
+
+Currently supported by DPDK:
+
+- Mellanox OFED **4.0-2.0.0.0**.
+- Firmware version **2.40.7000**.
+- Supported architectures: **x86_64** and **POWER8**.
+
+Getting Mellanox OFED
+~~~~~~~~~~~~~~~~~~~~~
+
+While these libraries and kernel modules are available on OpenFabrics
+Alliance's `website <https://www.openfabrics.org/>`_ and provided by package
+managers on most distributions, this PMD requires Ethernet extensions that
+may not be supported at the moment (this is a work in progress).
+
+`Mellanox OFED
+<http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers>`_
+includes the necessary support and should be used in the meantime. For DPDK,
+only libibverbs, libmlx4, mlnx-ofed-kernel packages and firmware updates are
+required from that distribution.
+
+.. note::
+
+   Several versions of Mellanox OFED are available. Installing the version
+   this DPDK release was developed and tested against is strongly
+   recommended. Please check the `prerequisites`_.
+
+Supported NICs
+--------------
+
+* Mellanox(R) ConnectX(R)-3 Pro 40G MCX354A-FCC_Ax (2*40G)
+
+Usage example
+-------------
+
+This section demonstrates how to launch **testpmd** with Mellanox ConnectX-3
+devices managed by librte_pmd_mlx4.
+
+#. Load the kernel modules:
+
+   .. code-block:: console
+
+      modprobe -a ib_uverbs mlx4_en mlx4_core mlx4_ib
+
+   Alternatively if MLNX_OFED is fully installed, the following script can
+   be run:
+
+   .. code-block:: console
+
+      /etc/init.d/openibd restart
+
+   .. note::
+
+      User space I/O kernel modules (uio and igb_uio) are not used and do
+      not have to be loaded.
+
+#. Make sure Ethernet interfaces are in working order and linked to kernel
+   verbs. Related sysfs entries should be present:
+
+   .. code-block:: console
+
+      ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5
+
+   Example output:
+
+   .. code-block:: console
+
+      eth2
+      eth3
+      eth4
+      eth5
+
+#. Optionally, retrieve their PCI bus addresses for whitelisting:
+
+   .. code-block:: console
+
+      {
+          for intf in eth2 eth3 eth4 eth5;
+          do
+              (cd "/sys/class/net/${intf}/device/" && pwd -P);
+          done;
+      } |
+      sed -n 's,.*/\(.*\),-w \1,p'
+
+   Example output:
+
+   .. code-block:: console
+
+      -w 0000:83:00.0
+      -w 0000:83:00.0
+      -w 0000:84:00.0
+      -w 0000:84:00.0
+
+   .. note::
+
+      There are only two distinct PCI bus addresses because the Mellanox
+      ConnectX-3 adapters installed on this system are dual port.
+
+#. Request huge pages:
+
+   .. code-block:: console
+
+      echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+
+#. Start testpmd with basic parameters:
+
+   .. code-block:: console
+
+      testpmd -l 8-15 -n 4 -w 0000:83:00.0 -w 0000:84:00.0 -- --rxq=2 --txq=2 -i
+
+   Example output:
+
+   .. code-block:: console
+
+      [...]
+ EAL: PCI device 0000:83:00.0 on NUMA socket 1 + EAL: probe driver: 15b3:1007 librte_pmd_mlx4 + PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false) + PMD: librte_pmd_mlx4: 2 port(s) detected + PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:b7:50 + PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:b7:51 + EAL: PCI device 0000:84:00.0 on NUMA socket 1 + EAL: probe driver: 15b3:1007 librte_pmd_mlx4 + PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_1" (VF: false) + PMD: librte_pmd_mlx4: 2 port(s) detected + PMD: librte_pmd_mlx4: port 1 MAC address is 00:02:c9:b5:ba:b0 + PMD: librte_pmd_mlx4: port 2 MAC address is 00:02:c9:b5:ba:b1 + Interactive-mode selected + Configuring Port 0 (socket 0) + PMD: librte_pmd_mlx4: 0x867d60: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx4: 0x867d60: RX queues number update: 0 -> 2 + Port 0: 00:02:C9:B5:B7:50 + Configuring Port 1 (socket 0) + PMD: librte_pmd_mlx4: 0x867da0: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx4: 0x867da0: RX queues number update: 0 -> 2 + Port 1: 00:02:C9:B5:B7:51 + Configuring Port 2 (socket 0) + PMD: librte_pmd_mlx4: 0x867de0: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx4: 0x867de0: RX queues number update: 0 -> 2 + Port 2: 00:02:C9:B5:BA:B0 + Configuring Port 3 (socket 0) + PMD: librte_pmd_mlx4: 0x867e20: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx4: 0x867e20: RX queues number update: 0 -> 2 + Port 3: 00:02:C9:B5:BA:B1 + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 40000 Mbps - full-duplex + Port 2 Link Up - speed 10000 Mbps - full-duplex + Port 3 Link Up - speed 40000 Mbps - full-duplex + Done + testpmd> diff --git a/src/seastar/dpdk/doc/guides/nics/mlx5.rst b/src/seastar/dpdk/doc/guides/nics/mlx5.rst new file mode 100644 index 00000000..da6dc278 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/mlx5.rst @@ -0,0 +1,471 @@ +.. BSD LICENSE + Copyright 2015 6WIND S.A. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of 6WIND S.A. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +MLX5 poll mode driver +===================== + +The MLX5 poll mode driver library (**librte_pmd_mlx5**) provides support +for **Mellanox ConnectX-4**, **Mellanox ConnectX-4 Lx** and **Mellanox +ConnectX-5** families of 10/25/40/50/100 Gb/s adapters as well as their +virtual functions (VF) in SR-IOV context. + +Information and documentation about these adapters can be found on the +`Mellanox website <http://www.mellanox.com>`__. Help is also provided by the +`Mellanox community <http://community.mellanox.com/welcome>`__. + +There is also a `section dedicated to this poll mode driver +<http://www.mellanox.com/page/products_dyn?product_family=209&mtag=pmd_for_dpdk>`__. + +.. note:: + + Due to external dependencies, this driver is disabled by default. It must + be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX5_PMD=y`` and + recompiling DPDK. + +Implementation details +---------------------- + +Besides its dependency on libibverbs (that implies libmlx5 and associated +kernel support), librte_pmd_mlx5 relies heavily on system calls for control +operations such as querying/updating the MTU and flow control parameters. + +For security reasons and robustness, this driver only deals with virtual +memory addresses. The way resources allocations are handled by the kernel +combined with hardware specifications that allow it to handle virtual memory +addresses directly ensure that DPDK applications cannot access random +physical memory (or memory that does not belong to the current process). + +This capability allows the PMD to coexist with kernel network interfaces +which remain functional, although they stop receiving unicast packets as +long as they share the same MAC address. + +Enabling librte_pmd_mlx5 causes DPDK applications to be linked against +libibverbs. + +Features +-------- + +- Multiple TX and RX queues. +- Support for scattered TX and RX frames. +- IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues. +- Several RSS hash keys, one for each flow type. +- Configurable RETA table. +- Support for multiple MAC addresses. +- VLAN filtering. +- RX VLAN stripping. +- TX VLAN insertion. +- RX CRC stripping configuration. +- Promiscuous mode. +- Multicast promiscuous mode. +- Hardware checksum offloads. +- Flow director (RTE_FDIR_MODE_PERFECT, RTE_FDIR_MODE_PERFECT_MAC_VLAN and + RTE_ETH_FDIR_REJECT). +- Flow API. +- Secondary process TX is supported. +- KVM and VMware ESX SR-IOV modes are supported. +- RSS hash result is supported. +- Hardware TSO. +- Hardware checksum TX offload for VXLAN and GRE. + +Limitations +----------- + +- Inner RSS for VXLAN frames is not supported yet. +- Port statistics through software counters only. +- Hardware checksum RX offloads for VXLAN inner header are not supported yet. +- Secondary process RX is not supported. + +Configuration +------------- + +Compilation options +~~~~~~~~~~~~~~~~~~~ + +These options can be modified in the ``.config`` file. + +- ``CONFIG_RTE_LIBRTE_MLX5_PMD`` (default **n**) + + Toggle compilation of librte_pmd_mlx5 itself. + +- ``CONFIG_RTE_LIBRTE_MLX5_DEBUG`` (default **n**) + + Toggle debugging code and stricter compilation flags. Enabling this option + adds additional run-time checks and debugging messages at the cost of + lower performance. + +- ``CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE`` (default **8**) + + Maximum number of cached memory pools (MPs) per TX queue. Each MP from + which buffers are to be transmitted must be associated to memory regions + (MRs). This is a slow operation that must be cached. 
+
+  This value is always 1 for RX queues since they use a single MP.
+
+Environment variables
+~~~~~~~~~~~~~~~~~~~~~
+
+- ``MLX5_PMD_ENABLE_PADDING``
+
+  Enables HW packet padding in PCI bus transactions.
+
+  When the packet size is cache aligned and CRC stripping is enabled, 4
+  fewer bytes are written to the PCI bus. Enabling padding makes such
+  packets aligned again.
+
+  In cases where PCI bandwidth is the bottleneck, padding can improve
+  performance by 10%.
+
+  This is disabled by default since it can also decrease performance for
+  unaligned packet sizes.
+
+Run-time configuration
+~~~~~~~~~~~~~~~~~~~~~~
+
+- librte_pmd_mlx5 brings kernel network interfaces up during initialization
+  because it is affected by their state. Forcing them down prevents packet
+  reception.
+
+- **ethtool** operations on related kernel interfaces also affect the PMD.
+
+- ``rxq_cqe_comp_en`` parameter [int]
+
+  A nonzero value enables CQE compression on the RX side. This feature
+  saves PCI bandwidth and improves performance at the cost of slightly
+  higher CPU usage. Enabled by default.
+
+  Supported on:
+
+  - x86_64 with ConnectX-4 and ConnectX-4 Lx
+  - Power8 with ConnectX-4 Lx
+
+- ``txq_inline`` parameter [int]
+
+  Amount of data to be inlined during TX operations. Improves latency.
+  Can improve PPS performance when PCI back pressure is detected and may be
+  useful for scenarios involving heavy traffic on many queues.
+
+  It is not enabled by default (set to 0) since the additional software
+  logic necessary to handle this mode can lower performance when back
+  pressure is not expected.
+
+- ``txqs_min_inline`` parameter [int]
+
+  Enable inline send only when the number of TX queues is greater than or
+  equal to this value.
+
+  This option should be used in combination with ``txq_inline`` above.
+
+- ``txq_mpw_en`` parameter [int]
+
+  A nonzero value enables multi-packet send (MPS) for ConnectX-4 Lx and
+  enhanced multi-packet send (Enhanced MPS) for ConnectX-5. MPS allows the
+  TX burst function to pack multiple packets into a single descriptor
+  session in order to save PCI bandwidth and improve performance at the
+  cost of slightly higher CPU usage. When ``txq_inline`` is set along with
+  ``txq_mpw_en``, the TX burst function copies the entire packet data into
+  the TX descriptor instead of including only a pointer to the packet,
+  provided there is enough room remaining in the descriptor. ``txq_inline``
+  sets the per-descriptor space available for either pointers or inlined
+  packets. In addition, Enhanced MPS supports a hybrid mode, mixing inlined
+  packets and pointers in the same descriptor.
+
+  This option cannot be used in conjunction with ``tso`` below. When ``tso``
+  is set, ``txq_mpw_en`` is disabled.
+
+  It is currently only supported on the ConnectX-4 Lx and ConnectX-5
+  families of adapters. Enabled by default.
+
+- ``txq_mpw_hdr_dseg_en`` parameter [int]
+
+  A nonzero value enables including two pointers in the first block of the
+  TX descriptor. This can be used to lessen the CPU load for memory copies.
+
+  Effective only when Enhanced MPS is supported. Disabled by default.
+
+- ``txq_max_inline_len`` parameter [int]
+
+  Maximum size of a packet to be inlined. If a packet is larger than the
+  configured value, it is not inlined even if there is enough space
+  remaining in the descriptor; the packet is referenced by a pointer
+  instead.
+
+  Effective only when Enhanced MPS is supported. The default value is 256.
+
+- ``tso`` parameter [int]
+
+  A nonzero value enables hardware TSO.
+  When hardware TSO is enabled, packets marked with TCP segmentation
+  offload will be divided into segments by the hardware.
+
+  Disabled by default.
+
+Prerequisites
+-------------
+
+This driver relies on external libraries and kernel drivers for resource
+allocation and initialization. The following dependencies are not part of
+DPDK and must be installed separately:
+
+- **libibverbs**
+
+  User space Verbs framework used by librte_pmd_mlx5. This library provides
+  a generic interface between the kernel and low-level user space drivers
+  such as libmlx5.
+
+  It allows slow and privileged operations (context initialization, hardware
+  resource allocation) to be managed by the kernel and fast operations to
+  never leave user space.
+
+- **libmlx5**
+
+  Low-level user space driver library for Mellanox ConnectX-4/ConnectX-5
+  devices; it is automatically loaded by libibverbs.
+
+  This library basically implements send/receive calls to the hardware
+  queues.
+
+- **Kernel modules** (mlnx-ofed-kernel)
+
+  They provide the kernel-side Verbs API and low-level device drivers that
+  manage actual hardware initialization and resource sharing with user
+  space processes.
+
+  Unlike most other PMDs, these modules must remain loaded and bound to
+  their devices:
+
+  - mlx5_core: hardware driver managing Mellanox ConnectX-4/ConnectX-5
+    devices and related Ethernet kernel network devices.
+  - mlx5_ib: InfiniBand device driver.
+  - ib_uverbs: user space driver for Verbs (entry point for libibverbs).
+
+- **Firmware update**
+
+  Mellanox OFED releases include firmware updates for ConnectX-4/ConnectX-5
+  adapters.
+
+  Because each release provides new features, these updates must be applied
+  to match the kernel modules and libraries they come with.
+
+.. note::
+
+   Both libraries are BSD and GPL licensed. Linux kernel modules are GPL
+   licensed.
+
+Currently supported by DPDK:
+
+- Mellanox OFED version: **4.0-2.0.0.0**
+- firmware version:
+
+  - ConnectX-4: **12.18.2000**
+  - ConnectX-4 Lx: **14.18.2000**
+  - ConnectX-5: **16.19.1200**
+  - ConnectX-5 Ex: **16.19.1200**
+
+Getting Mellanox OFED
+~~~~~~~~~~~~~~~~~~~~~
+
+While these libraries and kernel modules are available on OpenFabrics
+Alliance's `website <https://www.openfabrics.org/>`__ and provided by package
+managers on most distributions, this PMD requires Ethernet extensions that
+may not be supported at the moment (this is a work in progress).
+
+`Mellanox OFED
+<http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux>`__
+includes the necessary support and should be used in the meantime. For DPDK,
+only libibverbs, libmlx5, mlnx-ofed-kernel packages and firmware updates are
+required from that distribution.
+
+.. note::
+
+   Several versions of Mellanox OFED are available. Installing the version
+   this DPDK release was developed and tested against is strongly
+   recommended. Please check the `prerequisites`_.
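+
+As a quick check, the installed Mellanox OFED and adapter firmware versions
+can be compared against the list above. A minimal sketch, assuming the
+MLNX_OFED utilities and libibverbs are installed and in the ``PATH``:
+
+.. code-block:: console
+
+   ofed_info -s               # prints the installed Mellanox OFED version
+   ibv_devinfo | grep fw_ver  # prints the firmware version of each adapter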
+
+Supported NICs
+--------------
+
+* Mellanox(R) ConnectX(R)-4 10G MCX4111A-XCAT (1x10G)
+* Mellanox(R) ConnectX(R)-4 10G MCX4121A-XCAT (2x10G)
+* Mellanox(R) ConnectX(R)-4 25G MCX4111A-ACAT (1x25G)
+* Mellanox(R) ConnectX(R)-4 25G MCX4121A-ACAT (2x25G)
+* Mellanox(R) ConnectX(R)-4 40G MCX4131A-BCAT (1x40G)
+* Mellanox(R) ConnectX(R)-4 40G MCX413A-BCAT (1x40G)
+* Mellanox(R) ConnectX(R)-4 40G MCX415A-BCAT (1x40G)
+* Mellanox(R) ConnectX(R)-4 50G MCX4131A-GCAT (1x50G)
+* Mellanox(R) ConnectX(R)-4 50G MCX413A-GCAT (1x50G)
+* Mellanox(R) ConnectX(R)-4 50G MCX414A-BCAT (2x50G)
+* Mellanox(R) ConnectX(R)-4 50G MCX415A-GCAT (1x50G)
+* Mellanox(R) ConnectX(R)-4 50G MCX416A-BCAT (2x50G)
+* Mellanox(R) ConnectX(R)-4 50G MCX416A-GCAT (2x50G)
+* Mellanox(R) ConnectX(R)-4 100G MCX415A-CCAT (1x100G)
+* Mellanox(R) ConnectX(R)-4 100G MCX416A-CCAT (2x100G)
+* Mellanox(R) ConnectX(R)-4 Lx 10G MCX4121A-XCAT (2x10G)
+* Mellanox(R) ConnectX(R)-4 Lx 25G MCX4121A-ACAT (2x25G)
+* Mellanox(R) ConnectX(R)-5 100G MCX556A-ECAT (2x100G)
+* Mellanox(R) ConnectX(R)-5 Ex EN 100G MCX516A-CDAT (2x100G)
+
+Notes for testpmd
+-----------------
+
+Compared to librte_pmd_mlx4, which implements a single RSS configuration per
+port, librte_pmd_mlx5 supports per-protocol RSS configuration.
+
+Since ``testpmd`` defaults to IP RSS mode and there is currently no
+command-line parameter to enable additional protocols (UDP and TCP as well
+as IP), the following commands must be entered from its CLI to get the same
+behavior as librte_pmd_mlx4:
+
+.. code-block:: console
+
+   > port stop all
+   > port config all rss all
+   > port start all
+
+Usage example
+-------------
+
+This section demonstrates how to launch **testpmd** with Mellanox
+ConnectX-4/ConnectX-5 devices managed by librte_pmd_mlx5.
+
+#. Load the kernel modules:
+
+   .. code-block:: console
+
+      modprobe -a ib_uverbs mlx5_core mlx5_ib
+
+   Alternatively, if MLNX_OFED is fully installed, the following script can
+   be run:
+
+   .. code-block:: console
+
+      /etc/init.d/openibd restart
+
+   .. note::
+
+      User space I/O kernel modules (uio and igb_uio) are not used and do
+      not have to be loaded.
+
+#. Make sure Ethernet interfaces are in working order and linked to kernel
+   verbs. Related sysfs entries should be present:
+
+   .. code-block:: console
+
+      ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5
+
+   Example output:
+
+   .. code-block:: console
+
+      eth30
+      eth31
+      eth32
+      eth33
+
+#. Optionally, retrieve their PCI bus addresses for whitelisting:
+
+   .. code-block:: console
+
+      {
+          for intf in eth30 eth31 eth32 eth33;
+          do
+              (cd "/sys/class/net/${intf}/device/" && pwd -P);
+          done;
+      } |
+      sed -n 's,.*/\(.*\),-w \1,p'
+
+   Example output:
+
+   .. code-block:: console
+
+      -w 0000:05:00.1
+      -w 0000:06:00.0
+      -w 0000:06:00.1
+      -w 0000:05:00.0
+
+#. Request huge pages:
+
+   .. code-block:: console
+
+      echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+
+#. Start testpmd with basic parameters:
+
+   .. code-block:: console
+
+      testpmd -l 8-15 -n 4 -w 05:00.0 -w 05:00.1 -w 06:00.0 -w 06:00.1 -- --rxq=2 --txq=2 -i
+
+   Example output:
+
+   .. code-block:: console
+
+      [...]
+ EAL: PCI device 0000:05:00.0 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_0" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fe + EAL: PCI device 0000:05:00.1 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_1" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:ff + EAL: PCI device 0000:06:00.0 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_2" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fa + EAL: PCI device 0000:06:00.1 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_3" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fb + Interactive-mode selected + Configuring Port 0 (socket 0) + PMD: librte_pmd_mlx5: 0x8cba80: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8cba80: RX queues number update: 0 -> 2 + Port 0: E4:1D:2D:E7:0C:FE + Configuring Port 1 (socket 0) + PMD: librte_pmd_mlx5: 0x8ccac8: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8ccac8: RX queues number update: 0 -> 2 + Port 1: E4:1D:2D:E7:0C:FF + Configuring Port 2 (socket 0) + PMD: librte_pmd_mlx5: 0x8cdb10: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8cdb10: RX queues number update: 0 -> 2 + Port 2: E4:1D:2D:E7:0C:FA + Configuring Port 3 (socket 0) + PMD: librte_pmd_mlx5: 0x8ceb58: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8ceb58: RX queues number update: 0 -> 2 + Port 3: E4:1D:2D:E7:0C:FB + Checking link statuses... + Port 0 Link Up - speed 40000 Mbps - full-duplex + Port 1 Link Up - speed 40000 Mbps - full-duplex + Port 2 Link Up - speed 10000 Mbps - full-duplex + Port 3 Link Up - speed 10000 Mbps - full-duplex + Done + testpmd> diff --git a/src/seastar/dpdk/doc/guides/nics/nfp.rst b/src/seastar/dpdk/doc/guides/nics/nfp.rst new file mode 100644 index 00000000..c732fb1f --- /dev/null +++ b/src/seastar/dpdk/doc/guides/nics/nfp.rst @@ -0,0 +1,123 @@ +.. BSD LICENSE + Copyright(c) 2015 Netronome Systems, Inc. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+NFP poll mode driver library
+============================
+
+Netronome's sixth generation of flow processors pack 216 programmable
+cores and over 100 hardware accelerators that uniquely combine packet,
+flow, security and content processing in a single device that scales
+up to 400 Gbps.
+
+This document explains how to use DPDK with the Netronome Poll Mode
+Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
+(NFP-6xxx).
+
+Currently the driver supports virtual functions (VFs) only.
+
+Dependencies
+------------
+
+Before using Netronome's DPDK PMD, some NFP-6xxx configuration, which is
+not related to DPDK, is required. The system requires installation of
+**Netronome's BSP (Board Support Package)**, which includes Linux drivers,
+programs and libraries.
+
+If you have an NFP-6xxx device you should already have the code and
+documentation for doing this configuration. Contact
+**support@netronome.com** to obtain the latest available firmware.
+
+The NFP Linux kernel drivers (including the required PF driver for the
+NFP) are available on Github at
+**https://github.com/Netronome/nfp-drv-kmods** along with build
+instructions.
+
+DPDK runs in userspace and PMDs use the Linux kernel UIO interface to
+allow access to physical devices from userspace. The NFP PMD requires
+the **igb_uio** UIO driver, available with DPDK, to perform correct
+initialization.
+
+Building the software
+---------------------
+
+Netronome's PMD code is provided in the **drivers/net/nfp** directory.
+Although the NFP PMD has Netronome's BSP dependencies, it is possible to
+compile it along with other DPDK PMDs even if no BSP was installed
+beforehand. Of course, a DPDK application will require such a BSP to be
+installed in order to use the NFP PMD.
+
+The default PMD configuration is in the **common_linuxapp** configuration
+file:
+
+- **CONFIG_RTE_LIBRTE_NFP_PMD=y**
+
+Once DPDK is built, all the DPDK applications and examples include support
+for the NFP PMD.
+
+
+Driver compilation and testing
+------------------------------
+
+Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
+for details.
+
+
+System configuration
+--------------------
+
+#. **Enable SR-IOV on the NFP-6xxx device:** The current NFP PMD works with
+   Virtual Functions (VFs) on an NFP device. Make sure that one of the
+   Physical Function (PF) drivers from the above Github repository is
+   installed and loaded.
+
+   Virtual Functions need to be enabled before they can be used with the
+   PMD. Before enabling the VFs it is useful to obtain information about
+   the current NFP PCI device detected by the system:
+
+   .. code-block:: console
+
+      lspci -d19ee:
+
+   Now, for example, configure two virtual functions on an NFP-6xxx device
+   whose PCI system identity is "0000:03:00.0":
+
+   .. code-block:: console
+
+      echo 2 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs
+
+   The result of this command may be shown using lspci again:
+
+   .. code-block:: console
+
+      lspci -d19ee: -k
+
+   Two new PCI devices should appear in the output of the above command.
+   The ``-k`` option shows the device driver, if any, that the devices are
+   bound to. Depending on the modules loaded at this point, the new PCI
+   devices may be bound to the nfp_netvf driver.
diff --git a/src/seastar/dpdk/doc/guides/nics/overview.rst b/src/seastar/dpdk/doc/guides/nics/overview.rst
new file mode 100644
index 00000000..757a3c90
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/overview.rst
@@ -0,0 +1,58 @@
+.. BSD LICENSE
+   Copyright 2016 6WIND S.A.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+
+   * Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+   * Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in
+     the documentation and/or other materials provided with the
+     distribution.
+   * Neither the name of 6WIND S.A. nor the names of its
+     contributors may be used to endorse or promote products derived
+     from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Overview of Networking Drivers
+==============================
+
+The networking drivers may be classified into two categories:
+
+- physical for real devices
+- virtual for emulated devices
+
+Some physical devices may be exposed through a virtual layer, as with
+SR-IOV. The interface seen in the virtual environment is then a VF
+(Virtual Function).
+
+The ethdev layer exposes an API to use the networking functions of these
+devices. The bottom half of ethdev is implemented by the drivers; thus,
+some features may not be implemented.
+
+There are more differences between drivers regarding some internal
+properties, portability or even documentation availability. Most of these
+differences are summarized below.
+
+.. _table_net_pmd_features:
+
+.. include:: overview_table.txt
+
+.. Note::
+
+   Features marked with "P" are partially supported. Refer to the
+   appropriate NIC guide in the following sections for details.
diff --git a/src/seastar/dpdk/doc/guides/nics/pcap_ring.rst b/src/seastar/dpdk/doc/guides/nics/pcap_ring.rst
new file mode 100644
index 00000000..5e4f5f60
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/pcap_ring.rst
@@ -0,0 +1,282 @@
+.. BSD LICENSE
+   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+   All rights reserved.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+
+   * Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+   * Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in
+     the documentation and/or other materials provided with the
+     distribution.
+   * Neither the name of Intel Corporation nor the names of its
+     contributors may be used to endorse or promote products derived
+     from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Libpcap and Ring Based Poll Mode Drivers
+========================================
+
+In addition to Poll Mode Drivers (PMDs) for physical and virtual hardware,
+the DPDK also includes two pure-software PMDs. These two drivers are:
+
+*  A libpcap-based PMD (librte_pmd_pcap) that reads and writes packets
+   using libpcap, both from files on disk and from physical NIC devices
+   using standard Linux kernel drivers.
+
+*  A ring-based PMD (librte_pmd_ring) that allows a set of software FIFOs
+   (that is, rte_ring) to be accessed using the PMD APIs, as though they
+   were physical NICs.
+
+.. note::
+
+   The libpcap-based PMD is disabled by default in the build configuration
+   files, owing to an external dependency on the libpcap development files
+   which must be installed on the board. Once the libpcap development files
+   are installed, the library can be enabled by setting
+   CONFIG_RTE_LIBRTE_PMD_PCAP=y and recompiling the DPDK.
+
+Using the Drivers from the EAL Command Line
+-------------------------------------------
+
+For ease of use, the DPDK EAL has also been extended to allow
+pseudo-Ethernet devices, using one or more of these drivers, to be created
+at application startup time during EAL initialization.
+
+To do so, the --vdev= parameter must be passed to the EAL. It takes options
+that allow ring and pcap-based Ethernet devices to be allocated and used
+transparently by the application. This can be used, for example, for
+testing on a virtual machine where there are no Ethernet ports.
+
+Libpcap-based PMD
+~~~~~~~~~~~~~~~~~
+
+Pcap-based devices can be created using the virtual device --vdev option.
+The device name must start with the net_pcap prefix followed by numbers or
+letters, and must be unique for each device. Each device can have multiple
+stream options, and multiple devices can be used. Multiple device
+definitions can be arranged using multiple --vdev options. Device name and
+stream options must be separated by commas as shown below:
+
+.. code-block:: console
+
+   $RTE_TARGET/app/testpmd -l 0-3 -n 4 \
+       --vdev 'net_pcap0,stream_opt0=..,stream_opt1=..' \
+       --vdev='net_pcap1,stream_opt0=..'
+
+Device Streams
+^^^^^^^^^^^^^^
+
+Stream definitions can be specified and combined in multiple ways, as long
+as the following two rules are respected:
+
+* A device is provided with two different streams - reception and
+  transmission.
+
+* A device is provided with one network interface name used for reading
+  and writing packets.
+
+The different stream types are:
+
+* rx_pcap: Defines a reception stream based on a pcap file.
+  The driver reads each packet within the given pcap file as if it was
+  receiving it from the wire. The value is a path to a valid pcap file.
+
+      rx_pcap=/path/to/file.pcap
+
+* tx_pcap: Defines a transmission stream based on a pcap file.
+  The driver writes each received packet to the given pcap file.
+  The value is a path to a pcap file. The file is overwritten if it
+  already exists and it is created if it does not.
+
+      tx_pcap=/path/to/file.pcap
+
+* rx_iface: Defines a reception stream based on a network interface name.
+  The driver reads packets coming from the given interface using the Linux
+  kernel driver for that interface. The value is an interface name.
+
+      rx_iface=eth0
+
+* tx_iface: Defines a transmission stream based on a network interface
+  name. The driver sends packets to the given interface using the Linux
+  kernel driver for that interface. The value is an interface name.
+
+      tx_iface=eth0
+
+* iface: Defines a device mapping a network interface.
+  The driver both reads and writes packets from and to the given interface.
+  The value is an interface name.
+
+      iface=eth0
+
+Examples of Usage
+^^^^^^^^^^^^^^^^^
+
+Read packets from one pcap file and write them to another:
+
+.. code-block:: console
+
+   $RTE_TARGET/app/testpmd -l 0-3 -n 4 \
+       --vdev 'net_pcap0,rx_pcap=file_rx.pcap,tx_pcap=file_tx.pcap' \
+       -- --port-topology=chained
+
+Read packets from a network interface and write them to a pcap file:
+
+.. code-block:: console
+
+   $RTE_TARGET/app/testpmd -l 0-3 -n 4 \
+       --vdev 'net_pcap0,rx_iface=eth0,tx_pcap=file_tx.pcap' \
+       -- --port-topology=chained
+
+Read packets from a pcap file and write them to a network interface:
+
+.. code-block:: console
+
+   $RTE_TARGET/app/testpmd -l 0-3 -n 4 \
+       --vdev 'net_pcap0,rx_pcap=file_rx.pcap,tx_iface=eth1' \
+       -- --port-topology=chained
+
+Forward packets through two network interfaces:
+
+.. code-block:: console
+
+   $RTE_TARGET/app/testpmd -l 0-3 -n 4 \
+       --vdev 'net_pcap0,iface=eth0' --vdev='net_pcap1,iface=eth1'
+
+Using libpcap-based PMD with the testpmd Application
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+One of the first things that testpmd does before starting to forward
+packets is to flush the RX streams by reading the first 512 packets on
+every RX stream and discarding them. When using a libpcap-based PMD this
+behavior can be turned off using the following command line option:
+
+.. code-block:: console
+
+   --no-flush-rx
+
+It is also available at the runtime command line:
+
+.. code-block:: console
+
+   set flush_rx on/off
+
+This is useful when rx_pcap is being used and no packets are meant to be
+discarded; otherwise, the first 512 packets from the input pcap file would
+be discarded by the RX flushing operation.
+
+.. code-block:: console
+
+   $RTE_TARGET/app/testpmd -l 0-3 -n 4 \
+       --vdev 'net_pcap0,rx_pcap=file_rx.pcap,tx_pcap=file_tx.pcap' \
+       -- --port-topology=chained --no-flush-rx
+
+
+Rings-based PMD
+~~~~~~~~~~~~~~~
+
+To run a DPDK application on a machine without any Ethernet devices, a pair
+of ring-based rte_ethdevs can be used as below. The device names passed to
+the --vdev option must start with net_ring and take no additional
+parameters. Multiple devices may be specified, separated by commas.
+
+.. code-block:: console
+
+   ./testpmd -l 1-3 -n 4 --vdev=net_ring0 --vdev=net_ring1 -- -i
+   EAL: Detected lcore 1 as core 1 on socket 0
+   ...
+
+   Interactive-mode selected
+   Configuring Port 0 (socket 0)
+   Configuring Port 1 (socket 0)
+   Checking link statuses...
+   Port 0 Link Up - speed 10000 Mbps - full-duplex
+   Port 1 Link Up - speed 10000 Mbps - full-duplex
+   Done
+
+   testpmd> start tx_first
+   io packet forwarding - CRC stripping disabled - packets/burst=16
+   nb forwarding cores=1 - nb forwarding ports=2
+   RX queues=1 - RX desc=128 - RX free threshold=0
+   RX threshold registers: pthresh=8 hthresh=8 wthresh=4
+   TX queues=1 - TX desc=512 - TX free threshold=0
+   TX threshold registers: pthresh=36 hthresh=0 wthresh=0
+   TX RS bit threshold=0 - TXQ flags=0x0
+
+   testpmd> stop
+   Telling cores to stop...
+   Waiting for lcores to finish...
+
+.. image:: img/forward_stats.*
+
+.. code-block:: console
+
+   +++++++++++++++ Accumulated forward statistics for allports++++++++++
+   RX-packets: 462384736 RX-dropped: 0 RX-total: 462384736
+   TX-packets: 462384768 TX-dropped: 0 TX-total: 462384768
+   +++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+   Done.
+
+
+Using the Poll Mode Driver from an Application
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Both drivers provide similar APIs that allow PMD instances, that is,
+rte_ethdev structures, to be created at run-time in the end-application,
+for example, using the rte_eth_from_rings() or rte_eth_from_pcaps() APIs.
+For the rings-based PMD, this functionality could be used, for example, to
+allow data exchange between cores using rings to be done in exactly the
+same way as sending or receiving packets from an Ethernet device.
+For the libpcap-based PMD, it allows an application to open one or more
+pcap files and use these as a source of packet input to the application.
+
+Usage Examples
+^^^^^^^^^^^^^^
+
+To create two pseudo-Ethernet ports where all traffic sent to a port is
+looped back for reception on the same port (error handling omitted for
+clarity):
+
+.. code-block:: c
+
+   #define RING_SIZE 256
+   #define NUM_RINGS 2
+   #define SOCKET0 0
+
+   struct rte_ring *ring[NUM_RINGS];
+   int port0, port1;
+
+   /* Create two single-producer, single-consumer rings on socket 0. */
+   ring[0] = rte_ring_create("R0", RING_SIZE, SOCKET0, RING_F_SP_ENQ|RING_F_SC_DEQ);
+   ring[1] = rte_ring_create("R1", RING_SIZE, SOCKET0, RING_F_SP_ENQ|RING_F_SC_DEQ);
+
+   /* Create two ethdevs, each using both rings for RX and TX, so packets
+    * transmitted on a port are received back on the same port. */
+   port0 = rte_eth_from_rings("net_ring0", ring, NUM_RINGS, ring, NUM_RINGS, SOCKET0);
+   port1 = rte_eth_from_rings("net_ring1", ring, NUM_RINGS, ring, NUM_RINGS, SOCKET0);
+
+
+To create two pseudo-Ethernet ports where the traffic is switched between
+them, that is, traffic sent to port 0 is read back from port 1 and
+vice-versa, the final two lines could be changed as below:
+
+.. code-block:: c
+
+   port0 = rte_eth_from_rings("net_ring0", &ring[0], 1, &ring[1], 1, SOCKET0);
+   port1 = rte_eth_from_rings("net_ring1", &ring[1], 1, &ring[0], 1, SOCKET0);
+
+This type of configuration could be useful in a pipeline model, for example,
+where one may want to have inter-core communication using pseudo Ethernet
+devices rather than raw rings, for reasons of API consistency.
+
+Enqueuing and dequeuing items from an rte_ring using the rings-based PMD may
+be slower than using the native rings API. This is because DPDK Ethernet
+drivers make use of function pointers to call the appropriate enqueue or
+dequeue functions, while the rte_ring specific functions are direct function
+calls in the code and are often inlined by the compiler.
+
+Once an ethdev has been created, for either a ring or a pcap-based PMD,
+it should be configured and started in the same way as a regular Ethernet
+device, that is, by calling rte_eth_dev_configure() to set the number of
+receive and transmit queues, then calling rte_eth_rx_queue_setup() /
+tx_queue_setup() for each of those queues and finally calling
+rte_eth_dev_start() to allow transmission and reception of packets to
+begin.
diff --git a/src/seastar/dpdk/doc/guides/nics/qede.rst b/src/seastar/dpdk/doc/guides/nics/qede.rst
new file mode 100644
index 00000000..afe2df89
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/qede.rst
@@ -0,0 +1,270 @@
+.. BSD LICENSE
+   Copyright (c) 2016 QLogic Corporation
+   All rights reserved.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+
+   * Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+   * Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in
+     the documentation and/or other materials provided with the
+     distribution.
+   * Neither the name of QLogic Corporation nor the names of its
+     contributors may be used to endorse or promote products derived
+     from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+QEDE Poll Mode Driver
+======================
+
+The QEDE poll mode driver library (**librte_pmd_qede**) implements support
+for the **QLogic FastLinQ QL4xxxx 10G/25G/40G/50G/100G CNA** family of
+adapters as well as their virtual functions (VF) in SR-IOV context. It is
+supported on several standard Linux distros like RHEL7.x, SLES12.x and
+Ubuntu. It is compile-tested under FreeBSD OS.
+
+More information can be found at `QLogic Corporation's Website
+<http://www.qlogic.com>`_.
+
+Supported Features
+------------------
+
+- Unicast/Multicast filtering
+- Promiscuous mode
+- Allmulti mode
+- Port hardware statistics
+- Jumbo frames
+- VLAN offload - Filtering and stripping
+- Stateless checksum offloads (IPv4/TCP/UDP)
+- Multiple Rx/Tx queues
+- RSS (with RETA/hash table/key)
+- TSS
+- Multiple MAC addresses
+- Default pause flow control
+- SR-IOV VF
+- MTU change
+- Multiprocess aware
+- Scatter-Gather
+- VXLAN tunneling offload
+- N-tuple filter and flow director (limited support)
+- LRO/TSO
+
+Non-supported Features
+----------------------
+
+- SR-IOV PF
+- GENEVE and NVGRE Tunneling offloads
+- NPAR
+
+Supported QLogic Adapters
+-------------------------
+
+- QLogic FastLinQ QL4xxxx 10G/25G/40G/50G/100G CNAs.
+
+Prerequisites
+-------------
+
+- Requires firmware version **8.18.x** and management firmware
+  version **8.18.x or higher**. Firmware may be available
+  inbox in certain newer Linux distros under the standard directory,
+  e.g. ``/lib/firmware/qed/qed_init_values-8.18.9.0.bin``.
+
+- If the required firmware files are not available then visit
+  `QLogic Driver Download Center <http://driverdownloads.qlogic.com>`_.
+
+Performance note
+~~~~~~~~~~~~~~~~
+
+- For better performance, it is recommended to use 4K or higher RX/TX rings.
+
+Config File Options
+~~~~~~~~~~~~~~~~~~~
+
+The following options can be modified in the ``.config`` file. Please note
+that enabling debugging options may affect system performance.
+
+- ``CONFIG_RTE_LIBRTE_QEDE_PMD`` (default **y**)
+
+  Toggle compilation of the QEDE PMD.
+
+- ``CONFIG_RTE_LIBRTE_QEDE_DEBUG_INFO`` (default **n**)
+
+  Toggle display of generic debugging messages.
+
+- ``CONFIG_RTE_LIBRTE_QEDE_DEBUG_DRIVER`` (default **n**)
+
+  Toggle display of ecore related messages.
+
+- ``CONFIG_RTE_LIBRTE_QEDE_DEBUG_TX`` (default **n**)
+
+  Toggle display of transmit fast path run-time messages.
+
+- ``CONFIG_RTE_LIBRTE_QEDE_DEBUG_RX`` (default **n**)
+
+  Toggle display of receive fast path run-time messages.
+
+- ``CONFIG_RTE_LIBRTE_QEDE_FW`` (default **""**)
+
+  Gives the absolute path of the firmware file,
+  e.g. ``"/lib/firmware/qed/qed_init_values_zipped-8.18.9.0.bin"``.
+  An empty string indicates that the driver will pick up the firmware file
+  from the default location.
+
+Driver compilation and testing
+------------------------------
+
+Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
+for details.
+
+SR-IOV: Prerequisites and Sample Application Notes
+--------------------------------------------------
+
+This section provides instructions to configure SR-IOV with Linux OS.
+
+**Note**: librte_pmd_qede will be used to bind to the SR-IOV VF device, and
+the Linux native kernel driver (QEDE) will function as the SR-IOV PF driver.
+The PF driver must be version 8.10.x.x or higher.
+
+#. Verify that SR-IOV and ARI capabilities are enabled on the adapter using
+   ``lspci``:
+
+   .. code-block:: console
+
+      lspci -s <slot> -vvv
+
+   Example output:
+
+   .. code-block:: console
+
+      [...]
+      Capabilities: [1b8 v1] Alternative Routing-ID Interpretation (ARI)
+      [...]
+      Capabilities: [1c0 v1] Single Root I/O Virtualization (SR-IOV)
+      [...]
+      Kernel driver in use: igb_uio
+
+#. Load the kernel module:
+
+   .. code-block:: console
+
+      modprobe qede
+
+   Example output:
+
+   .. code-block:: console
+
+      systemd-udevd[4848]: renamed network interface eth0 to ens5f0
+      systemd-udevd[4848]: renamed network interface eth1 to ens5f1
+
+#. Bring up the PF ports:
+
+   .. code-block:: console
+
+      ifconfig ens5f0 up
+      ifconfig ens5f1 up
+
+#. Create VF device(s):
+
+   Echo the number of VFs to be created into the ``sriov_numvfs`` sysfs
+   entry of the parent PF. For example:
+
+   .. code-block:: console
+
+      echo 2 > /sys/devices/pci0000:00/0000:00:03.0/0000:81:00.0/sriov_numvfs
+
+
+#. Assign VF MAC address:
+
+   Assign a MAC address to the VF using the iproute2 utility. The syntax is::
+
+      ip link set <PF iface> vf <VF id> mac <macaddr>
+
+   For example:
+
+   .. code-block:: console
+
+      ip link set ens5f0 vf 0 mac 52:54:00:2f:9d:e8
+
+
+#. PCI Passthrough:
+
+   The VF devices may be passed through to the guest VM using
+   ``virt-manager`` or ``virsh``. The QEDE PMD should be used to bind the
+   VF devices in the guest VM, following the instructions from the Driver
+   compilation and testing section above.
+
+
+#. Running testpmd
+   (Enable QEDE_DEBUG_INFO=y to view informational messages):
+
+   Refer to the document
+   :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` to run
+   ``testpmd`` application.
+
+   Example output:
+
+   .. code-block:: console
+
+      testpmd -l 0,4-11 -n 4 -- -i --nb-cores=8 --portmask=0xf --rxd=4096 \
+          --txd=4096 --txfreet=4068 --enable-rx-cksum --rxq=4 --txq=4 \
+          --rss-ip --rss-udp
+
+      [...]
+
+      EAL: PCI device 0000:84:00.0 on NUMA socket 1
+      EAL: probe driver: 1077:1634 rte_qede_pmd
+      EAL: Not managed by a supported kernel driver, skipped
+      EAL: PCI device 0000:84:00.1 on NUMA socket 1
+      EAL: probe driver: 1077:1634 rte_qede_pmd
+      EAL: Not managed by a supported kernel driver, skipped
+      EAL: PCI device 0000:88:00.0 on NUMA socket 1
+      EAL: probe driver: 1077:1656 rte_qede_pmd
+      EAL: PCI memory mapped at 0x7f738b200000
+      EAL: PCI memory mapped at 0x7f738b280000
+      EAL: PCI memory mapped at 0x7f738b300000
+      PMD: Chip details : BB1
+      PMD: Driver version : QEDE PMD 8.7.9.0_1.0.0
+      PMD: Firmware version : 8.7.7.0
+      PMD: Management firmware version : 8.7.8.0
+      PMD: Firmware file : /lib/firmware/qed/qed_init_values_zipped-8.7.7.0.bin
+      [QEDE PMD: (84:00.0:dpdk-port-0)]qede_common_dev_init:macaddr \
+          00:0e:1e:d2:09:9c
+      [...]
+      [QEDE PMD: (84:00.0:dpdk-port-0)]qede_tx_queue_setup:txq 0 num_desc 4096 \
+          tx_free_thresh 4068 socket 0
+      [QEDE PMD: (84:00.0:dpdk-port-0)]qede_tx_queue_setup:txq 1 num_desc 4096 \
+          tx_free_thresh 4068 socket 0
+      [QEDE PMD: (84:00.0:dpdk-port-0)]qede_tx_queue_setup:txq 2 num_desc 4096 \
+          tx_free_thresh 4068 socket 0
+      [QEDE PMD: (84:00.0:dpdk-port-0)]qede_tx_queue_setup:txq 3 num_desc 4096 \
+          tx_free_thresh 4068 socket 0
+      [QEDE PMD: (84:00.0:dpdk-port-0)]qede_rx_queue_setup:rxq 0 num_desc 4096 \
+          rx_buf_size=2148 socket 0
+      [QEDE PMD: (84:00.0:dpdk-port-0)]qede_rx_queue_setup:rxq 1 num_desc 4096 \
+          rx_buf_size=2148 socket 0
+      [QEDE PMD: (84:00.0:dpdk-port-0)]qede_rx_queue_setup:rxq 2 num_desc 4096 \
+          rx_buf_size=2148 socket 0
+      [QEDE PMD: (84:00.0:dpdk-port-0)]qede_rx_queue_setup:rxq 3 num_desc 4096 \
+          rx_buf_size=2148 socket 0
+      [QEDE PMD: (84:00.0:dpdk-port-0)]qede_dev_start:port 0
+      [QEDE PMD: (84:00.0:dpdk-port-0)]qede_dev_start:link status: down
+      [...]
+      Checking link statuses...
+      Port 0 Link Up - speed 25000 Mbps - full-duplex
+      Port 1 Link Up - speed 25000 Mbps - full-duplex
+      Port 2 Link Up - speed 25000 Mbps - full-duplex
+      Port 3 Link Up - speed 25000 Mbps - full-duplex
+      Done
+      testpmd>
diff --git a/src/seastar/dpdk/doc/guides/nics/sfc_efx.rst b/src/seastar/dpdk/doc/guides/nics/sfc_efx.rst
new file mode 100644
index 00000000..5f825e9a
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/sfc_efx.rst
@@ -0,0 +1,277 @@
+.. BSD LICENSE
+   Copyright (c) 2016 Solarflare Communications Inc.
+   All rights reserved.
+
+   This software was jointly developed between OKTET Labs (under contract
+   for Solarflare) and Solarflare Communications, Inc.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are met:
+
+   1. Redistributions of source code must retain the above copyright notice,
+      this list of conditions and the following disclaimer.
+   2. Redistributions in binary form must reproduce the above copyright notice,
+      this list of conditions and the following disclaimer in the documentation
+      and/or other materials provided with the distribution.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+   AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+   THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+   PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+   OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+   WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+   OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+   EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Solarflare libefx-based Poll Mode Driver
+========================================
+
+The SFC EFX PMD (**librte_pmd_sfc_efx**) provides poll mode driver support
+for the **Solarflare SFN7xxx and SFN8xxx** families of 10/40 Gbps adapters.
+SFC EFX PMD has support for the latest Linux and FreeBSD operating systems.
+
+More information can be found at `Solarflare Communications website
+<http://solarflare.com>`_.
+
+
+Features
+--------
+
+SFC EFX PMD has support for:
+
+- Multiple transmit and receive queues
+
+- Link state information including link status change interrupt
+
+- IPv4/IPv6 TCP/UDP transmit checksum offload
+
+- Port hardware statistics
+
+- Extended statistics (see Solarflare Server Adapter User's Guide for
+  the statistics description)
+
+- Basic flow control
+
+- MTU update
+
+- Jumbo frames up to 9K
+
+- Promiscuous mode
+
+- Allmulticast mode
+
+- TCP segmentation offload (TSO)
+
+- Multicast MAC filter
+
+- IPv4/IPv6 TCP/UDP receive checksum offload
+
+- Received packet type information
+
+- Receive side scaling (RSS)
+
+- RSS hash
+
+- Scattered Rx DMA for packets that are larger than a single Rx descriptor
+
+- Deferred receive and transmit queue start
+
+- Transmit VLAN insertion (if running firmware variant supports it)
+
+- Flow API
+
+
+Non-supported Features
+----------------------
+
+The features not yet supported include:
+
+- Receive queue interrupts
+
+- Priority-based flow control
+
+- Loopback
+
+- Configurable RX CRC stripping (always stripped)
+
+- Header split on receive
+
+- VLAN filtering
+
+- VLAN stripping
+
+- LRO
+
+
+Limitations
+-----------
+
+Due to requirements on receive buffer alignment and usage of the receive
+buffer for auxiliary packet information provided by the NIC, up to 269
+extra bytes (a 14-byte prefix plus up to 255 bytes of end padding) may be
+required in the receive buffer. This should be taken into account when the
+mbuf pool for receive is created.
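+
+For illustration, a minimal sketch of sizing the receive mbuf pool with
+this overhead in mind; the pool name, element count and maximum frame
+length below are hypothetical, and ``RX_EXTRA_SPACE`` is simply the 269
+bytes described above:
+
+.. code-block:: c
+
+   #include <rte_mbuf.h>
+   #include <rte_lcore.h>
+
+   #define RX_EXTRA_SPACE 269   /* 14-byte prefix + up to 255 bytes padding */
+   #define MAX_FRAME_LEN  1518  /* hypothetical maximum frame length */
+
+   struct rte_mempool *rx_pool;
+
+   /* Reserve headroom, the frame itself, and the NIC's extra space. */
+   rx_pool = rte_pktmbuf_pool_create("sfc_rx_pool", 8192, 256, 0,
+                                     RTE_PKTMBUF_HEADROOM + MAX_FRAME_LEN +
+                                     RX_EXTRA_SPACE,
+                                     rte_socket_id());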
+
+
+Flow API support
+----------------
+
+Supported attributes:
+
+- Ingress
+
+Supported pattern items:
+
+- VOID
+
+- ETH (exact match of source/destination addresses, individual/group match
+  of destination address, EtherType)
+
+- VLAN (exact match of VID, double-tagging is supported)
+
+- IPV4 (exact match of source/destination addresses,
+  IP transport protocol)
+
+- IPV6 (exact match of source/destination addresses,
+  IP transport protocol)
+
+- TCP (exact match of source/destination ports)
+
+- UDP (exact match of source/destination ports)
+
+Supported actions:
+
+- VOID
+
+- QUEUE
+
+Validating flow rules depends on the firmware variant.
+
+Ethernet destination individual/group match
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The Ethernet item supports I/G matching when only the corresponding bit is
+set in the destination address mask. If the destination address in the spec
+is multicast, the rule matches all multicast (and broadcast) packets;
+otherwise it matches unicast packets that are not filtered by other flow
+rules.
+
+
+Supported NICs
+--------------
+
+- Solarflare Flareon [Ultra] Server Adapters:
+
+  - Solarflare SFN8522 Dual Port SFP+ Server Adapter
+
+  - Solarflare SFN8542 Dual Port QSFP+ Server Adapter
+
+  - Solarflare SFN7002F Dual Port SFP+ Server Adapter
+
+  - Solarflare SFN7004F Quad Port SFP+ Server Adapter
+
+  - Solarflare SFN7042Q Dual Port QSFP+ Server Adapter
+
+  - Solarflare SFN7122F Dual Port SFP+ Server Adapter
+
+  - Solarflare SFN7124F Quad Port SFP+ Server Adapter
+
+  - Solarflare SFN7142Q Dual Port QSFP+ Server Adapter
+
+  - Solarflare SFN7322F Precision Time Synchronization Server Adapter
+
+
+Prerequisites
+-------------
+
+- Requires firmware version:
+
+  - SFN7xxx: **4.7.1.1001** or higher
+
+  - SFN8xxx: **6.0.2.1004** or higher
+
+Visit `Solarflare Support Downloads <https://support.solarflare.com>`_ to get
+Solarflare Utilities (either Linux or FreeBSD) with the latest firmware.
+Follow instructions from Solarflare Server Adapter User's Guide to
+update firmware and configure the adapter.
+
+
+Pre-Installation Configuration
+------------------------------
+
+
+Config File Options
+~~~~~~~~~~~~~~~~~~~
+
+The following options can be modified in the ``.config`` file.
+Please note that enabling debugging options may affect system performance.
+
+- ``CONFIG_RTE_LIBRTE_SFC_EFX_PMD`` (default **y**)
+
+  Enable compilation of Solarflare libefx-based poll-mode driver.
+
+- ``CONFIG_RTE_LIBRTE_SFC_EFX_DEBUG`` (default **n**)
+
+  Enable compilation of the extra run-time consistency checks.
+
+
+Per-Device Parameters
+~~~~~~~~~~~~~~~~~~~~~
+
+The following per-device parameters can be passed via the EAL PCI device
+whitelist option, e.g. "-w 02:00.0,arg1=value1,...".
+
+Case-insensitive 1/y/yes/on or 0/n/no/off may be used to specify
+boolean parameter values.
+
+- ``rx_datapath`` [auto|efx|ef10] (default **auto**)
+
+  Choose the receive datapath implementation.
+  **auto** allows the driver itself to make a choice based on firmware
+  features available and required by the datapath implementation.
+  **efx** chooses the libefx-based datapath which supports Rx scatter.
+  **ef10** chooses the EF10 (SFN7xxx, SFN8xxx) native datapath which is
+  more efficient than the libefx-based one and provides richer packet type
+  classification, but lacks Rx scatter support.
+
+- ``tx_datapath`` [auto|efx|ef10|ef10_simple] (default **auto**)
+
+  Choose the transmit datapath implementation.
+  **auto** allows the driver itself to make a choice based on firmware
+  features available and required by the datapath implementation.
+  **efx** chooses the libefx-based datapath which supports VLAN insertion
+  (full-feature firmware variant only), TSO and multi-segment mbufs.
+  **ef10** chooses the EF10 (SFN7xxx, SFN8xxx) native datapath which is
+  more efficient than the libefx-based one but has no VLAN insertion and
+  TSO support yet.
+  **ef10_simple** chooses the EF10 (SFN7xxx, SFN8xxx) native datapath which
+  is even faster than **ef10** but does not support multi-segment mbufs.
+
+- ``perf_profile`` [auto|throughput|low-latency] (default **throughput**)
+
+  Choose hardware tuning to be optimized for either throughput or
+  low-latency.
+  **auto** allows the NIC firmware to make a choice based on
+  installed licences and the firmware variant configured using **sfboot**.
+
+- ``debug_init`` [bool] (default **n**)
+
+  Enable extra logging during device initialization and startup.
+
+- ``mcdi_logging`` [bool] (default **n**)
+
+  Enable extra logging of the communication with the NIC's management CPU.
+  The logging is done using RTE_LOG() with INFO level and PMD type.
+  The format is consumed by the Solarflare netlogdecode cross-platform tool.
+
+- ``stats_update_period_ms`` [long] (default **1000**)
+
+  Adjust the period in milliseconds at which port hardware statistics are
+  updated. The accepted range is 0 to 65535. The value of **0** may be used
+  to disable periodic statistics updates. Note that it is only possible to
+  set an arbitrary value on SFN8xxx provided that the firmware version is
+  6.2.1.1033 or higher; otherwise any positive value will select a fixed
+  update period of **1000** milliseconds.
diff --git a/src/seastar/dpdk/doc/guides/nics/szedata2.rst b/src/seastar/dpdk/doc/guides/nics/szedata2.rst
new file mode 100644
index 00000000..60080a9f
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/szedata2.rst
@@ -0,0 +1,150 @@
+.. BSD LICENSE
+   Copyright 2015 - 2016 CESNET
+   All rights reserved.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+
+   * Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+   * Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in
+     the documentation and/or other materials provided with the
+     distribution.
+   * Neither the name of CESNET nor the names of its
+     contributors may be used to endorse or promote products derived
+     from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+SZEDATA2 poll mode driver library
+=================================
+
+The SZEDATA2 poll mode driver library implements support for the Netcope
+FPGA Boards (**NFB-***), FPGA-based programmable NICs. The SZEDATA2 PMD
+uses the interface provided by the libsze2 library to communicate with the
+NFB cards over the sze2 layer.
+
+More information about the
+`NFB cards <http://www.netcope.com/en/products/fpga-boards>`_
+and used technology
+(`Netcope Development Kit <http://www.netcope.com/en/products/fpga-development-kit>`_)
+can be found on the `Netcope Technologies website <http://www.netcope.com/>`_.
+
+.. note::
+
+   This driver has external dependencies.
+   Therefore it is disabled in default configuration files.
+   It can be enabled by setting ``CONFIG_RTE_LIBRTE_PMD_SZEDATA2=y``
+   and recompiling.
+
+.. note::
+
+   Currently the driver is supported only on x86_64 architectures.
+   Only x86_64 versions of the external libraries are provided.
+
+Prerequisites
+-------------
+
+This PMD requires kernel modules which are responsible for initialization
+and allocation of the resources needed for the sze2 layer to function.
+Communication between the PMD and the kernel modules is mediated by the
+libsze2 library. These kernel modules and the library are not part of DPDK
+and must be installed separately:
+
+* **libsze2 library**
+
+  The library provides an API for the initialization of sze2 transfers,
+  and for receiving and transmitting data segments.
+
+* **Kernel modules**
+
+  * combov3
+  * szedata2_cv3
+
+  The kernel modules manage the initialization of hardware and the
+  allocation and sharing of resources for user space applications.
+
+Information about getting the dependencies can be found `here
+<http://www.netcope.com/en/company/community-support/dpdk-libsze2>`_.
+
+Configuration
+-------------
+
+These configuration options can be modified before compilation in the
+``.config`` file:
+
+* ``CONFIG_RTE_LIBRTE_PMD_SZEDATA2`` default value: **n**
+
+  Value **y** enables compilation of the szedata2 PMD.
+
+* ``CONFIG_RTE_LIBRTE_PMD_SZEDATA2_AS`` default value: **0**
+
+  This option defines the type of firmware address space.
+  The currently supported value is:
+
+  * **0** for firmwares:
+
+    * NIC_100G1_LR4
+    * HANIC_100G1_LR4
+    * HANIC_100G1_SR10
+
+Using the SZEDATA2 PMD
+----------------------
+
+From DPDK version 16.04 the type of the SZEDATA2 PMD has been PMD_PDEV.
+The SZEDATA2 device is automatically recognized during EAL initialization,
+so no special command line options are needed.
+
+Kernel modules have to be loaded before running the DPDK application.
+
+Example of usage
+----------------
+
+Read packets from receive channels 0 and 1 and write them to transmit
+channels 0 and 1:
+
+.. code-block:: console
+
+   $RTE_TARGET/app/testpmd -l 0-3 -n 2 \
+       -- --port-topology=chained --rxq=2 --txq=2 --nb-cores=2 -i -a
+
+Example output:
+
+.. code-block:: console
+
+   [...]
+   EAL: PCI device 0000:06:00.0 on NUMA socket -1
+   EAL: probe driver: 1b26:c1c1 rte_szedata2_pmd
+   PMD: Initializing szedata2 device (0000:06:00.0)
+   PMD: SZEDATA2 path: /dev/szedataII0
+   PMD: Available DMA channels RX: 8 TX: 8
+   PMD: resource0 phys_addr = 0xe8000000 len = 134217728 virt addr = 7f48f8000000
+   PMD: szedata2 device (0000:06:00.0) successfully initialized
+   Interactive-mode selected
+   Auto-start selected
+   Configuring Port 0 (socket 0)
+   Port 0: 00:11:17:00:00:00
+   Checking link statuses...
+   Port 0 Link Up - speed 10000 Mbps - full-duplex
+   Done
+   Start automatic packet forwarding
+     io packet forwarding - CRC stripping disabled - packets/burst=32
+     nb forwarding cores=2 - nb forwarding ports=1
+     RX queues=2 - RX desc=128 - RX free threshold=0
+     RX threshold registers: pthresh=0 hthresh=0 wthresh=0
+     TX queues=2 - TX desc=512 - TX free threshold=0
+     TX threshold registers: pthresh=0 hthresh=0 wthresh=0
+     TX RS bit threshold=0 - TXQ flags=0x0
+   testpmd>
diff --git a/src/seastar/dpdk/doc/guides/nics/tap.rst b/src/seastar/dpdk/doc/guides/nics/tap.rst
new file mode 100644
index 00000000..5c5ba535
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/tap.rst
@@ -0,0 +1,197 @@
+.. BSD LICENSE
+   Copyright(c) 2016 Intel Corporation. All rights reserved.
+   All rights reserved.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+
+   * Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+   * Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in
+     the documentation and/or other materials provided with the
+     distribution.
+   * Neither the name of Intel Corporation nor the names of its
+     contributors may be used to endorse or promote products derived
+     from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Tun/Tap Poll Mode Driver
+========================
+
+The ``rte_eth_tap.c`` PMD creates a device using TUN/TAP interfaces on the
+local host. The PMD allows for DPDK and the host to communicate using a raw
+device interface on the host and in the DPDK application.
+
+The device created is a TAP device, which sends/receives packets in a raw
+format with an L2 header. The usage for a TAP PMD is for connectivity to
+the local host using a TAP interface. When the TAP PMD is initialized it
+will create a number of tap devices in the host, accessible via the
+``ifconfig -a`` or ``ip`` commands. These commands can be used to configure
+and query the virtual devices.
+
+These TAP interfaces can be used with Wireshark, tcpdump or Pktgen-DPDK,
+as well as being usable as a network connection to the DPDK application.
+The method to enable one or more interfaces is to use the
+``--vdev=net_tap0`` option on the DPDK application command line. Each
+``--vdev`` option given will create an interface named dtap0, dtap1,
+and so on.
+
+The interface name can be changed by adding ``iface=foo0``, for example::
+
+   --vdev=net_tap0,iface=foo0 --vdev=net_tap1,iface=foo1, ...
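+
+As a minimal sketch (the core list and interface names below are
+hypothetical, and ``testpmd`` is assumed to be built and in the ``PATH``),
+the virtual devices can be created and then verified from the host::
+
+   testpmd -l 0-3 -n 4 --vdev=net_tap0,iface=foo0 --vdev=net_tap1,iface=foo1 -- -i
+   ip link show foo0   # run on the host to confirm the interface exists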
+
+The reported speed of the interface can also be changed from the default 10G
+to any other value; note that the interface does not enforce that speed, for
+example::
+
+   --vdev=net_tap0,iface=foo0,speed=25000
+
+It is possible to specify a remote netdevice to capture packets from by adding
+``remote=foo1``, for example::
+
+   --vdev=net_tap,iface=tap0,remote=foo1
+
+If a ``remote`` is set, the tap MAC address will be set to match the remote one
+just after netdevice creation. Using TC rules, traffic from the remote netdevice
+will be redirected to the tap. If the tap is in promiscuous mode, then all
+packets will be redirected. In allmulti mode, all multicast packets will be
+redirected.
+
+Using the remote feature is especially useful for capturing traffic from a
+netdevice that has no DPDK support. It is possible to add explicit
+rte_flow rules on the tap PMD to capture specific traffic (see the next
+section for examples).
+
+After the DPDK application is started, you can send and receive packets on the
+interface using the standard rx_burst/tx_burst APIs in DPDK. From the host
+point of view you can use any host tool like tcpdump, Wireshark, ping, Pktgen
+and others to communicate with the DPDK application. Note that the DPDK
+application will not understand network protocols like IPv4/6, UDP or TCP
+unless it has been written to handle them.
+
+If you need the interface to behave like a real network interface, i.e. up
+and with a valid IP address, you can do this with the following commands::
+
+   sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
+   sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1
+
+Please change the IP addresses as you see fit.
+
+If routing is enabled on the host, you can also communicate with the DPDK
+application over the internet via a standard socket layer application, as long
+as you account for the protocol handling in the application.
+
+If your DPDK application contains a network stack (or something like it), you
+can use that stack to handle the network protocols, and address the interface
+via an IP address assigned to the internal interface.
+
+Flow API support
+----------------
+
+The tap PMD supports major flow API pattern items and actions when running on
+Linux kernels above 4.2 (the "Flower" classifier is required). Supported
+items:
+
+- eth: src and dst (with variable masks), and eth_type (0xffff mask).
+- vlan: vid, pcp, tpid, but not eid (requires kernel 4.9).
+- ipv4/6: src and dst (with variable masks), and ip_proto (0xffff mask).
+- udp/tcp: src and dst port (0xffff mask).
+
+Supported actions:
+
+- DROP
+- QUEUE
+- PASSTHRU
+
+It is generally not possible to provide a "last" item. However, if the "last"
+item, once masked, is identical to the masked spec, then it is supported.
+
+Only IPv4/6 and MAC addresses can use a variable mask. All other items need a
+full mask (exact match).
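+
+Such rules can also be created programmatically through the generic rte_flow
+API. The following is a minimal sketch (``port_id``, the helper name and the
+MAC address are placeholders) that, like the second testpmd example below,
+steers packets from a given source MAC to queue 2:
+
+.. code-block:: c
+
+   #include <stdint.h>
+   #include <rte_ether.h>
+   #include <rte_flow.h>
+
+   /* Steer packets from one source MAC to queue 2 (hypothetical helper). */
+   static struct rte_flow *
+   tap_mac_to_queue(uint8_t port_id, struct rte_flow_error *error)
+   {
+       struct rte_flow_attr attr = { .ingress = 1, .priority = 2 };
+       /* Full source MAC match: every mask byte must be 0xff. */
+       struct rte_flow_item_eth spec = {
+           .src.addr_bytes = { 0x06, 0x05, 0x04, 0x03, 0x02, 0x01 },
+       };
+       struct rte_flow_item_eth mask = {
+           .src.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+       };
+       struct rte_flow_item pattern[] = {
+           { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &spec, .mask = &mask },
+           { .type = RTE_FLOW_ITEM_TYPE_END },
+       };
+       struct rte_flow_action_queue queue = { .index = 2 };
+       struct rte_flow_action actions[] = {
+           { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
+           { .type = RTE_FLOW_ACTION_TYPE_END },
+       };
+
+       return rte_flow_create(port_id, &attr, pattern, actions, error);
+   }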
+
+As rules are translated to TC, it is possible to show them with something
+like::
+
+   tc -s filter show dev tap1 parent 1:
+
+Examples of testpmd flow rules
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Drop packets for destination IP 192.168.0.1::
+
+   testpmd> flow create 0 priority 1 ingress pattern eth / ipv4 dst is 192.168.0.1 \
+            / end actions drop / end
+
+Ensure packets from a given MAC address are received on queue 2::
+
+   testpmd> flow create 0 priority 2 ingress pattern eth src is 06:05:04:03:02:01 \
+            / end actions queue index 2 / end
+
+Drop UDP packets in vlan 3::
+
+   testpmd> flow create 0 priority 3 ingress pattern eth / vlan vid is 3 / \
+            ipv4 proto is 17 / end actions drop / end
+
+Example
+-------
+
+The following is a simple example of using the TUN/TAP PMD with the Pktgen
+packet generator. It requires that the ``socat`` utility is installed on the
+test system.
+
+Build DPDK, then pull down Pktgen and build it using the same DPDK SDK/target
+that was used to build DPDK.
+
+Run pktgen from the pktgen directory in a terminal with a command line like
+the following::
+
+   sudo ./app/app/x86_64-native-linuxapp-gcc/app/pktgen -l 1-5 -n 4 \
+    --proc-type auto --log-level 8 --socket-mem 512,512 --file-prefix pg \
+    --vdev=net_tap0 --vdev=net_tap1 -b 05:00.0 -b 05:00.1 \
+    -b 04:00.0 -b 04:00.1 -b 04:00.2 -b 04:00.3 \
+    -b 81:00.0 -b 81:00.1 -b 81:00.2 -b 81:00.3 \
+    -b 82:00.0 -b 83:00.0 -- -T -P -m [2:3].0 -m [4:5].1 \
+    -f themes/black-yellow.theme
+
+.. note::
+
+   Change the ``-b`` options to blacklist all of your physical ports. The
+   command line above, although wrapped with backslashes, is all one command.
+
+   Also, ``-f themes/black-yellow.theme`` is optional; omit it if the default
+   colors work on your system configuration. See the Pktgen docs for more
+   information.
+
+Verify with the ``ifconfig -a`` command in a different xterm window that the
+``dtap0`` and ``dtap1`` interfaces have been created.
+
+Next set the links for the two interfaces to up via the commands below::
+
+   sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
+   sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1
+
+Then use socat to create a loopback for the two interfaces::
+
+   sudo socat interface:dtap0 interface:dtap1
+
+Then on the Pktgen command line interface you can start sending packets using
+the commands ``start 0`` and ``start 1``, or you can start both at the same
+time with ``start all``. The command ``str`` is an alias for ``start all`` and
+``stp`` is an alias for ``stop all``.
+
+While running you should see the 64 byte counters increasing, verifying that
+the traffic is being looped back. You can use ``set all size XXX`` to change
+the size of the packets after you stop the traffic. Use the pktgen ``help``
+command to see a list of all commands. You can also use the ``-f`` option to
+load a command file or a Lua script at startup.
diff --git a/src/seastar/dpdk/doc/guides/nics/thunderx.rst b/src/seastar/dpdk/doc/guides/nics/thunderx.rst
new file mode 100644
index 00000000..4fa0039d
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/thunderx.rst
@@ -0,0 +1,377 @@
+.. BSD LICENSE
+   Copyright (C) Cavium networks Ltd. 2016.
+   All rights reserved.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+
+     * Redistributions of source code must retain the above copyright
+       notice, this list of conditions and the following disclaimer.
+     * Redistributions in binary form must reproduce the above copyright
+       notice, this list of conditions and the following disclaimer in
+       the documentation and/or other materials provided with the
+       distribution.
+     * Neither the name of Cavium networks nor the names of its
+       contributors may be used to endorse or promote products derived
+       from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ThunderX NICVF Poll Mode Driver
+===============================
+
+The ThunderX NICVF PMD (**librte_pmd_thunderx_nicvf**) provides poll mode driver
+support for the inbuilt NIC found in the **Cavium ThunderX** SoC family
+as well as their virtual functions (VF) in SR-IOV context.
+
+More information can be found at the `Cavium Networks Official Website
+<http://www.cavium.com/ThunderX_ARM_Processors.html>`_.
+
+Features
+--------
+
+Features of the ThunderX PMD are:
+
+- Multiple queues for TX and RX
+- Receive Side Scaling (RSS)
+- Packet type information
+- Checksum offload
+- Promiscuous mode
+- Multicast mode
+- Port hardware statistics
+- Jumbo frames
+- Link state information
+- Scatter and gather for TX and RX
+- VLAN stripping
+- SR-IOV VF
+- NUMA support
+- Multi queue set support (up to 96 queues (12 queue sets)) per port
+
+Supported ThunderX SoCs
+-----------------------
+
+- CN88xx
+- CN81xx
+- CN83xx
+
+Prerequisites
+-------------
+
+- Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to set up the basic DPDK environment.
+
+Pre-Installation Configuration
+------------------------------
+
+Config File Options
+~~~~~~~~~~~~~~~~~~~
+
+The following options can be modified in the ``config`` file.
+Please note that enabling debugging options may affect system performance.
+
+- ``CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD`` (default ``y``)
+
+  Toggle compilation of the ``librte_pmd_thunderx_nicvf`` driver.
+
+- ``CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_INIT`` (default ``n``)
+
+  Toggle display of initialization related messages.
+
+- ``CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_RX`` (default ``n``)
+
+  Toggle display of receive fast path run-time messages.
+
+- ``CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_TX`` (default ``n``)
+
+  Toggle display of transmit fast path run-time messages.
+
+- ``CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_DRIVER`` (default ``n``)
+
+  Toggle display of generic debugging messages.
+
+- ``CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_MBOX`` (default ``n``)
+
+  Toggle display of PF mailbox related run-time check messages.
+
+Driver compilation and testing
+------------------------------
+
+Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
+for details.
+
+To compile the ThunderX NICVF PMD for Linux arm64 gcc,
+use ``arm64-thunderx-linuxapp-gcc`` as the target.
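+
+With the legacy make-based build system of this DPDK generation, that
+typically amounts to something like the following (a sketch; adjust to your
+environment):
+
+.. code-block:: console
+
+   make install T=arm64-thunderx-linuxapp-gcc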
+
+Linux
+-----
+
+SR-IOV: Prerequisites and sample Application Notes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The current ThunderX NIC PF/VF kernel modules map each physical Ethernet port
+automatically to a virtual function (VF) and present it as a PCIe-like SR-IOV device.
+This section provides instructions to configure SR-IOV with Linux OS.
+
+#. Verify PF device capabilities using ``lspci``:
+
+   .. code-block:: console
+
+      lspci -vvv
+
+   Example output:
+
+   .. code-block:: console
+
+      0002:01:00.0 Ethernet controller: Cavium Networks Device a01e (rev 01)
+      ...
+      Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
+      ...
+      Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
+      ...
+      Kernel driver in use: thunder-nic
+      ...
+
+   .. note::
+
+      If the ``thunder-nic`` driver is not in use, make sure your kernel config includes the ``CONFIG_THUNDER_NIC_PF`` setting.
+
+#. Verify VF device capabilities and drivers using ``lspci``:
+
+   .. code-block:: console
+
+      lspci -vvv
+
+   Example output:
+
+   .. code-block:: console
+
+      0002:01:00.1 Ethernet controller: Cavium Networks Device 0011 (rev 01)
+      ...
+      Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
+      ...
+      Kernel driver in use: thunder-nicvf
+      ...
+
+      0002:01:00.2 Ethernet controller: Cavium Networks Device 0011 (rev 01)
+      ...
+      Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
+      ...
+      Kernel driver in use: thunder-nicvf
+      ...
+
+   .. note::
+
+      If the ``thunder-nicvf`` driver is not in use, make sure your kernel config includes the ``CONFIG_THUNDER_NIC_VF`` setting.
+
+#. Pass the VF device to the VM context (PCIe Passthrough):
+
+   The VF devices may be passed through to the guest VM using qemu,
+   virt-manager, virsh, etc.
+
+   Example qemu guest launch command:
+
+   .. code-block:: console
+
+      sudo qemu-system-aarch64 -name vm1 \
+      -machine virt,gic_version=3,accel=kvm,usb=off \
+      -cpu host -m 4096 \
+      -smp 4,sockets=1,cores=8,threads=1 \
+      -nographic -nodefaults \
+      -kernel <kernel image> \
+      -append "root=/dev/vda console=ttyAMA0 rw hugepagesz=512M hugepages=3" \
+      -device vfio-pci,host=0002:01:00.1 \
+      -drive file=<rootfs.ext3>,if=none,id=disk1,format=raw \
+      -device virtio-blk-device,scsi=off,drive=disk1,id=virtio-disk1,bootindex=1 \
+      -netdev tap,id=net0,ifname=tap0,script=/etc/qemu-ifup_thunder \
+      -device virtio-net-device,netdev=net0 \
+      -serial stdio \
+      -mem-path /dev/huge
+
+#. Enable **VFIO-NOIOMMU** mode (optional):
+
+   .. code-block:: console
+
+      echo 1 > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
+
+   .. note::
+
+      **VFIO-NOIOMMU** is required only when running in VM context and should not be enabled otherwise.
+
+#. Run testpmd:
+
+   Follow the instructions available in the document
+   :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
+   to run testpmd.
+
+   Example output:
+
+   .. code-block:: console
+
+      ./arm64-thunderx-linuxapp-gcc/app/testpmd -l 0-3 -n 4 -w 0002:01:00.2 \
+        -- -i --disable-hw-vlan-filter --disable-crc-strip --no-flush-rx \
+        --port-topology=loop
+
+      ...
+
+      PMD: rte_nicvf_pmd_init(): librte_pmd_thunderx nicvf version 1.0
+
+      ...
+      EAL:   probe driver: 177d:11 rte_nicvf_pmd
+      EAL:   using IOMMU type 1 (Type 1)
+      EAL:   PCI memory mapped at 0x3ffade50000
+      EAL: Trying to map BAR 4 that contains the MSI-X table.
+           Trying offsets: 0x40000000000:0x0000, 0x10000:0x1f0000
+      EAL:   PCI memory mapped at 0x3ffadc60000
+      PMD: nicvf_eth_dev_init(): nicvf: device (177d:11) 2:1:0:2
+      PMD: nicvf_eth_dev_init(): node=0 vf=1 mode=tns-bypass sqs=false
+           loopback_supported=true
+      PMD: nicvf_eth_dev_init(): Port 0 (177d:11) mac=a6:c6:d9:17:78:01
+      Interactive-mode selected
+      Configuring Port 0 (socket 0)
+      ...
+
+      PMD: nicvf_dev_configure(): Configured ethdev port0 hwcap=0x0
+      Port 0: A6:C6:D9:17:78:01
+      Checking link statuses...
+      Port 0 Link Up - speed 10000 Mbps - full-duplex
+      Done
+      testpmd>
+
+Multiple Queue Set per DPDK port configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are two types of VFs:
+
+- Primary VF
+- Secondary VF
+
+Each port consists of a primary VF and n secondary VF(s). Each VF provides 8 Tx/Rx queues to a port.
+When a given port is configured to use more than 8 queues, it requires one (or more) secondary VF.
+Each secondary VF adds 8 additional queues to the queue set.
+
+During PMD initialization, the primary VFs are enumerated by checking the
+specific flag (see the sqs message in the DPDK boot log; sqs indicates a secondary queue set).
+They are at the beginning of the VF list (the remaining ones are secondary VFs).
+
+The primary VFs are used as master queue sets. Secondary VFs provide
+additional queue sets for the primary ones. If a port is configured for more
+than 8 queues, it will request additional queues from the secondary VFs.
+
+Secondary VFs cannot be shared between primary VFs.
+
+Primary VFs are present at the beginning of the 'Network devices using kernel
+driver' list; secondary VFs make up the remaining part of the list.
+
+   .. note::
+
+      The VNIC driver in the multiqueue setup works differently than other drivers like `ixgbe`.
+      Each queue set device must be bound separately with the ``usertools/dpdk-devbind.py`` utility.
+
+   .. note::
+
+      Depending on the hardware used, the kernel driver sets a threshold ``vf_id``. VFs that attach with an id below or equal to
+      this boundary are considered primary VFs. VFs that attach with an id above this boundary are considered secondary VFs.
+
+
+Example device binding
+~~~~~~~~~~~~~~~~~~~~~~
+
+If a system has three interfaces, a total of 18 VF devices will be created
+on a non-NUMA machine.
+
+   .. note::
+
+      NUMA systems have 12 VFs per port and non-NUMA systems 6 VFs per port.
+   .. code-block:: console
+
+      # usertools/dpdk-devbind.py --status
+
+      Network devices using DPDK-compatible driver
+      ============================================
+      <none>
+
+      Network devices using kernel driver
+      ===================================
+      0000:01:10.0 'Device a026' if= drv=thunder-BGX unused=vfio-pci,uio_pci_generic
+      0000:01:10.1 'Device a026' if= drv=thunder-BGX unused=vfio-pci,uio_pci_generic
+      0002:01:00.0 'Device a01e' if= drv=thunder-nic unused=vfio-pci,uio_pci_generic
+      0002:01:00.1 'Device 0011' if=eth0 drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:00.2 'Device 0011' if=eth1 drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:00.3 'Device 0011' if=eth2 drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:00.4 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:00.5 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:00.6 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:00.7 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:01.0 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:01.1 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:01.2 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:01.3 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:01.4 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:01.5 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:01.6 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:01.7 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:02.0 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:02.1 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+      0002:01:02.2 'Device 0011' if= drv=thunder-nicvf unused=vfio-pci,uio_pci_generic
+
+      Other network devices
+      =====================
+      0002:00:03.0 'Device a01f' unused=vfio-pci,uio_pci_generic
+
+
+Say we want to bind two physical interfaces with 24 queues each; this requires
+two primary VFs and four secondary queue sets. In our example we choose two
+10G interfaces, eth1 (0002:01:00.2) and eth2 (0002:01:00.3), and take the four
+secondary queue sets from the end of the list (0002:01:01.7-0002:01:02.2).
+
+
+#. Bind the two primary VFs to the ``vfio-pci`` driver:
+
+   .. code-block:: console
+
+      usertools/dpdk-devbind.py -b vfio-pci 0002:01:00.2
+      usertools/dpdk-devbind.py -b vfio-pci 0002:01:00.3
+
+#. Bind the four secondary VFs to the ``vfio-pci`` driver:
+
+   .. code-block:: console
+
+      usertools/dpdk-devbind.py -b vfio-pci 0002:01:01.7
+      usertools/dpdk-devbind.py -b vfio-pci 0002:01:02.0
+      usertools/dpdk-devbind.py -b vfio-pci 0002:01:02.1
+      usertools/dpdk-devbind.py -b vfio-pci 0002:01:02.2
+
+The nicvf thunderx driver will make use of the attached secondary VFs
+automatically during the interface configuration stage.
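+
+As a purely hypothetical illustration (the core list, queue counts, and the
+whitelisting of the secondary VFs are assumptions; adjust them to your setup),
+the two ports bound above could then be started with 24 queues each:
+
+.. code-block:: console
+
+   ./arm64-thunderx-linuxapp-gcc/app/testpmd -l 0-7 -n 4 \
+      -w 0002:01:00.2 -w 0002:01:00.3 \
+      -w 0002:01:01.7 -w 0002:01:02.0 -w 0002:01:02.1 -w 0002:01:02.2 \
+      -- -i --rxq=24 --txq=24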
+
+Limitations
+-----------
+
+CRC striping
+~~~~~~~~~~~~
+
+The ThunderX SoC family NICs strip the CRC for every packet coming into the
+host interface. So, the CRC will be stripped even when the
+``rxmode.hw_strip_crc`` member is set to 0 in ``struct rte_eth_conf``.
+
+Maximum packet length
+~~~~~~~~~~~~~~~~~~~~~
+
+The ThunderX SoC family NICs support a maximum of a 9K jumbo frame. The value
+is fixed and cannot be changed. So, even when the ``rxmode.max_rx_pkt_len``
+member of ``struct rte_eth_conf`` is set to a value lower than 9200, frames
+up to 9200 bytes can still reach the host interface.
+
+Maximum packet segments
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The ThunderX SoC family NICs support up to 12 segments per packet when working
+in scatter/gather mode. So, setting the MTU will result in ``EINVAL`` when the
+frame size does not fit in the maximum number of segments.
diff --git a/src/seastar/dpdk/doc/guides/nics/vhost.rst b/src/seastar/dpdk/doc/guides/nics/vhost.rst
new file mode 100644
index 00000000..e651a166
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/vhost.rst
@@ -0,0 +1,110 @@
+.. BSD LICENSE
+   Copyright(c) 2016 IGEL Co., Ltd.. All rights reserved.
+   All rights reserved.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+
+   * Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+   * Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in
+     the documentation and/or other materials provided with the
+     distribution.
+   * Neither the name of IGEL Co., Ltd. nor the names of its
+     contributors may be used to endorse or promote products derived
+     from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Poll Mode Driver that wraps vhost library
+=========================================
+
+This PMD is a thin wrapper of the DPDK vhost library.
+The user can handle virtqueues as a normal DPDK port.
+
+Vhost Implementation in DPDK
+----------------------------
+
+Please refer to the chapter "Vhost Library" of the *DPDK Programmer's Guide* for details of vhost.
+
+Features and Limitations of vhost PMD
+-------------------------------------
+
+Currently, the vhost PMD provides the basic functionality of packet reception, transmission and event handling.
+
+* It supports multiple queues.
+
+* It supports ``RTE_ETH_EVENT_INTR_LSC`` and ``RTE_ETH_EVENT_QUEUE_STATE`` events.
+
+* It supports Port Hotplug functionality.
+
+* There is no need to stop RX/TX when the user wants to stop a guest or a virtio-net driver on the guest.
+
+Vhost PMD arguments
+-------------------
+
+The user can specify the arguments below in the ``--vdev`` option.
+
+#. ``iface``:
+
+   It is used to specify a path to connect to a QEMU virtio-net device.
+
+#. ``queues``:
+
+   It is used to specify the number of queues the virtio-net device has.
+   (Default: 1)
+
+Vhost PMD event handling
+------------------------
+
+This section describes how to handle vhost PMD events.
+
+The user can register an event callback handler with ``rte_eth_dev_callback_register()``.
+The registered callback handler will be invoked with one of the event types below.
+
+#. ``RTE_ETH_EVENT_INTR_LSC``:
+
+   It means the link status of the port was changed.
+
+#. ``RTE_ETH_EVENT_QUEUE_STATE``:
+
+   It means some of the queue statuses were changed. Call ``rte_eth_vhost_get_queue_event()`` in the callback handler.
+   Because multiple status changes may raise only one event, call the function repeatedly as long as it doesn't return a negative value.
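+
+As a hedged sketch of such a handler (assuming the three-argument callback
+signature of this DPDK generation and the ``rte_eth_vhost_get_queue_event()``
+helper from ``rte_eth_vhost.h``; verify both against your headers):
+
+.. code-block:: c
+
+   #include <stdio.h>
+   #include <rte_ethdev.h>
+   #include <rte_eth_vhost.h>
+
+   /* One interrupt may coalesce several queue-state changes, so drain
+    * events until the call returns a negative value. */
+   static void
+   vhost_event_cb(uint8_t port_id, enum rte_eth_event_type type, void *arg)
+   {
+       struct rte_eth_vhost_queue_event event;
+
+       if (type == RTE_ETH_EVENT_INTR_LSC) {
+           printf("port %u: link status changed\n", port_id);
+           return;
+       }
+       while (rte_eth_vhost_get_queue_event(port_id, &event) >= 0)
+           printf("port %u: %s queue %u %s\n", port_id,
+                  event.rx ? "rx" : "tx", event.queue_id,
+                  event.enable ? "enabled" : "disabled");
+   }
+
+   static void
+   register_vhost_events(uint8_t port_id)
+   {
+       rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_INTR_LSC,
+                                     vhost_event_cb, NULL);
+       rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_QUEUE_STATE,
+                                     vhost_event_cb, NULL);
+   }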
+
+Vhost PMD with testpmd application
+----------------------------------
+
+This section demonstrates the vhost PMD with the testpmd DPDK sample application.
+
+#. Launch testpmd with the vhost PMD:
+
+   .. code-block:: console
+
+      ./testpmd -l 0-3 -n 4 --vdev 'net_vhost0,iface=/tmp/sock0,queues=1' -- -i
+
+   Other basic DPDK preparations, like hugepage enabling, are required here as well.
+   Please refer to the *DPDK Getting Started Guide* for detailed instructions.
+
+#. Launch the QEMU:
+
+   .. code-block:: console
+
+      qemu-system-x86_64 <snip> \
+                   -chardev socket,id=chr0,path=/tmp/sock0 \
+                   -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
+                   -device virtio-net-pci,netdev=net0
+
+   This command attaches one virtio-net device to the QEMU guest.
+   After the initialization processes between QEMU and the DPDK vhost library are done, the port's link status will come up.
diff --git a/src/seastar/dpdk/doc/guides/nics/virtio.rst b/src/seastar/dpdk/doc/guides/nics/virtio.rst
new file mode 100644
index 00000000..91bedea6
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/virtio.rst
@@ -0,0 +1,336 @@
+.. BSD LICENSE
+   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+   All rights reserved.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+
+   * Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+   * Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in
+     the documentation and/or other materials provided with the
+     distribution.
+   * Neither the name of Intel Corporation nor the names of its
+     contributors may be used to endorse or promote products derived
+     from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Poll Mode Driver for Emulated Virtio NIC
+========================================
+
+Virtio is a para-virtualization framework initiated by IBM, and supported by the KVM hypervisor.
+In the Data Plane Development Kit (DPDK),
+we provide a virtio Poll Mode Driver (PMD) as a software solution (as compared
+to the SR-IOV hardware solution)
+for fast guest-VM-to-guest-VM communication and guest-VM-to-host communication.
+
+Vhost is a kernel acceleration module for the virtio qemu backend.
+The DPDK extends kni to support a vhost raw socket interface,
+which enables vhost to directly read/write packets from/to a physical port.
+With this enhancement, virtio can achieve quite promising performance.
+
+In a future release, we will also enhance the vhost backend,
+unlocking the peak performance of the virtio PMD.
+
+For basic qemu-KVM installation and the Intel EM poll mode driver in a guest VM,
+please refer to the chapter "Driver for VM Emulated Devices".
+
+In this chapter, we demonstrate the usage of the virtio PMD with two backends,
+the standard qemu vhost back end and the vhost kni back end.
+
+Virtio Implementation in DPDK
+-----------------------------
+
+For details about the virtio spec, refer to the Virtio PCI Card Specification written by Rusty Russell.
+
+As a PMD, virtio provides the packet reception and transmission callbacks ``virtio_recv_pkts`` and ``virtio_xmit_pkts``.
+
+In ``virtio_recv_pkts``, the indexes in the range [vq->vq_used_cons_idx, vq->vq_ring.used->idx) in the vring are available for virtio to burst out.
+
+In ``virtio_xmit_pkts``, the same index range in the vring is available for virtio to clean.
+Virtio enqueues the packets to be transmitted into the vring, advances vq->vq_ring.avail->idx,
+and then notifies the host back end if necessary.
+
+Features and Limitations of virtio PMD
+--------------------------------------
+
+In this release, the virtio PMD provides the basic functionality of packet reception and transmission.
+
+* It supports mergeable buffers per packet when receiving packets and scattered buffers per packet
+  when transmitting packets. The supported packet size is from 64 to 1518 bytes.
+
+* It supports multicast packets and promiscuous mode.
+
+* The descriptor number for the Rx/Tx queue is hard-coded to 256 by qemu.
+  If given a different descriptor number by the upper application,
+  the virtio PMD generates a warning and falls back to the hard-coded value.
+
+* MAC/VLAN filter features are supported; negotiation with the vhost backend is needed to enable them.
+  When the backend cannot support the VLAN filter, the virtio application in the guest should disable the VLAN
+  filter to make sure the virtio port is configured correctly, e.g. by specifying ``--disable-hw-vlan`` on the
+  testpmd command line.
+
+* ``RTE_PKTMBUF_HEADROOM`` should be defined larger than ``sizeof(struct virtio_net_hdr)``, which is 10 bytes.
+
+* Virtio does not support runtime configuration.
+
+* Virtio supports the Link State interrupt.
+
+* Virtio supports Rx interrupts (so far, only a 1:1 mapping of queues to interrupts is supported).
+
+* Virtio supports software VLAN stripping and insertion.
+
+* Virtio supports using port IO to access the PCI resource when the uio/igb_uio module is not available.
+
+Prerequisites
+-------------
+
+The following prerequisites apply:
+
+* In the BIOS, turn VT-x and VT-d on.
+
+* Linux kernel with the KVM module; vhost module loaded and ioeventfd supported.
+  The qemu standard backend without vhost support isn't tested, and probably isn't supported.
+
+Virtio with kni vhost Back End
+------------------------------
+
+This section demonstrates an example setup of the kni vhost back end for Phy-VM communication.
+
+.. _figure_host_vm_comms:
+
+.. figure:: img/host_vm_comms.*
+
+   Host2VM Communication Example Using kni vhost Back End
+
+
+Host2VM communication example:
+
+#. Load the kni kernel module:
+
+   .. code-block:: console
+
+      insmod rte_kni.ko
+
+   Other basic DPDK preparations, like hugepage enabling and uio port binding, are not listed here.
+   Please refer to the *DPDK Getting Started Guide* for detailed instructions.
+
+#. Launch the kni user application:
+
+   .. code-block:: console
+
+      examples/kni/build/app/kni -l 0-3 -n 4 -- -p 0x1 -P --config="(0,1,3)"
+
+   This command generates one network device vEth0 for the physical port.
+   If more physical ports are specified, the generated network devices will be vEth1, vEth2, and so on.
+
+   For each physical port, kni creates two user threads.
+   One thread loops to fetch packets from the physical NIC port into the kni receive queue.
+   The other user thread loops to send packets from the kni transmit queue.
+
+   For each physical port, kni also creates a kernel thread that retrieves packets from the kni receive queue,
+   places them onto kni's raw socket queue, and wakes up the vhost kernel thread to exchange packets with the virtio virtqueue.
+
+   For more details about kni, please refer to :ref:`kni`.
+
+#. Enable the kni raw socket functionality for the specified physical NIC port,
+   get the generated file descriptor, and set it in the qemu command line parameters.
+   Always remember to set ``ioeventfd=on`` and ``vhost=on``.
+
+   Example:
+
+   .. code-block:: console
+
+      echo 1 > /sys/class/net/vEth0/sock_en
+      fd=`cat /sys/class/net/vEth0/sock_fd`
+      exec qemu-system-x86_64 -enable-kvm -cpu host \
+      -m 2048 -smp 4 -name dpdk-test1-vm1 \
+      -drive file=/data/DPDKVMS/dpdk-vm.img \
+      -netdev tap,fd=$fd,id=mynet_kni,script=no,vhost=on \
+      -device virtio-net-pci,netdev=mynet_kni,bus=pci.0,addr=0x3,ioeventfd=on \
+      -vnc :1 -daemonize
+
+   In the above example, virtio port 0 in the guest VM will be associated with vEth0, which in turn corresponds to a physical port;
+   received packets come from vEth0, and transmitted packets are sent to vEth0.
+
+#. In the guest, bind the virtio device to the uio_pci_generic kernel module and start the forwarding application.
+   When the virtio port in the guest bursts Rx, it is getting packets from the
+   raw socket's receive queue.
+   When the virtio port bursts Tx, it is sending packets to the tx_q.
+
+   .. code-block:: console
+
+      modprobe uio
+      echo 512 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
+      modprobe uio_pci_generic
+      python usertools/dpdk-devbind.py -b uio_pci_generic 00:03.0
+
+   We use testpmd as the forwarding application in this example.
+
+   .. figure:: img/console.*
+
+      Running testpmd
+
+#. Use an IXIA packet generator to inject a packet stream into the KNI physical port.
+
+   The packet reception and transmission flow path is:
+
+      IXIA packet generator->82599 PF->KNI Rx queue->KNI raw socket queue->Guest
+      VM virtio port 0 Rx burst->Guest VM virtio port 0 Tx burst->KNI Tx queue
+      ->82599 PF->IXIA packet generator
+
+Virtio with qemu virtio Back End
+--------------------------------
+
+.. _figure_host_vm_comms_qemu:
+
+.. figure:: img/host_vm_comms_qemu.*
+
+   Host2VM Communication Example Using qemu vhost Back End
+
+.. code-block:: console
+
+   qemu-system-x86_64 -enable-kvm -cpu host -m 2048 -smp 2 \
+   -mem-path /dev/hugepages -mem-prealloc \
+   -drive file=/data/DPDKVMS/dpdk-vm1 \
+   -netdev tap,id=vm1_p1,ifname=tap0,script=no,vhost=on \
+   -device virtio-net-pci,netdev=vm1_p1,bus=pci.0,addr=0x3,ioeventfd=on \
+   -device pci-assign,host=04:10.1
+
+In this example, the packet reception flow path is:
+
+   IXIA packet generator->82599 PF->Linux Bridge->TAP0's socket queue->Guest
+   VM virtio port 0 Rx burst->Guest VM 82599 VF port1 Tx burst->IXIA packet
+   generator
+
+The packet transmission flow is:
+
+   IXIA packet generator->Guest VM 82599 VF port1 Rx burst->Guest VM virtio
+   port 0 Tx burst->tap->Linux Bridge->82599 PF->IXIA packet generator
+
+
+Virtio PMD Rx/Tx Callbacks
+--------------------------
+
+The virtio driver has 3 Rx callbacks and 2 Tx callbacks.
+
+Rx callbacks:
+
+#. ``virtio_recv_pkts``:
+   Regular version without mergeable Rx buffer support.
+
+#. ``virtio_recv_mergeable_pkts``:
+   Regular version with mergeable Rx buffer support.
+
+#. ``virtio_recv_pkts_vec``:
+   Vector version without mergeable Rx buffer support; it also fixes the available
+   ring indexes and uses vector instructions to optimize performance.
+
+Tx callbacks:
+
+#. ``virtio_xmit_pkts``:
+   Regular version.
+
+#. ``virtio_xmit_pkts_simple``:
+   Vector version that fixes the available ring indexes to optimize performance.
+
+
+By default, the non-vector callbacks are used:
+
+* For Rx: if mergeable Rx buffers are disabled, then ``virtio_recv_pkts`` is
+  used; otherwise ``virtio_recv_mergeable_pkts``.
+
+* For Tx: ``virtio_xmit_pkts``.
+
+
+Vector callbacks will be used when:
+
+* ``txq_flags`` is set to ``VIRTIO_SIMPLE_FLAGS`` (0xF01), which implies:
+
+  * Single segment is specified.
+
+  * No offload support is needed.
+
+* Mergeable Rx buffers are disabled.
+
+The corresponding callbacks are:
+
+* For Rx: ``virtio_recv_pkts_vec``.
+
+* For Tx: ``virtio_xmit_pkts_simple``.
+
+
+Example of using the vector version of the virtio poll mode driver in
+``testpmd``::
+
+   testpmd -l 0-2 -n 4 -- -i --txqflags=0xF01 --rxq=1 --txq=1 --nb-cores=1
+
+
+.. _virtio_interrupt_mode:
+
+Interrupt mode
+--------------
+
+There are three kinds of interrupts from a virtio device over the PCI bus: the config
+interrupt, Rx interrupts, and Tx interrupts. The config interrupt is used for
+notification of device configuration changes, especially link status (lsc).
+Interrupt mode is translated into Rx interrupts in the context of DPDK.
+
+Prerequisites for Rx interrupts
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To support Rx interrupts, the following steps are required:
+
+#. Check if the guest kernel supports VFIO-NOIOMMU:
+
+   Linux has supported VFIO-NOIOMMU since 4.8.0. Make sure the guest
+   kernel is compiled with:
+
+   .. code-block:: console
+
+      CONFIG_VFIO_NOIOMMU=y
+
+#. Properly set msix vectors when starting the VM:
+
+   Enable multi-queue when starting the VM, and specify msix vectors in the
+   qemu command line. (N+1) is the minimum, and (2N+2) is recommended.
+
+   .. code-block:: console
+
+      $(QEMU) ... -device virtio-net-pci,mq=on,vectors=2N+2 ...
+
+#. In the VM, insert the vfio module in NOIOMMU mode:
+
+   .. code-block:: console
+
+      modprobe vfio enable_unsafe_noiommu_mode=1
+      modprobe vfio-pci
+
+#. In the VM, bind the virtio device with vfio-pci:
+
+   .. code-block:: console
+
+      python usertools/dpdk-devbind.py -b vfio-pci 00:03.0
+
+Example
+~~~~~~~
+
+Here we use l3fwd-power as an example to show how to get started:
+
+.. code-block:: console
+
+   $ l3fwd-power -l 0-1 -- -p 1 -P --config="(0,0,1)" \
+     --no-numa --parse-ptype
diff --git a/src/seastar/dpdk/doc/guides/nics/vmxnet3.rst b/src/seastar/dpdk/doc/guides/nics/vmxnet3.rst
new file mode 100644
index 00000000..bf845942
--- /dev/null
+++ b/src/seastar/dpdk/doc/guides/nics/vmxnet3.rst
@@ -0,0 +1,189 @@
+.. BSD LICENSE
+   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+   All rights reserved.
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+
+   * Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+   * Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in
+     the documentation and/or other materials provided with the
+     distribution.
+   * Neither the name of Intel Corporation nor the names of its
+     contributors may be used to endorse or promote products derived
+     from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Poll Mode Driver for Paravirtual VMXNET3 NIC
+============================================
+
+The VMXNET3 adapter is the next generation of a paravirtualized NIC, introduced by VMware* ESXi.
+It is designed for performance, offers all the features available in VMXNET2, and adds several new features such as
+multi-queue support (also known as Receive Side Scaling, RSS),
+IPv6 offloads, and MSI/MSI-X interrupt delivery.
+One can use the same device in a DPDK application with the VMXNET3 PMD introduced in the DPDK API.
+
+In this chapter, two setups with the use of the VMXNET3 PMD are demonstrated:
+
+#. Vmxnet3 with a native NIC connected to a vSwitch
+
+#. Vmxnet3 chaining VMs connected to a vSwitch
+
+VMXNET3 Implementation in the DPDK
+----------------------------------
+
+For details on the VMXNET3 device, refer to the VMXNET3 driver's vmxnet3 directory and support manual from VMware*.
+
+For performance details, refer to the following link from VMware:
+
+`http://www.vmware.com/pdf/vsp_4_vmxnet3_perf.pdf <http://www.vmware.com/pdf/vsp_4_vmxnet3_perf.pdf>`_
+
+As a PMD, the VMXNET3 driver provides the packet reception and transmission callbacks, vmxnet3_recv_pkts and vmxnet3_xmit_pkts.
+
+The VMXNET3 PMD handles all the packet buffer memory allocation and resides in the guest address space,
+and it is solely responsible for freeing that memory when not needed.
+The packet buffers and the features to be supported are made available to the hypervisor via the VMXNET3 PCI configuration space BARs.
+During RX/TX, the packet buffers are exchanged by their GPAs,
+and the hypervisor loads the buffers with packets in the RX case and sends packets to the vSwitch in the TX case.
+
+The VMXNET3 PMD is compiled with the vmxnet3 device headers.
+The interface is similar to that of the other PMDs available in the DPDK API.
+The driver pre-allocates the packet buffers and loads the command ring descriptors in advance.
+The hypervisor fills those packet buffers on packet arrival and writes completion ring descriptors,
+which are eventually pulled by the PMD.
+After reception, the DPDK application frees the descriptors and loads new packet buffers for the coming packets.
+The interrupts are disabled and there is no notification required.
+This keeps performance up on the RX side, even though the device provides a notification feature.
+
+In the transmit routine, the DPDK application fills packet buffer pointers in the descriptors of the command ring
+and notifies the hypervisor.
+In response, the hypervisor takes the packets, passes them to the vSwitch, and writes into the completion descriptor ring.
+The rings are read by the PMD in the next transmit routine call, and the buffers and descriptors are freed from memory.
+
+Features and Limitations of VMXNET3 PMD
+---------------------------------------
+
+In release 1.6.0, the VMXNET3 PMD provides the basic functionality of packet reception and transmission.
+There are several options available for filtering packets at the VMXNET3 device level, including:
+
+#. MAC Address based filtering:
+
+   * Unicast, Broadcast, All Multicast modes - SUPPORTED BY DEFAULT
+
+   * Multicast with Multicast Filter table - NOT SUPPORTED
+
+   * Promiscuous mode - SUPPORTED
+
+   * RSS based load balancing between queues - SUPPORTED
+
+#. VLAN filtering:
+
+   * VLAN tag based filtering without load balancing - SUPPORTED
+
+.. note::
+
+   * Release 1.6.0 does not support separate header and body receive cmd_ring; hence,
+     multiple-segment buffers are not supported.
+     Only cmd_ring_0 is used for packet buffers, one for each descriptor.
+
+   * Receive and transmit of scattered packets is not supported.
+
+   * Multicast with Multicast Filter table is not supported.
+
+Prerequisites
+-------------
+
+The following prerequisites apply:
+
+* Before starting a VM, a VMXNET3 interface must be assigned to the VM through the VMware vSphere Client.
+  This is shown in the figure below.
+
+.. _figure_vmxnet3_int:
+
+.. figure:: img/vmxnet3_int.*
+
+   Assigning a VMXNET3 interface to a VM using VMware vSphere Client
+
+.. note::
+
+   Depending on the Virtual Machine type, the VMware vSphere Client shows Ethernet adapters while adding an Ethernet device.
+   Ensure that the VM type used offers a VMXNET3 device. Refer to the VMware documentation for a list of VM types.
+
+.. note::
+
+   Follow the *DPDK Getting Started Guide* to set up the basic DPDK environment.
+
+.. note::
+
+   Follow the *DPDK Sample Application's User Guide* (L2 Forwarding/L3 Forwarding and
+   TestPMD) for instructions on how to run a DPDK application using an assigned VMXNET3 device.
+
+VMXNET3 with a Native NIC Connected to a vSwitch
+------------------------------------------------
+
+This section describes an example setup for Phy-vSwitch-VM-Phy communication.
+
+.. _figure_vswitch_vm:
+
+.. figure:: img/vswitch_vm.*
+
+   VMXNET3 with a Native NIC Connected to a vSwitch
+
+.. note::
+
+   Other instructions on preparing to use DPDK, such as hugepage enabling and uio port binding, are not listed here.
+   Please refer to the *DPDK Getting Started Guide* and the *DPDK Sample Application's User Guide* for detailed instructions.
+
+The packet reception and transmission flow path is::
+
+   Packet generator -> 82576
+                    -> VMware ESXi vSwitch
+                    -> VMXNET3 device
+                    -> Guest VM VMXNET3 port 0 rx burst
+                    -> Guest VM 82599 VF port 0 tx burst
+                    -> 82599 VF
+                    -> Packet generator
+
+VMXNET3 Chaining VMs Connected to a vSwitch
+-------------------------------------------
+
+The following figure shows an example VM-to-VM communication over a Phy-VM-vSwitch-VM-Phy communication channel.
+
+.. _figure_vm_vm_comms:
+
+.. figure:: img/vm_vm_comms.*
+
+   VMXNET3 Chaining VMs Connected to a vSwitch
+
+.. note::
+
+   When using the L2 Forwarding or L3 Forwarding applications,
+   a destination MAC address needs to be written in the packets to hit the other VM's VMXNET3 interface.
+
+In this example, the packet flow path is::
+
+   Packet generator -> 82599 VF
+                    -> Guest VM 82599 port 0 rx burst
+                    -> Guest VM VMXNET3 port 1 tx burst
+                    -> VMXNET3 device
+                    -> VMware ESXi vSwitch
+                    -> VMXNET3 device
+                    -> Guest VM VMXNET3 port 0 rx burst
+                    -> Guest VM 82599 VF port 1 tx burst
+                    -> 82599 VF
+                    -> Packet generator