From 483eb2f56657e8e7f419ab1a4fab8dce9ade8609 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Sat, 27 Apr 2024 20:24:20 +0200 Subject: Adding upstream version 14.2.21. Signed-off-by: Daniel Baumann --- .../dpdk/doc/guides/howto/flow_bifurcation.rst | 299 +++++++++ .../guides/howto/img/flow_bifurcation_overview.svg | 544 ++++++++++++++++ .../doc/guides/howto/img/ixgbe_bifu_queue_idx.svg | 101 +++ .../doc/guides/howto/img/lm_bond_virtio_sriov.svg | 666 +++++++++++++++++++ .../dpdk/doc/guides/howto/img/lm_vhost_user.svg | 644 +++++++++++++++++++ .../dpdk/doc/guides/howto/img/pvp_2nics.svg | 556 ++++++++++++++++ .../use_models_for_running_dpdk_in_containers.svg | 398 ++++++++++++ .../doc/guides/howto/img/vf_daemon_overview.svg | 440 +++++++++++++ .../howto/img/virtio_user_as_exceptional_path.svg | 207 ++++++ .../img/virtio_user_for_container_networking.svg | 685 ++++++++++++++++++++ src/seastar/dpdk/doc/guides/howto/index.rst | 44 ++ .../dpdk/doc/guides/howto/lm_bond_virtio_sriov.rst | 713 +++++++++++++++++++++ .../dpdk/doc/guides/howto/lm_virtio_vhost_user.rst | 469 ++++++++++++++ .../doc/guides/howto/pvp_reference_benchmark.rst | 398 ++++++++++++ src/seastar/dpdk/doc/guides/howto/vfd.rst | 407 ++++++++++++ .../howto/virtio_user_as_exceptional_path.rst | 142 ++++ .../howto/virtio_user_for_container_networking.rst | 144 +++++ 17 files changed, 6857 insertions(+) create mode 100644 src/seastar/dpdk/doc/guides/howto/flow_bifurcation.rst create mode 100644 src/seastar/dpdk/doc/guides/howto/img/flow_bifurcation_overview.svg create mode 100644 src/seastar/dpdk/doc/guides/howto/img/ixgbe_bifu_queue_idx.svg create mode 100644 src/seastar/dpdk/doc/guides/howto/img/lm_bond_virtio_sriov.svg create mode 100644 src/seastar/dpdk/doc/guides/howto/img/lm_vhost_user.svg create mode 100644 src/seastar/dpdk/doc/guides/howto/img/pvp_2nics.svg create mode 100644 src/seastar/dpdk/doc/guides/howto/img/use_models_for_running_dpdk_in_containers.svg create mode 100644 src/seastar/dpdk/doc/guides/howto/img/vf_daemon_overview.svg create mode 100644 src/seastar/dpdk/doc/guides/howto/img/virtio_user_as_exceptional_path.svg create mode 100644 src/seastar/dpdk/doc/guides/howto/img/virtio_user_for_container_networking.svg create mode 100644 src/seastar/dpdk/doc/guides/howto/index.rst create mode 100644 src/seastar/dpdk/doc/guides/howto/lm_bond_virtio_sriov.rst create mode 100644 src/seastar/dpdk/doc/guides/howto/lm_virtio_vhost_user.rst create mode 100644 src/seastar/dpdk/doc/guides/howto/pvp_reference_benchmark.rst create mode 100644 src/seastar/dpdk/doc/guides/howto/vfd.rst create mode 100644 src/seastar/dpdk/doc/guides/howto/virtio_user_as_exceptional_path.rst create mode 100644 src/seastar/dpdk/doc/guides/howto/virtio_user_for_container_networking.rst (limited to 'src/seastar/dpdk/doc/guides/howto') diff --git a/src/seastar/dpdk/doc/guides/howto/flow_bifurcation.rst b/src/seastar/dpdk/doc/guides/howto/flow_bifurcation.rst new file mode 100644 index 00000000..61016a4f --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/flow_bifurcation.rst @@ -0,0 +1,299 @@ +.. BSD LICENSE + Copyright(c) 2016 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +Flow Bifurcation How-to Guide +============================= + +Flow Bifurcation is a mechanism which uses hardware capable Ethernet devices +to split traffic between Linux user space and kernel space. Since it is a +hardware assisted feature this approach can provide line rate processing +capability. Other than :ref:`KNI `, the software is just required to +enable device configuration, there is no need to take care of the packet +movement during the traffic split. This can yield better performance with +less CPU overhead. + +The Flow Bifurcation splits the incoming data traffic to user space +applications (such as DPDK applications) and/or kernel space programs (such as +the Linux kernel stack). It can direct some traffic, for example data plane +traffic, to DPDK, while directing some other traffic, for example control +plane traffic, to the traditional Linux networking stack. + +There are a number of technical options to achieve this. A typical example is +to combine the technology of SR-IOV and packet classification filtering. + +SR-IOV is a PCI standard that allows the same physical adapter to be split as +multiple virtual functions. Each virtual function (VF) has separated queues +with physical functions (PF). The network adapter will direct traffic to a +virtual function with a matching destination MAC address. In a sense, SR-IOV +has the capability for queue division. + +Packet classification filtering is a hardware capability available on most +network adapters. Filters can be configured to direct specific flows to a +given receive queue by hardware. Different NICs may have different filter +types to direct flows to a Virtual Function or a queue that belong to it. + +In this way the Linux networking stack can receive specific traffic through +the kernel driver while a DPDK application can receive specific traffic +bypassing the Linux kernel by using drivers like VFIO or the DPDK ``igb_uio`` +module. + +.. _figure_flow_bifurcation_overview: + +.. figure:: img/flow_bifurcation_overview.* + + Flow Bifurcation Overview + + +Using Flow Bifurcation on IXGBE in Linux +---------------------------------------- + +On Intel 82599 10 Gigabit Ethernet Controller series NICs Flow Bifurcation can +be achieved by SR-IOV and Intel Flow Director technologies. 
Traffic can be +directed to queues by the Flow Director capability, typically by matching +5-tuple of UDP/TCP packets. + +The typical procedure to achieve this is as follows: + +#. Boot the system without iommu, or with ``iommu=pt``. + +#. Create Virtual Functions: + + .. code-block:: console + + echo 2 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs + +#. Enable and set flow filters: + + .. code-block:: console + + ethtool -K eth1 ntuple on + ethtool -N eth1 flow-type udp4 src-ip 192.0.2.2 dst-ip 198.51.100.2 \ + action $queue_index_in_VF0 + ethtool -N eth1 flow-type udp4 src-ip 198.51.100.2 dst-ip 192.0.2.2 \ + action $queue_index_in_VF1 + + Where: + + * ``$queue_index_in_VFn``: Bits 39:32 of the variable defines VF id + 1; the lower 32 bits indicates the queue index of the VF. Thus: + + * ``$queue_index_in_VF0`` = ``(0x1 & 0xFF) << 32 + [queue index]``. + + * ``$queue_index_in_VF1`` = ``(0x2 & 0xFF) << 32 + [queue index]``. + + .. _figure_ixgbe_bifu_queue_idx: + + .. figure:: img/ixgbe_bifu_queue_idx.* + +#. Compile the DPDK application and insert ``igb_uio`` or probe the ``vfio-pci`` kernel modules as normal. + +#. Bind the virtual functions: + + .. code-block:: console + + modprobe vfio-pci + dpdk-devbind.py -b vfio-pci 01:10.0 + dpdk-devbind.py -b vfio-pci 01:10.1 + +#. Run a DPDK application on the VFs: + + .. code-block:: console + + testpmd -l 0-7 -n 4 -- -i -w 01:10.0 -w 01:10.1 --forward-mode=mac + +In this example, traffic matching the rules will go through the VF by matching +the filter rule. All other traffic, not matching the rules, will go through +the default queue or scaling on queues in the PF. That is to say UDP packets +with the specified IP source and destination addresses will go through the +DPDK application. All other traffic, with different hosts or different +protocols, will go through the Linux networking stack. + +.. note:: + + * The above steps work on the Linux kernel v4.2. + + * The Flow Bifurcation is implemented in Linux kernel and ixgbe kernel driver using the following patches: + + * `ethtool: Add helper routines to pass vf to rx_flow_spec `_ + + * `ixgbe: Allow flow director to use entire queue space `_ + + * The Ethtool version used in this example is 3.18. + + +Using Flow Bifurcation on I40E in Linux +--------------------------------------- + +On Intel X710/XL710 series Ethernet Controllers Flow Bifurcation can be +achieved by SR-IOV, Cloud Filter and L3 VEB switch. The traffic can be +directed to queues by the Cloud Filter and L3 VEB switch's matching rule. + +* L3 VEB filters work for non-tunneled packets. It can direct a packet just by + the Destination IP address to a queue in a VF. + +* Cloud filters work for the following types of tunneled packets. + + * Inner mac. + + * Inner mac + VNI. + + * Outer mac + Inner mac + VNI. + + * Inner mac + Inner vlan + VNI. + + * Inner mac + Inner vlan. + +The typical procedure to achieve this is as follows: + +#. Boot the system without iommu, or with ``iommu=pt``. + +#. Build and insert the ``i40e.ko`` module. + +#. Create Virtual Functions: + + .. code-block:: console + + echo 2 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs + +#. Add udp port offload to the NIC if using cloud filter: + + .. code-block:: console + + ip li add vxlan0 type vxlan id 42 group 239.1.1.1 local 10.16.43.214 dev + ifconfig vxlan0 up + ip -d li show vxlan0 + + .. note:: + + Output such as ``add vxlan port 8472, index 0 success`` should be + found in the system log. + +#. 
Examples of enabling and setting flow filters: + + * L3 VEB filter, for a route whose destination IP is 192.168.50.108 to VF + 0's queue 2. + + .. code-block:: console + + ethtool -N flow-type ip4 dst-ip 192.168.50.108 \ + user-def 0xffffffff00000000 action 2 loc 8 + + * Inner mac, for a route whose inner destination mac is 0:0:0:0:9:0 to + PF's queue 6. + + .. code-block:: console + + ethtool -N flow-type ether dst 00:00:00:00:00:00 \ + m ff:ff:ff:ff:ff:ff src 00:00:00:00:09:00 m 00:00:00:00:00:00 \ + user-def 0xffffffff00000003 action 6 loc 1 + + * Inner mac + VNI, for a route whose inner destination mac is 0:0:0:0:9:0 + and VNI is 8 to PF's queue 4. + + .. code-block:: console + + ethtool -N flow-type ether dst 00:00:00:00:00:00 \ + m ff:ff:ff:ff:ff:ff src 00:00:00:00:09:00 m 00:00:00:00:00:00 \ + user-def 0x800000003 action 4 loc 4 + + * Outer mac + Inner mac + VNI, for a route whose outer mac is + 68:05:ca:24:03:8b, inner destination mac is c2:1a:e1:53:bc:57, and VNI + is 8 to PF's queue 2. + + .. code-block:: console + + ethtool -N flow-type ether dst 68:05:ca:24:03:8b \ + m 00:00:00:00:00:00 src c2:1a:e1:53:bc:57 m 00:00:00:00:00:00 \ + user-def 0x800000003 action 2 loc 2 + + * Inner mac + Inner vlan + VNI, for a route whose inner destination mac is + 00:00:00:00:20:00, inner vlan is 10, and VNI is 8 to VF 0's queue 1. + + .. code-block:: console + + ethtool -N flow-type ether dst 00:00:00:00:01:00 \ + m ff:ff:ff:ff:ff:ff src 00:00:00:00:20:00 m 00:00:00:00:00:00 \ + vlan 10 user-def 0x800000000 action 1 loc 5 + + * Inner mac + Inner vlan, for a route whose inner destination mac is + 00:00:00:00:20:00, and inner vlan is 10 to VF 0's queue 1. + + .. code-block:: console + + ethtool -N flow-type ether dst 00:00:00:00:01:00 \ + m ff:ff:ff:ff:ff:ff src 00:00:00:00:20:00 m 00:00:00:00:00:00 \ + vlan 10 user-def 0xffffffff00000000 action 1 loc 5 + + .. note:: + + * If the upper 32 bits of 'user-def' are ``0xffffffff``, then the + filter can be used for programming an L3 VEB filter, otherwise the + upper 32 bits of 'user-def' can carry the tenant ID/VNI if + specified/required. + + * Cloud filters can be defined with inner mac, outer mac, inner ip, + inner vlan and VNI as part of the cloud tuple. It is always the + destination (not source) mac/ip that these filters use. For all + these examples dst and src mac address fields are overloaded dst == + outer, src == inner. + + * The filter will direct a packet matching the rule to a vf id + specified in the lower 32 bit of user-def to the queue specified by + 'action'. + + * If the vf id specified by the lower 32 bit of user-def is greater + than or equal to ``max_vfs``, then the filter is for the PF queues. + +#. Compile the DPDK application and insert ``igb_uio`` or probe the ``vfio-pci`` + kernel modules as normal. + +#. Bind the virtual function: + + .. code-block:: console + + modprobe vfio-pci + dpdk-devbind.py -b vfio-pci 01:10.0 + dpdk-devbind.py -b vfio-pci 01:10.1 + +#. run DPDK application on VFs: + + .. code-block:: console + + testpmd -l 0-7 -n 4 -- -i -w 01:10.0 -w 01:10.1 --forward-mode=mac + +.. note:: + + * The above steps work on the i40e Linux kernel driver v1.5.16. + + * The Ethtool version used in this example is 3.18. The mask ``ff`` means + 'not involved', while ``00`` or no mask means 'involved'. 
+ + * For more details of the configuration, refer to the + `cloud filter test plan `_ diff --git a/src/seastar/dpdk/doc/guides/howto/img/flow_bifurcation_overview.svg b/src/seastar/dpdk/doc/guides/howto/img/flow_bifurcation_overview.svg new file mode 100644 index 00000000..4fa27648 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/img/flow_bifurcation_overview.svg @@ -0,0 +1,544 @@ + + + +image/svg+xmlPage-1Sheet.85NICNIC +NIC +Rounded Rectangle.20LINUXLINUX +LINUX +Rounded Rectangle.8Kernel pf driverKernel pf driver +Rounded RectangleFilters support traffic steering to VFFilters support traffic +steering to VF +Rectangle.3Rx Queues (0-N) PFRx Queues +( 0-N ) + PF +Rectangle.4Rx Queues (0-M) VF(vf 0)Rx Queues +( 0-M ) +VF(vf0) +Rectangle.5filtersfilters +Rounded Rectangle.6Tools to program filtersTools to +program filters +2-D word balloonDirector flows to queue index in specified VFinspecified VF +Director flows +to queue index +in specified VF +Rounded Rectangle.24DPDKDPDK +Rounded Rectangle.25SocketSocket +Simple Arrow.44Single arrowheadDynamic connector.70Dynamic connector.81Dynamic connector.83Dynamic connector.84Sheet.98Sheet.109Sheet.110 \ No newline at end of file diff --git a/src/seastar/dpdk/doc/guides/howto/img/ixgbe_bifu_queue_idx.svg b/src/seastar/dpdk/doc/guides/howto/img/ixgbe_bifu_queue_idx.svg new file mode 100644 index 00000000..f7e2bd80 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/img/ixgbe_bifu_queue_idx.svg @@ -0,0 +1,101 @@ + + + + + + + + + + + + + + + Page-1 + + + Rectangle + 0x000000 + + + + + + + + + + 0x000000 + + Rectangle.2 + VF ID + 1 + + + + + + + + + + VF ID + 1 + + Rectangle.3 + Queue Index + + + + + + + + + + Queue Index + + Sheet.4 + 0 + + + + 0 + + Sheet.6 + 31 + + + + 31 + + Sheet.7 + 39 + + + + 39 + + Sheet.8 + 63 + + + + 63 + + diff --git a/src/seastar/dpdk/doc/guides/howto/img/lm_bond_virtio_sriov.svg b/src/seastar/dpdk/doc/guides/howto/img/lm_bond_virtio_sriov.svg new file mode 100644 index 00000000..d913ae01 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/img/lm_bond_virtio_sriov.svg @@ -0,0 +1,666 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + VM 1 + Switch with 10Gb ports + Server 1 + Server 2 + 10 Gb Traffic Generator + VM 2 + Linux, KVM, QEMU + Linux, KVM, QEMU + 10 Gb NIC + 10 Gb NIC + + + 10 Gb NIC + 10 Gb NIC + + DPDK Testpmd App. + bonded device withvirtio and VF slaves + + DPDK Testpmd App. 
+ bonded device withvirtio and VF slaves + Kernel PF driver + Kernel PF driver + SW bridge with Tapand PF connected + NFS ServerVM disk image + + + + + + + + + + + + + + + + SW bridge with Tapand PF connected + 10 Gb Migration Link + + diff --git a/src/seastar/dpdk/doc/guides/howto/img/lm_vhost_user.svg b/src/seastar/dpdk/doc/guides/howto/img/lm_vhost_user.svg new file mode 100644 index 00000000..3601cf11 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/img/lm_vhost_user.svg @@ -0,0 +1,644 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + VM 1 + Switch with 10Gb ports + Server 1 + Server 2 + 10 Gb Traffic Generator + VM 2 + Linux, KVM, QEMU 2.5 + 10 Gb NIC + 10 Gb NIC + + + 10 Gb NIC + 10 Gb NIC + + + DPDK Testpmd App + DPDK virtio PMD's + DPDK PF PMD and vhost_user + DPDK PF PMD and vhost_user + NFS ServerVM disk image + + + + + + + + + + + + + + + + 10 Gb Migration Link + DPDK Testpmd App + DPDK virtio PMD's + Linux, KVM, QEMU 2.5 + + diff --git a/src/seastar/dpdk/doc/guides/howto/img/pvp_2nics.svg b/src/seastar/dpdk/doc/guides/howto/img/pvp_2nics.svg new file mode 100644 index 00000000..517a8008 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/img/pvp_2nics.svg @@ -0,0 +1,556 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + TE + + 10G NIC + Moongen + + DUT + + VM + + 10G NIC + + TestPMD(macswap) + + TestPMD (io) + + + + + + + + 10G NIC + + 10G NIC + + + + + + + + + + diff --git a/src/seastar/dpdk/doc/guides/howto/img/use_models_for_running_dpdk_in_containers.svg b/src/seastar/dpdk/doc/guides/howto/img/use_models_for_running_dpdk_in_containers.svg new file mode 100644 index 00000000..662c2266 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/img/use_models_for_running_dpdk_in_containers.svg @@ -0,0 +1,398 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Page-1 + + + + Rectangle + Container + + + + + + + + + + Container + + Rectangle.3 + + + + + + + + + + Rectangle.4 + + + + + + + + + + Sheet.5 + Host kernel + + + + Host kernel + + Sheet.6 + NIC + + + + NIC + + Rectangle.7 + PF + + + + + + + + + + PF + + Rectangle.8 + VF + + + + + + + + + + VF + + Rectangle.10 + Hardware virtual switch + + + + + + + + + + Hardware virtual switch + + Rectangle.14 + Container + + + + + + + + + + Container + + Rectangle.15 + VF + + + + + + + + + + VF + + Dynamic connector.16 + + + + Ellipse + PF driver + + + + + + + + + + PF driver + + Dynamic connector.19 + + + + Ellipse.20 + DPDK + + + + + + + + + + DPDK + + Dynamic connector.21 + + + + Ellipse.22 + DPDK + + + + + + + + + + DPDK + + Rectangle.23 + Virtual Appliance + + + + + + + + + + Virtual Appliance + + Rectangle.25 + VM + Container + + + + + + + + + + VM + Container + + Rectangle.27 + Container + + + + + + + + + + Container + + Ellipse.28 + DPDK + + + + + + + + + + DPDK + + Rectangle.29 + + + + + + + + + + Sheet.30 + Host kernel + + + + Host kernel + + Rectangle.31 + vSwitch or vRouter + + + + + + + + + + vSwitchorvRouter + + Ellipse.32 + DPDK + + + + + + + + + + DPDK + + Dynamic connector + + + + Dynamic connector.35 + + + + Dynamic connector.36 + + + + Rectangle.37 + + + + + + + + + + Rectangle.38 + + + + + + + + + + Sheet.39 + NIC + + + + NIC + + Dynamic connector.41 + + + + Dynamic connector.42 + + + + Sheet.43 + (1) Slicing + + + + (1) Slicing + + Sheet.44 + (2) Aggregation + + + + (2) Aggregation + + diff --git a/src/seastar/dpdk/doc/guides/howto/img/vf_daemon_overview.svg 
b/src/seastar/dpdk/doc/guides/howto/img/vf_daemon_overview.svg new file mode 100644 index 00000000..d4a47234 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/img/vf_daemon_overview.svg @@ -0,0 +1,440 @@ + + + + + +image/svg+xmlSimple Double Arrow.14VM +VF Application +DPDK +Virtual ethdev +VF driver +Simple Double Arrow.14Simple Double Arrow.14Host +PF Application +DPDK +Ethdev +PF driver + diff --git a/src/seastar/dpdk/doc/guides/howto/img/virtio_user_as_exceptional_path.svg b/src/seastar/dpdk/doc/guides/howto/img/virtio_user_as_exceptional_path.svg new file mode 100644 index 00000000..b231b709 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/img/virtio_user_as_exceptional_path.svg @@ -0,0 +1,207 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Page-1 + + + + Rectangle.23 + + + + + + + + + + Dynamic connector.42 + + + + Rectangle.45 + tap + + + + + + + + + + tap + + Rectangle.46 + vhost ko + + + + + + + + + + vhost ko + + Sheet.47 + Kernel space + + + + Kernel space + + Sheet.48 + User space + + + + User space + + Rectangle.49 + ETHDEV + + + + + + + + + + ETHDEV + + Rectangle.50 + virtio PMD + + + + + + + + + + virtio PMD + + Rectangle.51 + other PMDs + + + + + + + + + + other PMDs + + Rectangle.52 + virtio-user + + + + + + + + + + virtio-user + + Rectangle.53 + vhost adapter + + + + + + + + + + vhost adapter + + Dynamic connector + + + + Dynamic connector.55 + + + + Rectangle.38 + + + + + + + + + + Sheet.57 + NIC + + + + NIC + + Dynamic connector.41 + + + + diff --git a/src/seastar/dpdk/doc/guides/howto/img/virtio_user_for_container_networking.svg b/src/seastar/dpdk/doc/guides/howto/img/virtio_user_for_container_networking.svg new file mode 100644 index 00000000..de808066 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/img/virtio_user_for_container_networking.svg @@ -0,0 +1,685 @@ + +image/svg+xmlPage-1Rectangle.23Rectangle.49ETHDEVethdev +Rectangle.50virtio PMDvirtio PMD +Rectangle.52virtio-user (virtual device)virtio-user(virtual device) +Rectangle.53vhost-user adapterRectangle.38Sheet.57NICNIC +Rectangle.59virtio (PCI device)virtio(PCI device) +Rectangle.60vSwitch or vRoutervSwitchorvRouter +Sheet.61DPDKDPDK +Rectangle.62Sheet.63Contanier/AppContainer/App +Rectangle.64Rectangle.65Sheet.66virtiovirtio +Sheet.67vhostvhost +Dynamic connectorDynamic connector.70Dynamic connector.72Rectangle.71unix socket fileunix socket file +vhost-user +adapter + \ No newline at end of file diff --git a/src/seastar/dpdk/doc/guides/howto/index.rst b/src/seastar/dpdk/doc/guides/howto/index.rst new file mode 100644 index 00000000..a483444d --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/index.rst @@ -0,0 +1,44 @@ +.. BSD LICENSE + Copyright(c) 2016 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +HowTo Guides +============ + +.. toctree:: + :maxdepth: 2 + :numbered: + + lm_bond_virtio_sriov + lm_virtio_vhost_user + flow_bifurcation + pvp_reference_benchmark + vfd + virtio_user_for_container_networking + virtio_user_as_exceptional_path diff --git a/src/seastar/dpdk/doc/guides/howto/lm_bond_virtio_sriov.rst b/src/seastar/dpdk/doc/guides/howto/lm_bond_virtio_sriov.rst new file mode 100644 index 00000000..40dd2cb2 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/lm_bond_virtio_sriov.rst @@ -0,0 +1,713 @@ +.. BSD LICENSE + Copyright(c) 2016 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Live Migration of VM with SR-IOV VF +=================================== + +Overview +-------- + +It is not possible to migrate a Virtual Machine which has an SR-IOV Virtual Function (VF). + +To get around this problem the bonding PMD is used. + +The following sections show an example of how to do this. + +Test Setup +---------- + +A bonded device is created in the VM. +The virtio and VF PMD's are added as slaves to the bonded device. +The VF is set as the primary slave of the bonded device. + +A bridge must be set up on the Host connecting the tap device, which is the +backend of the Virtio device and the Physical Function (PF) device. 
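+
+For illustration, the same bridge setup can be sketched with the ``iproute2``
+tools. The interface names below (``ens3f0`` for the PF, ``tap1`` for the tap
+device and ``virbr0`` for the bridge) are assumptions taken from the sample
+scripts later in this guide, which achieve the same result with ``brctl`` and
+``ifconfig``.
+
+.. code-block:: sh
+
+    # Create the bridge (skip if it already exists), then enslave the PF and
+    # the tap device that backs the VM's virtio-net interface.
+    ip link add name virbr0 type bridge
+    ip link set ens3f0 master virbr0
+    ip link set tap1 master virbr0
+    ip link set dev ens3f0 up
+    ip link set dev tap1 up
+    ip link set dev virbr0 up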
+ +To test the Live Migration two servers with identical operating systems installed are used. +KVM and Qemu 2.3 is also required on the servers. + +In this example, the servers have Niantic and or Fortville NIC's installed. +The NIC's on both servers are connected to a switch +which is also connected to the traffic generator. + +The switch is configured to broadcast traffic on all the NIC ports. +A :ref:`Sample switch configuration ` +can be found in this section. + +The host is running the Kernel PF driver (ixgbe or i40e). + +The ip address of host_server_1 is 10.237.212.46 + +The ip address of host_server_2 is 10.237.212.131 + +.. _figure_lm_bond_virtio_sriov: + +.. figure:: img/lm_bond_virtio_sriov.* + +Live Migration steps +-------------------- + +The sample scripts mentioned in the steps below can be found in the +:ref:`Sample host scripts ` and +:ref:`Sample VM scripts ` sections. + +On host_server_1: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: console + + cd /root/dpdk/host_scripts + ./setup_vf_on_212_46.sh + +For Fortville NIC + +.. code-block:: console + + ./vm_virtio_vf_i40e_212_46.sh + +For Niantic NIC + +.. code-block:: console + + ./vm_virtio_vf_one_212_46.sh + +On host_server_1: Terminal 2 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: console + + cd /root/dpdk/host_scripts + ./setup_bridge_on_212_46.sh + ./connect_to_qemu_mon_on_host.sh + (qemu) + +On host_server_1: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**In VM on host_server_1:** + +.. code-block:: console + + cd /root/dpdk/vm_scripts + ./setup_dpdk_in_vm.sh + ./run_testpmd_bonding_in_vm.sh + + testpmd> show port info all + +The ``mac_addr`` command only works with kernel PF for Niantic + +.. code-block:: console + + testpmd> mac_addr add port 1 vf 0 AA:BB:CC:DD:EE:FF + +The syntax of the ``testpmd`` command is: + +Create bonded device (mode) (socket). + +Mode 1 is active backup. + +Virtio is port 0 (P0). + +VF is port 1 (P1). + +Bonding is port 2 (P2). + +.. code-block:: console + + testpmd> create bonded device 1 0 + Created new bonded device net_bond_testpmd_0 on (port 2). + testpmd> add bonding slave 0 2 + testpmd> add bonding slave 1 2 + testpmd> show bonding config 2 + +The syntax of the ``testpmd`` command is: + +set bonding primary (slave id) (port id) + +Set primary to P1 before starting bonding port. + +.. code-block:: console + + testpmd> set bonding primary 1 2 + testpmd> show bonding config 2 + testpmd> port start 2 + Port 2: 02:09:C0:68:99:A5 + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 10000 Mbps - full-duplex + Port 2 Link Up - speed 10000 Mbps - full-duplex + + testpmd> show bonding config 2 + +Primary is now P1. There are 2 active slaves. + +Use P2 only for forwarding. + +.. code-block:: console + + testpmd> set portlist 2 + testpmd> show config fwd + testpmd> set fwd mac + testpmd> start + testpmd> show bonding config 2 + +Primary is now P1. There are 2 active slaves. + +.. code-block:: console + + testpmd> show port stats all + +VF traffic is seen at P1 and P2. + +.. code-block:: console + + testpmd> clear port stats all + testpmd> set bonding primary 0 2 + testpmd> remove bonding slave 1 2 + testpmd> show bonding config 2 + +Primary is now P0. There is 1 active slave. + +.. code-block:: console + + testpmd> clear port stats all + testpmd> show port stats all + +No VF traffic is seen at P0 and P2, VF MAC address still present. + +.. 
code-block:: console + + testpmd> port stop 1 + testpmd> port close 1 + +Port close should remove VF MAC address, it does not remove perm_addr. + +The ``mac_addr`` command only works with the kernel PF for Niantic. + +.. code-block:: console + + testpmd> mac_addr remove 1 AA:BB:CC:DD:EE:FF + testpmd> port detach 1 + Port '0000:00:04.0' is detached. Now total ports is 2 + testpmd> show port stats all + +No VF traffic is seen at P0 and P2. + +On host_server_1: Terminal 2 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: console + + (qemu) device_del vf1 + + +On host_server_1: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**In VM on host_server_1:** + +.. code-block:: console + + testpmd> show bonding config 2 + +Primary is now P0. There is 1 active slave. + +.. code-block:: console + + testpmd> show port info all + testpmd> show port stats all + +On host_server_2: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: console + + cd /root/dpdk/host_scripts + ./setup_vf_on_212_131.sh + ./vm_virtio_one_migrate.sh + +On host_server_2: Terminal 2 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: console + + ./setup_bridge_on_212_131.sh + ./connect_to_qemu_mon_on_host.sh + (qemu) info status + VM status: paused (inmigrate) + (qemu) + +On host_server_1: Terminal 2 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Check that the switch is up before migrating. + +.. code-block:: console + + (qemu) migrate tcp:10.237.212.131:5555 + (qemu) info status + VM status: paused (postmigrate) + +For the Niantic NIC. + +.. code-block:: console + + (qemu) info migrate + capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off + Migration status: completed + total time: 11834 milliseconds + downtime: 18 milliseconds + setup: 3 milliseconds + transferred ram: 389137 kbytes + throughput: 269.49 mbps + remaining ram: 0 kbytes + total ram: 1590088 kbytes + duplicate: 301620 pages + skipped: 0 pages + normal: 96433 pages + normal bytes: 385732 kbytes + dirty sync count: 2 + (qemu) quit + +For the Fortville NIC. + +.. code-block:: console + + (qemu) info migrate + capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off + Migration status: completed + total time: 11619 milliseconds + downtime: 5 milliseconds + setup: 7 milliseconds + transferred ram: 379699 kbytes + throughput: 267.82 mbps + remaining ram: 0 kbytes + total ram: 1590088 kbytes + duplicate: 303985 pages + skipped: 0 pages + normal: 94073 pages + normal bytes: 376292 kbytes + dirty sync count: 2 + (qemu) quit + +On host_server_2: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**In VM on host_server_2:** + + Hit Enter key. This brings the user to the testpmd prompt. + +.. code-block:: console + + testpmd> + +On host_server_2: Terminal 2 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: console + + (qemu) info status + VM status: running + +For the Niantic NIC. + +.. code-block:: console + + (qemu) device_add pci-assign,host=06:10.0,id=vf1 + +For the Fortville NIC. + +.. code-block:: console + + (qemu) device_add pci-assign,host=03:02.0,id=vf1 + +On host_server_2: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**In VM on host_server_2:** + +.. code-block:: console + + testomd> show port info all + testpmd> show port stats all + testpmd> show bonding config 2 + testpmd> port attach 0000:00:04.0 + Port 1 is attached. + Now total ports is 3 + Done + + testpmd> port start 1 + +The ``mac_addr`` command only works with the Kernel PF for Niantic. + +.. 
code-block:: console + + testpmd> mac_addr add port 1 vf 0 AA:BB:CC:DD:EE:FF + testpmd> show port stats all. + testpmd> show config fwd + testpmd> show bonding config 2 + testpmd> add bonding slave 1 2 + testpmd> set bonding primary 1 2 + testpmd> show bonding config 2 + testpmd> show port stats all + +VF traffic is seen at P1 (VF) and P2 (Bonded device). + +.. code-block:: console + + testpmd> remove bonding slave 0 2 + testpmd> show bonding config 2 + testpmd> port stop 0 + testpmd> port close 0 + testpmd> port detach 0 + Port '0000:00:03.0' is detached. Now total ports is 2 + + testpmd> show port info all + testpmd> show config fwd + testpmd> show port stats all + +VF traffic is seen at P1 (VF) and P2 (Bonded device). + +.. _lm_bond_virtio_sriov_host_scripts: + +Sample host scripts +------------------- + +setup_vf_on_212_46.sh +~~~~~~~~~~~~~~~~~~~~~ +Set up Virtual Functions on host_server_1 + +.. code-block:: sh + + #!/bin/sh + # This script is run on the host 10.237.212.46 to setup the VF + + # set up Niantic VF + cat /sys/bus/pci/devices/0000\:09\:00.0/sriov_numvfs + echo 1 > /sys/bus/pci/devices/0000\:09\:00.0/sriov_numvfs + cat /sys/bus/pci/devices/0000\:09\:00.0/sriov_numvfs + rmmod ixgbevf + + # set up Fortville VF + cat /sys/bus/pci/devices/0000\:02\:00.0/sriov_numvfs + echo 1 > /sys/bus/pci/devices/0000\:02\:00.0/sriov_numvfs + cat /sys/bus/pci/devices/0000\:02\:00.0/sriov_numvfs + rmmod i40evf + +vm_virtio_vf_one_212_46.sh +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Setup Virtual Machine on host_server_1 + +.. code-block:: sh + + #!/bin/sh + + # Path to KVM tool + KVM_PATH="/usr/bin/qemu-system-x86_64" + + # Guest Disk image + DISK_IMG="/home/username/disk_image/virt1_sml.disk" + + # Number of guest cpus + VCPUS_NR="4" + + # Memory + MEM=1536 + + taskset -c 1-5 $KVM_PATH \ + -enable-kvm \ + -m $MEM \ + -smp $VCPUS_NR \ + -cpu host \ + -name VM1 \ + -no-reboot \ + -net none \ + -vnc none -nographic \ + -hda $DISK_IMG \ + -netdev type=tap,id=net1,script=no,downscript=no,ifname=tap1 \ + -device virtio-net-pci,netdev=net1,mac=CC:BB:BB:BB:BB:BB \ + -device pci-assign,host=09:10.0,id=vf1 \ + -monitor telnet::3333,server,nowait + +setup_bridge_on_212_46.sh +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Setup bridge on host_server_1 + +.. code-block:: sh + + #!/bin/sh + # This script is run on the host 10.237.212.46 to setup the bridge + # for the Tap device and the PF device. + # This enables traffic to go from the PF to the Tap to the Virtio PMD in the VM. + + # ens3f0 is the Niantic NIC + # ens6f0 is the Fortville NIC + + ifconfig ens3f0 down + ifconfig tap1 down + ifconfig ens6f0 down + ifconfig virbr0 down + + brctl show virbr0 + brctl addif virbr0 ens3f0 + brctl addif virbr0 ens6f0 + brctl addif virbr0 tap1 + brctl show virbr0 + + ifconfig ens3f0 up + ifconfig tap1 up + ifconfig ens6f0 up + ifconfig virbr0 up + +connect_to_qemu_mon_on_host.sh +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: sh + + #!/bin/sh + # This script is run on both hosts when the VM is up, + # to connect to the Qemu Monitor. + + telnet 0 3333 + +setup_vf_on_212_131.sh +~~~~~~~~~~~~~~~~~~~~~~ + +Set up Virtual Functions on host_server_2 + +.. 
code-block:: sh + + #!/bin/sh + # This script is run on the host 10.237.212.131 to setup the VF + + # set up Niantic VF + cat /sys/bus/pci/devices/0000\:06\:00.0/sriov_numvfs + echo 1 > /sys/bus/pci/devices/0000\:06\:00.0/sriov_numvfs + cat /sys/bus/pci/devices/0000\:06\:00.0/sriov_numvfs + rmmod ixgbevf + + # set up Fortville VF + cat /sys/bus/pci/devices/0000\:03\:00.0/sriov_numvfs + echo 1 > /sys/bus/pci/devices/0000\:03\:00.0/sriov_numvfs + cat /sys/bus/pci/devices/0000\:03\:00.0/sriov_numvfs + rmmod i40evf + +vm_virtio_one_migrate.sh +~~~~~~~~~~~~~~~~~~~~~~~~ + +Setup Virtual Machine on host_server_2 + +.. code-block:: sh + + #!/bin/sh + # Start the VM on host_server_2 with the same parameters except without the VF + # parameters, as the VM on host_server_1, in migration-listen mode + # (-incoming tcp:0:5555) + + # Path to KVM tool + KVM_PATH="/usr/bin/qemu-system-x86_64" + + # Guest Disk image + DISK_IMG="/home/username/disk_image/virt1_sml.disk" + + # Number of guest cpus + VCPUS_NR="4" + + # Memory + MEM=1536 + + taskset -c 1-5 $KVM_PATH \ + -enable-kvm \ + -m $MEM \ + -smp $VCPUS_NR \ + -cpu host \ + -name VM1 \ + -no-reboot \ + -net none \ + -vnc none -nographic \ + -hda $DISK_IMG \ + -netdev type=tap,id=net1,script=no,downscript=no,ifname=tap1 \ + -device virtio-net-pci,netdev=net1,mac=CC:BB:BB:BB:BB:BB \ + -incoming tcp:0:5555 \ + -monitor telnet::3333,server,nowait + +setup_bridge_on_212_131.sh +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Setup bridge on host_server_2 + +.. code-block:: sh + + #!/bin/sh + # This script is run on the host to setup the bridge + # for the Tap device and the PF device. + # This enables traffic to go from the PF to the Tap to the Virtio PMD in the VM. + + # ens4f0 is the Niantic NIC + # ens5f0 is the Fortville NIC + + ifconfig ens4f0 down + ifconfig tap1 down + ifconfig ens5f0 down + ifconfig virbr0 down + + brctl show virbr0 + brctl addif virbr0 ens4f0 + brctl addif virbr0 ens5f0 + brctl addif virbr0 tap1 + brctl show virbr0 + + ifconfig ens4f0 up + ifconfig tap1 up + ifconfig ens5f0 up + ifconfig virbr0 up + +.. _lm_bond_virtio_sriov_vm_scripts: + +Sample VM scripts +----------------- + +setup_dpdk_in_vm.sh +~~~~~~~~~~~~~~~~~~~ + +Set up DPDK in the Virtual Machine + +.. code-block:: sh + + #!/bin/sh + # this script matches the vm_virtio_vf_one script + # virtio port is 03 + # vf port is 04 + + cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + + ifconfig -a + /root/dpdk/usertools/dpdk-devbind.py --status + + rmmod virtio-pci ixgbevf + + modprobe uio + insmod /root/dpdk/x86_64-default-linuxapp-gcc/kmod/igb_uio.ko + + /root/dpdk/usertools/dpdk-devbind.py -b igb_uio 0000:00:03.0 + /root/dpdk/usertools/dpdk-devbind.py -b igb_uio 0000:00:04.0 + + /root/dpdk/usertools/dpdk-devbind.py --status + +run_testpmd_bonding_in_vm.sh +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Run testpmd in the Virtual Machine. + +.. code-block:: sh + + #!/bin/sh + # Run testpmd in the VM + + # The test system has 8 cpus (0-7), use cpus 2-7 for VM + # Use taskset -pc + + # use for bonding of virtio and vf tests in VM + + /root/dpdk/x86_64-default-linuxapp-gcc/app/testpmd \ + -l 0-3 -n 4 --socket-mem 350 -- --i --port-topology=chained + +.. _lm_bond_virtio_sriov_switch_conf: + +Sample switch configuration +--------------------------- + +The Intel switch is used to connect the traffic generator to the +NIC's on host_server_1 and host_server_2. 
+ +In order to run the switch configuration two console windows are required. + +Log in as root in both windows. + +TestPointShared, run_switch.sh and load /root/switch_config must be executed +in the sequence below. + +On Switch: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~ + +run TestPointShared + +.. code-block:: console + + /usr/bin/TestPointShared + +On Switch: Terminal 2 +~~~~~~~~~~~~~~~~~~~~~ + +execute run_switch.sh + +.. code-block:: console + + /root/run_switch.sh + +On Switch: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~ + +load switch configuration + +.. code-block:: console + + load /root/switch_config + +Sample switch configuration script +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The ``/root/switch_config`` script: + +.. code-block:: sh + + # TestPoint History + show port 1,5,9,13,17,21,25 + set port 1,5,9,13,17,21,25 up + show port 1,5,9,13,17,21,25 + del acl 1 + create acl 1 + create acl-port-set + create acl-port-set + add port port-set 1 0 + add port port-set 5,9,13,17,21,25 1 + create acl-rule 1 1 + add acl-rule condition 1 1 port-set 1 + add acl-rule action 1 1 redirect 1 + apply acl + create vlan 1000 + add vlan port 1000 1,5,9,13,17,21,25 + set vlan tagging 1000 1,5,9,13,17,21,25 tag + set switch config flood_ucast fwd + show port stats all 1,5,9,13,17,21,25 diff --git a/src/seastar/dpdk/doc/guides/howto/lm_virtio_vhost_user.rst b/src/seastar/dpdk/doc/guides/howto/lm_virtio_vhost_user.rst new file mode 100644 index 00000000..9402ed8d --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/lm_virtio_vhost_user.rst @@ -0,0 +1,469 @@ +.. BSD LICENSE + Copyright(c) 2016 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +Live Migration of VM with Virtio on host running vhost_user +=========================================================== + +Overview +-------- + +Live Migration of a VM with DPDK Virtio PMD on a host which is +running the Vhost sample application (vhost-switch) and using the DPDK PMD (ixgbe or i40e). + +The Vhost sample application uses VMDQ so SRIOV must be disabled on the NIC's. 
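+
+Before starting the vhost sample application it is worth confirming that no
+VFs are configured on the ports to be used. A minimal sketch using the
+generic PCI sysfs interface is shown below, assuming the Fortville BDF
+``0000:02:00.0`` from host_server_1; the ``reset_vf`` scripts later in this
+guide achieve the same through the driver's ``max_vfs`` attribute.
+
+.. code-block:: console
+
+    # Show the number of VFs currently enabled on the PF, then clear them.
+    cat /sys/bus/pci/devices/0000:02:00.0/sriov_numvfs
+    echo 0 > /sys/bus/pci/devices/0000:02:00.0/sriov_numvfs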
+ +The following sections show an example of how to do this migration. + +Test Setup +---------- + +To test the Live Migration two servers with identical operating systems installed are used. +KVM and QEMU is also required on the servers. + +QEMU 2.5 is required for Live Migration of a VM with vhost_user running on the hosts. + +In this example, the servers have Niantic and or Fortville NIC's installed. +The NIC's on both servers are connected to a switch +which is also connected to the traffic generator. + +The switch is configured to broadcast traffic on all the NIC ports. + +The ip address of host_server_1 is 10.237.212.46 + +The ip address of host_server_2 is 10.237.212.131 + +.. _figure_lm_vhost_user: + +.. figure:: img/lm_vhost_user.* + +Live Migration steps +-------------------- + +The sample scripts mentioned in the steps below can be found in the +:ref:`Sample host scripts ` and +:ref:`Sample VM scripts ` sections. + +On host_server_1: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Setup DPDK on host_server_1 + +.. code-block:: console + + cd /root/dpdk/host_scripts + ./setup_dpdk_on_host.sh + +On host_server_1: Terminal 2 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Bind the Niantic or Fortville NIC to igb_uio on host_server_1. + +For Fortville NIC. + +.. code-block:: console + + cd /root/dpdk/usertools + ./dpdk-devbind.py -b igb_uio 0000:02:00.0 + +For Niantic NIC. + +.. code-block:: console + + cd /root/dpdk/usertools + ./dpdk-devbind.py -b igb_uio 0000:09:00.0 + +On host_server_1: Terminal 3 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For Fortville and Niantic NIC's reset SRIOV and run the +vhost_user sample application (vhost-switch) on host_server_1. + +.. code-block:: console + + cd /root/dpdk/host_scripts + ./reset_vf_on_212_46.sh + ./run_vhost_switch_on_host.sh + +On host_server_1: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Start the VM on host_server_1 + +.. code-block:: console + + ./vm_virtio_vhost_user.sh + +On host_server_1: Terminal 4 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Connect to the QEMU monitor on host_server_1. + +.. code-block:: console + + cd /root/dpdk/host_scripts + ./connect_to_qemu_mon_on_host.sh + (qemu) + +On host_server_1: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**In VM on host_server_1:** + +Setup DPDK in the VM and run testpmd in the VM. + +.. code-block:: console + + cd /root/dpdk/vm_scripts + ./setup_dpdk_in_vm.sh + ./run_testpmd_in_vm.sh + + testpmd> show port info all + testpmd> set fwd mac retry + testpmd> start tx_first + testpmd> show port stats all + +Virtio traffic is seen at P1 and P2. + +On host_server_2: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Set up DPDK on the host_server_2. + +.. code-block:: console + + cd /root/dpdk/host_scripts + ./setup_dpdk_on_host.sh + +On host_server_2: Terminal 2 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Bind the Niantic or Fortville NIC to igb_uio on host_server_2. + +For Fortville NIC. + +.. code-block:: console + + cd /root/dpdk/usertools + ./dpdk-devbind.py -b igb_uio 0000:03:00.0 + +For Niantic NIC. + +.. code-block:: console + + cd /root/dpdk/usertools + ./dpdk-devbind.py -b igb_uio 0000:06:00.0 + +On host_server_2: Terminal 3 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For Fortville and Niantic NIC's reset SRIOV, and run +the vhost_user sample application on host_server_2. + +.. code-block:: console + + cd /root/dpdk/host_scripts + ./reset_vf_on_212_131.sh + ./run_vhost_switch_on_host.sh + +On host_server_2: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Start the VM on host_server_2. + +.. 
code-block:: console + + ./vm_virtio_vhost_user_migrate.sh + +On host_server_2: Terminal 4 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Connect to the QEMU monitor on host_server_2. + +.. code-block:: console + + cd /root/dpdk/host_scripts + ./connect_to_qemu_mon_on_host.sh + (qemu) info status + VM status: paused (inmigrate) + (qemu) + +On host_server_1: Terminal 4 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Check that switch is up before migrating the VM. + +.. code-block:: console + + (qemu) migrate tcp:10.237.212.131:5555 + (qemu) info status + VM status: paused (postmigrate) + + (qemu) info migrate + capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off + Migration status: completed + total time: 11619 milliseconds + downtime: 5 milliseconds + setup: 7 milliseconds + transferred ram: 379699 kbytes + throughput: 267.82 mbps + remaining ram: 0 kbytes + total ram: 1590088 kbytes + duplicate: 303985 pages + skipped: 0 pages + normal: 94073 pages + normal bytes: 376292 kbytes + dirty sync count: 2 + (qemu) quit + +On host_server_2: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**In VM on host_server_2:** + + Hit Enter key. This brings the user to the testpmd prompt. + +.. code-block:: console + + testpmd> + +On host_server_2: Terminal 4 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**In QEMU monitor on host_server_2** + +.. code-block:: console + + (qemu) info status + VM status: running + +On host_server_2: Terminal 1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**In VM on host_server_2:** + +.. code-block:: console + + testomd> show port info all + testpmd> show port stats all + +Virtio traffic is seen at P0 and P1. + + +.. _lm_virtio_vhost_user_host_scripts: + +Sample host scripts +------------------- + +reset_vf_on_212_46.sh +~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: sh + + #!/bin/sh + # This script is run on the host 10.237.212.46 to reset SRIOV + + # BDF for Fortville NIC is 0000:02:00.0 + cat /sys/bus/pci/devices/0000\:02\:00.0/max_vfs + echo 0 > /sys/bus/pci/devices/0000\:02\:00.0/max_vfs + cat /sys/bus/pci/devices/0000\:02\:00.0/max_vfs + + # BDF for Niantic NIC is 0000:09:00.0 + cat /sys/bus/pci/devices/0000\:09\:00.0/max_vfs + echo 0 > /sys/bus/pci/devices/0000\:09\:00.0/max_vfs + cat /sys/bus/pci/devices/0000\:09\:00.0/max_vfs + +vm_virtio_vhost_user.sh +~~~~~~~~~~~~~~~~~~~~~~~ + +.. 
code-block:: sh + + #/bin/sh + # Script for use with vhost_user sample application + # The host system has 8 cpu's (0-7) + + # Path to KVM tool + KVM_PATH="/usr/bin/qemu-system-x86_64" + + # Guest Disk image + DISK_IMG="/home/user/disk_image/virt1_sml.disk" + + # Number of guest cpus + VCPUS_NR="6" + + # Memory + MEM=1024 + + VIRTIO_OPTIONS="csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off" + + # Socket Path + SOCKET_PATH="/root/dpdk/host_scripts/usvhost" + + taskset -c 2-7 $KVM_PATH \ + -enable-kvm \ + -m $MEM \ + -smp $VCPUS_NR \ + -object memory-backend-file,id=mem,size=1024M,mem-path=/mnt/huge,share=on \ + -numa node,memdev=mem,nodeid=0 \ + -cpu host \ + -name VM1 \ + -no-reboot \ + -net none \ + -vnc none \ + -nographic \ + -hda $DISK_IMG \ + -chardev socket,id=chr0,path=$SOCKET_PATH \ + -netdev type=vhost-user,id=net1,chardev=chr0,vhostforce \ + -device virtio-net-pci,netdev=net1,mac=CC:BB:BB:BB:BB:BB,$VIRTIO_OPTIONS \ + -chardev socket,id=chr1,path=$SOCKET_PATH \ + -netdev type=vhost-user,id=net2,chardev=chr1,vhostforce \ + -device virtio-net-pci,netdev=net2,mac=DD:BB:BB:BB:BB:BB,$VIRTIO_OPTIONS \ + -monitor telnet::3333,server,nowait + +connect_to_qemu_mon_on_host.sh +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: sh + + #!/bin/sh + # This script is run on both hosts when the VM is up, + # to connect to the Qemu Monitor. + + telnet 0 3333 + +reset_vf_on_212_131.sh +~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: sh + + #!/bin/sh + # This script is run on the host 10.237.212.131 to reset SRIOV + + # BDF for Ninatic NIC is 0000:06:00.0 + cat /sys/bus/pci/devices/0000\:06\:00.0/max_vfs + echo 0 > /sys/bus/pci/devices/0000\:06\:00.0/max_vfs + cat /sys/bus/pci/devices/0000\:06\:00.0/max_vfs + + # BDF for Fortville NIC is 0000:03:00.0 + cat /sys/bus/pci/devices/0000\:03\:00.0/max_vfs + echo 0 > /sys/bus/pci/devices/0000\:03\:00.0/max_vfs + cat /sys/bus/pci/devices/0000\:03\:00.0/max_vfs + +vm_virtio_vhost_user_migrate.sh +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: sh + + #/bin/sh + # Script for use with vhost user sample application + # The host system has 8 cpu's (0-7) + + # Path to KVM tool + KVM_PATH="/usr/bin/qemu-system-x86_64" + + # Guest Disk image + DISK_IMG="/home/user/disk_image/virt1_sml.disk" + + # Number of guest cpus + VCPUS_NR="6" + + # Memory + MEM=1024 + + VIRTIO_OPTIONS="csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off" + + # Socket Path + SOCKET_PATH="/root/dpdk/host_scripts/usvhost" + + taskset -c 2-7 $KVM_PATH \ + -enable-kvm \ + -m $MEM \ + -smp $VCPUS_NR \ + -object memory-backend-file,id=mem,size=1024M,mem-path=/mnt/huge,share=on \ + -numa node,memdev=mem,nodeid=0 \ + -cpu host \ + -name VM1 \ + -no-reboot \ + -net none \ + -vnc none \ + -nographic \ + -hda $DISK_IMG \ + -chardev socket,id=chr0,path=$SOCKET_PATH \ + -netdev type=vhost-user,id=net1,chardev=chr0,vhostforce \ + -device virtio-net-pci,netdev=net1,mac=CC:BB:BB:BB:BB:BB,$VIRTIO_OPTIONS \ + -chardev socket,id=chr1,path=$SOCKET_PATH \ + -netdev type=vhost-user,id=net2,chardev=chr1,vhostforce \ + -device virtio-net-pci,netdev=net2,mac=DD:BB:BB:BB:BB:BB,$VIRTIO_OPTIONS \ + -incoming tcp:0:5555 \ + -monitor telnet::3333,server,nowait + +.. _lm_virtio_vhost_user_vm_scripts: + +Sample VM scripts +----------------- + +setup_dpdk_virtio_in_vm.sh +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. 
code-block:: sh + + #!/bin/sh + # this script matches the vm_virtio_vhost_user script + # virtio port is 03 + # virtio port is 04 + + cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + + ifconfig -a + /root/dpdk/usertools/dpdk-devbind.py --status + + rmmod virtio-pci + + modprobe uio + insmod /root/dpdk/x86_64-default-linuxapp-gcc/kmod/igb_uio.ko + + /root/dpdk/usertools/dpdk-devbind.py -b igb_uio 0000:00:03.0 + /root/dpdk/usertools/dpdk-devbind.py -b igb_uio 0000:00:04.0 + + /root/dpdk/usertools/dpdk-devbind.py --status + +run_testpmd_in_vm.sh +~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: sh + + #!/bin/sh + # Run testpmd for use with vhost_user sample app. + # test system has 8 cpus (0-7), use cpus 2-7 for VM + + /root/dpdk/x86_64-default-linuxapp-gcc/app/testpmd \ + -l 0-5 -n 4 --socket-mem 350 -- --burst=64 --i --disable-hw-vlan-filter diff --git a/src/seastar/dpdk/doc/guides/howto/pvp_reference_benchmark.rst b/src/seastar/dpdk/doc/guides/howto/pvp_reference_benchmark.rst new file mode 100644 index 00000000..228b4a25 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/pvp_reference_benchmark.rst @@ -0,0 +1,398 @@ +.. BSD LICENSE + Copyright(c) 2016 Red Hat, Inc. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +PVP reference benchmark setup using testpmd +=========================================== + +This guide lists the steps required to setup a PVP benchmark using testpmd as +a simple forwarder between NICs and Vhost interfaces. The goal of this setup +is to have a reference PVP benchmark without using external vSwitches (OVS, +VPP, ...) to make it easier to obtain reproducible results and to facilitate +continuous integration testing. + +The guide covers two ways of launching the VM, either by directly calling the +QEMU command line, or by relying on libvirt. It has been tested with DPDK +v16.11 using RHEL7 for both host and guest. + + +Setup overview +-------------- + +.. _figure_pvp_2nics: + +.. 
figure:: img/pvp_2nics.* + + PVP setup using 2 NICs + +In this diagram, each red arrow represents one logical core. This use-case +requires 6 dedicated logical cores. A forwarding configuration with a single +NIC is also possible, requiring 3 logical cores. + + +Host setup +---------- + +In this setup, we isolate 6 cores (from CPU2 to CPU7) on the same NUMA +node. Two cores are assigned to the VM vCPUs running testpmd and four are +assigned to testpmd on the host. + + +Host tuning +~~~~~~~~~~~ + +#. On BIOS, disable turbo-boost and hyper-threads. + +#. Append these options to Kernel command line: + + .. code-block:: console + + intel_pstate=disable mce=ignore_ce default_hugepagesz=1G hugepagesz=1G hugepages=6 isolcpus=2-7 rcu_nocbs=2-7 nohz_full=2-7 iommu=pt intel_iommu=on + +#. Disable hyper-threads at runtime if necessary or if BIOS is not accessible: + + .. code-block:: console + + cat /sys/devices/system/cpu/cpu*[0-9]/topology/thread_siblings_list \ + | sort | uniq \ + | awk -F, '{system("echo 0 > /sys/devices/system/cpu/cpu"$2"/online")}' + +#. Disable NMIs: + + .. code-block:: console + + echo 0 > /proc/sys/kernel/nmi_watchdog + +#. Exclude isolated CPUs from the writeback cpumask: + + .. code-block:: console + + echo ffffff03 > /sys/bus/workqueue/devices/writeback/cpumask + +#. Isolate CPUs from IRQs: + + .. code-block:: console + + clear_mask=0xfc #Isolate CPU2 to CPU7 from IRQs + for i in /proc/irq/*/smp_affinity + do + echo "obase=16;$(( 0x$(cat $i) & ~$clear_mask ))" | bc > $i + done + + +Qemu build +~~~~~~~~~~ + +Build Qemu: + + .. code-block:: console + + git clone git://git.qemu.org/qemu.git + cd qemu + mkdir bin + cd bin + ../configure --target-list=x86_64-softmmu + make + + +DPDK build +~~~~~~~~~~ + +Build DPDK: + + .. code-block:: console + + git clone git://dpdk.org/dpdk + cd dpdk + export RTE_SDK=$PWD + make install T=x86_64-native-linuxapp-gcc DESTDIR=install + + +Testpmd launch +~~~~~~~~~~~~~~ + +#. Assign NICs to DPDK: + + .. code-block:: console + + modprobe vfio-pci + $RTE_SDK/install/sbin/dpdk-devbind -b vfio-pci 0000:11:00.0 0000:11:00.1 + + .. Note:: + + The Sandy Bridge family seems to have some IOMMU limitations giving poor + performance results. To achieve good performance on these machines + consider using UIO instead. + +#. Launch the testpmd application: + + .. code-block:: console + + $RTE_SDK/install/bin/testpmd -l 0,2,3,4,5 --socket-mem=1024 -n 4 \ + --vdev 'net_vhost0,iface=/tmp/vhost-user1' \ + --vdev 'net_vhost1,iface=/tmp/vhost-user2' -- \ + --portmask=f --disable-hw-vlan -i --rxq=1 --txq=1 \ + --nb-cores=4 --forward-mode=io + + With this command, isolated CPUs 2 to 5 will be used as lcores for PMD threads. + +#. In testpmd interactive mode, set the portlist to obtain the correct port + chaining: + + .. code-block:: console + + set portlist 0,2,1,3 + start + + +VM launch +~~~~~~~~~ + +The VM may be launched either by calling QEMU directly, or by using libvirt. + +Qemu way +^^^^^^^^ + +Launch QEMU with two Virtio-net devices paired to the vhost-user sockets +created by testpmd. Below example uses default Virtio-net options, but options +may be specified, for example to disable mergeable buffers or indirect +descriptors. + + .. 
code-block:: console

      <QEMU path>/bin/x86_64-softmmu/qemu-system-x86_64 \
          -enable-kvm -cpu host -m 3072 -smp 3 \
          -chardev socket,id=char0,path=/tmp/vhost-user1 \
          -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
          -device virtio-net-pci,netdev=mynet1,mac=52:54:00:02:d9:01,addr=0x10 \
          -chardev socket,id=char1,path=/tmp/vhost-user2 \
          -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce \
          -device virtio-net-pci,netdev=mynet2,mac=52:54:00:02:d9:02,addr=0x11 \
          -object memory-backend-file,id=mem,size=3072M,mem-path=/dev/hugepages,share=on \
          -numa node,memdev=mem -mem-prealloc \
          -net user,hostfwd=tcp::1002$1-:22 -net nic \
          -qmp unix:/tmp/qmp.socket,server,nowait \
          -monitor stdio <vm_image>.qcow2

You can use the ``qmp-vcpu-pin`` script to pin vCPUs.

It can be used as follows, for example to pin 3 vCPUs to CPUs 1, 6 and 7,
where isolated CPUs 6 and 7 will be used as lcores for Virtio PMDs:

   .. code-block:: console

      export PYTHONPATH=$PYTHONPATH:<QEMU path>/scripts/qmp
      ./qmp-vcpu-pin -s /tmp/qmp.socket 1 6 7

Libvirt way
^^^^^^^^^^^

Some initial steps are required for libvirt to be able to connect to testpmd's
sockets.

First, the SELinux policy needs to be set to permissive, since testpmd is
generally run as root (note: a reboot is required):

   .. code-block:: console

      cat /etc/selinux/config

      # This file controls the state of SELinux on the system.
      # SELINUX= can take one of these three values:
      #     enforcing - SELinux security policy is enforced.
      #     permissive - SELinux prints warnings instead of enforcing.
      #     disabled - No SELinux policy is loaded.
      SELINUX=permissive

      # SELINUXTYPE= can take one of three two values:
      #     targeted - Targeted processes are protected,
      #     minimum - Modification of targeted policy.
      #               Only selected processes are protected.
      #     mls - Multi Level Security protection.
      SELINUXTYPE=targeted

Also, Qemu needs to be run as root, which has to be specified in
``/etc/libvirt/qemu.conf``:

   .. code-block:: console

      user = "root"

Once the domain is created, the most important settings to verify in its XML
are the hugepages, the vCPU pinning and the Virtio PCI devices: 3145728 KiB of
memory (current and maximum) backed by hugepages, 3 statically placed vCPUs
pinned as described above, an ``hvm`` OS type, and the two vhost-user backed
Virtio network devices.
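Before moving on to the guest, it can be worth confirming that the pinning and
hugepage backing really took effect. The following is only a sketch: it assumes
the domain was named ``pvp-vm`` (a placeholder name) and that the standard
``virsh`` tool is available on the host.

   .. code-block:: console

      virsh start pvp-vm

      # List the vCPU to physical CPU pinning of the running domain
      virsh vcpupin pvp-vm

      # Confirm that guest memory is being taken from the 1G hugepage pool
      grep HugePages /proc/meminfo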
+ + + + + +Guest setup +----------- + + +Guest tuning +~~~~~~~~~~~~ + +#. Append these options to the Kernel command line: + + .. code-block:: console + + default_hugepagesz=1G hugepagesz=1G hugepages=1 intel_iommu=on iommu=pt isolcpus=1,2 rcu_nocbs=1,2 nohz_full=1,2 + +#. Disable NMIs: + + .. code-block:: console + + echo 0 > /proc/sys/kernel/nmi_watchdog + +#. Exclude isolated CPU1 and CPU2 from the writeback cpumask: + + .. code-block:: console + + echo 1 > /sys/bus/workqueue/devices/writeback/cpumask + +#. Isolate CPUs from IRQs: + + .. code-block:: console + + clear_mask=0x6 #Isolate CPU1 and CPU2 from IRQs + for i in /proc/irq/*/smp_affinity + do + echo "obase=16;$(( 0x$(cat $i) & ~$clear_mask ))" | bc > $i + done + + +DPDK build +~~~~~~~~~~ + +Build DPDK: + + .. code-block:: console + + git clone git://dpdk.org/dpdk + cd dpdk + export RTE_SDK=$PWD + make install T=x86_64-native-linuxapp-gcc DESTDIR=install + + +Testpmd launch +~~~~~~~~~~~~~~ + +Probe vfio module without iommu: + + .. code-block:: console + + modprobe -r vfio_iommu_type1 + modprobe -r vfio + modprobe vfio enable_unsafe_noiommu_mode=1 + cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode + modprobe vfio-pci + +Bind the virtio-net devices to DPDK: + + .. code-block:: console + + $RTE_SDK/usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0 0000:00:11.0 + +Start testpmd: + + .. code-block:: console + + $RTE_SDK/install/bin/testpmd -l 0,1,2 --socket-mem 1024 -n 4 \ + --proc-type auto --file-prefix pg -- \ + --portmask=3 --forward-mode=macswap --port-topology=chained \ + --disable-hw-vlan --disable-rss -i --rxq=1 --txq=1 \ + --rxd=256 --txd=256 --nb-cores=2 --auto-start + +Results template +---------------- + +Below template should be used when sharing results: + + .. code-block:: none + + Traffic Generator: + Acceptable Loss: % + Validation run time: min + Host DPDK version/commit: + Guest DPDK version/commit: + Patches applied: + QEMU version/commit: + Virtio features: + CPU: , + NIC: + Result: Mpps diff --git a/src/seastar/dpdk/doc/guides/howto/vfd.rst b/src/seastar/dpdk/doc/guides/howto/vfd.rst new file mode 100644 index 00000000..6f083b87 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/vfd.rst @@ -0,0 +1,407 @@ +.. BSD LICENSE + Copyright(c) 2017 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +VF daemon (VFd) +=============== + +VFd (the VF daemon) is a mechanism which can be used to configure features on +a VF (SR-IOV Virtual Function) without direct access to the PF (SR-IOV +Physical Function). VFd is an *EXPERIMENTAL* feature which can only be used in +the scenario of DPDK PF with a DPDK VF. If the PF port is driven by the Linux +kernel driver then the VFd feature will not work. Currently VFd is only +supported by the ixgbe and i40e drivers. + +In general VF features cannot be configured directly by an end user +application since they are under the control of the PF. The normal approach to +configuring a feature on a VF is that an application would call the APIs +provided by the VF driver. If the required feature cannot be configured by the +VF directly (the most common case) the VF sends a message to the PF through +the mailbox on ixgbe and i40e. This means that the availability of the feature +depends on whether the appropriate mailbox messages are defined. + +DPDK leverages the mailbox interface defined by the Linux kernel driver so +that compatibility with the kernel driver can be guaranteed. The downside of +this approach is that the availability of messages supported by the kernel +become a limitation when the user wants to configure features on the VF. + +VFd is a new method of controlling the features on a VF. The VF driver doesn't +talk directly to the PF driver when configuring a feature on the VF. When a VF +application (i.e., an application using the VF ports) wants to enable a VF +feature, it can send a message to the PF application (i.e., the application +using the PF port, which can be the same as the VF application). The PF +application will configure the feature for the VF. Obviously, the PF +application can also configure the VF features without a request from the VF +application. + +.. _VF_daemon_overview: + +.. figure:: img/vf_daemon_overview.* + + VF daemon (VFd) Overview + +Compared with the traditional approach the VFd moves the negotiation between +VF and PF from the driver level to application level. So the application +should define how the negotiation between the VF and PF works, or even if the +control should be limited to the PF. + +It is the application's responsibility to use VFd. Consider for example a KVM +migration, the VF application may transfer from one VM to another. It is +recommended in this case that the PF control the VF features without +participation from the VF. Then the VF application has no capability to +configure the features. So the user doesn't need to define the interface +between the VF application and the PF application. The service provider should +take the control of all the features. + +The following sections describe the VFd functionality. + +.. Note:: + + Although VFd is supported by both ixgbe and i40e, please be aware that + since the hardware capability is different, the functions supported by + ixgbe and i40e are not the same. 
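Since the feature set differs between drivers, it can save debugging time to
confirm which PMD actually drives the PF before relying on a particular VFd
function. A quick check, assuming the PF sits at PCI address ``01:00.0`` (an
illustrative address), could look like this:

   .. code-block:: console

      # Show which driver each network device is currently bound to
      dpdk-devbind.py --status

      # Show the device model behind the PF address
      # (ixgbe: 82599/X540/X550 families, i40e: X710/XL710 families)
      lspci -s 01:00.0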
+ + +Preparing +--------- + +VFd only can be used in the scenario of DPDK PF + DPDK VF. Users should bind +the PF port to ``igb_uio``, then create the VFs based on the DPDK PF host. + +The typical procedure to achieve this is as follows: + +#. Boot the system without iommu, or with ``iommu=pt``. + +#. Bind the PF port to ``igb_uio``, for example:: + + dpdk-devbind.py -b igb_uio 01:00.0 + +#. Create a Virtual Function:: + + echo 1 > /sys/bus/pci/devices/0000:01:00.0/max_vfs + +#. Start a VM with the new VF port bypassed to it. + +#. Run a DPDK application on the PF in the host:: + + testpmd -l 0-7 -n 4 -- -i --txqflags=0 + +#. Bind the VF port to ``igb_uio`` in the VM:: + + dpdk-devbind.py -b igb_uio 03:00.0 + +#. Run a DPDK application on the VF in the VM:: + + testpmd -l 0-7 -n 4 -- -i --txqflags=0 + + +Common functions of IXGBE and I40E +---------------------------------- + +The following sections show how to enable PF/VF functionality based on the +above testpmd setup. + + +TX loopback +~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to set TX loopback:: + + set tx loopback 0 on|off + +This sets whether the PF port and all the VF ports that belong to it are +allowed to send the packets to other virtual ports. + +Although it is a VFd function, it is the global setting for the whole +physical port. When using this function, the PF and all the VFs TX loopback +will be enabled/disabled. + + +VF MAC address setting +~~~~~~~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to set the MAC address for a VF port:: + + set vf mac addr 0 0 A0:36:9F:7B:C3:51 + +This testpmd runtime command will change the MAC address of the VF port to +this new address. If any other addresses are set before, they will be +overwritten. + + +VF MAC anti-spoofing +~~~~~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to enable/disable the MAC +anti-spoofing for a VF port:: + + set vf mac antispoof 0 0 on|off + +When enabling the MAC anti-spoofing, the port will not forward packets whose +source MAC address is not the same as the port. + + +VF VLAN anti-spoofing +~~~~~~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to enable/disable the VLAN +anti-spoofing for a VF port:: + + set vf vlan antispoof 0 0 on|off + +When enabling the VLAN anti-spoofing, the port will not send packets whose +VLAN ID does not belong to VLAN IDs that this port can receive. + + +VF VLAN insertion +~~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to set the VLAN insertion for a VF +port:: + + set vf vlan insert 0 0 1 + +When using this testpmd runtime command, an assigned VLAN ID can be inserted +to the transmitted packets by the hardware. + +The assigned VLAN ID can be 0. It means disabling the VLAN insertion. + + +VF VLAN stripping +~~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to enable/disable the VLAN stripping +for a VF port:: + + set vf vlan stripq 0 0 on|off + +This testpmd runtime command is used to enable/disable the RX VLAN stripping +for a specific VF port. + + +VF VLAN filtering +~~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to set the VLAN filtering for a VF +port:: + + rx_vlan add 1 port 0 vf 1 + rx_vlan rm 1 port 0 vf 1 + +These two testpmd runtime commands can be used to add or remove the VLAN +filter for several VF ports. When the VLAN filters are added only the packets +that have the assigned VLAN IDs can be received. Other packets will be dropped +by hardware. 
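In practice these settings are usually applied together from a single testpmd
session on the PF. The following sketch strings together the commands described
above for VF 0 of port 0; the MAC address and VLAN ID are illustrative values
only:

   .. code-block:: console

      # Allow the PF and its VFs to loop traffic back to other virtual ports
      set tx loopback 0 on

      # Give VF 0 a fixed MAC address and enable MAC/VLAN anti-spoofing
      set vf mac addr 0 0 A0:36:9F:7B:C3:51
      set vf mac antispoof 0 0 on
      set vf vlan antispoof 0 0 on

      # Insert VLAN 1 on transmit, strip VLANs on receive,
      # and only accept VLAN 1 for VF 0 (VF mask 0x1)
      set vf vlan insert 0 0 1
      set vf vlan stripq 0 0 on
      rx_vlan add 1 port 0 vf 1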
+ + +The IXGBE specific VFd functions +-------------------------------- + +The functions in this section are specific to the ixgbe driver. + + +All queues drop +~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to enable/disable the all queues +drop:: + + set all queues drop on|off + +This is a global setting for the PF and all the VF ports of the physical port. + +Enabling the ``all queues drop`` feature means that when there is no available +descriptor for the received packets they are dropped. The ``all queues drop`` +feature should be enabled in SR-IOV mode to avoid one queue blocking others. + + +VF packet drop +~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to enable/disable the packet drop for +a specific VF:: + + set vf split drop 0 0 on|off + +This is a similar function as ``all queues drop``. The difference is that this +function is per VF setting and the previous function is a global setting. + + +VF rate limit +~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to all queues' rate limit for a +specific VF:: + + set port 0 vf 0 rate 10 queue_mask 1 + +This is a function to set the rate limit for all the queues in the +``queue_mask`` bitmap. It is not used to set the summary of the rate +limit. The rate limit of every queue will be set equally to the assigned rate +limit. + + +VF RX enabling +~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to enable/disable packet receiving for +a specific VF:: + + set port 0 vf 0 rx on|off + +This function can be used to stop/start packet receiving on a VF. + + +VF TX enabling +~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to enable/disable packet transmitting +for a specific VF:: + + set port 0 vf 0 tx on|off + +This function can be used to stop/start packet transmitting on a VF. + + +VF RX mode setting +~~~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to set the RX mode for a specific VF:: + + set port 0 vf 0 rxmode AUPE|ROPE|BAM|MPE on|off + +This function can be used to enable/disable some RX modes on the VF, including: + +* If it accept untagged packets. +* If it accepts packets matching the MAC filters. +* If it accept MAC broadcast packets, +* If it enables MAC multicast promiscuous mode. + + +The I40E specific VFd functions +------------------------------- + +The functions in this section are specific to the i40e driver. + + +VF statistics +~~~~~~~~~~~~~ + +This provides an API to get the a specific VF's statistic from PF. + + +VF statistics resetting +~~~~~~~~~~~~~~~~~~~~~~~ + +This provides an API to rest the a specific VF's statistic from PF. + + +VF link status change notification +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This provide an API to let a specific VF know if the physical link status +changed. + +Normally if a VF received this notification, the driver should notify the +application to reset the VF port. 
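As an illustration of how the statistics functions above surface in testpmd,
recent testpmd builds expose ``show vf stats`` and ``clear vf stats`` runtime
commands on the PF. The exact command names may differ between DPDK releases,
so treat the following as an assumed example rather than a guaranteed
interface:

   .. code-block:: console

      # Read the counters of VF 0 on port 0 from the PF
      show vf stats 0 0

      # Reset the same counters
      clear vf stats 0 0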
+ + +VF MAC broadcast setting +~~~~~~~~~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to enable/disable MAC broadcast packet +receiving for a specific VF:: + + set vf broadcast 0 0 on|off + + +VF MAC multicast promiscuous mode +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to enable/disable MAC multicast +promiscuous mode for a specific VF:: + + set vf allmulti 0 0 on|off + + +VF MAC unicast promiscuous mode +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to enable/disable MAC unicast +promiscuous mode for a specific VF:: + + set vf promisc 0 0 on|off + + +VF max bandwidth +~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to set the TX maximum bandwidth for a +specific VF:: + + set vf tx max-bandwidth 0 0 2000 + +The maximum bandwidth is an absolute value in Mbps. + + +VF TC bandwidth allocation +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to set the TCs (traffic class) TX +bandwidth allocation for a specific VF:: + + set vf tc tx min-bandwidth 0 0 (20,20,20,40) + +The allocated bandwidth should be set for all the TCs. The allocated bandwidth +is a relative value as a percentage. The sum of all the bandwidth should +be 100. + + +VF TC max bandwidth +~~~~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to set the TCs TX maximum bandwidth +for a specific VF:: + + set vf tc tx max-bandwidth 0 0 0 10000 + +The maximum bandwidth is an absolute value in Mbps. + + +TC strict priority scheduling +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Run a testpmd runtime command on the PF to enable/disable several TCs TX +strict priority scheduling:: + + set tx strict-link-priority 0 0x3 + +The 0 in the TC bitmap means disabling the strict priority scheduling for this +TC. To enable use a value of 1. diff --git a/src/seastar/dpdk/doc/guides/howto/virtio_user_as_exceptional_path.rst b/src/seastar/dpdk/doc/guides/howto/virtio_user_as_exceptional_path.rst new file mode 100644 index 00000000..0bbcd3fd --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/virtio_user_as_exceptional_path.rst @@ -0,0 +1,142 @@ +.. BSD LICENSE + Copyright(c) 2016 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +.. _virtio_user_as_excpetional_path: + +Virtio_user as Exceptional Path +=============================== + +The virtual device, virtio-user, was originally introduced with vhost-user +backend, as a high performance solution for IPC (Inter-Process Communication) +and user space container networking. + +Virtio_user with vhost-kernel backend is a solution for exceptional path, +such as KNI which exchanges packets with kernel networking stack. This +solution is very promising in: + +* Maintenance + + All kernel modules needed by this solution, vhost and vhost-net (kernel), + are upstreamed and extensively used kernel module. + +* Features + + vhost-net is born to be a networking solution, which has lots of networking + related featuers, like multi queue, tso, multi-seg mbuf, etc. + +* Performance + + similar to KNI, this solution would use one or more kthreads to + send/receive packets from user space DPDK applications, which has little + impact on user space polling thread (except that it might enter into kernel + space to wake up those kthreads if necessary). + +The overview of an application using virtio-user as exceptional path is shown +in :numref:`figure_virtio_user_as_exceptional_path`. + +.. _figure_virtio_user_as_exceptional_path: + +.. figure:: img/virtio_user_as_exceptional_path.* + + Overview of a DPDK app using virtio-user as excpetional path + + +Sample Usage +------------ + +As a prerequisite, the vhost/vhost-net kernel CONFIG should be chosen before +compiling the kernel and those kernel modules should be inserted. + +#. Compile DPDK and bind a physical NIC to igb_uio/uio_pci_generic/vfio-pci. + + This physical NIC is for communicating with outside. + +#. Run testpmd. + + .. code-block:: console + + $(testpmd) -l 2-3 -n 4 \ + --vdev=virtio_user0,path=/dev/vhost-net,queue_size=1024 \ + -- -i --txqflags=0x0 --disable-hw-vlan --enable-lro \ + --enable-rx-cksum --rxd=1024 --txd=1024 + + This command runs testpmd with two ports, one physical NIC to communicate + with outside, and one virtio-user to communicate with kernel. + +* ``--enable-lro`` + + This is used to negotiate VIRTIO_NET_F_GUEST_TSO4 and + VIRTIO_NET_F_GUEST_TSO6 feature so that large packets from kernel can be + transmitted DPDK application and further TSOed by physical NIC. + +* ``--enable-rx-cksum`` + + This is used to negotiate VIRTIO_NET_F_GUEST_CSUM so that packets from + kernel can be deemed as valid Rx checksumed. + +* ``queue_size`` + + 256 by default. To avoid shortage of descriptors, we can increase it to 1024. + +* ``queues`` + + Number of multi-queues. Each qeueue will be served by a kthread. For example: + + .. code-block:: console + + $(testpmd) -l 2-3 -n 4 \ + --vdev=virtio_user0,path=/dev/vhost-net,queues=2,queue_size=1024 \ + -- -i --txqflags=0x0 --disable-hw-vlan --enable-lro \ + --enable-rx-cksum --txq=2 --rxq=2 --rxd=1024 \ + --txd=1024 + +#. Start testpmd: + + .. code-block:: console + + (testpmd) start + +#. Configure IP address and start tap: + + .. 
code-block:: console + + ifconfig tap0 1.1.1.1/24 up + +.. note:: + + The tap device will be named tap0, tap1, etc, by kernel. + +Then, all traffic from physical NIC can be forwarded into kernel stack, and all +traffic on the tap0 can be sent out from physical NIC. + +Limitations +----------- + +This solution is only available on Linux systems. diff --git a/src/seastar/dpdk/doc/guides/howto/virtio_user_for_container_networking.rst b/src/seastar/dpdk/doc/guides/howto/virtio_user_for_container_networking.rst new file mode 100644 index 00000000..f71d0718 --- /dev/null +++ b/src/seastar/dpdk/doc/guides/howto/virtio_user_for_container_networking.rst @@ -0,0 +1,144 @@ +.. BSD LICENSE + Copyright(c) 2016 Intel Corporation. All rights reserved. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of Intel Corporation nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +.. _virtio_user_for_container_networking: + +Virtio_user for Container Networking +==================================== + +Container becomes more and more popular for strengths, like low overhead, fast +boot-up time, and easy to deploy, etc. How to use DPDK to accelerate container +networking becomes a common question for users. There are two use models of +running DPDK inside containers, as shown in +:numref:`figure_use_models_for_running_dpdk_in_containers`. + +.. _figure_use_models_for_running_dpdk_in_containers: + +.. figure:: img/use_models_for_running_dpdk_in_containers.* + + Use models of running DPDK inside container + +This page will only cover aggregation model. + +Overview +-------- + +The virtual device, virtio-user, with unmodified vhost-user backend, is designed +for high performance user space container networking or inter-process +communication (IPC). + +The overview of accelerating container networking by virtio-user is shown +in :numref:`figure_virtio_user_for_container_networking`. + +.. _figure_virtio_user_for_container_networking: + +.. 
figure:: img/virtio_user_for_container_networking.*

   Overview of accelerating container networking by virtio-user

Unlike the virtio PCI devices normally used for para-virtualized I/O in the
QEMU/VM context, the basic idea here is to present a kind of virtual device
that can be attached and initialized by DPDK directly. The device emulation
layer that QEMU provides in the VM case is avoided by simply registering a new
kind of virtual device in DPDK's ether layer. To minimize the changes, the
already existing virtio PMD code (drivers/net/virtio/) is reused.

Virtio is, in essence, a shared-memory based solution for transmitting and
receiving packets. How is the memory shared? In the VM case, QEMU always
shares the whole physical memory layout of the VM with the vhost backend. This
is not feasible for a container, which is just a process, so only specific
virtual memory regions (i.e. the hugepages initialized by DPDK) are shared
with the backend. As a result, only addresses inside these regions can be used
to transmit or receive packets.

Sample Usage
------------

Here we use Docker as the container engine. The same steps also apply to LXC
and Rocket with some minor changes.

#. Compile DPDK.

    .. code-block:: console

        make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc

#. Write a Dockerfile like below.

    .. code-block:: console

        cat <<'EOT' >> Dockerfile
        FROM ubuntu:latest
        WORKDIR /usr/src/dpdk
        COPY . /usr/src/dpdk
        ENV PATH "$PATH:/usr/src/dpdk/x86_64-native-linuxapp-gcc/app/"
        EOT

#. Build a Docker image.

    .. code-block:: console

        docker build -t dpdk-app-testpmd .

#. Start testpmd on the host with a vhost-user port.

    .. code-block:: console

        $(testpmd) -l 0-1 -n 4 --socket-mem 1024,1024 \
            --vdev 'eth_vhost0,iface=/tmp/sock0' \
            --file-prefix=host --no-pci -- -i

#. Start a container instance with a virtio-user port.

    .. code-block:: console

        docker run -i -t -v /tmp/sock0:/var/run/usvhost \
            -v /dev/hugepages:/dev/hugepages \
            dpdk-app-testpmd testpmd -l 6-7 -n 4 -m 1024 --no-pci \
            --vdev=virtio_user0,path=/var/run/usvhost \
            --file-prefix=container \
            -- -i --txqflags=0xf00 --disable-hw-vlan

Note: if all of the above setup is run on the same host, it is essentially
shared-memory based IPC.

Limitations
-----------

This solution has the following limitations:

* It cannot work with the ``--huge-unlink`` option, since the hugepage files
  need to be reopened to share them with the vhost backend.
* It cannot work with the ``--no-huge`` option. Currently DPDK uses an
  anonymous mapping in this case, which cannot be reopened and shared with the
  vhost backend.
* It cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS (8)
  hugepages. In other words, do not use 2 MB hugepages for now.
* Applications should not use a file name matching HUGEFILE_FMT
  (``"%smap_%d"``), as that causes confusion when hugepage files are shared
  with the backend by name.
* Root privilege is required. DPDK resolves physical addresses of hugepages,
  which does not seem strictly necessary, and discussions are ongoing to
  remove this restriction.
-- cgit v1.2.3