summaryrefslogtreecommitdiffstats
path: root/doc/dev/dpdk.rst
diff options
context:
space:
mode:
Diffstat (limited to '')
-rw-r--r--doc/dev/dpdk.rst172
1 files changed, 172 insertions, 0 deletions
diff --git a/doc/dev/dpdk.rst b/doc/dev/dpdk.rst
new file mode 100644
index 000000000..cdc31a7c6
--- /dev/null
+++ b/doc/dev/dpdk.rst
@@ -0,0 +1,172 @@
+=========================
+Ceph messenger DPDKStack
+=========================
+
+Compiling DPDKStack
+===================
+
+Ceph dpdkstack is not compiled by default. Therefore, you need to recompile and
+enable the DPDKstack component.
+Optionally install ``dpdk-devel`` or ``dpdk-dev`` on distros with precompiled DPDK packages, and compile
+
+.. prompt:: bash $
+
+ do_cmake.sh -DWITH_DPDK=ON
+
+
+Setting the DPDK Network Adapter
+================================
+
+Most mainstream NICs support SR-IOV and can be virtualized into multiple VF NICs.
+Each OSD uses some dedicated NICs through DPDK. The mon, mgr and client use the PF NICs
+through the POSIX protocol stack.
+
+Load the driver on which DPDK depends:
+
+.. prompt:: bash #
+
+ modprobe vfio
+ modprobe vfio_pci
+
+Configure Hugepage by editing ``/etc/sysctl.conf`` ::
+
+ vm.nr_hugepages = xxx
+
+Configure the number of VFs based on the number of OSDs:
+
+.. prompt:: bash #
+
+ echo $numvfs > /sys/class/net/$port/device/sriov_numvfs
+
+Binding NICs to DPDK Applications:
+
+.. prompt:: bash #
+
+ dpdk-devbind.py -b vfio-pci 0000:xx:yy.z
+
+
+Configuring OSD DPDKStack
+==========================
+
+By default, the DPDK RTE initialization process requires the root privileges
+for accessing various resources in system. To grant the root access to
+the ``ceph`` user:
+
+.. prompt:: bash #
+
+ usermod -G root ceph
+
+The OSD selects the NICs using ``ms_dpdk_devs_allowlist``:
+
+#. Configure a single NIC.
+
+ .. code-block:: ini
+
+ ms_dpdk_devs_allowlist=-a 0000:7d:010
+
+ or
+
+ .. code-block:: ini
+
+ ms_dpdk_devs_allowlist=--allow=0000:7d:010
+
+#. Configure the Bond Network Adapter
+
+ .. code-block:: ini
+
+ ms_dpdk_devs_allowlist=--allow=0000:7d:01.0 --allow=0000:7d:02.6 --vdev=net_bonding0,mode=2,slave=0000:7d:01.0,slave=0000:7d:02.6
+
+DPDK-related configuration items are as follows:
+
+.. code-block:: ini
+
+ [osd]
+ ms_type=async+dpdk
+ ms_async_op_threads=1
+
+ ms_dpdk_port_id=0
+ ms_dpdk_gateway_ipv4_addr=172.19.36.1
+ ms_dpdk_netmask_ipv4_addr=255.255.255.0
+ ms_dpdk_hugepages=/dev/hugepages
+ ms_dpdk_hw_flow_control=false
+ ms_dpdk_lro=false
+ ms_dpdk_enable_tso=false
+ ms_dpdk_hw_queue_weight=1
+ ms_dpdk_memory_channel=2
+ ms_dpdk_debug_allow_loopback = true
+
+ [osd.x]
+ ms_dpdk_coremask=0xf0
+ ms_dpdk_host_ipv4_addr=172.19.36.51
+ public_addr=172.19.36.51
+ cluster_addr=172.19.36.51
+ ms_dpdk_devs_allowlist=--allow=0000:7d:01.1
+
+Debug and Optimization
+======================
+
+Locate faults based on logs and adjust logs to a proper level:
+
+.. code-block:: ini
+
+ debug_dpdk=xx
+ debug_ms=xx
+
+if the log contains a large number of retransmit messages,reduce the value of ms_dpdk_tcp_wmem.
+
+Run the perf dump command to view DPDKStack statistics:
+
+.. prompt:: bash $
+
+ ceph daemon osd.$i perf dump | grep dpdk
+
+
+if the ``dpdk_device_receive_nombuf_errors`` keeps increasing, check whether the
+throttling exceeds the limit:
+
+.. prompt:: bash $
+
+ ceph daemon osd.$i perf dump | grep throttle-osd_client -A 7 | grep "get_or_fail_fail"
+ ceph daemon osd.$i perf dump | grep throttle-msgr_dispatch_throttler -A 7 | grep "get_or_fail_fail"
+
+if the throttling exceeds the threshold, increase the throttling threshold or
+disable the throttling.
+
+Check whether the network adapter is faulty or abnormal.Run the following
+command to obtain the network adapter status and statistics:
+
+.. prompt:: bash $
+
+ ceph daemon osd.$i show_pmd_stats
+ ceph daemon osd.$i show_pmd_xstats
+
+Some DPDK versions (eg. dpdk-20.11-3.e18.aarch64) or NIC TSOs are abnormal,
+try disabling tso:
+
+.. code-block:: ini
+
+ ms_dpdk_enable_tso=false
+
+if VF NICs support multiple queues, more NIC queues can be allocated to a
+single core to improve performance:
+
+.. code-block:: ini
+
+ ms_dpdk_hw_queues_per_qp=4
+
+
+Status and Future Work
+======================
+
+Compared with POSIX Stack, in the multi-concurrency test, DPDKStack has the same
+4K random write performance, 8K random write performance is improved by 28%, and
+1 MB packets are unstable. In the single-latency test,the 4K and 8K random write
+latency is reduced by 15% (the lower the latency is, the better).
+
+At a high level, our future work plan is:
+
+ OSD multiple network support (public network and cluster network)
+ The public and cluster network adapters can be configured.When connecting or
+ listening,the public or cluster network adapters can be selected based on the
+ IP address.During msgr-work initialization,initialize both the public and cluster
+ network adapters and create two DPDKQueuePairs.