AF_XDP
======

AF_XDP is a high-speed capture framework for Linux, built on XDP (eXpress Data
Path) and introduced in Linux v4.18. AF_XDP aims to improve capture performance by
redirecting ingress frames to user-space memory rings, thus bypassing the network
stack.

Note that during ``af_xdp`` operation the selected interface cannot be used for
regular network usage.

Further reading:

    - https://www.kernel.org/doc/html/latest/networking/af_xdp.html

Compiling Suricata
------------------

Linux
~~~~~

libxdp and libbpf are required for this feature. When building from source the
development files will also be required.

Example::

    dnf -y install libxdp-devel libbpf-devel

This feature is enabled automatically when the libraries above are installed; the
user does not need to add any additional command line options.

The command line option ``--disable-af-xdp`` can be used to disable this
feature.

Example::

    ./configure --disable-af-xdp
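
To confirm that a build picked up AF_XDP, the build summary can be inspected.
A minimal sketch, assuming the summary printed by ``suricata --build-info``
contains a line mentioning AF_XDP; the helper name ``af_xdp_build_lines`` is
made up for this example::

    # Print the AF_XDP related line(s) of a build summary read from stdin.
    af_xdp_build_lines() {
        grep -i 'af_xdp'
    }

    # Usage: suricata --build-info | af_xdp_build_lines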

Starting Suricata
-----------------

IDS
~~~

Suricata can be started as follows to use af-xdp:

::

  af-xdp:
    suricata --af-xdp=<interface>
    suricata --af-xdp=igb0

In the above example Suricata will start reading from the ``igb0`` network interface.

AF_XDP Configuration
--------------------

Each of these settings can be configured under ``af-xdp`` within the "Configure
common capture settings" section of the suricata.yaml configuration file.

The number of threads created can be configured in the suricata.yaml configuration
file. It is recommended to use as many threads as there are NIC queues or CPU cores.

Another option is to select ``auto``, in which case Suricata spawns as many receive
threads as there are RSS queues configured on the interface.

::

  af-xdp:
    threads: <number>
    threads: auto
    threads: 8
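
The number of queues that ``auto`` would follow can be checked with
``ethtool -l``. A minimal sketch, assuming the NIC reports its queues as
``Combined`` channels (some drivers report separate RX/TX counts instead);
``current_combined_queues`` is a hypothetical helper name::

    # Read `ethtool -l <interface>` output on stdin and print the
    # currently configured number of combined channels.
    current_combined_queues() {
        awk '/^Current hardware settings/ { cur = 1 }
             cur && /^Combined:/ { print $2 }'
    }

    # Usage: ethtool -l eth3 | current_combined_queues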

Advanced setup
---------------

The af-xdp capture source works with the default configuration settings.
These settings can, however, be changed in the suricata.yaml configuration file.

Available configuration options are:

force-xdp-mode
~~~~~~~~~~~~~~

There are two operating modes employed when loading the XDP program:

- XDP_DRV: Mode chosen when the driver supports AF_XDP
- XDP_SKB: Mode chosen when driver support for AF_XDP is unavailable

XDP_DRV mode is the preferred mode, used to ensure best performance.

::

  af-xdp:
    force-xdp-mode: <value> where: value = <skb|drv|none>
    force-xdp-mode: drv
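
Which mode is actually attached can be read from the ``ip link`` output.
A minimal sketch, assuming iproute2 marks a driver-mode program as ``xdp``
and an SKB-mode program as ``xdpgeneric``; ``xdp_attach_mode`` is a
hypothetical helper name::

    # Read `ip link show dev <interface>` output on stdin and print the
    # first XDP attach marker found (nothing if no program is attached).
    xdp_attach_mode() {
        grep -oE 'xdp(generic|offload|multi)?' | head -n 1
    }

    # Usage: ip link show dev eth3 | xdp_attach_mode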

force-bind-mode
~~~~~~~~~~~~~~~

During binding the kernel will first attempt to use zero-copy (preferred). If
zero-copy support is unavailable it will fall back to copy mode, copying all
packets out to user space.

::

  af-xdp:
    force-bind-mode: <value> where: value = <copy|zero|none>
    force-bind-mode: zero

For both options, the kernel will attempt the 'preferred' option first and
fall back upon failure. The default (none) therefore leaves the kernel in
control of which option to apply. Configuring either option forces that
choice: only the configured option is attempted and, if it fails, the bind
fails with no fallback.

mem-unaligned
~~~~~~~~~~~~~~~~

AF_XDP can operate in two memory alignment modes:

- Aligned chunk mode
- Unaligned chunk mode

Aligned chunk mode is the default option which ensures alignment of the
data within the UMEM.

Unaligned chunk mode uses hugepages for the UMEM.
Hugepages start at a size of 2MB but they can be as large as 1GB.
A lower count of pages (memory chunks) allows faster lookup of page entries.
The hugepages need to be allocated on the NUMA node where the NIC and CPU reside.
If, for example, the hugepages are allocated only on NUMA node 0 and the NIC is
connected to NUMA node 1, then the application will fail to start.
It is therefore recommended to first find out which NUMA node the NIC is
connected to and only then allocate hugepages and set the CPU core affinity
to that NUMA node.
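
The NUMA node of the NIC and the per-node hugepage counter are both exposed
through sysfs. A minimal sketch for 2MB hugepages; the helper names and the
``SYSFS`` override (there only to make the helpers easy to try out) are
assumptions of this example::

    SYSFS=${SYSFS:-/sys}

    # Print the NUMA node the NIC is attached to (-1 means no NUMA info).
    nic_numa_node() {
        cat "$SYSFS/class/net/$1/device/numa_node"
    }

    # Print the sysfs file holding the 2MB hugepage count of a NUMA node.
    node_hugepages_file() {
        echo "$SYSFS/devices/system/node/node$1/hugepages/hugepages-2048kB/nr_hugepages"
    }

    # Usage (as root), reserving 64 pages on the node of eth3:
    #   echo 64 > "$(node_hugepages_file "$(nic_numa_node eth3)")"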

Memory assigned per socket/thread is 16MB, so each worker thread requires at least
16MB of free space. As stated above, hugepages can be of various sizes; check the
size used by the OS with ``cat /proc/meminfo``.

Example ::

    8 worker threads * 16MB = 128MB
    hugepages = 2048 kB
    so: pages required = 128MB / 2MB = 64 pages
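
The page count can also be computed. A minimal sketch, assuming 16MB means
16 * 1024 kB and rounding up to whole pages, which gives 64 pages of 2048 kB
for 8 worker threads; ``hugepages_needed`` is a hypothetical helper name::

    # hugepages_needed <worker threads> [hugepage size in kB, default 2048]
    hugepages_needed() {
        local threads=$1 page_kb=${2:-2048}
        local need_kb=$(( threads * 16 * 1024 ))
        echo $(( (need_kb + page_kb - 1) / page_kb ))
    }

    # Usage, taking the hugepage size from the running system:
    #   hugepages_needed 8 "$(awk '/^Hugepagesize:/ { print $2 }' /proc/meminfo)"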

See https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt for a detailed
description.

To enable unaligned chunk mode:

::

  af-xdp:
    mem-unaligned: <yes/no>
    mem-unaligned: yes

Linux v5.11 added the ``SO_PREFER_BUSY_POLL`` socket option, which allows true
polling of the AF_XDP socket queues. It was introduced to reduce context
switching and improve CPU reaction time during traffic reception.

Busy polling is enabled by default and applies the options described below
unless it is disabled.

enable-busy-poll
~~~~~~~~~~~~~~~~

Enables or disables busy polling.

::

  af-xdp:
    enable-busy-poll: <yes/no>
    enable-busy-poll: yes

busy-poll-time
~~~~~~~~~~~~~~

Sets the approximate time in microseconds to busy poll on a ``blocking receive``
when there is no data.

::

  af-xdp:
    busy-poll-time: <time>
    busy-poll-time: 20

busy-poll-budget
~~~~~~~~~~~~~~~~

Budget allowed for batching of ingress frames. Larger values mean more
frames can be stored/read. It is recommended to test this for performance.

::

  af-xdp:
    busy-poll-budget: <budget>
    busy-poll-budget: 64

Linux tunables
~~~~~~~~~~~~~~~

The ``SO_PREFER_BUSY_POLL`` option works in concert with the following two Linux
knobs to ensure best capture performance. These are not socket options:

- gro-flush-timeout
- napi-defer-hard-irq

The purpose of these two knobs is to defer interrupts and to allow the
NAPI context to be scheduled from a watchdog timer instead.

The ``gro-flush-timeout`` indicates the timeout period for the watchdog
timer. When no traffic is received for ``gro-flush-timeout`` the timer will
exit and softirq handling will resume.

The ``napi-defer-hard-irq`` indicates the number of queue scan attempts
before exiting to interrupt context. When enabled, the softirq NAPI context will
exit early, allowing busy polling.

::

  af-xdp:
    gro-flush-timeout: 2000000
    napi-defer-hard-irq: 2
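
The values currently applied to an interface can be read back from sysfs.
A minimal sketch, assuming the kernel exposes them as ``gro_flush_timeout``
and ``napi_defer_hard_irqs`` under ``/sys/class/net/<interface>``; the helper
name and the ``SYSFS`` override are assumptions of this example::

    SYSFS=${SYSFS:-/sys}

    # Print both NAPI tunables of an interface as "name value" lines.
    show_napi_tunables() {
        local knob
        for knob in gro_flush_timeout napi_defer_hard_irqs; do
            echo "$knob $(cat "$SYSFS/class/net/$1/$knob")"
        done
    }

    # Usage: show_napi_tunables eth3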


Hardware setup
---------------

Intel NIC setup
~~~~~~~~~~~~~~~

Intel network cards don't support symmetric hashing but it is possible to emulate
it by using a specific hash key.

Follow these instructions closely to get the desired result. The following
commands take the interface down, enable symmetric hashing, and bring the
interface back up::

 ifconfig eth3 down 
 ethtool -L eth3 combined 16 # if you have at least 16 cores
 ethtool -K eth3 rxhash on 
 ethtool -K eth3 ntuple on
 ifconfig eth3 up
 ./set_irq_affinity 0-15 eth3
 ethtool -X eth3 hkey 6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A equal 16
 ethtool -x eth3
 ethtool -n eth3

In the above setup you are free to use any recent ``set_irq_affinity`` script. It is available in any Intel x520/710 NIC driver source download.

**NOTE:**
We use a special low entropy key for the symmetric hashing. `More info about the research for symmetric hashing set up <http://www.ndsl.kaist.edu/~kyoungsoo/papers/TR-symRSS.pdf>`_
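
Why the repeating key gives symmetric hashing can be checked with a small
model of the Toeplitz hash commonly used for RSS. A minimal sketch in bash,
assuming the usual input layout for TCP over IPv4 (source IP, destination IP,
source port, destination port); ``toeplitz`` is a hypothetical helper written
for this example, not part of Suricata or ethtool::

    # toeplitz <key as hex bytes, ':' allowed> <input as hex digits>
    # Prints the 32-bit hash as 8 hex digits.
    toeplitz() {
        local key=${1//:/} input=$2
        local result=0 window kpos=32 i b byte kbyte
        window=$(( 16#${key:0:8} ))          # first 32 key bits
        for (( i = 0; i < ${#input}; i += 2 )); do
            byte=$(( 16#${input:i:2} ))
            for (( b = 7; b >= 0; b-- )); do
                # XOR in the current key window for every set input bit
                if (( (byte >> b) & 1 )); then
                    result=$(( result ^ window ))
                fi
                # slide the window one bit further along the key
                kbyte=$(( 16#${key:$(( kpos / 8 * 2 )):2} ))
                window=$(( ((window << 1) | ((kbyte >> (7 - kpos % 8)) & 1)) & 0xFFFFFFFF ))
                kpos=$(( kpos + 1 ))
            done
        done
        printf '%08x\n' "$result"
    }

    key=6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A
    # 192.168.1.10:12345 -> 10.0.0.5:443 and the reverse direction;
    # both calls print the same value.
    toeplitz "$key" c0a8010a0a000005303901bb
    toeplitz "$key" 0a000005c0a8010a01bb3039

The key repeats every 16 bits, and swapping the two addresses or the two ports
moves each input bit by a multiple of 16 bits, so every bit meets the same key
window in both directions.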

Disable any NIC offloading
~~~~~~~~~~~~~~~~~~~~~~~~~~

Suricata disables NIC offloading when the ``disable-offloading`` configuration parameter is set, which is the default.
See the ``capture`` section of the yaml file.

::

  capture:
    # disable NIC offloading. It's restored when Suricata exits.
    # Enabled by default.
    #disable-offloading: false
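
Whether offloading is currently active can be checked with ``ethtool -k``.
A minimal sketch that lists a few common offload features when they are on;
the feature selection and the helper name ``enabled_offloads`` are assumptions
of this example, not the list Suricata itself uses::

    # Read `ethtool -k <interface>` output on stdin and print the
    # GRO/LRO/TSO/GSO lines that are switched on.
    enabled_offloads() {
        grep -E '^(generic-receive-offload|large-receive-offload|tcp-segmentation-offload|generic-segmentation-offload): on'
    }

    # Usage: ethtool -k eth3 | enabled_offloads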

Balance as much as you can
~~~~~~~~~~~~~~~~~~~~~~~~~~

Try to use the network card's flow balancing as much as possible ::
 
 for proto in tcp4 udp4 ah4 esp4 sctp4 tcp6 udp6 ah6 esp6 sctp6; do 
    /sbin/ethtool -N eth3 rx-flow-hash $proto sd
 done

This command triggers load balancing using only source and destination IPs. This may not be optimal
in terms of load balancing fairness, but it ensures that all packets of a flow reach the same thread
even in the case of IP fragmentation (where source and destination ports are not available for
some fragmented packets).
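
The result can be verified per protocol with ``ethtool -n``. A minimal sketch,
assuming ethtool lists the address fields as ``IP SA`` / ``IP DA`` and any
port-based hash fields as ``L4 bytes ...`` lines; ``hashes_on_ip_only`` is a
hypothetical helper name::

    # Read `ethtool -n <interface> rx-flow-hash <proto>` output on stdin;
    # succeed only if IP SA and IP DA are used and no L4 (port) field is.
    hashes_on_ip_only() {
        local out
        out=$(cat)
        grep -q 'IP SA' <<<"$out" &&
            grep -q 'IP DA' <<<"$out" &&
            ! grep -q 'L4 bytes' <<<"$out"
    }

    # Usage: ethtool -n eth3 rx-flow-hash tcp4 | hashes_on_ip_only && echo ok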