summaryrefslogtreecommitdiffstats
path: root/src/seastar/dpdk/doc/guides/sample_app_ug/ip_reassembly.rst
blob: cc9e5911abc9f261efa674da77dd28467b86c153 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

IP Reassembly Sample Application
================================

The L3 Forwarding application is a simple example of packet processing using the DPDK.
The application performs L3 forwarding with reassembly for fragmented IPv4 and IPv6 packets.

Overview
--------

The application demonstrates the use of the DPDK libraries to implement packet forwarding
with reassembly for IPv4 and IPv6 fragmented packets.
The initialization and run- time paths are very similar to those of the :doc:`l2_forward_real_virtual`.
The main difference from the L2 Forwarding sample application is that
it reassembles fragmented IPv4 and IPv6 packets before forwarding.
The maximum allowed size of reassembled packet is 9.5 KB.

There are two key differences from the L2 Forwarding sample application:

*   The first difference is that the forwarding decision is taken based on information read from the input packet's IP header.

*   The second difference is that the application differentiates between IP and non-IP traffic by means of offload flags.

The Longest Prefix Match (LPM for IPv4, LPM6 for IPv6) table is used to store/lookup an outgoing port number, associated with that IPv4 address. Any unmatched packets are forwarded to the originating port.Compiling the Application
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

To compile the application:

#.  Go to the sample application directory:

   .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/ip_reassembly

#.  Set the target (a default target is used if not specified). For example:

   .. code-block:: console

        export RTE_TARGET=x86_64-native-linuxapp-gcc

See the *DPDK Getting Started Guide* for possible RTE_TARGET values.

#.  Build the application:

   .. code-block:: console

        make

Running the Application
-----------------------

The application has a number of command line options:

.. code-block:: console

    ./build/ip_reassembly [EAL options] -- -p PORTMASK [-q NQ] [--maxflows=FLOWS>] [--flowttl=TTL[(s|ms)]]

where:

*   -p PORTMASK: Hexadecimal bitmask of ports to configure

*   -q NQ: Number of RX queues per lcore

*   --maxflows=FLOWS: determines maximum number of active fragmented flows (1-65535). Default value: 4096.

*   --flowttl=TTL[(s|ms)]: determines maximum Time To Live for fragmented packet.
    If all fragments of the packet wouldn't appear within given time-out,
    then they are considered as invalid and will be dropped.
    Valid range is 1ms - 3600s. Default value: 1s.

To run the example in linuxapp environment with 2 lcores (2,4) over 2 ports(0,2) with 1 RX queue per lcore:

.. code-block:: console

    ./build/ip_reassembly -l 2,4 -n 3 -- -p 5
    EAL: coremask set to 14
    EAL: Detected lcore 0 on socket 0
    EAL: Detected lcore 1 on socket 1
    EAL: Detected lcore 2 on socket 0
    EAL: Detected lcore 3 on socket 1
    EAL: Detected lcore 4 on socket 0
    ...

    Initializing port 0 on lcore 2... Address:00:1B:21:76:FA:2C, rxq=0 txq=2,0 txq=4,1
    done: Link Up - speed 10000 Mbps - full-duplex
    Skipping disabled port 1
    Initializing port 2 on lcore 4... Address:00:1B:21:5C:FF:54, rxq=0 txq=2,0 txq=4,1
    done: Link Up - speed 10000 Mbps - full-duplex
    Skipping disabled port 3IP_FRAG: Socket 0: adding route 100.10.0.0/16 (port 0)
    IP_RSMBL: Socket 0: adding route 100.20.0.0/16 (port 1)
    ...

    IP_RSMBL: Socket 0: adding route 0101:0101:0101:0101:0101:0101:0101:0101/48 (port 0)
    IP_RSMBL: Socket 0: adding route 0201:0101:0101:0101:0101:0101:0101:0101/48 (port 1)
    ...

    IP_RSMBL: entering main loop on lcore 4
    IP_RSMBL: -- lcoreid=4 portid=2
    IP_RSMBL: entering main loop on lcore 2
    IP_RSMBL: -- lcoreid=2 portid=0

To run the example in linuxapp environment with 1 lcore (4) over 2 ports(0,2) with 2 RX queues per lcore:

.. code-block:: console

    ./build/ip_reassembly -l 4 -n 3 -- -p 5 -q 2

To test the application, flows should be set up in the flow generator that match the values in the
l3fwd_ipv4_route_array and/or l3fwd_ipv6_route_array table.

Please note that in order to test this application,
the traffic generator should be generating valid fragmented IP packets.
For IPv6, the only supported case is when no other extension headers other than
fragment extension header are present in the packet.

The default l3fwd_ipv4_route_array table is:

.. code-block:: c

    struct l3fwd_ipv4_route l3fwd_ipv4_route_array[] = {
        {IPv4(100, 10, 0, 0), 16, 0},
        {IPv4(100, 20, 0, 0), 16, 1},
        {IPv4(100, 30, 0, 0), 16, 2},
        {IPv4(100, 40, 0, 0), 16, 3},
        {IPv4(100, 50, 0, 0), 16, 4},
        {IPv4(100, 60, 0, 0), 16, 5},
        {IPv4(100, 70, 0, 0), 16, 6},
        {IPv4(100, 80, 0, 0), 16, 7},
    };

The default l3fwd_ipv6_route_array table is:

.. code-block:: c

    struct l3fwd_ipv6_route l3fwd_ipv6_route_array[] = {
        {{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 0},
        {{2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 1},
        {{3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 2},
        {{4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 3},
        {{5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 4},
        {{6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 5},
        {{7, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 6},
        {{8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 7},
    };

For example, for the fragmented input IPv4 packet with destination address: 100.10.1.1,
a reassembled IPv4 packet be sent out from port #0 to the destination address 100.10.1.1
once all the fragments are collected.

Explanation
-----------

The following sections provide some explanation of the sample application code.
As mentioned in the overview section, the initialization and run-time paths are very similar to those of the :doc:`l2_forward_real_virtual`.
The following sections describe aspects that are specific to the IP reassemble sample application.

IPv4 Fragment Table Initialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This application uses the rte_ip_frag library. Please refer to Programmer's Guide for more detailed explanation of how to use this library.
Fragment table maintains information about already received fragments of the packet.
Each IP packet is uniquely identified by triple <Source IP address>, <Destination IP address>, <ID>.
To avoid lock contention, each RX queue has its own Fragment Table,
e.g. the application can't handle the situation when different fragments of the same packet arrive through different RX queues.
Each table entry can hold information about packet consisting of up to RTE_LIBRTE_IP_FRAG_MAX_FRAGS fragments.

.. code-block:: c

    frag_cycles = (rte_get_tsc_hz() + MS_PER_S - 1) / MS_PER_S * max_flow_ttl;

    if ((qconf->frag_tbl[queue] = rte_ip_frag_tbl_create(max_flow_num, IPV4_FRAG_TBL_BUCKET_ENTRIES, max_flow_num, frag_cycles, socket)) == NULL)
    {
        RTE_LOG(ERR, IP_RSMBL, "ip_frag_tbl_create(%u) on " "lcore: %u for queue: %u failed\n",  max_flow_num, lcore, queue);
        return -1;
    }

Mempools Initialization
~~~~~~~~~~~~~~~~~~~~~~~

The reassembly application demands a lot of mbuf's to be allocated.
At any given time up to (2 \* max_flow_num \* RTE_LIBRTE_IP_FRAG_MAX_FRAGS \* <maximum number of mbufs per packet>)
can be stored inside Fragment Table waiting for remaining fragments.
To keep mempool size under reasonable limits and to avoid situation when one RX queue can starve other queues,
each RX queue uses its own mempool.

.. code-block:: c

    nb_mbuf = RTE_MAX(max_flow_num, 2UL * MAX_PKT_BURST) * RTE_LIBRTE_IP_FRAG_MAX_FRAGS;
    nb_mbuf *= (port_conf.rxmode.max_rx_pkt_len + BUF_SIZE - 1) / BUF_SIZE;
    nb_mbuf *= 2; /* ipv4 and ipv6 */
    nb_mbuf += RTE_TEST_RX_DESC_DEFAULT + RTE_TEST_TX_DESC_DEFAULT;
    nb_mbuf = RTE_MAX(nb_mbuf, (uint32_t)NB_MBUF);

    snprintf(buf, sizeof(buf), "mbuf_pool_%u_%u", lcore, queue);

    if ((rxq->pool = rte_mempool_create(buf, nb_mbuf, MBUF_SIZE, 0, sizeof(struct rte_pktmbuf_pool_private), rte_pktmbuf_pool_init, NULL,
        rte_pktmbuf_init, NULL, socket, MEMPOOL_F_SP_PUT | MEMPOOL_F_SC_GET)) == NULL) {

            RTE_LOG(ERR, IP_RSMBL, "mempool_create(%s) failed", buf);
            return -1;
    }

Packet Reassembly and Forwarding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For each input packet, the packet forwarding operation is done by the l3fwd_simple_forward() function.
If the packet is an IPv4 or IPv6 fragment, then it calls rte_ipv4_reassemble_packet() for IPv4 packets,
or rte_ipv6_reassemble_packet() for IPv6 packets.
These functions either return a pointer to valid mbuf that contains reassembled packet,
or NULL (if the packet can't be reassembled for some reason).
Then l3fwd_simple_forward() continues with the code for the packet forwarding decision
(that is, the identification of the output interface for the packet) and
actual transmit of the packet.

The rte_ipv4_reassemble_packet() or rte_ipv6_reassemble_packet() are responsible for:

#.  Searching the Fragment Table for entry with packet's <IP Source Address, IP Destination Address, Packet ID>

#.  If the entry is found, then check if that entry already timed-out.
    If yes, then free all previously received fragments,
    and remove information about them from the entry.

#.  If no entry with such key is found, then try to create a new one by one of two ways:

    #.  Use as empty entry

    #.  Delete a timed-out entry, free mbufs associated with it mbufs and store a new entry with specified key in it.

#.  Update the entry with new fragment information and check
    if a packet can be reassembled (the packet's entry contains all fragments).

    #.  If yes, then, reassemble the packet, mark table's entry as empty and return the reassembled mbuf to the caller.

    #.  If no, then just return a NULL to the caller.

If at any stage of packet processing a reassembly function encounters an error
(can't insert new entry into the Fragment table, or invalid/timed-out fragment),
then it will free all associated with the packet fragments,
mark the table entry as invalid and return NULL to the caller.

Debug logging and Statistics Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The RTE_LIBRTE_IP_FRAG_TBL_STAT controls statistics collection for the IP Fragment Table.
This macro is disabled by default.
To make ip_reassembly print the statistics to the standard output,
the user must send either an USR1, INT or TERM signal to the process.
For all of these signals, the ip_reassembly process prints Fragment table statistics for each RX queue,
plus the INT and TERM will cause process termination as usual.