doc/README.heuristic


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244

This file is a HOWTO for Wireshark developers. It describes how Wireshark
heuristic protocol dissectors work and how to write them.

This file is compiled to give in depth information on Wireshark.
It is by no means all inclusive and complete. Please feel free to send
remarks and patches to the developer mailing list.


Prerequisites
-------------
As this file is an addition to README.dissector, it is essential to read
and understand that document first.


Why heuristic dissectors?
-------------------------
When Wireshark "receives" a packet, it has to find the right dissector to
start decoding the packet data. Often this can be done by known conventions,
e.g. the Ethernet type 0x0800 means "IP on top of Ethernet" - an easy and
reliable match for Wireshark.

Unfortunately, these conventions are not always available, or (accidentally
or knowingly) some protocols don't care about those conventions and "reuse"
existing "magic numbers / tokens".

For example TCP defines port 80 only for the use of HTTP traffic. But, this
convention doesn't prevent anyone from using TCP port 80 for some different
protocol, or on the other hand using HTTP on a port number different than 80.

To solve this problem, Wireshark introduced the so called heuristic dissector
mechanism to try to deal with these problems.


How Wireshark uses heuristic dissectors?
----------------------------------------
While Wireshark starts, heuristic dissectors (HD) register themselves slightly
different than "normal" dissectors, e.g. a HD can ask for any TCP packet, as
it *may* contain interesting packet data for this dissector. In reality more
than one HD will exist for e.g. TCP packet data.

So if Wireshark has to decode TCP packet data, it will first try to find a
dissector registered directly for the TCP port used in that packet. If it
finds such a registered dissector it will just hand over the packet data to it.

In case there is no such "normal" dissector, WS will hand over the packet data
to the first matching HD. Now the HD will look into the data and decide if that
data looks like something the dissector "is interested in". The return value
signals WS if the HD processed the data (so WS can stop working on that packet)
or if the heuristic didn't match (so WS tries the next HD until one matches -
or the data simply can't be processed).

Note that it is possible to configure WS through preference settings so that it
hands off a packet to the heuristic dissectors before the "normal" dissectors
are called. This allows the HD the chance to receive packets and process them
differently than they otherwise would be. Of course if no HD is interested in
the packet, then the packet will ultimately get handed off to the "normal"
dissector as if the HD wasn't involved at all. As of this writing,
16 dissectors (including DCCP, SCTP, TCP, TIPC and UDP) provide this capability
via their "Try heuristic sub-dissectors first" preference, but most of them have
this option disabled by default.

Once a packet for a particular "connection" has been identified as belonging
to a particular protocol, Wireshark must then be set up to always directly
call the dissector for that protocol. This removes the overhead of having
to identify each packet of the connection heuristically.


How do these heuristics work?
-----------------------------
It's difficult to give a general answer here. The usual heuristic works as follows:

A HD looks into the first few packet bytes and searches for common patterns that
are specific to the protocol in question. Most protocols starts with a
specific header, so a specific pattern may look like (synthetic example):

1) first byte must be 0x42
2) second byte is a type field and can only contain values between 0x20 - 0x33
3) third byte is a flag field, where the lower 4 bits always contain the value 0
4) fourth and fifth bytes contain a 16 bit length field, where the value can't
   be larger than 10000 bytes

So the heuristic dissector will check incoming packet data for all of the
4 above conditions, and only if all of the four conditions are true there is a
good chance that the packet really contains the expected protocol - and the
dissector continues to decode the packet data. If one condition fails, it's
very certainly not the protocol in question and the dissector returns to WS
immediately "this is not my protocol" - maybe some other heuristic dissector
is interested!

Obviously, this is *not* 100% bullet proof, but it's the best WS can offer to
its users here - and improving the heuristic is always possible if it turns out
that it's not good enough to distinguish between two given protocols.

Note: The heuristic code in a dissector *must not* cause an exception
      (before returning false) as this will prevent following
      heuristic dissector handoffs. In practice, this normally means
      that a test must be done to verify that the required data is
      available in the tvb before fetching from the tvb. (See the
      example below).


Heuristic Code Example
----------------------
You can find a lot of code examples in the Wireshark sources, e.g.:
grep -l heur_dissector_add epan/dissectors/*.c
returns 236 files (December 2021).

For the above example criteria, the following code example might do the work
(combine this with the dissector skeleton in README.developer):

XXX - please note: The following code examples were not tried in reality,
please report problems to the dev-list!

--------------------------------------------------------------------------------------------

static dissector_handle_t PROTOABBREV_tcp_handle;
static dissector_handle_t PROTOABBREV_pdu_handle;

/* Heuristics test */
static bool
test_PROTOABBREV(packet_info *pinfo _U_, tvbuff_t *tvb, int offset _U_, void *data _U_)
{
    /* 0) Verify needed bytes available in tvb so tvb_get...() doesn't cause exception.
    if (tvb_captured_length(tvb) < 5)
        return false;

    /* 1) first byte must be 0x42 */
    if ( tvb_get_uint8(tvb, 0) != 0x42 )
        return false;

    /* 2) second byte is a type field and only can contain values between 0x20-0x33 */
    if ( tvb_get_uint8(tvb, 1) < 0x20 || tvb_get_uint8(tvb, 1) > 0x33 )
        return false;

    /* 3) third byte is a flag field, where the lower 4 bits always contain the value 0 */
    if ( tvb_get_uint8(tvb, 2) & 0x0f )
        return false;

    /* 4) fourth and fifth bytes contains a 16 bit length field, where the value can't be longer than 10000 bytes */
    /* Assumes network byte order */
    if ( tvb_get_ntohs(tvb, 3) > 10000 )
        return false;

    /* Assume it's your packet ... */
    return true;
}

/* Dissect the complete PROTOABBREV pdu */
static int
dissect_PROTOABBREV_pdu(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data _U_)
{
    /* Dissection ... */

    return tvb_reported_length(tvb);
}

/* For tcp_dissect_pdus() */
static unsigned
get_PROTOABBREV_len(packet_info *pinfo _U_, tvbuff_t *tvb, int offset, void *data _U_)
{
    return (unsigned) tvb_get_ntohs(tvb, offset+3);
}

static int
dissect_PROTOABBREV_tcp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data)
{
    tcp_dissect_pdus(tvb, pinfo, tree, true, 5,
                     get_PROTOABBREV_len, dissect_PROTOABBREV_pdu, data);
    return tvb_reported_length(tvb);
}

static bool
dissect_PROTOABBREV_heur_tcp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data)
{
    if (!test_PROTOABBREV(pinfo, tvb, 0, data))
        return false;

    /* specify that dissect_PROTOABBREV is to be called directly from now on for
     * packets for this "connection" ... but only do this if your heuristic sits directly
     * on top of (was called by) a dissector which established a conversation for the
     * protocol "port type". In other words: only directly over TCP, UDP, DCCP, ...
     * otherwise you'll be overriding the dissector that called your heuristic dissector.
     */
    conversation = find_or_create_conversation(pinfo);
    conversation_set_dissector(conversation, PROTOABBREV_tcp_handle);

    /*   and do the dissection */
    dissect_PROTOABBREV_tcp(tvb, pinfo, tree, data);

    return true;
}

static int
dissect_PROTOABBREV_udp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data)
{
    udp_dissect_pdus(tvb, pinfo, tree, true, 5, NULL,
                     get_PROTOABBREV_len, dissect_PROTOABBREV_pdu, data);
    return tvb_reported_length(tvb);
}

static bool
dissect_PROTOABBREV_heur_udp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data)
{
...
    /*   and do the dissection */
    return (udp_dissect_pdus(tvb, pinfo, tree, true, 5, test_PROTOABBREV,
                     get_PROTOABBREV_len, dissect_PROTOABBREV_pdu, data) != 0);
}

void
proto_reg_handoff_PROTOABBREV(void)
{
    PROTOABBREV_tcp_handle = create_dissector_handle(dissect_PROTOABBREV_tcp,
                                                         proto_PROTOABBREV);
    PROTOABBREV_pdu_handle = create_dissector_handle(dissect_PROTOABBREV_pdu,
                                                         proto_PROTOABBREV);

    /* register as heuristic dissector for both TCP and UDP */
    heur_dissector_add("tcp", dissect_PROTOABBREV_heur_tcp, "PROTOABBREV over TCP",
                       "PROTOABBREV_tcp", proto_PROTOABBREV, HEURISTIC_ENABLE);
    heur_dissector_add("udp", dissect_PROTOABBREV_heur_udp, "PROTOABBREV over UDP",
                       "PROTOABBREV_udp", proto_PROTOABBREV, HEURISTIC_ENABLE);

#ifdef OPTIONAL
    /* It's possible to write a dissector to be a dual heuristic/normal dissector */
    /*  by also registering the dissector "normally".                             */
    dissector_add_uint("ip.proto", IP_PROTO_PROTOABBREV, PROTOABBREV_pdu_handle);
#endif
}


Please note, that registering a heuristic dissector is only possible for a
small variety of protocols. In most cases a heuristic is not needed, and
adding the support would only add unused code to the dissector.

TCP and UDP are prominent examples that support HDs, as there seems to be a
tendency to re-use known port numbers for new protocols. But TCP and UDP are
not the only dissectors that provide support for HDs.  You can find more
examples by searching the Wireshark sources as follows:
grep -l register_heur_dissector_list epan/dissectors/packet-*.c

There are a small number of cases where heuristic dissectors have been added
for formats that were specifically created for use with Wireshark (e.g.
LTE and NR L2 MAC, RLC and PDCP dissectors).