Diffstat (limited to 'doc/README.heuristic')
-rw-r--r--  doc/README.heuristic  244
1 files changed, 244 insertions, 0 deletions
diff --git a/doc/README.heuristic b/doc/README.heuristic
new file mode 100644
index 00000000..08e9464f
--- /dev/null
+++ b/doc/README.heuristic
@@ -0,0 +1,244 @@
+This file is a HOWTO for Wireshark developers. It describes how Wireshark
+heuristic protocol dissectors work and how to write them.
+
+This file is compiled to give in-depth information on Wireshark.
+It is by no means all-inclusive or complete. Please feel free to send
+remarks and patches to the developer mailing list.
+
+
+Prerequisites
+-------------
+As this file is an addition to README.dissector, it is essential to read
+and understand that document first.
+
+
+Why heuristic dissectors?
+-------------------------
+When Wireshark "receives" a packet, it has to find the right dissector to
+start decoding the packet data. Often this can be done by known conventions,
+e.g. the Ethernet type 0x0800 means "IP on top of Ethernet" - an easy and
+reliable match for Wireshark.
+
+Unfortunately, these conventions are not always available, or (accidentally
+or knowingly) some protocols don't care about those conventions and "reuse"
+existing "magic numbers / tokens".
+
+For example, TCP port 80 is conventionally reserved for HTTP traffic. But this
+convention doesn't prevent anyone from using TCP port 80 for some different
+protocol or, on the other hand, from using HTTP on a port other than 80.
+
+To deal with these problems, Wireshark introduced the so-called heuristic
+dissector mechanism.
+
+
+How does Wireshark use heuristic dissectors?
+--------------------------------------------
+When Wireshark starts, heuristic dissectors (HD) register themselves slightly
+differently from "normal" dissectors, e.g. an HD can ask for any TCP packet, as
+it *may* contain interesting packet data for this dissector. In practice, more
+than one HD will exist for e.g. TCP packet data.
+
+So if Wireshark has to decode TCP packet data, it will first try to find a
+dissector registered directly for the TCP port used in that packet. If it
+finds such a registered dissector it will just hand over the packet data to it.
+
+If there is no such "normal" dissector, Wireshark (WS) will hand over the
+packet data to the first matching HD. The HD then looks into the data and
+decides whether it looks like something the dissector "is interested in". The
+return value signals to WS whether the HD processed the data (so WS can stop
+working on that packet) or whether the heuristic didn't match (so WS tries the
+next HD until one matches - or the data simply can't be processed).
+
+Note that it is possible to configure WS through preference settings so that it
+hands off a packet to the heuristic dissectors before the "normal" dissectors
+are called. This gives the HD a chance to receive packets and process them
+differently than they otherwise would be. Of course if no HD is interested in
+the packet, then the packet will ultimately get handed off to the "normal"
+dissector as if the HD wasn't involved at all. As of this writing,
+16 dissectors (including DCCP, SCTP, TCP, TIPC and UDP) provide this capability
+via their "Try heuristic sub-dissectors first" preference, but most of them have
+this option disabled by default.
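The lookup order described above can be sketched in plain C. This is only an illustration under stated assumptions: the types, the toy dissectors and the `dispatch()` function are hypothetical stand-ins and are not part of the Wireshark (epan) API. The `heuristics_first` flag plays the role of the "Try heuristic sub-dissectors first" preference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; none of this is epan API. */
typedef void (*normal_fn)(const uint8_t *buf, size_t len);
typedef bool (*heur_fn)(const uint8_t *buf, size_t len); /* true: "this is mine" */

struct port_entry { uint16_t port; normal_fn dissect; };

enum handled_by { BY_NOBODY, BY_NORMAL, BY_HEURISTIC };

/* Two toy dissectors so the dispatcher has something to call. */
static void normal_port80(const uint8_t *buf, size_t len) { (void)buf; (void)len; }
static bool heur_magic42(const uint8_t *buf, size_t len)
{
    return len >= 1 && buf[0] == 0x42; /* length check before the read */
}

/* Hand the data to the dissector registered for this exact port, if any. */
static bool
try_normal(const struct port_entry *ports, size_t n_ports, uint16_t port,
           const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < n_ports; i++) {
        if (ports[i].port == port) {
            ports[i].dissect(buf, len);
            return true;
        }
    }
    return false;
}

/* Ask each heuristic dissector in turn until one accepts the data. */
static bool
try_heuristics(const heur_fn *heurs, size_t n_heurs,
               const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < n_heurs; i++) {
        if (heurs[i](buf, len))
            return true; /* first HD that accepts wins */
    }
    return false;
}

static enum handled_by
dispatch(const struct port_entry *ports, size_t n_ports,
         const heur_fn *heurs, size_t n_heurs,
         bool heuristics_first, uint16_t port,
         const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (heuristics_first && try_heuristics(heurs, n_heurs, buf, len))
        return BY_HEURISTIC;
    if (try_normal(ports, n_ports, port, buf, len))
        return BY_NORMAL;
    if (!heuristics_first && try_heuristics(heurs, n_heurs, buf, len))
        return BY_HEURISTIC;
    return BY_NOBODY;
}
```

With a table holding only port 80 and the single toy heuristic, data starting with 0x42 on port 80 goes to the "normal" dissector by default and to the heuristic one when `heuristics_first` is set. The same data on an unregistered port is picked up by the heuristic in either mode.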
+
+Once a packet for a particular "connection" has been identified as belonging
+to a particular protocol, Wireshark must then be set up to always directly
+call the dissector for that protocol. This removes the overhead of having
+to identify each packet of the connection heuristically.
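To illustrate why remembering the result per "connection" saves work, here is a minimal standalone sketch. It assumes a connection can be reduced to a single numeric id and uses a small fixed-size table. `conv_table`, `handle_packet()` and the toy heuristic are hypothetical names for this illustration and do not mirror Wireshark's real conversation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONVERSATIONS 16

typedef void (*dissect_fn)(const uint8_t *buf, size_t len);

/* Hypothetical per-connection record: which dissector was chosen. */
struct conversation { uint32_t id; dissect_fn dissector; };

struct conv_table {
    struct conversation entries[MAX_CONVERSATIONS];
    size_t used;
    unsigned heuristic_runs; /* counts how often the heuristic had to run */
};

static bool looks_like_proto(const uint8_t *buf, size_t len)
{
    return len >= 1 && buf[0] == 0x42; /* toy heuristic */
}

static void dissect_proto(const uint8_t *buf, size_t len) { (void)buf; (void)len; }

static struct conversation *find_conv(struct conv_table *t, uint32_t id)
{
    for (size_t i = 0; i < t->used; i++) {
        if (t->entries[i].id == id)
            return &t->entries[i];
    }
    return NULL;
}

/* Returns true if the packet was dissected. */
static bool
handle_packet(struct conv_table *t, uint32_t conv_id,
              const uint8_t *buf, size_t len)
{
    assert(t != NULL);

    struct conversation *c = find_conv(t, conv_id);
    if (c != NULL) {
        c->dissector(buf, len); /* known connection: skip the heuristic */
        return true;
    }

    t->heuristic_runs++;
    if (!looks_like_proto(buf, len))
        return false;

    /* Remember the decision so later packets go straight to the dissector. */
    if (t->used < MAX_CONVERSATIONS) {
        t->entries[t->used].id = conv_id;
        t->entries[t->used].dissector = dissect_proto;
        t->used++;
    }
    dissect_proto(buf, len);
    return true;
}
```

Once the first packet of a connection matches, later packets of that connection are dissected directly even if they would not pass the heuristic on their own (for example, continuation data without the header).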
+
+
+How do these heuristics work?
+-----------------------------
+It's difficult to give a general answer here. The usual heuristic works as follows:
+
+An HD looks into the first few packet bytes and searches for common patterns that
+are specific to the protocol in question. Most protocols start with a
+specific header, so a specific pattern may look like (synthetic example):
+
+1) first byte must be 0x42
+2) second byte is a type field and can only contain values between 0x20 and 0x33
+3) third byte is a flag field, where the lower 4 bits always contain the value 0
+4) fourth and fifth bytes contain a 16 bit length field, where the value can't
+   be larger than 10000 bytes
+
+So the heuristic dissector will check incoming packet data against all four
+conditions above. Only if all four are true is there a good chance that the
+packet really contains the expected protocol, and the dissector continues to
+decode the packet data. If any condition fails, it's almost certainly not the
+protocol in question, and the dissector immediately tells WS "this is not my
+protocol" - maybe some other heuristic dissector is interested!
+
+Obviously, this is *not* 100% bulletproof, but it's the best WS can offer to
+its users here - and improving the heuristic is always possible if it turns out
+that it's not good enough to distinguish between two given protocols.
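As a rough feel for how selective the four synthetic conditions are, the sketch below multiplies the chance of each one passing by accident. It assumes the first five bytes are independent and uniformly random, which real traffic is not, so the result is only an order-of-magnitude estimate. The function name is hypothetical.

```c
#include <assert.h>

/* Probability that 5 uniformly random bytes pass all four checks.
 * Assumption for illustration: bytes are independent and uniform. */
static double random_match_probability(void)
{
    const double p_magic  = 1.0 / 256.0;       /* byte 0 == 0x42 */
    const double p_type   = 20.0 / 256.0;      /* 0x20..0x33 inclusive: 20 values */
    const double p_flags  = 16.0 / 256.0;      /* lower 4 bits all zero */
    const double p_length = 10001.0 / 65536.0; /* 16 bit value in 0..10000 */
    const double p = p_magic * p_type * p_flags * p_length;

    assert(p > 0.0 && p < 1.0);
    return p;
}
```

Under these assumptions the result is 200020 / 68719476736, or about 2.9 accidental matches per million random packets. That is small but not zero, and other protocols with similar headers can match far more often than random data would.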
+
+Note: The heuristic code in a dissector *must not* cause an exception
+      (before returning false) as this will prevent any subsequent
+      heuristic dissector handoffs. In practice, this normally means
+      that a test must be done to verify that the required data is
+      available in the tvb before fetching from the tvb. (See the
+      example below).
+
+
+Heuristic Code Example
+----------------------
+You can find a lot of code examples in the Wireshark sources, e.g.:
+grep -l heur_dissector_add epan/dissectors/*.c
+returns 236 files (December 2021).
+
+For the above example criteria, the following code example might do the work
+(combine this with the dissector skeleton in README.developer):
+
+XXX - please note: The following code examples were not tried in reality,
+please report problems to the dev-list!
+
+--------------------------------------------------------------------------------------------
+
+static dissector_handle_t PROTOABBREV_tcp_handle;
+static dissector_handle_t PROTOABBREV_pdu_handle;
+
+/* Heuristics test */
+static bool
+test_PROTOABBREV(packet_info *pinfo _U_, tvbuff_t *tvb, int offset _U_, void *data _U_)
+{
+    /* 0) Verify needed bytes available in tvb so tvb_get...() doesn't cause exception. */
+    if (tvb_captured_length(tvb) < 5)
+        return false;
+
+    /* 1) first byte must be 0x42 */
+    if ( tvb_get_guint8(tvb, 0) != 0x42 )
+        return false;
+
+    /* 2) second byte is a type field and can only contain values between 0x20-0x33 */
+    if ( tvb_get_guint8(tvb, 1) < 0x20 || tvb_get_guint8(tvb, 1) > 0x33 )
+        return false;
+
+    /* 3) third byte is a flag field, where the lower 4 bits always contain the value 0 */
+    if ( tvb_get_guint8(tvb, 2) & 0x0f )
+        return false;
+
+    /* 4) fourth and fifth bytes contain a 16 bit length field, where the value can't be larger than 10000 bytes */
+    /* Assumes network byte order */
+    if ( tvb_get_ntohs(tvb, 3) > 10000 )
+        return false;
+
+    /* Assume it's your packet ... */
+    return true;
+}
+
+/* Dissect the complete PROTOABBREV pdu */
+static int
+dissect_PROTOABBREV_pdu(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data _U_)
+{
+    /* Dissection ... */
+
+    return tvb_reported_length(tvb);
+}
+
+/* For tcp_dissect_pdus() */
+static unsigned
+get_PROTOABBREV_len(packet_info *pinfo _U_, tvbuff_t *tvb, int offset, void *data _U_)
+{
+    return (unsigned) tvb_get_ntohs(tvb, offset+3);
+}
+
+static int
+dissect_PROTOABBREV_tcp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data)
+{
+    tcp_dissect_pdus(tvb, pinfo, tree, true, 5,
+                     get_PROTOABBREV_len, dissect_PROTOABBREV_pdu, data);
+    return tvb_reported_length(tvb);
+}
+
+static bool
+dissect_PROTOABBREV_heur_tcp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data)
+{
+    conversation_t *conversation;
+
+    if (!test_PROTOABBREV(pinfo, tvb, 0, data))
+        return false;
+
+    /* specify that dissect_PROTOABBREV_tcp is to be called directly from now on for
+     * packets for this "connection" ... but only do this if your heuristic sits directly
+     * on top of (was called by) a dissector which established a conversation for the
+     * protocol "port type". In other words: only directly over TCP, UDP, DCCP, ...
+     * otherwise you'll be overriding the dissector that called your heuristic dissector.
+     */
+    conversation = find_or_create_conversation(pinfo);
+    conversation_set_dissector(conversation, PROTOABBREV_tcp_handle);
+
+    /* and do the dissection */
+    dissect_PROTOABBREV_tcp(tvb, pinfo, tree, data);
+
+    return true;
+}
+
+static int
+dissect_PROTOABBREV_udp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data)
+{
+    udp_dissect_pdus(tvb, pinfo, tree, true, 5, NULL,
+                     get_PROTOABBREV_len, dissect_PROTOABBREV_pdu, data);
+    return tvb_reported_length(tvb);
+}
+
+static bool
+dissect_PROTOABBREV_heur_udp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data)
+{
+...
+    /* and do the dissection */
+    return (udp_dissect_pdus(tvb, pinfo, tree, true, 5, test_PROTOABBREV,
+                             get_PROTOABBREV_len, dissect_PROTOABBREV_pdu, data) != 0);
+}
+
+void
+proto_reg_handoff_PROTOABBREV(void)
+{
+    PROTOABBREV_tcp_handle = create_dissector_handle(dissect_PROTOABBREV_tcp,
+                                                     proto_PROTOABBREV);
+    PROTOABBREV_pdu_handle = create_dissector_handle(dissect_PROTOABBREV_pdu,
+                                                     proto_PROTOABBREV);
+
+    /* register as heuristic dissector for both TCP and UDP */
+    heur_dissector_add("tcp", dissect_PROTOABBREV_heur_tcp, "PROTOABBREV over TCP",
+                       "PROTOABBREV_tcp", proto_PROTOABBREV, HEURISTIC_ENABLE);
+    heur_dissector_add("udp", dissect_PROTOABBREV_heur_udp, "PROTOABBREV over UDP",
+                       "PROTOABBREV_udp", proto_PROTOABBREV, HEURISTIC_ENABLE);
+
+#ifdef OPTIONAL
+    /* It's possible to write a dissector to be a dual heuristic/normal dissector */
+    /* by also registering the dissector "normally". */
+    dissector_add_uint("ip.proto", IP_PROTO_PROTOABBREV, PROTOABBREV_pdu_handle);
+#endif
+}
+
+
+Please note that a heuristic dissector can only be registered on top of the
+small number of protocols that support it. In most cases a heuristic is not
+needed, and adding the support would only add unused code to the dissector.
+
+TCP and UDP are prominent examples that support HDs, as there seems to be a
+tendency to re-use known port numbers for new protocols. But TCP and UDP are
+not the only dissectors that provide support for HDs. You can find more
+examples by searching the Wireshark sources as follows:
+grep -l register_heur_dissector_list epan/dissectors/packet-*.c
+
+There are a small number of cases where heuristic dissectors have been added
+for formats that were specifically created for use with Wireshark (e.g.
+LTE and NR L2 MAC, RLC and PDCP dissectors).