diff options
Diffstat (limited to 'doc/README.heuristic')
-rw-r--r-- | doc/README.heuristic | 244 |
1 files changed, 244 insertions, 0 deletions
diff --git a/doc/README.heuristic b/doc/README.heuristic new file mode 100644 index 00000000..08e9464f --- /dev/null +++ b/doc/README.heuristic @@ -0,0 +1,244 @@ +This file is a HOWTO for Wireshark developers. It describes how Wireshark +heuristic protocol dissectors work and how to write them. + +This file is compiled to give in depth information on Wireshark. +It is by no means all inclusive and complete. Please feel free to send +remarks and patches to the developer mailing list. + + +Prerequisites +------------- +As this file is an addition to README.dissector, it is essential to read +and understand that document first. + + +Why heuristic dissectors? +------------------------- +When Wireshark "receives" a packet, it has to find the right dissector to +start decoding the packet data. Often this can be done by known conventions, +e.g. the Ethernet type 0x0800 means "IP on top of Ethernet" - an easy and +reliable match for Wireshark. + +Unfortunately, these conventions are not always available, or (accidentally +or knowingly) some protocols don't care about those conventions and "reuse" +existing "magic numbers / tokens". + +For example TCP defines port 80 only for the use of HTTP traffic. But, this +convention doesn't prevent anyone from using TCP port 80 for some different +protocol, or on the other hand using HTTP on a port number different than 80. + +To solve this problem, Wireshark introduced the so called heuristic dissector +mechanism to try to deal with these problems. + + +How Wireshark uses heuristic dissectors? +---------------------------------------- +While Wireshark starts, heuristic dissectors (HD) register themselves slightly +different than "normal" dissectors, e.g. a HD can ask for any TCP packet, as +it *may* contain interesting packet data for this dissector. In reality more +than one HD will exist for e.g. TCP packet data. + +So if Wireshark has to decode TCP packet data, it will first try to find a +dissector registered directly for the TCP port used in that packet. If it +finds such a registered dissector it will just hand over the packet data to it. + +In case there is no such "normal" dissector, WS will hand over the packet data +to the first matching HD. Now the HD will look into the data and decide if that +data looks like something the dissector "is interested in". The return value +signals WS if the HD processed the data (so WS can stop working on that packet) +or if the heuristic didn't match (so WS tries the next HD until one matches - +or the data simply can't be processed). + +Note that it is possible to configure WS through preference settings so that it +hands off a packet to the heuristic dissectors before the "normal" dissectors +are called. This allows the HD the chance to receive packets and process them +differently than they otherwise would be. Of course if no HD is interested in +the packet, then the packet will ultimately get handed off to the "normal" +dissector as if the HD wasn't involved at all. As of this writing, +16 dissectors (including DCCP, SCTP, TCP, TIPC and UDP) provide this capability +via their "Try heuristic sub-dissectors first" preference, but most of them have +this option disabled by default. + +Once a packet for a particular "connection" has been identified as belonging +to a particular protocol, Wireshark must then be set up to always directly +call the dissector for that protocol. This removes the overhead of having +to identify each packet of the connection heuristically. + + +How do these heuristics work? +----------------------------- +It's difficult to give a general answer here. The usual heuristic works as follows: + +A HD looks into the first few packet bytes and searches for common patterns that +are specific to the protocol in question. Most protocols starts with a +specific header, so a specific pattern may look like (synthetic example): + +1) first byte must be 0x42 +2) second byte is a type field and can only contain values between 0x20 - 0x33 +3) third byte is a flag field, where the lower 4 bits always contain the value 0 +4) fourth and fifth bytes contain a 16 bit length field, where the value can't + be larger than 10000 bytes + +So the heuristic dissector will check incoming packet data for all of the +4 above conditions, and only if all of the four conditions are true there is a +good chance that the packet really contains the expected protocol - and the +dissector continues to decode the packet data. If one condition fails, it's +very certainly not the protocol in question and the dissector returns to WS +immediately "this is not my protocol" - maybe some other heuristic dissector +is interested! + +Obviously, this is *not* 100% bullet proof, but it's the best WS can offer to +its users here - and improving the heuristic is always possible if it turns out +that it's not good enough to distinguish between two given protocols. + +Note: The heuristic code in a dissector *must not* cause an exception + (before returning false) as this will prevent following + heuristic dissector handoffs. In practice, this normally means + that a test must be done to verify that the required data is + available in the tvb before fetching from the tvb. (See the + example below). + + +Heuristic Code Example +---------------------- +You can find a lot of code examples in the Wireshark sources, e.g.: +grep -l heur_dissector_add epan/dissectors/*.c +returns 236 files (December 2021). + +For the above example criteria, the following code example might do the work +(combine this with the dissector skeleton in README.developer): + +XXX - please note: The following code examples were not tried in reality, +please report problems to the dev-list! + +-------------------------------------------------------------------------------------------- + +static dissector_handle_t PROTOABBREV_tcp_handle; +static dissector_handle_t PROTOABBREV_pdu_handle; + +/* Heuristics test */ +static bool +test_PROTOABBREV(packet_info *pinfo _U_, tvbuff_t *tvb, int offset _U_, void *data _U_) +{ + /* 0) Verify needed bytes available in tvb so tvb_get...() doesn't cause exception. + if (tvb_captured_length(tvb) < 5) + return false; + + /* 1) first byte must be 0x42 */ + if ( tvb_get_guint8(tvb, 0) != 0x42 ) + return false; + + /* 2) second byte is a type field and only can contain values between 0x20-0x33 */ + if ( tvb_get_guint8(tvb, 1) < 0x20 || tvb_get_guint8(tvb, 1) > 0x33 ) + return false; + + /* 3) third byte is a flag field, where the lower 4 bits always contain the value 0 */ + if ( tvb_get_guint8(tvb, 2) & 0x0f ) + return false; + + /* 4) fourth and fifth bytes contains a 16 bit length field, where the value can't be longer than 10000 bytes */ + /* Assumes network byte order */ + if ( tvb_get_ntohs(tvb, 3) > 10000 ) + return false; + + /* Assume it's your packet ... */ + return true; +} + +/* Dissect the complete PROTOABBREV pdu */ +static int +dissect_PROTOABBREV_pdu(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data _U_) +{ + /* Dissection ... */ + + return tvb_reported_length(tvb); +} + +/* For tcp_dissect_pdus() */ +static unsigned +get_PROTOABBREV_len(packet_info *pinfo _U_, tvbuff_t *tvb, int offset, void *data _U_) +{ + return (unsigned) tvb_get_ntohs(tvb, offset+3); +} + +static int +dissect_PROTOABBREV_tcp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data) +{ + tcp_dissect_pdus(tvb, pinfo, tree, true, 5, + get_PROTOABBREV_len, dissect_PROTOABBREV_pdu, data); + return tvb_reported_length(tvb); +} + +static bool +dissect_PROTOABBREV_heur_tcp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data) +{ + if (!test_PROTOABBREV(pinfo, tvb, 0, data)) + return false; + + /* specify that dissect_PROTOABBREV is to be called directly from now on for + * packets for this "connection" ... but only do this if your heuristic sits directly + * on top of (was called by) a dissector which established a conversation for the + * protocol "port type". In other words: only directly over TCP, UDP, DCCP, ... + * otherwise you'll be overriding the dissector that called your heuristic dissector. + */ + conversation = find_or_create_conversation(pinfo); + conversation_set_dissector(conversation, PROTOABBREV_tcp_handle); + + /* and do the dissection */ + dissect_PROTOABBREV_tcp(tvb, pinfo, tree, data); + + return (true); +} + +static int +dissect_PROTOABBREV_udp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data) +{ + udp_dissect_pdus(tvb, pinfo, tree, true, 5, NULL, + get_PROTOABBREV_len, dissect_PROTOABBREV_pdu, data); + return tvb_reported_length(tvb); +} + +static bool +dissect_PROTOABBREV_heur_udp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data) +{ +... + /* and do the dissection */ + return (udp_dissect_pdus(tvb, pinfo, tree, true, 5, test_PROTOABBREV, + get_PROTOABBREV_len, dissect_PROTOABBREV_pdu, data) != 0); +} + +void +proto_reg_handoff_PROTOABBREV(void) +{ + PROTOABBREV_tcp_handle = create_dissector_handle(dissect_PROTOABBREV_tcp, + proto_PROTOABBREV); + PROTOABBREV_pdu_handle = create_dissector_handle(dissect_PROTOABBREV_pdu, + proto_PROTOABBREV); + + /* register as heuristic dissector for both TCP and UDP */ + heur_dissector_add("tcp", dissect_PROTOABBREV_heur_tcp, "PROTOABBREV over TCP", + "PROTOABBREV_tcp", proto_PROTOABBREV, HEURISTIC_ENABLE); + heur_dissector_add("udp", dissect_PROTOABBREV_heur_udp, "PROTOABBREV over UDP", + "PROTOABBREV_udp", proto_PROTOABBREV, HEURISTIC_ENABLE); + +#ifdef OPTIONAL + /* It's possible to write a dissector to be a dual heuristic/normal dissector */ + /* by also registering the dissector "normally". */ + dissector_add_uint("ip.proto", IP_PROTO_PROTOABBREV, PROTOABBREV_pdu_handle); +#endif +} + + +Please note, that registering a heuristic dissector is only possible for a +small variety of protocols. In most cases a heuristic is not needed, and +adding the support would only add unused code to the dissector. + +TCP and UDP are prominent examples that support HDs, as there seems to be a +tendency to re-use known port numbers for new protocols. But TCP and UDP are +not the only dissectors that provide support for HDs. You can find more +examples by searching the Wireshark sources as follows: +grep -l register_heur_dissector_list epan/dissectors/packet-*.c + +There are a small number of cases where heuristic dissectors have been added +for formats that were specifically created for use with Wireshark (e.g. +LTE and NR L2 MAC, RLC and PDCP dissectors). |