summaryrefslogtreecommitdiffstats
path: root/doc/README.developer
diff options
context:
space:
mode:
Diffstat (limited to 'doc/README.developer')
-rw-r--r--doc/README.developer995
1 files changed, 995 insertions, 0 deletions
diff --git a/doc/README.developer b/doc/README.developer
new file mode 100644
index 00000000..6d6ede97
--- /dev/null
+++ b/doc/README.developer
@@ -0,0 +1,995 @@
+This file is a HOWTO for Wireshark developers. It describes general development
+and coding practices for contributing to Wireshark no matter which part of
+Wireshark you want to work on.
+
+To learn how to write a dissector, read this first, then read the file
+README.dissector.
+
+This file is compiled to give in depth information on Wireshark.
+It is by no means all inclusive and complete. Please feel free to discuss on
+the developer mailing list or upload merge requests to gitlab.
+
+0. Prerequisites.
+
+Before starting to develop a new dissector, a "running" Wireshark build
+environment is required - there's no such thing as a standalone "dissector
+build toolkit".
+
+How to setup such an environment is platform dependent; detailed
+information about these steps can be found in the "Developer's Guide"
+(available from: https://www.wireshark.org) and in the INSTALL and
+README.md files of the sources root dir.
+
+0.1. General README files.
+
+You'll find additional information in the following README files:
+
+- doc/README.capture - the capture engine internals
+- doc/README.design - Wireshark software design - incomplete
+- doc/README.developer - this file
+- doc/README.dissector - How to dissect a packet
+- doc/README.display_filter - Display Filter Engine
+- doc/README.idl2wrs - CORBA IDL converter
+- doc/README.regression - regression testing of WS and TS
+- doc/README.stats_tree - a tree statistics counting specific packets
+- doc/README.tapping - "tap" a dissector to get protocol specific events
+- doc/README.vagrant - how to create a development VM using vagrant
+- doc/README.wslua - working with LUA
+- doc/README.xml-output - how to work with the PDML exported output
+- wiretap/README.developer - how to add additional capture file types to
+ Wiretap
+
+0.2. Dissector related README files.
+
+You'll find additional dissector related information in the file
+README.dissector as well as the following README files:
+
+- doc/README.heuristic - what are heuristic dissectors and how to write
+ them
+- doc/README.plugins - how to "pluginize" a dissector
+- doc/README.request_response_tracking - how to track req./resp. times and such
+- doc/README.wmem - how to obtain "memory leak free" memory
+
+0.3 Contributors
+
+James Coe <jammer[AT]cin.net>
+Gilbert Ramirez <gram[AT]alumni.rice.edu>
+Jeff Foster <jfoste[AT]woodward.com>
+Olivier Abad <oabad[AT]cybercable.fr>
+Laurent Deniel <laurent.deniel[AT]free.fr>
+Gerald Combs <gerald[AT]wireshark.org>
+Guy Harris <guy[AT]alum.mit.edu>
+Ulf Lamping <ulf.lamping[AT]web.de>
+
+1. Portability.
+
+Wireshark runs on many platforms, and can be compiled with a number of
+different compilers; here are some rules for writing code that will work
+on multiple platforms.
+
+Building Wireshark requires a compiler that supports C11. This includes
+reasonably recent versions of GCC and clang. Microsoft Visual Studio supports
+C11 from Visual Studio 2019 version 16.8 and later. Support requires an updated
+Universal C Runtime (UCRT) and Windows SDK version to work properly with the
+conforming preprocessor. The minimum SDK version is 10.0.20348.0 (version 2104).
+
+The C11 has some optional parts that are not a requirement to build Wireshark.
+In particular the following optional C11 features must NOT be used:
+ - Variable length arrays
+ - Bounds-checking interfaces (Annex K)
+
+We don't allow them because their value is questionable and requiring them
+would exclude a lot of compilers and runtimes that we wish to support.
+
+Don't initialize global or static variables (variables with static
+storage duration) in their declaration with non-constant values. This is not
+permitted in C. E.g., if "i" is a static or global
+variable, don't declare "i" as
+
+ uint32_t i = somearray[2];
+
+outside a function, or as
+
+ static uint32_t i = somearray[2];
+
+inside or outside a function, declare it as just
+
+ uint32_t i;
+
+or
+
+ static uint32_t i;
+
+and later, in code, initialize it with
+
+ i = somearray[2];
+
+instead. Initializations of variables with automatic storage duration -
+i.e., local variables - with non-constant values is permitted, so,
+within a function
+
+ uint32_t i = somearray[2];
+
+is allowed.
+
+Don't use zero-length arrays as structure members, use flexible array members
+instead.
+
+Don't use "uchar", "u_char", "ushort", "u_short", "uint", "u_int",
+"ulong", "u_long" or "boolean"; they aren't defined on all platforms.
+
+GLib typedefs have historically been used extensively throughout the
+codebase (gchar, guint8, gint16, etc). We are moving towards the fixed
+width integers provided in C since C99. These are defined in <stdint.h>,
+which is included in <wireshark.h>. You should choose stdint types when
+possible, but realise that until we can fully migrate our APIs, in many
+situations the GLib types still make sense.
+
+If you want an 8-bit unsigned quantity, use "uint8_t"; if you want an
+8-bit character value with the 8th bit not interpreted as a sign bit,
+use "unsigned char"; if you want a 16-bit unsigned quantity, use "uint16_t";
+if you want a 32-bit unsigned quantity, use "uint32_t"; and if you want
+an "int-sized" unsigned quantity, use "unsigned"; if you want a boolean,
+use "bool" (defined in <stdbool.h>). You don't need to explicitly include
+these headers; they are included in <wireshark.h>. Use that instead.
+
+To print fixed width integers you must use the macros provided in <inttypes.h>.
+
+ uint32_t var;
+ printf("var = " PRIu32 "\n", var);
+
+Don't use "long" to mean "signed 32-bit integer", and don't use
+"unsigned long" to mean "unsigned 32-bit integer"; "long"s are 64 bits
+long on many platforms. Use "int32_t" for signed 32-bit integers and use
+"uint32_t" for unsigned 32-bit integers.
+
+Don't use "long" to mean "signed 64-bit integer" and don't use "unsigned
+long" to mean "unsigned 64-bit integer"; "long"s are 32 bits long on
+many other platforms. Don't use "long long" or "unsigned long long",
+either, as not all platforms support them; use "int64_t" or "uint64_t",
+which will be defined as the appropriate types for 64-bit signed and
+unsigned integers.
+
+On LLP64 data model systems (notably 64-bit Windows), "int" and "long"
+are 32 bits while "size_t" and "ptrdiff_t" are 64 bits. This means that
+the following will generate a compiler warning:
+
+ int i;
+ i = strlen("hello, sailor"); /* Compiler warning */
+
+Normally, you'd just make "i" a size_t. However, many GLib and Wireshark
+functions won't accept a size_t on LLP64:
+
+ size_t i;
+ char greeting[] = "hello, sailor";
+ unsigned byte_after_greet;
+
+ i = strlen(greeting);
+ byte_after_greet = tvb_get_guint8(tvb, i); /* Compiler warning */
+
+Try to use the appropriate data type when you can. When you can't, you
+will have to cast to a compatible data type, e.g.
+
+ size_t i;
+ char greeting[] = "hello, sailor";
+ uint8_t byte_after_greet;
+
+ i = strlen(greeting);
+ byte_after_greet = tvb_get_guint8(tvb, (int) i); /* OK */
+
+or
+
+ int i;
+ char greeting[] = "hello, sailor";
+ uint8_t byte_after_greet;
+
+ i = (int) strlen(greeting);
+ byte_after_greet = tvb_get_guint8(tvb, i); /* OK */
+
+See http://www.unix.org/version2/whatsnew/lp64_wp.html for more
+information on the sizes of common types in different data models.
+
+A lot of legacy code still uses GLib types and I/O replacement API. These
+should be gradually transitioned to use the standard interfaces provided in
+C11. Sometimes it may be necessary to use an unsavory cast or two or abuse
+a macro to bridge the two codebases during the transition. Such is life,
+use your judgement and do the best possible under the circumstances.
+
+Avoid GLib synonyms like gchar and gint and especially don't
+use gpointer and gconstpointer, unless you are writing GLib callbacks
+and trying to match their signature exactly. These just obscure the
+code and gconstpointer in particular is just semantically weird and poor style.
+
+When printing or displaying the values of 64-bit integral data types,
+don't use "%lld", "%llu", "%llx", or "%llo" - not all platforms
+support "%ll" for printing 64-bit integral data types. Instead use
+the macros in <inttypes.h>, for example:
+
+ proto_tree_add_uint64_format_value(tree, hf_uint64, tvb, offset, len,
+ val, "%" PRIu64, val);
+
+For GLib routines, and only those, you can choose whichever format style
+you prefer:
+
+ uint64_t val = UINT64_C(1);
+ char *str1 = g_string_printf("%" G_GUINT64_FORMAT, val);
+ char *str2 = g_string_printf("%" PRIu64, val);
+
+These format macros will be the same modulo any GLib bugs.
+
+When specifying an integral constant that doesn't fit in 32 bits, don't
+use "LL" at the end of the constant - not all compilers use "LL" for
+that. Instead, put the constant in a call to the "INT64_C()" or "UINT64_C()"
+macro, e.g.
+
+ INT64_C(-11644473600), UINT64_C(11644473600)
+
+rather than
+
+ -11644473600LL, 11644473600ULL
+
+Don't assume that you can scan through a va_list initialized by va_start
+more than once without closing it with va_end and re-initializing it with
+va_start. This applies even if you're not scanning through it yourself,
+but are calling a routine that scans through it, such as vfprintf() or
+one of the routines in Wireshark that takes a format and a va_list as an
+argument. You must do
+
+ va_start(ap, format);
+ call_routine1(xxx, format, ap);
+ va_end(ap);
+ va_start(ap, format);
+ call_routine2(xxx, format, ap);
+ va_end(ap);
+
+rather than
+
+ va_start(ap, format);
+ call_routine1(xxx, format, ap);
+ call_routine2(xxx, format, ap);
+ va_end(ap);
+
+Don't use a label without a statement following it. For example,
+something such as
+
+ if (...) {
+
+ ...
+
+ done:
+ }
+
+will not work with all compilers - you have to do
+
+ if (...) {
+
+ ...
+
+ done:
+ ;
+ }
+
+with some statement, even if it's a null statement, after the label.
+Preferably don't do it at all.
+
+Don't use "bzero()", "bcopy()", or "bcmp()"; instead, use the ANSI C
+routines
+
+ "memset()" (with zero as the second argument, so that it sets
+ all the bytes to zero);
+
+ "memcpy()" or "memmove()" (note that the first and second
+ arguments to "memcpy()" are in the reverse order to the
+ arguments to "bcopy()"; note also that "bcopy()" is typically
+ guaranteed to work on overlapping memory regions, while
+ "memcpy()" isn't, so if you may be copying from one region to a
+ region that overlaps it, use "memmove()", not "memcpy()" - but
+ "memcpy()" might be faster as a result of not guaranteeing
+ correct operation on overlapping memory regions);
+
+ and "memcmp()" (note that "memcmp()" returns 0, 1, or -1, doing
+ an ordered comparison, rather than just returning 0 for "equal"
+ and 1 for "not equal", as "bcmp()" does).
+
+Not all platforms necessarily have "bzero()"/"bcopy()"/"bcmp()", and
+those that do might not declare them in the header file on which they're
+declared on your platform.
+
+Don't use "index()" or "rindex()"; instead, use the ANSI C equivalents,
+"strchr()" and "strrchr()". Not all platforms necessarily have
+"index()" or "rindex()", and those that do might not declare them in the
+header file on which they're declared on your platform.
+
+Don't use "tvb_get_ptr()". If you must use it, keep in mind that the pointer
+returned by a call to "tvb_get_ptr()" is not guaranteed to be aligned on any
+particular byte boundary; this means that you cannot safely cast it to any
+data type other than a pointer to "char", "unsigned char", "guint8", or other
+one-byte data types. Casting a pointer returned by tvb_get_ptr() into any
+multi-byte data type or structure may cause crashes on some platforms (even
+if it does not crash on x86-based PCs). Even if such mis-aligned accesses
+don't crash on your platform they will be slower than properly aligned
+accesses would be. Furthermore, the data in a packet is not necessarily in
+the byte order of the machine on which Wireshark is running. Use the tvbuff
+routines to extract individual items from the packet, or, better yet, use
+"proto_tree_add_item()" and let it extract the items for you.
+
+Don't use structures that overlay packet data, or into which you copy
+packet data; the C programming language does not guarantee any
+particular alignment of fields within a structure, and even the
+extensions that try to guarantee that are compiler-specific and not
+necessarily supported by all compilers used to build Wireshark. Using
+bitfields in those structures is even worse; the order of bitfields
+is not guaranteed.
+
+Don't use "ntohs()", "ntohl()", "htons()", or "htonl()"; the header
+files required to define or declare them differ between platforms, and
+you might be able to get away with not including the appropriate header
+file on your platform but that might not work on other platforms.
+Instead, use "g_ntohs()", "g_ntohl()", "g_htons()", and "g_htonl()";
+those are declared by <glib.h>, and you'll need to include that anyway,
+as Wireshark header files that all dissectors must include use stuff from
+<glib.h>.
+
+Don't fetch a little-endian value using "tvb_get_ntohs() or
+"tvb_get_ntohl()" and then using "g_ntohs()", "g_htons()", "g_ntohl()",
+or "g_htonl()" on the resulting value - the g_ routines in question
+convert between network byte order (big-endian) and *host* byte order,
+not *little-endian* byte order; not all machines on which Wireshark runs
+are little-endian, even though PCs are. Fetch those values using
+"tvb_get_letohs()" and "tvb_get_letohl()".
+
+Do not use "open()", "rename()", "mkdir()", "stat()", "unlink()", "remove()",
+"fopen()", "freopen()" directly. Instead use "ws_open()", "ws_rename()",
+"ws_mkdir()", "ws_stat()", "ws_unlink()", "ws_remove()", "ws_fopen()",
+"ws_freopen()": these wrapper functions change the path and file name from
+UTF-8 to UTF-16 on Windows allowing the functions to work correctly when the
+path or file name contain non-ASCII characters.
+
+Also, use ws_read(), ws_write(), ws_lseek(), ws_dup(), ws_fstat(), and
+ws_fdopen(), rather than read(), write(), lseek(), dup(), fstat(), and
+fdopen() on descriptors returned by ws_open().
+
+Those functions are declared in <wsutil/file_util.h>; include that
+header in any code that uses any of those routines.
+
+When opening a file with "ws_fopen()", "ws_freopen()", or "ws_fdopen()", if
+the file contains ASCII text, use "r", "w", "a", and so on as the open mode
+- but if it contains binary data, use "rb", "wb", and so on. On
+Windows, if a file is opened in a text mode, writing a byte with the
+value of octal 12 (newline) to the file causes two bytes, one with the
+value octal 15 (carriage return) and one with the value octal 12, to be
+written to the file, and causes bytes with the value octal 15 to be
+discarded when reading the file (to translate between C's UNIX-style
+lines that end with newline and Windows' DEC-style lines that end with
+carriage return/line feed).
+
+In addition, that also means that when opening or creating a binary
+file, you must use "ws_open()" (with O_CREAT and possibly O_TRUNC if the
+file is to be created if it doesn't exist), and OR in the O_BINARY flag,
+even on UN*X - O_BINARY is defined by <wsutil/file_util.h> as 0 on UN*X.
+
+Do not include <unistd.h>, <fcntl.h>, or <io.h> to declare any of the
+routines listed as replaced by routines in <wsutil/file_util.h>;
+instead, just include <wsutil/file_util.h>.
+
+If you need the declarations of other functions defined by <unistd.h>,
+don't include it without protecting it with
+
+ #ifdef HAVE_UNISTD_H
+
+ ...
+
+ #endif
+
+Don't use forward declarations of static arrays without a specified size
+in a fashion such as this:
+
+ static const value_string foo_vals[];
+
+ ...
+
+ static const value_string foo_vals[] = {
+ { 0, "Red" },
+ { 1, "Green" },
+ { 2, "Blue" },
+ { 0, NULL }
+ };
+
+as some compilers will reject the first of those statements. Instead,
+initialize the array at the point at which it's first declared, so that
+the size is known.
+
+For #define names and enum member names, prefix the names with a tag so
+as to avoid collisions with other names - this might be more of an issue
+on Windows, as it appears to #define names such as DELETE and
+OPTIONAL.
+
+Don't use the "positional parameters" extension that many UNIX printf's
+implement, e.g.:
+
+ snprintf(add_string, 30, " - (%1$d) (0x%1$04x)", value);
+
+as not all UNIX printf's implement it, and Windows printf doesn't appear
+to implement it. Use something like
+
+ snprintf(add_string, 30, " - (%d) (0x%04x)", value, value);
+
+instead.
+
+Don't use
+
+ case N ... M:
+
+as that's not supported by all compilers.
+
+Prefer the C99 output functions from <stdio.h> instead of their GLib
+replacements (note that positional format parameters are not part of C99).
+In the past we used to recommend using g_snprintf() and g_vsnprintf()
+instead but since Visual Studio 2015 native C99 implementations are
+available on all platforms we support. These are optimized better than
+the gnulib (GLib) implementation and on hot codepaths that can be a
+noticeable difference in execution speed.
+
+tmpnam() -> mkstemp()
+tmpnam is insecure and should not be used any more. Wireshark brings its
+own mkstemp implementation for use on platforms that lack mkstemp.
+Note: mkstemp does not accept NULL as a parameter.
+
+Wireshark requires minimum versions of each of the libraries it uses, in
+particular GLib 2.54.0 and Qt 5.12.0 or newer. If you require a mechanism
+that is available only in a newer version of a library then use its
+version detection macros, e.g. "#if GLIB_CHECK_VERSION(...)" and "#if
+QT_VERSION_CHECK(...)" to conditionally compile code using that
+mechanism.
+
+When different code must be used on UN*X and Win32, use a #if or #ifdef
+that tests _WIN32, not WIN32. Try to write code portably whenever
+possible, however; note that there are some routines in Wireshark with
+platform-dependent implementations and platform-independent APIs, such
+as the routines in epan/filesystem.c, allowing the code that calls it to
+be written portably without #ifdefs.
+
+We support building on Windows using MinGW-w64 (experimental) so be mindful
+of the difference between an #ifdef on _WIN32 and _MSC_VER. The first tests
+if the platform is some version of Windows and also applies to MinGW. The
+latter tests if the toolchain is Microsoft Visual Studio. Sometimes you need
+one or the other, depending on whether the condition applies to all Windows
+compilers or only Microsoft's compiler. Use #ifdef __MINGW32__ to test for
+a MinGW toolchain, including MinGW-w64. The same concern applies to CMake
+code. Depending on the particular situation you may need to use if(WIN32) or
+if(MSVC) or if(MINGW).
+
+Wireshark uses Libgcrypt as general-purpose crypto library. Some Wireshark
+specific extensions are defined in wsutil/wsgcrypt.h. You might want to
+include that file instead.
+
+2. String handling
+
+Do not use functions such as strcat() or strcpy().
+A lot of work has been done to remove the existing calls to these functions and
+we do not want any new callers of these functions.
+
+Instead use snprintf() since that function will if used correctly prevent
+buffer overflows for large strings.
+
+Be sure that all pointers passed to %s specifiers in format strings are non-
+NULL. Some implementations will automatically replace NULL pointers with the
+string "(NULL)", but most will not.
+
+When using a buffer to create a string, do not use a buffer stored on the stack.
+I.e. do not use a buffer declared as
+
+ char buffer[1024];
+
+instead allocate a buffer dynamically using the string-specific or plain wmem
+routines (see README.wmem) such as
+
+ wmem_strbuf_t *strbuf;
+ strbuf = wmem_strbuf_new(pinfo->pool, "");
+ wmem_strbuf_append_printf(strbuf, ...
+
+or
+
+ char *buffer=NULL;
+ ...
+ #define MAX_BUFFER 1024
+ buffer=wmem_alloc(pinfo->pool, MAX_BUFFER);
+ buffer[0]='\0';
+ ...
+ snprintf(buffer, MAX_BUFFER, ...
+
+This avoids the stack from being corrupted in case there is a bug in your code
+that accidentally writes beyond the end of the buffer.
+
+
+If you write a routine that will create and return a pointer to a filled in
+string and if that buffer will not be further processed or appended to after
+the routine returns (except being added to the proto tree),
+do not preallocate the buffer to fill in and pass as a parameter instead
+pass a pointer to a pointer to the function and return a pointer to a
+wmem-allocated buffer that will be automatically freed. (see README.wmem)
+
+I.e. do not write code such as
+ static void
+ foo_to_str(char *string, ... ){
+ <fill in string>
+ }
+ ...
+ char buffer[1024];
+ ...
+ foo_to_str(buffer, ...
+ proto_tree_add_string(... buffer ...
+
+instead write the code as
+ static void
+ foo_to_str(char **buffer, ...
+ #define MAX_BUFFER x
+ *buffer=wmem_alloc(pinfo->pool, MAX_BUFFER);
+ <fill in *buffer>
+ }
+ ...
+ char *buffer;
+ ...
+ foo_to_str(&buffer, ...
+ proto_tree_add_string(... *buffer ...
+
+Use wmem_ allocated buffers. They are very fast and nice. These buffers are all
+automatically free()d when the dissection of the current packet ends so you
+don't have to worry about free()ing them explicitly in order to not leak memory.
+Please read README.wmem.
+
+Source files can use UTF-8 encoding, but characters outside the ASCII
+range should be used sparingly. It should be safe to use non-ASCII
+characters in comments and strings, but some compilers (such as GCC
+versions prior to 10) may not support extended identifiers very well.
+There is also no guarantee that a developer's text editor will interpret
+the characters the way you intend them to be interpreted.
+
+The majority of Wireshark encodes strings as UTF-8. The main exception
+is the code that uses the Qt API, which uses UTF-16. Console output is
+UTF-8, but as with the source code extended characters should be used
+sparingly since some consoles (most notably Windows' cmd.exe) have
+limited support for UTF-8.
+
+3. Robustness.
+
+Wireshark is not guaranteed to read only network traces that contain correctly-
+formed packets. Wireshark is commonly used to track down networking
+problems, and the problems might be due to a buggy protocol implementation
+sending out bad packets.
+
+Therefore, code does not only have to be able to handle
+correctly-formed packets without, for example, crashing or looping
+infinitely, they also have to be able to handle *incorrectly*-formed
+packets without crashing or looping infinitely.
+
+Here are some suggestions for making code more robust in the face
+of incorrectly-formed packets:
+
+Do *NOT* use "ws_assert()" or "ws_assert_not_reached()" with input data in dissectors.
+*NO* value in a packet's data should be considered "wrong" in the sense
+that it's a problem with the dissector if found; if it cannot do
+anything else with a particular value from a packet's data, the
+dissector should put into the protocol tree an indication that the
+value is invalid, and should return. The "expert" mechanism should be
+used for that purpose.
+
+Use assertions to catch logic errors in your program. A failed assertion
+indicates a bug in the code. Use ws_assert() instead of g_assert() to
+test a logic condition. Note that ws_assert() can be removed at compile
+time. Therefore assertions should not have any side-effects,
+otherwise the program may behave inconsistently.
+
+Use ws_assert_not_reached() instead of g_assert_not_reached() for
+unreachable error conditions. For example if (and only if) you know
+'myvar' can only have the values 1 and 2 do:
+ switch(myvar) {
+ case 1:
+ (...)
+ break;
+ case 2:
+ (...)
+ break;
+ default:
+ ws_assert_not_reached();
+ break;
+ }
+
+For dissectors use DISSECTOR_ASSERT() and DISSECTOR_ASSERT_NOT_REACHED()
+instead, with the same caveats as above.
+
+You should continue to use g_assert_true(), g_assert_cmpstr(), etc for
+"test code", such as unit testing. These assertions are always active.
+See the GLib Testing API documentation for the details on each of those
+functions.
+
+If there is a case where you are checking not for an invalid data item
+in the packet, but for a bug in the dissector (for example, an
+assumption being made at a particular point in the code about the
+internal state of the dissector), use the DISSECTOR_ASSERT macro for
+that purpose; this will put into the protocol tree an indication that
+the dissector has a bug in it, and will not crash the application.
+
+If you are allocating a chunk of memory to contain data from a packet,
+or to contain information derived from data in a packet, and the size of
+the chunk of memory is derived from a size field in the packet, make
+sure all the data is present in the packet before allocating the buffer.
+Doing so means that:
+
+ 1) Wireshark won't leak that chunk of memory if an attempt to
+ fetch data not present in the packet throws an exception.
+
+and
+
+ 2) it won't crash trying to allocate an absurdly-large chunk of
+ memory if the size field has a bogus large value.
+
+If you're fetching into such a chunk of memory a sequence of bytes from
+the buffer, and the sequence has a specified size, you can use
+"tvb_memdup()", which will check whether the entire sequence is present
+before allocating a buffer for it.
+
+Otherwise, you can check whether the data is present by using
+"tvb_ensure_bytes_exist()" although this frequently is not needed: the
+TVB-accessor routines can handle requests to read data beyond the end of
+the TVB (by throwing an exception which will either mark the frame as
+truncated--not all the data was captured--or as malformed).
+
+If you're fetching a string only to add it to the tree, you should
+generally be using "proto_tree_add_item()" instead. If you also need
+the string, you can use the variant "proto_tree_add_item_ret_string()"
+or "proto_tree_add_item_ret_string_and_length()" forms.
+
+If you must fetch it from the tvbuff, and the string has a specified
+size and known encoding, you can use "tvb_get_string_enc()" for most
+encodings, which will check whether the entire string is present before
+allocating a buffer for the string, will put a trailing '\0' at the end
+of the buffer, and will also check for invalid characters in the supplied
+encoding and convert the string to UTF-8. The "tvb_get_*_string()" set of
+functions is available as well, and must be used for some encodings,
+primarily non byte aligned ones. If the string has a known encoding and
+is null terminated, the "stringz" variants can be used. (Note that these
+functions are called with memory allocators, and if called with a NULL
+allocator you are required to free the string when finished with it.)
+
+If the string has a known encoding but requires token parsing or other
+text manipulation to determine the offset and size, do so by calling
+tvb_*() functions on the tvbuff that perform bounds checking if possible.
+Only extract the bytes into a newly allocated buffer to extract a string
+if absolutely necessary. If you do so, then you *must* ensure that the
+string is valid UTF-8 when passing it to a libwireshark API function
+such as proto_tree_add_string(). (Cf. 7.5: Unicode and string encoding
+best practices.)
+
+Conversion to UTF-8 can produce a string with a length longer than
+that of the string in the original packet data; this includes strings
+encoded in ASCII or UTF-8 itself if they have invalid character sequences
+that are replaced with the 3 byte UTF-8 REPLACEMENT CHARACTER. Truncating
+a valid UTF-8 string to an arbitrary number of bytes does not guarantee
+that the result is a valid UTF-8 string, because a multibyte character
+might span the boundary.
+
+Note also that you should only fetch string data into a fixed-length
+buffer if the code ensures that no more bytes than will fit into the
+buffer are fetched ("the protocol ensures" isn't good enough, as
+protocol specifications can't ensure only packets that conform to the
+specification will be transmitted or that only packets for the protocol
+in question will be interpreted as packets for that protocol by
+Wireshark).
+
+If you have gotten a pointer using "tvb_get_ptr()" (which you should not
+have: you should seriously consider a better alternative to this function),
+you must make sure that you do not refer to any data past the length passed
+as the last argument to "tvb_get_ptr()"; while the various "tvb_get"
+routines perform bounds checking and throw an exception if you refer to data
+not available in the tvbuff, direct references through a pointer gotten from
+"tvb_get_ptr()" do not do any bounds checking.
+
+If you have a loop that dissects a sequence of items, each of which has
+a length field, with the offset in the tvbuff advanced by the length of
+the item, then, if the length field is the total length of the item, and
+thus can be zero, you *MUST* check for a zero-length item and abort the
+loop if you see one. Otherwise, a zero-length item could cause the
+dissector to loop infinitely. You should also check that the offset,
+after having the length added to it, is greater than the offset before
+the length was added to it, if the length field is greater than 24 bits
+long, so that, if the length value is *very* large and adding it to the
+offset causes an overflow, that overflow is detected.
+
+If you have a
+
+ for (i = {start}; i < {end}; i++)
+
+loop, make sure that the type of the loop index variable is large enough
+to hold the maximum {end} value plus 1; otherwise, the loop index
+variable can overflow before it ever reaches its maximum value. In
+particular, be very careful when using int8_t, uint8_t, int16_t, or uint16_t
+(or the deprecated Glib synonyms gint8, guint8, gint16, or guint16)
+variables as loop indices; you almost always want to use an "int"/"gint"
+or "unsigned"/"guint" as the loop index rather than a shorter type.
+
+If you are fetching a length field from the buffer, corresponding to the
+length of a portion of the packet, and subtracting from that length a
+value corresponding to the length of, for example, a header in the
+packet portion in question, *ALWAYS* check that the value of the length
+field is greater than or equal to the length you're subtracting from it,
+and report an error in the packet and stop dissecting the packet if it's
+less than the length you're subtracting from it. Otherwise, the
+resulting length value will be negative, which will either cause errors
+in the dissector or routines called by the dissector, or, if the value
+is interpreted as an unsigned integer, will cause the value to be
+interpreted as a very large positive value.
+
+Any tvbuff offset that is added to as processing is done on a packet
+should be stored in a 32-bit variable, such as an "int"; if you store it
+in an 8-bit or 16-bit variable, you run the risk of the variable
+overflowing.
+
+sprintf() -> snprintf()
+Prevent yourself from using the sprintf() function, as it does not test the
+length of the given output buffer and might be writing into unintended memory
+areas. This function is one of the main causes of security problems like buffer
+exploits and many other bugs that are very hard to find. It's much better to
+use the snprintf() function declared by <stdio.h> instead.
+
+You should test your dissector against incorrectly-formed packets. This
+can be done using the randpkt and editcap utilities that come with the
+Wireshark distribution. Testing using randpkt can be done by generating
+output at the same layer as your protocol, and forcing Wireshark/TShark
+to decode it as your protocol, e.g. if your protocol sits on top of UDP:
+
+ randpkt -c 50000 -t dns randpkt.pcap
+ tshark -nVr randpkt.pcap -d udp.port==53,<myproto>
+
+Testing using editcap can be done using preexisting capture files and the
+"-E" flag, which introduces errors in a capture file. E.g.:
+
+ editcap -E 0.03 infile.pcap outfile.pcap
+ tshark -nVr outfile.pcap
+
+tools/fuzz-test.sh is available to help automate these tests.
+
+4. Name convention.
+
+Wireshark uses the underscore_convention rather than the InterCapConvention for
+function names, so new code should probably use underscores rather than
+intercaps for functions and variable names. This is especially important if you
+are writing code that will be called from outside your code. We are just
+trying to keep things consistent for other developers.
+
+C symbols exported from libraries shipped with Wireshark should start with a
+prefix that helps avoiding name collision with public symbols from other shared
+libraries. The current suggested prefixes for newly added symbols are
+ws_, wslua_, wmem_ and wtap_.
+
+5. White space convention.
+
+Most of the C and C++ files in Wireshark use 4-space or 2-space indentation.
+When creating new files you are you are strongly encouraged to use 4-space
+indentation for source code in order to ensure consistency between files.
+
+Please avoid using tab expansions different from 8 column widths, as not all
+text editors in use by the developers support this. For a detailed discussion
+of tabs, spaces, and indentation, see
+
+ http://www.jwz.org/doc/tabs-vs-spaces.html
+
+We use EditorConfig (http://editorconfig.org) files to provide formatting
+hints. Most editors and IDEs support EditorConfig, either directly or via
+a plugin. If yours requires a plugin we encourage you to install it. Our
+default EditorConfig indentation style for C and C++ files is 4 spaces.
+
+Many files also have a short comment (modelines) on the indentation logic at
+the end of the file. This was required in the past but has been superseded by
+EditorConfig. See
+
+ https://www.wireshark.org/tools/modelines.html
+
+for more information.
+
+Please do not leave trailing whitespace (spaces/tabs) on lines.
+
+Quite a bit of our source code has varying indentation styles. When editing an
+existing file, try following the existing indentation logic. If you wish to
+convert a file to 4 space indentation, please do so in its own commit and be
+sure to remove its .editorconfig entry so that the default setting takes
+effect.
+
+6. Compiler warnings
+
+You should write code that is free of compiler warnings. Such warnings will
+often indicate questionable code and sometimes even real bugs, so it's best
+to avoid warnings at all.
+
+The compiler flags in the Makefiles are set to "treat warnings as errors",
+so your code won't even compile when warnings occur.
+
+7. General observations about architecture
+
+7.1 The global header "wireshark.h"
+
+You should include the global header <wireshark.h> in your code. However
+there are some things to keep in mind when using it and especially
+if you are considering modifying it.
+
+** wireshark.h needs to be minimal: for efficiency reasons, to reduce the
+error surface and because every time this header changes everything must be
+rebuilt. Consider carefully if another header/module should be included
+globally with every project file and exported as public header.
+
+** No configuration: configuration is specific to the build environment
+and target machine. wireshark.h must not depend on that.
+
+** Only wireshark system headers allowed: plugins use this header and
+cannot depend on any header (even indirectly) that is not installed on the
+target system.
+
+** Only global definitions allowed: for example it is acceptable to include
+'wsutil' headers in wireshark.h because every component of Wireshark is allowed
+to depend on wsutil. wiretap is not acceptable because we cannot introduce
+dependencies on wiretap globally (and wireshark.h must be usable everywhere).
+
+7.2 Best practices using headers
+
+C files can be categorized in three types: source files, private headers and
+public headers.
+
+A module "foobar" can have only a private header, only a public header, or
+both. If it's only one it is named "foobar.h" in both cases. If it is both they
+are named "foobar-int.h" and "foobar.h" respectively.
+
+In general the order of #include's for a C module source files (foobar.c),
+assuming foobar implements any kind of interface should be:
+
+ #include "config.h"
+ #define WS_LOG_DOMAIN "mydomain"
+ #include "foobar-int.h"
+
+ followed by <system headers>
+ followed by <wireshark public headers>
+ followed by <wireshark private headers>
+
+For header files (private and public) config.h must NOT be included. A public
+header file (foobar.h) looks like this:
+
+ #ifndef __FOOBAR_H__
+ #define __FOOBAR_H__
+ #include <wireshark.h>
+ followed by <system headers>
+ followed by <wireshark public headers>
+
+ #ifdef __cplusplus
+ extern "C" {
+ #endif
+ (declarations)
+ #ifdef __cplusplus
+ }
+ #endif
+ #endif /* FOOBAR_H */
+
+A private header (foobar-int.h) is the public header plus the declarations
+with private scope:
+
+ #ifndef __FOOBAR_INT_H__
+ #define __FOOBAR_INT_H__
+ #include "foobar.h"
+ followed by <system headers>
+ followed by <wireshark public headers>
+ followed by <wireshark private headers>
+ (etc.)
+
+Again if there are only public or private declarations the name foobar-int.h
+is not used. The macro symbol WS_LOG_DOMAIN can be defined in source files or
+private headers as long as it comes before wireshark.h.
+
+7.3 Wireshark internal and external API policy
+
+Wireshark has several APIs. We need to distinguish between internal
+Wireshark library APIs and external Wireshark APIs. Wireshark the project is
+composed of many different programs and these executable binaries use a number
+of internal libraries to share code efficiently. These internal shared
+libraries need to be installed on the system to run the programs (wireshark,
+tshark, etc).
+
+A library's public API includes the symbols exported by the DSO (wsutil,
+libwireshark, etc). The internal API is made available in the shared libraries
+and exists to support the goals of the project. It is public from the point
+of view of Wireshark programs (client users of the internal API). The
+external API exists to support plugins (client users of the external API)
+and is a loosely defined subset of the internal API plus any infrastructure
+required to support a plugin system. Note that these two uses of shared
+libraries coexist with a lot of overlap, but are nevertheless distinct.
+
+The internal (public) API is not considered to be stable and will regularly
+change as a normal part of development to support new features, remove cruft,
+and whatever else is necessary to make the project sustainable and ease the
+burden on developers. There is less freedom to change something that could
+break a lot of plugins but this is also acceptable (with cause).
+
+The plugin ABI policy is to be compatible only between micro releases (also
+called patch releases). That means we try to make it unnecessary to recompile
+plugins with each micro release (on a best-effort basis). For major.minor
+releases it is explicitly required to recompile plugins. There is no stable
+ABI contract of any kind in that case.
+
+Keep in mind that APIs can exist in different scopes and levels of abstraction.
+Don't get stuck thinking the words public/private have a very specific
+meaning, like being decorated or not with WS_DLL_PUBLIC, although that is a
+big part of it usually.
+
+Also the Wireshark developers have historically tried to keep the Lua API
+very stable and provide strong backward-compatibility guarantees. Under this
+policy moving from Lua 5.2 is unlikely to happen in the foreseeable future.
+
+7.4 libwireshark is not a single monolithic entity
+
+One day we might conceivably wish to load dissectors on demand and do other
+more sophisticated kinds of unit test. Plus other scenarios not immediately
+obvious. For this to be possible it is important that the code in epan/ does
+not depend on code in epan/dissectors, i.e it is possible to compile epan
+without linking with dissector code. It helps to view dissectors as clients
+of an API provided by epan (libwireshark being constituted by two distinct
+components "epan" and "dissectors" bundled together, plus other bits and
+pieces). The reverse is not* true; epan should not be the client of an API
+provided by dissectors.
+
+The main way this separation of concerns is achieved is by using runtime
+registration interfaces in epan for dissectors, preferences, etc. that are
+dynamic and do not have any dissector routines hard coded. Naturally this
+is also an essential component of a plugin system (libwireshark has plugins
+for taps, dissectors and an experimental interface to augment dissection with
+new extension languages).
+
+7.5 Unicode and string encoding best practices
+
+Wireshark strings are always encoded in UTF-8 internally, regardless of the
+platform where it is running. The C datatype used is "pointer to char" and this
+is assumed to point to a valid UTF-8 string. Sometimes older code uses char to
+point to opaque byte strings but this archaic usage should be avoided. A better
+data type for that is uint8_t.
+
+Every untrusted string needs to be validated for correct and error-free UTF-8
+encoding, or converted from the source encoding to UTF-8. This should be done
+at the periphery of the code. This means converting input during dissection or
+when reading input generally. To reiterate: all the Wireshark APIs expect to
+receive valid UTF-8 strings. These include proto_tree_add_string(),
+proto_item_append_text() and col_append_fstr() just to name a few.
+
+If a dissector uses standard API functions to handle strings, such as
+proto_tree_add_item() with an FT_STRING header field type, the API will
+transparently handle the conversion from the source encoding to UTF-8 and
+nothing else needs to be done to ensure valid string input.
+
+If your dissector does text manipulation, token parsing and such and generally
+extracts text strings from the TVBuff or tries to do line oriented input from
+TVBuffs it *must* make sure it passes only valid UTF-8 to libwireshark APIs.
+This should be done using tvb_get_string_enc() to extract a string from a TVbuff
+or get_utf_8_string() to validate a string after it has been constructed.
+
+The Qt API uses UTF-16 for its QString class; when converting between a
+QString and a pointer to char, functions that convert to or from UTF-8
+encoded pointers to char (or QByteArrays) such as toUtf8() should be used,
+not toLocal8Bit() or toLatin1().
+
+8. Miscellaneous notes
+
+Each commit in your branch corresponds to a different VCSVERSION string
+automatically defined in the header 'vcs_version.h' during the build. If you happen
+to find it convenient to disable this feature it can be done using:
+
+ touch .git/wireshark-disable-versioning
+
+i.e., the file 'wireshark-disable-versioning' must exist in the git repo dir.
+
+/*
+ * Editor modelines - https://www.wireshark.org/tools/modelines.html
+ *
+ * Local variables:
+ * c-basic-offset: 4
+ * tab-width: 8
+ * indent-tabs-mode: nil
+ * End:
+ *
+ * vi: set shiftwidth=4 tabstop=8 expandtab:
+ * :indentSize=4:tabSize=8:noTabs=true:
+ */