summaryrefslogtreecommitdiffstats
path: root/doc/README.wmem
diff options
context:
space:
mode:
Diffstat (limited to 'doc/README.wmem')
-rw-r--r--doc/README.wmem402
1 files changed, 402 insertions, 0 deletions
diff --git a/doc/README.wmem b/doc/README.wmem
new file mode 100644
index 00000000..8473a23b
--- /dev/null
+++ b/doc/README.wmem
@@ -0,0 +1,402 @@
+1. Introduction
+
+The 'wmem' memory manager is Wireshark's memory management framework, replacing
+the old 'emem' framework which was removed in Wireshark 2.0.
+
+In order to make memory management easier and to reduce the probability of
+memory leaks, Wireshark provides its own memory management API. This API is
+implemented inside wsutil/wmem/ and provides memory pools and functions that make
+it easy to manage memory even in the face of exceptions (which many dissector
+functions can raise). Memory scopes for dissection are defined in epan/wmem_scopes.h.
+
+Correct use of these functions will make your code faster, and greatly reduce
+the chances that it will leak memory in exceptional cases.
+
+Wmem was originally conceived in this email to the wireshark-dev mailing list:
+https://www.wireshark.org/lists/wireshark-dev/201210/msg00178.html
+
+2. Usage for Consumers
+
+If you're writing a dissector, or other "userspace" code, then using wmem
+should be very similar to using malloc or g_malloc or whatever else you're used
+to. All you need to do is include the header (epan/wmem_scopes.h) and optionally
+get a handle to a memory pool (if you want to *create* a memory pool, see the
+section "3. Usage for Producers" below).
+
+A memory pool is an opaque pointer to an object of type wmem_allocator_t, and
+it is the very first parameter passed to almost every call you make to wmem.
+Other than that parameter (and the fact that functions are prefixed wmem_)
+usage is very similar to glib and other utility libraries. For example:
+
+ wmem_alloc(myPool, 20);
+
+allocates 20 bytes in the pool pointed to by myPool.
+
+2.1 Memory Pool Lifetimes
+
+Every memory pool should have a defined lifetime, or scope, after which all the
+memory in that pool is unconditionally freed. When you choose to allocate memory
+in a pool, you *must* be aware of its lifetime: if the lifetime is shorter than
+you need, your code will contain use-after-free bugs; if the lifetime is longer
+than you need, your code may contain undetectable memory leaks. In either case,
+the risks outweigh the benefits.
+
+If no pool exists whose lifetime matches the lifetime of your memory, you have
+two options: create a new pool (see section 3 of this document) or use the NULL
+pool. Any function that takes a pointer to a wmem_allocator_t can also be passed
+NULL instead, in which case the memory is managed manually (just like malloc or
+g_malloc). Memory allocated like this *must* be manually passed to wmem_free()
+in order to prevent memory leaks (however these memory leaks will at least show
+up in valgrind). Note that passing wmem_allocated memory directly to free()
+or g_free() is not safe; the backing type of manually managed memory may be
+changed without warning.
+
+2.2 Wireshark Global Pools
+
+Dissectors that include the wmem_scopes.h header file will have three pools available
+to them automatically: pinfo->pool, wmem_file_scope() and
+wmem_epan_scope(); there is also a wmem_packet_scope() for cases when the
+`pinfo` argument is not accessible, but pinfo->pool should be preferred.
+
+The pinfo pool is scoped to the dissection of each packet, meaning that any
+memory allocated in it will be automatically freed at the end of the current
+packet. The file pool is similarly scoped to the dissection of each file,
+meaning that any memory allocated in it will be automatically freed when the
+current capture file is closed.
+
+NB: Using these pools outside of the appropriate scope (e.g. using the file
+ pool when there isn't a file open) will throw an assertion.
+ See the comment in epan/wmem_scopes.c for details.
+
+The epan pool is scoped to the library's lifetime - memory allocated in it is
+not freed until epan_cleanup() is called, which is typically but not necessarily
+at the very end of the program.
+
+2.3 The Pinfo Pool
+
+Certain allocations (such as AT_STRINGZ address allocations and anything that
+might end up being passed to add_new_data_source) need their memory to stick
+around a little longer than the usual packet scope - basically until the
+next packet is dissected. This is, in fact, the scope of Wireshark's pinfo
+structure, so the pinfo struct has a 'pool' member which is a wmem pool scoped
+to the lifetime of the pinfo struct.
+
+2.4 API
+
+Full documentation for each function (parameters, return values, behaviours)
+lives (or will live) in Doxygen-format in the header files for those functions.
+This is just an overview of which header files you should be looking at.
+
+2.4.1 Core API
+
+wmem_core.h
+ - Basic memory management functions (wmem_alloc, wmem_realloc, wmem_free).
+
+2.4.2 Strings
+
+wmem_strutl.h
+ - Utility functions for manipulating null-terminated C-style strings.
+ Functions like strdup and strdup_printf.
+
+wmem_strbuf.h
+ - A managed string object implementation, similar to std::string in C++ or
+ GString from Glib.
+
+2.4.3 Container Data Structures
+
+wmem_array.h
+ - A growable array (AKA vector) implementation.
+
+wmem_list.h
+ - A doubly-linked list implementation.
+
+wmem_map.h
+ - A hash map (AKA hash table) implementation.
+
+wmem_multimap.h
+ - A hash multimap (map that can store multiple values with the same key)
+ implementation.
+
+wmem_queue.h
+ - A queue implementation (first-in, first-out).
+
+wmem_stack.h
+ - A stack implementation (last-in, first-out).
+
+wmem_tree.h
+ - A balanced binary tree (red-black tree) implementation.
+
+2.4.4 Miscellaneous Utilities
+
+wmem_miscutl.h
+ - Misc. utility functions like memdup.
+
+2.5 Callbacks
+
+WARNING: You probably don't actually need these; use them only when you're
+ sure you understand the dangers.
+
+Sometimes (though hopefully rarely) it may be necessary to store data in a wmem
+pool that requires additional cleanup before it is freed. For example, perhaps
+you have a pointer to a file-handle that needs to be closed. In this case, you
+can register a callback with the wmem_register_callback function
+declared in wmem_user_cb.h. Every time the memory in a pool is freed, all
+registered cleanup functions are called first.
+
+Note that callback calling order is not defined, you cannot rely on a
+certain callback being called before or after another.
+
+WARNING: Manually freeing or moving memory (with wmem_free or wmem_realloc)
+ will NOT trigger any callbacks. It is an error to call either of
+ those functions on memory if you have a callback registered to deal
+ with the contents of that memory.
+
+3. Usage for Producers
+
+NB: If you're just writing a dissector, you probably don't need to read
+ this section.
+
+One of the problems with the old emem framework was that there were basically
+two allocator backends (glib and mmap) that were all mixed together in a mess
+of if statements, environment variables and #ifdefs. In wmem the different
+allocator backends are cleanly separated out, and it's up to the owner of the
+pool to pick one.
+
+3.1 Available Allocator Back-Ends
+
+Each available allocator type has a corresponding entry in the
+wmem_allocator_type_t enumeration defined in wmem_core.h. See the doxygen
+comments in that header file for details on each type.
+
+3.2 Creating a Pool
+
+To create a pool, include the regular wmem header and call the
+wmem_allocator_new() function with the appropriate type value.
+For example:
+
+ #include <wsutil/wmem/wmem.h>
+
+ wmem_allocator_t *myPool;
+ myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE);
+
+From here on in, you don't need to remember which type of allocator you used
+(although allocator authors are welcome to expose additional allocator-specific
+helper functions in their headers). The "myPool" variable can be passed around
+and used as normal in allocation requests as described in section 2 of this
+document.
+
+3.3 Destroying a Pool
+
+Regardless of which allocator you used to create a pool, it can be destroyed
+with a call to the function wmem_destroy_allocator(). For example:
+
+ #include <wsutil/wmem/wmem.h>
+
+ wmem_allocator_t *myPool;
+
+ myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE);
+
+ /* Allocate some memory in myPool ... */
+
+ wmem_destroy_allocator(myPool);
+
+Destroying a pool will free all the memory allocated in it.
+
+3.4 Reusing a Pool
+
+It is possible to free all the memory in a pool without destroying it,
+allowing it to be reused later. Depending on the type of allocator, doing this
+(by calling wmem_free_all()) can be significantly cheaper than fully destroying
+and recreating the pool. This method is therefore recommended, especially when
+the pool would otherwise be scoped to a single iteration of a loop. For example:
+
+ #include <wsutil/wmem/wmem.h>
+
+ wmem_allocator_t *myPool;
+
+ myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE);
+ for (...) {
+
+ /* Allocate some memory in myPool ... */
+
+ /* Free the memory, faster than destroying and recreating
+ the pool each time through the loop. */
+ wmem_free_all(myPool);
+ }
+ wmem_destroy_allocator(myPool);
+
+4. Internal Design
+
+Despite being written in Wireshark's standard C90, wmem follows a fairly
+object-oriented design pattern. Although efficiency is always a concern, the
+primary goals in writing wmem were maintainability and preventing memory
+leaks.
+
+4.1 struct _wmem_allocator_t
+
+The heart of wmem is the _wmem_allocator_t structure defined in the
+wmem_allocator.h header file. This structure uses C function pointers to
+implement a common object-oriented design pattern known as an interface (also
+known as an abstract class to those who are more familiar with C++).
+
+Different allocator implementations can provide exactly the same interface by
+assigning their own functions to the members of an instance of the structure.
+The structure has eight members in three groups.
+
+4.1.1 Implementation Details
+
+ - private_data
+ - type
+
+The private_data pointer is a void pointer that the allocator implementation can
+use to store whatever internal structures it needs. A pointer to private_data is
+passed to almost all of the other functions that the allocator implementation
+must define.
+
+The type field is an enumeration of type wmem_allocator_type_t (see
+section 3.1). Its value is set by the wmem_allocator_new() function, not
+by the implementation-specific constructor. This field should be considered
+read-only by the allocator implementation.
+
+4.1.2 Consumer Functions
+
+ - walloc()
+ - wfree()
+ - wrealloc()
+
+These function pointers should be set to functions with semantics obviously
+similar to their standard-library namesakes. Each one takes an extra parameter
+that is a copy of the allocator's private_data pointer.
+
+Note that wrealloc() and wfree() are not expected to be called directly by user
+code in most cases - they are primarily optimizations for use by data
+structures that wmem might want to implement (it's inefficient, for example, to
+implement a dynamically sized array without some form of realloc).
+
+Also note that allocators do not have to handle NULL pointers or 0-length
+requests in any way - those checks are done in an allocator-agnostic way
+higher up in wmem. Allocator authors can assume that all incoming pointers
+(to wrealloc and wfree) are non-NULL, and that all incoming lengths (to walloc
+and wrealloc) are non-0.
+
+4.1.3 Producer/Manager Functions
+
+ - free_all()
+ - gc()
+ - cleanup()
+
+All of these functions take only one parameter, which is the allocator's
+private_data pointer.
+
+The free_all() function should free all the memory currently allocated in the
+pool. Note that this is not necessarily exactly the same as calling free()
+on all the allocated blocks - free_all() is allowed to do additional cleanup
+or to make use of optimizations not available when freeing one block at a time.
+
+The gc() function should do whatever it can to reduce excess memory usage in
+the dissector by returning unused blocks to the OS, optimizing internal data
+structures, etc.
+
+The cleanup() function should do any final cleanup and free any and all memory.
+It is basically the equivalent of a destructor function. For simplicity, wmem
+is guaranteed to call free_all() immediately before calling this function. There
+is no such guarantee that gc() has (ever) been called.
+
+4.2 Pool-Agnostic API
+
+One of the issues with emem was that the API (including the public data
+structures) required wrapper functions for each scope implemented. Even
+if there was a stack implementation in emem, it wasn't necessarily available
+for use with file-scope memory unless someone took the time to write se_stack_
+wrapper functions for the interface.
+
+In wmem, all public APIs take the pool as the first argument, so that they can
+be written once and used with any available memory pool. Data structures like
+wmem's stack implementation only take the pool when created - the provided
+pointer is stored internally with the data structure, and subsequent calls
+(like push and pop) will take the stack itself instead of the pool.
+
+4.3 Debugging
+
+The primary debugging control for wmem is the WIRESHARK_DEBUG_WMEM_OVERRIDE
+environment variable. If set, this value forces all calls to
+wmem_allocator_new() to return the same type of allocator, regardless of which
+type is requested normally by the code. It currently has four valid values:
+
+ - The value "simple" forces the use of WMEM_ALLOCATOR_SIMPLE. The valgrind
+ script currently sets this value, since the simple allocator is the only
+ one whose memory allocations are trackable properly by valgrind.
+
+ - The value "strict" forces the use of WMEM_ALLOCATOR_STRICT. The fuzz-test
+ script currently sets this value, since the goal when fuzz-testing is to find
+ as many errors as possible.
+
+ - The value "block" forces the use of WMEM_ALLOCATOR_BLOCK. This is not
+ currently used by any scripts, but is useful for stress-testing the block
+ allocator.
+
+ - The value "block_fast" forces the use of WMEM_ALLOCATOR_BLOCK_FAST. This is
+ not currently used by any scripts, but is useful for stress-testing the fast
+ block allocator.
+
+Note that regardless of the value of this variable, it will always be safe to
+call allocator-specific helpers functions. They are required to be safe no-ops
+if the allocator argument is of the wrong type.
+
+4.4 Testing
+
+There is a simple test suite for wmem that lives in the file wmem_test.c and
+should get automatically built into the binary 'wmem_test' when building
+Wireshark. It contains at least basic tests for all existing functionality.
+The suite is run automatically by the build-bots via the shell script
+test/test.py which calls out to test/suite_unittests.py.
+
+New features added to wmem (allocators, data structures, utility
+functions, etc.) MUST also have tests added to this suite.
+
+The test suite could potentially use a clean-up by someone more
+intimately familiar with Glib's testing framework, but it does the job.
+
+5. A Note on Performance
+
+Because of my own bad judgment, there is the persistent idea floating around
+that wmem is somehow magically faster than other allocators in the general case.
+This is false.
+
+First, wmem supports multiple different allocator backends (see sections 3 and 4
+of this document), so it is confusing and misleading to try and compare the
+performance of "wmem" in general with another system anyways.
+
+Second, any modern system-provided malloc already has a very clever and
+efficient allocator algorithm that makes use of blocks, arenas and all sorts of
+other fancy tricks. Trying to be faster than libc's allocator is generally a
+waste of time unless you have a specific allocation pattern to optimize for.
+
+Third, while there were historically arguments to be made for putting something
+in front of the kernel to reduce the number of context-switches, modern libc
+implementations should already do that. Making a dynamic library call is still
+marginally more expensive than calling a locally-defined linker-optimized
+function, but it's a difference too small to care about.
+
+With all that said, it is true that *some* of wmem's allocators can be
+substantially faster than your standard libc malloc, in *some* use cases:
+ - The BLOCK and BLOCK_FAST allocators both provide very efficient free_all
+ operations, which can be many orders of magnitude faster than calling free()
+ on each individual allocation.
+ - The BLOCK_FAST allocator in particular is optimized for Wireshark's packet
+ scope pool. It has an extremely short, well-defined lifetime, and a very
+ regular pattern of allocations; I was able to use that knowledge to beat libc
+ rather handily, *in that specific use case*.
+
+/*
+ * Editor modelines - https://www.wireshark.org/tools/modelines.html
+ *
+ * Local variables:
+ * c-basic-offset: 4
+ * tab-width: 8
+ * indent-tabs-mode: nil
+ * End:
+ *
+ * vi: set shiftwidth=4 tabstop=8 expandtab:
+ * :indentSize=4:tabSize=8:noTabs=true:
+ */