Diffstat (limited to 'doc/wiki/Design.Memory.txt')
-rw-r--r-- doc/wiki/Design.Memory.txt 193
1 files changed, 193 insertions, 0 deletions
diff --git a/doc/wiki/Design.Memory.txt b/doc/wiki/Design.Memory.txt
new file mode 100644
index 0000000..64ad013
--- /dev/null
+++ b/doc/wiki/Design.Memory.txt
@@ -0,0 +1,193 @@
+Memory Allocations
+==================
+
+The C language requires explicitly allocating and freeing memory. The two main
+problems with this are:
+
+ 1. A lot of allocations and frees cause memory fragmentation. The longer a
+ process runs, the more memory it can waste, because tiny unused free spots
+ end up scattered all around the heap.
+ 2. Freeing memory is easy to forget, causing memory leaks. Sometimes memory
+ is accidentally freed multiple times, causing a potential security hole. A
+ lot of free() calls all around the code also make the code more difficult
+ to read and write.
+
+The second problem could be solved with the Boehm garbage collector, which
+Dovecot could optionally use (prior to 2.3), but it's not very efficient. It
+also doesn't help with the first problem.
+
+To reduce the problems caused by these issues, Dovecot has several ways to do
+memory management.
+
+Common Design Decisions
+-----------------------
+
+All memory allocations (with some exceptions in data stack) return memory
+filled with NULs. This is also true for the new memory when growing an
+existing allocation with realloc. The zeroing reduces accidental use of
+uninitialized memory and makes the code simpler, since there is no need to
+explicitly set all fields in allocated structs to zero/NULL. (I guess assuming
+that all-zero bytes work as NULL pointers isn't strictly ANSI C compliant, but
+I don't see this assumption breaking on any system where anyone would really
+use Dovecot.) The zeroing is cheap anyway.
+
+In out-of-memory situations memory allocation functions die internally by
+calling 'i_fatal_status(FATAL_OUTOFMEM, ..)'. There are several reasons for
+this:
+
+ * Trying to handle malloc() failures explicitly would add a lot of error
+ handling code paths and make the code much more complex than necessary.
+ * In most systems malloc() rarely actually fails because the system has run
+ out of memory. Instead the kernel will just start killing processes.
+ * Typically when malloc() does fail, it's because the process's address space
+ limit is reached. Dovecot enforces these limits by default. Reaching it
+ could mean that the process was leaking memory and it should be killed. It
+ could also mean that the process is doing more work than anticipated and
+ that the limit should probably be increased.
+ * Even with perfect out-of-memory handling, the result isn't much better
+ anyway than the process dying. The user isn't any happier seeing an "out of
+ memory" error than "server disconnected".
+
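A minimal sketch of the two decisions above (zero-filled memory and dying on
allocation failure), assuming plain libc underneath. The names
sketch_malloc(), sketch_realloc() and sketch_fatal_oom() are hypothetical
stand-ins, not Dovecot's implementation, and the exit status is a placeholder
rather than the real FATAL_OUTOFMEM value.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fatal handler: log and terminate instead of returning NULL. */
void sketch_fatal_oom(size_t size)
{
	fprintf(stderr, "Fatal: out of memory (allocating %zu bytes)\n", size);
	exit(EXIT_FAILURE); /* placeholder status, see lead-in */
}

/* Allocate size bytes filled with NULs. Never returns NULL. */
void *sketch_malloc(size_t size)
{
	void *mem = calloc(1, size > 0 ? size : 1);

	if (mem == NULL)
		sketch_fatal_oom(size);
	return mem;
}

/* Resize an allocation. Bytes beyond old_size are zeroed, so the caller
   sees the same "always NUL-filled" guarantee as with a fresh allocation. */
void *sketch_realloc(void *mem, size_t old_size, size_t new_size)
{
	mem = realloc(mem, new_size > 0 ? new_size : 1);
	if (mem == NULL)
		sketch_fatal_oom(new_size);
	if (new_size > old_size)
		memset((char *)mem + old_size, 0, new_size - old_size);
	return mem;
}
```

With wrappers like these, callers never check for NULL, and a struct
allocated this way starts out with all of its fields zeroed.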
+When freeing memory, most functions also change the pointer to NULL. This is
+also the reason why most APIs' deinit functions take a pointer-to-pointer
+parameter, so that when they're done they can change the original pointer to
+NULL.
+
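The pointer-to-pointer deinit convention could look roughly like this. The
struct and the sketch_conn_*() functions are hypothetical and use libc
directly; they only illustrate the calling convention, not any real Dovecot
API.

```c
#include <stdlib.h>

/* Hypothetical object, for illustration only. */
struct sketch_conn {
	char *name;
	int fd;
};

struct sketch_conn *sketch_conn_init(void)
{
	struct sketch_conn *conn = calloc(1, sizeof(*conn));

	if (conn == NULL)
		abort();
	conn->fd = -1;
	return conn;
}

/* Takes a pointer to the caller's pointer so that it can be cleared. */
void sketch_conn_deinit(struct sketch_conn **_conn)
{
	struct sketch_conn *conn = *_conn;

	if (conn == NULL)
		return;
	/* Clear the caller's pointer first, then release the object. */
	*_conn = NULL;
	free(conn->name);
	free(conn);
}
```

After 'sketch_conn_deinit(&conn)' the caller's 'conn' is NULL, so an
accidental second deinit is a no-op and an accidental use crashes
immediately instead of touching freed memory.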
+malloc() Replacements
+---------------------
+
+'lib/imem.h' has replacements for all the common memory allocation functions:
+
+ * 'malloc()', 'calloc()' -> 'i_malloc()'
+ * 'realloc()' -> 'i_realloc()'
+ * 'strdup()' -> 'i_strdup()'
+ * 'free()' -> 'i_free()'
+ * etc.
+
+All memory allocation functions that begin with the 'i_' prefix require that
+the memory is later freed with 'i_free()'. If you actually want the freed
+pointer to be set to NULL, use 'i_free_and_null()'. Currently 'i_free()' also
+changes the pointer to NULL, but in the future it might change it to something
+else.
+
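As an illustration of a free that also clears the pointer, a macro along
these lines would do it. SKETCH_FREE_AND_NULL is a hypothetical name built on
libc free(); it is not how 'i_free_and_null()' is actually defined.

```c
#include <stdlib.h>

/* Free the memory and reset the variable, so stale uses become NULL
   dereferences rather than use-after-free bugs. The argument must be an
   lvalue, since it is assigned to. */
#define SKETCH_FREE_AND_NULL(ptr) \
	do { \
		free(ptr); \
		(ptr) = NULL; \
	} while (0)
```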
+Memory Pools
+------------
+
+'lib/mempool.h' defines the API for allocating memory through memory pools.
+All memory allocations actually go through memory pools. Even the 'i_*()'
+functions get called through 'default_pool', which by default is 'system_pool'
+but can be changed to another pool if wanted. All memory allocation functions
+that begin with the 'p_' prefix have a memory pool parameter, which they use
+to allocate the memory.
+
+Dovecot has many APIs that require you to specify a memory pool. Usually (but
+not always) they don't bother freeing any memory from the pool. Instead they
+assume that more memory can just be allocated from the pool and that the whole
+pool is freed later. These pools are usually alloconly-pools, but can also be
+data stack pools. See below.
+
+Alloc-only Pools
+----------------
+
+'pool_alloconly_create()' creates an allocate-only memory pool with a given
+initial size.
+
+As the name says, alloconly-pools only support allocating more memory. As a
+special case, the last allocation can be freed. 'p_realloc()' also tries to
+grow the existing allocation in place only if it's the last allocation;
+otherwise it just allocates a new memory area and copies the data there.
+
+Initial memory pool sizes are often optimized in Dovecot to be set large enough
+that in most situations the pool doesn't need to be grown. To make this easier,
+when Dovecot is configured with --enable-devel-checks, it logs a warning each
+time a memory pool is grown. The initial pool size of course shouldn't be made
+too large, so usually I just pick some small initial guessed value and later,
+if I get too many "growing memory pool" warnings, I start growing the pool
+sizes. Sometimes there's just no good way to set the initial pool size and
+avoid the warnings. In that situation you can prefix the pool's name with
+MEMPOOL_GROWING and it doesn't log warnings.
+
+Alloconly-pools are commonly used for an object that builds its state from many
+memory allocations, but doesn't change (much of) its state afterwards. It's a
+lot easier when you can do many small memory allocations and then just free
+the memory pool when destroying the object.
+
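To make the alloc-only behaviour concrete, here is a toy pool, assuming a
single fixed-size block that never grows (a real pool would add more blocks
instead of aborting). All sketch_pool_*() names are hypothetical; this is not
the 'lib/mempool.h' API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ALIGN 16
#define SKETCH_ALIGNED(n) \
	(((n) + SKETCH_ALIGN - 1) & ~(size_t)(SKETCH_ALIGN - 1))

struct sketch_pool {
	unsigned char *base;
	size_t size;      /* total bytes in the block */
	size_t used;      /* bytes handed out so far */
	size_t last_pos;  /* offset of the most recent allocation */
	size_t last_size; /* aligned size of the most recent allocation */
};

struct sketch_pool *sketch_pool_create(size_t size)
{
	struct sketch_pool *pool = calloc(1, sizeof(*pool));

	if (pool == NULL)
		abort();
	pool->size = SKETCH_ALIGNED(size);
	/* calloc() zero-fills the block; nothing is ever reused, so every
	   allocation handed out below is still NUL-filled. */
	pool->base = calloc(1, pool->size > 0 ? pool->size : 1);
	if (pool->base == NULL)
		abort();
	return pool;
}

void *sketch_pool_alloc(struct sketch_pool *pool, size_t size)
{
	size_t alloc_size = SKETCH_ALIGNED(size);

	if (alloc_size > pool->size - pool->used)
		abort(); /* toy limitation: a real pool would grow here */
	pool->last_pos = pool->used;
	pool->last_size = alloc_size;
	pool->used += alloc_size;
	return pool->base + pool->last_pos;
}

void *sketch_pool_realloc(struct sketch_pool *pool, void *mem,
			  size_t old_size, size_t new_size)
{
	size_t new_alloc = SKETCH_ALIGNED(new_size);
	void *new_mem;

	if (new_size <= old_size)
		return mem;
	if (mem != NULL &&
	    (unsigned char *)mem == pool->base + pool->last_pos &&
	    new_alloc - pool->last_size <= pool->size - pool->used) {
		/* Last allocation: extend it in place. */
		pool->used += new_alloc - pool->last_size;
		pool->last_size = new_alloc;
		return mem;
	}
	/* Not the last allocation: take new space and copy. The old area
	   stays unused until the whole pool is destroyed. */
	new_mem = sketch_pool_alloc(pool, new_size);
	if (mem != NULL && old_size > 0)
		memcpy(new_mem, mem, old_size);
	return new_mem;
}

/* Frees every allocation at once and clears the caller's pointer. */
void sketch_pool_destroy(struct sketch_pool **_pool)
{
	struct sketch_pool *pool = *_pool;

	*_pool = NULL;
	free(pool->base);
	free(pool);
}
```

An object built on such a pool needs no per-field cleanup: destroying the
pool releases everything the object ever allocated from it.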
+Data Stack
+----------
+
+'lib/data-stack.h' describes the low-level data stack functions. Data stack
+works a bit like C's control stack. 'alloca()' is quite near to what it does,
+but there's one major difference: in data stack the stack frames are explicitly
+defined, so functions can return values allocated from data stack. The
+'t_strdup_printf()' call is an excellent example of why this is useful. Rather
+than creating some arbitrarily sized buffer and using snprintf(), which might
+truncate the value, you can just use t_strdup_printf() without worrying about
+buffer sizes being large enough.
+
+Try to keep the allocations from data stack small, since the data stack's
+highest memory usage size is kept for the rest of the process's lifetime. The
+initial data stack size is 32kB, which should be enough in normal use. If
+Dovecot is configured with --enable-devel-checks, it logs a warning each time
+the data stack needs to be grown.
+
+Stack frames are preferably created using a T_BEGIN/T_END block, for example:
+
+---%<-------------------------------------------------------------------------
+T_BEGIN {
+ string_t *str1 = t_str_new(256);
+ string_t *str2 = t_str_new(256);
+ /* .. */
+} T_END;
+---%<-------------------------------------------------------------------------
+
+In the above example two strings are allocated from data stack. They get freed
+once the code goes past T_END. That's why the variables are preferably declared
+inside the T_BEGIN/T_END block, so they won't accidentally be used after
+they're freed.
+
+T_BEGIN and T_END expand to 't_push()' and 't_pop()' calls and they must be
+synchronized. Returning from the block without going past T_END is going to
+cause Dovecot to panic in the next T_END call with a "Leaked t_pop() call"
+error.
+
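The frame mechanism can be pictured with a toy data stack. This is only a
sketch under simplifying assumptions: one fixed 32kB buffer that never grows,
a bounded number of frames, and hypothetical names throughout (sketch_push(),
sketch_pop(), SKETCH_T_BEGIN/SKETCH_T_END, sketch_t_strdup_printf()). The real
T_BEGIN/T_END macros and 't_push()'/'t_pop()' are defined differently.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_STACK_SIZE (32 * 1024)
#define SKETCH_MAX_FRAMES 64

static _Alignas(16) unsigned char sketch_stack[SKETCH_STACK_SIZE];
static size_t sketch_used; /* current top of the stack */
static size_t sketch_frames[SKETCH_MAX_FRAMES]; /* top at each push */
static unsigned int sketch_frame_count;

size_t sketch_stack_used(void)
{
	return sketch_used;
}

/* Start a frame: remember the current top and return the frame's id. */
unsigned int sketch_push(void)
{
	if (sketch_frame_count == SKETCH_MAX_FRAMES)
		abort();
	sketch_frames[sketch_frame_count] = sketch_used;
	return sketch_frame_count++;
}

/* End a frame: everything allocated since the matching push is released
   by moving the top back. Unbalanced push/pop pairs are fatal. */
void sketch_pop(unsigned int frame_id)
{
	if (sketch_frame_count == 0 || frame_id != sketch_frame_count - 1) {
		fprintf(stderr, "Panic: unbalanced frame push/pop\n");
		abort();
	}
	sketch_frame_count--;
	sketch_used = sketch_frames[sketch_frame_count];
}

/* Allocating is just bumping the top (plus zeroing reused memory). */
void *sketch_t_malloc0(size_t size)
{
	size_t alloc_size = (size + 15) & ~(size_t)15;
	void *mem;

	if (alloc_size > SKETCH_STACK_SIZE - sketch_used)
		abort(); /* toy limitation: the real data stack grows */
	mem = sketch_stack + sketch_used;
	memset(mem, 0, alloc_size);
	sketch_used += alloc_size;
	return mem;
}

/* printf() into exactly as much temporary memory as the result needs. */
const char *sketch_t_strdup_printf(const char *fmt, ...)
{
	va_list args, args2;
	char *str;
	int len;

	va_start(args, fmt);
	va_copy(args2, args);
	len = vsnprintf(NULL, 0, fmt, args); /* measure first */
	va_end(args);
	if (len < 0)
		abort();
	str = sketch_t_malloc0((size_t)len + 1);
	vsnprintf(str, (size_t)len + 1, fmt, args2);
	va_end(args2);
	return str;
}

#define SKETCH_T_BEGIN { unsigned int sketch_frame_id_ = sketch_push();
#define SKETCH_T_END sketch_pop(sketch_frame_id_); }

/* The callee allocates, the caller's frame owns the result. */
const char *sketch_greeting(const char *user, int count)
{
	return sketch_t_strdup_printf("%s has %d new messages", user, count);
}
```

A caller wraps the use in 'SKETCH_T_BEGIN { ... } SKETCH_T_END;'. The string
returned by sketch_greeting() stays valid inside the block and is released,
together with everything else allocated there, when the block ends.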
+Data stack allocations have similar disadvantages to alloc-only memory pools.
+Allocations can't be grown, so with the above example if str1 grows past 256
+characters, it needs to be reallocated, which will cause it to forget about the
+original 256 bytes and allocate 512 bytes more.
+
+Memory allocations from data stack often begin with the 't_' prefix, meaning
+"temporary". There are however many other functions that allocate memory from
+data stack without mentioning it. Memory allocated from data stack is usually
+returned as a const pointer, so that the caller doesn't try to free it (which
+would cause a compiler warning).
+
+When should T_BEGIN/T_END be used and when not? This is kind of black magic.
+In general they shouldn't be used unless it's really necessary, because they
+make the code more complex. But if the code is going through loops with many
+iterations, where each iteration is allocating memory from data stack, running
+each iteration inside its own stack frame would be a good idea to avoid
+excessive memory usage. It's also difficult to guess how public APIs are being
+used, so I've tried to make such API functions use their own private stack
+frames. Dovecot's ioloop code also wraps all I/O callbacks and timeout
+callbacks into their own stack frames, so you don't need to worry about them.
+
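The effect of per-iteration frames on peak memory use can be shown with a
small simulation. It assumes a toy bump allocator where a "frame" is nothing
more than a saved position; sketch_alloc(), sketch_handle_item() and the two
sketch_peak_*() functions are hypothetical and only model the bookkeeping.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_ARENA_SIZE (64 * 1024)

static unsigned char sketch_arena[SKETCH_ARENA_SIZE];
static size_t sketch_pos;  /* current top of the toy stack */
static size_t sketch_peak; /* highest top seen so far */

static void *sketch_alloc(size_t size)
{
	void *mem;

	if (size > SKETCH_ARENA_SIZE - sketch_pos)
		abort();
	mem = sketch_arena + sketch_pos;
	sketch_pos += size;
	if (sketch_pos > sketch_peak)
		sketch_peak = sketch_pos;
	return mem;
}

/* Simulates work that needs 100 bytes of temporary memory per item. */
static void sketch_handle_item(void)
{
	(void)sketch_alloc(100);
}

/* One frame around the whole loop: temporaries pile up until the end. */
size_t sketch_peak_single_frame(unsigned int items)
{
	size_t mark;
	unsigned int i;

	sketch_pos = sketch_peak = 0;
	mark = sketch_pos; /* frame begins */
	for (i = 0; i < items; i++)
		sketch_handle_item();
	sketch_pos = mark; /* frame ends */
	return sketch_peak;
}

/* One frame per iteration: each item's temporaries are released before
   the next item is handled. */
size_t sketch_peak_frame_per_item(unsigned int items)
{
	unsigned int i;

	sketch_pos = sketch_peak = 0;
	for (i = 0; i < items; i++) {
		size_t mark = sketch_pos; /* frame begins */

		sketch_handle_item();
		sketch_pos = mark; /* frame ends */
	}
	return sketch_peak;
}
```

With 500 items the single-frame variant peaks at 500 * 100 = 50000 bytes,
while the per-iteration variant never needs more than 100 bytes at a time.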
+You can create temporary memory pools from data stack too. Usually you should
+be calling 'pool_datastack_create()' to generate a new pool, which also tries
+to track that it's not being used unsafely across different stack frames. Some
+low-level functions can also use 'unsafe_data_stack_pool' as the pool, which
+doesn't do such tracking.
+
+Data stack's advantages over malloc():
+
+ * FAST: most of the time allocating memory means only updating a couple of
+ pointers and integers. Freeing memory all at once is also a fast operation.
+ * No need to 'free()' each allocation, resulting in prettier code
+ * No memory leaks
+ * No memory fragmentation
+
+It also has some disadvantages:
+
+ * Allocating memory inside loops can accidentally allocate a lot of memory
+ * Memory allocated from data stack can be accidentally stored into a permanent
+ location and accessed after it's already been freed.
+ * Debugging invalid memory usage may be difficult using existing tools
+
+(This file was created from the wiki on 2019-06-19 12:42)