summaryrefslogtreecommitdiffstats
path: root/doc/wiki/Design.Code.txt
blob: 5fbddbacf2bcb46a77490b9b38fe76cf72af339f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
Code Design
===========

Generally Dovecot follows Linux kernel coding style
[https://www.kernel.org/doc/Documentation/CodingStyle].

Most of the coding style design is about getting as many compiler warnings and
errors as possible. Dovecot already has some patched-Clang-specific features to
get more warnings, and will likely have more in future. Issues found by static
analyzers should also be fixed in some way, e.g. adding extra asserts.

Runtime errors should be caught by asserts or NULL pointer dereferences, which
cleanly crash the program instead of it continuing and possibly corrupting data
or causing other bad things. Dovecot's master process restarts the crashed
processes anyway. Obviously the master process should be very careful to avoid
crashing, but even then in some OSes a crashed Dovecot master gets restarted by
the init process.

Dovecot's bottlenecks are primarily disk I/O and secondarily memory usage, so
using extra CPU for asserts and other extra checks costs practically nothing.

Types
-----

 * Unsigned types are used whenever the value isn't allowed to be negative.
   This makes it easier to do "value too large" checks when you don't also have
   to check for negative values. Also in arithmetic it's better to have the
   value wrap (and hopefully checked later!) than cause undefined behavior with
   a signed integer overflow.
 * 'char *' always points to a NUL-terminated string, 'unsigned char *'
   doesn't.
 * 'size_t' should generally be used when pointing to a large memory area,
   especially for 'mmap()'. Since 'size_t' can be slower to access than
   'unsigned int' (or at least use memory), it's fine to use 'unsigned int'
   when it's "very unlikely" that the size ever goes beyond 4 GB
   (e.g.'string_t').
 * 'uoff_t' is used for file offsets/sizes. This is usually 64bit, even with
   32bit machines.
 * 'uint32_t' vs. 'unsigned int': Use 'uint32_t' when the type really should be
   32bit, but don't spend too much energy trying to avoid mixing it with
   'unsigned int', since they are going to be the same types probably for the
   rest of Dovecot's life..
 * 'uint8_t' vs. 'unsigned char': I doubt Dovecot will ever be compiled
   anywhere where these differ from one another, but for readability use
   'uint8_t' for binary data and 'unsigned char' for text data.

Function parameters
-------------------

 * Try to avoid using/allowing NULL pointers. For example for a public API
   instead of having 'foo(struct bar *bar)' where bar can be NULL, you could
   have both 'foo(void)' and 'foo_with_bar(struct bar *bar)'. Of course don't
   try too hard if it makes the API otherwise ugly.
    * Dovecot v2.2+ marks all such parameters with ATTR_NULL. These can be
      verified with a patched clang. Unfortunately the API isn't very nice, it
      would be better to mark each parameter separately with ATTR_NULL instead
      of having a numbered list. This will likely change in future.
    * Also unless you know that a parameter can be NULL, don't bother wasting
      code on checking if it is. Ideally those are noticed by the patched clang
      check, but even if not, it's not that bad to crash on NULL pointer
      dereference.
 * '_r' suffix in parameter is used for values that are returned (e.g.
   'foo(const char **error_r)').
 * Use const for pointers pretty much whenever possible.
 * Some day in future I'd like compiler to give warnings if function parameters
   are modified by the function itself. There's probably a lot of code that
   does this currently, but try avoiding it in future.

Function return values
----------------------

 * 'int' type is commonly used for return values. Usually this is used to mean
   one of:
    * -1 = error, 0 = ok
    * -1 = error, 0 = unfinished (e.g. non-blocking call), 1 = finished
    * unfortunately there are also other uses. I've been wondering about using
      different types for these some day, such as "err" and "err3", but haven't
      figured out an easy enough to use design, especially one where compiler
      verifies that types aren't accidentally mixed.
 * 'bool' shouldn't be used as return value for ok/error, use int 0/-1 instead.
   (Old code has some of these. Sometimes it's also not quite clear if
   something is an "error" or not, so this rule isn't perfect.)
 * Functions returning pointers shouldn't use NULL pointer to mean an error,
   only for things like "not found". For example instead of 'if ((ctx = init())
   == NULL)' use 'if (init(&ctx) < 0)'
 * Usually when calling a function, either save/check the return value or
   explicitly say that you don't care about it by saying '(void)func();'
    * Some functions where the return value can nearly always be ignored it's
      annoying to add the '(void)' prefix. You can avoid these by adding
      ATTR_NOWARN_UNUSED_RESULT to such function's prototype. A patched clang
      can be used to find missing return value checks.
 * If a system call fails unexpectedly, always log an error about it, possibly
   even kill the process if it's vital for correct functionality
   (e.g.'gettimeofday()'). '%m' in Dovecot's 'printf()'-style functions expands
   to 'strerror(errno)'.

Boolean expressions
-------------------

Try to use boolean expressions the way they work in Java. C doesn't require
this, but I think it makes the code easier to understand and reduces bugs in
some cases (e.g.'if (!foo())' when thinking foo() returns bool/FALSE, but
actually returns int/-1 on error). We've a clang patch
[http://dovecot.org/patches/clang/] to give warnings in these cases. As
expected, it found quite a lot of bugs (some real bugs and a lot of "it just
accidentally worked"
[https://github.com/dovecot/core/commit/d9a7e950a9cd21f2b4a90ec7759fca9e8fcc7995]).

 * 'bool x' and 'bool x:1' are the boolean types
 * TRUE and FALSE are the only valid explicit boolean values (not 0 or 1, and
   currently also not true/false although that could be changed)
 * !=, ==, <, >, etc. comparisons create a boolean
 * if, for, while, etc. require a boolean

So:

 * 'if (!ptr)' -> 'if (ptr == NULL)'
 * 'if (!num)' -> 'if (num == 0)'
 * 'if (flags & FLAG_FOO)' -> 'if ((flags & FLAG_FOO) != 0)'

Memory
------

Memory is always allocated through one of Dovecot's wrappers, e.g. 'i_malloc()'
or 'i_new()'. All of Dovecot's memory allocations always succeed or kill the
process. There's no point in writing a lot of code to check for memory
allocation failures that happen just about never. The only reason some memory
allocations fail in Dovecot currently is because a process VSZ limit is
reached, which usually indicates either a memory leak or trying to access a
mailbox that is too large. In either of these cases it's better to just
completely restart the process than try to limp along without getting anything
useful done anymore.

Memory allocations can be assumed to be zero-initialized. All of the memory
allocation functions do it, except 't_malloc()' and 't_buffer_get()', which you
should almost never use directly anyway. The code currently also assumes that
pointers in zero-initialized memory area are NULL, which isn't guaranteed by
ANSI-C, but practically Dovecot isn't going to be run in systems where it's not
true and you're not going to remember to NULL-initialize all of your pointers
anyway without compiler/runtime failure.

When using a struct, always zero-initialize it with 'memset()' instead of
setting each field separately to 0. It's too easy to cause bugs by adding a new
field to the struct and forgetting to initialize it.

Double-frees in C are bad. Better to crash with NULL pointer dereference than
possibly allow an attacker to exploit heap corruption and run executable code.
Most of the pointers are set to NULL when they are freed:

 * 'i_free_and_null()' is a macro to free the memory and set the pointer to
   NULL.
 * 'i_free()' also currently sets the pointer to NULL, but another thought was
   to set this pointing to some other invalid pointer. But arguably this should
   be defined to be exactly the same as 'i_free_and_null()'.
 * Most deinit() functions take a pointer-to-pointer parameter and set the
   original one to NULL. There's no need to explicitly set the same pointer to
   NULL afterwards.

Buffers
-------

Use dynamically growing strings/buffers wherever necessary instead of a static
sized buffer, where on larger input the function fails or truncates the data.
It's of course not good to allow users to infinitely grow memory usage, so
there should be some limits added, but it shouldn't fail even if the limit is
set to infinite.

Avoid explicitly calculating memory usage for allocations. If you do, mark it
with '/* @UNSAFE */' comment unless it's the calculation is "so obvious that
you see it's correct at the first glance". If in doubt, just mark it UNSAFE.
The idea is that anyone can easily grep for these and verify their correctness.

Type safety
-----------

 * Try to avoid using void pointers.
 * Try to avoid casting types to other types, especially if the cast isn't
   necessary to avoid a compiler warning.

It's better if compiler can give a warning when something is accidentally used
wrong.

Callback functions
------------------

Callback functions make the code more difficult to follow, especially when a
callback calls another callback, or when using function pointers to jump to
different callbacks depending on state. Of course with asynchronous C code it's
pretty much impossible to avoid callbacks. Still, try to avoid them where
possible to keep the code readable. lib-fs/fs-api.h is an example API which
supports async operations but with a single common "do more now" callback
rather than every single operation having its own callback parameter. This
makes it similar to async network IO with read()/write() EAGAIN handling.

Often callback functions can be avoided by creating iterator functions instead.
For example instead of 'parse(callback, context' use 'ctx = parse_init(); while
(parse_next(ctx)) { .. } parse_deinit(&ctx);'

Dovecot has some helper macros to make callbacks' context parameters type-safe.
In v2.2+ see 'CALLBACK_TYPECHECK()' macro and for example 'io_add()' for
example usage.

(This file was created from the wiki on 2019-06-19 12:42)