author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 17:36:47 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 17:36:47 +0000
commit: 0441d265f2bb9da249c7abf333f0f771fadb4ab5
tree: 3f3789daa2f6db22da6e55e92bee0062a7d613fe /doc/wiki/Design.Indexes.txt
parent: Initial commit.
Adding upstream version 1:2.3.21+dfsg1.
upstream/1%2.3.21+dfsg1
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/wiki/Design.Indexes.txt')
-rw-r--r-- doc/wiki/Design.Indexes.txt 70
1 files changed, 70 insertions, 0 deletions
diff --git a/doc/wiki/Design.Indexes.txt b/doc/wiki/Design.Indexes.txt
new file mode 100644
index 0000000..00b55a2
--- /dev/null
+++ b/doc/wiki/Design.Indexes.txt
@@ -0,0 +1,70 @@
+Dovecot's index files
+=====================
+
+Dovecot's index files consist of three different files:
+
+ * <Main index file> [Design.Indexes.MainIndex.txt] ('dovecot.index')
+ * <Transaction log> [Design.Indexes.TransactionLog.txt] ('dovecot.index.log'
+ and 'dovecot.index.log.2')
+ * <Cache file> [Design.Indexes.Cache.txt] ('dovecot.index.cache')
+
+See <IndexFiles.txt> for more generic information about what they contain and
+why.
+
+The index files can be accessed using <mail-index.h API>
+[Design.Indexes.MailIndexApi.txt].
+
+Locking
+-------
+
+The index files are designed so that readers cannot block a writer, and write
+locks are always short enough not to cause other processes to wait too long.
+Dovecot v0.99's index files didn't do this, and it was common to get lock
+timeouts when using multiple connections to the same large mailbox.
+
+The main index file is the only file which has read locks. However, they can
+block the writer for at most two seconds (and even this could be changed to not
+block at all). Writes are locked only for the duration of the mailbox
+synchronization.
+
+Transaction logs don't require read locks. The writing is locked for the
+duration of the mailbox synchronization, and also for single transaction
+appends.
+
+Cache files don't require read locks. They're locked for writing only for the
+duration of allocating space inside the file. The actual writing inside the
+allocated space is done without any locks being held.
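+
+The cache file pattern described above (lock, allocate space, unlock, write)
+can be sketched roughly as follows. This is only an illustration and not
+Dovecot's code: it assumes POSIX fcntl() locks, and 'set_lock()',
+'reserve_space()' and 'append_record()' are hypothetical names.
+

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: take or release a whole-file fcntl() lock. */
static int set_lock(int fd, short type)
{
	struct flock fl = { 0 };

	fl.l_type = type;	/* F_WRLCK or F_UNLCK */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 = until the end of the file */
	return fcntl(fd, F_SETLKW, &fl);
}

/* Reserve 'size' bytes at the end of the file. The write lock is held
   only while the file is being extended. Returns the offset of the
   reserved space, or -1 on error. */
static off_t reserve_space(int fd, size_t size)
{
	struct stat st;
	off_t offset = -1;

	if (set_lock(fd, F_WRLCK) < 0)
		return -1;
	if (fstat(fd, &st) == 0 &&
	    ftruncate(fd, st.st_size + (off_t)size) == 0)
		offset = st.st_size;
	if (set_lock(fd, F_UNLCK) < 0)
		return -1;
	return offset;
}

/* Lock, allocate space, unlock, write. */
static int append_record(int fd, const void *data, size_t size)
{
	off_t offset = reserve_space(fd, size);

	if (offset < 0)
		return -1;
	/* No lock is held here: this byte range was handed out to this
	   writer only, so nobody else writes into it. */
	return pwrite(fd, data, size, offset) == (ssize_t)size ? 0 : -1;
}
```

+
+In this sketch the lock covers only the size check and the file extension, so
+a slow write never makes other writers wait.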
+
+In the future these could be improved even further. For example, there's no
+need to keep any index files locked while synchronizing, as long as the mailbox
+backend takes care of the locking issues. Also, writing to the transaction log
+could work in a similar way to cache files: lock, allocate space, unlock,
+write.
+
+Lockless integers
+-----------------
+
+Dovecot uses several different techniques to allow reading files without
+locking them. One of them uses fields in a "lockless integer" format. Initially
+these fields have an "unset" value. They can be set once to a wanted value in
+the range 0..2^28-1 (with 32bit fields), but after that they cannot be changed.
+It would be possible to set them back to "unset", but setting them a second
+time isn't safe anymore, so Dovecot never does this.
+
+The lockless integers work by allocating one bit from each byte of the value to
+a "this value is set" flag. This leaves 7 value bits per byte, i.e. 28 bits in
+a 32bit field. The reader then verifies that the flag is set in all of the
+value's bytes. If it's missing from any of them, the value is still "unset".
+Dovecot uses the highest bit of each byte for this flag. So for example:
+
+ * 0x00000000: The value is unset
+ * 0xFFFF7FFF: The value is unset, because one of the bytes didn't have the
+ highest bit set
+ * 0xFFFFFFFF: The value is 2^28-1
+ * 0x80808080: The value is 0
+ * 0x80808180: The value is 0x80
+
+Dovecot contains 'mail_index_uint32_to_offset()' and
+'mail_index_offset_to_uint32()' functions to translate values between integers
+and lockless integers. The "unset" value is returned as 0, so it's not possible
+to differentiate between "unset" and "set" 0 values.
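+
+The encoding described above can be sketched as follows. This is a simplified
+illustration of the format as this page describes it, not a copy of Dovecot's
+functions: 'lockless_encode()' and 'lockless_decode()' are hypothetical names,
+and the sketch ignores on-disk byte order and any other adjustments the real
+functions may make.
+

```c
#include <assert.h>
#include <stdint.h>

#define LOCKLESS_FLAGS 0x80808080U		/* highest bit of every byte */
#define LOCKLESS_MAX   ((1U << 28) - 1)		/* 4 bytes * 7 value bits */

/* Hypothetical encoder: spread the 28 value bits over four bytes,
   7 bits per byte, and set the highest bit of every byte. */
static uint32_t lockless_encode(uint32_t value)
{
	assert(value <= LOCKLESS_MAX);
	return LOCKLESS_FLAGS |
		(value & 0x0000007fU) |
		((value & 0x00003f80U) << 1) |
		((value & 0x001fc000U) << 2) |
		((value & 0x0fe00000U) << 3);
}

/* Hypothetical decoder: returns 0 for "unset" fields, i.e. whenever
   any byte is missing its flag bit. Otherwise the flag bits are
   dropped and the 7-bit groups are packed back together. */
static uint32_t lockless_decode(uint32_t field)
{
	if ((field & LOCKLESS_FLAGS) != LOCKLESS_FLAGS)
		return 0;
	return (field & 0x0000007fU) |
		((field & 0x00007f00U) >> 1) |
		((field & 0x007f0000U) >> 2) |
		((field & 0x7f000000U) >> 3);
}
```

+
+With these definitions 'lockless_decode(0x80808180)' gives 0x80 and
+'lockless_decode(0xFFFF7FFF)' gives 0, matching the examples listed above. A
+reader that sees a half-written field finds at least one byte without its flag
+and treats the field as "unset".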
+
+(This file was created from the wiki on 2019-06-19 12:42)