summaryrefslogtreecommitdiffstats
path: root/doc/wiki/Design.Indexes.MainIndex.txt
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-15 17:36:47 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-15 17:36:47 +0000
commit0441d265f2bb9da249c7abf333f0f771fadb4ab5 (patch)
tree3f3789daa2f6db22da6e55e92bee0062a7d613fe /doc/wiki/Design.Indexes.MainIndex.txt
parentInitial commit. (diff)
downloaddovecot-0441d265f2bb9da249c7abf333f0f771fadb4ab5.tar.xz
dovecot-0441d265f2bb9da249c7abf333f0f771fadb4ab5.zip
Adding upstream version 1:2.3.21+dfsg1.upstream/1%2.3.21+dfsg1
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/wiki/Design.Indexes.MainIndex.txt')
-rw-r--r--doc/wiki/Design.Indexes.MainIndex.txt284
1 files changed, 284 insertions, 0 deletions
diff --git a/doc/wiki/Design.Indexes.MainIndex.txt b/doc/wiki/Design.Indexes.MainIndex.txt
new file mode 100644
index 0000000..db0fb0f
--- /dev/null
+++ b/doc/wiki/Design.Indexes.MainIndex.txt
@@ -0,0 +1,284 @@
+Main index
+==========
+
+The main index can be used to quickly look up messages' UIDs, flags, keywords
+and extension-specific data, such as cache file or mbox file offsets.
+
+Reading, writing and locking
+----------------------------
+
+Reading 'dovecot.index' file requires locking, unfortunately. Shared read
+locking is done using the standard index locking method specified in
+'lock_method' setting ('lock_method' parameter for 'mail_index_open()').
+
+Writing to index files requires transaction log to be exclusively locked first.
+This way the index locking only has to worry about existing read locks. The
+locking works by first trying to lock the index with the standard locking
+method, but if it couldn't acquire the lock in two seconds, it'll fallback to
+copying the index file to a temporary file, and when unlocking it'll 'rename()'
+the temporary file over the 'dovecot.index' file. Note that this is safe only
+because of the exclusive transaction log lock. This way the writers are never
+blocked by readers who are allowed to keep the shared lock as long as they
+want.
+
+The copy-locking is used always when doing anything that could corrupt the
+index file if it crashed in the middle of an operation. For example if the
+header or record size changes, or if messages are expunged. New messages can be
+appended however, because the message count in the header is updated last.
+Expunging the last messages would probably be safe also (because only the
+header needs updating), but it's not done currently.
+
+The index file should never be directly modified. Everything should go through
+the transaction log, and the only time the index needs to be write-locked is
+when transactions are written to it.
+
+Currently the index file is updated whenever the backend mailbox is
+synchronized. This isn't necessary, because an old index file can be updated
+using the transaction log. In future there could be some smarter decisions
+about when writing to the index isn't worth the extra disk writes.
+
+Header
+------
+
+---%<-------------------------------------------------------------------------
+struct mail_index_header {
+ uint8_t major_version;
+ uint8_t minor_version;
+
+ uint16_t base_header_size;
+ uint32_t header_size;
+ uint32_t record_size;
+
+ uint8_t compat_flags;
+ uint8_t unused[3];
+
+ uint32_t indexid;
+ uint32_t flags;
+
+ uint32_t uid_validity;
+ uint32_t next_uid;
+
+ uint32_t messages_count;
+ uint32_t unused_old_recent_messages_count;
+ uint32_t seen_messages_count;
+ uint32_t deleted_messages_count;
+
+ uint32_t first_recent_uid;
+ uint32_t first_unseen_uid_lowwater;
+ uint32_t first_deleted_uid_lowwater;
+
+ uint32_t log_file_seq;
+ uint32_t log_file_tail_offset;
+ uint32_t log_file_head_offset;
+
+ uint64_t unused_old_sync_size;
+ uint32_t unused_old_sync_stamp;
+
+ uint32_t day_stamp;
+ uint32_t day_first_uid[8];
+}
+---%<-------------------------------------------------------------------------
+
+Fields that won't change without recreating the index:
+
+major_version:
+ If this doesn't match 'MAIL_INDEX_MAJOR_VERSION', don't try to read the
+ index. Dovecot recreates the index file then.
+
+minor_version:
+ If this doesn't match 'MAIL_INDEX_MINOR_VERSION' there are some backwards
+ compatible changes in the index file (typically header fields). Try to
+ preserve the headers and the minor version when updating the index file.
+
+base_header_size:
+ Extension headers begin after the base headers. This is normally the same as
+ 'sizeof(struct mail_index_header)'.
+
+header_size:
+ Records begin after base and extension headers.
+
+record_size:
+ Size of each record and its extensions. Initially the same as 'sizeof(struct
+ mail_index_record)'.
+
+compat_flags:
+ Currently there is just one compatibility flag:
+ 'MAIL_INDEX_COMPAT_LITTLE_ENDIAN'. Dovecot doesn't try to bother to read
+ different endianess files, they're simply recreated.
+
+indexid:
+ Unique index file ID. This is used to make sure that the main index,
+ transaction log and cache file are all part of the same index.
+
+Header flags:
+
+MAIL_INDEX_HDR_FLAG_CORRUPTED:
+ Set whenever the index file is found to be corrupted. If the reader notices
+ this flag, it shouldn't try to continue using the index.
+
+MAIL_INDEX_HDR_FLAG_HAVE_DIRTY:
+ This index has records with 'MAIL_INDEX_MAIL_FLAG_DIRTY' flag set.
+
+MAIL_INDEX_HDR_FLAG_FSCK:
+ Call 'mail_index_fsck()' as soon as possible. This flag isn't actually set
+ anywhere currently.
+
+Message UIDs and counters:
+
+uid_validity:
+ IMAP UIDVALIDITY field. Initially can be 0, but after it's set we don't
+ currently try to even handle the case of UIDVALIDITY changing. It's done by
+ marking the index file corrupted and recreating it. That's a bit ugly, but
+ typically the UIDVALIDITY never changes.
+
+next_uid:
+ UID given to the next appended message. Only increases.
+
+messages_count:
+ Number of records in the index file.
+
+recent_messages_count:
+ Number of records with 'MAIL_RECENT' flag set.
+
+seen_messages_count:
+ Number of records with 'MAIL_SEEN' flag set.
+
+deleted_messages_count:
+ Number of records with 'MAIL_DELETED' flag set.
+
+first_recent_uid_lowwater:
+ There are no UIDs lower than this with 'MAIL_RECENT' flag set.
+
+first_unseen_uid_lowwater:
+ There are no UIDs lower than this *without* 'MAIL_SEEN' flag set.
+
+first_deleted_uid_lowwater:
+ There are no UIDs lower than this with 'MAIL_DELETE' flag set.
+
+The lowwater fields are used to optimize searching messages with/without a
+specific flag.
+
+Fields related to syncing:
+
+log_file_seq:
+ Log file the log_*_offset fields point to.
+
+log_file_int_offset, log_file_ext_offset:
+ All the internal/external transactions before this offset in the log file are
+ synced to the index. External transactions are synced more often than
+ internal, so 'log_file_int_offset' <= 'log_file_ext_offset'.
+
+sync_size, sync_stamp:
+ Used by the mailbox backends to store their synchronization information. Some
+ day these should be removed and replaced with extension headers.
+
+Then there are day fields:
+
+day_stamp:
+ UNIX timestamp to the beginning of the day when new records were last added
+ to the index file.
+
+day_first_uid[8]:
+ These fields are updated when 'day_stamp' < today. The [0..6] are first moved
+ to [1..7], then [0] is set to the first appended UID. So they contain the
+ first UID of the day for last 8 days when messages were appended.
+
+The 'day_first_uid[]' fields are used by cache file compression to decide when
+to drop 'MAIL_CACHE_DECISION_TEMP' data.
+
+Extension headers
+-----------------
+
+After the base header comes a list of extensions and their headers. The first
+extension begins from 'mail_index_header.base_header_size' offset. The second
+begins after the first one's 'data[]' and so on. The extensions always begin
+64bit aligned however, so you may need to skip a few bytes always. Read the
+extensions as long as the offset is smaller than
+'mail_index_header.header_size'.
+
+---%<-------------------------------------------------------------------------
+struct mail_index_ext_header {
+ uint32_t hdr_size; /* size of data[] */
+ uint32_t reset_id;
+ uint16_t record_offset;
+ uint16_t record_size;
+ uint16_t record_align;
+ uint16_t name_size;
+ /* unsigned char name[name_size] */
+ /* unsigned char data[hdr_size] (starting 64bit aligned) */
+};
+---%<-------------------------------------------------------------------------
+
+'reset_id', record offset, size and alignment is explained in
+<Design.Indexes.TransactionLog.txt>'s 'struct mail_transaction_ext_intro'.
+
+Records
+-------
+
+There are 'hdr.messages_count' records in the file. Each record contains at
+least two fields: Record UID and flags. The UID is always increasing for the
+records, so it's possible to find a record by its UID with binary search. The
+record size is specified by 'mail_index_header.record_size'.
+
+The flags are a combination of 'enum mail_flags' and 'enum
+mail_index_mail_flags'. There exists only one index flag currently:
+'MAIL_INDEX_MAIL_FLAG_DIRTY'. If a record has this flag set, it means that the
+mailbox syncing code should ignore the flag in the mailbox and use the flag in
+the index file instead. This is used for example with mbox and
+'mbox_lazy_writes=yes'. It also allows having modifiable flags for read-only
+mailboxes.
+
+The rest data is stored in record extensions.
+
+Keywords
+--------
+
+The keywords are stored in record extensions, but for better performance and
+lower disk space usage in transaction logs, they are quite tightly integrated
+to the index file code.
+
+The list of keywords is stored in "keywords" extension header:
+
+---%<-------------------------------------------------------------------------
+struct mail_index_keyword_header {
+ uint32_t keywords_count;
+ /* struct mail_index_keyword_header_rec[] */
+ /* char name[][] */
+};
+struct mail_index_keyword_header_rec {
+ uint32_t unused; /* for backwards compatibility */
+ uint32_t name_offset; /* relative to beginning of name[] */
+};
+---%<-------------------------------------------------------------------------
+
+The unused field originally contained 'count' field, but while writing this
+documentation I noticed it's not actually used anywhere. Apparently it was
+added there accidentally. It'll be removed in later versions.
+
+So there exists 'keywords_count' keywords, each listed in a NUL-terminated
+string beginning from 'name_offset'.
+
+Since crashing in the middle of updating the keywords list pretty much breaks
+the keywords, adding new keywords causes the index file to be always copied to
+a temporary file and be replaced.
+
+The keywords in the records are stored in a "keywords" extension bitfield. So
+the nth bit in the bitfield points to the nth keyword listed in the header.
+
+It's not currently possible to safely remove existing keywords.
+
+Extensions
+----------
+
+The extensions only specify their wanted size and alignment, the index file
+syncing code is free to assign any offset inside the record to them. The
+extensions may be reordered at any time.
+
+Dovecot's current extension ordering code works pretty well, but it's not
+perfect. If the extension size isn't the same as its alignment, it may create
+larger records than necessary. This will be fixed later.
+
+The records size is always divisible by the maximum alignment requirement. This
+isn't strictly necessary either, so it could be fixed later as well.
+
+(This file was created from the wiki on 2019-06-19 12:42)