diff options
Diffstat (limited to 'doc/wiki/Design.Indexes.MainIndex.txt')
-rw-r--r-- | doc/wiki/Design.Indexes.MainIndex.txt | 284 |
1 files changed, 284 insertions, 0 deletions
diff --git a/doc/wiki/Design.Indexes.MainIndex.txt b/doc/wiki/Design.Indexes.MainIndex.txt new file mode 100644 index 0000000..db0fb0f --- /dev/null +++ b/doc/wiki/Design.Indexes.MainIndex.txt @@ -0,0 +1,284 @@ +Main index +========== + +The main index can be used to quickly look up messages' UIDs, flags, keywords +and extension-specific data, such as cache file or mbox file offsets. + +Reading, writing and locking +---------------------------- + +Reading 'dovecot.index' file requires locking, unfortunately. Shared read +locking is done using the standard index locking method specified in +'lock_method' setting ('lock_method' parameter for 'mail_index_open()'). + +Writing to index files requires transaction log to be exclusively locked first. +This way the index locking only has to worry about existing read locks. The +locking works by first trying to lock the index with the standard locking +method, but if it couldn't acquire the lock in two seconds, it'll fallback to +copying the index file to a temporary file, and when unlocking it'll 'rename()' +the temporary file over the 'dovecot.index' file. Note that this is safe only +because of the exclusive transaction log lock. This way the writers are never +blocked by readers who are allowed to keep the shared lock as long as they +want. + +The copy-locking is used always when doing anything that could corrupt the +index file if it crashed in the middle of an operation. For example if the +header or record size changes, or if messages are expunged. New messages can be +appended however, because the message count in the header is updated last. +Expunging the last messages would probably be safe also (because only the +header needs updating), but it's not done currently. + +The index file should never be directly modified. Everything should go through +the transaction log, and the only time the index needs to be write-locked is +when transactions are written to it. + +Currently the index file is updated whenever the backend mailbox is +synchronized. This isn't necessary, because an old index file can be updated +using the transaction log. In future there could be some smarter decisions +about when writing to the index isn't worth the extra disk writes. + +Header +------ + +---%<------------------------------------------------------------------------- +struct mail_index_header { + uint8_t major_version; + uint8_t minor_version; + + uint16_t base_header_size; + uint32_t header_size; + uint32_t record_size; + + uint8_t compat_flags; + uint8_t unused[3]; + + uint32_t indexid; + uint32_t flags; + + uint32_t uid_validity; + uint32_t next_uid; + + uint32_t messages_count; + uint32_t unused_old_recent_messages_count; + uint32_t seen_messages_count; + uint32_t deleted_messages_count; + + uint32_t first_recent_uid; + uint32_t first_unseen_uid_lowwater; + uint32_t first_deleted_uid_lowwater; + + uint32_t log_file_seq; + uint32_t log_file_tail_offset; + uint32_t log_file_head_offset; + + uint64_t unused_old_sync_size; + uint32_t unused_old_sync_stamp; + + uint32_t day_stamp; + uint32_t day_first_uid[8]; +} +---%<------------------------------------------------------------------------- + +Fields that won't change without recreating the index: + +major_version: + If this doesn't match 'MAIL_INDEX_MAJOR_VERSION', don't try to read the + index. Dovecot recreates the index file then. + +minor_version: + If this doesn't match 'MAIL_INDEX_MINOR_VERSION' there are some backwards + compatible changes in the index file (typically header fields). Try to + preserve the headers and the minor version when updating the index file. + +base_header_size: + Extension headers begin after the base headers. This is normally the same as + 'sizeof(struct mail_index_header)'. + +header_size: + Records begin after base and extension headers. + +record_size: + Size of each record and its extensions. Initially the same as 'sizeof(struct + mail_index_record)'. + +compat_flags: + Currently there is just one compatibility flag: + 'MAIL_INDEX_COMPAT_LITTLE_ENDIAN'. Dovecot doesn't try to bother to read + different endianess files, they're simply recreated. + +indexid: + Unique index file ID. This is used to make sure that the main index, + transaction log and cache file are all part of the same index. + +Header flags: + +MAIL_INDEX_HDR_FLAG_CORRUPTED: + Set whenever the index file is found to be corrupted. If the reader notices + this flag, it shouldn't try to continue using the index. + +MAIL_INDEX_HDR_FLAG_HAVE_DIRTY: + This index has records with 'MAIL_INDEX_MAIL_FLAG_DIRTY' flag set. + +MAIL_INDEX_HDR_FLAG_FSCK: + Call 'mail_index_fsck()' as soon as possible. This flag isn't actually set + anywhere currently. + +Message UIDs and counters: + +uid_validity: + IMAP UIDVALIDITY field. Initially can be 0, but after it's set we don't + currently try to even handle the case of UIDVALIDITY changing. It's done by + marking the index file corrupted and recreating it. That's a bit ugly, but + typically the UIDVALIDITY never changes. + +next_uid: + UID given to the next appended message. Only increases. + +messages_count: + Number of records in the index file. + +recent_messages_count: + Number of records with 'MAIL_RECENT' flag set. + +seen_messages_count: + Number of records with 'MAIL_SEEN' flag set. + +deleted_messages_count: + Number of records with 'MAIL_DELETED' flag set. + +first_recent_uid_lowwater: + There are no UIDs lower than this with 'MAIL_RECENT' flag set. + +first_unseen_uid_lowwater: + There are no UIDs lower than this *without* 'MAIL_SEEN' flag set. + +first_deleted_uid_lowwater: + There are no UIDs lower than this with 'MAIL_DELETE' flag set. + +The lowwater fields are used to optimize searching messages with/without a +specific flag. + +Fields related to syncing: + +log_file_seq: + Log file the log_*_offset fields point to. + +log_file_int_offset, log_file_ext_offset: + All the internal/external transactions before this offset in the log file are + synced to the index. External transactions are synced more often than + internal, so 'log_file_int_offset' <= 'log_file_ext_offset'. + +sync_size, sync_stamp: + Used by the mailbox backends to store their synchronization information. Some + day these should be removed and replaced with extension headers. + +Then there are day fields: + +day_stamp: + UNIX timestamp to the beginning of the day when new records were last added + to the index file. + +day_first_uid[8]: + These fields are updated when 'day_stamp' < today. The [0..6] are first moved + to [1..7], then [0] is set to the first appended UID. So they contain the + first UID of the day for last 8 days when messages were appended. + +The 'day_first_uid[]' fields are used by cache file compression to decide when +to drop 'MAIL_CACHE_DECISION_TEMP' data. + +Extension headers +----------------- + +After the base header comes a list of extensions and their headers. The first +extension begins from 'mail_index_header.base_header_size' offset. The second +begins after the first one's 'data[]' and so on. The extensions always begin +64bit aligned however, so you may need to skip a few bytes always. Read the +extensions as long as the offset is smaller than +'mail_index_header.header_size'. + +---%<------------------------------------------------------------------------- +struct mail_index_ext_header { + uint32_t hdr_size; /* size of data[] */ + uint32_t reset_id; + uint16_t record_offset; + uint16_t record_size; + uint16_t record_align; + uint16_t name_size; + /* unsigned char name[name_size] */ + /* unsigned char data[hdr_size] (starting 64bit aligned) */ +}; +---%<------------------------------------------------------------------------- + +'reset_id', record offset, size and alignment is explained in +<Design.Indexes.TransactionLog.txt>'s 'struct mail_transaction_ext_intro'. + +Records +------- + +There are 'hdr.messages_count' records in the file. Each record contains at +least two fields: Record UID and flags. The UID is always increasing for the +records, so it's possible to find a record by its UID with binary search. The +record size is specified by 'mail_index_header.record_size'. + +The flags are a combination of 'enum mail_flags' and 'enum +mail_index_mail_flags'. There exists only one index flag currently: +'MAIL_INDEX_MAIL_FLAG_DIRTY'. If a record has this flag set, it means that the +mailbox syncing code should ignore the flag in the mailbox and use the flag in +the index file instead. This is used for example with mbox and +'mbox_lazy_writes=yes'. It also allows having modifiable flags for read-only +mailboxes. + +The rest data is stored in record extensions. + +Keywords +-------- + +The keywords are stored in record extensions, but for better performance and +lower disk space usage in transaction logs, they are quite tightly integrated +to the index file code. + +The list of keywords is stored in "keywords" extension header: + +---%<------------------------------------------------------------------------- +struct mail_index_keyword_header { + uint32_t keywords_count; + /* struct mail_index_keyword_header_rec[] */ + /* char name[][] */ +}; +struct mail_index_keyword_header_rec { + uint32_t unused; /* for backwards compatibility */ + uint32_t name_offset; /* relative to beginning of name[] */ +}; +---%<------------------------------------------------------------------------- + +The unused field originally contained 'count' field, but while writing this +documentation I noticed it's not actually used anywhere. Apparently it was +added there accidentally. It'll be removed in later versions. + +So there exists 'keywords_count' keywords, each listed in a NUL-terminated +string beginning from 'name_offset'. + +Since crashing in the middle of updating the keywords list pretty much breaks +the keywords, adding new keywords causes the index file to be always copied to +a temporary file and be replaced. + +The keywords in the records are stored in a "keywords" extension bitfield. So +the nth bit in the bitfield points to the nth keyword listed in the header. + +It's not currently possible to safely remove existing keywords. + +Extensions +---------- + +The extensions only specify their wanted size and alignment, the index file +syncing code is free to assign any offset inside the record to them. The +extensions may be reordered at any time. + +Dovecot's current extension ordering code works pretty well, but it's not +perfect. If the extension size isn't the same as its alignment, it may create +larger records than necessary. This will be fixed later. + +The records size is always divisible by the maximum alignment requirement. This +isn't strictly necessary either, so it could be fixed later as well. + +(This file was created from the wiki on 2019-06-19 12:42) |