-rw-r--r--  doc/wiki/IndexFiles.txt | 192
1 files changed, 192 insertions, 0 deletions
diff --git a/doc/wiki/IndexFiles.txt b/doc/wiki/IndexFiles.txt
new file mode 100644
index 0000000..88ff012
--- /dev/null
+++ b/doc/wiki/IndexFiles.txt
@@ -0,0 +1,192 @@
+Dovecot's index files
+=====================
+
+The basic idea behind Dovecot's index files is that they make reading the
+mailboxes a lot faster. The index files consist of the following files:
+
+ * dovecot.index: Main index file
+ * dovecot.index.cache: Cached mailbox data
+ * dovecot.index.log: Transaction log file
+ * dovecot.index.log.2: .log file is rotated to .log.2 file when it grows too
+ large.
+ * dovecot.list.index*: Mailbox list index files
+
+Each mailbox has its own separate index files. If the index files are disabled,
+the same structures are still kept in memory, except that the cache file is
+disabled completely (because the client probably won't fetch the same data
+twice within a connection).
+
+If index files are missing, Dovecot creates them automatically when the mailbox
+is opened. If at any point creating or growing a file fails with a "not enough
+disk space" error, the indexes are transparently moved to memory for the rest
+of the session. This isn't done with mailbox formats that rely on index files
+(e.g. dbox).
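+
+The fallback can be sketched in C as follows. This is a minimal illustration,
+not Dovecot's code: the 'index_storage' struct, its fields and
+'index_handle_write_error()' are hypothetical names, and "not enough disk
+space" is assumed to correspond to the ENOSPC error code.
+
```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not Dovecot's internal API. */
enum index_location { INDEX_ON_DISK, INDEX_IN_MEMORY };

struct index_storage {
    enum index_location location;
    bool format_requires_indexes; /* true for formats such as dbox */
};

/* Called when creating or growing an index file failed with errno `err`.
   Returns true if the session can continue with in-memory indexes,
   false if the error has to be reported as-is. */
bool index_handle_write_error(struct index_storage *storage, int err)
{
    if (err != ENOSPC)
        return false; /* not an out-of-disk-space error */
    if (storage->format_requires_indexes)
        return false; /* the indexes can't simply be dropped */
    /* Keep the same structures in memory for the rest of the session. */
    storage->location = INDEX_IN_MEMORY;
    return true;
}
```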
+
+See <Design.Indexes.txt> for more technical information about how the index
+files are handled.
+
+Main index
+----------
+
+The main index contains the following information for each message:
+
+ * IMAP UID
+ * Current flags and keywords
+ * Pointer to cache file
+ * mbox-only: mbox file offset
+ * mbox-only: MD5 sum of some of the message headers, intended to help find the
+ message when its X-UID: header hasn't yet been written
+ * Other extensions in Dovecot v1.1+, such as mailbox sorting data
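+
+IMAP assigns UIDs in strictly ascending order within a mailbox, so per-message
+records kept in mailbox order are also sorted by UID. As a result, a message can
+be found by its UID without touching the mail files. Below is a sketch in C. It
+assumes a simplified record holding only a few of the fields listed above. The
+struct and function names are hypothetical, and this is not Dovecot's on-disk
+layout.
+
```c
#include <stdint.h>

/* Simplified, hypothetical record; not Dovecot's on-disk format. */
struct index_record {
    uint32_t uid;          /* IMAP UID, strictly ascending in the mailbox */
    uint8_t flags;         /* \Seen, \Answered, \Deleted, ... as bits */
    uint32_t cache_offset; /* where this message's data is in the cache file */
};

/* Return the 1-based sequence number of the message with `uid`, found with
   a binary search over the UID-sorted records, or 0 if there is none. */
uint32_t index_lookup_seq(const struct index_record *recs, uint32_t count,
                          uint32_t uid)
{
    uint32_t lo = 0, hi = count;

    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;

        if (recs[mid].uid < uid)
            lo = mid + 1;
        else if (recs[mid].uid > uid)
            hi = mid;
        else
            return mid + 1;
    }
    return 0;
}
```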
+
+This is the same information that most other IMAP servers keep in memory while
+the mailbox is open. Dovecot's advantage is that it stores the information
+permanently, so it is readily available when the mailbox is opened.
+
+The index file's header also contains some summary information, such as how
+many messages exist, how many of them are unseen and how many are marked with
+the \Deleted flag. Opening mailboxes and answering IMAP STATUS commands can
+usually be done simply by reading the required information from the index
+file's header. This is why these operations are extremely fast with Dovecot
+compared to servers that don't use an equivalent index file.
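+
+For example, a STATUS (MESSAGES UNSEEN) reply needs only two header counters.
+Below is a minimal sketch in C. The header struct is a simplified, hypothetical
+version. It assumes the seen count is stored and derives the unseen count from
+it.
+
```c
#include <stdint.h>

/* Simplified, hypothetical index header: only the summary counters. */
struct index_header {
    uint32_t messages_count;
    uint32_t seen_messages_count;
    uint32_t deleted_messages_count; /* not needed for STATUS */
};

struct status_reply {
    uint32_t messages;
    uint32_t unseen;
};

/* Answer STATUS (MESSAGES UNSEEN) from the header alone: no message
   records and no mail files need to be read. */
struct status_reply status_from_header(const struct index_header *hdr)
{
    struct status_reply reply;

    reply.messages = hdr->messages_count;
    reply.unseen = hdr->messages_count - hdr->seen_messages_count;
    return reply;
}
```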
+
+Mailbox synchronization
+-----------------------
+
+The main index's header also contains mailbox syncing state:
+
+ * Maildir: cur/ and new/ directories' timestamps
+ * mbox: mbox file's mtime and size
+
+The index file is synchronized against the mailbox only if the syncing
+information has changed.
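+
+The check can be sketched in C as follows. It assumes the saved timestamps and
+size live in small hypothetical structs and are compared against stat() results.
+The names are illustrative, not Dovecot's.
+
```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical syncing state saved in the index header. */
struct maildir_sync_state {
    time_t cur_mtime; /* mtime of cur/ at the last sync */
    time_t new_mtime; /* mtime of new/ at the last sync */
};

struct mbox_sync_state {
    time_t mtime; /* mbox file's mtime at the last sync */
    off_t size;   /* mbox file's size at the last sync */
};

/* Maildir: adding, removing or renaming files changes a directory's mtime. */
bool maildir_needs_sync(const struct maildir_sync_state *saved,
                        const struct stat *cur_st, const struct stat *new_st)
{
    return saved->cur_mtime != cur_st->st_mtime ||
           saved->new_mtime != new_st->st_mtime;
}

/* mbox: a write changes the file's mtime and/or size. */
bool mbox_needs_sync(const struct mbox_sync_state *saved,
                     const struct stat *st)
{
    return saved->mtime != st->st_mtime || saved->size != st->st_size;
}
```
+
+Comparing whole-second mtimes can miss a change made within the same second
+that the state was saved. A real implementation has to account for that, and
+this sketch doesn't.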
+
+Cache file
+----------
+
+The cache file may contain the following information for messages:
+
+ * Message headers (some, not all)
+ * Sent date (parsed Date: header)
+ * Received date (IMAP's INTERNALDATE field)
+ * Physical and virtual message sizes
+ * The message's parsed MIME structure, which allows quickly reading only a
+ specific MIME part (IMAP's FETCH BODY[1.2.3] command)
+ * IMAP's BODY and BODYSTRUCTURE fields
+    * If both are used, only BODYSTRUCTURE is saved, since BODY can be
+      generated from it
+ * IMAP's ENVELOPE isn't currently cached. Instead, the headers used to build
+ it are cached directly.
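+
+The last two rules can be expressed as a mapping from the fields clients asked
+for to the fields actually written to the cache file. Below is a sketch in C.
+The bit flags are hypothetical and are not Dovecot's cache field names. The
+header list in the comment is the one IMAP uses to build ENVELOPE (RFC 3501).
+
```c
#include <stdint.h>

/* Hypothetical field bits; not Dovecot's cache field registry. */
#define FIELD_BODY             (1u << 0)
#define FIELD_BODYSTRUCTURE    (1u << 1)
#define FIELD_ENVELOPE         (1u << 2)
/* Date, Subject, From, Sender, Reply-To, To, Cc, Bcc, In-Reply-To,
   Message-ID: the headers ENVELOPE is built from. */
#define FIELD_ENVELOPE_HEADERS (1u << 3)

/* Map the requested fields to the fields stored in the cache file. */
uint32_t fields_to_cache(uint32_t wanted)
{
    uint32_t store = wanted;

    /* BODY can be generated from BODYSTRUCTURE, so store only the latter. */
    if (store & FIELD_BODYSTRUCTURE)
        store &= ~FIELD_BODY;
    /* ENVELOPE itself isn't cached; the headers used to build it are. */
    if (store & FIELD_ENVELOPE)
        store = (store & ~FIELD_ENVELOPE) | FIELD_ENVELOPE_HEADERS;
    return store;
}
```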
+
+IMAP clients can work in many different ways. There are basically two types:
+
+ 1. Online clients that ask for the same information multiple times (e.g.
+ webmail clients, Pine)
+ 2. Offline clients that usually first download some of the interesting message
+ headers and only after that the message bodies (possibly automatically, or
+ possibly only when the user opens the mail). Most IMAP clients behave like
+ this.
+
+The cache file is extremely helpful with type 1 clients. The first time such a
+client requests message headers or some other metadata, they're stored in the
+cache file. The second time it asks for the same information, Dovecot can get
+it quickly from the cache file instead of opening the message and parsing the
+headers.
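+
+The pattern is "check the cache, otherwise parse the message and remember the
+result". Below is a toy sketch in C. A fixed-size array stands in for the cache
+file, and 'parse_message_field()' is a hypothetical placeholder for opening and
+parsing the message.
+
```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_MAX 64

/* Toy stand-in for the cache file: (uid, field) -> value. */
struct cache_entry {
    uint32_t uid;
    int field;
    uint32_t value;
};

struct toy_cache {
    struct cache_entry entries[CACHE_MAX];
    size_t count;
    unsigned int parse_count; /* how often a message had to be parsed */
};

/* Hypothetical placeholder for opening the message and parsing it. */
static uint32_t parse_message_field(struct toy_cache *c, uint32_t uid,
                                    int field)
{
    c->parse_count++;
    return uid * 100u + (uint32_t)field; /* dummy value */
}

uint32_t cache_fetch(struct toy_cache *c, uint32_t uid, int field)
{
    for (size_t i = 0; i < c->count; i++) {
        if (c->entries[i].uid == uid && c->entries[i].field == field)
            return c->entries[i].value; /* cache hit: no parsing */
    }
    uint32_t value = parse_message_field(c, uid, field);
    /* Only fields that were actually requested end up in the cache. */
    if (c->count < CACHE_MAX)
        c->entries[c->count++] = (struct cache_entry){ uid, field, value };
    return value;
}
```
+
+Fetching the same (uid, field) pair twice parses the message only once.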
+
+For type 2 clients the cache file is helpful if the user has multiple clients or
+if the data was cached while the message was being saved (Dovecot v1.1+ can do
+this). Some of the information is helpful in any case; for example, the
+message's virtual size must be known when downloading the message. If the
+virtual size isn't in the cache, Dovecot first has to read the whole message to
+calculate it.
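+
+The virtual size is the message size when every line ending is counted as
+CRLF, even if the message is stored with bare LF line endings. Assuming that
+definition, the sketch below shows why computing it means scanning every byte
+of the message. The function name is hypothetical.
+
```c
#include <stddef.h>
#include <stdint.h>

/* Virtual size: the size with all line endings counted as CRLF.
   Each bare LF (one not preceded by CR) adds one byte, so the whole
   message has to be scanned. */
uint64_t message_virtual_size(const char *data, size_t len)
{
    uint64_t vsize = len;

    for (size_t i = 0; i < len; i++) {
        if (data[i] == '\n' && (i == 0 || data[i - 1] != '\r'))
            vsize++;
    }
    return vsize;
}
```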
+
+Only the mailbox metadata that clients have asked for earlier is stored in the
+cache file. This allows Dovecot to adapt to different clients' needs and still
+not waste disk space (and cause extra disk I/O!) on fields that clients never
+need.
+
+Dovecot can cache fields either permanently or temporarily. Temporarily cached
+fields are dropped from the cache file after about a week. Dovecot uses two
+rules to determine when data should be cached permanently instead of
+temporarily:
+
+ 1. Client accessed messages in non-sequential order within this session. This
+ most likely means it doesn't have a local cache.
+ 2. Client accessed a message older than one week.
+
+<Design.Indexes.Cache.txt> explains the reasons for these rules.
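+
+Below is a sketch of the two rules in C. It assumes a hypothetical per-field
+state that remembers the highest sequence number accessed so far and whether
+the field has already been made permanent. The message's received date stands
+in for its age. This is an illustration, not Dovecot's decision code.
+
```c
#include <stdint.h>
#include <time.h>

#define CACHE_WEEK_SECS (7 * 24 * 60 * 60)

enum cache_decision { CACHE_TEMP, CACHE_YES };

/* Hypothetical per-field decision state. */
struct cache_field_state {
    enum cache_decision decision;
    uint32_t last_seq; /* highest sequence accessed so far, 0 = none */
};

/* Decide how the field is cached when message `seq` is accessed.
   `received` is the message's received date. */
enum cache_decision cache_decide(struct cache_field_state *f, uint32_t seq,
                                 time_t received, time_t now)
{
    if (f->decision == CACHE_YES)
        return CACHE_YES; /* once permanent, stays permanent */
    /* Rule 1: non-sequential access (going back to an earlier message)
       suggests the client has no local cache. */
    if (f->last_seq != 0 && seq < f->last_seq)
        f->decision = CACHE_YES;
    /* Rule 2: the client accessed a message older than one week. */
    if (now - received > CACHE_WEEK_SECS)
        f->decision = CACHE_YES;
    if (seq > f->last_seq)
        f->last_seq = seq;
    return f->decision;
}
```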
+
+Transaction log
+---------------
+
+All changes to the main index go through the transaction log first. This has
+two advantages when the mailbox is accessed using multiple simultaneous
+connections:
+
+ 1. It allows getting a list of changes quickly so that IMAP clients can be
+ notified of them. The alternative would be to compare two index mappings,
+ which is what most other IMAP servers do.
+ 2. The 'mmap_disable=yes' implementation relies on the transaction log.
+ Instead of re-reading the whole main index file after each change, only a
+ few bytes need to be read from the transaction log.
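+
+Point 1 can be sketched in C as follows. Every change is appended to the log,
+and each connection remembers how far it has read. The changes a connection
+still has to report are simply the records after that position. The array-based
+log, the record format and all names are hypothetical simplifications of an
+on-disk log file.
+
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_MAX 128

/* Hypothetical log record: "message `uid` now has flags `flags`". */
struct log_record {
    uint32_t uid;
    uint8_t flags;
};

struct transaction_log {
    struct log_record recs[LOG_MAX];
    size_t count; /* number of records appended so far */
};

/* Each connection remembers how far into the log it has read. */
struct connection {
    size_t log_offset;
};

/* Every change goes through the log first. */
bool log_append(struct transaction_log *log, uint32_t uid, uint8_t flags)
{
    if (log->count == LOG_MAX)
        return false; /* the toy log is full */
    log->recs[log->count++] = (struct log_record){ uid, flags };
    return true;
}

/* Point *changes_r at the records this connection hasn't seen yet, return
   their count and advance the connection's offset. A server would report
   these to the client, e.g. as untagged FETCH (FLAGS ...) responses. */
size_t log_read_new(const struct transaction_log *log, struct connection *conn,
                    const struct log_record **changes_r)
{
    size_t n = log->count - conn->log_offset;

    *changes_r = &log->recs[conn->log_offset];
    conn->log_offset = log->count;
    return n;
}
```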
+
+In Dovecot v1.1+ the transaction log plays an even more important role. The
+main index file is updated only "once in a while" to reduce disk writes, so it
+is common to first read the main index and then apply the newer changes from
+the transaction log on top of it. With empty mailboxes (e.g. POP3 users who
+download and then delete their mail) it would even be possible to delete the
+whole main index and keep only the transaction log (although this isn't done
+currently).
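+
+The read path can be sketched like this. The main index remembers up to which
+log position it is current, and only the newer records are replayed on top of
+it. The flat flags-by-UID array, the record format and the names are
+simplifying assumptions, not Dovecot's on-disk formats.
+
```c
#include <stddef.h>
#include <stdint.h>

#define MAX_UID 16

/* Hypothetical log record: "message `uid` now has flags `flags`". */
struct log_record {
    uint32_t uid;
    uint8_t flags;
};

/* The main index as last written to disk (flags indexed by UID for
   simplicity) and the log position up to which it includes changes. */
struct index_snapshot {
    uint8_t flags[MAX_UID + 1];
    size_t log_offset;
};

/* Bring a possibly stale index up to date in memory by replaying only
   the log records written after the index was last saved. */
void index_apply_log(struct index_snapshot *idx,
                     const struct log_record *log, size_t log_count)
{
    for (size_t i = idx->log_offset; i < log_count; i++) {
        if (log[i].uid >= 1 && log[i].uid <= MAX_UID)
            idx->flags[log[i].uid] = log[i].flags;
    }
    idx->log_offset = log_count;
}
```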
+
+List index
+----------
+
+The mailbox list index files are called dovecot.list.index and
+dovecot.list.index.log. They basically contain:
+
+ * The header contains an ID => name mapping. The name isn't the full mailbox
+ name; rather, each hierarchy level has its own ID and name. For example, the
+ mailbox name "foo/bar" (with '/' as the separator) would have separate IDs
+ for the "foo" and "bar" names.
+ * The records contain { parent_uid, uid, name_id } fields that can be used to
+ build the whole mailbox tree. parent_uid=0 means root; otherwise it's the
+ parent node's uid.
+ * For each selectable mailbox, the record also contains the mailbox's GUID.
+ If a mailbox is recreated with the same name, its GUID changes. Note,
+ however, that the UID doesn't change, because the UID refers to the mailbox
+ name, not to the mailbox itself.
+ * The records may also contain extensions that allow mailbox_get_status() to
+ return values directly from the mailbox list index.
+ * Storage backends may also add their own extensions to figure out whether a
+ record is up to date.
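+
+Building a full mailbox name means walking the parent_uid links up to the root
+and joining the hierarchy names. Below is a sketch in C. Hypothetical in-memory
+structs stand in for the header's ID => name mapping and for the records.
+LIST_MAX_DEPTH is an arbitrary assumed limit that also stops on looping links.
+
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LIST_MAX_DEPTH 32 /* arbitrary limit, also catches loops */

/* Hypothetical in-memory forms of the list index header and records.
   `uid` here identifies a list index node, not an IMAP message. */
struct list_name { uint32_t id; const char *name; };
struct list_record { uint32_t parent_uid, uid, name_id; };

static const char *name_by_id(const struct list_name *names, size_t n,
                              uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (names[i].id == id)
            return names[i].name;
    return NULL;
}

static const struct list_record *rec_by_uid(const struct list_record *recs,
                                            size_t n, uint32_t uid)
{
    for (size_t i = 0; i < n; i++)
        if (recs[i].uid == uid)
            return &recs[i];
    return NULL;
}

/* Write the full name of node `uid` into buf. Returns 0 on success, -1 on a
   broken link, too deep (or looping) hierarchy, or too small buffer. */
int list_full_name(const struct list_record *recs, size_t nrecs,
                   const struct list_name *names, size_t nnames,
                   uint32_t uid, char sep, char *buf, size_t bufsize)
{
    const char *parts[LIST_MAX_DEPTH];
    size_t depth = 0, len = 0;

    if (bufsize == 0)
        return -1;
    /* Walk from the node up to the root (parent_uid == 0). */
    while (uid != 0) {
        const struct list_record *rec = rec_by_uid(recs, nrecs, uid);

        if (rec == NULL || depth == LIST_MAX_DEPTH)
            return -1;
        parts[depth] = name_by_id(names, nnames, rec->name_id);
        if (parts[depth] == NULL)
            return -1;
        depth++;
        uid = rec->parent_uid;
    }
    /* Join the names root-first, e.g. "foo" + sep + "bar". */
    for (size_t i = depth; i > 0; i--) {
        size_t plen = strlen(parts[i - 1]);
        size_t need = plen + (i < depth ? 1 : 0);

        if (len + need + 1 > bufsize)
            return -1;
        if (i < depth)
            buf[len++] = sep;
        memcpy(buf + len, parts[i - 1], plen);
        len += plen;
    }
    buf[len] = '\0';
    return 0;
}
```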
+
+Settings
+--------
+
+Since v2.2.34 you can configure some of the previously hardcoded
+optimization-related settings. It's not recommended to change these settings
+without fully understanding the consequences.
+
+ * 'mail_cache_unaccessed_field_drop': Drop fields that haven't been accessed
+ for n seconds.
+ * 'mail_cache_record_max_size': If a cache record becomes larger than this,
+ don't add it.
+ * 'mail_cache_compress_min_size': Never compress the file if it's smaller than
+ this.
+ * 'mail_cache_compress_delete_percentage': Compress the file when n% of
+ records are deleted (by count, not by size).
+ * 'mail_cache_compress_continued_percentage': Compress the file when n% of
+ rows contain continued rows. For example 200% means that the record has 2
+ continued rows, i.e. it exists in 3 separate segments in the cache file.
+ * 'mail_cache_compress_header_continue_count': Compress the file when we need
+ to follow more than n next_offsets to find the latest cache header.
+ * 'mail_index_rewrite_min_log_bytes', 'mail_index_rewrite_max_log_bytes':
+ Rewrite the index when the number of bytes that need to be read from the
+ .log on refresh is between these min/max values.
+ * 'mail_index_log_rotate_min_size', 'mail_index_log_rotate_max_size',
+ 'mail_index_log_rotate_min_age': Rotate the transaction log when it's either
+ a) at least min_size and was created at least min_age ago, or b) larger
+ than max_size.
+ * 'mail_index_log2_max_age': Delete .log.2 when it's older than this. Don't
+ be too eager, because older files are useful for QRESYNC and dsync.
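+
+The log rotation and cache compression rules above can be sketched in C as two
+predicates. The structs are hypothetical containers for the setting values,
+with times assumed to be in seconds. The percentages are compared against
+record counts as described, with the record count assumed to include deleted
+records. This is not Dovecot's own implementation.
+
```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical containers for the setting values. */
struct log_rotate_settings {
    uint64_t min_size; /* mail_index_log_rotate_min_size */
    uint64_t max_size; /* mail_index_log_rotate_max_size */
    time_t min_age;    /* mail_index_log_rotate_min_age, seconds */
};

/* Rotate when a) size >= min_size and age >= min_age, or b) size > max_size. */
bool log_should_rotate(const struct log_rotate_settings *set,
                       uint64_t log_size, time_t created, time_t now)
{
    if (log_size > set->max_size)
        return true;
    return log_size >= set->min_size && now - created >= set->min_age;
}

struct cache_compress_settings {
    uint64_t min_size;                  /* mail_cache_compress_min_size */
    unsigned int delete_percentage;     /* ..._compress_delete_percentage */
    unsigned int continued_percentage;  /* ..._compress_continued_percentage */
    unsigned int header_continue_count; /* ..._compress_header_continue_count */
};

struct cache_file_stats {
    uint64_t file_size;
    uint32_t record_count;         /* assumed to include deleted records */
    uint32_t deleted_record_count;
    uint32_t continued_row_count;
    unsigned int header_next_offsets; /* followed to find the latest header */
};

bool cache_should_compress(const struct cache_compress_settings *set,
                           const struct cache_file_stats *st)
{
    if (st->file_size < set->min_size)
        return false; /* never compress small files */
    if (st->header_next_offsets > set->header_continue_count)
        return true;
    if (st->record_count == 0)
        return false;
    /* Percentages are by record count, not by size. */
    if ((uint64_t)st->deleted_record_count * 100 >=
        (uint64_t)set->delete_percentage * st->record_count)
        return true;
    return (uint64_t)st->continued_row_count * 100 >=
           (uint64_t)set->continued_percentage * st->record_count;
}
```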
+
+(This file was created from the wiki on 2019-06-19 12:42)