author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 17:36:47 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-15 17:36:47 +0000
commit 0441d265f2bb9da249c7abf333f0f771fadb4ab5 (patch)
tree 3f3789daa2f6db22da6e55e92bee0062a7d613fe /doc/wiki/Design.Dsync.txt
parent Initial commit. (diff)
Adding upstream version 1:2.3.21+dfsg1.upstream/1%2.3.21+dfsg1
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
-rw-r--r-- doc/wiki/Design.Dsync.txt 114
1 files changed, 114 insertions, 0 deletions
diff --git a/doc/wiki/Design.Dsync.txt b/doc/wiki/Design.Dsync.txt
new file mode 100644
index 0000000..7b51c93
--- /dev/null
+++ b/doc/wiki/Design.Dsync.txt
@@ -0,0 +1,114 @@
+Dsync Design
+============
+
+FIXME: This describes the design for v2.0/v2.1. The v2.2 design is somewhat
+different.
+
+Two-way synchronization
+-----------------------
+
+dsync attempts to preserve all changes made on both sides of the synced
+mailboxes.
+
+Mailbox list
+------------
+
+Mailboxes have 128-bit globally unique IDs (GUIDs), which are used for figuring
+out when two mailboxes should actually be synchronized. This solves two major
+problems:
+
+ * If a mailbox has been renamed on one side, dsync finds it because its GUID
+ hasn't changed.
+ * If a mailbox has been deleted and recreated, dsync doesn't attempt to sync it
+ because it's a different mailbox.
+
+Then there's the problem of how to correctly sync mailbox renames and
+deletions. How do you know which side of the sync has the most recent name for
+the mailbox? How do you know if one side had deleted the mailbox, or if the
+other side had created it? To solve these problems, Dovecot v2.0 introduced a
+"mailbox log", which adds a record with the mailbox GUID and a timestamp
+whenever a mailbox is renamed or deleted. So:
+
+ * If a mailbox has different names on the two sides, its "last renamed"
+ timestamp is looked up from the mailbox list index. The side with the most
+ recent timestamp is assumed to contain the newer name and the other side's
+ mailbox is renamed to it.
+ * If neither side has a "last renamed" timestamp, one side is picked. This
+ shouldn't happen, except when the mailbox log is deleted for some reason or
+ if the renaming is done outside Dovecot.
+ * If a mailbox exists only on one side, the other side checks if its mailbox
+ log contains a delete record for the mailbox's GUID. If there is one, the
+ mailbox is deleted from the side where it still exists. If there isn't, the
+ mailbox is created on the side that lacks it and synced.
+ * Subscriptions and unsubscriptions are synced in a similar way. But because
+ it's possible to be subscribed to nonexistent mailboxes, the mailbox log
+ can't contain mailbox GUIDs for them. Instead the first 128 bits of the SHA1
+ of the mailbox name are used. Collisions for mailbox names are highly
+ unlikely, but even if one happens, the worst that can happen is that the
+ user gets unsubscribed from the wrong mailbox.
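
The rules in the list above can be sketched roughly as follows. This is an
illustrative Python sketch, not Dovecot's implementation: `MailboxState`,
`pick_name`, `action_for_one_sided_mailbox` and `subscription_key` are
hypothetical names, and the tie-breaking choices (prefer the local side,
treat a missing timestamp as older) and the UTF-8 encoding of the mailbox
name are assumptions that the text does not specify.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class MailboxState:
    """Hypothetical per-side view of one mailbox (same GUID on both sides)."""
    name: str
    last_renamed: Optional[int] = None  # Unix timestamp, None if no record


def pick_name(local: MailboxState, remote: MailboxState) -> str:
    """Choose the name both sides should end up with."""
    if local.name == remote.name:
        return local.name
    if local.last_renamed is None and remote.last_renamed is None:
        # No rename record on either side: one side is picked.
        # Picking the local side is an assumption of this sketch.
        return local.name
    # Assumption: a missing timestamp counts as older than any recorded one.
    local_ts = local.last_renamed if local.last_renamed is not None else -1
    remote_ts = remote.last_renamed if remote.last_renamed is not None else -1
    return local.name if local_ts >= remote_ts else remote.name


def action_for_one_sided_mailbox(guid: bytes, deleted_guids: set) -> str:
    """Decide what to do with a mailbox that exists on only one side.

    deleted_guids holds the GUIDs that have a delete record in the mailbox
    log of the side that lacks the mailbox.
    """
    return "delete" if guid in deleted_guids else "create"


def subscription_key(mailbox_name: str) -> bytes:
    """First 128 bits (16 bytes) of the SHA-1 of the mailbox name."""
    return hashlib.sha1(mailbox_name.encode("utf-8")).digest()[:16]
```

For example, `pick_name(MailboxState("Work", 100), MailboxState("Job", 200))`
selects the remote name because its rename record is newer.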
+
+dsync writes its records to the changelog using the original timestamps, so
+that dsync's changes won't override changes made by the user during the sync.
+
+Mailbox
+-------
+
+When saving new mails, dsync preserves all of their immutable state:
+
+ * GUID
+ * Received date
+ * Save date
+ * Message contents
+
+It also attempts to preserve the IMAP UID. This works as long as the other side
+hasn't already used the UID for another mail. If it has, dsync doesn't attempt
+to preserve the UID, because an IMAP client might have already seen the UID and
+cached another mail's contents for it. IMAP requires that a message's contents
+never change, so UIDs can't be reused. So whenever a UID conflict happens,
+dsync gives the messages on both sides a new UID, because it can't know which
+message the client had seen, or perhaps the user used two clients and each saw
+a different message. (This assumes a master/slave replication use case for
+dsync.)
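
A minimal sketch of the UID conflict rule described above, in Python. The
`Mail` type and `resolve_uids` function are hypothetical, and the sketch
assumes that a message present on both sides carries the same UID on both;
how new UIDs are actually allocated is not described in this document, so
taking them from a shared `next_uid` counter is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mail:
    guid: str
    uid: int


def resolve_uids(local, remote, next_uid):
    """Return ({guid: final_uid}, new_next_uid) for the mails of both sides.

    A UID used by exactly one GUID is preserved. A UID claimed by different
    GUIDs is a conflict, and every message involved gets a fresh UID.
    """
    guids_by_uid = {}
    for mail in list(local) + list(remote):
        guids_by_uid.setdefault(mail.uid, set()).add(mail.guid)

    final = {}
    for uid in sorted(guids_by_uid):
        guids = guids_by_uid[uid]
        if len(guids) == 1:
            final[next(iter(guids))] = uid  # no conflict: keep the UID
        else:
            for guid in sorted(guids):  # conflict: nobody keeps the old UID
                final[guid] = next_uid
                next_uid += 1
    return final, next_uid
```

Neither conflicting message keeps the old UID, which matches the reasoning
that a client may have cached either message under it.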
+
+The mutable metadata that dsync preserves is:
+
+ * Message flags and keywords
+ * Modification sequences (modseqs)
+
+Flags and keywords are synced based on modseqs. Whichever side has the higher
+modseq for a message has its flags and keywords synced to the other side.
+Currently there's no per-flag or per-keyword synchronization, so if one side
+had added the \Seen flag and the other side had added the \Answered flag, one
+of them would be dropped.
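
The whole-message rule and the flag loss it causes can be illustrated with a
small Python sketch. `MessageState` and `sync_flags` are hypothetical names,
and keeping the local state when the modseqs are equal is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageState:
    """Hypothetical mutable metadata of one message on one side."""
    modseq: int
    flags: frozenset


def sync_flags(local: MessageState, remote: MessageState) -> MessageState:
    """Return the state both sides should hold: the one with the higher modseq.

    On equal modseqs the local state is kept (an assumption of this sketch).
    """
    return local if local.modseq >= remote.modseq else remote


local = MessageState(modseq=5, flags=frozenset({"\\Seen"}))
remote = MessageState(modseq=7, flags=frozenset({"\\Answered"}))
winner = sync_flags(local, remote)
# winner is the remote state, so the locally added \Seen flag is lost.
```

A per-flag merge would instead need to know which individual flags each side
changed, which a single modseq per message does not record.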
+
+Finding what to sync
+--------------------
+
+dsync can run in full mode or fast mode. Full mode means it goes through all
+messages in all mailboxes, making sure everything is fully synchronized. In
+fast mode it relies on the uidvalidity, uid-next and highest-modseq values to
+find changes. If any of the values has changed, the mailbox is included in the
+sync.
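
As a rough Python sketch of the fast-mode check: the document does not say
what the values are compared against, so this sketch assumes "changed" means
that the two sides report different values for the same mailbox GUID.
`MailboxStatus` and `mailboxes_to_sync` are hypothetical names.

```python
from typing import NamedTuple


class MailboxStatus(NamedTuple):
    uidvalidity: int
    uidnext: int
    highest_modseq: int


def mailboxes_to_sync(local, remote):
    """Return the GUIDs whose status differs between the two sides.

    local and remote map mailbox GUID -> MailboxStatus. A mailbox known to
    only one side is also returned, since its status cannot match.
    """
    changed = []
    for guid in sorted(local.keys() | remote.keys()):
        if local.get(guid) != remote.get(guid):
            changed.append(guid)
    return changed
```

Only three integers per mailbox are compared, which is why fast mode avoids
walking through every message.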
+
+FIXME: A superfast mode should still be implemented, where once a mailbox is
+selected for syncing, only mails whose modseq is higher than a given one are
+synced. This would improve performance and reduce network traffic with large
+mailboxes.
+
+Copy optimizations
+------------------
+
+Before dsync actually starts syncing anything, it first fetches a list of all
+to-be-synced messages and adds them to a GUID -> message hash table. Whenever
+dsync needs to sync a new message to the other side, it first checks if the
+message's GUID already exists on the other side. If it does, it starts a
+message copy operation instead of a full save. This copy operation can fail
+if the message has just been expunged from the other side, so there needs to
+be fallback handling for this. If the message exists in multiple mailboxes, a
+copy from the next mailbox is attempted. If all of them fail, dsync falls back
+to saving the message.
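
The copy-with-fallback flow can be sketched in Python as below. The
`CopyFailed` exception, `sync_message` function and the `copy_from` and
`save_full` callbacks are hypothetical stand-ins for the real operations,
and a plain dict stands in for the GUID -> message hash table.

```python
class CopyFailed(Exception):
    """Raised by copy_from when the source message was expunged meanwhile."""


def sync_message(guid, guid_index, copy_from, save_full):
    """Sync one message to the other side, preferring a copy over a save.

    guid_index maps GUID -> list of mailboxes on the other side that already
    hold a message with that GUID.
    """
    for mailbox in guid_index.get(guid, []):
        try:
            copy_from(mailbox, guid)
            return f"copied from {mailbox}"
        except CopyFailed:
            continue  # expunged in the meantime: try the next mailbox
    # GUID unknown on the other side, or every copy attempt failed.
    save_full(guid)
    return "saved"
```

The full save is only reached when no copy source exists or all of them have
failed, so message contents are transferred only when necessary.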
+
+FIXME: This optimization currently works only in full sync mode. For it to
+work in fast sync mode, the full mailbox list would have to be looked up on
+the local side, which would slow it down.
+
+(This file was created from the wiki on 2019-06-19 12:42)