summaryrefslogtreecommitdiffstats
path: root/doc/thread-refs.txt
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-15 17:36:47 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-15 17:36:47 +0000
commit0441d265f2bb9da249c7abf333f0f771fadb4ab5 (patch)
tree3f3789daa2f6db22da6e55e92bee0062a7d613fe /doc/thread-refs.txt
parentInitial commit. (diff)
downloaddovecot-0441d265f2bb9da249c7abf333f0f771fadb4ab5.tar.xz
dovecot-0441d265f2bb9da249c7abf333f0f771fadb4ab5.zip
Adding upstream version 1:2.3.21+dfsg1.upstream/1%2.3.21+dfsg1
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/thread-refs.txt')
-rw-r--r--doc/thread-refs.txt223
1 files changed, 223 insertions, 0 deletions
diff --git a/doc/thread-refs.txt b/doc/thread-refs.txt
new file mode 100644
index 0000000..484aee9
--- /dev/null
+++ b/doc/thread-refs.txt
@@ -0,0 +1,223 @@
+Optimistic incremental THREAD=REFERENCES
+
+Step (1) is the slowest stage for building a THREAD=REFERENCES tree. If its
+result tree is permanently saved, the following thread builds can be based
+on it by updating the tree incrementally.
+
+Adding new messages to the tree is simple: simply follow the normal rules
+as when building a new tree from scratch. Expunging messages gets more
+problematic though.
+
+Each node in the tree keeps a "link reference count" which specifies how
+many messages contain a "this message" -> "parent message" reference
+(number of links to parent node). The first reference is usually added by
+the message's own References: or In-Reply-To: header and the latter
+references are added by References: headers. This link refcount must be
+updated when adding and expunging messages. When the link refcount drops to
+zero, the message becomes a root. The link refcount doesn't tell much about
+the number of children the node has, because References: headers may
+reference any number of its ancestors.
+
+The optimistic approach assumes that usually there are no problematic
+links. In such case expunging a message simply updates the link refcounts
+and marks the message's node as expunged. Expunged messages may still act
+as dummy nodes. The node may be removed only after there are no nodes which
+point to it.
+
+Problematic links are handled by marking nodes affected by them. If such a
+node is expunged or its link is unreferenced, the thread tree must be
+rebuilt. This is assumed to be a rarely occurring event. The problematic
+cases are:
+
+1) Duplicate Message-ID: headers. If the first message having it is
+expunged, the thread tree must be rebuilt.
+
+2) Message-ID loops. If a message referencing the looping path gets
+expunged, the loop may break and the thread tree must be rebuilt. Because
+it can't be determined easily which loops get broken by which expunges,
+this case can be handled in a bit easier way: When detecting a loop between
+parent and child, rebuild the tree if any link between the parent and child
+gets unreferenced.
+
+3) A message changes its parent because an earlier message's References:
+header contained a different link. If the message gets expunged, the thread
+tree must be rebuilt to get the original parent back.
+
+4) A link in a message's References: header is ignored, because the
+referenced child already specified a different parent to itself. If the
+message gets expunged, the thread tree must be rebuilt to determine its new
+parent.
+
+5) A link in a message's References: header is ignored, because an earlier
+message's References: header already specified a different link. If the
+earlier message gets expunged, the parent may change. The earlier message
+could be found out quickly by keeping some extra state (or with a slow
+scan), but since this is assumed to be a rare problem, there's an easier
+(but less specific) way to handle this: Rebuild the tree if a link to the
+child node is unreferenced (or alternatively if a link to the original
+parent node is unreferenced, but it probably happens more often).
+
+Pseudocode:
+
+node {
+ parent: Pointer to parent node. Children pointers aren't required.
+ uid: Message's UID (0 = expunged or never even existed)
+
+ parent_link_refcount: Number of messages containing "this message" -> "parent
+ message" link, i.e. "number of links to parent node". However since parents
+ can change, not all of these references might be from our current child
+ nodes. When this refcount reaches 0, it means we must detach from our
+ parent.
+ expunge_rebuilds: If this message gets expunged, rebuild the thread tree.
+ child_unref_rebuilds: If a link between this node and its child gets
+ unreferenced, rebuild the thread tree.
+}
+
+link_reference(parent_node, child_node)
+ child_node.parent_link_refcount++
+ if is_ancestor(child_node, parent_node)
+ // child_node is an ancestor of parent_node. Adding child_node ->
+ // parent_node would introduce a loop. If any messages referencing the
+ // path between parent_node's parent and child_node get expunged, we
+ // have to rebuild the tree because the loop might break. For example:
+ // #1: a -> b (a.ref=1, b.ref=1)
+ // #2: b -> a (a.ref=2, b.ref=2)
+ // #3: c -> a -> b (a.ref=3, b.ref=3, c.ref=1)
+ // Expunging #3 wouldn't break the loop, but expunging #1 would.
+ for node in nodes[parent_node.parent .. child_node]
+ node.child_unref_rebuilds = true
+ else if child_node.parent == parent_node
+ // The same link already exists
+ else
+ // Set parent_node as child_node's parent
+ if child_node.parent == NIL
+ child_node.parent = parent_node
+ else
+ // Conflicting parent already exists, keep the original.
+ // We get here only when handling References: header.
+ if child_node.uid != 0
+ // We've already seen this message. It specifies its own parent and
+ // it doesn't matter what any other reference says. The only way its
+ // parent can change is if the message itself gets expunged.
+ child_node.expunge_rebuilds = true
+ else
+ // Message doesn't exist, so it was one of the node's children
+ // that created the original reference. If that reference gets
+ // dropped, the parent is changed. We could catch this in one of
+ // several ways:
+ // a) Link to original parent node gets unreferenced
+ // b) Link to this node gets unreferenced
+ // c) Any of the child nodes gets expunged
+ // b) is probably the least likely to happen, so use it
+ child_node.child_unref_rebuilds = true
+
+thread_add_msg(uid)
+ // get the Message-IDs as specified by the thread spec
+ (msgid, parent_msgid, references) = message_get_thread_headers(uid)
+
+ if msgid != NIL
+ if nodes[msgid].uid == 0
+ nodes[msgid].uid = uid
+ else
+ // duplicate Message-ID. if the original ever gets expunged,
+ // rebuild the thread tree
+ nodes[msgid].expunge_rebuilds = true
+ msgid = NIL
+
+ if msgid == NIL
+ msgid = get_unique_msg_id()
+
+ node = nodes[msgid]
+ if node.parent != NIL and
+ (parent_msgid == NIL or node.parent.msgid != parent_msgid)
+ // Conflicting parent, remove it. If this message gets expunged, we have
+ // to revert back to the original parent.
+ node.parent = NIL
+ node.expunge_rebuilds = true
+
+ if parent_msgid != NIL
+ link_reference(nodes[parent_msgid], node)
+
+ // go through References (skipping the last one)
+ for (ref_parent, ref_child) in references
+ link_reference(nodes[ref_parent], nodes[ref_child])
+
+unref_link(parent, child)
+ if parent.child_unref_rebuilds
+ return false
+
+ child.parent_link_refcount--
+ if child.parent_link_refcount == 0
+ child.parent = NIL
+ return true
+
+// returns false if thread tree needs to be rebuilt
+thread_expunge_msg(uid)
+ // get the Message-IDs as specified by the thread spec
+ (msgid, in_reply_to_msgid, references) = message_get_thread_headers(uid)
+
+ node = nodes[msgid]
+ if node.uid != uid
+ // Removing a duplicate Message-ID
+ return false
+
+ if node.expunge_rebuilds
+ return false
+
+ if parent_msgid != NIL and
+ not unref_link(nodes[parent_msgid], nodes[child_msgid])
+ return false
+
+ if references != NIL
+ // go through References
+ for (parent_msgid, child_msgid) in references
+ if not unref_link(nodes[parent_msgid], nodes[child_msgid])
+ return false
+ last_parent_msgid = references.last
+ else if in_reply_to_msgid != NIL
+ last_parent_msgid = in_reply_to_msgid
+
+ if last_parent_msgid != NIL and
+ not unref_link(nodes[last_parent_msgid], node)
+ return false
+
+ // mark this node as expunged
+ node.uid = 0
+ return true
+
+thread_iterate()
+ root_nodes = []
+ referenced = []
+ children = [][]
+ // Steps (2) and (3)
+ for node in nodes
+ if node.parent != NIL
+ root_nodes[] = node
+ else
+ referenced[node.parent] = true
+ if node.uid != 0
+ // Find the node's first non-dummy parent and add the node as its child.
+ // If there are no non-dummy parents, add it as the highest dummy's
+ // child.
+ nondummy_parent = node.parent
+ while nondummy_parent.uid == 0 and nondummy_parent.parent != NIL
+ nondummy_parent = nondummy_parent.parent
+ children[nondummy_parent][] = node
+
+ for node in root_nodes
+ if node.uid == 0
+ if children[node] == NIL
+ // remove dummy roots that have no children.
+ delete(node)
+ else if count(children[node]) == 1
+ // dummy root has a single child, replace the root with its child
+ node = children[node]
+
+ for node in nodes
+ if node.uid == 0 and !referenced[node]
+ free(node)
+
+ // root_nodes and children now contain a tree equivalent to a tree built by
+ // THREAD=REFERENCES specification steps (1)-(3). The rest of the steps
+ // can be performed using them. Note that node.parent should not (and need
+ // not) be used because it points its parent before steps (2) and (3).