summaryrefslogtreecommitdiffstats
path: root/doc/thread-refs.txt
blob: 484aee9239a14e3176c7c40ad8d74aeb027db852 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
Optimistic incremental THREAD=REFERENCES

Step (1) is the slowest stage for building a THREAD=REFERENCES tree. If its
result tree is permanently saved, the following thread builds can be based
on it by updating the tree incrementally.

Adding new messages to the tree is simple: simply follow the normal rules
as when building a new tree from scratch. Expunging messages gets more
problematic though.

Each node in the tree keeps a "link reference count" which specifies how
many messages contain a "this message" -> "parent message" reference
(number of links to parent node). The first reference is usually added by
the message's own References: or In-Reply-To: header and the latter
references are added by References: headers. This link refcount must be
updated when adding and expunging messages. When the link refcount drops to
zero, the message becomes a root. The link refcount doesn't tell much about
the number of children the node has, because References: headers may
reference any number of its ancestors.

The optimistic approach assumes that usually there are no problematic
links. In such case expunging a message simply updates the link refcounts
and marks the message's node as expunged. Expunged messages may still act
as dummy nodes. The node may be removed only after there are no nodes which
point to it.

Problematic links are handled by marking nodes affected by them. If such a
node is expunged or its link is unreferenced, the thread tree must be
rebuilt. This is assumed to be a rarely occurring event. The problematic
cases are:

1) Duplicate Message-ID: headers. If the first message having it is
expunged, the thread tree must be rebuilt.

2) Message-ID loops. If a message referencing the looping path gets
expunged, the loop may break and the thread tree must be rebuilt. Because
it can't be determined easily which loops get broken by which expunges,
this case can be handled in a bit easier way: When detecting a loop between
parent and child, rebuild the tree if any link between the parent and child
gets unreferenced.

3) A message changes its parent because an earlier message's References:
header contained a different link. If the message gets expunged, the thread
tree must be rebuilt to get the original parent back.

4) A link in a message's References: header is ignored, because the
referenced child already specified a different parent to itself. If the
message gets expunged, the thread tree must be rebuilt to determine its new
parent.

5) A link in a message's References: header is ignored, because an earlier
message's References: header already specified a different link. If the
earlier message gets expunged, the parent may change. The earlier message
could be found out quickly by keeping some extra state (or with a slow
scan), but since this is assumed to be a rare problem, there's an easier
(but less specific) way to handle this: Rebuild the tree if a link to the
child node is unreferenced (or alternatively if a link to the original
parent node is unreferenced, but it probably happens more often).

Pseudocode:

node {
  parent: Pointer to parent node. Children pointers aren't required.
  uid: Message's UID (0 = expunged or never even existed)

  parent_link_refcount: Number of messages containing "this message" -> "parent
    message" link, i.e. "number of links to parent node". However since parents
    can change, not all of these references might be from our current child
    nodes. When this refcount reaches 0, it means we must detach from our
    parent.
  expunge_rebuilds: If this message gets expunged, rebuild the thread tree.
  child_unref_rebuilds: If a link between this node and its child gets
    unreferenced, rebuild the thread tree.
}

link_reference(parent_node, child_node)
  child_node.parent_link_refcount++
  if is_ancestor(child_node, parent_node)
    // child_node is an ancestor of parent_node. Adding child_node ->
    // parent_node would introduce a loop. If any messages referencing the
    // path between parent_node's parent and child_node get expunged, we
    // have to rebuild the tree because the loop might break. For example:
    //   #1: a -> b       (a.ref=1, b.ref=1)
    //   #2: b -> a       (a.ref=2, b.ref=2)
    //   #3: c -> a -> b  (a.ref=3, b.ref=3, c.ref=1)
    // Expunging #3 wouldn't break the loop, but expunging #1 would.
    for node in nodes[parent_node.parent .. child_node]
      node.child_unref_rebuilds = true
  else if child_node.parent == parent_node
    // The same link already exists
  else
    // Set parent_node as child_node's parent
    if child_node.parent == NIL
      child_node.parent = parent_node
    else
      // Conflicting parent already exists, keep the original.
      // We get here only when handling References: header.
      if child_node.uid != 0
	// We've already seen this message. It specifies its own parent and
	// it doesn't matter what any other reference says. The only way its
	// parent can change is if the message itself gets expunged.
	child_node.expunge_rebuilds = true
      else
	// Message doesn't exist, so it was one of the node's children
	// that created the original reference. If that reference gets
	// dropped, the parent is changed. We could catch this in one of
	// several ways:
	//  a) Link to original parent node gets unreferenced
	//  b) Link to this node gets unreferenced
	//  c) Any of the child nodes gets expunged
	// b) is probably the least likely to happen, so use it
	child_node.child_unref_rebuilds = true

thread_add_msg(uid)
  // get the Message-IDs as specified by the thread spec
  (msgid, parent_msgid, references) = message_get_thread_headers(uid)

  if msgid != NIL
    if nodes[msgid].uid == 0
      nodes[msgid].uid = uid
    else
      // duplicate Message-ID. if the original ever gets expunged,
      // rebuild the thread tree
      nodes[msgid].expunge_rebuilds = true
      msgid = NIL

  if msgid == NIL
    msgid = get_unique_msg_id()

  node = nodes[msgid]
  if node.parent != NIL and
     (parent_msgid == NIL or node.parent.msgid != parent_msgid)
    // Conflicting parent, remove it. If this message gets expunged, we have
    // to revert back to the original parent.
    node.parent = NIL
    node.expunge_rebuilds = true

  if parent_msgid != NIL
    link_reference(nodes[parent_msgid], node)

  // go through References (skipping the last one)
  for (ref_parent, ref_child) in references
    link_reference(nodes[ref_parent], nodes[ref_child])

unref_link(parent, child)
  if parent.child_unref_rebuilds
    return false

  child.parent_link_refcount--
  if child.parent_link_refcount == 0
    child.parent = NIL
  return true  

// returns false if thread tree needs to be rebuilt
thread_expunge_msg(uid)
  // get the Message-IDs as specified by the thread spec
  (msgid, in_reply_to_msgid, references) = message_get_thread_headers(uid)

  node = nodes[msgid]
  if node.uid != uid
    // Removing a duplicate Message-ID
    return false

  if node.expunge_rebuilds
    return false

  if parent_msgid != NIL and
     not unref_link(nodes[parent_msgid], nodes[child_msgid])
    return false

  if references != NIL
    // go through References
    for (parent_msgid, child_msgid) in references
      if not unref_link(nodes[parent_msgid], nodes[child_msgid])
	return false
    last_parent_msgid = references.last
  else if in_reply_to_msgid != NIL
    last_parent_msgid = in_reply_to_msgid

  if last_parent_msgid != NIL and
     not unref_link(nodes[last_parent_msgid], node)
    return false

  // mark this node as expunged
  node.uid = 0
  return true

thread_iterate()
  root_nodes = []
  referenced = []
  children = [][]
  // Steps (2) and (3)
  for node in nodes
    if node.parent != NIL
      root_nodes[] = node
    else
      referenced[node.parent] = true
      if node.uid != 0
	// Find the node's first non-dummy parent and add the node as its child.
	// If there are no non-dummy parents, add it as the highest dummy's
	// child.
	nondummy_parent = node.parent
	while nondummy_parent.uid == 0 and nondummy_parent.parent != NIL
	  nondummy_parent = nondummy_parent.parent
	children[nondummy_parent][] = node

  for node in root_nodes
    if node.uid == 0
      if children[node] == NIL
	// remove dummy roots that have no children.
	delete(node)
      else if count(children[node]) == 1
	// dummy root has a single child, replace the root with its child
	node = children[node]

  for node in nodes
    if node.uid == 0 and !referenced[node]
      free(node)

  // root_nodes and children now contain a tree equivalent to a tree built by
  // THREAD=REFERENCES specification steps (1)-(3). The rest of the steps
  // can be performed using them. Note that node.parent should not (and need
  // not) be used because it points its parent before steps (2) and (3).