summaryrefslogtreecommitdiffstats
path: root/doc/wiki/MailboxFormat.dbox.txt
blob: 8df69bf124943b4737d474f2397112ea324d108d (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
dbox
====

dbox is Dovecot's own high-performance mailbox format. The original version was
introduced in v1.0 alpha4, but since then it has been completely redesigned in
v1.1 series and improved even further in v2.0.

dbox can be used in two ways:

 1. *single-dbox* ('sdbox' in <mail location> [MailLocation.txt]): One message
    per file, similar to <Maildir> [MailboxFormat.Maildir.txt]. For backwards
    compatibility,'dbox' is an alias to 'sdbox' in <mail_location>
    [MailLocation.txt].
 2. *multi-dbox* ('mdbox' in <mail location> [MailLocation.txt]): Multiple
    messages per file, but unlike <mbox> [MailboxFormat.mbox.txt] multiple
    files per mailbox.

One of the main reasons for dbox's high performance is that it uses Dovecot's
index files as the only storage for message flags and keywords, so the indexes
don't have to be "synchronized". Dovecot trusts that they're always up-to-date
(unless it sees that something is clearly broken). This also means that *you
must not lose the dbox index files, they can't be regenerated without data
loss*.

dbox has a feature for transparently moving message data to an alternate
storage area. See <Alternate storage> [MailboxFormat.dbox.txt] below.

dbox storage is extensible. Single instance attachment storage was already
implemented as such extension.

Layout
------

By default, the dbox filesystem layout will be as follows. Data which isn't the
actual message content is stored in a layout common to both *single-dbox* and
*multi-dbox*:

 * <mail location root>'/mailboxes/INBOX/dbox-Mails/dovecot.index*' - Index
   files for INBOX
 * <mail location root>'/mailboxes/foo/dbox-Mails/dovecot.index*' - Index files
   for mailbox "foo"
 * <mail location root>'/mailboxes/foo/bar/dbox-Mails/dovecot.index*' - Index
   files for mailbox "foo/bar"
 * <mail location root>'/dovecot.mailbox.log*' - Mailbox changelog
 * <mail location root>'/subscriptions' - subscribed mailboxes list
 * <mail location root>'/dovecot-uidvalidity*' - IMAP UID validity

Note that with dbox the Index files actually contain significant data which is
held nowhere else. Index files for both *single-dbox* and *multi-dbox* contain
message flags and keywords. For *multi-dbox*, the index file also contains the
map_uids which link (via the "map index") to the actual message data. This data
cannot be automatically recreated, so it is important that Index files are
treated with the same care as message data files.

Index files can be stored in a different location by using the INDEX parameter
in the mail location specification. If the INDEX parameter is specified, it
will make Dovecot look for the Index files as follows:

 * <INDEX location>'/mailboxes/INBOX/dbox-Mails/dovecot.index*' - Index files
   for INBOX
 * <INDEX location>'/mailboxes/foo/dbox-Mails/dovecot.index*' - Index files for
   mailbox "foo"
 * <INDEX location>'/mailboxes/foo/bar/dbox-Mails/dovecot.index*' - Index files
   for mailbox "foo/bar"

Actual message content is stored differently depending on whether it is
*single-dbox* or *multi-dbox*.

Under *single-dbox* we have:

 * <mail location root>'/mailboxes/INBOX/dbox-Mails/u.*' - Numbered files
   ('u.1','u.2', ...) each containing one message of INBOX
 * <mail location root>'/mailboxes/foo/dbox-Mails/u.*' - Files each containing
   one message for mailbox "foo"
 * <mail location root>'/mailboxes/foo/bar/dbox-Mails/u.*' - Files each
   containing one message for for mailbox "foo/bar"

Under *multi-dbox* we have:

 * <mail location root>'/storage/dovecot.map.index*' - "Map index" containing a
   record for each message stored
 * <mail location root>'/storage/m.*' - Numbered files ('m.1', 'm.2', ...) each
   containing one or multiple messages

With Dovecot versions 2.0.4 and later, setting the INDEX parameter sets the
location of the "map index" as well as the location of the mailbox indexes. So
this would make the "map index" be stored as follows:

 * <INDEX location>'/storage/dovecot.map.index*' - "Map index" containing a
   record for each message stored

Multi-dbox
----------

You can enable multi-dbox with:

---%<-------------------------------------------------------------------------
mail_location = mdbox:~/mdbox
---%<-------------------------------------------------------------------------

The directory layout (under '~/mdbox/') is:

 * '~/mdbox/storage/' contains the actual mail data for all mailboxes
 * '~/mdbox/mailboxes/' contains directories for mailboxes and their index
   files

The storage directory has files:

 * 'dovecot.map.index*' files contain the "map index"
 * 'm.*' files contain the mail data

Each m.* file contains one or more messages. 'mdbox_rotate_size' setting can be
used to configure how large the files can grow.

The map index contains a record for each message:

 * map_uid: Unique growing 32 bit number for the message.
 * refcount: 16 bit reference counter for this message. Each time the message
   is copied the refcount is increased.
 * file_id: File number containing the message. For example if file_id=5, the
   message is in file 'm.5'.
 * offset: Offset to message within the file.
 * size: Space used by the message in the file, including all metadata.

Mailbox indexes refer to messages only using map_uids. This allows messages to
be moved to different files by updating only the map index. Copying is done
simply by appending a new record to mailbox index containing the existing
map_uid and increasing its refcount. If refcount grows over 32768, currently
Dovecot gives an error message. It's unlikely anyone really wants to copy the
same message that many times.

Expunging a message only decreases the message's refcount. The space is later
freed in "purge" step. This is typically done in a nightly cronjob when there's
less disk I/O activity. The purging first finds all files that have refcount=0
mails. Then it goes through each file and copies the refcount>0 mails to other
mdbox files (to the same files as where newly saved messages would also go),
updates the map index and finally deletes the original file. So there is never
any overwriting or file truncation.

The purging can be invoked explicitly running <doveadm purge>
[Tools.Doveadm.Purge.txt].

There are several safety features built into dbox to avoid losing messages or
their state if map index or mailbox index gets corrupted:

 * Each message has a 128 bit globally unique identifier (GUID). The GUID is
   saved to message metadata in m.* files and also to mailbox indexes. This
   allows Dovecot to find messages even if map index gets corrupted.
 * Whenever index file is rewritten, the old index is renamed to
   'dovecot.index.backup'. If the main index becomes corrupted, this backup
   index is used to restore flags and figure out what messages belong to the
   mailbox.
 * Initial mailbox where message was saved to is stored in the message metadata
   in m.* files. So if all indexes get lost, the messages are put to their
   initial mailboxes. This is better than placing everything into a single
   mailbox.

Alternate storage
-----------------

Unlike Maildir, with dbox the message file names don't change. This makes it
possible to support storing files in multiple directories or mount points. dbox
supports looking up files from "altpath" if they're not found from the primary
path. This means that it's possible to move older mails that are rarely
accessed to cheaper (slower) storage.

To enable this functionality, use the 'ALT' parameter in the mail location. For
example, specifying the mail location as:

---%<-------------------------------------------------------------------------
mail_location = mdbox:/var/vmail/%d/%n:ALT=/altstorage/vmail/%d/%n
---%<-------------------------------------------------------------------------

will make Dovecot look for message data first under '/var/vmail/%d/%n'
("primary storage"), and if it is not found there it will look under
'/altstorage/vmail/%d/%n' ("alternate storage") instead. There's no problem
having the same (identical) file in both storages.

Keep the unmounted '/altstorage' directory permissions such that Dovecot mail
processes can't create directories under it (e.g. root:root 0755). This way if
the alt storage isn't mounted for some reason, Dovecot won't think that all the
messages in alt storage were deleted and lose their flags.

When messages are moved from primary storage to alternate storage, only the
actual message data (stored in files 'u.*' under *single-dbox* and 'm.*' under
*multi-dbox*) is moved to alternate storage; everything else remains in the
primary storage.

Message data can be moved from primary storage to alternate storage using
<doveadm altmove> [Tools.Doveadm.Altmove.txt]. (In theory you could also do
this with some combination of cp/mv, but better not to go there unless you
really need to. The updates must be atomic in any case, so direct cp won't
work.)

The granularity at which data is moved to alternate storage is individual
messages. This is true even for *multi-dbox* when multiple messages are stored
in a single 'm.*' storage file. If individual messages from an 'm.*' storage
file need to be moved to alternate storage, the message data is written out to
a different 'm.*' storage file (either new or existing) in the alternate
storage area and the "map index" updated accordingly.

Alternate storage is completely transparent at the IMAP/POP level. Users
accessing mail through IMAP or POP cannot normally tell if any given message is
stored in primary storage or alternate storage. Conceivably users might be able
to measure a performance difference; the point is that there is no IMAP/POP
command which could be used to expose this information. It is entirely possible
to have a mail folder which contains a mix of messages stored in primary
storage and alternate storage.

dbox and mail header metadata
-----------------------------

Unlike when using <mbox> [MailboxFormat.mbox.txt] as <mailbox format>
[MailboxFormat.txt], where mail headers (for example 'Status', 'X-UID', etc.)
are <used to determine and store meta-data> [MailboxFormat.mbox.txt], the mail
headers within dbox files are (usually)*not* used for this purpose by dovecot;
neither when mails are created/moved/etc. via IMAP nor when dboxes are placed
(e.g. copied or moved in the filesystem) in a mail location (and then
"imported" by dovecot).Therefore, it is (usually) *not* necessary, to strip any
such mail headers at the MTA, MDA or LDA (as it is recommended with <mbox>
[MailboxFormat.mbox.txt]).

There is one exception, though, namely when 'pop3_reuse_xuidl=yes' (which is
however rather deprecated):In this case 'X-UIDL' is used for the POP3 UIDLs.
Therefore,*in this case, is recommended to strip the 'X-UIDL' mail headers
_case-insensitively_ at the MTA, MDA or LDA*.

Accessing expunged mails with mdbox
-----------------------------------

"mdbox_deleted" storage can be used to access mdbox's all mails that are
completely deleted (reference count=0). The mdbox_deleted parameters should
otherwise be exactly the same as mdbox's. Then you can use e.g. doveadm fetch
or doveadm import commands to access the mails. For example:

---%<-------------------------------------------------------------------------
# If you have mail_location=mdbox:~/mdbox:INDEX=/var/index/%u
doveadm import mdbox_deleted:~/mdbox:INDEX=/var/index/%u "" subject oops
---%<-------------------------------------------------------------------------

This finds a deleted mail with subject "oops" and imports it into INBOX.

Mail delivery
=============

Some MTA configurations have the MTA directly dropping mail into Maildirs or
mboxes.  Since most MTAs don't understand the dbox format, this option is not
available.  Instead, the MTA could be set up to use <LMTP#mta> [LMTP.txt] or
<the Dovecot Local Delivery Agent> [LDA.txt].

(This file was created from the wiki on 2019-06-19 12:42)