Full text search indexing
=========================

The following FTS indexers (in preferred order) are supported:

 * <Solr> [Plugins.FTS.Solr.txt] communicates with an Apache Solr server
   [http://lucene.apache.org/solr/].
 * <Lucene> [Plugins.FTS.Lucene.txt] uses Lucene's C++ library. (Requires
   v2.1+)
 * <fts-dovecot> [Plugins.FTS.Dovecot.txt] is Dovecot Pro's new search index,
   and is not available without commercial agreement. (Requires v2.2+)
 * <Squat> [Plugins.FTS.Squat.txt] is Dovecot's own search index. (Obsolete in
   v2.1+)
 * fts-xapian [https://github.com/grosjo/fts-xapian] is a Xapian
   [https://xapian.org] based plugin maintained by '<jom AT NOSPAM grosjo DOT
   net>'. (Requires v2.3+)

Indexing
--------

By default the FTS indexes are updated *only* while searching, so neither the
<LDA.txt> nor an IMAP APPEND command updates the indexes immediately. This
means that if a user has received a lot of mail since the last indexing (i.e.
the last search operation), it may take a while to index all the mails before
replying to the search command. Dovecot sends periodic "* OK Indexed n% of the
mailbox" updates, which webmail implementations can catch to display a
progress bar.

In v2.2.9+ the indexing can be done automatically with the 'fts_autoindex=yes'
setting (see below).

Indexing can also be triggered manually (e.g. from a cron job) or from an LDA
script by running:

 * v2.1+: 'doveadm index -u user@domain -q INBOX'
 * v2.0: 'printf "a select INBOX\nb search text xyzzy\nc logout\n" |
   /usr/local/libexec/dovecot/imap -u user@domain'

Replace INBOX with the name of whichever mailbox needs to be indexed.
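
As a sketch of the cron approach mentioned above, a crontab entry along these
lines would queue indexing of one mailbox periodically. The 30-minute schedule
and the /usr/local/bin/doveadm path are assumptions for illustration only; the
command itself is the v2.1 one listed above, and the crontab must belong to an
account that is allowed to run doveadm for that user.

---%<-------------------------------------------------------------------------
# Hypothetical schedule: every 30 minutes, queue INBOX for indexing.
# The doveadm path is an assumption; adjust it to the local installation.
*/30 * * * * /usr/local/bin/doveadm index -u user@domain -q INBOX
---%<-------------------------------------------------------------------------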

Indexing Attachments (v2.1+)
----------------------------

Attachments can be indexed either via a script that translates the attachment
to UTF-8 plaintext or via an Apache Tika server.

 * 'fts_decoder = <service>': Decode attachments to plaintext using this
   service and index the resulting plaintext. See the 'decode2text.sh' script
   included in Dovecot for how to use this. (v2.1+)
 * 'fts_tika = http://tikahost:9998/tika/': An Apache Tika server must be
   running at this URL (e.g. started with 'java -jar
   tika-server/target/tika-server-1.5.jar'). (v2.2.13+)
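
The 'fts_decoder' value names a Dovecot service, so that service has to be
defined as well. A minimal sketch, assuming the bundled script is installed as
/usr/local/libexec/dovecot/decode2text.sh and that running it as the 'dovecot'
user is acceptable; the service name, path, user and socket mode are
illustrative and should be checked against the local installation:

---%<-------------------------------------------------------------------------
plugin {
  # Must match the service name defined below.
  fts_decoder = decode2text
}

service decode2text {
  # Path is an assumption; use wherever decode2text.sh is installed.
  executable = script /usr/local/libexec/dovecot/decode2text.sh
  user = dovecot
  unix_listener decode2text {
    # Socket mode is an assumption; it lets mail processes connect.
    mode = 0666
  }
}
---%<-------------------------------------------------------------------------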

Rescan (v2.1+)
--------------

Since v2.1 Dovecot keeps track of indexed messages in the dovecot.index files.
If this record gets out of sync with the actual FTS indexes (either too many or
too few mails), you'll need to do a rescan:

---%<-------------------------------------------------------------------------
doveadm fts rescan -u user@domain
---%<-------------------------------------------------------------------------
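
As far as the doveadm documentation describes it, a rescan only reconciles the
bookkeeping, and any missing mails are indexed by the next search or 'doveadm
index' run. A minimal sketch of following the rescan with an explicit
reindex, assuming all of the user's mailboxes should be covered (the '*'
mailbox mask is an assumption, not something this page specifies):

---%<-------------------------------------------------------------------------
# Reconcile the record of indexed mails with the FTS index.
doveadm fts rescan -u user@domain
# Assumed follow-up: queue indexing for every mailbox of this user.
doveadm index -u user@domain -q '*'
---%<-------------------------------------------------------------------------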

Other Settings
--------------

All the FTS settings go inside the 'plugin {}' section of 90-plugin.conf.

 * 'fts_autoindex=yes': Index new messages immediately after they've been
   saved/copied. (v2.2.9+)
 * 'fts_autoindex_exclude=pattern1', 'fts_autoindex_exclude2=pattern2', ...:
   Exclude the given mailboxes from autoindexing, one pattern per setting.
   Supports "*" and "?" wildcards. If a name starts with '\', it's treated as a
   case-insensitive special-use flag. (v2.2.25+)
    * Example:

      ---%<-------------------------------------------------------------------
      plugin {
        fts_autoindex_exclude = \Junk
        fts_autoindex_exclude2 = \Trash
        fts_autoindex_exclude3 = DUMPSTER
      }
      ---%<-------------------------------------------------------------------

 * 'fts_autoindex_max_recent_msgs=n': Skip autoindexing the mailbox if it has
   more than n \Recent messages (implying that the mailbox is never actually
   being accessed). (v2.2.9+)
 * 'fts_enforced':
    * no (default): All body searches will index all missing mails in FTS.
      Header searches will use FTS if the mails are indexed, otherwise fall
      back to parsing the headers (usually from dovecot.index.cache). If the
      FTS search fails, fall back to reading and parsing all mails.
    * yes: All header and body searches will index all missing mails in FTS. If
      the FTS search fails, an error is returned to the client.
 * 'fts_index_timeout': When SEARCH notices that the index isn't up to date,
   it tells the indexer to index the mails and waits until it is finished.
   This setting adds a maximum timeout to this wait. If the timeout is reached,
   the SEARCH fails with: 'NO [INUSE] Timeout while waiting for indexing to
   finish' (v2.1+)
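
To show how the settings above fit together, here is a sketch of a 'plugin {}'
section that combines them. The numeric values are hypothetical and chosen
only for illustration, and the timeout is assumed to be given in seconds:

---%<-------------------------------------------------------------------------
plugin {
  # Index new mails as soon as they are saved/copied (v2.2.9+).
  fts_autoindex = yes
  # Hypothetical threshold: skip autoindexing a mailbox that has more
  # than 999 \Recent messages.
  fts_autoindex_max_recent_msgs = 999
  # Return an error to the client if the FTS search fails, instead of
  # falling back to reading and parsing all mails.
  fts_enforced = yes
  # Hypothetical value, assumed to be seconds: stop waiting for the
  # indexer after 60 and fail the SEARCH.
  fts_index_timeout = 60
}
---%<-------------------------------------------------------------------------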

(This file was created from the wiki on 2019-06-19 12:42)