Lucene Full Text Search Indexing
================================

*NOTE*: Although the fts-lucene plugin works, it uses the CLucene library, which
is very old and has some bugs. It's a much better idea to use <fts-solr>
[Plugins.FTS.Solr.txt] instead, which has many more features and is more
stable.

Requires Dovecot v2.1+ to work properly. The CLucene version must be v2.3 (not
v0.9). Dovecot builds only a single Lucene index for all mailboxes. The Lucene
indexes are stored in the 'lucene-indexes/' directory under the mail root index
directory (e.g. '~/Maildir/lucene-indexes/').

Compilation
-----------

If you compile Dovecot yourself, you must add the following switches to your
configure command for the plugin to be built:

---%<-------------------------------------------------------------------------
--with-lucene --with-stemmer
---%<-------------------------------------------------------------------------

The second switch is only required if you have compiled libstemmer yourself or
if it's included in the CLucene you are using.

Configuration
-------------

In 10-mail.conf (note: keep any existing plugins in the string):

---%<-------------------------------------------------------------------------
mail_plugins = $mail_plugins fts fts_lucene
---%<-------------------------------------------------------------------------

In 90-plugins.conf:

---%<-------------------------------------------------------------------------
plugin {
  fts = lucene
  # Lucene-specific settings, good ones are:
  fts_lucene = whitespace_chars=@.
}
---%<-------------------------------------------------------------------------

The fts-lucene settings include:

 * whitespace_chars=<chars>: List of characters that are translated to
   whitespace. You may want to use "@." so that e.g.
   'first.last@example.org' isn't treated as a single word, but rather you
   can search separately for "first", "last" and "example".
 * default_language=<lang>: Default stemming language to use for mails. The
   default is english. Requires that Dovecot is built with libstemmer, which
   also limits the languages that are supported.
 * textcat_conf=<path> textcat_dir=<path>: If specified, enable guessing the
   stemming language for emails and search keywords. This is a little bit
   problematic in practice, since the languages guessed for indexing and
   searching may differ, and then even exact words may not be found because
   they stem differently.
 * no_snowball: Support normalization of indexed words even without stemming
   and libstemmer (Snowball). (v2.2.3+)
 * mime_parts: Index each MIME part separately and include the MIME part number
   in the "part" field. In future versions this will allow showing which
   attachment matched the search result. (v2.2.13+)
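
To illustrate what whitespace_chars does to an address, here is a minimal
Python sketch. It is not Dovecot or CLucene code: the function name
split_words is hypothetical, and the real tokenizer applies further rules
that are not modelled here. It only shows the idea of translating the listed
characters to whitespace before the text is split into words.

```python
def split_words(text: str, whitespace_chars: str = "@.") -> list[str]:
    """Translate each character in whitespace_chars to a space, then split.

    Hypothetical helper for illustration only; it mimics the effect of
    whitespace_chars described above, not the plugin's real tokenizer.
    """
    table = str.maketrans({ch: " " for ch in whitespace_chars})
    return text.translate(table).split()


# With "@." the address is broken into separately searchable words.
print(split_words("first.last@example.org"))

# With no whitespace_chars the whole address stays a single word.
print(split_words("first.last@example.org", whitespace_chars=""))
```

In this sketch, the first call yields the words "first", "last", "example"
and "org", while the second call leaves the address as one word.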

Libraries
---------

 * CLucene [http://sourceforge.net/projects/clucene/files/]: Get v2.3.3.4 (not
   v0.9)
 * libstemmer [http://snowball.tartarus.org/download.php]: Builds libstemmer.o,
   which you can rename to libstemmer.a
 * textcat [http://textcat.sourceforge.net/]

(This file was created from the wiki on 2019-06-19 12:42)