diff --git a/doc/wiki/Plugins.FTS.Solr.txt b/doc/wiki/Plugins.FTS.Solr.txt
new file mode 100644
index 0000000..e888a71
--- /dev/null
+++ b/doc/wiki/Plugins.FTS.Solr.txt
@@ -0,0 +1,289 @@
+Solr Full Text Search Indexing
+==============================
+
+Solr [https://lucene.apache.org/solr/] is a Lucene indexing server. Dovecot
+communicates with it using HTTP/XML queries.
+
+The steps described in this wiki page were tested with Solr 7.7.0. For other
+versions, these steps may need to be adjusted.
+
+Compiling
+---------
+
+Dovecot is not compiled with Solr FTS support by default. To enable it, you
+need to add the '--with-solr' parameter to your invocation of the 'configure'
+script. You will also need to have libexpat installed, including development
+headers (typically from a separate development package). Configuration will
+fail if '--with-solr' is enabled while libexpat headers cannot be found. Older
+versions of Dovecot also required libcurl for Solr support, but recent versions
+of Dovecot include a custom HTTP client.
+
+Configuration
+-------------
+
+Solr Installation
+-----------------
+
+First, the Solr server needs to be installed. Most operating systems have
+packages for it. The latest version can also be downloaded from the official
+website. The following instructions install 7.7.0 and are based on the howto
+How to Install Apache Solr 7.5 on Debian 9/8
+[https://tecadmin.net/install-apache-solr-on-debian/]:
+
+---%<-------------------------------------------------------------------------
+wget https://www-eu.apache.org/dist/lucene/solr/7.7.0/solr-7.7.0.tgz
+tar xzf solr-7.7.0.tgz solr-7.7.0/bin/install_solr_service.sh \
+    --strip-components=2
+sudo bash ./install_solr_service.sh solr-7.7.0.tgz
+---%<-------------------------------------------------------------------------
+
+To use Solr with Dovecot, it needs to be configured specifically for Dovecot.
+Start by creating an instance named 'dovecot':
+
+---%<-------------------------------------------------------------------------
+sudo -u solr /opt/solr/bin/solr create -c dovecot
+---%<-------------------------------------------------------------------------
+
+The location of the files for the newly created instance on the filesystem
+varies between operating systems and installation methods. For example, on
+Arch Linux, the config files are located in
+'/opt/solr/server/solr/dovecot/conf' and the data files in
+'/opt/solr/server/solr/dovecot/data'. When Solr is installed from the tarball,
+these directories can be found under '/var/solr/data/dovecot/'.
+
+Once the instance is created, you can start Solr. The means of starting,
+stopping and querying the status of the 'solr' service varies between systems.
+For systemd, these commands are as follows:
+
+---%<-------------------------------------------------------------------------
+sudo systemctl stop solr
+sudo systemctl start solr
+sudo systemctl status solr
+---%<-------------------------------------------------------------------------
+
+By default, the Solr administration page for the newly created instance is
+located at http://localhost:8983/solr/#/~cores/dovecot. It can be used to
+check the status of the Solr instance. Configuration errors are often most
+conveniently viewed here. Solr also writes log files. For a tarball
+installation, these can be found at '/var/solr/logs/'.
+
+Solr Configuration
+------------------
+
+There are three primary configuration files that need to be changed to
+accommodate the Dovecot FTS needs: the instance configuration file
+'solrconfig.xml' and the schema files 'schema.xml' and 'managed-schema' used by
+the instance. These files are all located in the 'conf' directory of the Solr
+instance (e.g., '/var/solr/data/dovecot/conf/').
+
+Remove default core configuration files
+---------------------------------------
+
+---%<-------------------------------------------------------------------------
+rm -f /var/solr/data/dovecot/conf/schema.xml
+rm -f /var/solr/data/dovecot/conf/managed-schema
+rm -f /var/solr/data/dovecot/conf/solrconfig.xml
+---%<-------------------------------------------------------------------------
+
+Install schema.xml and solrconfig.xml
+-------------------------------------
+
+Copy doc/solr-config-7.7.0.xml
+[https://raw.githubusercontent.com/dovecot/core/master/doc/solr-config-7.7.0.xml]
+and doc/solr-schema-7.7.0.xml
+[https://raw.githubusercontent.com/dovecot/core/master/doc/solr-schema-7.7.0.xml]
+(available since Dovecot 2.3.6) to '/var/solr/data/dovecot/conf/' as
+'solrconfig.xml' and 'schema.xml', respectively. The 'managed-schema' file is
+generated based on 'schema.xml'.
+
+Dovecot Plugin
+--------------
+
+On Dovecot's side, add the following.
+
+In 10-mail.conf (keep any plugins that are already listed in the setting):
+
+---%<-------------------------------------------------------------------------
+mail_plugins = $mail_plugins fts fts_solr
+---%<-------------------------------------------------------------------------
+
+In 90-plugins.conf:
+
+---%<-------------------------------------------------------------------------
+plugin {
+ fts = solr
+ fts_solr = url=https://solr.example.org:8983/solr/dovecot/
+}
+---%<-------------------------------------------------------------------------
+
+Fields listed in the 'fts_solr' plugin setting are space separated. They can
+contain:
+
+ * url=<solr url> : Required base URL for Solr. Remember to include your core
+ name if using Solr 7+ ("/solr/dovecot/"). The default URL for Solr 7+ is
+ http://localhost:8983/solr/dovecot
+ * debug : Enable HTTP debugging. Writes to debug log.
+ * break-imap-search : Use Solr also for TEXT and BODY searches. This makes
+ your server non-IMAP-compliant. (Always enabled in v2.1+; the setting was
+ removed in v2.3+ because this is the default behaviour.)
+ * rawlog_dir=<directory> : For debugging, store HTTP exchanges between Dovecot
+ and Solr in this directory. (2.3.6+)
+ * batch_size : Configure the number of mails sent in a single request to
+ Solr, default is 1000. (2.3.6+)
+ * with fts_autoindex=yes, each new mail gets separately indexed on arrival,
+ so batch_size only matters when doing the initial indexing of a mailbox.
+ * with fts_autoindex=no, new mails don't get indexed on arrival, so
+ batch_size is used when indexing gets triggered.
+ * soft_commit=yes|no : Control whether new mails are immediately searchable
+ via Solr, defaults to yes. When using no, it is important to set an
+ autoCommit or autoSoftCommit time in solrconfig.xml so that mails eventually
+ become searchable. (2.3.6+)
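+
+As a rough illustration of how these fields combine, the fragment below sets
+several of them at once. The batch_size value and the choice of soft_commit=no
+are assumptions picked for the example, not recommendations:
+
+---%<-------------------------------------------------------------------------
+plugin {
+ fts = solr
+ fts_autoindex = yes
+ fts_solr = url=https://solr.example.org:8983/solr/dovecot/ batch_size=500 soft_commit=no
+}
+---%<-------------------------------------------------------------------------
+
+With soft_commit=no, Solr itself has to make new mails visible. A minimal
+sketch of matching commit settings inside the existing '<updateHandler>'
+element of solrconfig.xml could look as follows. The intervals (in
+milliseconds) are assumed values, and the installed solrconfig.xml may already
+define these elements:
+
+---%<-------------------------------------------------------------------------
+<updateHandler class="solr.DirectUpdateHandler2">
+  <!-- hard commit: flush index changes to disk, roll over transaction log -->
+  <autoCommit>
+    <maxTime>60000</maxTime>
+    <openSearcher>false</openSearcher>
+  </autoCommit>
+  <!-- soft commit: make newly indexed mails searchable -->
+  <autoSoftCommit>
+    <maxTime>5000</maxTime>
+  </autoSoftCommit>
+</updateHandler>
+---%<-------------------------------------------------------------------------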
+
+Important notes:
+
+ * Some mail clients do not submit search requests for certain fields if they
+ index them locally. For example, Thunderbird does not send requests for
+ fields such as sender/recipients/subject when Body is not included, because
+ this data is contained in its local index.
+
+Solr commits & optimization
+---------------------------
+
+Solr indexes should be optimized once in a while to make searches faster and to
+reclaim space used by deleted mails. Dovecot never asks Solr to optimize, so
+you should do this yourself, for example with a cronjob that sends the optimize
+command to Solr every n hours.
+
+With v2.2.3+ Dovecot only does soft commits to the Solr index to improve
+performance. You must run a hard commit once in a while or Solr will keep
+increasing its transaction log sizes. For example, send the commit command to
+Solr every few minutes.
+
+---%<-------------------------------------------------------------------------
+# Optimize should be run somewhat rarely, e.g. once a day
+curl https://<hostname/ip>:<port|default 8983>/solr/dovecot/update?optimize=true
+# Commit should be run pretty often, e.g. every minute
+curl https://<hostname/ip>:<port|default 8983>/solr/dovecot/update?commit=true
+---%<-------------------------------------------------------------------------
+
+You may not need these commands if you are using a recent Solr (7+) or
+<SolrCloud.txt>. The default configuration of Solr auto-commits periodically
+(about every 15 seconds), so an explicit commit is not necessary. Also, the
+default TieredMergePolicy in Solr automatically purges removed documents
+later, so optimize is not necessary.
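+
+If scheduled commits and optimizes are still wanted, the two curl commands
+above could be run from cron. The crontab sketch below assumes that Solr
+answers on plain HTTP at localhost:8983 and that the core is named 'dovecot';
+the schedule itself is only an example:
+
+---%<-------------------------------------------------------------------------
+# m h dom mon dow command
+# hard commit every minute
+* * * * * curl -s 'http://localhost:8983/solr/dovecot/update?commit=true' >/dev/null
+# optimize once a day at 03:30
+30 3 * * * curl -s 'http://localhost:8983/solr/dovecot/update?optimize=true' >/dev/null
+---%<-------------------------------------------------------------------------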
+
+Re-index mailbox
+----------------
+
+To force Dovecot to reindex a whole mailbox, run the command below. It takes
+effect only when the next search is done, and it applies to the whole mailbox.
+
+---%<-------------------------------------------------------------------------
+doveadm fts rescan -u <username>
+---%<-------------------------------------------------------------------------
+
+To index a single mailbox or all mailboxes, run the command below. Indexing
+starts immediately, and the command blocks until it has completed.
+
+---%<-------------------------------------------------------------------------
+doveadm index [-u <user>|-A] [-S <socket_path>] [-q] [-n <max recent>] <mailbox mask>
+---%<-------------------------------------------------------------------------
+
+Sorting by relevancy
+--------------------
+
+Solr/Lucene supports returning a relevancy score for search results. If you
+want to sort the search results by the score, use Dovecot's non-standard
+X-SCORE sort key:
+
+---%<-------------------------------------------------------------------------
+1 SORT (X-SCORE) UTF-8 <search parameters>
+---%<-------------------------------------------------------------------------
+
+Indexes
+-------
+
+Dovecot creates the following fields:
+
+ * id: Unique ID consisting of uid/uidv/user/box.
+ * Note that your user names really shouldn't contain the '/' character.
+ * uid: Message's IMAP UID.
+ * uidv: Mailbox's UIDVALIDITY. This changes if mailbox gets recreated.
+ * box: Mailbox name
+ * user: User name who owns the mailbox, or empty for public namespaces
+ * hdr: Indexed message headers
+ * body: Indexed message body
+ * any: "Copy field" from hdr and body, i.e. searching based on this will
+ search from both headers and bodies.
+
+Lucene does duplicate suppression based on the "id" field, so even if Dovecot
+sends the same message multiple times to Solr it gets indexed only once. This
+might happen currently if multiple searches are started at the same time.
+
+You might want to build a cronjob to go through the Lucene indexes once in a
+while to delete indexed messages (or entire mailboxes) that no longer exist on
+the filesystem. It shouldn't normally find any such messages though.
+
+Testing
+-------
+
+---%<-------------------------------------------------------------------------
+# telnet localhost imap
+* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE SORT
+SORT=DISPLAY THREAD=REFERENCES THREAD=REFS MULTIAPPEND UNSELECT CHILDREN
+NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 ESEARCH ESORT SEARCHRES WITHIN
+CONTEXT=SEARCH LIST-STATUS STARTTLS AUTH=PLAIN AUTH=LOGIN] I am ready.
+1 login username password
+2 select Inbox
+3 SEARCH text "test"
+---%<-------------------------------------------------------------------------
+
+Sharding
+--------
+
+If you have more users than fit into a single Solr box, you can split users off
+to different servers. A couple of different ways you could do it are:
+
+ * Have some HTTP proxy redirecting the connections based on the URL
+ * Configure Dovecot's userdb lookup to return a different host for 'fts_solr'
+ setting using <extra fields> [UserDatabase.ExtraFields.txt].
+ * LDAP: 'user_attrs = ...,
+ solrHost=fts_solr=url=https://%$:8983/solr/dovecot/'
+ * MySQL: 'user_query = SELECT concat('url=https://', solr_host,
+ ':8983/solr/dovecot/') AS fts_solr, ...'
+
+You can also use SolrCloud
+[https://lucene.apache.org/solr/guide/7_6/solrcloud.html], the clustered
+version of Solr, that allows you to scale up, and adds failover / high
+availability to your FTS system. Dovecot-solr works fine with a <SolrCloud.txt>
+cluster as long as the solr schema is the right one.
+
+External Tutorials
+------------------
+
+External sites with tutorials on using Solr with Dovecot:
+
+ * Installing Apache Solr with Dovecot for fulltext search results (ATmail
+ support guide)
+ [https://help.atmail.com/hc/en-us/articles/201566404-Installing-Apache-Solr-with-Dovecot-for-fulltext-search-results]
+ * FreeBSD: https://mor-pah.net/2016/08/15/dovecot-2-2-with-solr-6-or-5/
+ * Substring searches with ngrams:
+ https://dovecot.org/list/dovecot/2011-May/059338.html
+
+Tips
+----
+
+Some additional things which might help you configure Solr search:
+
+ * If you are using Tomcat: Set 'maxHttpHeaderSize="65536"' (connector
+ definition for port 8080 in '/etc/tomcat7/server.xml') to accept long search
+ query strings (iPhones tend to send multi-kilobyte-sized queries)
+ * Set 'df' to 'hdr' in '/etc/solr/conf/solrconfig.xml' ('/select' request
+ handler) to avoid strange 'undefined field text' errors.
+ * Please keep in mind that you will have to change the Solr URL to include the
+ core name (i.e. 'dovecot': 'http://localhost:8983/solr/dovecot').
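+
+The first two tips can be sketched as configuration fragments. For Tomcat, the
+connector in server.xml would carry the larger header limit (the other
+attributes shown are typical values and are assumptions here):
+
+---%<-------------------------------------------------------------------------
+<Connector port="8080" protocol="HTTP/1.1"
+           connectionTimeout="20000"
+           maxHttpHeaderSize="65536" />
+---%<-------------------------------------------------------------------------
+
+For the default search field, the '/select' request handler in solrconfig.xml
+would get a 'df' default. Any other defaults already present in the handler
+should be kept:
+
+---%<-------------------------------------------------------------------------
+<requestHandler name="/select" class="solr.SearchHandler">
+  <lst name="defaults">
+    <str name="df">hdr</str>
+  </lst>
+</requestHandler>
+---%<-------------------------------------------------------------------------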
+
+(This file was created from the wiki on 2019-06-19 12:42)