Diffstat (limited to 'doc/src/sgml/dict-int.sgml')
-rw-r--r-- doc/src/sgml/dict-int.sgml 101
1 files changed, 101 insertions, 0 deletions
diff --git a/doc/src/sgml/dict-int.sgml b/doc/src/sgml/dict-int.sgml
new file mode 100644
index 0000000..8dd07b9
--- /dev/null
+++ b/doc/src/sgml/dict-int.sgml
@@ -0,0 +1,101 @@
+<!-- doc/src/sgml/dict-int.sgml -->
+
+<sect1 id="dict-int" xreflabel="dict_int">
+ <title>dict_int &mdash;
+ example full-text search dictionary for integers</title>
+
+ <indexterm zone="dict-int">
+ <primary>dict_int</primary>
+ </indexterm>
+
+ <para>
+ <filename>dict_int</filename> is an example of an add-on dictionary template
+ for full-text search. The motivation for this example dictionary is to
+ control the indexing of integers (signed and unsigned), allowing such
+ numbers to be indexed while preventing excessive growth in the number of
+ unique words, since that growth greatly degrades search performance.
+ </para>
+
+ <para>
+ This module is considered <quote>trusted</quote>, that is, it can be
+ installed by non-superusers who have <literal>CREATE</literal> privilege
+ on the current database.
+ </para>
+
+ <sect2 id="dict-int-config">
+ <title>Configuration</title>
+
+ <para>
+ The dictionary accepts three options:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ The <literal>maxlen</literal> parameter specifies the maximum number of
+ digits allowed in an integer word. The default value is 6.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The <literal>rejectlong</literal> parameter specifies whether an overlength
+ integer should be truncated or ignored. If <literal>rejectlong</literal> is
+ <literal>false</literal> (the default), the dictionary returns the first
+ <literal>maxlen</literal> digits of the integer. If <literal>rejectlong</literal> is
+ <literal>true</literal>, the dictionary treats an overlength integer as a stop
+ word, so that it will not be indexed. Note that this also means that
+ such an integer cannot be searched for.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The <literal>absval</literal> parameter specifies whether leading
+ <quote><literal>+</literal></quote> or <quote><literal>-</literal></quote>
+ signs should be removed from integer words. The default
+ is <literal>false</literal>. When <literal>true</literal>, the sign is
+ removed before <literal>maxlen</literal> is applied.
+ </para>
+ </listitem>
+ </itemizedlist>
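+
+ <para>
+ As an illustrative sketch of how the three options interact, the following
+ creates two dictionaries from <literal>intdict_template</literal>, assuming
+ the extension is already installed. The names <literal>intdict_abs</literal>
+ and <literal>intdict_strict</literal> are hypothetical, and the results
+ noted in the comments follow from the option descriptions above.
+ </para>
+
+<programlisting>
+-- hypothetical dictionary: strip the sign, then keep at most 4 digits
+CREATE TEXT SEARCH DICTIONARY intdict_abs (
+    TEMPLATE = intdict_template, MAXLEN = 4, ABSVAL = true
+);
+
+-- hypothetical dictionary: treat integers longer than 4 digits as stop words
+CREATE TEXT SEARCH DICTIONARY intdict_strict (
+    TEMPLATE = intdict_template, MAXLEN = 4, REJECTLONG = true
+);
+
+-- sign removed first, then truncated to 4 digits: {1234}
+SELECT ts_lexize('intdict_abs', '-12345');
+
+-- overlength integer rejected as a stop word: {}
+SELECT ts_lexize('intdict_strict', '12345');
+</programlisting>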
+ </sect2>
+
+ <sect2 id="dict-int-usage">
+ <title>Usage</title>
+
+ <para>
+ Installing the <literal>dict_int</literal> extension creates a text search
+ template <literal>intdict_template</literal> and a dictionary <literal>intdict</literal>
+ based on it, with the default parameters. You can alter the
+ parameters, for example
+
+<programlisting>
+mydb# ALTER TEXT SEARCH DICTIONARY intdict (MAXLEN = 4, REJECTLONG = true);
+ALTER TEXT SEARCH DICTIONARY
+</programlisting>
+
+ or create new dictionaries based on the template.
+ </para>
+
+ <para>
+ To test the dictionary with its default parameters, you can try
+
+<programlisting>
+mydb# select ts_lexize('intdict', '12345678');
+ ts_lexize
+-----------
+ {123456}
+</programlisting>
+
+ but real-world usage will involve including it in a text search
+ configuration as described in <xref linkend="textsearch"/>.
+ That might look like this:
+
+<programlisting>
+ALTER TEXT SEARCH CONFIGURATION english
+ ALTER MAPPING FOR int, uint WITH intdict;
+</programlisting>
+
+ </para>
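+
+ <para>
+ Altering a built-in configuration such as <literal>english</literal>
+ changes the behavior of everything that uses it, so one cautious approach
+ is to experiment on a copy first. The following is a minimal sketch; the
+ configuration name <literal>english_int</literal> and the sample text are
+ hypothetical.
+ </para>
+
+<programlisting>
+-- english_int is a hypothetical name for a private copy of english
+CREATE TEXT SEARCH CONFIGURATION english_int (COPY = english);
+
+-- route integer tokens through intdict in the copy only
+ALTER TEXT SEARCH CONFIGURATION english_int
+    ALTER MAPPING FOR int, uint WITH intdict;
+
+-- inspect how a sample string is indexed under the copy
+SELECT to_tsvector('english_int', 'invoice 12345678 paid');
+</programlisting>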
+ </sect2>
+
+</sect1>