1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
|
<!-- doc/src/sgml/dict-int.sgml -->
<sect1 id="dict-int" xreflabel="dict_int">
<title>dict_int —
example full-text search dictionary for integers</title>
<indexterm zone="dict-int">
<primary>dict_int</primary>
</indexterm>
<para>
<filename>dict_int</filename> is an example of an add-on dictionary template
for full-text search. The motivation for this example dictionary is to
control the indexing of integers (signed and unsigned), allowing such
numbers to be indexed while preventing excessive growth in the number of
unique words, which greatly affects the performance of searching.
</para>
<para>
This module is considered <quote>trusted</quote>, that is, it can be
installed by non-superusers who have <literal>CREATE</literal> privilege
on the current database.
</para>
<sect2 id="dict-int-config">
<title>Configuration</title>
<para>
The dictionary accepts three options:
</para>
<itemizedlist>
<listitem>
<para>
The <literal>maxlen</literal> parameter specifies the maximum number of
digits allowed in an integer word. The default value is 6.
</para>
</listitem>
<listitem>
<para>
The <literal>rejectlong</literal> parameter specifies whether an overlength
integer should be truncated or ignored. If <literal>rejectlong</literal> is
<literal>false</literal> (the default), the dictionary returns the first
<literal>maxlen</literal> digits of the integer. If <literal>rejectlong</literal> is
<literal>true</literal>, the dictionary treats an overlength integer as a stop
word, so that it will not be indexed. Note that this also means that
such an integer cannot be searched for.
</para>
</listitem>
<listitem>
<para>
The <literal>absval</literal> parameter specifies whether leading
<quote><literal>+</literal></quote> or <quote><literal>-</literal></quote>
signs should be removed from integer words. The default
is <literal>false</literal>. When <literal>true</literal>, the sign is
removed before <literal>maxlen</literal> is applied.
</para>
</listitem>
</itemizedlist>
</sect2>
<sect2 id="dict-int-usage">
<title>Usage</title>
<para>
Installing the <literal>dict_int</literal> extension creates a text search
template <literal>intdict_template</literal> and a dictionary <literal>intdict</literal>
based on it, with the default parameters. You can alter the
parameters, for example
<programlisting>
mydb# ALTER TEXT SEARCH DICTIONARY intdict (MAXLEN = 4, REJECTLONG = true);
ALTER TEXT SEARCH DICTIONARY
</programlisting>
or create new dictionaries based on the template.
</para>
<para>
To test the dictionary, you can try
<programlisting>
mydb# select ts_lexize('intdict', '12345678');
ts_lexize
-----------
{123456}
</programlisting>
but real-world usage will involve including it in a text search
configuration as described in <xref linkend="textsearch"/>.
That might look like this:
<programlisting>
ALTER TEXT SEARCH CONFIGURATION english
ALTER MAPPING FOR int, uint WITH intdict;
</programlisting>
</para>
</sect2>
</sect1>
|