blob: 3463289f1c4b27f65067f9b179b42e7f0b1a35a5 (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
|
src/backend/snowball/README
Snowball-Based Stemming
=======================
This module uses the word stemming code developed by the Snowball project,
http://snowballstem.org (formerly http://snowball.tartarus.org)
which is released by them under a BSD-style license.
The Snowball project is not currently making formal releases; it's best
to pull from their git repository
git clone https://github.com/snowballstem/snowball.git
and then building the derived files is as simple as
cd snowball
make
At least on Linux, no platform-specific adjustment is needed.
Postgres' files under src/backend/snowball/libstemmer/ and
src/include/snowball/libstemmer/ are taken directly from the Snowball
files, with only some minor adjustments of file inclusions. Note
that most of these files are in fact derived files, not master source.
The master sources are in the Snowball language, and are built using
the Snowball-to-C compiler that is also part of the Snowball project.
We choose to include the derived files in the PostgreSQL distribution
because most installations will not have the Snowball compiler available.
We are currently synced with the Snowball git commit
4456b82c26c02493e8807a66f30593a98c5d2888
of 2019-06-24.
To update the PostgreSQL sources from a new Snowball version:
0. If you didn't do it already, "make -C snowball".
1. Copy the *.c files in snowball/src_c/ to src/backend/snowball/libstemmer
with replacement of "../runtime/header.h" by "header.h", for example
for f in .../snowball/src_c/*.c
do
sed 's|\.\./runtime/header\.h|header.h|' $f >libstemmer/`basename $f`
done
2. Copy the *.c files in snowball/runtime/ to
src/backend/snowball/libstemmer, and edit them to remove direct inclusions
of system headers such as <stdio.h> --- they should only include "header.h".
(This removal avoids portability problems on some platforms where <stdio.h>
is sensitive to largefile compilation options.)
3. Copy the *.h files in snowball/src_c/ and snowball/runtime/
to src/include/snowball/libstemmer. At this writing the header files
do not require any changes.
4. Check whether any stemmer modules have been added or removed. If so, edit
the OBJS list in Makefile, the list of #include's in dict_snowball.c, and the
stemmer_modules[] table in dict_snowball.c. You might also need to change
the LANGUAGES list in Makefile and tsearch_config_languages in initdb.c.
5. The various stopword files in stopwords/ must be downloaded
individually from pages on the snowballstem.org website.
Be careful that these files must be stored in UTF-8 encoding.
|