author     Daniel Baumann <daniel.baumann@progress-linux.org>  2024-05-04 12:15:05 +0000
committer  Daniel Baumann <daniel.baumann@progress-linux.org>  2024-05-04 12:15:05 +0000
commit     46651ce6fe013220ed397add242004d764fc0153
tree       6e5299f990f88e60174a1d3ae6e48eedd2688b2b /src/backend/snowball/README
parent     Initial commit.
Adding upstream version 14.5.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/backend/snowball/README')
-rw-r--r-- src/backend/snowball/README 68
1 files changed, 68 insertions, 0 deletions
diff --git a/src/backend/snowball/README b/src/backend/snowball/README
new file mode 100644
index 0000000..d83321b
--- /dev/null
+++ b/src/backend/snowball/README
@@ -0,0 +1,68 @@
+src/backend/snowball/README
+
+Snowball-Based Stemming
+=======================
+
+This module uses the word-stemming code developed by the Snowball project
+(http://snowballstem.org, formerly http://snowball.tartarus.org), which
+releases it under a BSD-style license.
+
+The Snowball project does not often make formal releases; it's best
+to clone its git repository:
+
+git clone https://github.com/snowballstem/snowball.git
+
+Building the derived files is then as simple as:
+
+cd snowball
+make
+
+At least on Linux, no platform-specific adjustment is needed.
+
+Postgres' files under src/backend/snowball/libstemmer/ and
+src/include/snowball/libstemmer/ are taken directly from the Snowball
+files, with only minor adjustments to #include directives. Note that
+most of these files are in fact derived files, not original source.
+The original sources are written in the Snowball language and are built
+using the Snowball-to-C compiler, which is also part of the Snowball
+project. We include the derived files in the PostgreSQL distribution
+because most installations will not have the Snowball compiler available.
+
+We are currently synced with the Snowball git commit
+4764395431c8f2a0b4fe18b816ab1fc966a45837 (tag v2.1.0)
+of 2021-01-21.
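A minimal sketch for confirming that a local Snowball checkout is at the commit named above before starting an update. The function name check_snowball_commit is hypothetical; it only compares the output of "git rev-parse HEAD" with an expected full hash.

```shell
# Hypothetical helper (not part of PostgreSQL or Snowball):
# succeed only if the git checkout in directory $1 has HEAD at the
# full commit hash given in $2.
check_snowball_commit() {
    repo=$1
    expected=$2
    # Exit status 2 if $1 is not a readable git checkout.
    actual=$(git -C "$repo" rev-parse HEAD) || return 2
    if [ "$actual" = "$expected" ]; then
        echo "snowball checkout is at $expected"
        return 0
    fi
    echo "snowball checkout is at $actual, expected $expected" >&2
    return 1
}
```

For example, after "git -C snowball checkout v2.1.0", running "check_snowball_commit snowball 4764395431c8f2a0b4fe18b816ab1fc966a45837" should succeed.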
+
+To update the PostgreSQL sources from a new Snowball version:
+
+0. If you haven't already, run "make -C snowball".
+
+1. Copy the *.c files in snowball/src_c/ to src/backend/snowball/libstemmer,
+replacing "../runtime/header.h" with "header.h". For example, from
+src/backend/snowball:
+
+for f in .../snowball/src_c/*.c
+do
+ sed 's|\.\./runtime/header\.h|header.h|' "$f" >libstemmer/"$(basename "$f")"
+done
+
+Do not copy stemmers that are listed in libstemmer/modules.txt as
+nonstandard, such as "german2" or "lovins".
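The copy described in step 1 and the exclusion of nonstandard stemmers can be done in one pass. This is a minimal sketch, assuming the generated files are named stem_<encoding>_<algorithm>.c and treating only german2 and lovins (the examples named in this step) as nonstandard; that exclusion list is illustrative and must be checked against modules.txt for the version being imported. The function name copy_stemmers is hypothetical.

```shell
# Hypothetical helper: copy generated stemmer sources from $1
# (snowball/src_c) into $2 (libstemmer), rewriting the runtime header
# path and skipping modules treated as nonstandard.
copy_stemmers() {
    src=$1
    dest=$2
    for f in "$src"/*.c; do
        [ -e "$f" ] || continue   # glob matched nothing
        base=$(basename "$f")
        case $base in
            # Illustrative exclusion list; check modules.txt for the real one.
            *_german2.c|*_lovins.c)
                echo "skipping nonstandard stemmer $base"
                continue
                ;;
        esac
        sed 's|\.\./runtime/header\.h|header.h|' "$f" >"$dest/$base"
    done
}
```

Run from src/backend/snowball, this would be invoked as "copy_stemmers .../snowball/src_c libstemmer".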
+
+2. Copy the *.c files in snowball/runtime/ to
+src/backend/snowball/libstemmer, and edit them to remove direct inclusions
+of system headers such as <stdio.h>; they should include only "header.h".
+(This avoids portability problems on some platforms where <stdio.h> is
+sensitive to largefile compilation options.)
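The edit in step 2 can be partly mechanized. A minimal sketch, assuming every system-header inclusion in the runtime sources is a single "#include <...>" line; the result should still be reviewed by hand (for example with diff) to confirm that "header.h" supplies whatever the removed headers provided. The function name strip_system_includes is hypothetical.

```shell
# Hypothetical helper: copy runtime source $1 to $2, deleting every
# "#include <...>" line (system headers) while keeping quoted
# includes such as "header.h".
strip_system_includes() {
    sed '/^[[:space:]]*#[[:space:]]*include[[:space:]]*</d' "$1" >"$2"
}
```

From src/backend/snowball it could be applied as: for f in .../snowball/runtime/*.c; do strip_system_includes "$f" libstemmer/"$(basename "$f")"; done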
+
+3. Copy the *.h files in snowball/src_c/ and snowball/runtime/
+to src/include/snowball/libstemmer. As of this writing, the header files
+do not require any changes.
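A minimal sketch of step 3 that copies the headers unchanged, assuming the headers of nonstandard stemmers should be skipped just like their .c files; the exclusion list is the same illustrative one used for step 1 and should be checked against modules.txt. The function name copy_snowball_headers is hypothetical.

```shell
# Hypothetical helper: copy generated and runtime headers unchanged.
# $1 = path to the Snowball checkout
# $2 = destination (src/include/snowball/libstemmer in the PostgreSQL tree)
copy_snowball_headers() {
    snowball=$1
    dest=$2
    for h in "$snowball"/src_c/*.h "$snowball"/runtime/*.h; do
        [ -e "$h" ] || continue   # glob matched nothing
        case $(basename "$h") in
            *_german2.h|*_lovins.h) continue ;;   # illustrative exclusions
        esac
        cp "$h" "$dest"/
    done
}
```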
+
+4. Check whether any stemmer modules have been added or removed. If so,
+update the OBJS list in Makefile, the list of #include directives and the
+stemmer_modules[] table in dict_snowball.c, and the corresponding list in
+the documentation in textsearch.sgml. You might also need to change the
+LANGUAGES list in Makefile and tsearch_config_languages in initdb.c.
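One way to perform the check in step 4 is to compare the stemmer source names in the existing libstemmer directory with those in the new snowball/src_c, before step 1 overwrites anything. A minimal sketch, assuming stemmer sources match stem_*.c; nonstandard stemmers present in src_c will also show up as "added" and should be ignored. The function name diff_stemmer_modules is hypothetical.

```shell
# Hypothetical helper: print stemmer sources present in only one of two
# directories. $1 = current libstemmer, $2 = new snowball/src_c.
# "removed:" lines exist only in $1, "added:" lines only in $2.
diff_stemmer_modules() {
    tmpdir=$(mktemp -d) || return 1
    (cd "$1" && ls stem_*.c 2>/dev/null) | LC_ALL=C sort >"$tmpdir/old"
    (cd "$2" && ls stem_*.c 2>/dev/null) | LC_ALL=C sort >"$tmpdir/new"
    LC_ALL=C comm -23 "$tmpdir/old" "$tmpdir/new" | sed 's/^/removed: /'
    LC_ALL=C comm -13 "$tmpdir/old" "$tmpdir/new" | sed 's/^/added: /'
    rm -rf "$tmpdir"
}
```

From src/backend/snowball: diff_stemmer_modules libstemmer .../snowball/src_c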
+
+5. The various stopword files in stopwords/ must be downloaded
+individually from pages on the snowballstem.org website.
+Make sure these files are stored in UTF-8 encoding.
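The encoding requirement in step 5 can be checked after downloading. A minimal sketch, assuming an iconv utility that exits with nonzero status on invalid input (glibc's does); pure-ASCII files pass trivially. The function name check_utf8 is hypothetical.

```shell
# Hypothetical helper: report each file argument that is not valid UTF-8.
# Converting UTF-8 to UTF-8 fails exactly when the input has an invalid
# or incomplete byte sequence.
check_utf8() {
    status=0
    for f in "$@"; do
        if ! iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
            echo "not valid UTF-8: $f" >&2
            status=1
        fi
    done
    return $status
}
```

For example: check_utf8 stopwords/*.stop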