author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 12:15:05 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 12:15:05 +0000 |
commit | 46651ce6fe013220ed397add242004d764fc0153 (patch) | |
tree | 6e5299f990f88e60174a1d3ae6e48eedd2688b2b /src/backend/snowball/README | |
parent | Initial commit. (diff) | |
Adding upstream version 14.5. (refs: upstream/14.5, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/backend/snowball/README')
-rw-r--r-- | src/backend/snowball/README | 68 |
1 files changed, 68 insertions, 0 deletions
diff --git a/src/backend/snowball/README b/src/backend/snowball/README
new file mode 100644
index 0000000..d83321b
--- /dev/null
+++ b/src/backend/snowball/README
@@ -0,0 +1,68 @@
+src/backend/snowball/README
+
+Snowball-Based Stemming
+=======================
+
+This module uses the word stemming code developed by the Snowball project,
+http://snowballstem.org (formerly http://snowball.tartarus.org)
+which is released by them under a BSD-style license.
+
+The Snowball project does not often make formal releases; it's best
+to pull from their git repository
+
+git clone https://github.com/snowballstem/snowball.git
+
+and then building the derived files is as simple as
+
+cd snowball
+make
+
+At least on Linux, no platform-specific adjustment is needed.
+
+Postgres' files under src/backend/snowball/libstemmer/ and
+src/include/snowball/libstemmer/ are taken directly from the Snowball
+files, with only some minor adjustments of file inclusions. Note
+that most of these files are in fact derived files, not original source.
+The original sources are in the Snowball language, and are built using
+the Snowball-to-C compiler that is also part of the Snowball project.
+We choose to include the derived files in the PostgreSQL distribution
+because most installations will not have the Snowball compiler available.
+
+We are currently synced with the Snowball git commit
+4764395431c8f2a0b4fe18b816ab1fc966a45837 (tag v2.1.0)
+of 2021-01-21.
+
+To update the PostgreSQL sources from a new Snowball version:
+
+0. If you didn't do it already, "make -C snowball".
+
+1. Copy the *.c files in snowball/src_c/ to src/backend/snowball/libstemmer
+with replacement of "../runtime/header.h" by "header.h", for example
+
+for f in .../snowball/src_c/*.c
+do
+    sed 's|\.\./runtime/header\.h|header.h|' $f >libstemmer/`basename $f`
+done
+
+Do not copy stemmers that are listed in libstemmer/modules.txt as
+nonstandard, such as "german2" or "lovins".
+
+2. Copy the *.c files in snowball/runtime/ to
+src/backend/snowball/libstemmer, and edit them to remove direct inclusions
+of system headers such as <stdio.h> --- they should only include "header.h".
+(This removal avoids portability problems on some platforms where <stdio.h>
+is sensitive to largefile compilation options.)
+
+3. Copy the *.h files in snowball/src_c/ and snowball/runtime/
+to src/include/snowball/libstemmer. At this writing the header files
+do not require any changes.
+
+4. Check whether any stemmer modules have been added or removed. If so, edit
+the OBJS list in Makefile, the list of #include's in dict_snowball.c, and the
+stemmer_modules[] table in dict_snowball.c, as well as the list in the
+documentation in textsearch.sgml. You might also need to change
+the LANGUAGES list in Makefile and tsearch_config_languages in initdb.c.
+
+5. The various stopword files in stopwords/ must be downloaded
+individually from pages on the snowballstem.org website.
+Be careful that these files must be stored in UTF-8 encoding.
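The README's loop in step 1 copies every generated file, while the paragraph after it says nonstandard stemmers must be left out. A minimal sketch of doing both in one pass follows. The function name copy_stemmers and its skip-list argument are hypothetical and are not part of PostgreSQL or Snowball. The sketch also assumes that generated files are named so that they end in _NAME.c (for example stem_UTF_8_german2.c), and that the skip list is compiled by hand from libstemmer/modules.txt.

```shell
#!/bin/sh
# Sketch of step 1: copy the generated stemmer sources while rewriting the
# header include, leaving out stemmers named in a skip list.
# copy_stemmers is a hypothetical helper, not an existing PostgreSQL script.

# usage: copy_stemmers SRC_DIR DEST_DIR [SKIP_NAMES]
#   SKIP_NAMES is a space-separated list such as "german2 lovins".
#   Assumption: a stemmer's files end in _NAME.c.
copy_stemmers() {
    src_dir=$1
    dest_dir=$2
    skip=${3:-}

    for f in "$src_dir"/*.c
    do
        # If the glob matched nothing, the literal pattern is left behind.
        [ -e "$f" ] || continue
        base=$(basename "$f")

        # Decide whether this file belongs to a stemmer on the skip list.
        skipped=no
        for name in $skip
        do
            case $base in
                *_"$name".c) skipped=yes ;;
            esac
        done
        if [ "$skipped" = yes ]
        then
            continue
        fi

        # Same substitution as the loop in the README.
        sed 's|\.\./runtime/header\.h|header.h|' "$f" >"$dest_dir/$base"
    done
}
```

It would be called from src/backend/snowball as, for instance, copy_stemmers .../snowball/src_c libstemmer "german2 lovins". Matching on the _NAME.c suffix keeps "german" from being dropped when only "german2" is listed.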