From 46651ce6fe013220ed397add242004d764fc0153 Mon Sep 17 00:00:00 2001
From: Daniel Baumann
Date: Sat, 4 May 2024 14:15:05 +0200
Subject: Adding upstream version 14.5.

Signed-off-by: Daniel Baumann
---
 src/backend/snowball/README | 68 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)
 create mode 100644 src/backend/snowball/README

(limited to 'src/backend/snowball/README')

diff --git a/src/backend/snowball/README b/src/backend/snowball/README
new file mode 100644
index 0000000..d83321b
--- /dev/null
+++ b/src/backend/snowball/README
@@ -0,0 +1,68 @@
+src/backend/snowball/README
+
+Snowball-Based Stemming
+=======================
+
+This module uses the word stemming code developed by the Snowball project,
+http://snowballstem.org (formerly http://snowball.tartarus.org)
+which is released by them under a BSD-style license.
+
+The Snowball project does not often make formal releases; it's best
+to pull from their git repository
+
+    git clone https://github.com/snowballstem/snowball.git
+
+and then building the derived files is as simple as
+
+    cd snowball
+    make
+
+At least on Linux, no platform-specific adjustment is needed.
+
+Postgres' files under src/backend/snowball/libstemmer/ and
+src/include/snowball/libstemmer/ are taken directly from the Snowball
+files, with only some minor adjustments of file inclusions. Note
+that most of these files are in fact derived files, not original source.
+The original sources are in the Snowball language, and are built using
+the Snowball-to-C compiler that is also part of the Snowball project.
+We choose to include the derived files in the PostgreSQL distribution
+because most installations will not have the Snowball compiler available.
+
+We are currently synced with the Snowball git commit
+4764395431c8f2a0b4fe18b816ab1fc966a45837 (tag v2.1.0)
+of 2021-01-21.
+
+To update the PostgreSQL sources from a new Snowball version:
+
+0. If you didn't do it already, "make -C snowball".
+
+1. Copy the *.c files in snowball/src_c/ to src/backend/snowball/libstemmer
+with replacement of "../runtime/header.h" by "header.h", for example
+
+for f in .../snowball/src_c/*.c
+do
+    sed 's|\.\./runtime/header\.h|header.h|' $f >libstemmer/`basename $f`
+done
+
+Do not copy stemmers that are listed in libstemmer/modules.txt as
+nonstandard, such as "german2" or "lovins".
+
+2. Copy the *.c files in snowball/runtime/ to
+src/backend/snowball/libstemmer, and edit them to remove direct inclusions
+of system headers such as <stdio.h> --- they should only include "header.h".
+(This removal avoids portability problems on some platforms where <stdio.h>
+is sensitive to largefile compilation options.)
+
+3. Copy the *.h files in snowball/src_c/ and snowball/runtime/
+to src/include/snowball/libstemmer. At this writing the header files
+do not require any changes.
+
+4. Check whether any stemmer modules have been added or removed. If so, edit
+the OBJS list in Makefile, the list of #include's in dict_snowball.c, and the
+stemmer_modules[] table in dict_snowball.c, as well as the list in the
+documentation in textsearch.sgml. You might also need to change
+the LANGUAGES list in Makefile and tsearch_config_languages in initdb.c.
+
+5. The various stopword files in stopwords/ must be downloaded
+individually from pages on the snowballstem.org website.
+Be careful that these files must be stored in UTF-8 encoding.
-- 
cgit v1.2.3
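
Steps 1 to 3 of the update procedure in the README can be sketched as three
shell functions. This is only a sketch and is not part of the upstream README
or the PostgreSQL tree. The function names copy_stemmers, copy_runtime and
copy_headers are hypothetical. Deleting every "#include <...>" line
automatically in step 2 is an assumption; the README asks for a manual edit,
so the result should still be reviewed by hand. The sketch does not skip the
nonstandard stemmers listed in libstemmer/modules.txt; remove those from the
destination directory yourself.

```shell
#!/bin/sh
# Hypothetical helpers sketching steps 1-3 of the README; not upstream code.

# Step 1: copy the generated stemmer sources from $1 to $2, rewriting the
# include of "../runtime/header.h" to "header.h" (same sed as in the README).
# Nonstandard stemmers are NOT filtered out here.
copy_stemmers() {
    for f in "$1"/*.c
    do
        sed 's|\.\./runtime/header\.h|header.h|' "$f" >"$2/$(basename "$f")"
    done
}

# Step 2: copy the runtime sources from $1 to $2, dropping direct inclusions
# of system headers so that only "header.h" remains.
# Assumption: every '#include <...>' line can simply be deleted.
copy_runtime() {
    for f in "$1"/*.c
    do
        sed '/^[[:space:]]*#[[:space:]]*include[[:space:]]*<.*>/d' "$f" \
            >"$2/$(basename "$f")"
    done
}

# Step 3: copy the header files from $1/src_c and $1/runtime to $2 unchanged.
copy_headers() {
    cp "$1"/src_c/*.h "$1"/runtime/*.h "$2"
}
```

With a Snowball checkout in ./snowball and the current directory at the top of
the PostgreSQL tree, the functions might be called as

    copy_stemmers snowball/src_c   src/backend/snowball/libstemmer
    copy_runtime  snowball/runtime src/backend/snowball/libstemmer
    copy_headers  snowball         src/include/snowball/libstemmer
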