Update to latest Snowball sources.

It's been almost a year since we last did this, and upstream has
been busy. They've added stemmers for Polish and Esperanto,
and also deprecated their old Dutch stemmer in favor of the
Kraaij-Pohlmann algorithm. (The "dutch" stemmer is now the
latter, and "dutch_porter" is the old algorithm.)

Upstream also decided to rename their internal header "header.h"
to something less generic: "snowball_runtime.h". Seems like a good
thing, but it complicates this patch a bit because we were relying on
interposing our own version of "header.h" to control system header
inclusion order. (We're partially failing at that now, because the
generated stemmer files now include <stddef.h> before snowball_runtime.h.
I think that'll be okay, but if the buildfarm complains then we'll
have to do more-extensive editing of the generated files.)

I realized that we weren't documenting the available stemmers in
any user-visible place, except indirectly through sample \dFd output.
That's incomplete because we only provide built-in dictionaries for
the recommended stemmers for each language, not alternative stemmers
such as dutch_porter. So I added a list to the documentation.

I did not do anything with the stopword lists. If those are still
available from snowballstem.org, they are mighty well hidden.

Discussion: https://postgr.es/m/1185975.1767569534@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/7dc95cc3b94f2f558606e5ec307466a4e3dbc832

Modified Files
--------------
doc/src/sgml/textsearch.sgml | 52 +
src/backend/snowball/Makefile | 5 +
src/backend/snowball/README | 22 +-
src/backend/snowball/dict_snowball.c | 14 +-
src/backend/snowball/libstemmer/api.c | 48 +-
.../snowball/libstemmer/stem_ISO_8859_1_basque.c | 1132 +--
.../snowball/libstemmer/stem_ISO_8859_1_catalan.c | 1364 +--
.../snowball/libstemmer/stem_ISO_8859_1_danish.c | 324 +-
.../snowball/libstemmer/stem_ISO_8859_1_dutch.c | 2338 ++++-
.../libstemmer/stem_ISO_8859_1_dutch_porter.c | 665 ++
.../snowball/libstemmer/stem_ISO_8859_1_english.c | 1310 +--
.../snowball/libstemmer/stem_ISO_8859_1_finnish.c | 686 +-
.../snowball/libstemmer/stem_ISO_8859_1_french.c | 1493 +--
.../snowball/libstemmer/stem_ISO_8859_1_german.c | 547 +-
.../libstemmer/stem_ISO_8859_1_indonesian.c | 543 +-
.../snowball/libstemmer/stem_ISO_8859_1_irish.c | 388 +-
.../snowball/libstemmer/stem_ISO_8859_1_italian.c | 995 +-
.../libstemmer/stem_ISO_8859_1_norwegian.c | 420 +-
.../snowball/libstemmer/stem_ISO_8859_1_porter.c | 611 +-
.../libstemmer/stem_ISO_8859_1_portuguese.c | 901 +-
.../snowball/libstemmer/stem_ISO_8859_1_spanish.c | 1017 +-
.../snowball/libstemmer/stem_ISO_8859_1_swedish.c | 477 +-
.../libstemmer/stem_ISO_8859_2_hungarian.c | 1101 +-
.../snowball/libstemmer/stem_ISO_8859_2_polish.c | 520 +
.../snowball/libstemmer/stem_KOI8_R_russian.c | 602 +-
.../snowball/libstemmer/stem_UTF_8_arabic.c | 1554 +--
.../snowball/libstemmer/stem_UTF_8_armenian.c | 528 +-
.../snowball/libstemmer/stem_UTF_8_basque.c | 1135 +--
.../snowball/libstemmer/stem_UTF_8_catalan.c | 1367 +--
.../snowball/libstemmer/stem_UTF_8_danish.c | 326 +-
src/backend/snowball/libstemmer/stem_UTF_8_dutch.c | 2400 ++++-
.../snowball/libstemmer/stem_UTF_8_dutch_porter.c | 680 ++
.../snowball/libstemmer/stem_UTF_8_english.c | 1324 +--
.../snowball/libstemmer/stem_UTF_8_esperanto.c | 820 ++
.../snowball/libstemmer/stem_UTF_8_estonian.c | 2010 ++--
.../snowball/libstemmer/stem_UTF_8_finnish.c | 696 +-
.../snowball/libstemmer/stem_UTF_8_french.c | 1523 +--
.../snowball/libstemmer/stem_UTF_8_german.c | 554 +-
src/backend/snowball/libstemmer/stem_UTF_8_greek.c | 4218 ++++----
src/backend/snowball/libstemmer/stem_UTF_8_hindi.c | 308 +-
.../snowball/libstemmer/stem_UTF_8_hungarian.c | 1100 +-
.../snowball/libstemmer/stem_UTF_8_indonesian.c | 543 +-
src/backend/snowball/libstemmer/stem_UTF_8_irish.c | 388 +-
.../snowball/libstemmer/stem_UTF_8_italian.c | 1007 +-
.../snowball/libstemmer/stem_UTF_8_lithuanian.c | 1179 ++-
.../snowball/libstemmer/stem_UTF_8_nepali.c | 598 +-
.../snowball/libstemmer/stem_UTF_8_norwegian.c | 422 +-
.../snowball/libstemmer/stem_UTF_8_polish.c | 523 +
.../snowball/libstemmer/stem_UTF_8_porter.c | 620 +-
.../snowball/libstemmer/stem_UTF_8_portuguese.c | 910 +-
.../snowball/libstemmer/stem_UTF_8_romanian.c | 961 +-
.../snowball/libstemmer/stem_UTF_8_russian.c | 625 +-
.../snowball/libstemmer/stem_UTF_8_serbian.c | 10148 ++++++++++---------
.../snowball/libstemmer/stem_UTF_8_spanish.c | 1023 +-
.../snowball/libstemmer/stem_UTF_8_swedish.c | 479 +-
src/backend/snowball/libstemmer/stem_UTF_8_tamil.c | 1361 +--
.../snowball/libstemmer/stem_UTF_8_turkish.c | 2371 +++--
.../snowball/libstemmer/stem_UTF_8_yiddish.c | 1235 +--
src/backend/snowball/libstemmer/utilities.c | 205 +-
src/backend/snowball/meson.build | 7 +-
src/backend/snowball/snowball_create.pl | 2 +
src/bin/initdb/initdb.c | 2 +
src/include/snowball/libstemmer/api.h | 18 +-
src/include/snowball/libstemmer/header.h | 61 -
src/include/snowball/libstemmer/snowball_runtime.h | 109 +
.../snowball/libstemmer/stem_ISO_8859_1_basque.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_catalan.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_danish.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_dutch.h | 3 +-
.../libstemmer/stem_ISO_8859_1_dutch_porter.h | 14 +
.../snowball/libstemmer/stem_ISO_8859_1_english.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_finnish.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_french.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_german.h | 3 +-
.../libstemmer/stem_ISO_8859_1_indonesian.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_irish.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_italian.h | 3 +-
.../libstemmer/stem_ISO_8859_1_norwegian.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_porter.h | 3 +-
.../libstemmer/stem_ISO_8859_1_portuguese.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_spanish.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_1_swedish.h | 3 +-
.../libstemmer/stem_ISO_8859_2_hungarian.h | 3 +-
.../snowball/libstemmer/stem_ISO_8859_2_polish.h | 14 +
.../snowball/libstemmer/stem_KOI8_R_russian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_arabic.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_armenian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_basque.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_catalan.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_danish.h | 3 +-
src/include/snowball/libstemmer/stem_UTF_8_dutch.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_dutch_porter.h | 14 +
.../snowball/libstemmer/stem_UTF_8_english.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_esperanto.h | 14 +
.../snowball/libstemmer/stem_UTF_8_estonian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_finnish.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_french.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_german.h | 3 +-
src/include/snowball/libstemmer/stem_UTF_8_greek.h | 3 +-
src/include/snowball/libstemmer/stem_UTF_8_hindi.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_hungarian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_indonesian.h | 3 +-
src/include/snowball/libstemmer/stem_UTF_8_irish.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_italian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_lithuanian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_nepali.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_norwegian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_polish.h | 14 +
.../snowball/libstemmer/stem_UTF_8_porter.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_portuguese.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_romanian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_russian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_serbian.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_spanish.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_swedish.h | 3 +-
src/include/snowball/libstemmer/stem_UTF_8_tamil.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_turkish.h | 3 +-
.../snowball/libstemmer/stem_UTF_8_yiddish.h | 3 +-
.../snowball/{header.h => snowball_runtime.h} | 22 +-
119 files changed, 36038 insertions(+), 27113 deletions(-)