Discussion: pgsql: Update to latest Snowball sources.

pgsql: Update to latest Snowball sources.

From: Tom Lane
Date:
Update to latest Snowball sources.

It's been almost a year since we last did this, and upstream has
been busy.  They've added stemmers for Polish and Esperanto,
and also deprecated their old Dutch stemmer in favor of the
Kraaij-Pohlmann algorithm.  (The "dutch" stemmer is now the
latter, and "dutch_porter" is the old algorithm.)

Upstream also decided to rename their internal header "header.h"
to something less generic: "snowball_runtime.h".  Seems like a good
thing, but it complicates this patch a bit because we were relying on
interposing our own version of "header.h" to control system header
inclusion order.  (We're partially failing at that now, because the
generated stemmer files now include <stddef.h> before snowball_runtime.h.
I think that'll be okay, but if the buildfarm complains then we'll
have to do more-extensive editing of the generated files.)

I realized that we weren't documenting the available stemmers in
any user-visible place, except indirectly through sample \dFd output.
That's incomplete because we only provide built-in dictionaries for
the recommended stemmers for each language, not alternative stemmers
such as dutch_porter.  So I added a list to the documentation.
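As a rough illustration of the Dutch renaming and of why dutch_porter never shows up in \dFd output, here is a minimal C sketch. The StemmerInfo table and the find_stemmer and needs_custom_dictionary functions are hypothetical names invented for this example, not code from dict_snowball.c; only the dutch/dutch_porter facts come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical summary table (NOT PostgreSQL's real data structure):
 * which algorithm a Snowball stemmer name selects after this update,
 * and whether a built-in <name>_stem dictionary is provided for it.
 */
typedef struct
{
    const char *name;        /* stemmer name, as in Language = ... */
    const char *algorithm;   /* algorithm the name now refers to */
    bool        has_builtin; /* true only for the recommended stemmer */
} StemmerInfo;

static const StemmerInfo stemmers[] = {
    {"dutch", "Kraaij-Pohlmann", true},
    {"dutch_porter", "old Snowball Dutch algorithm", false},
};

/* Linear lookup by stemmer name; returns NULL for unknown names. */
static const StemmerInfo *
find_stemmer(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(stemmers) / sizeof(stemmers[0]); i++)
    {
        if (strcmp(stemmers[i].name, name) == 0)
            return &stemmers[i];
    }
    return NULL;
}

/*
 * True if the stemmer exists but is an alternative one with no built-in
 * dictionary, so a user must define a dictionary to use it.
 */
static bool
needs_custom_dictionary(const char *name)
{
    const StemmerInfo *s = find_stemmer(name);

    return s != NULL && !s->has_builtin;
}
```

In practice, using an alternative stemmer means defining a dictionary yourself with the snowball template, for example CREATE TEXT SEARCH DICTIONARY my_dutch_porter (TEMPLATE = snowball, Language = dutch_porter); where my_dutch_porter is an arbitrary, hypothetical name.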

I did not do anything with the stopword lists.  If those are still
available from snowballstem.org, they are mighty well hidden.

Discussion: https://postgr.es/m/1185975.1767569534@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/7dc95cc3b94f2f558606e5ec307466a4e3dbc832

Modified Files
--------------
doc/src/sgml/textsearch.sgml                       |    52 +
src/backend/snowball/Makefile                      |     5 +
src/backend/snowball/README                        |    22 +-
src/backend/snowball/dict_snowball.c               |    14 +-
src/backend/snowball/libstemmer/api.c              |    48 +-
.../snowball/libstemmer/stem_ISO_8859_1_basque.c   |  1132 +--
.../snowball/libstemmer/stem_ISO_8859_1_catalan.c  |  1364 +--
.../snowball/libstemmer/stem_ISO_8859_1_danish.c   |   324 +-
.../snowball/libstemmer/stem_ISO_8859_1_dutch.c    |  2338 ++++-
.../libstemmer/stem_ISO_8859_1_dutch_porter.c      |   665 ++
.../snowball/libstemmer/stem_ISO_8859_1_english.c  |  1310 +--
.../snowball/libstemmer/stem_ISO_8859_1_finnish.c  |   686 +-
.../snowball/libstemmer/stem_ISO_8859_1_french.c   |  1493 +--
.../snowball/libstemmer/stem_ISO_8859_1_german.c   |   547 +-
.../libstemmer/stem_ISO_8859_1_indonesian.c        |   543 +-
.../snowball/libstemmer/stem_ISO_8859_1_irish.c    |   388 +-
.../snowball/libstemmer/stem_ISO_8859_1_italian.c  |   995 +-
.../libstemmer/stem_ISO_8859_1_norwegian.c         |   420 +-
.../snowball/libstemmer/stem_ISO_8859_1_porter.c   |   611 +-
.../libstemmer/stem_ISO_8859_1_portuguese.c        |   901 +-
.../snowball/libstemmer/stem_ISO_8859_1_spanish.c  |  1017 +-
.../snowball/libstemmer/stem_ISO_8859_1_swedish.c  |   477 +-
.../libstemmer/stem_ISO_8859_2_hungarian.c         |  1101 +-
.../snowball/libstemmer/stem_ISO_8859_2_polish.c   |   520 +
.../snowball/libstemmer/stem_KOI8_R_russian.c      |   602 +-
.../snowball/libstemmer/stem_UTF_8_arabic.c        |  1554 +--
.../snowball/libstemmer/stem_UTF_8_armenian.c      |   528 +-
.../snowball/libstemmer/stem_UTF_8_basque.c        |  1135 +--
.../snowball/libstemmer/stem_UTF_8_catalan.c       |  1367 +--
.../snowball/libstemmer/stem_UTF_8_danish.c        |   326 +-
src/backend/snowball/libstemmer/stem_UTF_8_dutch.c |  2400 ++++-
.../snowball/libstemmer/stem_UTF_8_dutch_porter.c  |   680 ++
.../snowball/libstemmer/stem_UTF_8_english.c       |  1324 +--
.../snowball/libstemmer/stem_UTF_8_esperanto.c     |   820 ++
.../snowball/libstemmer/stem_UTF_8_estonian.c      |  2010 ++--
.../snowball/libstemmer/stem_UTF_8_finnish.c       |   696 +-
.../snowball/libstemmer/stem_UTF_8_french.c        |  1523 +--
.../snowball/libstemmer/stem_UTF_8_german.c        |   554 +-
src/backend/snowball/libstemmer/stem_UTF_8_greek.c |  4218 ++++----
src/backend/snowball/libstemmer/stem_UTF_8_hindi.c |   308 +-
.../snowball/libstemmer/stem_UTF_8_hungarian.c     |  1100 +-
.../snowball/libstemmer/stem_UTF_8_indonesian.c    |   543 +-
src/backend/snowball/libstemmer/stem_UTF_8_irish.c |   388 +-
.../snowball/libstemmer/stem_UTF_8_italian.c       |  1007 +-
.../snowball/libstemmer/stem_UTF_8_lithuanian.c    |  1179 ++-
.../snowball/libstemmer/stem_UTF_8_nepali.c        |   598 +-
.../snowball/libstemmer/stem_UTF_8_norwegian.c     |   422 +-
.../snowball/libstemmer/stem_UTF_8_polish.c        |   523 +
.../snowball/libstemmer/stem_UTF_8_porter.c        |   620 +-
.../snowball/libstemmer/stem_UTF_8_portuguese.c    |   910 +-
.../snowball/libstemmer/stem_UTF_8_romanian.c      |   961 +-
.../snowball/libstemmer/stem_UTF_8_russian.c       |   625 +-
.../snowball/libstemmer/stem_UTF_8_serbian.c       | 10148 ++++++++++---------
.../snowball/libstemmer/stem_UTF_8_spanish.c       |  1023 +-
.../snowball/libstemmer/stem_UTF_8_swedish.c       |   479 +-
src/backend/snowball/libstemmer/stem_UTF_8_tamil.c |  1361 +--
.../snowball/libstemmer/stem_UTF_8_turkish.c       |  2371 +++--
.../snowball/libstemmer/stem_UTF_8_yiddish.c       |  1235 +--
src/backend/snowball/libstemmer/utilities.c        |   205 +-
src/backend/snowball/meson.build                   |     7 +-
src/backend/snowball/snowball_create.pl            |     2 +
src/bin/initdb/initdb.c                            |     2 +
src/include/snowball/libstemmer/api.h              |    18 +-
src/include/snowball/libstemmer/header.h           |    61 -
src/include/snowball/libstemmer/snowball_runtime.h |   109 +
.../snowball/libstemmer/stem_ISO_8859_1_basque.h   |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_catalan.h  |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_danish.h   |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_dutch.h    |     3 +-
.../libstemmer/stem_ISO_8859_1_dutch_porter.h      |    14 +
.../snowball/libstemmer/stem_ISO_8859_1_english.h  |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_finnish.h  |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_french.h   |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_german.h   |     3 +-
.../libstemmer/stem_ISO_8859_1_indonesian.h        |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_irish.h    |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_italian.h  |     3 +-
.../libstemmer/stem_ISO_8859_1_norwegian.h         |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_porter.h   |     3 +-
.../libstemmer/stem_ISO_8859_1_portuguese.h        |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_spanish.h  |     3 +-
.../snowball/libstemmer/stem_ISO_8859_1_swedish.h  |     3 +-
.../libstemmer/stem_ISO_8859_2_hungarian.h         |     3 +-
.../snowball/libstemmer/stem_ISO_8859_2_polish.h   |    14 +
.../snowball/libstemmer/stem_KOI8_R_russian.h      |     3 +-
.../snowball/libstemmer/stem_UTF_8_arabic.h        |     3 +-
.../snowball/libstemmer/stem_UTF_8_armenian.h      |     3 +-
.../snowball/libstemmer/stem_UTF_8_basque.h        |     3 +-
.../snowball/libstemmer/stem_UTF_8_catalan.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_danish.h        |     3 +-
src/include/snowball/libstemmer/stem_UTF_8_dutch.h |     3 +-
.../snowball/libstemmer/stem_UTF_8_dutch_porter.h  |    14 +
.../snowball/libstemmer/stem_UTF_8_english.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_esperanto.h     |    14 +
.../snowball/libstemmer/stem_UTF_8_estonian.h      |     3 +-
.../snowball/libstemmer/stem_UTF_8_finnish.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_french.h        |     3 +-
.../snowball/libstemmer/stem_UTF_8_german.h        |     3 +-
src/include/snowball/libstemmer/stem_UTF_8_greek.h |     3 +-
src/include/snowball/libstemmer/stem_UTF_8_hindi.h |     3 +-
.../snowball/libstemmer/stem_UTF_8_hungarian.h     |     3 +-
.../snowball/libstemmer/stem_UTF_8_indonesian.h    |     3 +-
src/include/snowball/libstemmer/stem_UTF_8_irish.h |     3 +-
.../snowball/libstemmer/stem_UTF_8_italian.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_lithuanian.h    |     3 +-
.../snowball/libstemmer/stem_UTF_8_nepali.h        |     3 +-
.../snowball/libstemmer/stem_UTF_8_norwegian.h     |     3 +-
.../snowball/libstemmer/stem_UTF_8_polish.h        |    14 +
.../snowball/libstemmer/stem_UTF_8_porter.h        |     3 +-
.../snowball/libstemmer/stem_UTF_8_portuguese.h    |     3 +-
.../snowball/libstemmer/stem_UTF_8_romanian.h      |     3 +-
.../snowball/libstemmer/stem_UTF_8_russian.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_serbian.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_spanish.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_swedish.h       |     3 +-
src/include/snowball/libstemmer/stem_UTF_8_tamil.h |     3 +-
.../snowball/libstemmer/stem_UTF_8_turkish.h       |     3 +-
.../snowball/libstemmer/stem_UTF_8_yiddish.h       |     3 +-
.../snowball/{header.h => snowball_runtime.h}      |    22 +-
119 files changed, 36038 insertions(+), 27113 deletions(-)


Re: pgsql: Update to latest Snowball sources.

From: Jelte Fennema-Nio
Date:
On Mon, 5 Jan 2026 at 21:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Update to latest Snowball sources.

On my own CI I got some failures seemingly related to this:
https://cirrus-ci.com/task/5989391459418112

Might be build cache issues or something (that I don't have time to
look into right now), but wanted to call it out in case others run
into the same.



Re: pgsql: Update to latest Snowball sources.

From: Tom Lane
Date:
Jelte Fennema-Nio <postgres@jeltef.nl> writes:
> On my own CI I got some failures seemingly related to this:
> https://cirrus-ci.com/task/5989391459418112

If it's in a meson build, yeah ... fixed ...

            regards, tom lane