I wrote:
> Teodor Sigaev <teodor@sigaev.ru> writes:
>> 2 Snowball's compiling infrastructure doesn't support Windows target.
> Yeah. Another problem with using their original source code is that
> running the Snowball compiler during build would not work for
> cross-compiled builds of Postgres, at least not without solving the
> problem of building some code for the host platform instead of the
> target.
> So what I'm thinking now is we should import libstemmer instead of the
> snowball_code representation. I haven't gotten as far as thinking about
> exactly how to lay out the files though.

I've done some more work on this point. After looking at the Snowball
code in more detail, I'm thinking it'd be a good idea to keep it at
arm's length in a loadable shared library, instead of incorporating it
directly into the backend. This is because the Snowball authors don't see
anything wrong with exporting random global function names like "eq_v" and
"skip_utf8", so the probability of name collisions is a bit too high for
my taste. The current tsearch_core patch envisions having a couple of
the snowball stemmers in the core backend and the rest in a loadable
library, but I suggest we just put them all in a loadable library, with
the only entry points being the snowball_init() and snowball_lexize()
tsearch dictionary support functions. (I am thinking of having just one
such function pair, with the init function taking an init option to
select which stemmer to use, instead of a separate Postgres function
pair per stemmer.)
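
To make the "one function pair, stemmer chosen by an init option" idea
concrete, here is a minimal standalone sketch in plain C. It is not the
proposed patch and uses no Postgres or Snowball APIs: the names
sketch_init(), sketch_lexize(), stemmer_entry and dict_state are
hypothetical, and the two "stemmers" are toy stand-ins rather than real
Snowball algorithms. It only illustrates looking up a stemmer by name
once at init time and dispatching through it afterwards.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a stemmer name plus its stem function. */
typedef struct stemmer_entry
{
    const char *name;
    /* Writes the stem into out (outlen bytes); returns 0 on success. */
    int         (*stem) (const char *word, char *out, size_t outlen);
} stemmer_entry;

/* Copy the first n bytes of word into out, NUL-terminated. */
static int
copy_prefix(const char *word, size_t n, char *out, size_t outlen)
{
    if (n + 1 > outlen)
        return -1;              /* output buffer too small */
    memcpy(out, word, n);
    out[n] = '\0';
    return 0;
}

/* Toy stand-in: strips one trailing 's'. NOT a real Snowball stemmer. */
static int
toy_english_stem(const char *word, char *out, size_t outlen)
{
    size_t      n = strlen(word);

    if (n > 1 && word[n - 1] == 's')
        n--;
    return copy_prefix(word, n, out, outlen);
}

/* Toy stand-in: returns the word unchanged. */
static int
toy_identity_stem(const char *word, char *out, size_t outlen)
{
    return copy_prefix(word, strlen(word), out, outlen);
}

/* All stemmers stay file-local; only the two entry points are exported. */
static const stemmer_entry stemmer_table[] = {
    {"english", toy_english_stem},
    {"simple", toy_identity_stem},
};

/* Per-dictionary state filled in by the init function. */
typedef struct dict_state
{
    const stemmer_entry *stemmer;
} dict_state;

/* Select a stemmer by name; returns 0 on success, -1 if unknown. */
int
sketch_init(dict_state *state, const char *language)
{
    size_t      i;

    for (i = 0; i < sizeof(stemmer_table) / sizeof(stemmer_table[0]); i++)
    {
        if (strcmp(stemmer_table[i].name, language) == 0)
        {
            state->stemmer = &stemmer_table[i];
            return 0;
        }
    }
    state->stemmer = NULL;
    return -1;
}

/* Stem one word using whatever stemmer init selected. */
int
sketch_lexize(const dict_state *state, const char *word,
              char *out, size_t outlen)
{
    if (state->stemmer == NULL)
        return -1;
    return state->stemmer->stem(word, out, outlen);
}
```

In this arrangement, adding a stemmer means adding one row to the table;
the set of exported symbols does not grow.
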
Attached is a rough proof-of-concept patch for this. It doesn't do
anything useful, but it does prove that we can compile and link the
Snowball stemmers into a Postgres loadable module with only trivial
changes to their source code. The code compiles cleanly (zero warnings
in gcc). The file layout is

src/backend/snowball/Makefile              our files
src/backend/snowball/README
src/backend/snowball/dict_snowball.c
src/backend/snowball/libstemmer/*.c        their .c files
src/include/snowball/header.h              intercepting .h file
src/include/snowball/libstemmer/*.h        their .h files

If there are no objections, I'll push forward with completing the
dictionary support functions to go with this infrastructure.
regards, tom lane