Re: How to drop all tokens that a snowball dictionary cannot stem?

From Jeff Janes
Subject Re: How to drop all tokens that a snowball dictionary cannot stem?
Date
Msg-id CAMkU=1xnBfTm3LFeXT2-EvUuOM=px_h7O9sE1cjYm6CUetoKjw@mail.gmail.com
In reply to Re: How to drop all tokens that a snowball dictionary cannot stem?  (Christoph Gößmann <mail@goessmann.io>)
List pgsql-general
On Sat, Nov 23, 2019 at 10:42 AM Christoph Gößmann <mail@goessmann.io> wrote:
> Hi Jeff,
>
> You're right about that point. Let me redefine. I would like to drop all tokens which neither are the stemmed or unstemmed version of a known word. Would there be the possibility of putting a wordlist as a filter ahead of the stemming? Or do you know about a good English lexeme list that could be used to filter after stemming?

I think what you describe is the opposite of what snowball was designed to do.  You want an ispell-based dictionary instead.

PostgreSQL doesn't ship with real ispell dictionaries, so you have to retrieve the files yourself and install them into $SHAREDIR/tsearch_data, as described in the docs: https://www.postgresql.org/docs/12/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY
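
As a rough, untested sketch of how that could be wired up: the file names en_us.dict and en_us.affix, the dictionary name english_ispell, and the configuration name english_known are all assumptions for illustration, and the files must already be in $SHAREDIR/tsearch_data. The idea is to map the word token types to the ispell dictionary alone, with no snowball fallback, so that any token the dictionary does not recognize is discarded.

```sql
-- Assumes en_us.dict and en_us.affix (hypothetical names) have been
-- copied into $SHAREDIR/tsearch_data beforehand.
CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE  = ispell,
    DictFile  = en_us,
    AffFile   = en_us,
    StopWords = english
);

-- Start from the built-in English configuration.
CREATE TEXT SEARCH CONFIGURATION english_known (COPY = pg_catalog.english);

-- Use only the ispell dictionary for word-like tokens. Because no
-- snowball dictionary follows it, unrecognized tokens are dropped.
ALTER TEXT SEARCH CONFIGURATION english_known
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH english_ispell;

-- Inspect how each token is handled by the new configuration.
SELECT * FROM ts_debug('english_known', 'The foxes jumped over xyzzy');
```

How many words survive depends entirely on the coverage of the dictionary files you install, so it is worth checking a sample of your own text with ts_debug before relying on it.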

Cheers,

Jeff 
