Re: BUG #15689: Stemming of negation/not operator

From: Tom Lane
Subject: Re: BUG #15689: Stemming of negation/not operator
Msg-id: 16223.1552430042@sss.pgh.pa.us
In reply to: BUG #15689: Stemming of negation/not operator  (PG Bug reporting form <noreply@postgresql.org>)
Responses: Re: BUG #15689: Stemming of negation/not operator
List: pgsql-bugs

PG Bug reporting form <noreply@postgresql.org> writes:
> When using to_tsquery function it is stemming negation/not parts of the
> query, where it probably shouldn't.
> Some examples:

> SELECT to_tsquery('english', 'car & !cars');
>    to_tsquery   
> ----------------
>  'car' & !'car'

I'm not exactly convinced by this argument, because it seems like
you're only thinking about a corner case.  There are probably at
least as many examples where you *do* want stemming on a negated term.

Another issue is that even if we changed the tsquery input function
to not stem particular words, I doubt that it would do anything useful,
because what it will be comparing to is tsvector entries that have
certainly been stemmed.  That is, even if the original document said
"cars", what's going to be in the tsvector is just "car", so that
forbidding a match to "cars" isn't going to do anything.  (Maybe
what this says is that there should be a less-lossy recheck against
the original document after the tsvector match, but that'd have to
be done by an additional, explicit operator I think.  Or possibly
the recheck just requires tsquery match with a different stemming
configuration.)
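
The point about stemmed tsvector entries can be sketched as follows. This is
only an illustration, assuming the built-in 'english' and 'simple' text search
configurations; the sample documents are made up, and the last query is just
one possible reading of the "recheck with a different stemming configuration"
idea, not an existing feature.

```sql
-- The english configuration stores only the stem, so "cars" is gone.
SELECT to_tsvector('english', 'cars');                    -- 'car':1

-- An unstemmed query lexeme therefore cannot match the stemmed tsvector.
SELECT to_tsvector('english', 'cars') @@ 'cars'::tsquery; -- f

-- Hypothetical recheck: match with stemming first, then exclude documents
-- that contain the literal word "cars" by re-parsing them with the
-- non-stemming simple configuration.
SELECT doc
FROM (VALUES ('a car'), ('two cars')) AS t(doc)
WHERE to_tsvector('english', doc) @@ to_tsquery('english', 'car')
  AND NOT (to_tsvector('simple', doc) @@ to_tsquery('simple', 'cars'));
-- returns only 'a car'
```

Note that the recheck re-parses the original document text, so it cannot be
answered from an index on the english tsvector alone.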

A related problem that's bothered me for some time is that lexemes
get stemmed even if there is a "*" (prefix match) marker on them,
causing them to possibly match much more than the user expected.
But again, it's not real obvious how to make that better given the
match-to-tsvector context --- not stemming could easily remove
desired matches to stemmed tsvector entries.
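
The prefix-match behavior can be seen with a short example, assuming the
built-in 'english' configuration (the word pair is the one used in the
PostgreSQL documentation for tsquery prefix matching):

```sql
-- The word is stemmed before the prefix marker is applied.
SELECT to_tsquery('english', 'postgres:*');               -- 'postgr':*

-- The shortened prefix now matches the stem of an unrelated word.
SELECT to_tsvector('english', 'postgraduate')
       @@ to_tsquery('english', 'postgres:*');            -- t
```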

If we could think of a way for it to do something useful, my inclination
would be to allow an explicit "don't stem" marker on lexemes, rather
than trying to drive it off whether the context is a negation or not.

            regards, tom lane

