Contractions in full text search result in very surprising stemming

From: Sam Saffron
Msg-id: CAAtdryOnYDJz7C8PLmYxGj8GGU=CTTVsxRF+5ys7XZWcTkHp=Q@mail.gmail.com
List: pgsql-hackers
Per:

```
select ts_debug('english', 'you''re a star');
                               ts_debug
-----------------------------------------------------------------------
 (asciiword,"Word, all ASCII",you,{english_stem},english_stem,{})
 (blank,"Space symbols",',{},,)
 (asciiword,"Word, all ASCII",re,{english_stem},english_stem,{re})
 (blank,"Space symbols"," ",{},,)
 (asciiword,"Word, all ASCII",a,{english_stem},english_stem,{})
 (blank,"Space symbols"," ",{},,)
 (asciiword,"Word, all ASCII",star,{english_stem},english_stem,{star})
(7 rows)
```

And:

https://snowballstem.org/demo.html
https://snowballstem.org/texts/apostrophe.html

The Snowball stemmer has special handling for contractions built in,
but out of the box it never gets access to the data: as the output
above shows, the text is split at the apostrophe before the stemmer
runs.

That means a word such as `you're` is reduced to the lexeme `re`, and
prefix matches end up hitting lots of surprising words.
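To make the effect on prefix matching concrete, here is a small sketch against the stock `english` configuration. The sample sentences are illustrative, and the `replace()` step at the end is only one possible workaround, assumed here for illustration rather than taken from the Snowball or PostgreSQL documentation.

```sql
-- "you're" is split at the apostrophe; per the ts_debug output above,
-- "you" and "a" yield no lexemes, so only "re" and "star" are indexed.
SELECT to_tsvector('english', 'you''re a star');

-- A prefix query on "re" therefore matches the sentence above...
SELECT to_tsvector('english', 'you''re a star')
       @@ to_tsquery('english', 're:*') AS matches_contraction;

-- ...and the same prefix query also matches unrelated words such as "reply".
SELECT to_tsvector('english', 'please reply soon')
       @@ to_tsquery('english', 're:*') AS matches_reply;

-- Hypothetical workaround (an assumption, not from the original post):
-- strip ASCII apostrophes first so the parser emits one token and no
-- stray "re" lexeme is produced.
SELECT to_tsvector('english', replace('you''re a star', '''', ''));
```

If something like this were used, the same normalization would have to be applied to query text as well, and typographic apostrophes would need separate handling.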

I know this is a big can of worms, and unlikely to be easy to resolve.
The latest changes to `to_tsquery` (replacing & with <->) are already
a bitter enough pill for lots to swallow, and another breaking change
is not something many desire. However, it feels like an oversight (at
least documentation-wise). Perhaps a good starting point might be to
clearly document the issue and a workaround?


