Re: Wrong results with equality search using trigram index and non-deterministic collation
| От | David Geier |
|---|---|
| Тема | Re: Wrong results with equality search using trigram index and non-deterministic collation |
| Дата | |
| Msg-id | cad6d0fa-b81a-4b43-803c-bf1dbbc6385a@gmail.com обсуждение |
| Ответ на | Wrong results with equality search using trigram index and non-deterministic collation (Laurenz Albe <laurenz.albe@cybertec.at>) |
| Ответы |
Re: Wrong results with equality search using trigram index and non-deterministic collation
|
| Список | pgsql-hackers |
On 17.09.2024 08:00, Laurenz Albe wrote: > Using a trigram index with an non-deterministic collation can > lead to wrong query results: > > CREATE COLLATION faux_cn (PROVIDER = icu, LOCALE = 'und', DETERMINISTIC = FALSE, RULES = '&l = r'); > > CREATE TABLE boom (id integer PRIMARY KEY, t text COLLATE faux_cn); > > INSERT INTO boom VALUES (1, 'right'), (2, 'light'); > > SELECT * FROM boom WHERE t = 'right'; > > id │ t > ════╪═══════ > 1 │ right > 2 │ light > (2 rows) > > CREATE INDEX ON boom USING gin (t gin_trgm_ops); > > SET enable_seqscan = off; > > SELECT * FROM boom WHERE t = 'right'; > > id │ t > ════╪═══════ > 1 │ right > (1 row) > > I also see questionable results with the similarity operator (with and > without the index): > > SELECT * FROM boom WHERE t % 'rigor'; > > id │ t > ════╪═══════ > 1 │ right > (1 row) > > But here you could argue that the operator ignores the collation, so > the result is correct. With equality, there is no such loophole. I think we should change that. It's very counter intuitive that a query can change behavior when the planner flips from using e.g. a Seq Scan to a Bitmap Index Scan or the other way around. There's already a patch for that, see [1]. > I don't know what the correct fix would be. Perhaps just refusing to use > the index for equality comparisons with non-deterministic collations. If we merge [1], then not only = but also LIKE would be incorrect. How about disabling CREATE INDEX USING gin on columns with non-deterministic collations? Or is there maybe a way to make these cases work correctly for non-deterministic collations by applying the collation when extracting the search trigrams? I take a look into that. [1] https://www.postgresql.org/message-id/flat/db087c3e-230e-4119-8a03-8b5d74956bc2%40gmail.com -- David Geier
В списке pgsql-hackers по дате отправления: