Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows
From | Jehan-Guillaume de Rorthais |
---|---|
Subject | Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows |
Date | |
Msg-id | 20200612184055.205f0159@firost |
In reply to | Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>) |
Responses | Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows |
List | pgsql-bugs |
On Wed, 10 Jun 2020 00:29:33 +0200
Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:

[...]
> After playing with ICU regression tests, I found functions ucol_strcollIter
> and ucol_nextSortKeyPart are safe. I'll do some performance tests and report
> here.

I did some benchmarks. See the attachment for the script and its header to reproduce. It sorts 935895 French phrases of 0 to 122 chars, with an average of 49.

Performance tests were done on current master HEAD (buggy) and with the attached patch, which relies on ucol_strcollIter.

My preliminary test with ucol_getSortKey was catastrophic, as we might expect: 15-17x slower than the current HEAD. So I removed it from the actual tests. I didn't try ucol_nextSortKeyPart though.

Using ucol_strcollIter performs ~20% slower than HEAD on UTF8 databases, but this might be acceptable. Here are the numbers:

  DB Encoding   HEAD   strcollIter   ratio
  UTF8          2.74   3.27          1.19x
  LATIN1        5.34   5.40          1.01x

I plan to add a regression test soon.

> In the meantime, I've been working on various workarounds. The only one I
> found is to use "fr-u-kr-latn-digit-kn" instead of "fr-u-kr-latn-digit".
> Unfortunately, the two collations are not equivalent, but I believe it might
> be useful in many case.
>
> I've been working on a second workaround: creating a type (a char variant for
> our usecase), its operators and opfamily. All operators and function 1 relies
> on ucol_getSortKey. Most of the workaround works good but surprisingly, the
> sort order is only enforced if the field is in the first position:
>
> * this works: "SORT BY f1 COLLATE digitslast"
> * this fails: "SORT BY f2, f1 COLLATE digitslast"

I fixed this: I had not declared my opclass as the default for the type I created.

I'm not sure whether people would like to see or discuss this user workaround here?

Regards,
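
Note on why "fr-u-kr-latn-digit" and "fr-u-kr-latn-digit-kn" are not equivalent: "kr-latn-digit" reorders Latin letters before digits, while "kn" additionally makes runs of digits compare by their numeric value. The sketch below is only a toy approximation in plain Python for ASCII input. It does not call ICU, and the function names digits_last_key and digits_last_numeric_key are made up for this illustration.

```python
import re

ASCII_DIGITS = "0123456789"


def digits_last_key(s):
    """Toy stand-in for "kr-latn-digit": letters sort before digits."""
    # Each character becomes (group, char); group 0 (non-digit) sorts
    # before group 1 (digit). Digits still compare one character at a time.
    return [(1, ch) if ch in ASCII_DIGITS else (0, ch) for ch in s.lower()]


def digits_last_numeric_key(s):
    """Toy stand-in for adding "kn": digit runs compare by numeric value."""
    key = []
    # Split into runs of ASCII digits and single non-digit characters.
    for run in re.findall(r"[0-9]+|[^0-9]", s.lower()):
        if run[0] in ASCII_DIGITS:
            key.append((1, int(run), ""))   # whole run compared as a number
        else:
            key.append((0, 0, run))         # letters still sort first
    return key


words = ["a10", "a9", "ab", "a"]

# Without "kn": "a10" sorts before "a9" because '1' < '9'.
print(sorted(words, key=digits_last_key))          # ['a', 'ab', 'a10', 'a9']

# With "kn": "a9" sorts before "a10" because 9 < 10.
print(sorted(words, key=digits_last_numeric_key))  # ['a', 'ab', 'a9', 'a10']
```

The two orderings only differ when values contain digit runs of different lengths or magnitudes, which is presumably why the "kn" variant can still be useful as a workaround in many cases.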