Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows
From | Jehan-Guillaume de Rorthais |
---|---|
Subject | Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows |
Date | |
Msg-id | 20200610002933.6a6d482b@firost |
In response to | Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses | Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows; Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows |
List | pgsql-bugs |
Hello,

I did not find any other discussion related to this bug, on either pgsql-bugs
or pgsql-hackers. Hopefully, this is the best thread to give an update.

On Sat, 21 Jul 2018 13:39:12 +1200
Thomas Munro <thomas.munro@enterprisedb.com> wrote:

> On Fri, Jul 20, 2018 at 11:26 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> > On Thu, Jul 19, 2018 at 9:44 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> >> It appears that the main support function 1 routine disagrees with the
> >> CREATE INDEX sort order, which is wrong. I'll try to isolate the
> >> problem a bit further.
> >
> > As far as I can tell, this is an ICU bug. ucol_strcollUTF8() is buggy
> > with this digitslast collation, which ucol_nextSortKeyPart() fails to
> > be bug-compatible with. Other similar customized collations (e.g.
> > 'en-u-kf-upper') work fine. (Ugh, that's familiar in an unpleasant
> > way.)
> >
> > I'm using libicu60. What version are you using, Roman?
> >
> > I tried to find something that matches this on the ICU bug tracker.
> > This might be a match: https://ssl.icu-project.org/trac/ticket/12518
>
> FWIW I see the same result with icu 61.1 and 62.1_1 from FreeBSD ports.

Some colleagues hit this bug as well last week and reported it to me.

I can reproduce this bug with the current ICU master branch (post-67.1). I
wrote a regression test for icu4c and posted it on ICU-12518. See:

  https://unicode-org.atlassian.net/browse/ICU-12518

As Peter wrote, the ucol_strcollUTF8 (and ucol_strcoll) functions are
affected. A quick and dirty patch replacing ucol_strcoll* with
ucol_getSortKey/strcmp everywhere fixed the bug in my tests. After playing
with the ICU regression tests, I found that the functions ucol_strcollIter
and ucol_nextSortKeyPart are safe. I'll do some performance tests and report
here.

In the meantime, I've been working on various workarounds. The only one I
found is to use "fr-u-kr-latn-digit-kn" instead of "fr-u-kr-latn-digit".
Unfortunately, the two collations are not equivalent, but I believe it might
be useful in many cases.

I've been working on a second workaround: creating a type (a char variant for
our use case) with its own operators and opfamily. All operators and support
function 1 rely on ucol_getSortKey. Most of the workaround works well but,
surprisingly, the sort order is only enforced if the field is in the first
position:

* this works: "ORDER BY f1 COLLATE digitslast"
* this fails: "ORDER BY f2, f1 COLLATE digitslast"

I haven't had time to investigate this last point further.

Regards,
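
To illustrate why going through sort keys sidesteps this class of bug, here is
a minimal sketch in Python. It does not use ICU at all: sort_key() is a
made-up stand-in for ucol_getSortKey with a toy "letters before digits" rule
(only loosely resembling "kr-latn-digit"), and compare() plays the role of
support function 1. Since both the sort and the comparison derive from the
same byte string, they cannot disagree with each other.

```python
from functools import cmp_to_key


def sort_key(text: str) -> bytes:
    """Hypothetical stand-in for ucol_getSortKey (toy rule, NOT ICU).

    Each character becomes a group byte followed by its code point,
    so that a plain bytewise comparison puts letters before digits.
    """
    key = bytearray()
    for ch in text:
        if ch.isdecimal():
            group = 3  # digits sort last
        elif ch.isalpha():
            group = 2  # letters sort before digits
        else:
            group = 1  # everything else sorts first
        key.append(group)
        key += ord(ch).to_bytes(3, "big")
    return bytes(key)


def compare(a: str, b: str) -> int:
    """Comparison built only on sort keys (the getSortKey/strcmp idea)."""
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)


values = ["a1", "1a", "ab", "12", "b"]

# "Index build" order (sort by key) and "lookup" order (pairwise compare).
by_key = sorted(values, key=sort_key)
by_cmp = sorted(values, key=cmp_to_key(compare))

print(by_key)
print(by_key == by_cmp)
```

The real fix would of course call ICU; the sketch only shows the design
choice of deriving every comparison from one key function.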