Re: BUG #15892: URGENT: Using an ICU collation in a primary keycolumn breaks ILIKE query

Поиск
Список
Период
Сортировка
От James Inform
Тема Re: BUG #15892: URGENT: Using an ICU collation in a primary keycolumn breaks ILIKE query
Дата
Msg-id 1820097204.279159.1562258411044@email.ionos.de
обсуждение исходный текст
Ответ на Re: BUG #15892: URGENT: Using an ICU collation in a primary keycolumn breaks ILIKE query  ("Daniel Verite" <daniel@manitou-mail.org>)
Список pgsql-bugs
The weird thing is, that this behavior only occures when used on a field field a primary key:

If you do:

create database testdb;

\c testdb;

-- Just create a simple table with one column
create table icutest(data text not null collate "de-x-icu" primary key, data2 text collate "de-x-icu");

-- Insert a record with uppercase string
insert into icutest values ('MYTEST','MYTEST');

-- This is not giving a match
select * from icutest where data ilike 'mytest';

-- BUT THIS GIVES A MATCH:

select * from icutest where data2 ilike 'mytest';

-- So it seems to be especially related to the scenario where a primary key / index exists.


> On July 4, 2019 at 12:36 PM Daniel Verite <daniel@manitou-mail.org> wrote:
>
>
>  PG Bug reporting form wrote:
> > -- Just create a simple table with one column
> > create table icutest(data text not null collate "de-x-icu" primary key);
> >
> > -- Insert a record with uppercase string
> > insert into icutest values ('MYTEST');
> >
> > -- This is not giving a match
> > select * from icutest where data ilike 'mytest';
>
> This also happens on v10 and on the master branch.
>
> The bug seems to come from a mistake in like_support.c:
>
>
> /* * Check whether char is a letter (and, hence, subject to case-folding)
>  * * In multibyte character sets or with ICU, we can't use isalpha, and it does * not seem worth trying to convert to
wchar_tto use iswalpha. Instead, 
> just * assume any multibyte char is potentially case-varying.
>  */
> static int
> pattern_char_isalpha(char c, bool is_multibyte,
>  pg_locale_t locale, bool locale_is_c)
> {
>  if (locale_is_c)
>  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
>  else if (is_multibyte && IS_HIGHBIT_SET(c))
>  return true;
>  else if (locale && locale->provider == COLLPROVIDER_ICU)
>  return IS_HIGHBIT_SET(c) ? true : false;
>
>
> With an ICU locale, this returns false for all characters in 'mytest'.
>
> I think this eventually leads the caller to incorrectly believe that it
> can optimize the test into an exact match (data='mytest'), given
> there are otherwise no wildcards in the pattern.
>
> On fixing the bug, if we make this function returns true for all
> characters under an ICU locale, it appears to work, but we're loosing an
> opportunity to optimize for some patterns.
> If OTOH we wanted to use an ICU call like u_isalpha(), to be closer
> to what's done with libc, we'd need to pass a UChar32 argument,
> not a char, and since we're in a char-oriented context, I don't see how
> to do that.
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite



В списке pgsql-bugs по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: ALTER TABLE results in "ERROR: could not open relation with OID 43707388"
Следующее
От: Manuel Rigger
Дата:
Сообщение: Re: ALTER TABLE results in "ERROR: could not open relation with OID 43707388"