Re: BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query

From Daniel Verite
Subject Re: BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query
Date
Msg-id c02df3ae-ef62-41a7-bcda-3ac6da8c5f30@manitou-mail.org
In reply to BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query  (James Inform <james.inform@pharmapp.de>)
List pgsql-bugs
PG Bug reporting form wrote:

> -- Just create a simple table with one column
> create table icutest(data text not null collate "de-x-icu" primary key);
>
> -- Insert a record with uppercase string
> insert into icutest values ('MYTEST');
>
> -- This is not giving a match
> select * from icutest where data ilike 'mytest';

This also happens on v10 and on the master branch.

The bug seems to come from a mistake in like_support.c:


/*
 * Check whether char is a letter (and, hence, subject to case-folding)
 *
 * In multibyte character sets or with ICU, we can't use isalpha, and it does
 * not seem worth trying to convert to wchar_t to use iswalpha.  Instead, just
 * assume any multibyte char is potentially case-varying.
 */
static int
pattern_char_isalpha(char c, bool is_multibyte,
                     pg_locale_t locale, bool locale_is_c)
{
    if (locale_is_c)
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    else if (is_multibyte && IS_HIGHBIT_SET(c))
        return true;
    else if (locale && locale->provider == COLLPROVIDER_ICU)
        return IS_HIGHBIT_SET(c) ? true : false;


With an ICU locale, this returns false for every character in 'mytest',
since they are all ASCII and none of them has the high bit set.
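
To make this concrete, here is a small standalone model of the branches
quoted above. It is only a sketch: model_locale, PROVIDER_ICU,
model_char_isalpha and count_case_varying are hypothetical stand-ins (not
PostgreSQL names), and the final isalpha() fallback is an assumption, since
the excerpt stops before the end of the function.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Same idea as PostgreSQL's macro: is the top bit of the byte set? */
#define IS_HIGHBIT_SET(ch) ((unsigned char) (ch) & 0x80)

/* Hypothetical stand-ins for COLLPROVIDER_* and pg_locale_t. */
typedef enum { PROVIDER_LIBC, PROVIDER_ICU } model_provider;
typedef struct { model_provider provider; } model_locale;

/* Model of the quoted branches; the last line is an assumed libc fallback. */
static bool
model_char_isalpha(char c, bool is_multibyte,
                   const model_locale *locale, bool locale_is_c)
{
    if (locale_is_c)
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    else if (is_multibyte && IS_HIGHBIT_SET(c))
        return true;
    else if (locale && locale->provider == PROVIDER_ICU)
        return IS_HIGHBIT_SET(c) ? true : false;
    return isalpha((unsigned char) c) != 0;
}

/* Count how many bytes of s the model treats as case-varying. */
static size_t
count_case_varying(const char *s, const model_locale *locale)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s; s++)
        if (model_char_isalpha(*s, true, locale, false))
            n++;
    return n;
}
```

In this model, count_case_varying("mytest", ...) is 0 with the ICU provider,
whereas the assumed libc path counts all six letters.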

I think this eventually leads the caller to incorrectly believe that it
can optimize the test into an exact match (data='mytest'), given
there are otherwise no wildcards in the pattern.
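
A simplified model of that suspected caller behaviour, to show why an
all-false predicate ends in an equality test. This is a sketch under
assumptions, not the code in like_support.c: prefix_status,
classify_ilike_pattern and the two predicates are hypothetical names, and
backslash escapes are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { PREFIX_NONE, PREFIX_PARTIAL, PREFIX_EXACT } prefix_status;

/* Predicate signature: does this byte count as case-varying? */
typedef bool (*isalpha_fn)(char c);

/* Behaviour described above for ICU: only high-bit bytes are "letters". */
static bool
icu_highbit_only(char c)
{
    return ((unsigned char) c & 0x80) != 0;
}

/* Plain ASCII letter check, as a point of comparison. */
static bool
ascii_letters(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/*
 * Scan an ILIKE pattern and report how much of it is a fixed prefix.
 * Escapes are ignored to keep the sketch short.
 */
static prefix_status
classify_ilike_pattern(const char *patt, isalpha_fn is_case_varying)
{
    size_t len, pos;

    assert(patt != NULL);
    len = strlen(patt);
    for (pos = 0; pos < len; pos++)
    {
        if (patt[pos] == '%' || patt[pos] == '_')
            break;              /* wildcard ends the fixed prefix */
        if (is_case_varying(patt[pos]))
            break;              /* a letter acts like a wildcard for ILIKE */
    }
    if (pos == len)
        return PREFIX_EXACT;    /* whole pattern consumed: plain equality */
    return pos > 0 ? PREFIX_PARTIAL : PREFIX_NONE;
}
```

With icu_highbit_only, the pattern 'mytest' is classified as PREFIX_EXACT,
which corresponds to data = 'mytest' and so misses the row 'MYTEST'. With
ascii_letters the scan stops at the first letter and no such shortcut is
taken.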

As for fixing the bug: if we make this function return true for all
characters under an ICU locale, it appears to work, but we're losing an
opportunity to optimize for some patterns.
If OTOH we wanted to use an ICU call like u_isalpha(), to be closer
to what's done with libc, we'd need to pass a UChar32 argument,
not a char, and since we're in a char-oriented context, I don't see how
to do that.
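
The trade-off of the first option can be illustrated with a toy prefix
scanner. Again this is only a sketch with hypothetical names
(fixed_prefix_len, always_case_varying, ascii_letters_only), and the ASCII
predicate is an idealisation for comparison, not a proposal for ICU data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*isalpha_fn)(char c);

/* Candidate fix: treat every byte as potentially case-varying. */
static bool
always_case_varying(char c)
{
    (void) c;
    return true;
}

/* Idealised predicate that is only valid for ASCII input. */
static bool
ascii_letters_only(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Length of the leading part of an ILIKE pattern usable as a fixed prefix. */
static size_t
fixed_prefix_len(const char *patt, isalpha_fn is_case_varying)
{
    size_t pos = 0;

    assert(patt != NULL);
    while (patt[pos] != '\0' &&
           patt[pos] != '%' && patt[pos] != '_' &&
           !is_case_varying(patt[pos]))
        pos++;
    return pos;
}
```

With always_case_varying, 'mytest' no longer yields any fixed prefix, so the
wrong equality shortcut goes away. A pattern such as '2019-07%' also yields
no prefix, although its seven leading characters cannot vary in case; that
is the lost optimization.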


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


