Sigh, LIKE indexing is *still* broken in foreign locales

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Sigh, LIKE indexing is *still* broken in foreign locales
Дата
Msg-id 14045.960430926@sss.pgh.pa.us
обсуждение исходный текст
Ответ на LIKE bug  (Moucha Václav <MouchaV@Radiomobil.cz>)
Ответы Re: [HACKERS] Sigh, LIKE indexing is *still* broken in foreign locales  (Giles Lean <giles@nemeton.com.au>)
Re: [HACKERS] Sigh, LIKE indexing is *still* broken in foreign locales  (Peter Eisentraut <peter_e@gmx.net>)
Список pgsql-bugs
Moucha Václav <MouchaV@Radiomobil.cz> writes:
> 1. Compilation
>    ./configure --enable-locale    # not needed for RPMS precompiled binaries 

> 2. Starting postmaster
>    export LC_CTYPE=cs_CZ
>    export LC_COLLATE=cs_CZ        # this setting is important for the
> bug result
>    postmaster -S -D /home/pgsql/data -o '-Fe'    

> 3. SQL steps
>    create table test (name text);
>    insert into test values ('�');    # the first char is E1 from LATIN 2
> coding
>    insert into test values ('�b');
>    create index test_index on test (name);
>    set cpu_tuple_cost=1;        # force backend to use index
> scanning
>    select * from test where name like '�%';

> BUG: Only 1 line is selected with '�' only instead of both lines.

The problem here is that given the search pattern '\341%', the planner
generates index limit conditionsname >= '\341' AND name < '\342';

Apparently, in CZ locale it is true that '\341' is less than '\342',
but it does not follow from that that all strings starting with '\341'
are less than '\342'.  In fact '\341b' is considered greater than '\342'.

Since '\341' and '\342' are two different accented forms of 'a'
(if I'm looking at the right character set), this is perhaps not so
improbable as all that.  Evidently the collation rule is that different
accent forms sort the same unless the strings would otherwise be
considered equal, in which case an ordering is assigned to them.

So, the rule we thought we had for generating index bounds falls flat,
and we're back to the same old question: given a proposed prefix string,
how can we generate bounds that are certain to be considered <= and >=
all strings starting with that prefix?

I am now thinking that maybe we should search for a string that compares
greater than "fooz" when the prefix is "foo" --- that is, append a 'z'
to the prefix string.  But I wouldn't be surprised if that fails too
in some locales.

I'm also wondering if the left-hand inequality ('foo' <= any string
beginning with 'foo') might fail in some locales ... we haven't seen
it reported but who knows ...
        regards, tom lane


В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Bug?
Следующее
От: Giles Lean
Дата:
Сообщение: Re: [HACKERS] Sigh, LIKE indexing is *still* broken in foreign locales