Re: Sigh, LIKE indexing is *still* broken in foreign locales

Поиск
Список
Период
Сортировка
От Erich Stamberger
Тема Re: Sigh, LIKE indexing is *still* broken in foreign locales
Дата
Msg-id Pine.LNX.4.21.0006090101230.6349-100000@gewi.kfunigraz.ac.at
обсуждение исходный текст
Ответ на Sigh, LIKE indexing is *still* broken in foreign locales  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Sigh, LIKE indexing is *still* broken in foreign locales  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Wed, 7 Jun 2000, Tom Lane wrote:

> Moucha Václav <MouchaV@Radiomobil.cz> writes:
> > 1. Compilation
> >    ./configure --enable-locale    # not needed for RPMS precompiled binaries 
> 
> > 2. Starting postmaster
> >    export LC_CTYPE=cs_CZ
> >    export LC_COLLATE=cs_CZ        # this setting is important for the
> > bug result
> >    postmaster -S -D /home/pgsql/data -o '-Fe'    
> 
> > 3. SQL steps
> >    create table test (name text);
> >    insert into test values ('á');    # the first char is E1 from LATIN 2
> > coding
> >    insert into test values ('áb');
> >    create index test_index on test (name);
> >    set cpu_tuple_cost=1;        # force backend to use index
> > scanning
> >    select * from test where name like 'á%';
> 
> > BUG: Only 1 line is selected with 'á' only instead of both lines.
> 
> The problem here is that given the search pattern '\341%', the planner
> generates index limit conditions
>     name >= '\341' AND name < '\342';
> 
> Apparently, in CZ locale it is true that '\341' is less than '\342',
> but it does not follow from that that all strings starting with '\341'
> are less than '\342'.  In fact '\341b' is considered greater than '\342'.
> 

Hm. The character that follows 0xe1 in iso-8859-2 order is
"a + circumflex" (Oxe2) which is - as far as I know - not
part of the Czech alphapet. The successors of  0xe1 in
Czech collation order (code points from iso-8859-2)
are 0x042 (capital B) and 0x062 (small B).

=> name >= '0xe1' AND (name < '0x062' OR name < '0x042')
provided comparision is done by strcoll().

Another interresting feature of Czech collation is:

H < "CH" < I

and:

B < C < C + CARON < D .. < H < "CH" < I

So what happens with "WHERE name like 'Czec%`" ?

Regards
Erich




В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Hiroshi Inoue"
Дата:
Сообщение: RE: DROP COLUMN status
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Sigh, LIKE indexing is *still* broken in foreign locales