Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From Hannu Krosing
Subject Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
Date
Msg-id 375EB323.799FCD63@trust.ee
In reply to Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> 
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > This certainly explains it.  With locale enabled, LIKE does not use
> > indexes because we can't figure out how to do the indexing trick with
> > non-ASCII character sets because we can't figure out the maximum
> > character value for a particular encoding.
> 
> We don't actually need the *maximum* character value, what we need is
> to be able to generate a *slightly larger* character value.
> 
> For example, what the parser is doing now:
>         fld LIKE 'abc%' ==> fld <= 'abc\377'
> is not even really right in ASCII locale, because it will reject a
> data value like 'abc\377x'.
> 
> I think what we really want is to generate the "next value of the
> same length" and use a < comparison.  In ASCII locale this means
>         fld LIKE 'abc%' ==> fld < 'abd'
> which is reliable regardless of what comes after abc in the data.
> The trick is to figure out a "next" value without assuming a lot
> about the local character set and collation sequence.

in single-byte locales it should be easy:

1. sort a char[256] array from 0-255 using the current locale settings,
   do it once, either at startup or when first needed.

2. use binary search on that array to locate the last char before %
   in this sorted array:
   if (it is not the last char in sorted array)
   then (replace that char with the one at index+1)
   else (
     if (it is not the first char in like string)
     then (discard the last char and goto 2.)
     else (don't do the end restriction)
   )
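
The two steps above can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code: the names build_sorted_chars, like_upper_bound and the key parameter are hypothetical, the default key is plain code order (as in the "C" locale), and it assumes the locale compares strings character by character. It also skips past characters that collate equal to the last one, a small addition to the plain "index+1" step.

```python
from bisect import bisect_right


def build_sorted_chars(key=lambda c: c, alphabet=None):
    """Step 1: sort the single-byte characters once by collation order.

    `key` (hypothetical parameter) maps a character to its collation key;
    the default is plain code order, which is what the "C" locale uses.
    """
    if alphabet is None:
        alphabet = [chr(i) for i in range(1, 256)]  # NUL is skipped
    return sorted(alphabet, key=key)


def like_upper_bound(prefix, sorted_chars, key=lambda c: c):
    """Step 2: exclusive upper bound for `fld LIKE 'prefix%'`, or None."""
    keys = [key(c) for c in sorted_chars]
    while prefix:
        # Binary search for the first character sorting after the last one.
        idx = bisect_right(keys, key(prefix[-1]))
        if idx < len(sorted_chars):
            return prefix[:-1] + sorted_chars[idx]
        # Last character is already the largest: drop it and retry.
        prefix = prefix[:-1]
    return None  # no usable upper bound; keep only the lower bound


table = build_sorted_chars()
print(like_upper_bound("abc", table))  # abd
```

With the result 'abd' the condition becomes fld >= 'abc' AND fld < 'abd'. A result of None corresponds to the "don't do the end restriction" branch. A real locale could be tried by passing something like locale.strxfrm as the key, but whether per-character ordering then matches whole-string ordering depends on the locale.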
 

some locales where the string is already sorted may use special 
treatment (ASCII, CYRILLIC) 

> But I am worried whether this trick will work in multibyte locales ---
> incrementing the last byte might generate an invalid character sequence
> and produce unpredictable results from strcmp.  So we need some help
> from someone who knows a lot about collation orders and multibyte
> character representations.

for double-byte locales something similar should work, but getting
the initial array is probably tricky

----------------
Hannu

