Re: [HACKERS] indexable and locale

From: Tom Lane
Subject: Re: [HACKERS] indexable and locale
Date: 16 October 1999 16:33:01
Msg-id: 5492.940095061@sss.pgh.pa.us
In reply to: Re: [HACKERS] indexable and locale (Tatsuo Ishii <t-ishii@sra.co.jp>)
Replies: Re: [HACKERS] indexable and locale; Re: [HACKERS] indexable and locale
List: pgsql-hackers

Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> Attached is a patch for the old problem discussed so feverishly before 6.5.

> ... I think your patches break
> non-ASCII multibyte character set data, and should be surrounded by
> #ifdef LOCALE rather than replacing the current code surrounded by
> #ifndef LOCALE.

I am worried about this patch too.  Under MULTIBYTE could it
generate invalid characters?  Also, do all non-ASCII locales sort
codes 0-126 in the same order as ASCII?  I didn't think they do,
but I'm not an expert.

The approach I was considering for fixing the problem was to use a
loop that would repeatedly try to generate a string greater than the
prefix string.  The basic loop step would increment the rightmost
byte as Goran has done (or, if it's already up to the limit, chop
it off and increment the next character position).  Then test to
see whether the '<' operator actually believes the result is
greater than the given prefix, and repeat if not.  This avoids making
any strong assumptions about the sort order of different character
codes.  However, there are two significant issues that would have
to be surmounted to make it work reliably:

1. In MULTIBYTE mode incrementing the rightmost byte might yield
an illegal multibyte character.  Some way to prevent or detect this
would be needed, lest it confuse the comparison operator.  I think
we have some multibyte routines that could be used to check for
a valid result, but I haven't looked into it.

2. I think there are some locales out there that have context-
sensitive sorting rules, i.e., a given character string may sort
differently than you'd expect from considering the characters in
isolation.  For example, in German isn't "ss" treated specially?
If "pqrss" does not sort between "pqrs" and "pqrt" then the entire
premise of *both* sides of the LIKE optimization falls apart,
because you can't be sure what will happen when comparing a prefix
string like "pqrs" against longer strings from the database.
I do not know if this is really a problem, nor what we could do
to avoid it if it is.
        regards, tom lane
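The increment-and-test loop proposed in the message can be sketched as follows. This is a minimal illustration in Python, not PostgreSQL backend code: the name `find_greater_string` and its parameters are hypothetical, the collation is abstracted as a caller-supplied `less_than` function standing in for the '<' operator, strings are treated as raw bytes, and 0xFF is assumed to be the upper limit for a byte. The toy `case_insensitive` ordering is likewise an assumption, chosen only to show that a naive increment can land on a string that sorts lower.

```python
from typing import Callable, Optional


def find_greater_string(prefix: bytes,
                        less_than: Callable[[bytes, bytes], bool],
                        max_byte: int = 0xFF) -> Optional[bytes]:
    """Hypothetical helper: return a byte string that `less_than` considers
    greater than `prefix`, or None if this search finds none."""
    work = bytearray(prefix)
    while work:
        # Increment the rightmost byte until the comparison operator agrees
        # that the candidate sorts after the prefix.
        while work[-1] < max_byte:
            work[-1] += 1
            candidate = bytes(work)
            if less_than(prefix, candidate):
                return candidate
        # Rightmost byte is at the limit: chop it off and continue with the
        # next character position to the left.
        work.pop()
    return None


def bytewise(a: bytes, b: bytes) -> bool:
    # Plain byte-value ordering.
    return a < b


def case_insensitive(a: bytes, b: bytes) -> bool:
    # Toy collation (assumption): ASCII letters compare without regard to case.
    return a.lower() < b.lower()


if __name__ == "__main__":
    print(find_greater_string(b"abc", bytewise))             # b'abd'
    print(find_greater_string(b"ab\xff", bytewise))         # b'ac'
    print(find_greater_string(b"abZ", case_insensitive))    # b'ab{'
```

Under `case_insensitive`, simply incrementing `Z` would give `ab[`, which that ordering places before `abZ`; asking the comparison function and repeating is what steers the loop past it, without assuming anything about the order of individual byte codes.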
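For the first issue (an incremented byte producing an illegal multibyte character), one simple way to detect the problem is to try decoding each candidate and skip any that fail. The sketch below assumes UTF-8 as the multibyte encoding and uses Python's decoder as the validity check; it is not one of the backend's multibyte routines, and the name `next_valid_last_byte` is hypothetical.

```python
from typing import Optional


def next_valid_last_byte(s: bytes, encoding: str = "utf-8") -> Optional[bytes]:
    """Hypothetical helper: try successively larger values for the last byte
    of `s` and return the first candidate that still decodes cleanly in
    `encoding`, or None if every value is rejected."""
    if not s:
        return None
    head, last = s[:-1], s[-1]
    for b in range(last + 1, 256):
        candidate = head + bytes([b])
        try:
            candidate.decode(encoding)
        except UnicodeDecodeError:
            continue  # illegal multibyte sequence: skip this candidate
        return candidate
    return None


if __name__ == "__main__":
    print(next_valid_last_byte(b"\xc3\xa9"))  # b'\xc3\xaa'
    print(next_valid_last_byte(b"\xc3\xbf"))  # None
```

With `b"\xc3\xbf"` (U+00FF) every larger final byte yields an invalid UTF-8 sequence, so the check rejects them all; a loop built on this would presumably have to drop a whole character rather than a single byte before trying the next position.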
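The second issue can be made concrete. Roughly, the LIKE optimization turns a pattern such as `LIKE 'pqrs%'` into the range `>= 'pqrs' AND < 'pqrt'`, which is only safe if every string beginning with the prefix falls inside that range under the collation. The sketch below checks that premise for a few sample strings. `prefix_range_holds` is a hypothetical helper, and `contraction_key` is a purely hypothetical collation in which the pair "ss" sorts as one symbol after all ordinary letters; it illustrates a context-sensitive rule and does not model any real locale.

```python
from typing import Callable, Iterable


def prefix_range_holds(prefix: str, upper: str, strings: Iterable[str],
                       key: Callable[[str], str]) -> bool:
    """Return True if every string starting with `prefix` satisfies
    prefix <= s < upper when compared through the collation `key`."""
    lo, hi = key(prefix), key(upper)
    return all(lo <= key(s) < hi for s in strings if s.startswith(prefix))


def plain_key(s: str) -> str:
    # Code-point order with no context-sensitive rules.
    return s


def contraction_key(s: str) -> str:
    # Hypothetical context-sensitive rule: "ss" sorts as a single symbol
    # placed after every ordinary letter (U+FFFF stands in for that symbol).
    return s.replace("ss", "\uffff")


if __name__ == "__main__":
    samples = ["pqrs", "pqrsa", "pqrss"]
    print(prefix_range_holds("pqrs", "pqrt", samples, plain_key))        # True
    print(prefix_range_holds("pqrs", "pqrt", samples, contraction_key))  # False
```

Under `contraction_key`, "pqrss" sorts after "pqrt", so a scan bounded by that range would miss a string that does match the pattern, which is exactly the failure the message is worried about.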
