Re: multibyte-character aware support for function "downcase_truncate_identifier()"

From Tom Lane
Subject Re: multibyte-character aware support for function "downcase_truncate_identifier()"
Date
Msg-id 11120.1290532369@sss.pgh.pa.us
In response to Re: multibyte-character aware support for function "downcase_truncate_identifier()"  (Greg Stark <gsstark@mit.edu>)
Responses Re: multibyte-character aware support for function "downcase_truncate_identifier()"  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
Greg Stark <gsstark@mit.edu> writes:
> On Mon, Nov 22, 2010 at 12:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, that's why there's been no movement on this since 2004 :-(.  The
>> amount of work needed for a better solution seems far out of proportion
>> to the benefits.

> We could extend the existing logic to handle multi-bytes characters
> though, couldn't we? It's not going to fix all the problems but at
> least it'll do something sane.

Not easily, cheaply, or portably.  The closest you could get in that
line would be to use towlower(), which doesn't exist everywhere
(though I grant probably most platforms have it by now).  The much much
bigger problem though is that we don't know what character representation
towlower() deals in.  We recently kluged the regex code to assume that
the wchar_t representation for UTF8 locales is the standardized Unicode
code point.  I haven't heard of that breaking, but 9.0 hasn't been out
that long.  In other multibyte encodings we have no idea how to use that
function, short of invoking mbstowcs/wcstombs or local equivalent, which
is expensive and doesn't readily allow a short-circuit for ASCII.
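
For illustration only, the mbstowcs/towlower/wcstombs round trip described above might look roughly like the sketch below. This is not PostgreSQL code: the function name downcase_mb_sketch is hypothetical, and it assumes the input is in the encoding of the current LC_CTYPE locale. Note that every character, ASCII or not, goes through both conversions, with two extra passes to size the buffers and two allocations.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not part of PostgreSQL): lower-case a multibyte
 * string using the current LC_CTYPE locale.  Returns a malloc'd string
 * that the caller must free, or NULL if conversion or allocation fails.
 */
char *
downcase_mb_sketch(const char *src)
{
    size_t      nwide;
    size_t      nbytes;
    size_t      i;
    wchar_t    *wide;
    char       *result;

    /* First pass: count wide characters; fails on invalid input. */
    nwide = mbstowcs(NULL, src, 0);
    if (nwide == (size_t) -1)
        return NULL;

    wide = malloc((nwide + 1) * sizeof(wchar_t));
    if (wide == NULL)
        return NULL;

    /* Second pass: actually convert, including the terminator. */
    mbstowcs(wide, src, nwide + 1);

    /* Case-fold each wide character; the result depends on the locale. */
    for (i = 0; i < nwide; i++)
        wide[i] = (wchar_t) towlower((wint_t) wide[i]);

    /* Convert back, again sizing the buffer first. */
    nbytes = wcstombs(NULL, wide, 0);
    if (nbytes == (size_t) -1)
    {
        free(wide);
        return NULL;
    }

    result = malloc(nbytes + 1);
    if (result == NULL)
    {
        free(wide);
        return NULL;
    }
    wcstombs(result, wide, nbytes + 1);

    free(wide);
    return result;
}
```

The sketch sidesteps the question of what wchar_t values mean by converting in and out through the C library, which is exactly where the cost comes from.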

And, after you've hacked your way through all that, you still end up
with case-folding behavior that depends on the prevailing locale.
Which is dangerous for the previously cited reasons, and arguably not
spec-compliant.
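
A small sketch of the locale dependence, under stated assumptions: lower_in_locale is a hypothetical helper, and the locale names passed to it must be installed on the system for setlocale() to succeed. A commonly cited illustration is Turkish, where lower-casing 'I' typically yields dotless U+0131 rather than 'i'; whether that happens depends on the platform's locale data.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: lower-case one wide character under the named
 * LC_CTYPE locale, then restore the previous locale.  Returns WEOF if
 * the requested locale is not available.
 */
wint_t
lower_in_locale(const char *locale_name, wint_t wc)
{
    char        saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);
    wint_t      result;

    /* Copy the current locale name; setlocale may reuse its buffer. */
    if (current == NULL || strlen(current) >= sizeof(saved))
        return WEOF;
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return WEOF;            /* locale not installed */

    result = towlower(wc);

    setlocale(LC_CTYPE, saved);
    return result;
}
```

Calling lower_in_locale("C", L'I') and lower_in_locale("tr_TR.UTF-8", L'I') on a system with the Turkish locale installed is one way to see the same identifier character fold to two different results.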
        regards, tom lane

