Re: Mixing different LC_COLLATE and database encodings

From: Greg Stark
Subject: Re: Mixing different LC_COLLATE and database encodings
Msg-id: 8764nc13ls.fsf@stark.xeocode.com
In reply to: Re: Mixing different LC_COLLATE and database encodings (Bill Moseley <moseley@hank.org>)
Responses: Re: Mixing different LC_COLLATE and database encodings
List: pgsql-general
Bill Moseley <moseley@hank.org> writes:

>     $ LC_ALL=en_US.UTF-8 locale charmap
>     UTF-8
>
>     $ LC_ALL=en_US locale charmap
>     ISO-8859-1
>
>     $ LC_ALL=C locale charmap
>     ANSI_X3.4-1968

Unfortunately, Postgres only supports a single collation cluster-wide. So
depending on which of the collations above you use, you would really have to
select the matching encoding: UTF-8, ISO-8859-1, or SQL_ASCII (i.e.
ANSI_X3.4-1968). With anything else the collation just won't work properly.
The locale will, for example, be expecting UTF-8 and be fed ISO-8859-1
strings, resulting in weird and sometimes inconsistent sort orders.
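
To make the mismatch concrete, here is a minimal Python sketch (not Postgres
code; the helper name decodes_as is made up for illustration). It only shows
that the same word is a different byte sequence in ISO-8859-1 and UTF-8, and
that the ISO-8859-1 bytes are not even valid UTF-8, which is the kind of input
a UTF-8 collation would be handed in a mismatched setup.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


word = "café"
latin1_bytes = word.encode("iso-8859-1")  # é is the single byte 0xE9
utf8_bytes = word.encode("utf-8")         # é is the two bytes 0xC3 0xA9

print(decodes_as(utf8_bytes, "utf-8"))    # the UTF-8 bytes are valid UTF-8
print(decodes_as(latin1_bytes, "utf-8"))  # 0xE9 opens a sequence that never completes
```

What a given libc does with such invalid input inside its collation routines
is not specified here; the point is only that the bytes no longer mean what
the locale assumes they mean.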

There's a certain amount of feeling that using any locale other than C is
probably not ever the right thing given the current functionality. Just about
any database has some strings in it that are really just ASCII strings, like
char(1) primary keys and other internal database strings. You may not want
them being subject to the locale's collation for comparison purposes, and you
may not want the overhead of variable-width character encodings.

Those of us in this camp are defining all our databases using C locale and
then using the pg_strxfrm() function that's been floating around the list for
a while to handle sorting strings that need to be sorted in various locales.
This performs acceptably (but not spectacularly) under glibc, but it's not
clear which other libc implementations it works well under.
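
The general idea behind that approach can be sketched outside the database.
The following is a Python analogue built on the standard locale module, which
wraps the C library's strxfrm(); it is not the pg_strxfrm() function itself,
whose signature isn't given in this message, and sort_in_locale is a
hypothetical helper name. It assumes the requested locale is installed on the
machine and raises locale.Error otherwise.

```python
import locale


def sort_in_locale(strings, locale_name):
    """Return `strings` sorted by the collation rules of `locale_name`.

    Raises locale.Error if that locale is not installed on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)    # query the current setting
    locale.setlocale(locale.LC_COLLATE, locale_name)  # may raise locale.Error
    try:
        # strxfrm() turns each string into a key whose plain comparison
        # matches the locale's collation order.
        return sorted(strings, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore the caller's setting


words = ["b", "a", "B"]
print(sort_in_locale(words, "C"))  # C locale: plain code point order
```

Calling the same helper with a name such as "en_US.UTF-8" would, where that
locale exists, sort by that locale's rules instead; the result depends on the
libc in use. Note that setlocale() changes process-wide state, so this sketch
is not safe to call from several threads at once.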

It also doesn't solve the whole problem, since functions like substr() or LIKE
are locale sensitive too. If you need an encoding like UTF-8, then you're
stuck either pushing all your string manipulations into the client or going
ahead with a non-C locale and UTF-8 even for the strings that are really just
ASCII strings.
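
As a rough illustration of why substring-style operations care about how the
text is encoded, this small Python sketch compares taking a prefix by
characters with taking it by bytes on UTF-8 data. It says nothing about how
Postgres implements substr(); it only shows what goes wrong when multibyte
text is treated as a plain byte string.

```python
text = "naïve"
encoded = text.encode("utf-8")  # ï occupies two bytes: 0xC3 0xAF

char_prefix = text[:3]     # first three characters
byte_prefix = encoded[:3]  # first three bytes, which cuts ï in half

print(char_prefix)
print(byte_prefix)
# Decoding the truncated bytes leaves a replacement character where ï was.
print(byte_prefix.decode("utf-8", errors="replace"))
```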

--
greg
