Thread: Different encodings in different DBs in same cluster

Different encodings in different DBs in same cluster

From
Jamie Lawrence
Date:
Hi All, 

I was going through the docs for Postgres 8 for info on setting the
character set (to UTF8). In the docs here:

http://www.postgresql.org/docs/8.0/interactive/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED

I see:
  Since these locale settings are frozen by initdb, the apparent
  flexibility to use different encodings in different databases of a
  cluster is more theoretical than real. It is likely that these
  mechanisms will be revisited in future versions of PostgreSQL. One way
  to use multiple encodings safely is to set the locale to C or POSIX
  during initdb, thus disabling any real locale awareness.

Does anyone know what "more theoretical than real" means in this context?
If I set the locale to C, is it going to work correctly with UTF8
encoded data?

Thanks,

-j

-- 
Jamie Lawrence                                        jal@jal.org
It's strange to hear people like Gordon Liddy talking about morality. 
He hasn't been out of jail all that long.  - Ben Bradlee



Re: Different encodings in different DBs in same cluster

From
Tom Lane
Date:
Jamie Lawrence <jal@jal.org> writes:
> I see:

>    Since these locale settings are frozen by initdb, the apparent
>    flexibility to use different encodings in different databases of a
>    cluster is more theoretical than real.

> Does anyone know what "more theoretical than real" means in this context?

It means there are some locales that actively fail (you get inconsistent
comparison and sorting behavior) when presented with multibyte data that
doesn't match their encoding expectations.  IMHO such locale definitions
are broken and should be fixed, but they are not under our control.
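To make the mismatch concrete, here is a minimal sketch in Python rather than in PostgreSQL itself. It assumes, purely for illustration, a locale that expects a single-byte encoding (Latin-1 is used as the stand-in) being handed UTF-8 data. The variable names are hypothetical, and this only shows how the bytes get misread, not how any particular locale's collation routine then behaves.

```python
# Illustrative sketch only: Latin-1 stands in for "the encoding the locale expects".
# Nothing here calls PostgreSQL or the C library's locale functions.

utf8_bytes = "\u00e9".encode("utf-8")    # "e with acute accent" is two bytes in UTF-8: 0xC3 0xA9

# A consumer that assumes one byte per character reads two characters, not one.
misread = utf8_bytes.decode("latin-1")

char_count_as_utf8 = len(utf8_bytes.decode("utf-8"))   # 1 character
char_count_as_latin1 = len(misread)                    # 2 characters
```

Any comparison or sorting rule written for the single-byte encoding is then applied to characters that were never in the data, which is the kind of situation where results can stop making sense.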

> If I set the locale to C, is it going to work correctly with UTF8
> encoded data?

C will work "correctly" for suitably small values of "correctly" ---
non-ASCII characters may not sort where you'd wish, and it won't know
anything about case-folding for non-ASCII characters.  But it will at
least give consistent results.
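The behaviour described above can be sketched outside the database. The following Python snippet is an approximation under a stated assumption: that the C locale compares strings byte by byte and only knows the case of ASCII letters. The helper names c_locale_sort and c_locale_upper are hypothetical and are not PostgreSQL functions.

```python
# Illustrative sketch only: emulates C-locale-style behaviour on UTF-8 bytes.
# c_locale_sort and c_locale_upper are hypothetical helper names.

def c_locale_sort(words):
    """Sort strings by their raw UTF-8 byte values (byte-wise comparison)."""
    return sorted(words, key=lambda w: w.encode("utf-8"))

def c_locale_upper(word):
    """Upper-case ASCII letters only; bytes outside ASCII are left untouched."""
    return word.encode("utf-8").upper().decode("utf-8")

words = ["zebra", "Zebra", "\u00e9clair", "apple"]   # "\u00e9" is e with acute accent

# Upper-case ASCII sorts before lower-case ASCII, and the accented word lands
# after "zebra" because its first byte (0xC3) is larger than any ASCII byte.
ordered = c_locale_sort(words)

# Only the ASCII letters change case; the accented letter stays lower-case.
shouted = c_locale_upper("\u00e9clair")
```

Running the sort twice on the same input always gives the same order, which is the "consistent" part; the order just isn't what a French or German speaker would expect.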

When you use a non-C locale, it's best to stick to the encoding that
the locale expects.
        regards, tom lane