Обсуждение: Changing DB Encodings

Поиск
Список
Период
Сортировка

Changing DB Encodings

От
Naz Gassiep
Дата:
Why have I been told that I need to do a re initdb to change the char
encoding? The man says i can just     createdb foodb -E utf8    so why
would i need to dump/initdb/create/restore? cant i just dump/create/restore?

It'd save all the messing around with changing the data dirs etc.


Re: Changing DB Encodings

От
Tom Lane
Дата:
Naz Gassiep <naz@mira.net> writes:
> Why have I been told that I need to do a re initdb to change the char
> encoding? The man says i can just     createdb foodb -E utf8    so why
> would i need to dump/initdb/create/restore? cant i just dump/create/restore?

Are you clear on the difference between encoding and locale?

You can make new databases with whatever encoding you say, but the
server's lc_collate and lc_ctype are frozen at initdb, and it will
not work well to select an encoding that is incompatible with the
locale setting.  In practice this means that you can only use multiple
database encodings when you initdb'd in "C" locale; all other locale
settings imply a particular encoding.

Yes, this is pretty annoying.  No, it's not easy to fix.

            regards, tom lane

Re: Changing DB Encodings

От
Naz Gassiep
Дата:
Tom Lane wrote:
> Are you clear on the difference between encoding and locale?
>

I confidently reply with "maybe".

> You can make new databases with whatever encoding you say, but the
> server's lc_collate and lc_ctype are frozen at initdb, and it will
> not work well to select an encoding that is incompatible with the
> locale setting.  In practice this means that you can only use multiple
> database encodings when you initdb'd in "C" locale; all other locale
> settings imply a particular encoding.
>
> Yes, this is pretty annoying.  No, it's not easy to fix.
>

OK I understand now, I think. How do I check what locale is currently
set? I've successfully created a utf8 database, does that imply that
because I was able to create a DB with a different encoding to the ones
all the others use (SQL_ASCII) that my locale is set to "C" ?

Regards,
- Naz.

Re: Changing DB Encodings

От
Tom Lane
Дата:
Naz Gassiep <naz@mira.net> writes:
> OK I understand now, I think. How do I check what locale is currently
> set?

"show lc_collate" (maybe check the other lc_ settings too for luck).

> I've successfully created a utf8 database, does that imply that
> because I was able to create a DB with a different encoding to the ones
> all the others use (SQL_ASCII) that my locale is set to "C" ?

No, that implies a lack of error checking.

            regards, tom lane

Re: Changing DB Encodings

От
Naz Gassiep
Дата:
Tom Lane wrote:
>> I've successfully created a utf8 database, does that imply that
>> because I was able to create a DB with a different encoding to the ones
>> all the others use (SQL_ASCII) that my locale is set to "C" ?
>>
>
> No, that implies a lack of error checking.

Surely, then, that's a bug? Shouldn't postmaster check if a DB is in an
encoding that differs from the selected locale?

lc_collate and lc_type both report C as the locale. Does that mean I can
safely bulk load a pg_dump into it?

When doing the dump, do I have to use --encoding=utf8 as a param?

Please excuse me if I sound ignorant of all of this, I'm currently
forced to address an issue that I don't yet feel I have a sufficient
grasp of.

Re: Changing DB Encodings

От
Tom Lane
Дата:
Naz Gassiep <naz@mira.net> writes:
> Tom Lane wrote:
>> No, that implies a lack of error checking.

> Surely, then, that's a bug? Shouldn't postmaster check if a DB is in an
> encoding that differs from the selected locale?

Yeah, it should.  Whether it can is a different question.

Part of the problem here is the lack of a reliable way to tell
*which* encoding is implied by a locale.  On some systems you
can get a poorly-standardized string name for the locale's encoding;
on others you can't get anything.

There's been some experimental code in initdb for awhile now that tries
to guess encoding from locale.  I have not heard reports of it failing
lately, so maybe we could promote it into a hard error check, or at
least a backend-side warning at CREATE DATABASE time.  It still won't
help on old systems without nl_langinfo(CODESET), though.  (But how
many of those are left?  That call is specified by the Single Unix
Spec.  Anybody know if it works on Windows?)

            regards, tom lane