Обсуждение: 8.3.0, locales, and encodings

Поиск
Список
Период
Сортировка

8.3.0, locales, and encodings

От
"Dean Gibson (DB Administrator)"
Дата:
I got a small surprise when upgrading to 8.3.0 today (CentOS 4.4).  When
I went to restore my database, PSQL complained that my "CREATE
DATABASE... ENCODING 'UTF8';" wasn't valid (the new phpBB 3.0 software
enforces using UTF8).  Sure enough, a bit of research showed that the
supplied /etc/init.d/postgresql file had no encoding specified, which
meant that it defaulted to my operating system's 'en_US.ISO-8859-1'
encoding.  A quick reading of the PostgreSQL documentation revealed that
PostgreSQL can't really use an encoding different from that specified
during INITDB (caveat below), and apparently that is now enforced in v8.3.

OK, so I manually run INITDB w/ "-E UTF8", and now I can create/load one
of my databases.  Unfortunately, when I go to create another database
(where the data is 'ISO-8859-1'), now IT won't create.  A more detailed
reading of the PostgreSQL documentation revealed that PostgreSQL can't
use an encoding different from that specified during INITDB, UNLESS
INITDB is run w/ "-locale POSIX".

Ok, I rerun INITDB w/ "-locale POSIX  -E UTF8", and now both databases
create/load, apparently properly.  I also did an INITDB on another
database server, running INITDB w/ just "-locale POSIX" (apparently
meaning that the "template1" encoding defaulted to "SQL_ASCII"), and
successfully create/reload another database there (explicitly specifying
an ENCODING there).

Questions/suggestions:

1. The provided /etc/init.d/postgresql file seems to be part of this
trap.  Should it be changed to default the encoding to POSIX?
2. I want my installation to have the maximum flexibility.  Should I be
running INITDB w/ "-locale POSIX  -E UTF8", or just "-locale POSIX"
(implying "-E SQL_ASCII")?  Or does it make a difference, since I always
specify the desired database encoding when I "CREATE DATABASE...
ENCODING '...';"?
3. One of my databases is ISO-8859-1, because that's the format of the
data I get from the US gov't.  However, should I instead use the 'UTF8'
encoding for that database, and then use a client (PSQL) encoding of
"LATIN1", and let PostgreSQL convert the data?  Pros & cons?

Performance is a concern, but disk space is not.

I presume this ("admin") is the proper PostgreSQL list to post this to.
If not, let me know.

-- Dean Gibson

Re: 8.3.0, locales, and encodings

От
Tino Schwarze
Дата:
Hi Dean,

On Wed, Feb 06, 2008 at 06:29:02PM -0800, Dean Gibson (DB Administrator) wrote:

> 1. The provided /etc/init.d/postgresql file seems to be part of this
> trap.  Should it be changed to default the encoding to POSIX?
> 2. I want my installation to have the maximum flexibility.  Should I be
> running INITDB w/ "-locale POSIX  -E UTF8", or just "-locale POSIX"
> (implying "-E SQL_ASCII")?  Or does it make a difference, since I always
> specify the desired database encoding when I "CREATE DATABASE...
> ENCODING '...';"?

AFAIK, it shouldn't make a difference. -E is just the default encoding
for new databases.

> 3. One of my databases is ISO-8859-1, because that's the format of the
> data I get from the US gov't.  However, should I instead use the 'UTF8'
> encoding for that database, and then use a client (PSQL) encoding of
> "LATIN1", and let PostgreSQL convert the data?  Pros & cons?

I'd store anything UTF8. The performance penalty should be very tiny.

> I presume this ("admin") is the proper PostgreSQL list to post this to.

Yes, it is.

HTH,

Tino.

--
www.craniosacralzentrum.de
www.spiritualdesign-chemnitz.de

Tino Schwarze * Lortzingstraße 21 * 09119 Chemnitz

Re: 8.3.0, locales, and encodings

От
"Dean Gibson (DB Administrator)"
Дата:
On 2008-02-07 01:17, Tino Schwarze wrote:
> Hi Dean,
>
> On Wed, Feb 06, 2008 at 06:29:02PM -0800, Dean Gibson (DB Administrator) wrote:
>
>> 2. ... Should I be running INITDB w/ "-locale POSIX  -E UTF8", or just "-locale POSIX" (implying "-E SQL_ASCII")?
Ordoes it make a difference, since I always specify the desired database encoding when I "CREATE DATABASE... ENCODING
'...';"?
>>
>
> AFAIK, it shouldn't make a difference. -E is just the default encoding for new databases.
>
>
Actually, "-E" also affects the encoding of the base PostgreSQL
databases.  W/o "-E":

=> \l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+-----------
 postgres  | postgres | SQL_ASCII
 template0 | postgres | SQL_ASCII
 template1 | postgres | SQL_ASCII

If I INITDB /w "-E UTF8":

=> \l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+----------
 postgres  | postgres | UTF8
 template0 | postgres | UTF8
 template1 | postgres | UTF8


So, the interesting question is, does the difference in the encoding of
the base databases affect performance or capability in any way?

-- Dean