Обсуждение: 8.3.0, locales, and encodings
I got a small surprise when upgrading to 8.3.0 today (CentOS 4.4). When I went to restore my database, PSQL complained that my "CREATE DATABASE... ENCODING 'UTF8';" wasn't valid (the new phpBB 3.0 software enforces using UTF8). Sure enough, a bit of research showed that the supplied /etc/init.d/postgresql file had no encoding specified, which meant that it defaulted to my operating system's 'en_US.ISO-8859-1' encoding. A quick reading of the PostgreSQL documentation revealed that PostgreSQL can't really use an encoding different from that specified during INITDB (caveat below), and apparently that is now enforced in v8.3. OK, so I manually run INITDB w/ "-E UTF8", and now I can create/load one of my databases. Unfortunately, when I go to create another database (where the data is 'ISO-8859-1'), now IT won't create. A more detailed reading of the PostgreSQL documentation revealed that PostgreSQL can't use an encoding different from that specified during INITDB, UNLESS INITDB is run w/ "-locale POSIX". Ok, I rerun INITDB w/ "-locale POSIX -E UTF8", and now both databases create/load, apparently properly. I also did an INITDB on another database server, running INITDB w/ just "-locale POSIX" (apparently meaning that the "template1" encoding defaulted to "SQL_ASCII"), and successfully create/reload another database there (explicitly specifying an ENCODING there). Questions/suggestions: 1. The provided /etc/init.d/postgresql file seems to be part of this trap. Should it be changed to default the encoding to POSIX? 2. I want my installation to have the maximum flexibility. Should I be running INITDB w/ "-locale POSIX -E UTF8", or just "-locale POSIX" (implying "-E SQL_ASCII")? Or does it make a difference, since I always specify the desired database encoding when I "CREATE DATABASE... ENCODING '...';"? 3. One of my databases is ISO-8859-1, because that's the format of the data I get from the US gov't. However, should I instead use the 'UTF8' encoding for that database, and then use a client (PSQL) encoding of "LATIN1", and let PostgreSQL convert the data? Pros & cons? Performance is a concern, but disk space is not. I presume this ("admin") is the proper PostgreSQL list to post this to. If not, let me know. -- Dean Gibson
Hi Dean, On Wed, Feb 06, 2008 at 06:29:02PM -0800, Dean Gibson (DB Administrator) wrote: > 1. The provided /etc/init.d/postgresql file seems to be part of this > trap. Should it be changed to default the encoding to POSIX? > 2. I want my installation to have the maximum flexibility. Should I be > running INITDB w/ "-locale POSIX -E UTF8", or just "-locale POSIX" > (implying "-E SQL_ASCII")? Or does it make a difference, since I always > specify the desired database encoding when I "CREATE DATABASE... > ENCODING '...';"? AFAIK, it shouldn't make a difference. -E is just the default encoding for new databases. > 3. One of my databases is ISO-8859-1, because that's the format of the > data I get from the US gov't. However, should I instead use the 'UTF8' > encoding for that database, and then use a client (PSQL) encoding of > "LATIN1", and let PostgreSQL convert the data? Pros & cons? I'd store anything UTF8. The performance penalty should be very tiny. > I presume this ("admin") is the proper PostgreSQL list to post this to. Yes, it is. HTH, Tino. -- www.craniosacralzentrum.de www.spiritualdesign-chemnitz.de Tino Schwarze * Lortzingstraße 21 * 09119 Chemnitz
On 2008-02-07 01:17, Tino Schwarze wrote: > Hi Dean, > > On Wed, Feb 06, 2008 at 06:29:02PM -0800, Dean Gibson (DB Administrator) wrote: > >> 2. ... Should I be running INITDB w/ "-locale POSIX -E UTF8", or just "-locale POSIX" (implying "-E SQL_ASCII")? Ordoes it make a difference, since I always specify the desired database encoding when I "CREATE DATABASE... ENCODING '...';"? >> > > AFAIK, it shouldn't make a difference. -E is just the default encoding for new databases. > > Actually, "-E" also affects the encoding of the base PostgreSQL databases. W/o "-E": => \l List of databases Name | Owner | Encoding -----------+----------+----------- postgres | postgres | SQL_ASCII template0 | postgres | SQL_ASCII template1 | postgres | SQL_ASCII If I INITDB /w "-E UTF8": => \l List of databases Name | Owner | Encoding -----------+----------+---------- postgres | postgres | UTF8 template0 | postgres | UTF8 template1 | postgres | UTF8 So, the interesting question is, does the difference in the encoding of the base databases affect performance or capability in any way? -- Dean