Thread: Re: dump/restore results in duplicate key violation with 7.4.6.

Re: dump/restore results in duplicate key violation with 7.4.6.

From
Dan Libby
Date:
Some more info...

danda wrote:

> So at this point I am mostly at a loss.  I would have thought that
> after changing the source DB to UNICODE encoding it should exhibit the
> same behavior as the target.  I can think of two explanations:
>
> 1: initdb does something with the encoding beyond setting
> pg_database(encoding).
>
> 2: there is a bug in 7.4.6 that does not exist in 7.4.3
>
> I suppose the next step is to create a new DB in 7.4.3 using UNICODE
> and attempt to import the data in the same manner.  But right now I
> need a break.

Okay, I tried the following test using the same input file on both
machines.  All steps identical.

1) create database test_enc with encoding = 'UNICODE';

2) Create the schema for the 'category' table.

3) \i category_dump.sql

It works on 7.4.3, but I get the duplicate key violation on 7.4.6.

7.4.3 is running on Gentoo and was built with emerge postgresql.

7.4.6 is running on Red Hat 9 and was built with
./configure --prefix=/var/lib/pgsql --with-python --with-perl


I can probably come up with a clean test case using simplified data.
I'm still hoping someone has the magical solution though...

Dan Libby

Re: dump/restore results in duplicate key violation with 7.4.6.

From
Dan Libby
Date:
Update.

After Tom mentioned that my issue might be locale related I ran
pg_controldata on both servers.

On Gentoo, LC_COLLATE and LC_CTYPE are set to the C locale.  On Red Hat
they are set to en_US.UTF-8.
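
The comparison above can be scripted when more than two servers are involved. The following is only a minimal sketch: it assumes pg_controldata prints the settings as "NAME: value" lines, and both the helper name parse_locale_settings and the sample text are hypothetical, shaped after the two values reported here rather than copied from either server.

```python
def parse_locale_settings(controldata_output):
    """Return the LC_* settings found in pg_controldata-style text."""
    settings = {}
    for line in controldata_output.splitlines():
        # Each line is assumed to look like "LC_COLLATE:      C".
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name.startswith("LC_"):
            settings[name] = value.strip()
    return settings


# Illustrative text only (assumption), not real output from either machine.
gentoo_sample = (
    "LC_COLLATE:                           C\n"
    "LC_CTYPE:                             C\n"
)
redhat_sample = (
    "LC_COLLATE:                           en_US.UTF-8\n"
    "LC_CTYPE:                             en_US.UTF-8\n"
)

gentoo = parse_locale_settings(gentoo_sample)
redhat = parse_locale_settings(redhat_sample)

# Collect every setting whose value differs between the two clusters.
differences = {
    name: (gentoo.get(name), redhat.get(name))
    for name in sorted(set(gentoo) | set(redhat))
    if gentoo.get(name) != redhat.get(name)
}
print(differences)
```

In practice the text passed to the helper would be the captured output of pg_controldata run against each data directory.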

I re-ran initdb on Red Hat with the --locale=C option and performed the
import again.  This time all the data imported correctly.

That is great, as it lets me move forward, but there are still a
couple of open questions:

1) I don't understand why a difference in locale should cause a
duplicate key error, especially when both databases were created using
'UNICODE' encoding.  Is this valid behavior or a postgres bug?

2) According to the docs [1], locale is set at initdb time.  The Red Hat
machine is a production server and has other databases running for other
applications.  I could do a dump of all data, then initdb, then import
data, but it occurs to me that I might run into a similar "duplicate
key" error (or other import strangeness) in one of the other databases.
Can anyone shed more light on the implications of moving data from
en_US.UTF-8 locale to C locale?

regards,

Dan Libby


[1] http://www.postgresql.org/docs/7.4/static/charset.html

Re: dump/restore results in duplicate key violation with 7.4.6.

From
Tom Lane
Date:
Dan Libby <dan@libby.com> writes:
> 1) I don't understand why a difference in locale should cause a
> duplicate key error, especially when both databases were created using
> 'UNICODE' encoding.  Is this valid behavior or a postgres bug?

The past reports I've seen of such misbehavior appeared to result
from strcoll() failing to cope well with multibyte sequences that
were illegal according to the selected locale's idea of the character
set in use.  You can get burnt by this quite easily if you set the
database encoding to something not compatible with the locale.
If you didn't make that mistake, then I'd bet on Postgres having a
more liberal idea of what are valid characters in the encoding than
the locale definition does.  Whether that is Postgres' bug or the
locale definition's is impossible to say without more data.
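
One way to gather "more data" is to check whether the dump contains byte sequences that are not valid UTF-8 at all. The following is only a rough sketch, not something either poster ran: the helper name find_invalid_utf8 and the sample rows are hypothetical, and Python's UTF-8 decoder is assumed to be a reasonable stand-in for a strict validator. It may well disagree with both Postgres 7.4 and the C library's locale definition on edge cases.

```python
def find_invalid_utf8(lines):
    """Return (line_number, byte_offset) for each line that is not valid UTF-8."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first offending byte in this line.
            problems.append((number, exc.start))
    return problems


# Hypothetical sample rows (assumption). b"\xe9" is Latin-1 "e acute",
# which is not a valid UTF-8 sequence when followed by a newline.
sample = [b"1\tcaf\xc3\xa9\n", b"2\tcaf\xe9\n", b"3\tplain\n"]
print(find_invalid_utf8(sample))
```

To scan a real dump, open it in binary mode and pass the file object, which yields one bytes line at a time. A clean result would not rule out the strcoll() explanation; it would only show that the data is well-formed by this particular decoder's rules.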

            regards, tom lane