Re: UTF8 frustrations

Поиск
Список
Период
Сортировка
От Albe Laurenz
Тема Re: UTF8 frustrations
Дата
Msg-id D960CB61B694CF459DCFB4B0128514C2297F8E@exadv11.host.magwien.gv.at
обсуждение исходный текст
Ответ на UTF8 frustrations  (jesse.waters@gmail.com)
Список pgsql-general
jesse.waters@gmail.com wrote:
>
> Here is the latest issue, to verify that the pg_dump works, I'm going
> to do dump and restore on the same host/cluster.
>
> Source:
>  DB_source:
>  Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
>  psql 8.2.4
> Destination:
>  same machine different db name
>
> echo $LANG
> en_US.UTF-8
>
> SET client_encoding = 'UTF8';
>
> Command used:
>   pg_dump -Fc srcdb > db.dump
>   pg_restore -d devdb db.dump
>
> Results, same error. Now I'm really concerned.
>
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 1625; 0 16680 TABLE
DATA logs watersj
> pg_restore: [archiver (db)] COPY failed: ERROR:  invalid byte sequence
for encoding "UTF8": 0xdf69
> HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding
> CONTEXT:  COPY logs, line 69238382
> WARNING: errors ignored on restore: 1

I can recreate this behaviour with 8.2.4 (UTF-8).

psql> CREATE TABLE test (id serial PRIMARY KEY, val text);

psql> INSERT INTO test (val) VALUES (E'\xdf\x69');

psql> \q

$ pg_dump -F c -f x.dmp -t test testdb

$ pg_restore -c -d testdb x.dmp

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 1840; 0 45883 TABLE
DATA test laurenz
pg_restore: [archiver (db)] COPY failed: ERROR:  invalid byte sequence
for encoding "UTF8": 0xdf69
HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".
CONTEXT:  COPY test, line 1
WARNING: errors ignored on restore: 1

The problem is that in (at least) one record in your table
watersj.logs, there is a corrupt string.

Unfortunately (as demonstrated above) it is possible to enter
corrupt data into a PostgreSQL database, this is what must have
happened in your case.

I suggest that you identify and correct this string in the original
database, then everything should work fine.

You can extract the offending row from the dump, that should
help to identify it. 69238382 rows is a little unwieldy, but
tools like awk can help:

pg_restore db.dump | awk '/^COPY logs /,/^\\\.$/ { if (lineno==69238382)
print $0; ++lineno }'

I think there is the desire to fix problems like this in 8.3,
but I don't think that's done yet.

Yours,
Laurenz Albe

В списке pgsql-general по дате отправления:

Предыдущее
От: Ow Mun Heng
Дата:
Сообщение: Re: Need suggestion on how best to update 3 million rows
Следующее
От: Richard Huxton
Дата:
Сообщение: Re: Need suggestion on how best to update 3 million rows