Re: 8.0, UTF8, and CLIENT_ENCODING

From: Hannes Dorbath
Subject: Re: 8.0, UTF8, and CLIENT_ENCODING
Msg-id 464CC60C.1040300@theendofthetunnel.de
In reply to: 8.0, UTF8, and CLIENT_ENCODING  (Paul Ramsey <pramsey@refractions.net>)
List: pgsql-general
Paul Ramsey wrote:
> I have a small database (PgSQL 8.0, database encoding UTF8) that folks
> are inserting into via a web form. The form itself is declared
> ISO-8859-1 and, prior to inserting any data, pg_client_encoding is
> set to LATIN1.
>
> Most of the high-bit characters are correctly translated from LATIN1 to
> UTF8. So for e-accent-aigu I see the two-byte UTF8 value in the database.
>
> Sometimes, in their wisdom, people cut'n'paste information out of MSWord
> and put that in the form. Instead of being mapped to 2-byte UTF8
> high-bit equivalents, they are going into the database directly as
> one-byte values > 127. That is, as illegal UTF8 values.
>
> When I try to dump'n'restore this database into PgSQL 8.2, my data can't
> make the transit.
>
> Firstly, is this "kinda sorta" encoding handling expected in 8.0, or did
> I do something wrong?
>
> Secondly, anyone know any useful tools to pipe a stream through to strip
> out illegal UTF8 bytes, so I can pipe my dump through that rather than
> hand editing it?

This is a known issue. Use

iconv -c -f UTF-8 -t UTF-8 -o cleanfile.sql dumpfile.sql

to convert your dumps. I'm not sure if this is fixed in the 8.0 branch
at all.
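
If iconv is not at hand, roughly the same filtering can be sketched in Python. This is only an illustration, not what iconv does internally: the function name strip_invalid_utf8_stream and the chunk_size parameter are made up for this example, and it relies on the standard codecs module with errors="ignore" to drop bytes that do not form valid UTF-8. Like iconv -c, it discards the offending bytes rather than converting them, so the pasted characters are lost, not repaired.

```python
import codecs
import io


def strip_invalid_utf8_stream(src, dst, chunk_size=1 << 16):
    """Copy bytes from src to dst, dropping anything that is not valid UTF-8.

    src and dst are binary file-like objects. The name and the chunk_size
    default are illustrative choices, not part of any existing tool.
    """
    # The incremental decoder keeps partial multi-byte sequences between
    # chunks, so a valid character split across two reads is not dropped.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush the decoder; a truncated sequence at end of input is discarded.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


# Small demonstration: a valid two-byte e-acute plus two stray high bytes.
sample = io.BytesIO(b"caf\xc3\xa9 \x93quoted\x94 text\n")
cleaned = io.BytesIO()
strip_invalid_utf8_stream(sample, cleaned, chunk_size=4)
print(cleaned.getvalue())  # b'caf\xc3\xa9 quoted text\n'
```

To use it as a pipe filter, one could pass sys.stdin.buffer and sys.stdout.buffer as src and dst and place the script between the dump and the restore.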


--
Best regards,
Hannes Dorbath
