8.0, UTF8, and CLIENT_ENCODING

From	Paul Ramsey
Subject	8.0, UTF8, and CLIENT_ENCODING
Date	17 May 2007, 20:57:28
Msg-id	464CC160.4080401@refractions.net
Replies	Re: 8.0, UTF8, and CLIENT_ENCODING (Hannes Dorbath <light@theendofthetunnel.de>); Re: 8.0, UTF8, and CLIENT_ENCODING (PFC <lists@peufeu.com>)
List	pgsql-general

I have a small database (PgSQL 8.0, database encoding UTF8) that folks
are inserting into via a web form. The form itself is declared
ISO-8859-1 and, prior to inserting any data, pg_client_encoding is
set to LATIN1.

Most of the high-bit characters are correctly translated from LATIN1 to
UTF8. So for e-accent-aigu I see the two-byte UTF8 value in the database.

Sometimes, in their wisdom, people cut'n'paste information out of MSWord
and put that in the form. Instead of being mapped to 2-byte UTF8
high-bit equivalents, they are going into the database directly as
one-byte values > 127. That is, as illegal UTF8 values.
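
To make the difference concrete, here is a minimal Python sketch of the two
cases. It assumes the text pasted from MSWord arrives as Windows-1252 bytes
(for example 0x93 for a curly opening quote); that is a guess about the
cause, not something confirmed from the data. The sample characters and the
helper name is_valid_utf8 are made up for illustration.

```python
# Case 1: a LATIN1 character that is translated correctly.
# e-accent-aigu is the single byte 0xE9 in LATIN1 and two bytes in UTF8.
e_acute_latin1 = "é".encode("latin-1")                            # b'\xe9'
e_acute_utf8 = e_acute_latin1.decode("latin-1").encode("utf-8")   # b'\xc3\xa9'

# Case 2 (assumption): a curly quote pasted from MSWord, sent as the
# Windows-1252 byte 0x93. Stored untranslated, it is a lone byte > 127.
word_quote = "\u201c".encode("cp1252")                            # b'\x93'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(e_acute_utf8, is_valid_utf8(e_acute_utf8))
print(word_quote, is_valid_utf8(word_quote))
```

The second value fails validation because 0x93 on its own is a UTF8
continuation byte with no lead byte in front of it.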

When I try to dump'n'restore this database into PgSQL 8.2, my data can't
make the transit.

Firstly, is this "kinda sorta" encoding handling expected in 8.0, or did
I do something wrong?

Secondly, anyone know any useful tools to pipe a stream through to strip
out illegal UTF8 bytes, so I can pipe my dump through that rather than
hand editing it?
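
For reference, the kind of filter I have in mind could be sketched in Python
roughly as follows. This is only a sketch under assumptions: the function
names strip_invalid_utf8 and filter_stream are hypothetical, and it simply
drops any byte sequence that is not valid UTF8, so the offending characters
are lost rather than repaired.

```python
import codecs
from typing import BinaryIO, Iterable, Iterator


def strip_invalid_utf8(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the input with every invalid UTF8 byte sequence removed.

    An incremental decoder is used so that a valid multi-byte character
    split across two chunks is kept rather than discarded.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text.encode("utf-8")
    # Flush the decoder; an incomplete trailing sequence is dropped.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.encode("utf-8")


def filter_stream(src: BinaryIO, dst: BinaryIO, chunk_size: int = 65536) -> None:
    """Copy src to dst in chunks, stripping invalid UTF8 on the way."""
    chunks = iter(lambda: src.read(chunk_size), b"")
    for cleaned in strip_invalid_utf8(chunks):
        dst.write(cleaned)
```

To use it as a pipe filter, the script would call
filter_stream(sys.stdin.buffer, sys.stdout.buffer) (after importing sys) and
sit between the dump and the restore. If the bad bytes really are
Windows-1252, re-encoding them would preserve more than stripping does, but
that goes beyond what I asked for here.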

Thanks,

Paul

--

   Paul Ramsey
   Refractions Research
   http://www.refractions.net
   pramsey@refractions.net
   Phone: 250-383-3022
   Cell: 250-885-0632
