Re: BUG #13785: Postgresql encoding screw-up

From: Feike Steenbergen
Subject: Re: BUG #13785: Postgresql encoding screw-up
Msg-id CAK_s-G21eiMoKdKeBM42Zgr5-LC7mZ14FCPJ9gPxqe0d8kw+hw@mail.gmail.com
In response to: BUG #13785: Postgresql encoding screw-up  (ntpt@seznam.cz)
List: pgsql-bugs
Hi,

> there is a major design flaw or bug

I feel your pain, but how is this a bug? Once a character that cannot be
mapped to latin2 is stored, no information about its source encoding
(win1250) is available anymore. Any client connecting (whether your
application or pg_dump) will get that character "as is".

I don't see a general way to solve this, other than rejecting
characters that do not fit in the target character set.
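
To illustrate what "rejecting characters that do not fit" could look like on
the client side, here is a minimal sketch. It uses Python's own codecs
(iso8859_2 for LATIN2) as a stand-in; the function name fits_latin2 is
made up for this example, and PostgreSQL's conversion tables may differ in
detail from Python's.

```python
def fits_latin2(text):
    """Return True if every character of text can be encoded in LATIN2."""
    try:
        text.encode("iso8859_2")
        return True
    except UnicodeEncodeError:
        # At least one character has no LATIN2 code point.
        return False


samples = ["Šárka", "100 €"]
for s in samples:
    print(repr(s), "fits LATIN2" if fits_latin2(s) else "must be rejected")
```

The euro sign exists in WIN1250 but not in LATIN2, so a check like this would
refuse it before it ever reaches a LATIN2 database.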

> where client use multiple encodings that have more characters then database
> encoding, the database is screwed forever

The allowed conversions from LATIN2 to other encodings are quite
limited (MULE_INTERNAL, UTF8, WIN1250); see:
http://www.postgresql.org/docs/9.4/static/multibyte.html#AEN35768

If the clients using different encodings all touch the same data, the data
is already dirty. The migration is only bringing it to light then.

If the clients all touch different parts of the data, the data can be
safely migrated by exporting each distinct part in its correct encoding
and then importing it with that encoding into the target database with
UTF8 encoding.
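
A rough sketch of such a per-part migration, under stated assumptions: the
table names, database names and the table-to-encoding mapping below are
invented for illustration, and the script only builds and prints the
commands rather than running them. It relies on pg_dump's -E (encoding),
-t (table) and -f (output file) options and on psql's -d and -f options.

```python
import shlex


def dump_command(dbname, table, client_encoding, outfile):
    """Build a pg_dump command that writes one table in the given encoding."""
    return ["pg_dump", "-E", client_encoding, "-t", table, "-f", outfile, dbname]


def restore_command(target_db, infile):
    """Build a psql command that loads a plain-text dump into target_db."""
    return ["psql", "-d", target_db, "-f", infile]


# Hypothetical: which client encoding was used to write which table.
parts = {"orders_cz": "WIN1250", "orders_legacy": "LATIN2"}

commands = []
for table, encoding in parts.items():
    dumpfile = f"{table}.sql"
    commands.append(dump_command("olddb", table, encoding, dumpfile))
    commands.append(restore_command("newdb_utf8", dumpfile))

for cmd in commands:
    print(shlex.join(cmd))
```

A plain-text dump records the encoding it was written in, so the import into
the UTF8 database can convert from that encoding.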

> I thik that safe practice would be: Pg_dum with -E as used by client
> applicaton  and then restore to newly created utf8 database . It should be
> mentioned as safe way in the doc, at least

This looks safe to me: you export the unknown characters in their original
encoding, thereby making them known again. If you now import this into UTF8,
it will be encoded correctly, because both the source (WIN1250) and the
target (UTF8) can encode these characters.
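
The effect can be sketched at the byte level. This is only an illustration
using Python's codecs (cp1250 for WIN1250, iso8859_2 for LATIN2), not
PostgreSQL's own conversion routines; the variable names are invented, and
the stored bytes are assumed to have been written unchanged by a WIN1250
client.

```python
# Bytes as a WIN1250 client would have sent them: 0x80 is the euro sign there.
stored = "€ 100".encode("cp1250")

# Read back as LATIN2, byte 0x80 is just an unprintable control code;
# nothing in the data says it was meant to be a euro sign.
as_latin2 = stored.decode("iso8859_2")

# Interpreting the same bytes in their original encoding makes the
# character known again, and UTF-8 can then encode it.
recovered = stored.decode("cp1250")
utf8_bytes = recovered.encode("utf-8")

print(repr(as_latin2))
print(repr(recovered))
print(utf8_bytes)
```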

regards,

Feike Steenbergen
