Discussion: Latin1 to UTF-8 ?

Latin1 to UTF-8 ?

From
Aarni Ruuhimäki
Date:
Hi,

I've set up a new CentOS server with PostgreSQL 8.2.4 and initdb'ed it with
UTF-8.

OK, and it runs fine.

I have a problem with encodings, however, mainly with the Russian Cyrillic
characters.

When I test-dumped some dbs from the old FC / Pg 8.0.2, all Latin1, I noticed
that some of the dumps show in the Konqueror file browser as 'Plain Text
Documents' and some as 'C++ Source Files'. Both have Latin1 as the client
encoding at the top of the files. Changing that gives errors, as expected.

Looking into the plain text dumps I see all Cyrillic characters as Р...
and these go in and display fine from the new server's UTF-8 environment.

Some of the 'C++' files have the Cyrillics as 'îñåòèòåëåé'. Some have both
'îñåòèòåëåé' and Р... and of course the 'îñåò' characters come out wrong
and unreadable in the browser. (Not sure if you can see the single-quoted ones,
but they look something like Hebrew or similar.)
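
A possible reading of the 'îñåò' strings, sketched in Python: they look like Cyrillic text that was typed as Windows-1251 (cp1251) bytes and then stored in a database that labels every byte as Latin1. That the source bytes are cp1251 is an assumption, not something the dumps confirm, and the helper name `recover_cp1251` is made up for this illustration.

```python
# Assumption: the original input was Windows-1251 (cp1251) Cyrillic that ended
# up in a Latin1-labelled database, so each byte is shown as a Latin1 character.
garbled = "îñåòèòåëåé"


def recover_cp1251(text: str) -> str:
    """Turn Latin1-labelled characters back into raw bytes, then read those
    bytes as cp1251. Raises UnicodeEncodeError if the text contains characters
    outside Latin1, which suggests the assumption does not hold for that text."""
    return text.encode("latin-1").decode("cp1251")


repaired = recover_cp1251(garbled)
print(repaired)                   # the Cyrillic text, now a proper Unicode string
print(repaired.encode("utf-8"))   # the bytes a UTF-8 dump would need to contain
```

For the sample above this yields 'осетителей', which reads as a Russian word fragment, so cp1251 seems a plausible guess for at least those rows. Text that was stored in some other way would need different handling.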

I have no idea what browsers / encodings or even keyboard layouts have been
used when the data has been inserted by users through their web
interfaces ...

I tried the -F p switch as the earlier version has no -E for dumps. Same
output. Also with pg_dumpall.

I tried various encodings with iconv too.

So, what would be the proper way to convert the dumps to UTF-8 ? Or any other
solution ? Any other tool to work with the problem files ?

BR,

Aarni
--
Aarni Ruuhimäki


Re: Latin1 to UTF-8 ?

From
Peter Eisentraut
Date:
Aarni Ruuhimäki wrote:
> So, what would be the proper way to convert the dumps to UTF-8 ? Or
> any other solution ? Any other tool to work with the problem files ?

Dump them again but set your client encoding to UTF8.
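
One way to do that, as a minimal sketch: libpq reads the `PGCLIENTENCODING` environment variable, so `pg_dump` can be run with it set to UTF8. The function name `dump_as_utf8`, the `run` flag, and the database and file names in the usage comment are placeholders invented for this example. Note the hedge: this converts what the server believes is Latin1, so any rows whose bytes were really in another encoding would probably still need separate repair.

```python
import os
import subprocess


def dump_as_utf8(dbname: str, outfile: str, run: bool = True):
    """Build (and optionally run) a plain-format pg_dump whose client
    encoding is UTF8, so the server converts the data while dumping."""
    # PGCLIENTENCODING is the libpq environment variable for client_encoding.
    env = dict(os.environ, PGCLIENTENCODING="UTF8")
    # -F p selects the plain SQL format, -f names the output file.
    cmd = ["pg_dump", "-F", "p", "-f", outfile, dbname]
    if run:
        subprocess.run(cmd, env=env, check=True)
    return cmd, env


# Usage (hypothetical names), run against the old server:
#   dump_as_utf8("mydb", "mydb_utf8.sql")
```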

--
Peter Eisentraut
http://developer.postgresql.org/~petere/