Discussion: Converting from SQL_ASCII to UTF8

Converting from SQL_ASCII to UTF8

From: "Peter Koczan"

Hi all,

I'd like to move my database encoding from SQL_ASCII to UTF8, mostly
because "No encoding conversion will be done when the setting is
SQL_ASCII. Thus, this setting is not so much a declaration that a
specific encoding is in use, as a declaration of ignorance about the
encoding." (from
http://www.postgresql.org/docs/current/static/multibyte.html). I saw a
few threads on the list about this before, for instance this one
(http://archives.postgresql.org/pgsql-admin/2004-01/msg00225.php), but
there's a specific issue I'm having that they didn't address.

I have some UTF-8 data in my databases, and it's causing dump/restore
to fail. Specifically, I'm seeing messages like:
pg_restore: [archiver (db)] COPY failed: ERROR:  invalid byte sequence
for encoding "UTF8": 0xe14c65
HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".
CONTEXT:  COPY applicants, line 282

This happens even if I specify "-E UTF8" in the pg_dump command.

Here's the weirder part. If I just update the encoding by hand in
pg_database (as cautiously suggested by Tom Lane in the aforementioned
thread), it works. I doubt this will work in the general case, and I'd
like to at least offer this option for other people's databases.

I also tried using GNU recode (version 3.6) as suggested in similar
threads, but I got errors in both the plain and custom pg_dump
formats.

$ recode ascii..utf8 man.sql
recode: man.sql failed: Invalid input in step `ANSI_X3.4-1968..UTF-8'
$ recode ..utf8 man.sql
recode: man.sql failed: Invalid input in step `CHAR..UTF-8'

Any ideas?

Peter
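The `recode ascii..utf8` failure above is expected whenever the dump contains any byte at or above 0x80, because ASCII only defines 0x00 to 0x7F. Before picking a conversion, it can help to see which lines carry such bytes and whether those lines are at least valid UTF-8. Below is a minimal Python sketch for that. `classify_lines` is a hypothetical helper (not part of PostgreSQL or recode), and it assumes a plain-format (SQL text) dump, since custom-format archives are compressed by default.

```python
from typing import List, Tuple


def classify_lines(data: bytes) -> List[Tuple[int, str]]:
    """Report every line of a plain-text dump that is not pure 7-bit ASCII.

    Returns (line_number, kind) pairs, where kind is "utf8" if the line
    decodes cleanly as UTF-8 and "not-utf8" otherwise.
    """
    report = []
    for lineno, line in enumerate(data.splitlines(), start=1):
        if line.isascii():
            continue  # all bytes < 0x80: plain ASCII, nothing to convert
        try:
            line.decode("utf-8")
            kind = "utf8"
        except UnicodeDecodeError:
            kind = "not-utf8"
        report.append((lineno, kind))
    return report


# In-memory example: an ASCII line, a UTF-8 line, and the bytes
# 0xe1 0x4c 0x65 taken from the pg_restore error message.
sample = b"plain ascii\n" + "Jos\u00e9\n".encode("utf-8") + bytes.fromhex("e14c65") + b"\n"
for lineno, kind in classify_lines(sample):
    print(lineno, kind)
```

On a real dump, pass the file's raw bytes, e.g. `classify_lines(Path("man.sql").read_bytes())` with `from pathlib import Path`. Lines reported as `not-utf8` are the ones that will make a UTF8 restore fail; lines reported as `utf8` are only consistent with UTF-8, since byte inspection alone cannot prove which encoding was intended.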

Re: Converting from SQL_ASCII to UTF8

From: Tom Lane

"Peter Koczan" <pjkoczan@gmail.com> writes:
> I have some UTF-8 data in my databases, and it's causing dump/restore
> to fail. Specifically, I'm seeing messages like:
> pg_restore: [archiver (db)] COPY failed: ERROR:  invalid byte sequence
> for encoding "UTF8": 0xe14c65

This is, in fact, not UTF8, no matter how much you'd like to think so.
It might possibly be LATIN1 or some other single-byte encoding.  What
you're going to need to do is figure out what you really have and
convert it all to a common encoding.
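The diagnosis can be checked directly on the bytes from the error message. In UTF-8, 0xE1 (binary 11100001) starts a three-byte sequence, so the next two bytes must both have the form 10xxxxxx; 0x4C (binary 01001100) does not, so the sequence is rejected. Read as LATIN1, where every byte is a character, the same bytes are simply `áLe`. The Python sketch below shows both readings; the helper name `inspect_bytes` is made up for illustration. Since LATIN1 decoding never fails, a clean LATIN1 reading is a plausibility check, not proof.

```python
from typing import Tuple


def inspect_bytes(seq: bytes) -> Tuple[bool, str]:
    """Return (decodes_as_utf8, text_when_read_as_latin1) for a byte string."""
    try:
        seq.decode("utf-8")
        is_utf8 = True
    except UnicodeDecodeError:
        # e.g. a UTF-8 lead byte not followed by the continuation bytes it announces
        is_utf8 = False
    # LATIN1 maps every byte 0x00-0xFF to a character, so this never raises.
    return is_utf8, seq.decode("latin-1")


# The sequence reported by pg_restore: invalid byte sequence ... 0xe14c65
print(inspect_bytes(bytes.fromhex("e14c65")))  # (False, 'áLe')
```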

One of the disadvantages of SQL_ASCII mode is that it will let you wind
up with a mishmash of different encodings in your DB :-(

            regards, tom lane
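Once the encodings in a dump are known, the text has to end up in a single encoding before it is restored into a UTF8 database. If the only encodings present turn out to be UTF-8 and LATIN1 (an assumption that must be verified per database; WIN1252 or LATIN9 data, for instance, would be silently mis-converted), a line-by-line repair could look like the Python sketch below. `to_utf8_mixed` is a hypothetical helper: lines that already decode as UTF-8 are kept as they are, and every other line is assumed to be LATIN1 and transcoded.

```python
def to_utf8_mixed(data: bytes, fallback: str = "latin-1") -> bytes:
    """Re-encode a byte string as UTF-8, one line at a time.

    Lines that already decode as UTF-8 are kept unchanged; any other line is
    ASSUMED to be in `fallback` (LATIN1 by default) and transcoded to UTF-8.
    """
    out = []
    # keepends=True so joining the pieces reproduces the original line breaks
    for line in data.splitlines(keepends=True):
        try:
            line.decode("utf-8")
            out.append(line)
        except UnicodeDecodeError:
            out.append(line.decode(fallback).encode("utf-8"))
    return b"".join(out)


# Example: one ASCII line, one UTF-8 line, one LATIN1 line.
mixed = b"plain\n" + "Jos\u00e9\n".encode("utf-8") + "Jos\u00e9\n".encode("latin-1")
print(to_utf8_mixed(mixed).decode("utf-8"))
```

To apply it to a plain-format dump, read and write the files as bytes, for example with `pathlib.Path.read_bytes()` and `write_bytes()` (output file name of your choosing), then restore the result into a database created with UTF8 encoding. The per-line granularity is a simplification: a single line that mixes both encodings, or LATIN1 text that happens to form valid UTF-8, is not handled.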