Thread: Problems importing Unicode

Problems importing Unicode

From: matthias@cmklein.de
I have batch files with entries such as

INSERT INTO country VALUES (248,'ALA','AX','Åland Islands');
INSERT INTO country VALUES (384,'CIV','CI','Côte d\'Ivoire');

I tried to execute them using "psql \i filename.sql"

Unfortunately, I keep getting an error message:
"ERROR:  invalid byte sequence for encoding "UNICODE": 0xc56c"

How can that be possible?
My database is set to encoding "UNICODE" and so are the batchfiles.

Why does that not work?

Thanks

Matt


Re: Problems importing Unicode

From: Tatsuo Ishii
> I have batch files with entries such as
>
> INSERT INTO country VALUES (248,'ALA','AX','Åland Islands');
> INSERT INTO country VALUES (384,'CIV','CI','Côte d\'Ivoire');
>
> I tried to execute them using "psql \i filename.sql"
>
> Unfortunately, I keep getting an error message:
> "ERROR:  invalid byte sequence for encoding "UNICODE": 0xc56c"
>
> How can that be possible?
> My database is set to encoding "UNICODE" and so are the batchfiles.
>
> Why does that not work?

I bet your batch file is not encoded in UNICODE (UTF-8).
--
Tatsuo Ishii
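
The byte pair in the error message is consistent with this diagnosis. 0xc5 is how single-byte encodings such as Latin-1 or Windows-1252 store Å, and 0x6c is the letter l, whereas UTF-8 treats 0xc5 as the start of a two-byte sequence whose second byte must lie in the range 0x80-0xbf. A minimal Python sketch of that check, assuming the dump was written in a single-byte Western encoding (the thread does not confirm this; is_valid_utf8 is a name made up for illustration):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "Åland" as a single-byte (Latin-1) file would store it: assumption.
single_byte = "\u00c5land".encode("latin-1")
# The same text as a UTF-8 file would store it.
utf8_bytes = "\u00c5land".encode("utf-8")

print(single_byte[:2].hex())       # first two bytes of the Latin-1 form
print(is_valid_utf8(single_byte))  # strict UTF-8 check on the Latin-1 form
print(is_valid_utf8(utf8_bytes))   # strict UTF-8 check on the UTF-8 form
```

The first two bytes of the Latin-1 form are c5 6c, the same pair the server reported, and that form fails the strict UTF-8 check while the UTF-8 form passes.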

Re: Problems importing Unicode

From: matthias@cmklein.de
Well, they were generated by MySQL and I can open them with e.g. the
Windows Editor Notepad. But I don't know if they are actually encoded in
UNICODE.
Since I can open the file with Notepad and read the statements, I assume,
it is not UNICODE. They look just like in the email below.

The problem is apparently those characters Å or ô and I really would like
to know how to import those files into PostgreSQL 8.0.0

Is there a switch I can use to do a codepage / encoding translation?

Why are MS Access or even MySQL able to read those files without trouble
but PostgreSQL reports an error?

Thanks

Matt



--- Original Message ---
Date: 17.11.2004 02:25
From: Tatsuo Ishii <t-ishii@sra.co.jp>
To: matthias@cmklein.de
Subject: Re: [GENERAL] Problems importing Unicode

> > I have batch files with entries such as
> >
> > INSERT INTO country VALUES (248,'ALA','AX','Åland Islands');
> > INSERT INTO country VALUES (384,'CIV','CI','Côte d\'Ivoire');
> >
> > I tried to execute them using "psql \i filename.sql"
> >
> > Unfortunately, I keep getting an error message:
> > "ERROR:  invalid byte sequence for encoding "UNICODE": 0xc56c"
> >
> > How can that be possible?
> > My database is set to encoding "UNICODE" and so are the batchfiles.
> >
> > Why does that not work?
>
> I bet your batch file is not encoded in UNICODE (UTF-8).
> --
> Tatsuo Ishii
>


Re: Problems importing Unicode

From: Richard Huxton
matthias@cmklein.de wrote:
> Well, they were generated by MySQL and I can open them with e.g. the
> Windows Editor Notepad. But I don't know if they are actually encoded in
> UNICODE.
> Since I can open the file with Notepad and read the statements, I assume,
> it is not UNICODE. They look just like in the email below.

Probably some WINxxx encoding. I've seen something similar with data
from MS-Access.

> The problem is apparently those characters Å or ô and I really would like
> to know how to import those files into PostgreSQL 8.0.0
>
> Is there a switch I can use to do a codepage / encoding translation?
>
> Why are MS Access or even MySQL able to read those files without trouble
> but PostgreSQL reports an error?

Because they're using the same WIN locale details. What you might want
to try is to set your client encoding at the top of the batch file and
see if PostgreSQL can't convert it for you.

SET CLIENT_ENCODING = WIN1250;

There's a list of encodings PG can convert for you in the manual (see
the chapter "Automatic Character Set Conversion Between Server and
Client" in the Localization section).

--
   Richard Huxton
   Archonet Ltd
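
Which encoding name to put in that SET line depends on how the file was actually written. As a rough illustration in Python (assuming the sample bytes below match the dump, which the thread does not confirm), the byte 0xc5 decodes to Ĺ under Windows-1250 but to Å under Windows-1252 or Latin-1, so it may be worth looking at the decoded text before settling on a value. The helper to_utf8 is hypothetical and shows the alternative of converting the file to UTF-8 before loading it:

```python
# Sample bytes as a single-byte Western file might store them (assumption).
raw = b"\xc5land Islands / C\xf4te d'Ivoire"

# Decode the same bytes under three candidate encodings and compare.
decoded = {name: raw.decode(name) for name in ("cp1250", "cp1252", "latin-1")}
for name, text in decoded.items():
    print(f"{name:8} {text}")


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from source_encoding to UTF-8 (hypothetical helper)."""
    return data.decode(source_encoding).encode("utf-8")


converted = to_utf8(raw, "cp1252")
print(converted.decode("utf-8"))
```

If the cp1250 line shows the wrong first letter while the others look right, that suggests a Western European encoding rather than a Central European one. The encoding names the server accepts vary by PostgreSQL version, so the list in the manual remains the reference.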

Re: Problems importing Unicode

From: "Magnus Hagander"
> Well, they were generated by MySQL and I can open them with
> e.g. the Windows Editor Notepad. But I don't know if they are
> actually encoded in UNICODE.
> Since I can open the file with Notepad and read the
> statements, I assume, it is not UNICODE. They look just like
> in the email below.

Windows Notepad handles Unicode just fine, both UTF-16 (labeled Unicode
in Notepad) and UTF-8 (labeled UTF-8).
To test, open the file in Notepad, then do "File->Save As". The
"Encoding" dropdown box will default to whatever Notepad detected when
it opened the file. If it's UTF-16 and you need UTF-8, just change the
encoding and save under a different name.

//Magnus
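
A similar check can be sketched outside Notepad. The Python snippet below is a rough heuristic and not a description of what Notepad does internally: it looks for a byte order mark and otherwise tries a strict UTF-8 decode. The function sniff_encoding and its return labels are names made up for this example, and UTF-32 is deliberately ignored.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess an encoding label from a BOM or a strict UTF-8 decode."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Some single-byte code page; the exact one cannot be told from bytes alone.
        return "unknown-8bit"


samples = {
    "utf-16 with BOM": "\u00c5land".encode("utf-16"),
    "utf-8 with BOM": codecs.BOM_UTF8 + "\u00c5land".encode("utf-8"),
    "plain utf-8": "\u00c5land".encode("utf-8"),
    "single-byte": "\u00c5land".encode("latin-1"),
}
for label, data in samples.items():
    print(f"{label:16} -> {sniff_encoding(data)}")
```

For a real file, the bytes would come from open(path, "rb").read(). A result of "utf-16" means the file needs re-encoding to UTF-8 before loading, while "unknown-8bit" points back to the code page question raised earlier in the thread.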