Discussion: Problems importing Unicode
I have batch files with entries such as

INSERT INTO country VALUES (248,'ALA','AX','Åland Islands');
INSERT INTO country VALUES (384,'CIV','CI','Côte d\'Ivoire');

I tried to execute them using "pgsql \i filename.sql"

Unfortunately, I keep getting an error message:
"ERROR: invalid byte sequence for encoding "UNICODE": 0xc56c"

How can that be possible?
My database is set to encoding "UNICODE" and so are the batchfiles.

Why does that not work?

Thanks
Matt
> I have batch files with entries such as
>
> INSERT INTO country VALUES (248,'ALA','AX','Åland Islands');
> INSERT INTO country VALUES (384,'CIV','CI','Côte d\'Ivoire');
>
> I tried to execute them using "pgsql \i filename.sql"
>
> Unfortunately, I keep getting an error message:
> "ERROR: invalid byte sequence for encoding "UNICODE": 0xc56c"
>
> How can that be possible?
> My database is set to encoding "UNICODE" and so are the batchfiles.
>
> Why does that not work?

I bet your batch file is not encoded in UNICODE (UTF-8).
--
Tatsuo Ishii
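The byte pair in the error message can be checked directly. A minimal Python sketch, assuming the two bytes 0xC5 0x6C appear in the file exactly as reported; `is_valid_utf8` is a hypothetical helper name, not part of PostgreSQL or MySQL:

```python
# The two bytes reported in the error message: 0xC5 followed by 0x6C.
raw = bytes([0xC5, 0x6C])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xC5 starts a two-byte UTF-8 sequence, but 0x6C ("l") is not a
# continuation byte, so the pair is rejected.
print(is_valid_utf8(raw))           # False

# Read as a single-byte encoding, the same bytes are the start of "Åland".
print(raw.decode("latin-1"))        # Ål

# What a UTF-8 file would contain for the same two characters.
print("Ål".encode("utf-8").hex())   # c3856c
```

This is consistent with the file being in a single-byte encoding rather than UTF-8, though it does not say which one.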
Well, they were generated by MySQL and I can open them with e.g. the Windows Editor Notepad. But I don't know if they are actually encoded in UNICODE.
Since I can open the file with Notepad and read the statements, I assume it is not UNICODE. They look just like in the email below.

The problem is apparently those characters Å or ô, and I really would like to know how to import those files into PostgreSQL 8.0.0.

Is there a switch I can use to do a codepage / encoding translation?

Why are MS Access or even MySQL able to read those files without trouble but PostgreSQL reports an error?

Thanks
Matt

--- Original message ---
Date: 17.11.2004 02:25
From: Tatsuo Ishii <t-ishii@sra.co.jp>
To: matthias@cmklein.de
Subject: Re: [GENERAL] Problems importing Unicode

> > I have batch files with entries such as
> >
> > INSERT INTO country VALUES (248,'ALA','AX','Åland Islands');
> > INSERT INTO country VALUES (384,'CIV','CI','Côte d\'Ivoire');
> >
> > I tried to execute them using "pgsql \i filename.sql"
> >
> > Unfortunately, I keep getting an error message:
> > "ERROR: invalid byte sequence for encoding "UNICODE": 0xc56c"
> >
> > How can that be possible?
> > My database is set to encoding "UNICODE" and so are the batchfiles.
> >
> > Why does that not work?
>
> I bet your batch file is not encoded in UNICODE (UTF-8).
> --
> Tatsuo Ishii
matthias@cmklein.de wrote:
> Well, they were generated by MySQL and I can open them with e.g. the
> Windows Editor Notepad. But I don't know if they are actually encoded in
> UNICODE.
> Since I can open the file with Notepad and read the statements, I assume,
> it is not UNICODE. They look just like in the email below.

Probably some WINxxx encoding. I've seen something similar with data from MS-Access.

> The problem are apparently those characters Å or ô and I really would like
> to know how to import those files into PostgreSQL 8.0.0
>
> Is there a switch I can use to do a codepage / encoding translation?
>
> Why are MS Access or even MySQL able to read those files without trouble
> but PostgreSQL reports an error?

Because they're using the same WIN locale details. What you might want to try is to set your client encoding at the top of the batch file and see if PostgreSQL can't convert it for you.

SET CLIENT_ENCODING = WIN1250;

There's a list of encodings PG can convert for you in the manual (see the chapter "Automatic Character Set Conversion Between Server and Client" in the Localization section).

--
Richard Huxton
Archonet Ltd
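If letting the server convert is not an option, the file can also be re-encoded before it is loaded. A minimal Python sketch, assuming the dump is in a single-byte Windows code page; the function name `reencode_to_utf8`, the file names, and the default of `cp1252` are illustrative assumptions, since the thread does not establish which code page the file really uses:

```python
from pathlib import Path


def reencode_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Rewrite src as UTF-8 in dst (hypothetical helper).

    source_encoding is a guess that the caller must confirm; decoding
    raises UnicodeDecodeError if the file contains bytes that are not
    defined in that encoding.
    """
    # Work on raw bytes so that line endings are passed through untouched.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Example call (file names are placeholders):
# reencode_to_utf8("filename.sql", "filename.utf8.sql", source_encoding="cp1252")
```

Choosing the wrong source encoding usually does not raise an error; it silently produces different letters, so it is worth checking a few rows such as 'Åland Islands' and 'Côte d\'Ivoire' after the conversion.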
> Well, they were generated by MySQL and I can open them with
> e.g. the Windows Editor Notepad. But I don't know if they are
> actually encoded in UNICODE.
> Since I can open the file with Notepad and read the
> statements, I assume, it is not UNICODE. They look just like
> in the email below.

Windows Notepad handles Unicode just fine, both UTF-16 (labeled Unicode in Notepad) and UTF-8 (labeled UTF-8).

To test, open the file in Notepad, then do "File->Save As". The "Encoding" dropdown box will default to whatever Notepad detected when it opened the file. If it's UTF-16 and you need UTF-8, just change the encoding and save under a different name.

//Magnus
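The same check can be scripted by looking for a byte order mark (BOM) at the start of the file. A minimal Python sketch, assuming the file was saved by a tool that writes a BOM; `guess_encoding_from_bom` is a hypothetical helper, and a missing BOM proves nothing either way, since plain UTF-8 and the single-byte Windows code pages normally have none:

```python
import codecs
from typing import Optional


def guess_encoding_from_bom(data: bytes) -> Optional[str]:
    """Return a Python codec name based on a leading BOM, or None.

    Hypothetical helper: it only recognises files that start with a BOM.
    """
    # UTF-32 is tested first because its little-endian BOM begins with
    # the same two bytes as the UTF-16 little-endian BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None


# Example call (file name is a placeholder):
# with open("filename.sql", "rb") as f:
#     print(guess_encoding_from_bom(f.read(4)))
```

The returned names are Python codec names, so the result can be passed straight to `bytes.decode` when rewriting the file as UTF-8.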