Обсуждение: COPY-ing ASCII file into UTF-8 database

Поиск
Список
Период
Сортировка

COPY-ing ASCII file into UTF-8 database

От
Toomas Vendelin
Дата:
THE ISSUE:
I need to import a text file (ASCII) dumped from MySQL 3.23 database
into Postgre table (UTF-8). The file contains "specifically Estonian
characters" Õ, Ä, Ö, Ü. My attempt to COPY this file directly
failed. Postgre said:

ERROR:  invalid byte sequence for encoding "UTF8": 0xf5706509
HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

OK... The file isn't big, and there are just 4 characters causing
problems, so I substitued each of them with a distinctive pattern of
ASCII characters and successfully COPY-ied the file into Postgre
table. With PgAdmin I've changed 1 record back to Estonian – worked
fine. I then exported this table into a file, opened the file in
UltraEdit (the status bar showed it's a UTF-8 indeed) and changed a
few temporary substitutions with "specifically Estonian characters".
Then I cleared the table and tried to COPY data from my now UTF-8 file
again. I've got an error:

ERROR:  invalid input syntax for integer: "2"
CONTEXT:  COPY school, line 1, column sch_id: "2"

PgAdmin actually displayed it with something like chopped hyphen,
which I've failed to copy into this message and which wasn't visible
in the text editor even with "Show hidden characters".

MY QUESTIONS:

1. What is the proper solution for importing data from plain ASCII
files, that contain the type of characters mentioned above?

2. I'd like to fill the gaps (huge, obviously) in my knowlege about
handling different charsets. Can you point me to some sane
resource(s)? I'm sure, there's a lot of them. Sorry for slipping off
the topic here.

Regards,

Toomas




Re: COPY-ing ASCII file into UTF-8 database

От
Tommy Gildseth
Дата:
Toomas Vendelin wrote:
>
> MY QUESTIONS:
>
> 1. What is the proper solution for importing data from plain ASCII
> files, that contain the type of characters mentioned above?

Set the correct client_encoding, before importing. Presumably the
Estonian characters are part of one of the flavours of LATIN 1-9

 > SET CLIENT_ENCODING='LATIN<X>'; -- Where you substitute <X> for the
correct number
 > \copy ....

--
Tommy Gildseth

Re: COPY-ing ASCII file into UTF-8 database

От
Emanuel Calvo Franco
Дата:
2009/4/6 Toomas Vendelin <pg@vendelin.com>:
> THE ISSUE:
> I need to import a text file (ASCII) dumped from MySQL 3.23 database into
> Postgre table (UTF-8). The file contains "specifically Estonian characters"
> Õ, Ä, Ö, Ü. My attempt to COPY this file directly failed. Postgre said:
>
> ERROR:  invalid byte sequence for encoding "UTF8": 0xf5706509
> HINT:  This error can also happen if the byte sequence does not match the
> encoding expected by the server, which is controlled by "client_encoding".
>
> OK... The file isn't big, and there are just 4 characters causing problems,
> so I substitued each of them with a distinctive pattern of ASCII characters
> and successfully COPY-ied the file into Postgre table. With PgAdmin I've
> changed 1 record back to Estonian – worked fine. I then exported this table
> into a file, opened the file in UltraEdit (the status bar showed it's a
> UTF-8 indeed) and changed a few temporary substitutions with "specifically
> Estonian characters". Then I cleared the table and tried to COPY data from
> my now UTF-8 file again. I've got an error:
>
> ERROR:  invalid input syntax for integer: "2"
> CONTEXT:  COPY school, line 1, column sch_id: "2"
>
> PgAdmin actually displayed it with something like chopped hyphen, which I've
> failed to copy into this message and which wasn't visible in the text editor
> even with "Show hidden characters".
>
> MY QUESTIONS:
>
> 1. What is the proper solution for importing data from plain ASCII files,
> that contain the type of characters mentioned above?
>
> 2. I'd like to fill the gaps (huge, obviously) in my knowlege about handling
> different charsets. Can you point me to some sane resource(s)? I'm sure,
> there's a lot of them. Sorry for slipping off the topic here.
>
> Regards,
>
> Toomas
>
>

In adition of set the local encoding (Client and Server), if you are
runnning +nix there is a command called iconv that let you convert
from one charset to another. Always i used it, i never had any
problems.


--
      Emanuel Calvo Franco
        Sumate al ARPUG !
      (www.postgres-arg.org -
         www.arpug.com.ar)
    ArPUG / AOSUG Member
   Postgresql Support & Admin