Discussion: Character encoding problems and dump import

Character encoding problems and dump import

From
Date:
Hello,

I have a dump (non-binary, if it matters) of a DB that has some characters in it that my DB doesn't want to take.
I'm using PG 8.0.3 and it was created with Unicode support:

=> \encoding
UNICODE

Characters that cause problems during the import are things like:
é and other characters from the Extended ASCII table (c.f. bottom of http://www.lookuptables.com/ )
Also:
'ÇѱÛÀÌ Á¦´ë·Î µÇ´ÂÁö ½ÇÇè... ¿©±â¿¡´Â ¾Æ¹« ¸µÅ©µµ ¾ø½À´Ï´Ù.±×³É ÀÌ ³»¿ë¹Û¿¡ ¾ø½À´Ï´Ù.'


The errors I get on import are of this type:
  ERROR:  invalid byte sequence for encoding "UNICODE": 0xdb20
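
A minimal sketch, assuming Python 3, of why a strict UTF-8 decoder rejects the two bytes named in this error. It only illustrates the byte-level rule; it does not reproduce PostgreSQL's own validation code.

```python
# The two bytes reported in the error message.
data = bytes([0xDB, 0x20])

# 0xDB is 0b11011011: in UTF-8 this opens a two-byte sequence, so the
# next byte must be a continuation byte of the form 0b10xxxxxx.
# 0x20 (an ASCII space) is not one, so strict UTF-8 decoding fails.
try:
    data.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Every byte value is defined in Latin-1, so this decode cannot fail.
as_latin1 = data.decode("latin-1")

print(valid_utf8)       # False
print(repr(as_latin1))  # 'Û '
```

The same bytes are therefore acceptable as Latin-1 text (a "Û" followed by a space) but not as UTF-8.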

The data may not be the cleanest, and I have limited control over that.
But I am wondering if there is any way I can import this data, even if that means converting some of the characters
into something else.

Thanks,
Otis



Re: Character encoding problems and dump import

From
Ivo Rossacher
Date:
On Monday, 20 March 2006 at 21:21, ogjunk-pgjedan@yahoo.com wrote:
> Hello,
>
> I have a dump (non-binary, if it matters) of a DB that has some characters
> in it that my DB doesn't want to take. I'm using PG 8.0.3 and it was
> created with Unicode support:
>
> => \encoding
> UNICODE

This is the client encoding (see \?).
To get server encoding you can do
show server_encoding;  (see command show in the command reference)

Then have a look at the dump to check what the encoding was when the dump was
taken. (there is a line like set client_encoding = .... somewhere at the
beginning of the dump)
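
The check described above can be sketched in Python. The helper name `find_client_encoding` and the 50-line limit are illustrative assumptions, and the pattern only covers the common `SET client_encoding = ...` and `SET client_encoding TO ...` spellings.

```python
import re

# Matches e.g.  SET client_encoding = 'LATIN1';  or  SET client_encoding TO UTF8;
_CLIENT_ENCODING = re.compile(
    r"^\s*SET\s+client_encoding\s*(?:=|TO)\s*'?([A-Za-z0-9_-]+)'?\s*;",
    re.IGNORECASE,
)

def find_client_encoding(lines, max_lines=50):
    """Return the encoding named in the first SET client_encoding line, or None."""
    for i, line in enumerate(lines):
        if i >= max_lines:  # only the beginning of the dump is inspected
            break
        match = _CLIENT_ENCODING.match(line)
        if match:
            return match.group(1)
    return None

sample = [
    "--",
    "-- PostgreSQL database dump",
    "--",
    "",
    "SET client_encoding = 'LATIN1';",
]
print(find_client_encoding(sample))                  # LATIN1
print(find_client_encoding(["CREATE TABLE t ();"]))  # None
```

When reading a real dump file for this purpose, opening it with `encoding="latin-1"` avoids decode errors, because every byte value is valid in Latin-1.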

There were some changes in the Unicode handling some time ago. If the
dump was taken by another server version, there might be differences. (Search
the archives; there are several threads about the issue.)

Best regards
Ivo

>
> Characters that cause problems during the import are things like:
> é and other characters from the Extended ASCII table (c.f. bottom of
> http://www.lookuptables.com/ ) Also:
> 'ÇѱÛÀÌ Á¦´ë·Î µÇ´ÂÁö ½ÇÇè... ¿©±â¿¡´Â ¾Æ¹« ¸µÅ©µµ ¾ø½À´Ï´Ù.±×³É ÀÌ
> ³»¿ë¹Û¿¡ ¾ø½À´Ï´Ù.'
>
>
> The errors I get on import are of this type:
>   ERROR:  invalid byte sequence for encoding "UNICODE": 0xdb20
>
> The data may not be the cleanest, and I have limited control over that.
> But I am wondering if there is any way I can import this data, even if that
> means converting some of the characters into something else.
>
> Thanks,
> Otis
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

Re: Character encoding problems and dump import

From
John DeSoi
Date:
On Mar 20, 2006, at 3:21 PM, <ogjunk-pgjedan@yahoo.com> wrote:

> The data may not be the cleanest, and I have limited control over
> that.
> But I am wondering if there is any way I can import this data, even
> if that means converting some of the characters into something else.

iconv might be able to help you fix encoding problems

http://www.gnu.org/software/libiconv/documentation/libiconv/iconv.1.html
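
As a rough illustration of what such a conversion does, here is a Python sketch of the same transformation that `iconv -f ISO-8859-1 -t UTF-8` performs. It assumes the dump really is Latin-1 (ISO-8859-1), which this thread had not established at this point; `latin1_to_utf8` is a hypothetical helper name.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Interpret the bytes as Latin-1 text and re-encode that text as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")

# 'é' is the single byte 0xE9 in Latin-1 but the two bytes 0xC3 0xA9 in UTF-8.
original = b"caf\xe9"
converted = latin1_to_utf8(original)

print(converted)                  # b'caf\xc3\xa9'
print(converted.decode("utf-8"))  # café
```

If the source encoding is guessed wrongly, the conversion still succeeds but yields the wrong characters, so it is worth spot-checking a few known strings afterwards.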



John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL


Re: Character encoding problems and dump import

From
Date:
Thanks John and Ivo for help.
It turned out that I had to manually SET CLIENT_ENCODING TO 'LATIN1' before processing the dump (which didn't have this
specified). This fixed the problem. 
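
The same fix can be applied to the dump file itself rather than typed into the session. A minimal Python sketch, assuming the dump is available as bytes and that LATIN1 is the right encoding for it; `with_client_encoding` is a hypothetical helper, not a PostgreSQL tool.

```python
import re

# Detects an existing SET client_encoding statement anywhere in the dump.
_HAS_CLIENT_ENCODING = re.compile(
    rb"^\s*SET\s+client_encoding\b", re.IGNORECASE | re.MULTILINE
)

def with_client_encoding(dump: bytes, encoding: str = "LATIN1") -> bytes:
    """Prepend a SET client_encoding statement unless the dump already has one."""
    if _HAS_CLIENT_ENCODING.search(dump):
        return dump
    header = f"SET client_encoding TO '{encoding}';\n".encode("ascii")
    return header + dump

# The data bytes are left untouched; only the declaration is added.
dump = b"INSERT INTO t VALUES ('caf\xe9');\n"
patched = with_client_encoding(dump)
print(patched.splitlines()[0].decode("ascii"))  # SET client_encoding TO 'LATIN1';
```

With the declaration in place, the server knows how to interpret the incoming bytes and converts them to the database encoding on its side.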

I thought a DB set to UNICODE char encoding (server_encoding) would process the Extended ASCII characters, but it
didn't...not sure why. 

Otis

----- Original Message ----
From: John DeSoi <desoi@pgedit.com>
To: ogjunk-pgjedan@yahoo.com
Cc: pgsql-admin@postgresql.org
Sent: Tuesday, March 21, 2006 12:31:16 AM
Subject: Re: [ADMIN] Character encoding problems and dump import


On Mar 20, 2006, at 3:21 PM, <ogjunk-pgjedan@yahoo.com> wrote:

> The data may not be the cleanest, and I have limited control over
> that.
> But I am wondering if there is any way I can import this data, even
> if that means converting some of the characters into something else.

iconv might be able to help you fix encoding problems

http://www.gnu.org/software/libiconv/documentation/libiconv/iconv.1.html



John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL
