Discussion: Problem with character encodings.

Problem with character encodings.

From
"Korumilli, Bala S (GE Healthcare)"
Date:
Hi,
 
We recently moved to Postgres.
We need to store a special character, the degree symbol '°', in our database.

Query of interest:   INSERT INTO test1 VALUES(3,'sree°kanth','05/05/1983','06:06:06.234',3.2345677787);

If I use pgAdmin to run this query, '°' is properly inserted into the database.

But if I write a C++ program using libpq to run this query, I get an invalid byte sequence error like this:

ERROR:  invalid byte sequence for encoding "UTF8".

I also observed that the same error occurs for all characters outside the ASCII range (0-127).
Is this a limitation of the libpq library?

The following information may be helpful to you.
DataBase encoding: UTF8.
client_encoding: UTF8
Postgres version: 8.2.3
 
Can you please tell me what is the reason for this strange behaviour?
 
 
Thanks & Regards
Bala Sreekanth.

Re: Problem with character encodings.

From
"Jeroen T. Vermeulen"
Date:
On Wed, August 29, 2007 13:59, Korumilli, Bala S (GE Healthcare) wrote:

> If I use pgAdmin to run this query, '°' is properly inserted into the
> database.
>
> But if I write a C++ program using libpq to run this query, I get an
> invalid byte sequence error like this:
>
> ERROR:  invalid byte sequence for encoding "UTF8".
>
> I also observed that the same error occurs for all characters outside the
> ASCII range (0-127).
> Is this a limitation of the libpq library?

No.  Inserting and extracting characters like this should be no problem,
*if* they are properly encoded.
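
To make "properly encoded" concrete, here is a minimal sketch of a UTF-8 well-formedness check. The helper name is_valid_utf8 is hypothetical and not part of libpq, and the server runs its own validation; this only approximates the kind of test that rejects a lone byte such as 0xB0.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only (not a libpq function).
// Returns true if `s` is a well-formed UTF-8 byte sequence.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra;     // continuation bytes expected after the lead byte
        unsigned long cp;      // code point decoded so far
        unsigned long min_cp;  // smallest code point allowed for this length
        if (b < 0x80)                { extra = 0; cp = b;        min_cp = 0; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min_cp = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min_cp = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i - 1 < extra) return false;  // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += extra + 1;
    }
    return true;
}
```

With this check, the bytes 0xC2 0xB0 pass, while a single 0xB0 byte (the degree sign in Latin-1 or code page 1252) fails because it is a continuation byte with no lead byte before it.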


> The following information may be helpful to you.
> DataBase encoding: UTF8.
> client_encoding: UTF8
> Postgres version: 8.2.3

The database doesn't expose its encoding IIRC, only its character set.  So
it's possible that that should be Unicode, not UTF-8.

But how did you specify this special character in your C++ source file? 
Because if you typed it in directly as a non-ASCII character, it's quite
possible that your editor and your compiler disagree on encoding.  Or that
the compiler uses an encoding that does not happen to match the client
encoding.
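
One way to take the editor and compiler out of the picture is to spell the non-ASCII bytes as escapes in the source. The following is only a sketch, assuming client_encoding is UTF8; the names kDegreeUtf8, build_insert, and hex_dump are invented for illustration.

```cpp
#include <cstdio>
#include <string>

// The degree sign (U+00B0) written as its two UTF-8 bytes, so the result
// does not depend on the encoding in which the source file was saved.
const std::string kDegreeUtf8 = "\xC2\xB0";

// Builds the INSERT statement from the original post with the degree sign
// supplied as explicit UTF-8 bytes (illustrative helper).
std::string build_insert() {
    return "INSERT INTO test1 VALUES(3,'sree" + kDegreeUtf8 +
           "kanth','05/05/1983','06:06:06.234',3.2345677787);";
}

// Formats each byte as two hex digits followed by a space, to inspect
// exactly which bytes would be sent to the server.
std::string hex_dump(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X ", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}
```

Dumping the literal typed directly into the source file with the same hex_dump helper shows which bytes the compiler actually produced; if it shows B0 alone rather than C2 B0, the source was not compiled as UTF-8.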


Jeroen




Re: Problem with character encodings.

From
"Jeroen T. Vermeulen"
Date:
On Thu, August 30, 2007 16:29, Korumilli, Bala S (GE Healthcare) wrote:

> I also tried inserting the character using its code: characters 176 and 186
> for code page 1252, and character 248 for code page 437.
> I also tried all the characters from 128 to 255. For all of these
> characters, Postgres gives the same error.

So you also tried sending raw byte values for the UTF-8 character?  For
codepoint 176, I think that would be 0xc2 (194) followed by 0xb0 (176, by
pure coincidence).
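
The byte values above can be checked with a small encoder. This is a sketch of the standard UTF-8 bit layout; encode_utf8 is a hypothetical helper name, not a libpq call. Note that its input is a Unicode code point, which only happens to equal the code page 1252 byte value for characters such as 176.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 bytes.
std::string encode_utf8(unsigned long cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");

    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For code point 176 (0xB0) the top bits give 0xC0 | 2 = 0xC2 and the low six bits give 0x80 | 0x30 = 0xB0, which matches the two bytes named above.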

If that failed as well then it sounds as if the database was not set up as
a Unicode one after all.  That's where it gets system-specific and I
probably can't help you with it, but it sounds as if your database may
have been set up for ASCII.


> Can you please elaborate on this? I could not understand this sentence:
> "The database doesn't expose its encoding IIRC, only its character set.
> So it's possible that that should be Unicode, not UTF-8."

It's only a small point, but IIRC the "encoding" given for a postgres
database set up to support Unicode is normally "Unicode," not "UTF-8."


Jeroen




Re: Problem with character encodings.

From
Ivo Rossacher
Date:
On Wednesday, 29 August 2007 08:59, Korumilli, Bala S (GE Healthcare) wrote:
If the client_encoding is UTF8, you need to convert the strings in your C++
code to UTF8.
The other option is to set client_encoding to the encoding the client
machine uses and let the server convert from and to UTF8 for you. This would
require your client program to determine the encoding of the system and send
SET client_encoding TO '<found encoding>'; to the server.
See chapter 21 in the manual for more details about the issue.
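
The first option (converting in the client) could look roughly like the sketch below. It assumes the program's strings are ISO-8859-1 (Latin-1), where every byte value equals its Unicode code point; latin1_to_utf8 is a hypothetical helper name. Windows code page 1252 differs from Latin-1 in the range 0x80-0x9F, so real 1252 input would need a lookup table for those bytes, which this sketch does not include.

```cpp
#include <string>

// Hypothetical helper: converts an ISO-8859-1 (Latin-1) byte string to UTF-8.
// Assumption: the input really is Latin-1, not Windows-1252 or code page 437.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // each input byte becomes at most two bytes
    for (char ch : in) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out += ch;  // ASCII is unchanged in UTF-8
        } else {
            // Code points 0x80-0xFF become a two-byte sequence.
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

Passing the query text through such a conversion before sending it turns the single byte 0xB0 into 0xC2 0xB0, which is what a UTF8 client_encoding expects. The second option needs no conversion in the client, only the SET client_encoding statement with the encoding the program actually uses.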

Best regards
Ivo

> But if I write a C++ program using libpq to run this query, I get an
> invalid byte sequence error like this:
>
> ERROR:  invalid byte sequence for encoding "UTF8".
>
> I also observed that the same error occurs for all characters outside the
> ASCII range (0-127). Is this a limitation of the libpq library?
>
> The following information may be helpful to you.
> DataBase encoding: UTF8.
> client_encoding: UTF8
> Postgres version: 8.2.3
>
> Can you please tell me what is the reason for this strange behaviour?


Re: Problem with character encodings.

From
"Hilton Perantunes"
Date:
I also had similar problems with encoding. I'm using libpqxx, and as Ivo stated, I need to do this to get my queries working properly:

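    // C is assumed to be an already open pqxx::connection.
    work T(C, "my_transaction");
    // Tell the server that this client sends and expects UTF-8.
    T.exec("SET client_encoding = UTF8");
    result R(T.exec("SELECT * FROM my_table_with_portuguese_strings"));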

--


On 8/30/07, Ivo Rossacher <rossacher@bluewin.ch> wrote:
On Wednesday, 29 August 2007 08:59, Korumilli, Bala S (GE Healthcare) wrote:
If the client_encoding is UTF8, you need to convert the strings in your C++
code to UTF8.
The other option is to set client_encoding to the encoding the client
machine uses and let the server convert from and to UTF8 for you. This would
require your client program to determine the encoding of the system and send
SET client_encoding TO '<found encoding>'; to the server.
See chapter 21 in the manual for more details about the issue.

Best regards
Ivo

> But if I write a C++ program using libpq to run this query, I get an
> invalid byte sequence error like this:
>
> ERROR:  invalid byte sequence for encoding "UTF8".
>
> I also observed that the same error occurs for all characters outside the
> ASCII range (0-127). Is this a limitation of the libpq library?
>
> The following information may be helpful to you.
> DataBase encoding: UTF8.
> client_encoding: UTF8
> Postgres version: 8.2.3
>
> Can you please tell me what is the reason for this strange behaviour?

--
Hilton William Ganzo Perantunes
Information Systems - Universidade Federal de Santa Catarina
--
Money doesn't bring happiness, but it gives such a similar feeling...  -_-