Discussion: Multibyte characters handling bug in varchar()




I am using PostgreSQL 7.1 on Linux (Red Hat 7.1).

My database encoding is 'EUC_CN'.

The application accesses the database through the PG JDBC 2.0 driver.

I have defined a table like this:

create table test1 (
    id integer not null,
    memo varchar(128)
);

The memo field lets users record comments and the like. They enter Chinese (GB2312 or GBK encoding) mixed with ASCII.

The problem occurs when:

The input string is longer than 128, and the 128th and 129th bytes together form a single Chinese character (Chinese characters take two bytes in GB2312 or GBK encoding).
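
To illustrate the condition above, here is a minimal Python sketch of what presumably happens when a value is cut at byte 128 rather than at a character boundary. It is an illustration only, not PostgreSQL code: the helper names truncate_bytes and truncate_on_char_boundary are hypothetical, and Python's "gb2312" codec is used as a stand-in for the EUC_CN database encoding.

```python
# Sketch: byte-wise vs. character-aware truncation of EUC_CN (GB2312) text.
# truncate_bytes and truncate_on_char_boundary are hypothetical helpers,
# not PostgreSQL functions.

def truncate_bytes(data: bytes, limit: int) -> bytes:
    """Naive truncation: cut at the byte limit, ignoring character boundaries."""
    return data[:limit]


def truncate_on_char_boundary(data: bytes, limit: int) -> bytes:
    """Cut at or before the limit without splitting a two-byte character.

    Assumes valid EUC_CN input: ASCII bytes are below 0x80 and every
    Chinese character is exactly two bytes, both with the high bit set.
    """
    i = 0
    while i < len(data):
        width = 2 if data[i] >= 0x80 else 1
        if i + width > limit:
            break
        i += width
    return data[:i]


# 127 ASCII bytes followed by one Chinese character: the character
# occupies bytes 128 and 129, as in the report.
text = "a" * 127 + "中"
raw = text.encode("gb2312")

naive = truncate_bytes(raw, 128)            # ends with half a character
safe = truncate_on_char_boundary(raw, 128)  # drops the whole character
```

The naive result ends in a lone lead byte and can no longer be decoded as GB2312, while the boundary-aware result is one byte shorter but still valid text.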

The problem is:

The INSERT query runs without any error, but the getString method then returns a zero-length String for the field.

More complications:

When I pg_dump the database and try to restore it, the script produced by pg_dump (with the -D flag, which dumps data as INSERT commands with attribute names) cannot be restored. When I checked the script, I found that the memo field of this record is dumped without its closing single quote (because the 128th byte and the single quote that follows are actually read together as another, unrecognized Chinese character), and that is why the restore fails.
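
The missing quote can be reproduced with a small Python sketch. It assumes a naive scanner that treats every byte with the high bit set as the first byte of a two-byte character; split_chars is a hypothetical helper written for this illustration, and whether PostgreSQL scans the value exactly this way is an assumption.

```python
# Sketch: a naive EUC_CN scanner that treats every byte >= 0x80 as the
# first byte of a two-byte character. split_chars is a hypothetical helper.

def split_chars(data: bytes) -> list:
    """Split a byte string into one-byte and two-byte 'characters'."""
    chars = []
    i = 0
    while i < len(data):
        width = 2 if data[i] >= 0x80 else 1
        chars.append(data[i:i + width])
        i += width
    return chars


# A value whose last stored byte is the first half of a Chinese character,
# wrapped in single quotes the way a dump script would quote it.
dangling = "中".encode("gb2312")[:1]   # lone lead byte of a two-byte character
literal = b"'ab" + dangling + b"'"

chars = split_chars(literal)
has_closing_quote = chars[-1] == b"'"
```

Here the last element of chars is the dangling byte paired with the quote, so has_closing_quote is False: the scanner never sees a closing quote for the string literal.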

Below is the dump for this record:

"test1" ("id","memo") VALUES (5,'=A8=B0=A8=B0=A1=C1??=A1=EC=A8=AA???=A6=CC?=

I feel that multibyte characters are not handled properly in this case. I look forward to hearing from the dev team.

Finally, I think PostgreSQL is an excellent database, but the name PostgreSQL seems very difficult to pronounce, and that is probably one obstacle preventing people from getting to know it better.

Thanks to the dev team for all the hard work; you have done an excellent job!

Best Regards,