Multibyte characters handling bug in varchar()

From: Edward
Subject: Multibyte characters handling bug in varchar()
Msg-id: 001d01c227ce$a28ed0d0$06c6a8c0@ncvillas.com
List: pgsql-bugs
Hello,

I am using PostgreSQL 7.1 on Linux (Red Hat 7.1).

My database encoding is 'EUC_CN'.

The application accesses the database through the PG JDBC2.0 driver.

I have defined a field in a table like this:

create table test1 (
id integer not null,
memo varchar(128)
);

The memo field is for users to record comments and the like. They enter Chinese (GB2312 or GBK encoding) mixed with ASCII.

Problem happens when:

The length of the input string is larger than 128 bytes, and the 128th and 129th bytes together form one Chinese character (Chinese characters use two bytes in GB2312 or GBK encoding).
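The condition above can be illustrated with a minimal sketch, written in Python rather than in the JDBC application. The helper name `truncate_on_char_boundary`, the sample character, and the 127-byte ASCII prefix are assumptions chosen for illustration; nothing here is PostgreSQL's own code. It shows that a byte-wise cut at 128 leaves a dangling lead byte, while a cut on a character boundary does not.

```python
def truncate_on_char_boundary(text: str, max_bytes: int, encoding: str = "gbk") -> bytes:
    """Encode text and keep at most max_bytes bytes without splitting a character."""
    out = bytearray()
    for ch in text:
        encoded = ch.encode(encoding)
        if len(out) + len(encoded) > max_bytes:
            break  # the next character would not fit completely
        out += encoded
    return bytes(out)


# 127 ASCII bytes followed by one two-byte Chinese character, so that the
# character occupies bytes 128 and 129 as described in the report.
text = "a" * 127 + "中" + "tail"
raw = text.encode("gbk")

naive = raw[:128]                            # byte-wise cut keeps only the lead byte
safe = truncate_on_char_boundary(text, 128)  # stops before the split character

try:
    naive.decode("gbk")
    naive_is_valid = True
except UnicodeDecodeError:
    naive_is_valid = False  # the trailing lead byte has no partner byte
```

With this input the naive cut is no longer valid GBK, whereas the boundary-aware cut is one byte shorter and decodes cleanly.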

The problem is:

The INSERT query runs without any error, but the getString method returns a zero-length String from the field.

More complications:

When I pg_dump the database and restore it, the script produced by pg_dump (with the -D flag, which dumps INSERT commands with attribute names) cannot be restored. When I checked the script, I found that the memo field of this record was dumped without the closing single quote (because the 128th byte and the single quote that follows it are actually taken together as another, unrecognized Chinese character), and that is why the restore failed.
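A rough sketch of how a dangling lead byte can swallow the closing quote follows. It assumes a deliberately simplified scanner in which any byte >= 0x80 starts a two-byte character; this is a model of the behaviour described above, not PostgreSQL's actual lexer, and the function name `find_closing_quote` is hypothetical.

```python
def find_closing_quote(sql: bytes, start: int) -> int:
    """Return the index of the next single quote at or after start, or -1.

    Simplified model (an assumption): any byte >= 0x80 is treated as the lead
    byte of a two-byte character, so the byte after it is skipped unexamined.
    """
    i = start
    while i < len(sql):
        if sql[i] >= 0x80:
            i += 2          # step over the whole two-byte character
        elif sql[i] == ord("'"):
            return i
        else:
            i += 1
    return -1


lead_and_trail = "中".encode("gbk")  # two bytes: lead byte, then trail byte

intact = b"VALUES (5,'" + lead_and_trail + b"');"
truncated = b"VALUES (5,'" + lead_and_trail[:1] + b"');"  # trail byte cut off

open_quote = intact.index(b"'") + 1  # first byte of the string literal
intact_end = find_closing_quote(intact, open_quote)
truncated_end = find_closing_quote(truncated, open_quote)
```

In the intact statement the scanner finds the closing quote; in the truncated one the quote is consumed as the second byte of the broken character, so the literal never terminates.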

Below is the dump for this record:

INSERT INTO
"test1" ("id","memo") VALUES (5,'=A8=B0=A8=B0=A1=C1??=A1=EC=A8=AA???=A6=CC?=
=A8=BA?=A8=B0=A8=A4=A8=A6??=A8=AEGH=A6=CC=A3=A4?a??=A8=AC????=A8=A4?=A1=C2=
=A8=B0a=A8=BA?5??1=A8=A8??=A8=A23=A8=A8?=A8=B0=A8=B0=A6=CC=A8=B2=A8=B0??=A8=
=AE?a?=A8=AC=A1=E32??=A8=A2?=A8=B0???D??=A1=C01=A1=E8?=A3=A4=A8=AC?=A8=A2??=
?=A8=AC=A8=AC=A1=EA?=A8=AA?=A6=CC?=A8=BA=A1=C0????=A1=C1=A1=E9=A8=B0a?1?=A8=
=A6??=A8=BA1=A8=B0=A6=CC?=A1=C2=A8=AA???=A8=B0?=A8=B0a?=A8=AE?');


I feel that multibyte characters are not handled properly in this case. Looking forward to hearing from the dev team.

Finally, I think PostgreSQL is an excellent database, but the name "PostgreSQL" seems very difficult to pronounce, and that is probably one obstacle preventing people from learning more about it.

Thanks to the dev team for the hard work; you have done an excellent job!

Best Regards,
