Re: Multi-byte character bug

From Richard So
Subject Re: Multi-byte character bug
Date
Msg-id 000401c237f6$87f63fa0$0a00a8c0@netrogen.local
In reply to Multi-byte character bug  ("Richard So" <richso@i-cable.com>)
Responses Re: Multi-byte character bug  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-bugs
>> Two bugs have been found in the SQL parser and multibyte character support:

>What is the encoding for "chinese char"? You need to give us more
>info.

By Chinese I mean characters in the BIG5 encoding, which is widely used
in Hong Kong and Taiwan.
My setup:
    Db encoding: EUC_TW
    Client (JDBC / ODBC) encoding: BIG5
        JDBC: I supplied the parameter 'charSet=Big5' in the connection string
        ODBC: my locale (Chinese Win2000 machine) is Chinese Taiwan
    Client application: Tomcat4 JSP page (see the attachment)
    App / Db server: Redhat 7.3 Linux + postgresql (set) 7.2.1-2PGDG (downloaded binary rpm) + Tomcat4
    App / Db server locale: zh_TW.Big5
    JDBC driver: pgjdbc2.jar
    Client machine: Win2000 Chinese (Taiwan) version with SP2 + I.E. (JSP) + Delphi SQL Explorer (ODBC)
    Client machine locale: Chinese (Taiwan)

>> 1.       'Problem connecting to database: java.sql.SQLException: ERROR:
>> Invalid EUC_TW character sequence found (0xb27a)' was reported when using
>> the JDBC driver to insert a record; a similar error was reported when using
>> the ODBC driver and psql. Since auto-conversion from client to server should
>> convert the character to a valid EUC_TW char, this is a bug.

>How did you set the auto-conversion settings for psql? I suspect you
>did something wrong with it.

I've checked again. The JDBC and ODBC drivers still report the error
message, but psql does not (maybe, as you said, I followed a wrong
procedure). However, the problem is still there: why do JDBC and ODBC
still report the error?
I only tried a few Chinese words, so other characters may cause the
problem as well.
I know that Tomcat4 by default returns the request parameters in ISO-8859,
so I've added the code
<%@ page contentType="text/html; charset=Big5"%>
<%
    request.setCharacterEncoding("BIG5");
%>
to the JSP page and dumped the actual SQL sent to the PostgreSQL server to
make sure the SQL is correct; it is attached (please see the attached file:
offence1.zip).
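
To illustrate why the request encoding matters, here is a rough sketch (not part of the attached files; the class and method names are made up for illustration). It takes the two bytes from the error message, 0xb2 0x7a, and decodes them as ISO-8859-1, the way a container would if no request encoding were set. Each byte becomes its own character, so the multi-byte character is lost as a unit, although the raw bytes can still be recovered.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    /** Decodes raw bytes as ISO-8859-1, one character per byte. */
    public static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Recovers the original bytes from a string that was decoded as ISO-8859-1. */
    public static byte[] recoverBytes(String misdecoded) {
        return misdecoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The byte pair quoted in the error message, used here only as sample input.
        byte[] big5 = {(byte) 0xB2, 0x7A};
        String s = decodeAsLatin1(big5);
        System.out.println(s.length());                            // 2 characters, not 1
        System.out.println(Arrays.equals(big5, recoverBytes(s)));  // true
    }
}
```
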

>> 2.       inserting record with xx =A8 chinese char, the SQL parser
>> report something like 'Problem connecting to database:
>> java.sql.SQLException: ERROR: parser: parse error at or near "4567891"'
>> (similar in jdbc and odbc), and the error 'unterminated string' has
>> been reported when using psql.

The character code is 0xc05c, in which the second byte (0x5c) is actually
a "\" (backslash).
(Please see the attached file: offence2.zip)
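
As a minimal sketch of the byte-level issue (again not taken from the attachments; the class name, method name and the lead-byte range 0x81-0xFE are my assumptions), the following scans a BIG5 byte string and reports every position where 0x5c appears as the second byte of a two-byte character. A parser that looks at single bytes would treat such a byte as an escape character.

```java
import java.util.ArrayList;
import java.util.List;

public class Big5TrailByteScan {
    /** Returns the offsets of 0x5C bytes that are the second byte of a two-byte character. */
    public static List<Integer> backslashTrailOffsets(byte[] big5) {
        List<Integer> offsets = new ArrayList<>();
        int i = 0;
        while (i < big5.length) {
            int b = big5[i] & 0xFF;
            // Assumed lead-byte range for BIG5 and its common extensions.
            boolean isLead = b >= 0x81 && b <= 0xFE;
            if (isLead && i + 1 < big5.length) {
                if ((big5[i + 1] & 0xFF) == 0x5C) {
                    offsets.add(i + 1);
                }
                i += 2; // skip the whole two-byte character
            } else {
                i += 1; // single-byte (ASCII) character
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // A quoted SQL literal containing only the character 0xc05c.
        byte[] literal = {'\'', (byte) 0xC0, 0x5C, '\''};
        System.out.println(backslashTrailOffsets(literal)); // [2]
    }
}
```

In the sample literal, a byte-oriented parser would read the 0x5c at offset 2 as escaping the closing quote, which would fit the 'unterminated string' error from psql.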

>> I've found the problem exists since 7.1.x till 7.2.*.
