Обсуждение: Re: DataDirect Driver, ExecDirect and UTF-8

Поиск
Список
Период
Сортировка

Re: DataDirect Driver, ExecDirect and UTF-8

От
Ken Sell
Дата:
Hi,

I wasn’t a member of the mailing list when I sent this, so I’m not sure it actually made it out there.
I apologize if this is a duplicate.

....Ken


On 5/4/09 11:49 AM, "Ken Sell" <ksell@greenplum.com> wrote:

Hi,

I’m a the new connectivity developer at GreenPlum. GreenPlum makes a data warehouse DBMS based on PostgreSQL.
I’m working on a problem where a user is attempting to insert a non-ASCII UTF-8 values (i.e. An umlaut over an ‘o’).
The test does an insert via an SQLExecDirectW into a table.  The text looks like this:

  "insert into t1 values ('ö')"

I’ve built and debugged the postgreSQL driver (version 8.02.0500). It looks like the text makes it through the
Driver Manager (i.e. DataDirect) ok. I see the correct value in SQLExecDirectW in odbcapiw.c, but I
also see the code in SQLExecDirectW call ucs2_to_utf8. ucs2_to_utf8 tries to interpret the value as
UCS2, but the value is UTF-8. The value is corrupted by ucs2_to_utf8.

I also attempted to call SQLExecDirect (i.e. no W), but the DataDirect driver manager tries to convert the
umlaut value to ASCII and calls SQLExecDirectW instead.

Can someone elaborate on the driver’s correct behavior in this situation? If the database is UTF-8 and
The application is UTF-8, should the driver handle this? Does the application (or driver manager) have
to convert the string to UCS2 first?
 
Thanks,

....Ken

Re: DataDirect Driver, ExecDirect and UTF-8

От
Hiroshi Inoue
Дата:
Ken Sell wrote:
> Hi,
>
> I wasn’t a member of the mailing list when I sent this, so I’m not sure
> it actually made it out there.
> I apologize if this is a duplicate.
>
> ....Ken
>
>
> On 5/4/09 11:49 AM, "Ken Sell" <ksell@greenplum.com> wrote:
>
>     Hi,
>
>     I’m a the new connectivity developer at GreenPlum. GreenPlum makes a
>     data warehouse DBMS based on PostgreSQL.
>     I’m working on a problem where a user is attempting to insert a
>     non-ASCII UTF-8 values (i.e. An umlaut over an ‘o’).
>     The test does an insert via an SQLExecDirectW into a table.  The
>     text looks like this:
>
>       "insert into t1 values ('ö')"
>
>     I’ve built and debugged the postgreSQL driver (version 8.02.0500).
>     It looks like the text makes it through the
>     Driver Manager (i.e. DataDirect) ok. I see the correct value in
>     SQLExecDirectW in odbcapiw.c, but I
>     also see the code in SQLExecDirectW call ucs2_to_utf8. ucs2_to_utf8
>     tries to interpret the value as
>     UCS2, but the value is UTF-8. The value is corrupted by ucs2_to_utf8.

Psqlodbc Unicode driver uses UTF-16 encoding and your application uses
UTF-8 encoding. Isn't the URL
http://media.datadirect.com/download/docs/odbc/allodbc/reference/unicode6.html
related to your problem ?

regards,
Hiroshi Inoue

Re: DataDirect Driver, ExecDirect and UTF-8

От
Ken Sell
Дата:
Hi,

Thanks very much for the information. I think I have it figured out.

Here is my understanding. Feel free to correct me if I have gotten something wrong.
In order for everything to work correctly with a UTF-8 application, a DataDirect driver manager, and psqlodbc you must do the following.

 - Use the header files (e.g. sql.h) from the DataDirect driver manager installation.

 - Compile the driver with the SQLWCHARSHORT define (e.g. #define SQLWCHARSHORT=1).
    This will cause SQLWCHAR to be defined as an unsigned short.

  - Compile the application with the SQLWCHARSHORT define (e.g. #define SQLWCHARSHORT=1).
    This will cause SQLWCHAR to be defined as an unsigned short.

 - Set the DriverUnicodeType data source attribute to 1  (i.e. DriverUnicodeType=1 in your odbc.ini file).
   This tells the driver manager that your driver is expecting UTF-16 encoding for all SQLWCHAR
   parameters.

 - If your database encoding is UTF-8, the descriptor for character columns will be a wide type.
   For example, a char(1) column will be SQL_WCHAR.

- If you do something with a sql statement (e.g. Do a SQLExecDirectW(“insert into T1 values (‘a’)),
  the driver manager (DataDirect) will convert the string from UTF-8 to UTF-16, pass it to the
  the driver as UTF-16. The driver will then translate it back to UTF-8. When the driver is
  unicode (i.e your compiled psqlodbc with as a unicode driver), then this translation
  in the driver manager happens weather or not you use a wide function. The driver manager
  converts both SQLCHAR and SQLWCHAR parameters to UTF-16. It also changes all calls to
  non-wide functions to calls to wide functions to the driver (e.g. SQLExecDirect gets changed
  to SQLExecDirectW). Note, SQLCHAR and SQLWCHAR strings are not translated from UTF-8
  to UTF-16 in the same way. If the call is to a non-wide function, then the translation assumes
  that the string contains only ASCII characters. A non-ASCII character will probably get
  translated incorrectly. A call to wide functions translates UTF-8 to UTF-16 correctly.

 - When using parameters, you must be use the appropriate SQL and C type when describing
    the parameters column values. If you say the C type is SQL_C_CHAR, the value is assumed
    to be in UTF-8 by the driver for a database encoded in UTF-8. IF you say the C type is
   SQL_C_WCHAR, I think the driver will assume the value is in UTF-16 and try to translate
   it to UTF-8.

 - When using column values, you must use the appropriate SQL C type. If you say the column C
   type is SQL_C_WCHAR, then the returned value will be translated to UTF-16 by the driver.
   If you say the column C type is SQL_C_CHAR, then the value will be returned as UTF-8.

....Ken


On 5/5/09 11:13 PM, "Hiroshi Inoue" <inoue@tpf.co.jp> wrote:

Ken Sell wrote:
> Hi,
>
> I wasn’t a member of the mailing list when I sent this, so I’m not sure
> it actually made it out there.
> I apologize if this is a duplicate.
>
> ....Ken
>
>
> On 5/4/09 11:49 AM, "Ken Sell" <ksell@greenplum.com> wrote:
>
>     Hi,
>
>     I’m a the new connectivity developer at GreenPlum. GreenPlum makes a
>     data warehouse DBMS based on PostgreSQL.
>     I’m working on a problem where a user is attempting to insert a
>     non-ASCII UTF-8 values (i.e. An umlaut over an ‘o’).
>     The test does an insert via an SQLExecDirectW into a table.  The
>     text looks like this:
>
>       "insert into t1 values ('ö')"
>
>     I’ve built and debugged the postgreSQL driver (version 8.02.0500).
>     It looks like the text makes it through the
>     Driver Manager (i.e. DataDirect) ok. I see the correct value in
>     SQLExecDirectW in odbcapiw.c, but I
>     also see the code in SQLExecDirectW call ucs2_to_utf8. ucs2_to_utf8
>     tries to interpret the value as
>     UCS2, but the value is UTF-8. The value is corrupted by ucs2_to_utf8.

Psqlodbc Unicode driver uses UTF-16 encoding and your application uses
UTF-8 encoding. Isn't the URL
http://media.datadirect.com/download/docs/odbc/allodbc/reference/unicode6.html
related to your problem ?

regards,
Hiroshi Inoue