JDBC driver is too fussy about Unicode
От | James Pharaoh |
---|---|
Тема | JDBC driver is too fussy about Unicode |
Дата | |
Msg-id | CALSKDaQw6qbZdjV+DobTNOtuUVxco_rzfEe0=uiRWKLjCSiq9g@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: JDBC driver is too fussy about Unicode
(Oliver Jowett <oliver@opencloud.com>)
|
Список | pgsql-jdbc |
Hi, I have been trying to track down a bug in an application. This is a message broker for a chat application, written in Hibernate using PostgreSQL underneath. I was rather confused when I started getting decoding errors from data written into the database by the same application. Here's part of the stack trace from one such exception: org.hibernate.exception.DataException: could not load an entity: [com.pharaohsystems.txt2.db.TextRec#24413864] (...) Caused by: org.postgresql.util.PSQLException: Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database. at org.postgresql.jdbc2.AbstractJdbc2ResultSet.getString(AbstractJdbc2ResultSet.java:1891) at org.postgresql.jdbc2.AbstractJdbc2ResultSet.getString(AbstractJdbc2ResultSet.java:2190) I've gone in and looked at the underlying record. It contains the following character: http://www.fileformat.info/info/unicode/char/1f60a/index.htm Actually the exception above was generated after I went in and doctored the database so that the field in question contains only this particular character. I am confused because this is clearly a valid UTF-8 character. This can be easily seen by matching the UTF-8 character code shows on the page above, f09f988a, with the output from a simple query: txt2=# select text, encode (text::bytea, 'hex') from text where id = 24413864; text | encode ------+---------- 😊 | f09f988a (1 row) Thinking about this I have come to the conclusion that it is because of the Unicode version being used. This character is present in Unicode 6.0.0, but I am using Java 6, which apparently uses Unicode 4.0. My point is that this seems like an absurd restriction. Java allows me to use characters which aren't included in the Unicode specification it was built with. The postgresql driver allows me to store these values. But when I try and bring them out it gives me an error. This is a simple and well defined conversion. This code point exists in both UTF-8 (the encoding used by the database) and UTF-16 (the encoding used in Java). The conversion has clearly taken place to store this in the database. Why can't it pull it out? I can only think of two solutions (or workarounds). One is to perform the conversion myself. This is a pain because with a lot of user specified data these characters could get all over the place. The other is to filter out these characters. Neither of these are very appealing. Does anyone have any thoughts on this? I could of course be wrong and there could be something else going on here. Any ideas? Thanks, James
В списке pgsql-jdbc по дате отправления: