A JDBC bug or problem relating to string length in Java vs. PG (long)
From | joe user |
---|---|
Subject | A JDBC bug or problem relating to string length in Java vs. PG (long) |
Date | |
Msg-id | 20030901015316.1284.qmail@web20416.mail.yahoo.com |
Replies | Re: A JDBC bug or problem relating to string length in Java |
List | pgsql-jdbc |
String lengths aren't the same in Java and Postgres. _All_ Strings in Java are stored as UTF-16 (1 char = 16 bits), so a string of 100 Chinese characters has a Java String length of 100. But Postgres by default does not store strings as UTF-16. (I couldn't find the default in the docs, except that it is taken from the computer it is installed on, which in my case is plain-vanilla Redhat 9, which I assume uses a Latin encoding as the default, i.e. 1 char = 1 byte, right?) It has 8-bit chars (is this correct?), so the same string of 100 Chinese chars would have a length greater than 100 in Postgres-world.

Where this causes problems is in VARCHAR types. Let's say I have a table with a VARCHAR(100). If I try to store a string longer than 100 chars into it, I get an SQLException, so, being the defensive programmer that I am, I wrote a little static method called truncate that looks like this:

    static String truncate(String s, int l) {
        if (s.length() <= l)
            return s;
        else
            return s.substring(0, l);
    }

and then I do this kind of thing:

    preparedStatement.setString(1, truncate(theString, 100));

which should never throw an SQLException, because I know the String length is safe... except that this is a web application, used by people all over the world, including people with non-Latin characters.

What is the solution to this? I think the real solution would be to amend the JDBC spec to say that there should be an SQL string length method somewhere that lets us test what the database thinks the length is going to be. Another solution would be to convert the string to a byte[] using UTF-8 (I assume this is what Postgres does) and then look at the length of that byte[]. It would be possible to take the be-very-conservative approach and, for a VARCHAR(100), truncate the Java String to 50, but even this will not work: some characters in UTF-16 encode to more than 2 bytes in UTF-8 (a single UTF-16 char can take up to 3 bytes), so halving is not conservative enough.
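To illustrate the byte-counting idea above, here is a minimal sketch of a byte-aware truncate. It assumes the server counts UTF-8 bytes against the column limit (which is the scenario I'm worried about); `truncateUtf8` is a hypothetical helper, not part of any JDBC API, and `StandardCharsets` requires Java 7+ (older JVMs would use `getBytes("UTF-8")`):

```java
import java.nio.charset.StandardCharsets;

public class ByteTruncate {
    // Truncate s so that its UTF-8 encoding fits in maxBytes,
    // without splitting a multi-byte character in half.
    static String truncateUtf8(String s, int maxBytes) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) return s;
        // Walk back past UTF-8 continuation bytes (10xxxxxx)
        // so the cut lands on a character boundary.
        int end = maxBytes;
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) end--;
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo";  // 'é' is 2 bytes in UTF-8
        System.out.println(s.length());                                // 5 chars
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 6 bytes
        System.out.println(truncateUtf8(s, 2));                        // "h": can't split 'é'
    }
}
```

The boundary walk matters: a naive cut at `maxBytes` could leave a dangling half-character that the server would reject or mangle.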
Another approach would be for me to use TEXT instead of VARCHAR. I have various reasons for not wanting to do that if I can find some other solution. Another option would be to tell Postgres that I want it to use UTF-16 for its encoding. I don't really care about the disk space, but the problem is that a given page of disk would then hold only half as much data, which means that the OS, which caches disk pages in RAM, would only be able to do half the effective caching, which means I would need to buy twice as much RAM to get the same performance, and that would be a major hardware investment.

Finally, it would be good if there were a Postgres option that says, "Silently truncate strings which are too long." That's all I really want to do anyway. It would also be cool if java.lang.String had a length(String encoding) method for this purpose.

Any thoughts on this? All this is with PG 7.3.4, btw.
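The wished-for length(String encoding) can be approximated today with a one-line helper. This is a sketch of that idea; `encodedLength` is a hypothetical name, and it simply counts the bytes the string occupies in a given charset:

```java
import java.io.UnsupportedEncodingException;

public class EncodedLength {
    // Stand-in for the wished-for String.length(encoding):
    // the number of bytes s occupies when encoded in `encoding`.
    static int encodedLength(String s, String encoding)
            throws UnsupportedEncodingException {
        return s.getBytes(encoding).length;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u65e5\u672c\u8a9e";  // 3 Japanese characters
        System.out.println(s.length());                 // 3 UTF-16 code units
        System.out.println(encodedLength(s, "UTF-8"));  // 9 bytes (3 each)
        System.out.println(encodedLength(s, "UTF-16")); // 8 bytes (2-byte BOM + 6)
    }
}
```

With this, the defensive check becomes "is encodedLength(s, dbEncoding) <= columnLimit" rather than comparing Java char counts against a byte limit.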