Thread: A question about postgresql 8.1 and UTF strings

A question about postgresql 8.1 and UTF strings

From
"Yair Zas"
Date:
Good day all,
I have PostgreSQL 8.1 (on Windows XP Service Pack 2).
I have a table t_users with the columns id (serial), username (varchar(256)), and password (varchar(256)).
The database was created with UTF8 encoding.
The database has 5 users, and some of them have non-Latin (mainly Hebrew) usernames. I entered these non-Latin values simply by typing them into the pgAdmin GUI (was I correct to do so?).
I opened a JDBC connection using a standard PostgreSQL JDBC URL and selected a row with a Hebrew username.
I used String user = ResultSet.getString("username") to get the String (which contains 4 Hebrew letters)
and then I used
 
System.out.println(user.getBytes().length). However, instead of seeing 8 bytes (2 bytes per character, 4 characters), I saw 4 bytes.
Can you please tell me what I'm doing wrong?
 
Thanks
 
Yair
 

Re: A question about postgresql 8.1 and UTF strings

From
Oliver Jowett
Date:
Yair Zas wrote:

> System.out.println(user.getBytes().length) - however, instead of seeing
> 8 bytes (2 bytes per each character, 4 characters ), i saw 4 bytes ....
> Can you please tell me what is it that I'm doing wrong?

getBytes() uses the JVM's default encoding to translate the String to
bytes. This is usually something like ISO-8859-1, which is a
one-byte-per-character encoding that can't represent Hebrew letters.
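
To illustrate the point, here is a minimal sketch, not code from the original poster. The class name, the helper methods and the sample string are made up for the example, and it assumes a newer JVM than the one discussed in this thread (StandardCharsets needs Java 7). It encodes four Hebrew letters with ISO-8859-1 and shows that each unmappable letter collapses to a single '?' byte:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultEncodingDemo {
    // Hypothetical sample value: four Hebrew letters, written as Unicode escapes
    // so the source file's own encoding does not matter.
    static final String USER = "\u05E9\u05DC\u05D5\u05DD";

    // Number of bytes produced when encoding with ISO-8859-1.
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Encode with ISO-8859-1 and decode again, to show what survived.
    static String latin1RoundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The default charset depends on the platform and JVM settings.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println(USER.getBytes().length);   // result depends on the default charset
        System.out.println(latin1Length(USER));       // 4: one replacement byte per letter
        System.out.println(latin1RoundTrip(USER));    // ????
    }
}
```

The round trip makes the data loss visible: the byte count looks plausible, but the bytes no longer carry the Hebrew text.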

If you want to generate a representation in a particular encoding (e.g.
your description implies you're expecting a particular
2-byte-per-character encoding) then you should use the getBytes()
variant that takes an encoding name.
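
As a rough sketch of that suggestion (the class name, the encodedLength helper and the sample string are assumptions made for illustration, and getBytes(Charset) needs Java 6 or later), the encoding can be passed by name so the result no longer depends on the JVM default:

```java
import java.nio.charset.Charset;

public class ExplicitEncodingDemo {
    // Hypothetical sample value: four Hebrew letters as Unicode escapes.
    static final String USER = "\u05E9\u05DC\u05D5\u05DD";

    // Encode the string with the named charset and return the byte count.
    // Charset.forName throws an unchecked exception for an unknown name.
    static int encodedLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        System.out.println(USER.length());                   // 4 characters
        System.out.println(encodedLength(USER, "UTF-8"));    // 8: two bytes per Hebrew letter
        System.out.println(encodedLength(USER, "UTF-16BE")); // 8: two bytes per letter, no byte-order mark
    }
}
```

On older JVMs the equivalent call is user.getBytes("UTF-8"), which declares the checked UnsupportedEncodingException.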

This is not something specific to JDBC, it's standard Java. If you are
working with characters beyond 7-bit US-ASCII, I'd strongly recommend
doing some research into Java's internal string representation and how
that is transformed into bytes. The javadoc for Charset is one
starting point:
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html

-O