Обсуждение: Encoding weirdness with JDBC, driver crashing?

Поиск
Список
Период
Сортировка

Encoding weirdness with JDBC, driver crashing?

От
Nikola Milutinovic
Дата:
Hi all.

I'm having a weird episode with JDBC connection and charSet encoding.

OS: Digital UNIX 4.0D/F
DB: PostgreSQL 7.1.2 and 7.1.3

I have created a database with "-E LATIN2" option. Then I imported a WIN1250
encoded data into it - the data was generated from a set of static HTML pages
and loading was with WIN1250 client encoding.

The data looks OK from "psql", changing client encoding yields the expected
result. I'm preety sure it is as it should be.

JDBC interface behaves in a very weird manner:

URL: jdbc:postgresql://localhost/mercury
OUT: all our alphabet specific characters are tuned into "?"

URL: jdbc:postgresql://localhost/mercury?charSet=LATIN1
OUT: I get data OK - LATIN2 encoded!!!

URL: jdbc:postgresql://localhost/mercury?charSet=LATIN2
OUT: all our alphabet specific characters are tuned into "?"

URL: jdbc:postgresql://localhost/mercury?charSet=UNICODE
OUT: JDBC connection crashes with:

Exception in thread "main" java.sql.SQLException:
  at org.postgresql.Connection.ExecSQL(Connection.java, Compiled Code)
  at org.postgresql.jdbc2.Statement.execute(Statement.java, Compiled Code)
  at org.postgresql.jdbc2.Statement.executeQuery(Statement.java, Compiled Code)
  at test2PostgreSQL.main(test2PostgreSQL.java, Compiled Code)

On the server side, PostgreSQL spits out:

ERROR:  parser: parse error at or near "t?"
FATAL 1:  Socket command type S unknown

(on my terminal, that "t?" looks really strange, two chars I cannot even
describe, I guess Copy/Paste changed it to "t?")

So, anyone has an idea what is going on? I can live with "charSet=LATIN1" for
the moment, but I have a nasty feeling, the data is not loaded as it should be.
Namely, I'm not sure that, for instance, "c-acsan" letter Latin-2 encoded in
PostgreSQL is really transformed into "c-acsan" Unicode encoded inside my Java
application.

Since I'm more oriented to JSP for this matter, I'll live with it, but I have an
uneasy feeling about it. I think this issue should be addressed.

PostgreSQL was built with:

--enable-locale              enable locale support
--enable-recode              enable character set recode support
--enable-multibyte           enable multibyte character support
--enable-unicode-conversion  enable unicode conversion support

TYIA,
Nix.


Re: Encoding weirdness with JDBC, driver crashing?

От
Barry Lind
Дата:
Nikola,

You shouldn't need to pass any encoding and you certainly shouldn't pass
an encoding that is different than the database encoding.

If the database is created with LATIN2 (verify by 'select
getdatabaseencoding()' from psql) then the jdbc driver will
automatically convert from/to the database encoding to/from the unicode
internal representation java uses.

If you explicitly set an encoding different than the database is using
you will likely have problems since then the driver will convert from/to
this encoding instead of the encoding the database is using.

If you are still having problems, please post a simple test case that I
can run locally to reproduce your problem

thanks,
--Barry



Nikola Milutinovic wrote:

> Hi all.
>
> I'm having a weird episode with JDBC connection and charSet encoding.
>
> OS: Digital UNIX 4.0D/F
> DB: PostgreSQL 7.1.2 and 7.1.3
>
> I have created a database with "-E LATIN2" option. Then I imported a
> WIN1250 encoded data into it - the data was generated from a set of
> static HTML pages and loading was with WIN1250 client encoding.
>
> The data looks OK from "psql", changing client encoding yields the
> expected result. I'm preety sure it is as it should be.
>
> JDBC interface behaves in a very weird manner:
>
> URL: jdbc:postgresql://localhost/mercury
> OUT: all our alphabet specific characters are tuned into "?"
>
> URL: jdbc:postgresql://localhost/mercury?charSet=LATIN1
> OUT: I get data OK - LATIN2 encoded!!!
>
> URL: jdbc:postgresql://localhost/mercury?charSet=LATIN2
> OUT: all our alphabet specific characters are tuned into "?"
>
> URL: jdbc:postgresql://localhost/mercury?charSet=UNICODE
> OUT: JDBC connection crashes with:
>
> Exception in thread "main" java.sql.SQLException:
>  at org.postgresql.Connection.ExecSQL(Connection.java, Compiled Code)
>  at org.postgresql.jdbc2.Statement.execute(Statement.java, Compiled Code)
>  at org.postgresql.jdbc2.Statement.executeQuery(Statement.java, Compiled
> Code)
>  at test2PostgreSQL.main(test2PostgreSQL.java, Compiled Code)
>
> On the server side, PostgreSQL spits out:
>
> ERROR:  parser: parse error at or near "t?"
> FATAL 1:  Socket command type S unknown
>
> (on my terminal, that "t?" looks really strange, two chars I cannot even
> describe, I guess Copy/Paste changed it to "t?")
>
> So, anyone has an idea what is going on? I can live with
> "charSet=LATIN1" for the moment, but I have a nasty feeling, the data is
> not loaded as it should be. Namely, I'm not sure that, for instance,
> "c-acsan" letter Latin-2 encoded in PostgreSQL is really transformed
> into "c-acsan" Unicode encoded inside my Java application.
>
> Since I'm more oriented to JSP for this matter, I'll live with it, but I
> have an uneasy feeling about it. I think this issue should be addressed.
>
> PostgreSQL was built with:
>
> --enable-locale              enable locale support
> --enable-recode              enable character set recode support
> --enable-multibyte           enable multibyte character support
> --enable-unicode-conversion  enable unicode conversion support
>
> TYIA,
> Nix.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>



Re: Encoding weirdness with JDBC, driver crashing?

От
Barry Lind
Дата:
Nikola,

Thank you for the test case.  I have been able to reproduce the problem
with your test code and discovered that the problem appears to be a bug
in your code, not in the jdbc driver.

In your test program you had the following code:

         System.out.println ("ID: " + id + " NAME: " + name);

This use of println is going to convert the string to be printed into
the default character encoding for the JVM, which is probably not what
you want (and gives me the wrong result).  However if you explicitly
convert to the desired character set (LATIN2 in this case) then the
output is correct.  So I changed your test program and replaced the line
above with:

         String l_mesg = "ID: " + id + " NAME: " + name;
         //output using the default jvm encoding
         System.out.println (l_mesg);
         try {
           //output using an explicit encoding
           System.out.write( l_mesg.getBytes("LATIN2") );
           System.out.println();
    } catch (Exception l_e) {
        System.out.println("error : "+l_e.toString());
             System.exit(1);
    }

Below is the output of my testing showing that the jdbc driver returns
the same values as psql does (when using the correct charset to print
out the results).  I performed this test with both the 7.1 driver and
the 7.2 driver.  Therefore I don't see any bug with the jdbc driver.

thanks,
--Barry


[blind@barry pgsql]$ createdb -E LATIN2 test
CREATE DATABASE
[blind@barry pgsql]$ psql test
Welcome to psql, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
        \h for help with SQL commands
        \? for help on internal slash commands
        \g or terminate with semicolon to execute query
        \q to quit

test=# \l
        List of databases
    Name    | Owner | Encoding
-----------+-------+-----------
  template0 | blind | SQL_ASCII
  template1 | blind | SQL_ASCII
  test      | blind | LATIN2
(3 rows)

test=# create table test_temp (id integer, test_text text);
CREATE
test=# insert into test_temp values( 1, 'Milutinovi\346 \251 \271 \306
\346 \310 \350 \320 \360 \256 \276');
INSERT 126055 1
test=# select * from test_temp;
  id |            test_text
----+---------------------------------
   1 | Milutinoviæ © ¹ Æ æ È è Ð ð ® ¾
(1 row)

test=# \q
[blind@barry pgsql]$ cd ~/work
[blind@barry work]$ java test3psql
Connecting with: jdbc:postgresql://localhost/test
ID: 1 NAME: Milutinovi? ? ? ? ? ? ? ? ? ? ?
ID: 1 NAME: Milutinoviæ © ¹ Æ æ È è Ð ð ® ¾
[blind@barry work]$






Nikola Milutinovic wrote:

> Barry Lind wrote:
>
>> Nikola,
>>
>> You shouldn't need to pass any encoding and you certainly shouldn't
>> pass an encoding that is different than the database encoding.
>>
>> If the database is created with LATIN2 (verify by 'select
>> getdatabaseencoding()' from psql) then the jdbc driver will
>> automatically convert from/to the database encoding to/from the
>> unicode internal representation java uses.
>>
>> If you explicitly set an encoding different than the database is using
>> you will likely have problems since then the driver will convert
>> from/to this encoding instead of the encoding the database is using.
>>
>> If you are still having problems, please post a simple test case that
>> I can run locally to reproduce your problem
>
>
> Hi Barry.
>
> After some time, I have decided to clean-up the mess of my DB and I
> created TWO additional databases. First, I wanted to convert all my data
> from WIN1250 (inside a LATIN1 database, which is obviously an invitation
> for the Devil...) and to get to a new data structure (I had a company's
> e-mail address book inside and I wanted to switch from a fixed 3-layer
> hierarchy to a regular n-tree hierarchy). While I was at it, I thought I
> could switch from LATIN2 to UNICODE.
>
> So, I have two databases: "import" with LATIN2 encoding and "www" with
> UNICODE encoding. I first exported the old database, using:
>
> pg_dump -C -d -f mercury_dump.sql mercury
>
> Then I editted the resulting file, namely, I added "\encoding WIN1250"
> to it, so the data on the input would be treated for what it was - a
> WIN1250 encoded data.
>
> Then I fed that to "import" database. After that I wrote a small Java
> application which took data from "import" and transformed it to "www". I
> thought it would convert the strings from LATIN2 to UNICODE, along the way.
>
> I think all of the conversions went OK. If I could get pg_dump to dump
> extended ASCII as octal values, I could verify for sure (know how I
> could do it?).
>
> THE PROBLEM
> -----------
>
> Of course, the problem is JDBC. Again, I see those dreaded "?" instead
> of our alphabet characters. So, I took the old test Java application and
> fired it up, just to see what is going on. Just to make myself perfectly
> clear, the letters I'm interested in are: c-acsan, c-caron, s-caron,
> z-caron and d-slash
>
> This is the result:
>
> Database "import" LATIN2 encoding
> ---------------------------------
>
> charSet=(unspecified): I get "?" for all of our letters
> charSet=LATIN1: I get what appears to be LATIN2 encoded data
> charSet=LATIN2: I get "?"
> charSet=UNICODE: I get SQLException.
>
> Database "www" UNICODE encoding
> -------------------------------
>
> charSet=(unspecified): I get "?" for all of our letters
> charSet=LATIN1: I get what appears to be depricated UNICODE encoded data
> charSet=LATIN2: I get what looks like depricated UNICODE, but
>                 slightly different from the former case.
> charSet=UNICODE: I get SQLException.
>
> THE TEST CASE
> -------------
>
> You asked me if I could provide a test example, well here goes.
>
> 1. create a database with LATIN2 encoding
> 2. create a table
>    CREATE TABLE (
>      id int4,
>      test_text text
>    );
> 3. insert some data into it.
>    insert into test_temp values( 1, 'Milutinovi\346 \251 \271 \306 \346
> \310 \350 \320 \360 \256 \276');
>
> 4. try to fetch this textual field. I'm attaching a Java application I
> used for testing.
>
> ADDITIONAL
> ----------
>
> Should it appear to work, could you send me the JDBC driver you're
> using, so I could test it on my side?
>
> Nix.
>
>
> ------------------------------------------------------------------------
>
> import java.sql.*;
>
> public class test3psql {
> //  static public final String CONN_URL = "jdbc:postgresql://Mercury.ev.co.yu/www";
>   static public final String CONN_URL = "jdbc:postgresql://Mercury.ev.co.yu/import";
>   static public final String user = "www";
>   static public final String pass = "test00";
>
>   static private Connection conn;
>   static private String connString;
>   static private Statement st;
>   static private ResultSet rs;
>   static private int id=1;
>
>   static public void main( String args[] )
>     throws SQLException, ClassNotFoundException {
>     conn=null;
>     Class.forName( "org.postgresql.Driver" );
>     connString = CONN_URL;
>     if( args.length == 1 ) {
>       connString += "?charSet=" + args[0];
>     }
>     System.out.println( "Connecting with: " + connString );
>     conn = DriverManager.getConnection( connString, user, pass );
>     if( conn != null ) {
>       st = conn.createStatement();
> //      rs = st.executeQuery( "SELECT * FROM e_user WHERE surname LIKE 'Ivkovi%'" );
>       rs = st.executeQuery( "SELECT * FROM test_temp" );
>       while( rs.next() ) {
>         //String name = rs.getString( "name" );
>         //String surn = rs.getString( "surname" );
>         String name = rs.getString( "tekst" );
>         String surn = "EMPTY";
>         int id = rs.getInt( "id" );
>
>         System.out.println ( "ID: " + id + " NAME: " + name + " SURNAME: " + surn );
>       }
>       rs.close();
>       st.close();
>       conn.close();
>     }
>   }
> }
>
>
> ------------------------------------------------------------------------
>
> java -classpath /usr/local/pgsql/share/java/postgresql.jar:. test3psql $1
>



Re: Encoding weirdness with JDBC, driver crashing?

От
Rene Pijlman
Дата:
On Mon, 03 Dec 2001 19:51:16 -0800, you wrote:
>I have been able to reproduce the problem with your test code and
>discovered that the problem appears to be a bug in your code, not
>in the jdbc driver.

Wow, great service Barry!

Regards,
René Pijlman <rene@lab.applinet.nl>