new String(byte[]) performance

From: Teofilis Martisius
Subject: new String(byte[]) performance
Message-ID: 20020911095735.GA6185@teohome.lzua.lt
List: pgsql-jdbc
Hello,

While reading through the PostgreSQL JDBC driver sources and profiling, I
noticed that the driver calls new String(byte[]) a lot while iterating over
a ResultSet, and that this String constructor takes a lot of time. I wrote
a custom byte[] -> String conversion method for UTF-8 that makes iterating
over a ResultSet 2 times faster or even more. I have a patch for the
PostgreSQL JDBC driver, but it is a workaround and I am not sure it will
be accepted. It does speed things up by quite a noticeable amount.

Hmm, maybe decodeUTF8() should be synchronized on cdata, or maybe cdata
should be allocated on each call, because the shared static buffer is not
safe when several threads decode at once. The static cdata version was
faster.
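
One possible middle ground between a shared static buffer and a fresh allocation on every call is a per-thread scratch buffer. The sketch below is only an illustration of that idea, not part of the patch: the class name ThreadLocalUtf8 and the field SCRATCH are hypothetical, it uses ThreadLocal.withInitial (Java 8 or later), and like the patch it assumes well-formed input limited to 1- to 3-byte sequences.

```java
// Sketch only: ThreadLocalUtf8 and SCRATCH are hypothetical names, not driver code.
final class ThreadLocalUtf8 {
    // One scratch buffer per thread, so concurrent callers never share it.
    private static final ThreadLocal<char[]> SCRATCH =
            ThreadLocal.withInitial(() -> new char[50]);

    private ThreadLocalUtf8() {}

    /**
     * Decodes data[offset .. offset+length) as UTF-8.
     * Assumes well-formed input with 1- to 3-byte sequences only (no validation).
     */
    static String decode(byte[] data, int offset, int length) {
        char[] cdata = SCRATCH.get();
        if (cdata.length < length) {          // at most one char per input byte
            cdata = new char[length];
            SCRATCH.set(cdata);
        }
        int end = offset + length;
        int i = offset;
        int j = 0;
        while (i < end) {
            int z = data[i] & 0xFF;
            if (z < 0x80) {                   // 1 byte: 0xxxxxxx
                cdata[j++] = (char) z;
                i += 1;
            } else if (z >= 0xE0) {           // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                int y = data[i + 1] & 0x3F;
                int x = data[i + 2] & 0x3F;
                cdata[j++] = (char) (((z & 0x0F) << 12) | (y << 6) | x);
                i += 3;
            } else {                          // 2 bytes: 110xxxxx 10xxxxxx
                int y = data[i + 1] & 0x3F;
                cdata[j++] = (char) (((z & 0x1F) << 6) | y);
                i += 2;
            }
        }
        // new String(char[], int, int) copies the chars, so reusing the buffer is safe.
        return new String(cdata, 0, j);
    }
}
```

Whether this is actually faster than synchronizing or allocating per call would have to be measured; the ThreadLocal lookup has its own cost.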

By the way, what should a JDBC driver do when, for example,
ResultSet.getInt() is called on a VARCHAR field? I would suggest converting
the byte arrays to Strings, or even to more precisely typed values
(Integers, Doubles and so on), in QueryExecutor.execute(). This should save
some memory allocation for receiveTuple, because right now memory gets
allocated several times: once for the byte[], a second time for the String,
and a third time for the Integer or other object in getObject(). Memory
allocation takes a considerable amount of time. But this stronger typing
would remove some of the flexibility of calling any getXXX on a field of
any SQL type. And it would probably make the querying itself
(QueryExecutor.execute()) slower, I don't know
:/
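
To illustrate the allocation chain described above, here is a rough sketch of reading an int straight from the received bytes, skipping the intermediate String. It is not driver code: the class DirectIntParse and its parseInt method are hypothetical, and it assumes the field arrives as plain ASCII decimal text.

```java
// Sketch only: DirectIntParse is a hypothetical helper, not part of the driver.
final class DirectIntParse {
    private DirectIntParse() {}

    /**
     * Parses an ASCII decimal int from data[offset .. offset+length)
     * without creating a String first.
     */
    static int parseInt(byte[] data, int offset, int length) {
        if (length <= 0) {
            throw new NumberFormatException("empty field");
        }
        int i = offset;
        int end = offset + length;
        boolean negative = data[i] == '-';
        if (negative || data[i] == '+') {
            i++;
            if (i == end) {
                throw new NumberFormatException("sign without digits");
            }
        }
        long value = 0;                       // long, so the range check below cannot overflow
        while (i < end) {
            int digit = data[i++] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a decimal digit");
            }
            value = value * 10 + digit;
            if (value > (long) Integer.MAX_VALUE + 1) {
                throw new NumberFormatException("out of int range");
            }
        }
        if (negative) {
            value = -value;                   // -(2^31) is still a valid int
        } else if (value > Integer.MAX_VALUE) {
            throw new NumberFormatException("out of int range");
        }
        return (int) value;
    }
}
```

A getInt() built this way would only fall back to String conversion for types where the text form is not a plain decimal number, which keeps the getXXX flexibility at the cost of one parser per target type.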

Teofilis Martisius

Anyway, here is the patch to fix string decoding:

diff -r -u ./org/postgresql/core/Encoding.java /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java
--- ./org/postgresql/core/Encoding.java    2001-11-20 00:33:37.000000000 +0200
+++ /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java    2002-09-11 15:56:10.000000000 +0200
@@ -155,6 +155,9 @@
             }
             else
             {
+                if (encoding.equals("UTF-8")) {
+                    return decodeUTF8(encodedString, offset, length);
+                }
                 return new String(encodedString, offset, length, encoding);
             }
         }
@@ -163,6 +166,43 @@
             throw new PSQLException("postgresql.stream.encoding", e);
         }
     }
+    /**
+     * custom byte[] -> String conversion routine, 3x-10x faster than standard new String(byte[])
+     */
+    static final int pow2_6 = 64;        // 2^6
+    static final int pow2_12 = 4096;    // 2^12
+    static char cdata[] = new char[50];
+
+    public static final String decodeUTF8(byte data[], int offset, int length) {
+        if (cdata.length < length) {    // length is a byte count, not an end index
+            cdata = new char[length];
+        }
+        int i = offset;
+        int j = 0;
+        int z, y, x, val;
+        while (i < offset + length) {   // stop at the end of the requested range
+            z = data[i] & 0xFF;
+            if (z < 0x80) {
+                cdata[j++] = (char)data[i];
+                i++;
+            } else if (z >= 0xE0) {        // length == 3
+                y = data[i+1] & 0xFF;
+                x = data[i+2] & 0xFF;
+                val = (z-0xE0)*pow2_12 + (y-0x80)*pow2_6 + (x-0x80);
+                cdata[j++] = (char) val;
+                i+= 3;
+            } else {        // length == 2 (maybe add checking for length > 3, throw exception if it is
+                y = data[i+1] & 0xFF;
+                val = (z - 0xC0)* (pow2_6)+(y-0x80);
+                cdata[j++] = (char) val;
+                i+=2;
+            }
+        }
+
+        String s = new String(cdata, 0, j);
+        return s;
+    }
+

     /*
      * Decode an array of bytes into a string.
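
The comment in the 2-byte branch of the patch hints at adding checks for longer sequences. A rough sketch of what such checking could look like follows. It is an assumption-laden illustration, not a proposed replacement for the patch: CheckedUtf8 is a hypothetical class, it allocates its buffer on every call, it maps 4-byte sequences to surrogate pairs, and it still does not reject overlong 3- and 4-byte forms or encoded surrogates.

```java
// Sketch only: CheckedUtf8 is a hypothetical name, not part of the patch.
final class CheckedUtf8 {
    private CheckedUtf8() {}

    /** Decodes data[offset .. offset+length) as UTF-8, throwing on malformed input. */
    static String decode(byte[] data, int offset, int length) {
        char[] out = new char[length];        // UTF-8 never yields more chars than bytes
        int end = offset + length;
        int i = offset;
        int j = 0;
        while (i < end) {
            int z = data[i] & 0xFF;
            int extra;                        // continuation bytes expected
            int cp;                           // code point being assembled
            if (z < 0x80) {
                extra = 0; cp = z;
            } else if (z >= 0xC2 && z < 0xE0) {
                extra = 1; cp = z & 0x1F;
            } else if (z >= 0xE0 && z < 0xF0) {
                extra = 2; cp = z & 0x0F;
            } else if (z >= 0xF0 && z < 0xF5) {
                extra = 3; cp = z & 0x07;
            } else {
                throw new IllegalArgumentException("invalid lead byte at index " + i);
            }
            if (i + extra >= end) {
                throw new IllegalArgumentException("truncated sequence at index " + i);
            }
            for (int k = 1; k <= extra; k++) {
                int c = data[i + k] & 0xFF;
                if ((c & 0xC0) != 0x80) {     // continuation bytes look like 10xxxxxx
                    throw new IllegalArgumentException("bad continuation byte at index " + (i + k));
                }
                cp = (cp << 6) | (c & 0x3F);
            }
            i += extra + 1;
            if (cp >= 0x10000) {              // outside the BMP: needs two chars
                if (cp > 0x10FFFF) {
                    throw new IllegalArgumentException("code point out of range");
                }
                out[j++] = Character.highSurrogate(cp);
                out[j++] = Character.lowSurrogate(cp);
            } else {
                out[j++] = (char) cp;
            }
        }
        return new String(out, 0, j);
    }
}
```

The extra branches and bounds checks will cost some of the speedup, so it would be worth profiling both variants against new String(byte[], int, int, "UTF-8") before choosing one.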
