Re: new String(byte[]) performance

From: Barry Lind
Subject: Re: new String(byte[]) performance
Msg-id: 3DB21CCD.1030108@xythos.com
In reply to: new String(byte[]) performance  (Teofilis Martisius <teo@teohome.lzua.lt>)
List: pgsql-jdbc
Teofilis,

I have applied this patch.  I also made a change so that, when
connected to a 7.3 database, this optimization will always be used.  This
is done by having the server do the character set encoding/decoding and
always using UTF-8 when dealing with the JDBC client.

thanks,
--Barry



Teofilis Martisius wrote:
> Hello,
>
> While looking through postgresql JDBC driver sources and profiling, I
> noticed that the driver uses new String(byte[]) a lot while iterating a
> ResultSet. And I noticed that this String constructor takes a lot of
> time. I wrote a custom byte[]->String conversion method for UTF-8 that
> speeds up iterating over a ResultSet by a factor of 2 or more. I have a patch
> for the PostgreSQL JDBC driver, but it is a workaround and I am not
> sure it will be accepted. It does speed things up by a noticeable amount.
>
> Hmm, maybe decodeUTF8() should be synchronized on cdata, or maybe cdata
> should be allocated for each call. The static cdata version was faster.
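
A third option, between synchronizing on the shared buffer and allocating on every call, is one scratch buffer per thread. The following is only a sketch under stated assumptions: the class name ThreadLocalDecoder and the helper scratch() are hypothetical and are not part of the driver, `length` is taken to be a byte count (as in the String constructor), and the input is assumed to be well-formed UTF-8 with sequences of at most 3 bytes, as in the patch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not driver code.
final class ThreadLocalDecoder {
    // One scratch buffer per thread, so concurrent callers never share state.
    private static final ThreadLocal<char[]> SCRATCH =
            ThreadLocal.withInitial(() -> new char[50]);

    // Returns the calling thread's buffer, growing it when the input may not fit.
    private static char[] scratch(int minCapacity) {
        char[] buf = SCRATCH.get();
        if (buf.length < minCapacity) {
            buf = new char[minCapacity];
            SCRATCH.set(buf);
        }
        return buf;
    }

    // Assumes well-formed UTF-8 with 1- to 3-byte sequences only.
    static String decodeUTF8(byte[] data, int offset, int length) {
        // A 1- to 3-byte sequence yields one char, so `length` chars always suffice.
        char[] cdata = scratch(length);
        int end = offset + length;
        int i = offset;
        int j = 0;
        while (i < end) {
            int z = data[i] & 0xFF;
            if (z < 0x80) {                       // 1 byte (ASCII)
                cdata[j++] = (char) z;
                i++;
            } else if (z >= 0xE0) {               // 3 bytes
                int y = data[i + 1] & 0x3F;
                int x = data[i + 2] & 0x3F;
                cdata[j++] = (char) (((z & 0x0F) << 12) | (y << 6) | x);
                i += 3;
            } else {                              // 2 bytes
                int y = data[i + 1] & 0x3F;
                cdata[j++] = (char) (((z & 0x1F) << 6) | y);
                i += 2;
            }
        }
        // new String(char[], int, int) copies, so reusing the buffer is safe.
        return new String(cdata, 0, j);
    }

    public static void main(String[] args) throws InterruptedException {
        byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        Runnable task = () -> System.out.println(decodeUTF8(bytes, 0, bytes.length));
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Whether this is faster than a synchronized static buffer or a per-call allocation would have to be measured on the target JVM.
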
>
> By the way, what should a JDBC driver do when, e.g., ResultSet.getInt() is
> called for a VARCHAR field? I would suggest converting byte arrays to
> Strings, or even to more precisely typed values (Integers, Doubles and so
> on), in QueryExecutor.execute(). This should save some memory allocation
> for receiveTuple, because currently memory gets allocated several times: once
> for the byte[], a second time for the String, and a third time for the Integer
> or other object in getObject(). Memory allocation takes a considerable
> amount of time. But this stronger typing would remove some of the
> flexibility of calling any getXXX on a field of any SQL type. And it would
> probably make the querying itself (QueryExecutor.execute()) slower, I don't know.
> :/
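
One way to skip the intermediate String without giving up late typing is to parse the number straight from the received bytes when getInt() is called. This is a minimal sketch, assuming the value arrives as ASCII decimal text; the class AsciiInts and its parseInt method are hypothetical names for illustration and do not exist in the driver.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not driver code.
final class AsciiInts {
    // Parses a decimal int from ASCII bytes without building a String.
    // `length` is a byte count starting at `offset`.
    static int parseInt(byte[] data, int offset, int length) {
        if (length <= 0) {
            throw new NumberFormatException("empty value");
        }
        int i = offset;
        int end = offset + length;
        boolean negative = data[i] == '-';
        if (negative || data[i] == '+') {
            i++;
            if (i == end) {
                throw new NumberFormatException("sign without digits");
            }
        }
        // Accumulate in a long so the range check cannot overflow.
        long value = 0;
        for (; i < end; i++) {
            int digit = data[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            value = value * 10 + digit;
            if (value > 2147483648L) {            // beyond |Integer.MIN_VALUE|
                throw new NumberFormatException("out of int range");
            }
        }
        if (negative) {
            value = -value;
        }
        if (value > Integer.MAX_VALUE) {
            throw new NumberFormatException("out of int range");
        }
        return (int) value;
    }

    public static void main(String[] args) {
        byte[] bytes = "-42".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseInt(bytes, 0, bytes.length)); // prints -42
    }
}
```

This keeps the byte[] as the only stored form, so any getXXX can still be applied to any column, at the cost of re-parsing on repeated access.
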
>
> Teofilis Martisius
>
> Anyway, here is the patch to fix string decoding:
>
> diff -r -u ./org/postgresql/core/Encoding.java /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java
> --- ./org/postgresql/core/Encoding.java    2001-11-20 00:33:37.000000000 +0200
> +++ /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java    2002-09-11 15:56:10.000000000 +0200
> @@ -155,6 +155,9 @@
>              }
>              else
>              {
> +                if (encoding.equals("UTF-8")) {
> +                    return decodeUTF8(encodedString, offset, length);
> +                }
>                  return new String(encodedString, offset, length, encoding);
>              }
>          }
> @@ -163,6 +166,43 @@
>              throw new PSQLException("postgresql.stream.encoding", e);
>          }
>      }
> +    /**
> +     * custom byte[] -> String conversion routine, 3x-10x faster than standard new String(byte[])
> +     */
> +    static final int pow2_6 = 64;        // 2^6
> +    static final int pow2_12 = 4096;    // 2^12
> +    static char cdata[] = new char[50];
> +
> +    public static final String decodeUTF8(byte data[], int offset, int length) {
> +        if (cdata.length < (length-offset)) {
> +            cdata = new char[length-offset];
> +        }
> +        int i = offset;
> +        int j = 0;
> +        int z, y, x, val;
> +        while (i < length) {
> +            z = data[i] & 0xFF;
> +            if (z < 0x80) {
> +                cdata[j++] = (char)data[i];
> +                i++;
> +            } else if (z >= 0xE0) {        // length == 3
> +                y = data[i+1] & 0xFF;
> +                x = data[i+2] & 0xFF;
> +                val = (z-0xE0)*pow2_12 + (y-0x80)*pow2_6 + (x-0x80);
> +                cdata[j++] = (char) val;
> +                i+= 3;
> +            } else {        // length == 2 (maybe add checking for length > 3, throw exception if it is
> +                y = data[i+1] & 0xFF;
> +                val = (z - 0xC0)* (pow2_6)+(y-0x80);
> +                cdata[j++] = (char) val;
> +                i+=2;
> +            }
> +        }
> +
> +        String s = new String(cdata, 0, j);
> +        return s;
> +    }
> +
>
>      /*
>       * Decode an array of bytes into a string.
>
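
For comparison, here is a more defensive variant of the same idea. It is a sketch, not the code that was applied to the driver, and the class name Utf8Decoder and the helper isCont are hypothetical. It differs from the patch in three ways: `length` is treated consistently as a byte count (the patch sizes the buffer with length-offset but loops while i < length, which appears to disagree with the String constructor's meaning whenever offset is non-zero), the buffer is allocated per call, and anything outside well-formed 1- to 3-byte sequences is handed back to the standard decoder.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not driver code.
final class Utf8Decoder {
    // True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
    private static boolean isCont(byte b) {
        return (b & 0xC0) == 0x80;
    }

    static String decodeUTF8(byte[] data, int offset, int length) {
        // Each sequence handled here yields one char, so `length` chars suffice.
        char[] cdata = new char[length];
        int end = offset + length;
        int i = offset;
        int j = 0;
        while (i < end) {
            int z = data[i] & 0xFF;
            if (z < 0x80) {                                   // 1 byte (ASCII)
                cdata[j++] = (char) z;
                i++;
            } else if (z >= 0xC2 && z < 0xE0
                    && i + 1 < end && isCont(data[i + 1])) {  // 2 bytes
                cdata[j++] = (char) (((z & 0x1F) << 6) | (data[i + 1] & 0x3F));
                i += 2;
            } else if (z >= 0xE0 && z < 0xF0 && i + 2 < end
                    && isCont(data[i + 1]) && isCont(data[i + 2])) {  // 3 bytes
                int val = ((z & 0x0F) << 12)
                        | ((data[i + 1] & 0x3F) << 6)
                        | (data[i + 2] & 0x3F);
                // Overlong forms and surrogates are not valid UTF-8.
                if (val < 0x800 || (val >= 0xD800 && val <= 0xDFFF)) {
                    return new String(data, offset, length, StandardCharsets.UTF_8);
                }
                cdata[j++] = (char) val;
                i += 3;
            } else {
                // 4-byte sequences, truncated or malformed input:
                // let the standard decoder handle the whole range.
                return new String(data, offset, length, StandardCharsets.UTF_8);
            }
        }
        return new String(cdata, 0, j);
    }

    public static void main(String[] args) {
        String text = "na\u00efve \u20ac";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUTF8(bytes, 0, bytes.length).equals(text)); // prints true
    }
}
```

Falling back to the standard decoder keeps the result identical to new String(..., UTF_8) for every input, so the fast path only needs to be right about the cases it accepts.
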


