Обсуждение: Performance improvement proposal. Removal of toLowerCase calls.

Поиск
Список
Период
Сортировка

Performance improvement proposal. Removal of toLowerCase calls.

От
Jeremy Whiting
Дата:
Hi,
 I would like to propose an optimization to improve performance in the
jdbc driver. The performance improvement has been tested on commodity
hardware using an industry standard Java benchmark. The overall
benchmark metric reports an improvement in performance. Profiling using
sampling showed calls reduced from 1100 to 0 when the benchmark workload
is running.

 The optimization will eliminate calls to the method
toLowerCase(java.util.Locale). This in the pg-jdbc
org.postgresql.jdbc2.AbstractJdbc2ResultSet.findColumnIndex(java.lang.String)
method when setting up the column index registry.

 For the optimization to be enabled I suggest relying on a new system
property. Making the existing functionality the default behaviour to
ensure existing applications do not break when the driver is upgraded.

 The change removes the call toLowerCase when putting items in the
registry [1].  Essentially what's being proposed is removing sanitizing
the key names. For best performance application code should pass SQL to
the driver with column names already folded to lower case. But upper
case names will still be matched in the second lookup [2] in the method.
 For this optimization to work this feature introduces a requirement on
applications. To use all lower or upper case column names.

 Are there other factors that might affect this optimization that need
to be taken into consideration ?

Regards,
Jeremy Whiting

Senior Software Engineer
Red Hat

[1]
https://github.com/pgjdbc/pgjdbc/blob/REL9_3_STABLE/org/postgresql/jdbc2/AbstractJdbc2ResultSet.java?#L2742
[2]
https://github.com/pgjdbc/pgjdbc/blob/REL9_3_STABLE/org/postgresql/jdbc2/AbstractJdbc2ResultSet.java?#L2752



Re: Performance improvement proposal. Removal of toLowerCase calls.

От
Dave Cramer
Дата:
Jeremy.

I am even wondering why we would be doing that and even worse using a locale to do it.

AFAICT, the db would return everything in lower case anyway, and if the column really had mixed case it should be returned with it's original case intact.

Dave

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Thu, Jan 16, 2014 at 10:44 AM, Jeremy Whiting <jwhiting@redhat.com> wrote:
Hi,
 I would like to propose an optimization to improve performance in the
jdbc driver. The performance improvement has been tested on commodity
hardware using an industry standard Java benchmark. The overall
benchmark metric reports an improvement in performance. Profiling using
sampling showed calls reduced from 1100 to 0 when the benchmark workload
is running.

 The optimization will eliminate calls to the method
toLowerCase(java.util.Locale). This in the pg-jdbc
org.postgresql.jdbc2.AbstractJdbc2ResultSet.findColumnIndex(java.lang.String)
method when setting up the column index registry.

 For the optimization to be enabled I suggest relying on a new system
property. Making the existing functionality the default behaviour to
ensure existing applications do not break when the driver is upgraded.

 The change removes the call toLowerCase when putting items in the
registry [1].  Essentially what's being proposed is removing sanitizing
the key names. For best performance application code should pass SQL to
the driver with column names already folded to lower case. But upper
case names will still be matched in the second lookup [2] in the method.
 For this optimization to work this feature introduces a requirement on
applications. To use all lower or upper case column names.

 Are there other factors that might affect this optimization that need
to be taken into consideration ?

Regards,
Jeremy Whiting

Senior Software Engineer
Red Hat

[1]
https://github.com/pgjdbc/pgjdbc/blob/REL9_3_STABLE/org/postgresql/jdbc2/AbstractJdbc2ResultSet.java?#L2742
[2]
https://github.com/pgjdbc/pgjdbc/blob/REL9_3_STABLE/org/postgresql/jdbc2/AbstractJdbc2ResultSet.java?#L2752



--
Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-jdbc

Re: Performance improvement proposal. Removal of toLowerCase calls.

От
Jeremy Whiting
Дата:
Hi Dave,

On 16/01/14 16:12, Dave Cramer wrote:
Jeremy.

I am even wondering why we would be doing that and even worse using a locale to do it.

 I suspect it's caters for when the request SQL is mixed case. To get a match from the db server the response SQL (as you say is lower case) you'd need to make sure the .put(...) on the registry is either all upper or lower.
AFAICT, the db would return everything in lower case anyway, and if the column really had mixed case it should be returned with it's original case intact.
 Shall I prepare a PR with a test case ?

Jeremy

Dave

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Thu, Jan 16, 2014 at 10:44 AM, Jeremy Whiting <jwhiting@redhat.com> wrote:
Hi,
 I would like to propose an optimization to improve performance in the
jdbc driver. The performance improvement has been tested on commodity
hardware using an industry standard Java benchmark. The overall
benchmark metric reports an improvement in performance. Profiling using
sampling showed calls reduced from 1100 to 0 when the benchmark workload
is running.

 The optimization will eliminate calls to the method
toLowerCase(java.util.Locale). This in the pg-jdbc
org.postgresql.jdbc2.AbstractJdbc2ResultSet.findColumnIndex(java.lang.String)
method when setting up the column index registry.

 For the optimization to be enabled I suggest relying on a new system
property. Making the existing functionality the default behaviour to
ensure existing applications do not break when the driver is upgraded.

 The change removes the call toLowerCase when putting items in the
registry [1].  Essentially what's being proposed is removing sanitizing
the key names. For best performance application code should pass SQL to
the driver with column names already folded to lower case. But upper
case names will still be matched in the second lookup [2] in the method.
 For this optimization to work this feature introduces a requirement on
applications. To use all lower or upper case column names.

 Are there other factors that might affect this optimization that need
to be taken into consideration ?

Regards,
Jeremy Whiting

Senior Software Engineer
Red Hat

[1]
https://github.com/pgjdbc/pgjdbc/blob/REL9_3_STABLE/org/postgresql/jdbc2/AbstractJdbc2ResultSet.java?#L2742
[2]
https://github.com/pgjdbc/pgjdbc/blob/REL9_3_STABLE/org/postgresql/jdbc2/AbstractJdbc2ResultSet.java?#L2752



--
Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-jdbc


Re: Performance improvement proposal. Removal of toLowerCase calls.

От
Dave Cramer
Дата:
Well postgresql folds to lower case by default so the case of the request SQL is irrelevant unless the column is double quoted or the column really does have mixed case.

Please do prepare  PR and test case

Thanks

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Thu, Jan 16, 2014 at 12:50 PM, Jeremy Whiting <jwhiting@redhat.com> wrote:
Hi Dave,


On 16/01/14 16:12, Dave Cramer wrote:
Jeremy.

I am even wondering why we would be doing that and even worse using a locale to do it.

 I suspect it's caters for when the request SQL is mixed case. To get a match from the db server the response SQL (as you say is lower case) you'd need to make sure the .put(...) on the registry is either all upper or lower.

AFAICT, the db would return everything in lower case anyway, and if the column really had mixed case it should be returned with it's original case intact.
 Shall I prepare a PR with a test case ?

Jeremy


Dave

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Thu, Jan 16, 2014 at 10:44 AM, Jeremy Whiting <jwhiting@redhat.com> wrote:
Hi,
 I would like to propose an optimization to improve performance in the
jdbc driver. The performance improvement has been tested on commodity
hardware using an industry standard Java benchmark. The overall
benchmark metric reports an improvement in performance. Profiling using
sampling showed calls reduced from 1100 to 0 when the benchmark workload
is running.

 The optimization will eliminate calls to the method
toLowerCase(java.util.Locale). This in the pg-jdbc
org.postgresql.jdbc2.AbstractJdbc2ResultSet.findColumnIndex(java.lang.String)
method when setting up the column index registry.

 For the optimization to be enabled I suggest relying on a new system
property. Making the existing functionality the default behaviour to
ensure existing applications do not break when the driver is upgraded.

 The change removes the call toLowerCase when putting items in the
registry [1].  Essentially what's being proposed is removing sanitizing
the key names. For best performance application code should pass SQL to
the driver with column names already folded to lower case. But upper
case names will still be matched in the second lookup [2] in the method.
 For this optimization to work this feature introduces a requirement on
applications. To use all lower or upper case column names.

 Are there other factors that might affect this optimization that need
to be taken into consideration ?

Regards,
Jeremy Whiting

Senior Software Engineer
Red Hat

[1]
https://github.com/pgjdbc/pgjdbc/blob/REL9_3_STABLE/org/postgresql/jdbc2/AbstractJdbc2ResultSet.java?#L2742
[2]
https://github.com/pgjdbc/pgjdbc/blob/REL9_3_STABLE/org/postgresql/jdbc2/AbstractJdbc2ResultSet.java?#L2752



--
Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-jdbc



Re: Performance improvement proposal. Removal of toLowerCase calls.

От
dmp
Дата:
Jeremy Whiting wrote:
> Hi,
>   I would like to propose an optimization to improve performance in the
> jdbc driver. The performance improvement has been tested on commodity
> hardware using an industry standard Java benchmark. The overall
> benchmark metric reports an improvement in performance. Profiling using
> sampling showed calls reduced from 1100 to 0 when the benchmark workload
> is running.
>
>   The optimization will eliminate calls to the method
> toLowerCase(java.util.Locale). This in the pg-jdbc
> org.postgresql.jdbc2.AbstractJdbc2ResultSet.findColumnIndex(java.lang.String)
> method when setting up the column index registry.
>
>   For the optimization to be enabled I suggest relying on a new system
> property. Making the existing functionality the default behaviour to
> ensure existing applications do not break when the driver is upgraded.
>
>   The change removes the call toLowerCase when putting items in the
> registry [1].  Essentially what's being proposed is removing sanitizing
> the key names. For best performance application code should pass SQL to
> the driver with column names already folded to lower case. But upper
> case names will still be matched in the second lookup [2] in the method.

****************
>   For this optimization to work this feature introduces a requirement on
> applications. To use all lower or upper case column names.
****************

This is going to break existing applications, if the requirement is to have
either upper or lowercase. I test explicitly with the MyJSQLView application
the use of mixed case column, key index names because there are applications
that used mixed case. The proper way to handle is quote to handle the mixed
case so it is store as such.

Perhaps the optimization could check for quoting then do no addition process
and store directly intact. Other wise the optimization could initiate as
proposed.

danap.

>
>   Are there other factors that might affect this optimization that need
> to be taken into consideration ?
>
> Regards,
> Jeremy Whiting
>
> Senior Software Engineer
> Red Hat
>
> [1]
> https://github.com/pgjdbc/pgjdbc/blob/REL9_3_STABLE/org/postgresql/jdbc2/AbstractJdbc2ResultSet.java?#L2742
> [2]
> https://github.com/pgjdbc/pgjdbc/blob/REL9_3_STABLE/org/postgresql/jdbc2/AbstractJdbc2ResultSet.java?#L2752
>



Re: Performance improvement proposal. Removal of toLowerCase calls.

От
Dave Cramer
Дата:


On Thu, Jan 16, 2014 at 2:21 PM, dmp <danap@ttc-cmc.net> wrote:
Jeremy Whiting wrote:
Hi,
  I would like to propose an optimization to improve performance in the
jdbc driver. The performance improvement has been tested on commodity
hardware using an industry standard Java benchmark. The overall
benchmark metric reports an improvement in performance. Profiling using
sampling showed calls reduced from 1100 to 0 when the benchmark workload
is running.

  The optimization will eliminate calls to the method
toLowerCase(java.util.Locale). This in the pg-jdbc
org.postgresql.jdbc2.AbstractJdbc2ResultSet.findColumnIndex(java.lang.String)
method when setting up the column index registry.

  For the optimization to be enabled I suggest relying on a new system
property. Making the existing functionality the default behaviour to
ensure existing applications do not break when the driver is upgraded.

  The change removes the call toLowerCase when putting items in the
registry [1].  Essentially what's being proposed is removing sanitizing
the key names. For best performance application code should pass SQL to
the driver with column names already folded to lower case. But upper
case names will still be matched in the second lookup [2] in the method.

****************

  For this optimization to work this feature introduces a requirement on
applications. To use all lower or upper case column names.
****************

This is going to break existing applications, if the requirement is to have
either upper or lowercase. I test explicitly with the MyJSQLView application
the use of mixed case column, key index names because there are applications
that used mixed case. The proper way to handle is quote to handle the mixed
case so it is store as such.

Perhaps the optimization could check for quoting then do no addition process
and store directly intact. Other wise the optimization could initiate as
proposed.

danap.



Are you sure ? This is in the resultset, so any column names should have come back from the db. 
Which means that they should come back in lower case anyway.

I would think the current code would break apps that having real mixed case columns in the database ?

Jeremy's proposal would leave the case alone and store it in the map with mixed case.

Re: Performance improvement proposal. Removal of toLowerCase calls.

От
Thomas Kellerer
Дата:
> Are you sure ? This is in the resultset, so any column names should have come back from the db.
> Which means that they should come back in lower case anyway.

   create table foo
   (
     "ID" integer,
     "Bar" varchar
   );

   select *
   from foo;

Should return ID and Bar as the column names.

Danap probably also referred to the column names that are returned through getTableColumns() or getIndexColumns() or
anyother API call that returns information about tables, columns or other identifiers. I do the same thing in SQL
Workbench/J:check the returned column names whether they are upper, lower or mixed case in order to know whether
generatedSQL statements (e.g. an update resulting from editing the data) need to quote the column names or not. 

Thomas


Re: Performance improvement proposal. Removal of toLowerCase calls.

От
Dave Cramer
Дата:


On Thu, Jan 16, 2014 at 2:47 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
Are you sure ? This is in the resultset, so any column names should have come back from the db.
Which means that they should come back in lower case anyway.

  create table foo
  (
    "ID" integer,
    "Bar" varchar
  );

  select *
  from foo;

Should return ID and Bar as the column names.


Exactly! While I haven't actually tested the code. The lines that Jeremy is referring to would seem to suggest that  the return columns would be id and bar.

Does anyone have time to test this ?



Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

Re: Performance improvement proposal. Removal of toLowerCase calls.

От
Thomas Kellerer
Дата:
Dave Cramer wrote on 16.01.2014 20:51:
>
>  Are you sure ? This is in the resultset, so any column names should have come back from the db.
>  Which means that they should come back in lower case anyway.
>
>        create table foo
>        (
>          "ID" integer,
>          "Bar" varchar
>        );
>
>        select *
>        from foo;
>
>     Should return ID and Bar as the column names.
>
>
> Exactly! While I haven't actually tested the code. The lines that Jeremy is referring to would seem to suggest that
>the return columns would be id and bar.
>
> Does anyone have time to test this ?

The current driver returns this correctly for a ResultSet.

Here is a "transcript" from running my application on the console
(easier to copy & paste than a screenshot)

SQL Workbench/J console interface started.
Enter 'exit' to quit.
Enter 'WbHelp' for a list of SQL Workbench/J specific commands
Config directory: C:\Projects\sqlworkbench\conf

SQL> wbconnect {Postgres}/WbTest;
Connection to "User=thomas, Schema=public, URL=jdbc:postgresql://localhost/wbtest" successful

thomas@wbtest/public> create table foo ("ID" integer, "Bar" varchar, "FoBaR" varchar);
Table 'foo' created
thomas@wbtest/public> insert into foo values (1, 'foo', 'bar');
INSERT executed successfully
1 row affected
thomas@wbtest/public> select * from foo;
ID | Bar | FoBaR
---+-----+------
  1 | foo | bar

(1 Row)
SELECT executed successfully
thomas@wbtest/public>

All the various API calls in DatabaseMetaData do return identifiers in the correct case as well.



Re: Performance improvement proposal. Removal of toLowerCase calls.

От
dmp
Дата:
Thomas Kellerer wrote:
> Dave Cramer wrote on 16.01.2014 20:51:
>>
>> Are you sure ? This is in the resultset, so any column names should
>> have come back from the db.
>> Which means that they should come back in lower case anyway.
>>
>> create table foo
>> (
>> "ID" integer,
>> "Bar" varchar
>> );
>>
>> select *
>> from foo;
>>
>> Should return ID and Bar as the column names.
>>
>>
>> Exactly! While I haven't actually tested the code. The lines that
>> Jeremy is referring to would seem to suggest that
>> the return columns would be id and bar.
>>
>> Does anyone have time to test this ?
>
> The current driver returns this correctly for a ResultSet.
>
> Here is a "transcript" from running my application on the console
> (easier to copy & paste than a screenshot)
>
> SQL Workbench/J console interface started.
> Enter 'exit' to quit.
> Enter 'WbHelp' for a list of SQL Workbench/J specific commands
> Config directory: C:\Projects\sqlworkbench\conf
>
> SQL> wbconnect {Postgres}/WbTest;
> Connection to "User=thomas, Schema=public,
> URL=jdbc:postgresql://localhost/wbtest" successful
>
> thomas@wbtest/public> create table foo ("ID" integer, "Bar" varchar,
> "FoBaR" varchar);
> Table 'foo' created
> thomas@wbtest/public> insert into foo values (1, 'foo', 'bar');
> INSERT executed successfully
> 1 row affected
> thomas@wbtest/public> select * from foo;
> ID | Bar | FoBaR
> ---+-----+------
> 1 | foo | bar
>
> (1 Row)
> SELECT executed successfully
> thomas@wbtest/public>
>
> All the various API calls in DatabaseMetaData do return identifiers in
> the correct case as well.
>

This is also the type of results that I have observed in the past in testing.
The tableMetaData.getColumnName(i) will also return back the correct mixed
case result. Ex. from table below.

1 Host
2 Db
3 Username
4 select_priv

Not only does this apply to columns but also to table names that are quoted
Normally I use the following table to test such occurrences in my code.


DROP TABLE IF EXISTS "keY_tAble2";
CREATE TABLE "keY_tAble2" (
   "Host" char(60) NOT NULL default '',
   "Db" char(64) NOT NULL default '',
   "Username" char(16) NOT NULL default '',
   Select_priv boolean NOT NULL default TRUE,
   PRIMARY KEY ("Host","Db","Username")
);

danap.




Re: Performance improvement proposal. Removal of toLowerCase calls.

От
Jeremy Whiting
Дата:
Hi Dave and danap,


On 16/01/14 19:33, Dave Cramer wrote:


On Thu, Jan 16, 2014 at 2:21 PM, dmp <danap@ttc-cmc.net> wrote:
Jeremy Whiting wrote:
Hi,
  I would like to propose an optimization to improve performance in the
jdbc driver. The performance improvement has been tested on commodity
hardware using an industry standard Java benchmark. The overall
benchmark metric reports an improvement in performance. Profiling using
sampling showed calls reduced from 1100 to 0 when the benchmark workload
is running.

  The optimization will eliminate calls to the method
toLowerCase(java.util.Locale). This in the pg-jdbc
org.postgresql.jdbc2.AbstractJdbc2ResultSet.findColumnIndex(java.lang.String)
method when setting up the column index registry.

  For the optimization to be enabled I suggest relying on a new system
property. Making the existing functionality the default behaviour to
ensure existing applications do not break when the driver is upgraded.

  The change removes the call toLowerCase when putting items in the
registry [1].  Essentially what's being proposed is removing sanitizing
the key names. For best performance application code should pass SQL to
the driver with column names already folded to lower case.
 I have found this to be incorrect. The column registry is built using fields/columns that have come back from the db.
But upper
case names will still be matched in the second lookup [2] in the method.

****************

  For this optimization to work this feature introduces a requirement on
applications. To use all lower or upper case column names.
****************

This is going to break existing applications, if the requirement is to have
either upper or lowercase.
 Yes it's incompatible. Mixed case situations should use the existing behaviour in 9.3-1100 which is proposed to be the default.
I test explicitly with the MyJSQLView application
the use of mixed case column, key index names because there are applications
that used mixed case. The proper way to handle is quote to handle the mixed
case so it is store as such.

Perhaps the optimization could check for quoting then do no addition process
and store directly intact. Other wise the optimization could initiate as
proposed.

danap.



Are you sure ? This is in the resultset, so any column names should have come back from the db. 
Which means that they should come back in lower case anyway.

I would think the current code would break apps that having real mixed case columns in the database ?

 Column names coming back from the database are sanitized. Before putting into the column registry.
Jeremy's proposal would leave the case alone and store it in the map with mixed case.

 The proposal has a hard requirement of either all upper or lower case columns defined in the database.

 If a user attempts to enable the optimization with an application that uses mixed case the driver does throw an exception of type org.postgresql.util.PSQLException. With this message "The column name <column name> was not found in this ResultSet.".

Jeremy


-- 
Jeremy Whiting
Senior Software Engineer, Performance Team
Red Hat

------------------------------------------------------------
Registered Address: Red Hat UK Ltd, 64 Baker Street, 4th Floor, London. W1U 7DF. United Kingdom.
Registered in England and Wales under Company Registration No. 03798903. Directors: Michael Cunningham (USA), Charlie Peters (USA), Matt Parson (USA), Paul Hickey (Ireland) 

Re: Performance improvement proposal. Removal of toLowerCase calls.

От
Dave Cramer
Дата:
Jeremy,

At this point send a PR and we can test it. If you are proposing having an optimization switch which defaults to off I can't see why I would not merge it.

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Fri, Jan 17, 2014 at 11:48 AM, Jeremy Whiting <jwhiting@redhat.com> wrote:
Hi Dave and danap,



On 16/01/14 19:33, Dave Cramer wrote:


On Thu, Jan 16, 2014 at 2:21 PM, dmp <danap@ttc-cmc.net> wrote:
Jeremy Whiting wrote:
Hi,
  I would like to propose an optimization to improve performance in the
jdbc driver. The performance improvement has been tested on commodity
hardware using an industry standard Java benchmark. The overall
benchmark metric reports an improvement in performance. Profiling using
sampling showed calls reduced from 1100 to 0 when the benchmark workload
is running.

  The optimization will eliminate calls to the method
toLowerCase(java.util.Locale). This in the pg-jdbc
org.postgresql.jdbc2.AbstractJdbc2ResultSet.findColumnIndex(java.lang.String)
method when setting up the column index registry.

  For the optimization to be enabled I suggest relying on a new system
property. Making the existing functionality the default behaviour to
ensure existing applications do not break when the driver is upgraded.

  The change removes the call toLowerCase when putting items in the
registry [1].  Essentially what's being proposed is removing sanitizing
the key names. For best performance application code should pass SQL to
the driver with column names already folded to lower case.
 I have found this to be incorrect. The column registry is built using fields/columns that have come back from the db.

But upper
case names will still be matched in the second lookup [2] in the method.

****************

  For this optimization to work this feature introduces a requirement on
applications. To use all lower or upper case column names.
****************

This is going to break existing applications, if the requirement is to have
either upper or lowercase.
 Yes it's incompatible. Mixed case situations should use the existing behaviour in 9.3-1100 which is proposed to be the default.

I test explicitly with the MyJSQLView application
the use of mixed case column, key index names because there are applications
that used mixed case. The proper way to handle is quote to handle the mixed
case so it is store as such.

Perhaps the optimization could check for quoting then do no addition process
and store directly intact. Other wise the optimization could initiate as
proposed.

danap.



Are you sure ? This is in the resultset, so any column names should have come back from the db. 
Which means that they should come back in lower case anyway.

I would think the current code would break apps that having real mixed case columns in the database ?

 Column names coming back from the database are sanitized. Before putting into the column registry.

Jeremy's proposal would leave the case alone and store it in the map with mixed case.

 The proposal has a hard requirement of either all upper or lower case columns defined in the database.

 If a user attempts to enable the optimization with an application that uses mixed case the driver does throw an exception of type org.postgresql.util.PSQLException. With this message "The column name <column name> was not found in this ResultSet.".

Jeremy
-- 
Jeremy Whiting
Senior Software Engineer, Performance Team
Red Hat

------------------------------------------------------------
Registered Address: Red Hat UK Ltd, 64 Baker Street, 4th Floor, London. W1U 7DF. United Kingdom.
Registered in England and Wales under Company Registration No. 03798903. Directors: Michael Cunningham (USA), Charlie Peters (USA), Matt Parson (USA), Paul Hickey (Ireland) 

Re: Performance improvement proposal. Removal of toLowerCase calls. PR

От
Jeremy Whiting
Дата:
Hi Dave,
 A pull request has been created. See [1].

 In the commit are:

 Changes to the Driver and DataSource configuration. To recognise the new property called "disableColumnSanitiser". The default value for this property is false. Comments to rename the property are welcome.

3 additional test case classes added to the jdbc2 test suite. Each class groups tests for:

a) ColumnSanitiserEnabledTest. Tests to check existing and default behaviour still works as expected.
b) ColumnSanitiserDisabledTest. Tests to check new behaviour works as expected.
c) CaseOptimiserDataSourceTest. Checking a datasource configured with the sanitiser disabled will recognise the property configuration.

 I've attempted to conform to the coding guidelines where possible. The test-suite passes for 1.7 and 1.6 spec. 1.5 fails due to code in master [1] branch.

Regards,
Jeremy

[1] https://github.com/pgjdbc/pgjdbc/pull/113
[2] http://www.postgresql.org/message-id/52DFA5D0.7000901@redhat.com

On 17/01/14 16:55, Dave Cramer wrote:
Jeremy,

At this point send a PR and we can test it. If you are proposing having an optimization switch which defaults to off I can't see why I would not merge it.

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Fri, Jan 17, 2014 at 11:48 AM, Jeremy Whiting <jwhiting@redhat.com> wrote:
Hi Dave and danap,



On 16/01/14 19:33, Dave Cramer wrote:


On Thu, Jan 16, 2014 at 2:21 PM, dmp <danap@ttc-cmc.net> wrote:
Jeremy Whiting wrote:
Hi,
  I would like to propose an optimization to improve performance in the
jdbc driver. The performance improvement has been tested on commodity
hardware using an industry standard Java benchmark. The overall
benchmark metric reports an improvement in performance. Profiling using
sampling showed calls reduced from 1100 to 0 when the benchmark workload
is running.

  The optimization will eliminate calls to the method
toLowerCase(java.util.Locale). This in the pg-jdbc
org.postgresql.jdbc2.AbstractJdbc2ResultSet.findColumnIndex(java.lang.String)
method when setting up the column index registry.

  For the optimization to be enabled I suggest relying on a new system
property. Making the existing functionality the default behaviour to
ensure existing applications do not break when the driver is upgraded.

  The change removes the call toLowerCase when putting items in the
registry [1].  Essentially what's being proposed is removing sanitizing
the key names. For best performance application code should pass SQL to
the driver with column names already folded to lower case.
 I have found this to be incorrect. The column registry is built using fields/columns that have come back from the db.

But upper
case names will still be matched in the second lookup [2] in the method.

****************

  For this optimization to work this feature introduces a requirement on
applications. To use all lower or upper case column names.
****************

This is going to break existing applications, if the requirement is to have
either upper or lowercase.
 Yes it's incompatible. Mixed case situations should use the existing behaviour in 9.3-1100 which is proposed to be the default.

I test explicitly with the MyJSQLView application
the use of mixed case column, key index names because there are applications
that used mixed case. The proper way to handle is quote to handle the mixed
case so it is store as such.

Perhaps the optimization could check for quoting then do no addition process
and store directly intact. Other wise the optimization could initiate as
proposed.

danap.



Are you sure ? This is in the resultset, so any column names should have come back from the db. 
Which means that they should come back in lower case anyway.

I would think the current code would break apps that having real mixed case columns in the database ?

 Column names coming back from the database are sanitized. Before putting into the column registry.

Jeremy's proposal would leave the case alone and store it in the map with mixed case.

 The proposal has a hard requirement of either all upper or lower case columns defined in the database.

 If a user attempts to enable the optimization with an application that uses mixed case the driver does throw an exception of type org.postgresql.util.PSQLException. With this message "The column name <column name> was not found in this ResultSet.".

Jeremy
-- 
Jeremy Whiting
Senior Software Engineer, Performance Team
Red Hat

------------------------------------------------------------
Registered Address: Red Hat UK Ltd, 64 Baker Street, 4th Floor, London. W1U 7DF. United Kingdom.
Registered in England and Wales under Company Registration No. 03798903. Directors: Michael Cunningham (USA), Charlie Peters (USA), Matt Parson (USA), Paul Hickey (Ireland) 


Re: Performance improvement proposal. Removal of toLowerCase calls. PR

От
Dave Cramer
Дата:
Hi Jeremy,

I saw the PR. Thanks

What tests fail in 1.5 ?

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Wed, Jan 22, 2014 at 2:11 PM, Jeremy Whiting <jwhiting@redhat.com> wrote:
Hi Dave,
 A pull request has been created. See [1].

 In the commit are:

 Changes to the Driver and DataSource configuration. To recognise the new property called "disableColumnSanitiser". The default value for this property is false. Comments to rename the property are welcome.

3 additional test case classes added to the jdbc2 test suite. Each class groups tests for:

a) ColumnSanitiserEnabledTest. Tests to check existing and default behaviour still works as expected.
b) ColumnSanitiserDisabledTest. Tests to check new behaviour works as expected.
c) CaseOptimiserDataSourceTest. Checking a datasource configured with the sanitiser disabled will recognise the property configuration.

 I've attempted to conform to the coding guidelines where possible. The test-suite passes for 1.7 and 1.6 spec. 1.5 fails due to code in master [1] branch.

Regards,
Jeremy

[1] https://github.com/pgjdbc/pgjdbc/pull/113
[2] http://www.postgresql.org/message-id/52DFA5D0.7000901@redhat.com

On 17/01/14 16:55, Dave Cramer wrote:
Jeremy,

At this point send a PR and we can test it. If you are proposing having an optimization switch which defaults to off I can't see why I would not merge it.

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Fri, Jan 17, 2014 at 11:48 AM, Jeremy Whiting <jwhiting@redhat.com> wrote:
Hi Dave and danap,



On 16/01/14 19:33, Dave Cramer wrote:


On Thu, Jan 16, 2014 at 2:21 PM, dmp <danap@ttc-cmc.net> wrote:
Jeremy Whiting wrote:
Hi,
  I would like to propose an optimization to improve performance in the
jdbc driver. The performance improvement has been tested on commodity
hardware using an industry standard Java benchmark. The overall
benchmark metric reports an improvement in performance. Profiling using
sampling showed calls reduced from 1100 to 0 when the benchmark workload
is running.

  The optimization will eliminate calls to the method
toLowerCase(java.util.Locale). This in the pg-jdbc
org.postgresql.jdbc2.AbstractJdbc2ResultSet.findColumnIndex(java.lang.String)
method when setting up the column index registry.

  For the optimization to be enabled I suggest relying on a new system
property. Making the existing functionality the default behaviour to
ensure existing applications do not break when the driver is upgraded.

  The change removes the call toLowerCase when putting items in the
registry [1].  Essentially what's being proposed is removing sanitizing
the key names. For best performance application code should pass SQL to
the driver with column names already folded to lower case.
 I have found this to be incorrect. The column registry is built using fields/columns that have come back from the db.

But upper
case names will still be matched in the second lookup [2] in the method.

****************

  For this optimization to work this feature introduces a requirement on
applications. To use all lower or upper case column names.
****************

This is going to break existing applications, if the requirement is to have
either upper or lowercase.
 Yes it's incompatible. Mixed case situations should use the existing behaviour in 9.3-1100 which is proposed to be the default.

I test explicitly with the MyJSQLView application
the use of mixed case column, key index names because there are applications
that used mixed case. The proper way to handle is quote to handle the mixed
case so it is store as such.

Perhaps the optimization could check for quoting then do no addition process
and store directly intact. Other wise the optimization could initiate as
proposed.

danap.



Are you sure ? This is in the resultset, so any column names should have come back from the db. 
Which means that they should come back in lower case anyway.

I would think the current code would break apps that having real mixed case columns in the database ?

 Column names coming back from the database are sanitized. Before putting into the column registry.

Jeremy's proposal would leave the case alone and store it in the map with mixed case.

 The proposal has a hard requirement of either all upper or lower case columns defined in the database.

 If a user attempts to enable the optimization with an application that uses mixed case the driver does throw an exception of type org.postgresql.util.PSQLException. With this message "The column name <column name> was not found in this ResultSet.".

Jeremy
-- 
Jeremy Whiting
Senior Software Engineer, Performance Team
Red Hat

------------------------------------------------------------
Registered Address: Red Hat UK Ltd, 64 Baker Street, 4th Floor, London. W1U 7DF. United Kingdom.
Registered in England and Wales under Company Registration No. 03798903. Directors: Michael Cunningham (USA), Charlie Peters (USA), Matt Parson (USA), Paul Hickey (Ireland) 



Re: Performance improvement proposal. Removal of toLowerCase calls. PR

От
Jeremy Whiting
Дата:
Hi Dave,
 It's not tests that fail but compilation errors.

 The compile phase output is attached to an email in a new thread I started earlier. Subject is "Master branch compile error using 1.5 spec.".

Jeremy

On 22/01/14 19:38, Dave Cramer wrote:
Hi Jeremy,

I saw the PR. Thanks

What tests fail in 1.5 ?

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Wed, Jan 22, 2014 at 2:11 PM, Jeremy Whiting <jwhiting@redhat.com> wrote:
Hi Dave,
 A pull request has been created. See [1].

 In the commit are:

 Changes to the Driver and DataSource configuration. To recognise the new property called "disableColumnSanitiser". The default value for this property is false. Comments to rename the property are welcome.

3 additional test case classes added to the jdbc2 test suite. Each class groups tests for:

a) ColumnSanitiserEnabledTest. Tests to check existing and default behaviour still works as expected.
b) ColumnSanitiserDisabledTest. Tests to check new behaviour works as expected.
c) CaseOptimiserDataSourceTest. Checking a datasource configured with the sanitiser disabled will recognise the property configuration.

 I've attempted to conform to the coding guidelines where possible. The test-suite passes for 1.7 and 1.6 spec. 1.5 fails due to code in master [1] branch.

Regards,
Jeremy

[1] https://github.com/pgjdbc/pgjdbc/pull/113
[2] http://www.postgresql.org/message-id/52DFA5D0.7000901@redhat.com

On 17/01/14 16:55, Dave Cramer wrote:
Jeremy,

At this point send a PR and we can test it. If you are proposing having an optimization switch which defaults to off I can't see why I would not merge it.

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Fri, Jan 17, 2014 at 11:48 AM, Jeremy Whiting <jwhiting@redhat.com> wrote:
Hi Dave and danap,



On 16/01/14 19:33, Dave Cramer wrote:


On Thu, Jan 16, 2014 at 2:21 PM, dmp <danap@ttc-cmc.net> wrote:
Jeremy Whiting wrote:
Hi,
  I would like to propose an optimization to improve performance in the
jdbc driver. The performance improvement has been tested on commodity
hardware using an industry standard Java benchmark. The overall
benchmark metric reports an improvement in performance. Profiling using
sampling showed calls reduced from 1100 to 0 when the benchmark workload
is running.

  The optimization will eliminate calls to the method
toLowerCase(java.util.Locale). This in the pg-jdbc
org.postgresql.jdbc2.AbstractJdbc2ResultSet.findColumnIndex(java.lang.String)
method when setting up the column index registry.

  For the optimization to be enabled I suggest relying on a new system
property. Making the existing functionality the default behaviour to
ensure existing applications do not break when the driver is upgraded.

  The change removes the call toLowerCase when putting items in the
registry [1].  Essentially what's being proposed is removing sanitizing
the key names. For best performance application code should pass SQL to
the driver with column names already folded to lower case.
 I have found this to be incorrect. The column registry is built using fields/columns that have come back from the db.

But upper
case names will still be matched in the second lookup [2] in the method.

****************

  For this optimization to work this feature introduces a requirement on
applications. To use all lower or upper case column names.
****************

This is going to break existing applications, if the requirement is to have
either upper or lowercase.
 Yes it's incompatible. Mixed case situations should use the existing behaviour in 9.3-1100 which is proposed to be the default.

I test explicitly with the MyJSQLView application
the use of mixed case column, key index names because there are applications
that used mixed case. The proper way to handle is quote to handle the mixed
case so it is store as such.

Perhaps the optimization could check for quoting then do no addition process
and store directly intact. Other wise the optimization could initiate as
proposed.

danap.



Are you sure ? This is in the resultset, so any column names should have come back from the db. 
Which means that they should come back in lower case anyway.

I would think the current code would break apps that having real mixed case columns in the database ?

 Column names coming back from the database are sanitized. Before putting into the column registry.

Jeremy's proposal would leave the case alone and store it in the map with mixed case.

 The proposal has a hard requirement of either all upper or lower case columns defined in the database.

 If a user attempts to enable the optimization with an application that uses mixed case the driver does throw an exception of type org.postgresql.util.PSQLException. With this message "The column name <column name> was not found in this ResultSet.".

Jeremy
-- 
Jeremy Whiting
Senior Software Engineer, Performance Team
Red Hat

------------------------------------------------------------
Registered Address: Red Hat UK Ltd, 64 Baker Street, 4th Floor, London. W1U 7DF. United Kingdom.
Registered in England and Wales under Company Registration No. 03798903. Directors: Michael Cunningham (USA), Charlie Peters (USA), Matt Parson (USA), Paul Hickey (Ireland) 




Re: Performance improvement proposal. Removal of toLowerCase calls. PR

От
Jeremy Whiting
Дата:
Dave,
 Having merged in your change on master the compilation error is fixed. The testsuite is passing for the three different jdk specs.

 For those interested to know the matching with combinations of case possible take a look at the test cases covering the proposed behaviour.

https://github.com/whitingjr/pgjdbc/blob/remove-toLowerCase-calls/org/postgresql/test/jdbc2/ColumnSanitiserDisabledTest.java?#L66

Jeremy

On 22/01/14 21:08, Jeremy Whiting wrote:
Hi Dave,
 It's not tests that fail but compilation errors.

 The compile phase output is attached to an email in a new thread I started earlier. Subject is "Master branch compile error using 1.5 spec.".

Jeremy

On 22/01/14 19:38, Dave Cramer wrote:
Hi Jeremy,

I saw the PR. Thanks

What tests fail in 1.5 ?

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


Re: Performance improvement proposal. Removal of toLowerCase calls.

От
Jeremy Whiting
Дата:

On 16/01/14 20:33, dmp wrote:
> Thomas Kellerer wrote:
>> Dave Cramer wrote on 16.01.2014 20:51:
>>>
>>> Are you sure ? This is in the resultset, so any column names should
>>> have come back from the db.
>>> Which means that they should come back in lower case anyway.
>>>
>>> create table foo
>>> (
>>> "ID" integer,
>>> "Bar" varchar
>>> );
>>>
>>> select *
>>> from foo;
>>>
>>> Should return ID and Bar as the column names.
>>>
>>>
>>> Exactly! While I haven't actually tested the code. The lines that
>>> Jeremy is referring to would seem to suggest that
>>> the return columns would be id and bar.
>>>
>>> Does anyone have time to test this ?
>>
>> The current driver returns this correctly for a ResultSet.
>>
>> Here is a "transcript" from running my application on the console
>> (easier to copy & paste than a screenshot)
>>
>> SQL Workbench/J console interface started.
>> Enter 'exit' to quit.
>> Enter 'WbHelp' for a list of SQL Workbench/J specific commands
>> Config directory: C:\Projects\sqlworkbench\conf
>>
>> SQL> wbconnect {Postgres}/WbTest;
>> Connection to "User=thomas, Schema=public,
>> URL=jdbc:postgresql://localhost/wbtest" successful
>>
>> thomas@wbtest/public> create table foo ("ID" integer, "Bar" varchar,
>> "FoBaR" varchar);
>> Table 'foo' created
>> thomas@wbtest/public> insert into foo values (1, 'foo', 'bar');
>> INSERT executed successfully
>> 1 row affected
>> thomas@wbtest/public> select * from foo;
>> ID | Bar | FoBaR
>> ---+-----+------
>> 1 | foo | bar
>>
>> (1 Row)
>> SELECT executed successfully
>> thomas@wbtest/public>
>>
>> All the various API calls in DatabaseMetaData do return identifiers in
>> the correct case as well.
>>
>
> This is also the type of results that I have observed in the past in
> testing.
> The tableMetaData.getColumnName(i) will also return back the correct
> mixed
> case result. Ex. from table below.
>
> 1 Host
> 2 Db
> 3 Username
> 4 select_priv
>
> Not only does this apply to columns but also to table names that are
> quoted
> Normally I use the following table to test such occurrences in my code.
>
>
> DROP TABLE IF EXISTS "keY_tAble2";
> CREATE TABLE "keY_tAble2" (
>   "Host" char(60) NOT NULL default '',
>   "Db" char(64) NOT NULL default '',
>   "Username" char(16) NOT NULL default '',
>   Select_priv boolean NOT NULL default TRUE,
>   PRIMARY KEY ("Host","Db","Username")
> );
>
> danap.
>
 My testing does confirm the findings by danap and Thomas too. Names are
correctly returned mixed.

 My proposal is for situations where the known case is already all upper
or lower. For situations other than that the feature can be ignored. The
current driver behaviour will be the default to prevent applications
breaking.

 I don't have as much oversight on all possible use cases as others in
the community. I may have been overlooked something. What other concerns
are there if any ?

Jeremy