Обсуждение: Very strange performance decrease when reusing a PreparedStatement

Поиск
Список
Период
Сортировка

Very strange performance decrease when reusing a PreparedStatement

От
Frédérik Bilhaut
Дата:
Hi everybody,

I am experiencing a *very* strange problem when querying Postgres
through JDBC using PreparedStatements.

To say it short, for the same SELECT query :

- when reusing a single PreparedStatement the average response time
per query is 60 milliseconds

- when creating (and closing) a new PreparedStatement each time, the
average response time drops to only 2 milliseconds !

This seems unbelievable, but I cross-checked with several people, and
this is truly what happens. Maybe some cleaning or synchronizing is
done before executing again the same statement ? Has anybody
experienced this ?

This appears on two different 8.x versions of postgresql, on Mac and
Linux. The client runs under Mac/Java 5.

Here are a little bit more details :

At first I used to create a single prepared statement once in my
constructor :

this.stmt = getConnection().prepareStatement("SELECT resource.uri FROM
literal,resource WHERE resource.id=literal.id_subject AND
literal.id_graph=? AND literal.id_predicate=? AND literal.lexical_form
LIKE ?");

And then, for each new query :

this.stmt.setLong(1, graphID);
this.stmt.setLong(2, relationID);
this.stmt.setString(3, litteralValue);
ResultSet rs = this.stmt.executeQuery();
...
rs.close();


This gave me very poor performance (about 60ms/query). Just for
testing I added the following lines, and I got my 2 milliseconds per
query (you can double check that the statement creation is exactly the
same) :

if(this.stmt != null)
    this.stmt.close();
this.stmt = getConnection().prepareStatement("SELECT resource.uri FROM
literal,resource WHERE resource.id=literal.id_subject AND
literal.id_graph=? AND literal.id_predicate=? AND literal.lexical_form
LIKE ?");


Many thanks for your help.

Best regards,
--
Frédérik Bilhaut


Re: Very strange performance decrease when reusing a PreparedStatement

От
Oliver Jowett
Дата:
Frédérik Bilhaut wrote:
> Hi everybody,
>
> I am experiencing a *very* strange problem when querying Postgres
> through JDBC using PreparedStatements.
>
> To say it short, for the same SELECT query :
>
> - when reusing a single PreparedStatement the average response time per
> query is 60 milliseconds
>
> - when creating (and closing) a new PreparedStatement each time, the
> average response time drops to only 2 milliseconds !

Try with prepareThreshold=0. Probably, your particular query benefits
from re-planning each time with the particular concrete parameter values
for each execution.

(you can either specify this as a URL parameter, or tweak it on a
per-connection or per-statement basis via methods on
PGConnection/PGStatement)

-O

Re: Very strange performance decrease when reusing a PreparedStatement

От
Oliver Jowett
Дата:
Frédérik Bilhaut wrote:
> Le 28 avr. 09 à 17:21, Oliver Jowett a écrit :
>> Try with prepareThreshold=0. Probably, your particular query benefits
>> from re-planning each time with the particular concrete parameter values
>> for each execution.
>>
>> (you can either specify this as a URL parameter, or tweak it on a
>> per-connection or per-statement basis via methods on
>> PGConnection/PGStatement)
>
>
> Thank you Oliver for this answer.
>
> Your hypothesis seems plausible to me, because we rely strongly on
> indexes to improve performance, and it appears that the query duration
> of a reused satement is the that the same query without index. On the
> other hand, "explain analyze" tells that the index is correctly used
> each time, but it also reports a very short total time in any case, so
> who knows what happens...

You need to be careful that you are testing the right thing. The JDBC
driver only switches over to using named server-side prepared statements
after the Java-side PreparedStatement object has been reused a number of
times (controlled by prepareThreshold). If you run an EXPLAIN or EXPLAIN
ANALYZE via JDBC and reuse the statement a number of times, I'd expect
you to see the reported plan change after a few uses.

You can trigger a similar thing to what the JDBC driver does by using
PREPARE + EXPLAIN ANALYZE EXECUTE via psql etc, but it's not exactly the
same thing.

> Anyway, I will try the option you suggest asap (although re-creating
> statements each time does not seem to be so harmful...).
>
> But there is maybe something somewhere in the driver (or pg iteself ?)
> that may have to be fixed or at least documented ? Every JDBC
> optimisation tutorial will mention the fact that preparing and reusing
> statements can improve the performances of recurrent query, but my
> example proves that in some situations, the performances can be
> dramatically worsen (around 20 times slower in my case) !

Well, it's specific to the query. The underlying problem is that to get
the benefits of preserving a prepared statement (i.e. don't reparse and
replan every time), the server has to use a more generic plan that will
work for any parameter value, rather than a plan that can be customized
to the particular values used in one execution. Some (most?) queries
will benefit, but some simple queries such as yours can go slower.
There's no way for the driver to really know; it just makes the guess
that applications that go to the trouble of reusing a PreparedStatement
are probably running queries that are expensive to plan. The
prepareThreshold knob is there for the cases where that guess is wrong.

-O

Re: Very strange performance decrease when reusing a PreparedStatement

От
Frédérik Bilhaut
Дата:
Le 28 avr. 09 à 17:21, Oliver Jowett a écrit :
> Try with prepareThreshold=0. Probably, your particular query benefits
> from re-planning each time with the particular concrete parameter
> values
> for each execution.
>
> (you can either specify this as a URL parameter, or tweak it on a
> per-connection or per-statement basis via methods on
> PGConnection/PGStatement)


Thank you Oliver for this answer.

Your hypothesis seems plausible to me, because we rely strongly on
indexes to improve performance, and it appears that the query duration
of a reused satement is the that the same query without index. On the
other hand, "explain analyze" tells that the index is correctly used
each time, but it also reports a very short total time in any case, so
who knows what happens...

Anyway, I will try the option you suggest asap (although re-creating
statements each time does not seem to be so harmful...).

But there is maybe something somewhere in the driver (or pg iteself ?)
that may have to be fixed or at least documented ? Every JDBC
optimisation tutorial will mention the fact that preparing and reusing
statements can improve the performances of recurrent query, but my
example proves that in some situations, the performances can be
dramatically worsen (around 20 times slower in my case) !


Best regards,
--
Frédérik Bilhaut
NOOPSIS











Re: Very strange performance decrease when reusing a PreparedStatement

От
Dave Cramer
Дата:


Make sure you use a prepared statement in psql to do the explain analyze.

explain analyze select ....

is not the same as

prepare foo as select ...
explain analyze execute ...

Dave











--
Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-jdbc

Re: Very strange performance decrease when reusing a PreparedStatement

От
Péter Kovács
Дата:
2009/4/29 Frédérik Bilhaut <frederik.bilhaut@noopsis.fr>:
> Le 28 avr. 09 à 17:21, Oliver Jowett a écrit :
>>
>> Try with prepareThreshold=0. Probably, your particular query benefits
>> from re-planning each time with the particular concrete parameter values
>> for each execution.
>>
>> (you can either specify this as a URL parameter, or tweak it on a
>> per-connection or per-statement basis via methods on
>> PGConnection/PGStatement)
>
>
> Thank you Oliver for this answer.
>
> Your hypothesis seems plausible to me, because we rely strongly on indexes
> to improve performance, and it appears that the query duration of a reused
> satement is the that the same query without index. On the other hand,
> "explain analyze" tells that the index is correctly used each time, but it
> also reports a very short total time in any case, so who knows what
> happens...
>
> Anyway, I will try the option you suggest asap (although re-creating
> statements each time does not seem to be so harmful...).
>
> But there is maybe something somewhere in the driver (or pg iteself ?) that
> may have to be fixed or at least documented ? Every JDBC optimisation
> tutorial will mention the fact that preparing and reusing statements can
> improve the performances of recurrent query, but my example proves that in
> some situations, the performances can be dramatically worsen (around 20
> times slower in my case) !
>

It appears that the Postgres "server-prepared statement" cannot handle
parameters to the statement. This is really unfortunate, because 99%
of real-life applications will want to re-use the same statement
(template) with different parameters.

The term "server-prepared statement" itself already indicates that
there may be something skewed about the "local" semantics of
java.sql.PreparedStatements in the Postgres JDBC driver. There is no
notion of "client-prepared statement" in the JDBC API, which conceives
PreparedStatement instances as mere handles to server side objects.
And indeed, Postgres JDBC users have historically been using
java.sql.PreparedStatements for its side-effect of preventing SQL
injection rather than for the purpose the JDBC API designers had in
mind with this class.

Peter

>
> Best regards,
> --
> Frédérik Bilhaut
> NOOPSIS
>
>
>
>
>
>
>
>
>
>
>
> --
> Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-jdbc
>

Re: Very strange performance decrease when reusing a PreparedStatement

От
Dave Cramer
Дата:

It appears that the Postgres "server-prepared statement" cannot handle
parameters to the statement. This is really unfortunate, because 99%
of real-life applications will want to re-use the same statement
(template) with different parameters.

The term "server-prepared statement" itself already indicates that
there may be something skewed about the "local" semantics of
java.sql.PreparedStatements in the Postgres JDBC driver. There is no
notion of "client-prepared statement" in the JDBC API, which conceives
PreparedStatement instances as mere handles to server side objects.
And indeed, Postgres JDBC users have historically been using
java.sql.PreparedStatements for its side-effect of preventing SQL
injection rather than for the purpose the JDBC API designers had in
mind with this class.

I'm not sure where this hypothesis is coming from. Postgresql server prepared statements can certainly handle parameters.

What makes you think it can't ?

Dave

Re: Very strange performance decrease when reusing a PreparedStatement

От
John Lister
Дата:

Dave Cramer wrote:
>
>     It appears that the Postgres "server-prepared statement" cannot
> handle
>     parameters to the statement. This is really unfortunate, because 99%
>     of real-life applications will want to re-use the same statement
>     (template) with different parameters.
>
>     The term "server-prepared statement" itself already indicates that
>     there may be something skewed about the "local" semantics of
>     java.sql.PreparedStatements in the Postgres JDBC driver. There is no
>     notion of "client-prepared statement" in the JDBC API, which
> conceives
>     PreparedStatement instances as mere handles to server side objects.
>     And indeed, Postgres JDBC users have historically been using
>     java.sql.PreparedStatements for its side-effect of preventing SQL
>     injection rather than for the purpose the JDBC API designers had in
>     mind with this class.
>
>
> I'm not sure where this hypothesis is coming from. Postgresql server
> prepared statements can certainly handle parameters.
>
> What makes you think it can't ?
>
Possibly the post refers to problems in generating optimal query plans
with prepared statements, for example with respect to index choice. Some
cases i can understand:
For example if you have a partial index on say (val=3) and you do
something like "select ... where val=?" if the server knows the value is
3 it can use the better index.

However it seems that other optimisations can't be made for example it
doesn't seem possible to tell the server that parameter 1 is always
going to be an int and therefore it should be using index A. The current
implementation may not use index A as it is unaware as to the type of
the supplied parameter. Perhaps in this case the execution planner
should pick the parameters for the most optimal plan and return the
types during the parse and let the driver convert the data to what the
server requires...

Certainly it is a (minor) problem the prepared statements may not be the
most optimal with any benefits saved during parse lost due to bad
execution plans.


JOHN


Re: Very strange performance decrease when reusing a PreparedStatement

От
Tom Lane
Дата:
John Lister <john.lister-ps@kickstone.com> writes:
> However it seems that other optimisations can't be made for example it
> doesn't seem possible to tell the server that parameter 1 is always
> going to be an int and therefore it should be using index A. The current
> implementation may not use index A as it is unaware as to the type of
> the supplied parameter.

I don't think this is true either.  The wire protocol certainly provides
the ability for the client to tell the server what data type a parameter
has.  I don't know whether the JDBC driver makes use of that, but if it
does not, then something like
    variable = ?
is going to be treated exactly like
    variable = 'unmarked literal'
and in both cases the parser's default assumption is that the
unknown-type value has the same data type as the thing it's being
compared to.  So if the variable is indexed this would always be
seen as a indexable comparison.

There are certainly cases where lack of parameter type information could
lead to a poor plan, but they are corner cases.

            regards, tom lane

Re: Very strange performance decrease when reusing a PreparedStatement

От
Péter Kovács
Дата:
On Sun, May 3, 2009 at 12:10 PM, Dave Cramer <pg@fastcrypt.com> wrote:
>
>> It appears that the Postgres "server-prepared statement" cannot handle
>> parameters to the statement. This is really unfortunate, because 99%
>> of real-life applications will want to re-use the same statement
>> (template) with different parameters.
>>
>> The term "server-prepared statement" itself already indicates that
>> there may be something skewed about the "local" semantics of
>> java.sql.PreparedStatements in the Postgres JDBC driver. There is no
>> notion of "client-prepared statement" in the JDBC API, which conceives
>> PreparedStatement instances as mere handles to server side objects.
>> And indeed, Postgres JDBC users have historically been using
>> java.sql.PreparedStatements for its side-effect of preventing SQL
>> injection rather than for the purpose the JDBC API designers had in
>> mind with this class.
>
> I'm not sure where this hypothesis is coming from. Postgresql server
> prepared statements can certainly handle parameters.
>
> What makes you think it can't ?

http://jdbc.postgresql.org/documentation/83/server-prepare.html:

"Server side prepared statements are planned only once by the server.
This avoids the cost of replanning the query every time, but also
means that the planner cannot take advantage of the particular
parameter values used in a particular execution of the query. You
should be cautious about enabling the use of server side prepared
statements globally."

Or is this only the JDBC perspective of things, meaning that the
capability is there at the back-end, just the JDBC driver doesn't take
advantage of it?

Thanks
Peter

>
> Dave
>
>

Re: Very strange performance decrease when reusing a PreparedStatement

От
John Lister
Дата:

Tom Lane wrote:
> John Lister <john.lister-ps@kickstone.com> writes:
>
>> However it seems that other optimisations can't be made for example it
>> doesn't seem possible to tell the server that parameter 1 is always
>> going to be an int and therefore it should be using index A. The current
>> implementation may not use index A as it is unaware as to the type of
>> the supplied parameter.
>>
>
> I don't think this is true either. The wire protocol certainly provides
> the ability for the client to tell the server what data type a parameter
> has.  I don't know whether the JDBC driver makes use of that, but if it
> does not, then something like
>     variable = ?
> is going to be treated exactly like
>     variable = 'unmarked literal'
> and in both cases the parser's default assumption is that the
> unknown-type value has the same data type as the thing it's being
> compared to.  So if the variable is indexed this would always be
> seen as a indexable comparison.
>
The JDBC driver tries to mitigate this by delaying the parse until
execution time when all the query parameters are known.

> There are certainly cases where lack of parameter type information could
> lead to a poor plan, but they are corner cases.
>
>
I was guessing on the server implementation (wrongly probably) based on
a some other posts and comments in the code. My knowledge of the query
planner is limited but i would hope there are only a few cases where
knowing the types isn't sufficient to generate good plans without
knowing the values

JOHN

Re: Very strange performance decrease when reusing a PreparedStatement

От
Roland Roberts
Дата:
Péter Kovács wrote:
> http://jdbc.postgresql.org/documentation/83/server-prepare.html:
>
> "Server side prepared statements are planned only once by the server.
> This avoids the cost of replanning the query every time, but also
> means that the planner cannot take advantage of the particular
> parameter values used in a particular execution of the query. You
> should be cautious about enabling the use of server side prepared
> statements globally."
>
> Or is this only the JDBC perspective of things, meaning that the
> capability is there at the back-end, just the JDBC driver doesn't take
> advantage of it?
>
I believe the above documentation exactly describes the recent
performance problem.  The issue is that the next time you use the same
query, your parameters may be sufficiently different that the optimizer
*would* have picked a different index, but because the query has been
parsed and planned, the old one is used.

This problem is *not* unique to PostgreSQL.  We recently encountered
exactly the same problem with Oracle 10g.  In our case, the query
included the equivalent of a date range as "where event_time between ?
and ?".  For one query, the range covered less than 24 hours and was
likely to return only a single row.  In aother base, the range included
a full calendar year.  Depending on which query happened "first" (as far
as the database prepared statement cache was concerned), very different
plans could result.  Of the two plans, the one for a year's data worked
acceptably for both queries, but the plan for a day's data resulted in
abysmal performance for the year's data.

At least PostgreSQL has the ability to tell the parser to reparse every
time you ask it to prepare the statement.  Oracle doesn't.  The solution
for Oracle isn't relevant, but the real point here is that your code may
have to become smarter to help out the optimizer.  We reparse on every
execution, but can't control Oracle's server-side cache.  PostgreSQL
lets you control that with prepareThreshold=0.  If we'd had that option
with Oracle, we would have had an easy solution.  I'd say PostgreSQL's
implementation for this case is a good one.

roland

--
               PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD                             RL Enterprises
roland@rlenter.com                            6818 Madeline Court
roland@astrofoto.org                           Brooklyn, NY 11220


Re: Very strange performance decrease when reusing a PreparedStatement

От
Péter Kovács
Дата:
On Sun, May 3, 2009 at 9:08 PM, Roland Roberts <roland@astrofoto.org> wrote:
> Péter Kovács wrote:
>>
>> http://jdbc.postgresql.org/documentation/83/server-prepare.html:
>>
>> "Server side prepared statements are planned only once by the server.
>> This avoids the cost of replanning the query every time, but also
>> means that the planner cannot take advantage of the particular
>> parameter values used in a particular execution of the query. You
>> should be cautious about enabling the use of server side prepared
>> statements globally."
>>
>> Or is this only the JDBC perspective of things, meaning that the
>> capability is there at the back-end, just the JDBC driver doesn't take
>> advantage of it?
>>
>
> I believe the above documentation exactly describes the recent performance
> problem.  The issue is that the next time you use the same query, your
> parameters may be sufficiently different that the optimizer *would* have
> picked a different index, but because the query has been parsed and planned,
> the old one is used.
>
> This problem is *not* unique to PostgreSQL.  We recently encountered exactly
> the same problem with Oracle 10g.  In our case, the query included the
> equivalent of a date range as "where event_time between ? and ?".  For one
> query, the range covered less than 24 hours and was likely to return only a
> single row.  In aother base, the range included a full calendar year.
>  Depending on which query happened "first" (as far as the database prepared
> statement cache was concerned), very different plans could result.  Of the
> two plans, the one for a year's data worked acceptably for both queries, but
> the plan for a day's data resulted in abysmal performance for the year's
> data.
>

Regardless of what Oracle can or cannot do, the question stays put:
Can precompiled Postgres SQL statements handle varying parameters? If
they can (and people here say they can), why doesn't the JDBC
PreparedStatement take advantage of it? (If they can't, I don't think
this is an impossible thing to do. My gut feeling is that a large
portion of the query planning process can be be completed up to the
inclusion of the actual values of the parameters. The resulting "query
plan template" could be cached and reused and refined for each
execution by taking account of the selectivity of the actual parameter
values.)

> At least PostgreSQL has the ability to tell the parser to reparse every time
> you ask it to prepare the statement.  Oracle doesn't.  The solution for
> Oracle isn't relevant, but the real point here is that your code may have to
> become smarter to help out the optimizer.  We reparse on every execution,
> but can't control Oracle's server-side cache.  PostgreSQL lets you control
> that with prepareThreshold=0.  If we'd had that option with Oracle, we would
> have had an easy solution.  I'd say PostgreSQL's implementation for this
> case is a good one.

So the usefulness of java.sql.PreparedStatements tends to be limited
to SQL injection prevention. :-)

Thanks
Peter

>
> roland
>
> --
>                       PGP Key ID: 66 BC 3B CD
> Roland B. Roberts, PhD                             RL Enterprises
> roland@rlenter.com                            6818 Madeline Court
> roland@astrofoto.org                           Brooklyn, NY 11220
>
>
> --
> Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-jdbc
>

Re: Very strange performance decrease when reusing a PreparedStatement

От
John Lister
Дата:

Péter Kovács wrote:
> On Sun, May 3, 2009 at 9:08 PM, Roland Roberts <roland@astrofoto.org> wrote:
>
>> Péter Kovács wrote:
>>
>>> http://jdbc.postgresql.org/documentation/83/server-prepare.html:
>>>
>>> "Server side prepared statements are planned only once by the server.
>>> This avoids the cost of replanning the query every time, but also
>>> means that the planner cannot take advantage of the particular
>>> parameter values used in a particular execution of the query. You
>>> should be cautious about enabling the use of server side prepared
>>> statements globally."
>>>
>>> Or is this only the JDBC perspective of things, meaning that the
>>> capability is there at the back-end, just the JDBC driver doesn't take
>>> advantage of it?
>>>
>>>
>> I believe the above documentation exactly describes the recent performance
>> problem.  The issue is that the next time you use the same query, your
>> parameters may be sufficiently different that the optimizer *would* have
>> picked a different index, but because the query has been parsed and planned,
>> the old one is used.
>>
>> This problem is *not* unique to PostgreSQL.  We recently encountered exactly
>> the same problem with Oracle 10g.  In our case, the query included the
>> equivalent of a date range as "where event_time between ? and ?".  For one
>> query, the range covered less than 24 hours and was likely to return only a
>> single row.  In aother base, the range included a full calendar year.
>>  Depending on which query happened "first" (as far as the database prepared
>> statement cache was concerned), very different plans could result.  Of the
>> two plans, the one for a year's data worked acceptably for both queries, but
>> the plan for a day's data resulted in abysmal performance for the year's
>> data.
>>
>>
>
> Regardless of what Oracle can or cannot do, the question stays put:
> Can precompiled Postgres SQL statements handle varying parameters? If
> they can (and people here say they can), why doesn't the JDBC
> PreparedStatement take advantage of it? (If they can't, I don't think
> this is an impossible thing to do. My gut feeling is that a large
> portion of the query planning process can be be completed up to the
> inclusion of the actual values of the parameters. The resulting "query
> plan template" could be cached and reused and refined for each
> execution by taking account of the selectivity of the actual parameter
> values.)
>
>
I'm fairly sure the JDBC driver does take advantage of it, with a couple
of exceptions at the moment. The first time a query is executed, the
parameter types are fetched and used on subsequent queries. Server side
prepared statements aren't used until a user controlled threshold has
been reached.. I may be wrong, but i think the oracle driver does the
planning when the statement is created and not at execution time. The
closest postgres comes to this is setting the prepareThreshold to 1
which means every statement gets a server side prepared statement.

One other potential issue is that some values are sent without defined
types due to compatibility issues and the vagueness of the JDBC spec
(times for example) whether this effects the planner or not is over my
head, but Toms earlier comment would seem to imply possibly not...



JOHN

Re: Very strange performance decrease when reusing a PreparedStatement

От
Roland Roberts
Дата:
John Lister wrote:
> Péter Kovács wrote:
>> Regardless of what Oracle can or cannot do, the question stays put:
>> Can precompiled Postgres SQL statements handle varying parameters? If
>> they can (and people here say they can), why doesn't the JDBC
>> PreparedStatement take advantage of it? (If they can't, I don't think
>> this is an impossible thing to do. My gut feeling is that a large
>> portion of the query planning process can be be completed up to the
>> inclusion of the actual values of the parameters. The resulting "query
>> plan template" could be cached and reused and refined for each
>> execution by taking account of the selectivity of the actual parameter
>> values.)
> I'm fairly sure the JDBC driver does take advantage of it, with a
> couple of exceptions at the moment. The first time a query is
> executed, the parameter types are fetched and used on subsequent
> queries. Server side prepared statements aren't used until a user
> controlled threshold has been reached.. I may be wrong, but i think
> the oracle driver does the planning when the statement is created and
> not at execution time. The closest postgres comes to this is setting
> the prepareThreshold to 1 which means every statement gets a server
> side prepared statement.
No, my point was that PostgreSQL is doing the same thing that Oracle is
doing and in both cases it can bite you.  Both are looking at the bind
variables to come up with a plan but the plan is retained for reuse
under the assumption that the bind variables will be, statistically at
least, similar the next time.  When that assumption is violated, you get
stuck with a bad plan.  The purpose of my Oracle example was to give a
concrete example of such a violation.

Planning is a server side activity, always.  talking about it as
something that happens in the JDBC driver makes it sound like a client
side activity, but it's not.

regards,

roland

--
               PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD                             RL Enterprises
roland@rlenter.com                            6818 Madeline Court
roland@astrofoto.org                           Brooklyn, NY 11220


Re: Very strange performance decrease when reusing a PreparedStatement

От
Oliver Jowett
Дата:
John Lister wrote:

> However it seems that other optimisations can't be made for example it
> doesn't seem possible to tell the server that parameter 1 is always
> going to be an int and therefore it should be using index A. The current
> implementation may not use index A as it is unaware as to the type of
> the supplied parameter. Perhaps in this case the execution planner
> should pick the parameters for the most optimal plan and return the
> types during the parse and let the driver convert the data to what the
> server requires...

No it's not a problem with types at all, parameter types are fixed at
the point the named statement is created.

It's all about index selectivity statistics.

-O

Re: Very strange performance decrease when reusing a PreparedStatement

От
Oliver Jowett
Дата:
Péter Kovács wrote:
> On Sun, May 3, 2009 at 12:10 PM, Dave Cramer <pg@fastcrypt.com> wrote:
>>> It appears that the Postgres "server-prepared statement" cannot handle
>>> parameters to the statement. This is really unfortunate, because 99%
>>> of real-life applications will want to re-use the same statement
>>> (template) with different parameters.
>>>
>>> The term "server-prepared statement" itself already indicates that
>>> there may be something skewed about the "local" semantics of
>>> java.sql.PreparedStatements in the Postgres JDBC driver. There is no
>>> notion of "client-prepared statement" in the JDBC API, which conceives
>>> PreparedStatement instances as mere handles to server side objects.

There's no requirement in JDBC that PreparedStatement be merely a handle
to a server-side object - that's just one implementation option, and a
poor one to choose for the PostgreSQL wire protocol in its current form.
At a bare minimum, we cache much of the statement and parameter
information on the client side. (Can you imagine every setXXX call doing
a network round-trip, really?)

>>> And indeed, Postgres JDBC users have historically been using
>>> java.sql.PreparedStatements for its side-effect of preventing SQL
>>> injection rather than for the purpose the JDBC API designers had in
>>> mind with this class.
>> I'm not sure where this hypothesis is coming from. Postgresql server
>> prepared statements can certainly handle parameters.
>>
>> What makes you think it can't ?
>
> http://jdbc.postgresql.org/documentation/83/server-prepare.html:
>
> "Server side prepared statements are planned only once by the server.
> This avoids the cost of replanning the query every time, but also
> means that the planner cannot take advantage of the particular
> parameter values used in a particular execution of the query. You
> should be cautious about enabling the use of server side prepared
> statements globally."

I don't think you're understanding that correctly. It has nothing to do
with how parameters are passed.

Are you familiar with how the wire protocol works?

If you have a query:

  SELECT * FROM foo WHERE someindexedvalue=?

When the unnamed statement is used, a new query plan is constructed when
parameters are bound to the statement before execution, and the
*particular* parameter value provided is given to the query planner.

The query planner then make decisions based on that actual value. For
example, if the table statistics say that the particular value used is
very selective for that column, it may decide to use an index scan over
a table scan.

When a named statement is used, the server behavior changes. The query
is planned once, when the statement is created but before parameter
values are bound. The planner does not have particular parameter values
available, and it creates a general plan suitable for any parameter
value. This can result in different index selectivity information,
because the values that will be actually be used are not known ahead of
time .. which in turn can result in different plans. The resulting plan
may be "good" for any random value but "bad" for a particular value.
Often apprications are not passing in any random value - there are
external constraints on the value that are not visible in the structure
of the query. For example, maybe you only retrieve log data from that
5GB table with a timestamp range that selects a few minutes of data -
but without specific parameter values, the planner doesn't know that.

"server side prepared statement" is the JDBC driver jargon for "uses a
persistent, named, statement at the protocol level". The mechanism for
passing parameters remains the same in both cases - the difference is
that in one case a one-shot unnamed statement is used, in the other case
a persistent server-side named statement is used.

The distinction is there precisely because there *ARE* known performance
issues on the server side with some queries when using a named statements.

There is not much more the driver can do than give you tuning knobs for
the problematic cases. Anything else is going to have to happen on the
server side to make the planner smarter in these cases (and I'm not even
sure if it CAN be smarter with the information it has)

-O

Re: Very strange performance decrease when reusing a PreparedStatement

От
Oliver Jowett
Дата:
Tom Lane wrote:

> There are certainly cases where lack of parameter type information could
> lead to a poor plan, but they are corner cases.

I fear they are less of a corner case than you believe - they seem to
pop up regularly here on the JDBC list.

(This might be a reflection that applications using JDBC are often
web-facing apps with simple, index-heavy queries)

-O

Re: Very strange performance decrease when reusing a PreparedStatement

От
Oliver Jowett
Дата:
Oliver Jowett wrote:
> Tom Lane wrote:
>
>> There are certainly cases where lack of parameter type information could
>> lead to a poor plan, but they are corner cases.

Sorry, I read that as "lack of parameter information" which is what this
comment is about:

> I fear they are less of a corner case than you believe - they seem to
> pop up regularly here on the JDBC list.

The parameter type issues themselves are mostly resolved AFAIK.

-O

Re: Very strange performance decrease when reusing a PreparedStatement

От
Oliver Jowett
Дата:
Péter Kovács wrote:

> Regardless of what Oracle can or cannot do, the question stays put:
> Can precompiled Postgres SQL statements handle varying parameters?

Yes.

> If
> they can (and people here say they can), why doesn't the JDBC
> PreparedStatement take advantage of it?

It does.

-O

Re: Very strange performance decrease when reusing a PreparedStatement

От
Péter Kovács
Дата:
On Mon, May 4, 2009 at 1:43 AM, Oliver Jowett <oliver@opencloud.com> wrote:
> Péter Kovács wrote:
>> On Sun, May 3, 2009 at 12:10 PM, Dave Cramer <pg@fastcrypt.com> wrote:
>>>> It appears that the Postgres "server-prepared statement" cannot handle
>>>> parameters to the statement. This is really unfortunate, because 99%
>>>> of real-life applications will want to re-use the same statement
>>>> (template) with different parameters.
>>>>
>>>> The term "server-prepared statement" itself already indicates that
>>>> there may be something skewed about the "local" semantics of
>>>> java.sql.PreparedStatements in the Postgres JDBC driver. There is no
>>>> notion of "client-prepared statement" in the JDBC API, which conceives
>>>> PreparedStatement instances as mere handles to server side objects.
>
> There's no requirement in JDBC that PreparedStatement be merely a handle
> to a server-side object - that's just one implementation option, and a
> poor one to choose for the PostgreSQL wire protocol in its current form.
> At a bare minimum, we cache much of the statement and parameter
> information on the client side. (Can you imagine every setXXX call doing
> a network round-trip, really?)
>
>>>> And indeed, Postgres JDBC users have historically been using
>>>> java.sql.PreparedStatements for its side-effect of preventing SQL
>>>> injection rather than for the purpose the JDBC API designers had in
>>>> mind with this class.
>>> I'm not sure where this hypothesis is coming from. Postgresql server
>>> prepared statements can certainly handle parameters.
>>>
>>> What makes you think it can't ?
>>
>> http://jdbc.postgresql.org/documentation/83/server-prepare.html:
>>
>> "Server side prepared statements are planned only once by the server.
>> This avoids the cost of replanning the query every time, but also
>> means that the planner cannot take advantage of the particular
>> parameter values used in a particular execution of the query. You
>> should be cautious about enabling the use of server side prepared
>> statements globally."
>
> I don't think you're understanding that correctly. It has nothing to do
> with how parameters are passed.
>
> Are you familiar with how the wire protocol works?
>
> If you have a query:
>
>  SELECT * FROM foo WHERE someindexedvalue=?
>
> When the unnamed statement is used, a new query plan is constructed when
> parameters are bound to the statement before execution, and the
> *particular* parameter value provided is given to the query planner.
>
> The query planner then make decisions based on that actual value. For
> example, if the table statistics say that the particular value used is
> very selective for that column, it may decide to use an index scan over
> a table scan.
>
> When a named statement is used, the server behavior changes. The query
> is planned once, when the statement is created but before parameter
> values are bound. The planner does not have particular parameter values
> available, and it creates a general plan suitable for any parameter
> value. This can result in different index selectivity information,
> because the values that will be actually be used are not known ahead of
> time .. which in turn can result in different plans. The resulting plan
> may be "good" for any random value but "bad" for a particular value.
> Often apprications are not passing in any random value - there are
> external constraints on the value that are not visible in the structure
> of the query. For example, maybe you only retrieve log data from that
> 5GB table with a timestamp range that selects a few minutes of data -
> but without specific parameter values, the planner doesn't know that.
>
> "server side prepared statement" is the JDBC driver jargon for "uses a
> persistent, named, statement at the protocol level". The mechanism for
> passing parameters remains the same in both cases - the difference is
> that in one case a one-shot unnamed statement is used, in the other case
> a persistent server-side named statement is used.
>
> The distinction is there precisely because there *ARE* known performance
> issues on the server side with some queries when using a named statements.
>
> There is not much more the driver can do than give you tuning knobs for
> the problematic cases. Anything else is going to have to happen on the
> server side to make the planner smarter in these cases (and I'm not even
> sure if it CAN be smarter with the information it has)

Thank you, Oliver! This clarifies a lot!

Do I understand it correctly that in the case of named statements, the
query planner makes an assumption about the selectivity of the actual
values to create the plan which will be cached and reused? If so,
wouldn't it be possible for the planner to
1. also cache the selectivity estimations used for the cached plan;
2. check the actual selectivity of parameters before each execution;
3. create (and execute) a "temporary plan" for those "unexpected"
parameters whose actual selectivity values differ significantly from
the selectivity values assumed for the cached plan.

Assuming that the cost of looking up actual selectivity is
insignificant compared to the cost of (re)creating a cached query
plan, this could avoid the kind of unexpected performance death this
thread is about. (And let's face it: being smart about optimizing
query execution is a much easier task for the query planner than for
the application developer. The query planner doesn't know the
parameters beforehand, but is at least in possession of detailed
low-level statistics. The application developer (typically) doesn't
know either the parameters beforehand (they are specified by the user)
or the table/index statistics.)

Thanks,
Peter

>
> -O
>

Re: Very strange performance decrease when reusing a PreparedStatement

От
Thomas Kellerer
Дата:
Roland Roberts, 04.05.2009 01:12:
> No, my point was that PostgreSQL is doing the same thing that Oracle is
> doing and in both cases it can bite you.  Both are looking at the bind
> variables to come up with a plan but the plan is retained for reuse
> under the assumption that the bind variables will be, statistically at
> least, similar the next time.  When that assumption is violated, you get
> stuck with a bad plan.  The purpose of my Oracle example was to give a
> concrete example of such a violation.

Oracle 11 has introduced a feature called "Adaptive Cursor Sharing" which tries to solve this problem (Oracle 10 has
sometweaks to address that problem manually, but they have their cornerstones as well) 

Each time a prepared statement is re-used the optimizer compares the actuals with the stored assumptions (mainly
estimatedrows vs. actual rows, IIRC). If those figures deviate too much, a new plan is created and used from that point
onwards_for that parameter combination_.  

So each SQL actually has a map of prepared statements depending on the input parameters. Obviously there is still one
statementexecution (the first one) that is hit by the bad plan, but all subsequent executions will benefit from the new
one.

Might be worth considering for PostgreSQL as well :)

Thomas

Re: Very strange performance decrease when reusing a PreparedStatement

От
Mikko Tiihonen
Дата:
Péter Kovács wrote:
> On Mon, May 4, 2009 at 1:43 AM, Oliver Jowett <oliver@opencloud.com> wrote:
>> Péter Kovács wrote:
>> Are you familiar with how the wire protocol works?
>>
>> If you have a query:
>>
>>  SELECT * FROM foo WHERE someindexedvalue=?
>>
>> When the unnamed statement is used, a new query plan is constructed when
>> parameters are bound to the statement before execution, and the
>> *particular* parameter value provided is given to the query planner.
>>
>> The query planner then make decisions based on that actual value. For
>> example, if the table statistics say that the particular value used is
>> very selective for that column, it may decide to use an index scan over
>> a table scan.
>>
>> When a named statement is used, the server behavior changes. The query
>> is planned once, when the statement is created but before parameter
>> values are bound. The planner does not have particular parameter values
>> available, and it creates a general plan suitable for any parameter
>> value. This can result in different index selectivity information,
>> because the values that will be actually be used are not known ahead of
>> time .. which in turn can result in different plans. The resulting plan
>> may be "good" for any random value but "bad" for a particular value.
>> Often apprications are not passing in any random value - there are
>> external constraints on the value that are not visible in the structure
>> of the query. For example, maybe you only retrieve log data from that
>> 5GB table with a timestamp range that selects a few minutes of data -
>> but without specific parameter values, the planner doesn't know that.
>>
>> "server side prepared statement" is the JDBC driver jargon for "uses a
>> persistent, named, statement at the protocol level". The mechanism for
>> passing parameters remains the same in both cases - the difference is
>> that in one case a one-shot unnamed statement is used, in the other case
>> a persistent server-side named statement is used.
>>
>> The distinction is there precisely because there *ARE* known performance
>> issues on the server side with some queries when using a named statements.
>>
>> There is not much more the driver can do than give you tuning knobs for
>> the problematic cases. Anything else is going to have to happen on the
>> server side to make the planner smarter in these cases (and I'm not even
>> sure if it CAN be smarter with the information it has)
>
> Thank you, Oliver! This clarifies a lot!
>
> Do I understand it correctly that in the case of named statements, the
> query planner makes an assumption about the selectivity of the actual
> values to create the plan which will be cached and reused? If so,
> wouldn't it be possible for the planner to
> 1. also cache the selectivity estimations used for the cached plan;
> 2. check the actual selectivity of parameters before each execution;
> 3. create (and execute) a "temporary plan" for those "unexpected"
> parameters whose actual selectivity values differ significantly from
> the selectivity values assumed for the cached plan.
>
> Assuming that the cost of looking up actual selectivity is
> insignificant compared to the cost of (re)creating a cached query
> plan, this could avoid the kind of unexpected performance death this
> thread is about. (And let's face it: being smart about optimizing
> query execution is a much easier task for the query planner than for
> the application developer. The query planner doesn't know the
> parameters beforehand, but is at least in possession of detailed
> low-level statistics. The application developer (typically) doesn't
> know either the parameters beforehand (they are specified by the user)
> or the table/index statistics.)

Another option, if the server side cannot be changed, is to allow multiple
server named statements for each jdbc prepared statement. Of course the
heuristics when to do such thing is nothing easy to write, and if it needs
manual configuration most applications would not take advantage of the feature.

An easier solution would be to make the client side detect the problematic
situation. The jdbc driver would be to fetch the explain analyze from both the
exact parametrized query and for the named statement with the parameters. If the
plans have totally different execution plan then it means the generic plan is
not good enough. At this point the jdbc driver could decide not to use named
statements at all for the particular prepared statement.

I think the latter solution would fix the main problems that users are seeing
with just on-time extra work done on the n-th prepared statement execution.

-Mikko

Re: Very strange performance decrease when reusing a PreparedStatement

От
Simon Riggs
Дата:
On Sun, 2009-05-03 at 15:08 -0400, Roland Roberts wrote:

> At least PostgreSQL has the ability to tell the parser to reparse
> every time you ask it to prepare the statement.  Oracle doesn't.  The
> solution for Oracle isn't relevant, but the real point here is that
> your code may have to become smarter to help out the optimizer.  We
> reparse on every execution, but can't control Oracle's server-side
> cache.  PostgreSQL lets you control that with prepareThreshold=0.  If
> we'd had that option with Oracle, we would have had an easy solution.
> I'd say PostgreSQL's implementation for this case is a good one.

We have a good feature there, definitely. Although there is still more
to be done.

Oracle's cursor_sharing features are slightly ahead of Postgres,
specifically around the "SIMILAR" option.

We talked a few years back about an "Option node" that would select
sub-plans at runtime based upon the actual values of the bind variable.
Nobody's got around to doing anything with that yet.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Very strange performance decrease when reusing a PreparedStatement

От
Tom Lane
Дата:
Simon Riggs <simon@2ndQuadrant.com> writes:
> We talked a few years back about an "Option node" that would select
> sub-plans at runtime based upon the actual values of the bind variable.
> Nobody's got around to doing anything with that yet.

There is an AlternativeSubPlan mechanism in 8.4, though we're not doing
very much with it yet.

            regards, tom lane

Re: Very strange performance decrease when reusing a PreparedStatement

От
Simon Riggs
Дата:
On Wed, 2009-05-06 at 18:39 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > We talked a few years back about an "Option node" that would select
> > sub-plans at runtime based upon the actual values of the bind variable.
> > Nobody's got around to doing anything with that yet.
>
> There is an AlternativeSubPlan mechanism in 8.4, though we're not doing
> very much with it yet.

OK, good. We just need to use it when the distribution of values is
highly non-uniform such that values with max and min selectivity can
generate different plans *and* we have a parameter, the arrival
distribution of which we have no way of knowing at plan time. That case
accounts for about 50% of the cases for which hints/special handling is
currently required.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support