Обсуждение: Date Parameter To Query Confusing Optimizer

Поиск
Список
Период
Сортировка

Date Parameter To Query Confusing Optimizer

От
"Kurt Westerfeld"
Дата:
I have a JDBC-based application which passes date/time parameters using JDBC query parameters, which is performing very badly (ie. doing full table scans).  In an effort to try to narrow down the problem, I am taking the query and running it in interactive SQL mode, but changing the date parameters (which are BETWEEN ? and ? clauses) and placing a date literal instead, using the "date '2011-01-01' syntax.  When I do this, the query runs instantly, obviously using indices on the tables involved.
 
Now, I suspect the optimizer is taking the wrong path based on the usage of query parameters.  I'm pretty surprised by this, because I would think the optimizer would do the same thing for any query parameter, however it arrived.  Unfortunately for this situation, the code which forms the query actually is used for several database back-ends, and I don't want to change it to use postgres-specific query syntax if I can help it.  I'm trying to avoid this at all costs.
 
What's really weird, and makes me suspect an optimizer or protocol bug, is that if I place "protocolVersion=2" as a JDBC parameter, the problem goes away.  That is, I'm seeing the query take <1sec as opposed to >3min when using the legacy protocol.  I stumbled on this based on reading that the older protocol sent everything as string, and inferred the type on the server side. 
 
Now, that's a reasonable workaround, but it does seem like I've hit either a Postgres server bug, optimizer or other, or a JDBC bug of some kind.
 
Any help in narrowing down the problem is appreciated!

Re: Date Parameter To Query Confusing Optimizer

От
bricklen
Дата:
On Mon, Jan 3, 2011 at 2:48 PM, Kurt Westerfeld <kwesterfeld@novell.com> wrote:
> I have a JDBC-based application which passes date/time parameters using JDBC
> query parameters, which is performing very badly (ie. doing full table
> scans).  In an effort to try to narrow down the problem, I am taking the
> query and running it in interactive SQL mode, but changing the date
> parameters (which are BETWEEN ? and ? clauses) and placing a date literal
> instead, using the "date '2011-01-01' syntax.  When I do this, the query
> runs instantly, obviously using indices on the tables involved.

Try using PREPARE to simulate your issue. We ran into a similar issue
using PHP and prepared statements, where the plan choices were
occasionally abysmal depending what the filters were and their
relative distributions within the table.

http://www.postgresql.org/docs/current/interactive/sql-prepare.html

Re: Date Parameter To Query Confusing Optimizer

От
Alban Hertroys
Дата:
On 3 Jan 2011, at 23:48, Kurt Westerfeld wrote:

> I have a JDBC-based application which passes date/time parameters using JDBC query parameters, which is performing
verybadly (ie. doing full table scans).  In an effort to try to narrow down the problem, I am taking the query and
runningit in interactive SQL mode, but changing the date parameters (which are BETWEEN ? and ? clauses) and placing a
dateliteral instead, using the "date '2011-01-01' syntax.  When I do this, the query runs instantly, obviously using
indiceson the tables involved. 
>
> Now, I suspect the optimizer is taking the wrong path based on the usage of query parameters.  I'm pretty surprised
bythis, because I would think the optimizer would do the same thing for any query parameter, however it arrived.
Unfortunatelyfor this situation, the code which forms the query  

The problem here is that JDBC uses prepared statements for parameterised queries. By the very definition of a prepared
statementthe query plan gets stored before the parameter values are known, which forces the database to use a query
planthat would work for every possible value of those parameters. 

Thus you end up with a generic query plan.

This isn't often a problem, but if a significant number of your possible parameter values exist in a high percentage of
yourtable rows, then chances are you'll end up with a plan with a sequential scan. 

You didn't tell what version of Postgres you're using - I recall recent versions (since 8.3?) are smarter about this
particularscenario. 

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4d22ca9a11548321074132!



Re: Date Parameter To Query Confusing Optimizer

От
"Kurt Westerfeld"
Дата:
"By the very definition of a prepared statement the query plan gets stored before the parameter values are known"
 
Is this true for all databases?  It would seem to me that this approach would always lead to the wrong query plan, especially in the case I am testing where the selectivity is very low for the BETWEEN clause I am using.  The trouble is, the BETWEEN clause can also yield a highly selective result--which is basically the case as the database "ages".  See, the date/timestamp column in this case is a good choice as it is a database activity table, a temporal database.  The trouble is at times there is a huge amount of activity, yielding very low selectivity. 
 
Maybe it would be best for me to just disable the indices on the timestamp fields and just use/expect my join would yield the best results.
 
But, back on topic, to me it seems wrong that choice of prepared vs. non-prepared, and protocol 2 vs. 3, would influence the optimizer so profoundly.  I would think it's got to be something I can tune, that prepared statement parameters be considered for execution plan.  Is there such a setting? 

>>> Alban Hertroys <dalroi@solfertje.student.utwente.nl> 1/4/2011 2:21 AM >>>
On 3 Jan 2011, at 23:48, Kurt Westerfeld wrote:

> I have a JDBC-based application which passes date/time parameters using JDBC query parameters, which is performing very badly (ie. doing full table scans).  In an effort to try to narrow down the problem, I am taking the query and running it in interactive SQL mode, but changing the date parameters (which are BETWEEN ? and ? clauses) and placing a date literal instead, using the "date '2011-01-01' syntax.  When I do this, the query runs instantly, obviously using indices on the tables involved.

> Now, I suspect the optimizer is taking the wrong path based on the usage of query parameters.  I'm pretty surprised by this, because I would think the optimizer would do the same thing for any query parameter, however it arrived.  Unfortunately for this situation, the code which forms the query

The problem here is that JDBC uses prepared statements for parameterised queries. By the very definition of a prepared statement the query plan gets stored before the parameter values are known, which forces the database to use a query plan that would work for every possible value of those parameters.

Thus you end up with a generic query plan.

This isn't often a problem, but if a significant number of your possible parameter values exist in a high percentage of your table rows, then chances are you'll end up with a plan with a sequential scan.

You didn't tell what version of Postgres you're using - I recall recent versions (since 8.3?) are smarter about this particular scenario.

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:1213,4d22ca9211544532215324!


Re: Date Parameter To Query Confusing Optimizer

От
Radosław Smogura
Дата:
 Can You try "...BETWEEN ?::date and ?::date ..." syntax or send
 statement causing problems?

 Kind regards,
 Radosław Smogura

 On Tue, 04 Jan 2011 07:27:42 -0700, "Kurt Westerfeld"
 <kwesterfeld@novell.com> wrote:
> "By the very definition of a prepared statement the query plan gets
> stored before the parameter values are known"
>
> Is this true for all databases? It would seem to me that this
> approach would always lead to the wrong query plan, especially in the
> case I am testing where the selectivity is very low for the BETWEEN
> clause I am using. The trouble is, the BETWEEN clause can also yield
> a
> highly selective result--which is basically the case as the database
> "ages". See, the date/timestamp column in this case is a good choice
> as it is a database activity table, a temporal database. The trouble
> is at times there is a huge amount of activity, yielding very low
> selectivity.
>
> Maybe it would be best for me to just disable the indices on the
> timestamp fields and just use/expect my join would yield the best
> results.
>
> But, back on topic, to me it seems wrong that choice of prepared vs.
> non-prepared, and protocol 2 vs. 3, would influence the optimizer so
> profoundly. I would think it's got to be something I can tune, that
> prepared statement parameters be considered for execution plan. Is
> there such a setting?
>
>>>> Alban Hertroys  1/4/2011 2:21 AM >>>
> On 3 Jan 2011, at 23:48, Kurt Westerfeld wrote:
>
>> I have a JDBC-based application which passes date/time parameters
> using JDBC query parameters, which is performing very badly (ie.
> doing
> full table scans). In an effort to try to narrow down the problem, I
> am taking the query and running it in interactive SQL mode, but
> changing the date parameters (which are BETWEEN ? and ? clauses) and
> placing a date literal instead, using the "date '2011-01-01' syntax.
> When I do this, the query runs instantly, obviously using indices on
> the tables involved.
>>
>> Now, I suspect the optimizer is taking the wrong path based on the
> usage of query parameters. I'm pretty surprised by this, because I
> would think the optimizer would do the same thing for any query
> parameter, however it arrived. Unfortunately for this situation, the
> code which forms the query
>
> The problem here is that JDBC uses prepared statements for
> parameterised queries. By the very definition of a prepared statement
> the query plan gets stored before the parameter values are known,
> which forces the database to use a query plan that would work for
> every possible value of those parameters.
>
> Thus you end up with a generic query plan.
>
> This isn't often a problem, but if a significant number of your
> possible parameter values exist in a high percentage of your table
> rows, then chances are you'll end up with a plan with a sequential
> scan.
>
> You didn't tell what version of Postgres you're using - I recall
> recent versions (since 8.3?) are smarter about this particular
> scenario.
>
> Alban Hertroys
>
> --
> Screwing up is an excellent way to attach something to the ceiling.
>
> !DSPAM:1213,4d22ca9211544532215324!


Re: Date Parameter To Query Confusing Optimizer

От
"Kurt Westerfeld"
Дата:
First, sorry I didn't mention that I am using PostgreSQL 9, but the problem existed also on 8.4.
 
As for the BETWEEN clause, I'm using hibernate and don't want to put a database-specific SQL query in place. 
 
I also decided to drop a few indexes, which were probably causing the optimizer to choose the wrong plan.  This "fixes" the problem also, but I think in general the "bug" here is that the query plan is bound too early in prepared statement execution.

>>> Radosław Smogura<rsmogura@softperience.eu> 1/4/2011 9:48 AM >>>
Can You try "...BETWEEN ?::date and ?::date ..." syntax or send
statement causing problems?

Kind regards,
Radosław Smogura

On Tue, 04 Jan 2011 07:27:42 -0700, "Kurt Westerfeld"
<kwesterfeld@novell.com> wrote:
> "By the very definition of a prepared statement the query plan gets
> stored before the parameter values are known"
>
> Is this true for all databases? It would seem to me that this
> approach would always lead to the wrong query plan, especially in the
> case I am testing where the selectivity is very low for the BETWEEN
> clause I am using. The trouble is, the BETWEEN clause can also yield
> a
> highly selective result--which is basically the case as the database
> "ages". See, the date/timestamp column in this case is a good choice
> as it is a database activity table, a temporal database. The trouble
> is at times there is a huge amount of activity, yielding very low
> selectivity.
>
> Maybe it would be best for me to just disable the indices on the
> timestamp fields and just use/expect my join would yield the best
> results.
>
> But, back on topic, to me it seems wrong that choice of prepared vs.
> non-prepared, and protocol 2 vs. 3, would influence the optimizer so
> profoundly. I would think it's got to be something I can tune, that
> prepared statement parameters be considered for execution plan. Is
> there such a setting?
>
>>>> Alban Hertroys  1/4/2011 2:21 AM >>>
> On 3 Jan 2011, at 23:48, Kurt Westerfeld wrote:
>
>> I have a JDBC-based application which passes date/time parameters
> using JDBC query parameters, which is performing very badly (ie.
> doing
> full table scans). In an effort to try to narrow down the problem, I
> am taking the query and running it in interactive SQL mode, but
> changing the date parameters (which are BETWEEN ? and ? clauses) and
> placing a date literal instead, using the "date '2011-01-01' syntax.
> When I do this, the query runs instantly, obviously using indices on
> the tables involved.
>>
>> Now, I suspect the optimizer is taking the wrong path based on the
> usage of query parameters. I'm pretty surprised by this, because I
> would think the optimizer would do the same thing for any query
> parameter, however it arrived. Unfortunately for this situation, the
> code which forms the query
>
> The problem here is that JDBC uses prepared statements for
> parameterised queries. By the very definition of a prepared statement
> the query plan gets stored before the parameter values are known,
> which forces the database to use a query plan that would work for
> every possible value of those parameters.
>
> Thus you end up with a generic query plan.
>
> This isn't often a problem, but if a significant number of your
> possible parameter values exist in a high percentage of your table
> rows, then chances are you'll end up with a plan with a sequential
> scan.
>
> You didn't tell what version of Postgres you're using - I recall
> recent versions (since 8.3?) are smarter about this particular
> scenario.
>
> Alban Hertroys
>
> --
> Screwing up is an excellent way to attach something to the ceiling.
>
> !DSPAM:1213,4d22ca9211544532215324!

Re: Date Parameter To Query Confusing Optimizer

От
Bill Moran
Дата:
In response to "Kurt Westerfeld" <kwesterfeld@novell.com>:
> >>> Radosław Smogura<rsmogura@softperience.eu> 1/4/2011 9:48 AM >>>
> Can You try "...BETWEEN ?::date and ?::date ..." syntax or send
> statement causing problems?
>
> As for the BETWEEN clause, I'm using hibernate and don't want to put a database-specific SQL query in place.

You could do "BEWTEEN CAST(? AS DATE) AND CAST(? AS DATE)" and it wouldn't
be database-specific.

You might also want to consider top-posting.  I'm not sure why I read enough
of this to understand it, as I usually get confused an frustrated when
things have been top posted and stop reading.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/