Обсуждение: Transient plans versus the SPI API

Поиск
Список
Период
Сортировка

Transient plans versus the SPI API

От
Tom Lane
Дата:
I've been thinking about how to redesign the plancache infrastructure to
better support use of transient (one-shot) plans, as we've talked about
various times such as in this thread:
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00607.php
(Note: that thread sorta went off into the weeds arguing about exactly
what heuristics to use for when to re-plan.  I don't want to re-open that
issue today, since there's no way to experiment with policy until we have
some mechanism in place.)

I think that what we need to do is get rid of the assumption that a "cached
plan" normally includes a plan per se.  The initial creation of the cache
entry should just supply a raw query plus its analyzed-and-rewritten form.
(plancache.c can actually operate that way today, via its "not fully
planned" flag, but it's a wart rather than the normal philosophy.)  Then
RevalidateCachedPlan should be replaced by something with the semantics
of "get me a plan to use, and here's the parameter values I'm going to use
it with".  The choice between using a pre-cached generic plan and building
a one-off plan would then be localized in this new function.

There are not that many places that call plancache.c directly, and so this
change in API won't cause much code churn --- but one place that does
depend on this is spi.c, and there is *lots* of both core and third-party
code that calls SPI_prepare for example.  So we need to tread carefully in
redefining SPI's behavior.

The most straightforward way to reimplement things within spi.c would be
to redefine SPI_prepare as just doing the parse-and-rewrite steps, with
planning always postponed to SPI_execute.  In the case where you just
prepare and then execute a SPIPlan, this would come out the same or
better, since we'd still just do one planning cycle, but the planner could
be given the actual parameter values to use.  However, if you SPI_prepare,
SPI_saveplan, and then SPI_execute many times, you might come out behind.
This is of course the same tradeoff we are going to impose at the SQL level
anyway, but I wonder whether there needs to be a control knob available to
C code to retain the old plan-once-and-always-use-that-plan approach.

Anyone have an opinion about that?  And if we do need to expose some
control, should the default (if you don't change your source code) be that
you still get the old behavior, or that you get the new behavior?  I'm
inclined to think that if we believe this'll be a win at the SQL level,
it should be a win at the SPI-caller level too, but maybe someone thinks
otherwise.
        regards, tom lane


Re: Transient plans versus the SPI API

От
"Kevin Grittner"
Дата:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The most straightforward way to reimplement things within spi.c
> would be to redefine SPI_prepare as just doing the
> parse-and-rewrite steps, with planning always postponed to
> SPI_execute.  In the case where you just prepare and then execute
> a SPIPlan, this would come out the same or better, since we'd
> still just do one planning cycle, but the planner could be given
> the actual parameter values to use.  However, if you SPI_prepare,
> SPI_saveplan, and then SPI_execute many times, you might come out
> behind.  This is of course the same tradeoff we are going to
> impose at the SQL level anyway, but I wonder whether there needs
> to be a control knob available to C code to retain the old
> plan-once-and-always-use-that-plan approach.
> 
> Anyone have an opinion about that?
I have a few places I've used SPI_saveplan where there is really
only one sensible plan, so I'm pretty sure it would be a loss to use
the new technique in those places.  Now, whether that would be a
loss that would be big enough for anyone to notice (or even to
reliably measure) is another question.  It wouldn't surprise me if
the difference was insignificant, but it would be reassuring to have
an easy way to check....
-Kevin


Re: Transient plans versus the SPI API

От
Robert Haas
Дата:
On Tue, Aug 2, 2011 at 4:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've been thinking about how to redesign the plancache infrastructure to
> better support use of transient (one-shot) plans, as we've talked about
> various times such as in this thread:
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00607.php
> (Note: that thread sorta went off into the weeds arguing about exactly
> what heuristics to use for when to re-plan.  I don't want to re-open that
> issue today, since there's no way to experiment with policy until we have
> some mechanism in place.)
>
> I think that what we need to do is get rid of the assumption that a "cached
> plan" normally includes a plan per se.  The initial creation of the cache
> entry should just supply a raw query plus its analyzed-and-rewritten form.
> (plancache.c can actually operate that way today, via its "not fully
> planned" flag, but it's a wart rather than the normal philosophy.)  Then
> RevalidateCachedPlan should be replaced by something with the semantics
> of "get me a plan to use, and here's the parameter values I'm going to use
> it with".  The choice between using a pre-cached generic plan and building
> a one-off plan would then be localized in this new function.

This seems like a good design.  Now what would be really cool is if
you could observe a stream of queries like this:

SELECT a, b FROM foo WHERE c = 123
SELECT a, b FROM foo WHERE c = 97
SELECT a, b FROM foo WHERE c = 236

...and say, hey, I could just make a generic plan and use it every
time I see one of these.  It's not too clear to me how you'd make
recognition of such queries cheap enough to be practical, but maybe
someone will think of a way...

> There are not that many places that call plancache.c directly, and so this
> change in API won't cause much code churn --- but one place that does
> depend on this is spi.c, and there is *lots* of both core and third-party
> code that calls SPI_prepare for example.  So we need to tread carefully in
> redefining SPI's behavior.
>
> The most straightforward way to reimplement things within spi.c would be
> to redefine SPI_prepare as just doing the parse-and-rewrite steps, with
> planning always postponed to SPI_execute.  In the case where you just
> prepare and then execute a SPIPlan, this would come out the same or
> better, since we'd still just do one planning cycle, but the planner could
> be given the actual parameter values to use.  However, if you SPI_prepare,
> SPI_saveplan, and then SPI_execute many times, you might come out behind.
> This is of course the same tradeoff we are going to impose at the SQL level
> anyway, but I wonder whether there needs to be a control knob available to
> C code to retain the old plan-once-and-always-use-that-plan approach.
>
> Anyone have an opinion about that?  And if we do need to expose some
> control, should the default (if you don't change your source code) be that
> you still get the old behavior, or that you get the new behavior?  I'm
> inclined to think that if we believe this'll be a win at the SQL level,
> it should be a win at the SPI-caller level too, but maybe someone thinks
> otherwise.

I am not sure about this one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Transient plans versus the SPI API

От
Simon Riggs
Дата:
On Tue, Aug 2, 2011 at 9:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The most straightforward way to reimplement things within spi.c would be
> to redefine SPI_prepare as just doing the parse-and-rewrite steps, with
> planning always postponed to SPI_execute.  In the case where you just
> prepare and then execute a SPIPlan, this would come out the same or
> better, since we'd still just do one planning cycle, but the planner could
> be given the actual parameter values to use.  However, if you SPI_prepare,
> SPI_saveplan, and then SPI_execute many times, you might come out behind.
> This is of course the same tradeoff we are going to impose at the SQL level
> anyway, but I wonder whether there needs to be a control knob available to
> C code to retain the old plan-once-and-always-use-that-plan approach.

The problems only occur <1% of the time, so this penalises everyone to
avoid real but rare problems.

This will cause a massive loss of performance in most apps, though I
understand the annoyance and why you make the suggestion.

http://www.db2ude.com/?q=node/73 for some more background on how this
is handled elsewhere
Control knob == hint, so I've avoided suggesting such an approach myself.

I think its possible to tell automatically whether we need to replan
always or not based upon the path we take through selectivity
functions.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Transient plans versus the SPI API

От
Dimitri Fontaine
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Anyone have an opinion about that?

I still have this application where PREPARE takes between 50ms and 300ms
and EXECUTE 5ms to 10ms, and I can handle 1 PREPARE for 10000 EXECUTE
quite easily.  (Yes the database fits in RAM, and yes when that's no
longer the case we just upgrade the hardware)

What does your proposal mean for such a use case?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: Transient plans versus the SPI API

От
Tom Lane
Дата:
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Anyone have an opinion about that?

> I still have this application where PREPARE takes between 50ms and 300ms
> and EXECUTE 5ms to 10ms, and I can handle 1 PREPARE for 10000 EXECUTE
> quite easily.  (Yes the database fits in RAM, and yes when that's no
> longer the case we just upgrade the hardware)

> What does your proposal mean for such a use case?

Well, the policy for when to replan or not remains to be worked out in
detail, but what is likely to happen for such cases is that we'll waste
a few planning cycles before determining that there's no benefit in a
custom plan.  So, using the worst-case ends of your ranges above and
assuming that "a few" means "10", we'd go from 300 + 5 * 10000 = 50300
ms to execute the query 10000 times, to 10 * 300 + 5 * 10000 = 53000 ms.
So yes, it'd get a little worse for that use-case.  But you have to
weigh that against the likelihood that other use-cases will get better.
If our requirement for a transient-plan mechanism is that no individual
case can ever be worse than before, then we might as well abandon the
entire project right now, because the only way to meet that requirement
is to change nothing.

Of course we could address the worst cases by providing some mechanism
to tell the plancache code "always use a generic plan for this query"
or "always use a custom plan".  I'm not entirely thrilled with that,
because it's effectively a planner hint and has got the same problems
as all planner hints, namely that users are likely to get it wrong.
But it would be relatively painless to supply such a hint at the SPI
level, which is why I asked whether we should.  It'd be much harder to
do something equivalent at higher levels, which is why I'm not that
eager to do it for SPI.
        regards, tom lane


Re: Transient plans versus the SPI API

От
Robert Haas
Дата:
On Wed, Aug 3, 2011 at 12:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> writes:
>>> Anyone have an opinion about that?
>
>> I still have this application where PREPARE takes between 50ms and 300ms
>> and EXECUTE 5ms to 10ms, and I can handle 1 PREPARE for 10000 EXECUTE
>> quite easily.  (Yes the database fits in RAM, and yes when that's no
>> longer the case we just upgrade the hardware)
>
>> What does your proposal mean for such a use case?
>
> Well, the policy for when to replan or not remains to be worked out in
> detail, but what is likely to happen for such cases is that we'll waste
> a few planning cycles before determining that there's no benefit in a
> custom plan.  So, using the worst-case ends of your ranges above and
> assuming that "a few" means "10", we'd go from 300 + 5 * 10000 = 50300
> ms to execute the query 10000 times, to 10 * 300 + 5 * 10000 = 53000 ms.

A little OT here, but (as I think Simon said elsewhere) I think we
really ought to be considering the table statistics when deciding
whether or not to replan.  It seems to me that the overwhelmingly
common case where this is going to come up is when (some subset of)
the MCVs require a different plan than run-of-the-mill values.  It
would be nice to somehow work that out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Transient plans versus the SPI API

От
Dimitri Fontaine
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> So yes, it'd get a little worse for that use-case.  But you have to
> weigh that against the likelihood that other use-cases will get better.
> If our requirement for a transient-plan mechanism is that no individual
> case can ever be worse than before, then we might as well abandon the
> entire project right now, because the only way to meet that requirement
> is to change nothing.

That is not were I wanted to drift.  It's just that I don't have as much
time as I would like to those days, and so it helps me a lot seeing a
worked out example rather than make sure I parse your proposal
correctly.  Thanks a lot for your answer, I have a very clear
confirmation on how I read your previous email.

I will have to do some testing, but it could well be that this
application will benefit from locking reductions enough that it buys
this effect back.

> Of course we could address the worst cases by providing some mechanism
> to tell the plancache code "always use a generic plan for this query"
> or "always use a custom plan".  I'm not entirely thrilled with that,
> because it's effectively a planner hint and has got the same problems
> as all planner hints, namely that users are likely to get it wrong.

Yeah.

> But it would be relatively painless to supply such a hint at the SPI
> level, which is why I asked whether we should.  It'd be much harder to
> do something equivalent at higher levels, which is why I'm not that
> eager to do it for SPI.

Given the SLA of those prepared queries in my case, I think I could
accept to have to switch from SQL statements to C coded SRF to guarantee
the planning behavior.  It will not make the upgrade cheaper, but I
realize it's a very narrow and specific use case.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: Transient plans versus the SPI API

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> This seems like a good design.  Now what would be really cool is if
> you could observe a stream of queries like this:

> SELECT a, b FROM foo WHERE c = 123
> SELECT a, b FROM foo WHERE c = 97
> SELECT a, b FROM foo WHERE c = 236

> ...and say, hey, I could just make a generic plan and use it every
> time I see one of these.  It's not too clear to me how you'd make
> recognition of such queries cheap enough to be practical, but maybe
> someone will think of a way...

Hm, you mean reverse-engineering the parameterization of the query?
Interesting thought, but I really don't see a way to make it practical.

In any case, it would amount to making up for a bad decision on the
application side, ie, not transmitting the query in the parameterized
form that presumably exists somewhere in the application.  I think
we'd be better served all around by encouraging app developers to rely
more heavily on parameterized queries ... but first we have to fix the
performance risks there.
        regards, tom lane


Re: Transient plans versus the SPI API

От
Robert Haas
Дата:
On Wed, Aug 3, 2011 at 3:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> This seems like a good design.  Now what would be really cool is if
>> you could observe a stream of queries like this:
>
>> SELECT a, b FROM foo WHERE c = 123
>> SELECT a, b FROM foo WHERE c = 97
>> SELECT a, b FROM foo WHERE c = 236
>
>> ...and say, hey, I could just make a generic plan and use it every
>> time I see one of these.  It's not too clear to me how you'd make
>> recognition of such queries cheap enough to be practical, but maybe
>> someone will think of a way...
>
> Hm, you mean reverse-engineering the parameterization of the query?
> Interesting thought, but I really don't see a way to make it practical.
>
> In any case, it would amount to making up for a bad decision on the
> application side, ie, not transmitting the query in the parameterized
> form that presumably exists somewhere in the application.  I think
> we'd be better served all around by encouraging app developers to rely
> more heavily on parameterized queries ... but first we have to fix the
> performance risks there.

Fair enough.  I have to admit I'm afraid of them right now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Transient plans versus the SPI API

От
Tom Lane
Дата:
Simon Riggs <simon@2ndQuadrant.com> writes:
> I think its possible to tell automatically whether we need to replan
> always or not based upon the path we take through selectivity
> functions.

I don't really believe that, or at least I think it would only detect a
few cases.  Examples of parameter-value-sensitive decisions that are
made nowhere near the selectivity functions are constraint exclusion and
LIKE pattern to index-qual conversion.  And in none of these cases do we
really know at the bottom level whether a different parameter value will
lead to a significant change in the finished plan.  For instance, if
there's no index for column foo, it is a waste of time to force
replanning just because we have varying selectivity estimates for
"WHERE foo > $1".

I think we'll be a lot better off with the framework discussed last
year: build a generic plan, as well as custom plans for the first few
sets of parameter values, and then observe whether there's a significant
reduction in estimated costs for the custom plans.

But in any case, it's way premature to be debating this until we have
the infrastructure in which we can experiment with different policies.
        regards, tom lane


Re: Transient plans versus the SPI API

От
Simon Riggs
Дата:
On Wed, Aug 3, 2011 at 8:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> I think its possible to tell automatically whether we need to replan
>> always or not based upon the path we take through selectivity
>> functions.
>
> I don't really believe that, or at least I think it would only detect a
> few cases.  Examples of parameter-value-sensitive decisions that are
> made nowhere near the selectivity functions are constraint exclusion and
> LIKE pattern to index-qual conversion.  And in none of these cases do we
> really know at the bottom level whether a different parameter value will
> lead to a significant change in the finished plan.  For instance, if
> there's no index for column foo, it is a waste of time to force
> replanning just because we have varying selectivity estimates for
> "WHERE foo > $1".
>
> I think we'll be a lot better off with the framework discussed last
> year: build a generic plan, as well as custom plans for the first few
> sets of parameter values, and then observe whether there's a significant
> reduction in estimated costs for the custom plans.

The problem there is which executions we build custom plans for. That
turns the problem into a sampling issue and you'll only fix the
problems that occur with a frequency to match your sampling pattern
and rate. Examples of situations where it won't help.

* plans that vary by table size will be about the same in the first 5
executions. After large number of executions, things go bad.

* text search using parameter is provided by user input - sensible
requests have low selectivities; some users put in <space> or "e" and
then we try to retrieve whole table by index scan. Almost impossible
to prevent all potentially high selectivity inputs from user. We could
add LIMIT but frequently ORM generated queries do not do that.

This isn't my-way-or-your-way - I think we need to look at some form
of "safety barriers" so we generate a plan but also know when the plan
has outlived its usefulness and force a re-plan.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Transient plans versus the SPI API

От
Yeb Havinga
Дата:
On 2011-08-03 21:19, Tom Lane wrote:
> Robert Haas<robertmhaas@gmail.com>  writes:
>> This seems like a good design.  Now what would be really cool is if
>> you could observe a stream of queries like this:
>> SELECT a, b FROM foo WHERE c = 123
>> SELECT a, b FROM foo WHERE c = 97
>> SELECT a, b FROM foo WHERE c = 236
>> ...and say, hey, I could just make a generic plan and use it every
>> time I see one of these.  It's not too clear to me how you'd make
>> recognition of such queries cheap enough to be practical, but maybe
>> someone will think of a way...
> Hm, you mean reverse-engineering the parameterization of the query?
> Interesting thought, but I really don't see a way to make it practical.

See also http://archives.postgresql.org/pgsql-hackers/2010-11/msg00617.php

I don't know if any implementation can be practical - maybe the parser 
could be coerced into emitting some kind of number that's based on 
everything in the query, except constants (and whitespace), so it would 
be the same for all the queries Robert described. That could be low cost 
enough to detect of for a query's id a cached plan exists and do more 
work only in those cases.

-- 
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data



Re: Transient plans versus the SPI API

От
Peter Eisentraut
Дата:
On tis, 2011-08-02 at 16:47 -0400, Tom Lane wrote:
> The most straightforward way to reimplement things within spi.c would
> be to redefine SPI_prepare as just doing the parse-and-rewrite steps,
> with planning always postponed to SPI_execute.  In the case where you
> just prepare and then execute a SPIPlan, this would come out the same
> or better, since we'd still just do one planning cycle, but the
> planner could be given the actual parameter values to use.  However,
> if you SPI_prepare, SPI_saveplan, and then SPI_execute many times, you
> might come out behind.  This is of course the same tradeoff we are
> going to impose at the SQL level anyway, but I wonder whether there
> needs to be a control knob available to C code to retain the old
> plan-once-and-always-use-that-plan approach. 

How about a new function like SPI_parse that has the new semantics?

Note that the SPI functions are more or less directly exposed in PL/Perl
and PL/Python, and there are a number of existing idioms there that make
use of prepared plans.  Changing the semantics of those functions might
upset a lot of code.



Re: Transient plans versus the SPI API

От
Tom Lane
Дата:
Peter Eisentraut <peter_e@gmx.net> writes:
> How about a new function like SPI_parse that has the new semantics?

Yeah, I'd considered that idea (and even exactly that name for it).
Howver, the disadvantage of inventing a separate entry point is that
it isn't going to be nice for multi-level call chains, of which there
are several inside the core code and probably plenty elsewhere.  The
bottom level would have to do something like
if (new-behavior-wanted)    SPI_parse(args...);else    SPI_prepare(args...);

and then invent some way for its callers to signal new-behavior-wanted,
and it won't be pretty if they all pick different ways to do that.

Plus we've already got SPI_prepare_cursor and SPI_prepare_params, each
of which would need a matching SPI_parse_foo entry point.

So if we want a knob here, I think that the sanest way to install it is
to add a couple more flag bits to the existing "int cursorOptions"
bitmask arguments of the latter two functions, perhaps
CURSOR_OPT_USE_GENERIC_PLANCURSOR_OPT_USE_CUSTOM_PLAN

to force generic-plan-always or custom-plan-always respectively.
(The "cursor" naming of those flag bits is starting to look a bit
unfortunate, but I'm not inclined to rename them now.)

If we set it up like that, then the default behavior with flags == 0
would be to use the heuristic plan-selection approach, and presumably
that is what you would also get from SPI_prepare (which is both coded
and documented as matching SPI_prepare_cursor with flags == 0).

So the question is whether it's okay to change the default behavior...

> Note that the SPI functions are more or less directly exposed in PL/Perl
> and PL/Python, and there are a number of existing idioms there that make
> use of prepared plans.  Changing the semantics of those functions might
> upset a lot of code.

Right, but by the same token, if we don't change the default behavior,
there is going to be a heck of a lot of code requiring manual adjustment
before it can make use of the (hoped-to-be) improvements.  To me it
makes more sense to change the default and then provide ways for people
to lock down the behavior if the heuristic doesn't work for them.
        regards, tom lane


Re: Transient plans versus the SPI API

От
Jeff Davis
Дата:
On Tue, 2011-08-02 at 16:47 -0400, Tom Lane wrote:
> The most straightforward way to reimplement things within spi.c would be
> to redefine SPI_prepare as just doing the parse-and-rewrite steps, with
> planning always postponed to SPI_execute.  In the case where you just
> prepare and then execute a SPIPlan, this would come out the same or
> better, since we'd still just do one planning cycle, but the planner could
> be given the actual parameter values to use.  However, if you SPI_prepare,
> SPI_saveplan, and then SPI_execute many times, you might come out behind.
> This is of course the same tradeoff we are going to impose at the SQL level
> anyway, but I wonder whether there needs to be a control knob available to
> C code to retain the old plan-once-and-always-use-that-plan approach.

Would there ultimately be a difference between the way SPI_prepare and
PQprepare work? It seems like the needs would be about the same, so I
think we should be consistent.

Also, I assume that SPI_execute and PQexecParams would always force a
custom plan, just like always, right?

A control knob sounds limited. For instance, what if the application
knows that some parameters will be constant over the time that the plan
is saved? It would be nice to be able to bind some parameters to come up
with a generic (but less generic) plan, and then execute it many times.
Right now that can only be done by inlining such constants in the SQL,
which is what we want to avoid.

I'm a little bothered by "prepare" sometimes planning and sometimes not
(and, by implication, "execute_plan" sometimes planning and sometimes
not). It seems cleaner to just separate the steps into parse+rewrite,
bind parameters, plan (with whatever parameters are present, giving a
more generic plan when some aren't specified), and execute (which would
require you to specify any parameters not bound yet). Maybe we don't
need to expose all of those steps (although maybe we do), but it would
be nice if the API we do offer resembles those steps.

Regards,Jeff Davis



Re: Transient plans versus the SPI API

От
Jeff Davis
Дата:
On Wed, 2011-08-03 at 12:19 -0400, Tom Lane wrote:
> Of course we could address the worst cases by providing some mechanism
> to tell the plancache code "always use a generic plan for this query"
> or "always use a custom plan".  I'm not entirely thrilled with that,
> because it's effectively a planner hint and has got the same problems
> as all planner hints, namely that users are likely to get it wrong.

I'm not entirely convinced by that. It's fairly challenging for a human
to choose a good plan for a moderately complex SQL query, and its much
more likely that the plan will become a bad one over time. But, in many
cases, a developer knows if they simply don't care about planning time,
and are willing to always replan.

Also, we have a fairly reasonable model for planning SQL queries, but
I'm not sure that the model for determining whether to replan a SQL
query is quite as clear. Simon brought up some useful points along these
lines.

Regards,Jeff Davis



Re: Transient plans versus the SPI API

От
Jeff Davis
Дата:
On Wed, 2011-08-03 at 13:07 -0400, Robert Haas wrote:
> A little OT here, but (as I think Simon said elsewhere) I think we
> really ought to be considering the table statistics when deciding
> whether or not to replan.  It seems to me that the overwhelmingly
> common case where this is going to come up is when (some subset of)
> the MCVs require a different plan than run-of-the-mill values.  It
> would be nice to somehow work that out.

That blurs the line a little bit. It sounds like this might be described
as "incremental planning", and perhaps that's a good way to think about
it.

Regards,Jeff Davis



Re: Transient plans versus the SPI API

От
Dimitri Fontaine
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> I think we'll be a lot better off with the framework discussed last
> year: build a generic plan, as well as custom plans for the first few
> sets of parameter values, and then observe whether there's a significant
> reduction in estimated costs for the custom plans.

Another way here would be to cache more than a single plan and to keep
execution time samples or some other relevant runtime characteristics.
Then what we need would be a way to switch from a plan to another at run
time on some conditions, like realizing that the reason why the planner
thought a nestloop would be perfect is obviously wrong, or maybe just
based on runtime characteristics.

> But in any case, it's way premature to be debating this until we have
> the infrastructure in which we can experiment with different policies.

That too.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: Transient plans versus the SPI API

От
Dimitri Fontaine
Дата:
Jeff Davis <pgsql@j-davis.com> writes:
> A control knob sounds limited. For instance, what if the application
> knows that some parameters will be constant over the time that the plan
> is saved? It would be nice to be able to bind some parameters to come up
> with a generic (but less generic) plan, and then execute it many times.
> Right now that can only be done by inlining such constants in the SQL,
> which is what we want to avoid.

+1

I was already thinking in those term at the application level for the
example I've been using before in this thread, and only reading your
mail I realize that maybe the backend should be able to do that itself.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: Transient plans versus the SPI API

От
Hannu Krosing
Дата:
On Wed, 2011-08-03 at 15:19 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > This seems like a good design.  Now what would be really cool is if
> > you could observe a stream of queries like this:
> 
> > SELECT a, b FROM foo WHERE c = 123
> > SELECT a, b FROM foo WHERE c = 97
> > SELECT a, b FROM foo WHERE c = 236
> 
> > ...and say, hey, I could just make a generic plan and use it every
> > time I see one of these.  It's not too clear to me how you'd make
> > recognition of such queries cheap enough to be practical, but maybe
> > someone will think of a way...
> 
> Hm, you mean reverse-engineering the parameterization of the query?

Yes, basically re-generate the query after (or while) parsing, replacing
constants and arguments with another set of generated arguments and
printing the list of these arguments at the end. It may be easiest to do
This in parallel with parsing.

> Interesting thought, but I really don't see a way to make it practical.

Another place where this could be really useful is logging & monitoring

If there were an option to log the above queries as 

"SELECT a, b FROM foo WHERE c = $1", (123)
"SELECT a, b FROM foo WHERE c = $1", (97)
"SELECT a, b FROM foo WHERE c = $1", (236)

it would make all kinds of general performance monitoring tasks also
much easier, not to mention that this forw would actually be something
that kan be cached internally.

For some users this might even be worth to use this feature alone,
without it providing Repeating Plan Recognition.

> In any case, it would amount to making up for a bad decision on the
> application side, ie, not transmitting the query in the parameterized
> form that presumably exists somewhere in the application.  I think
> we'd be better served all around by encouraging app developers to rely
> more heavily on parameterized queries ... but first we have to fix the
> performance risks there.
> 
>             regards, tom lane
> 




Re: Transient plans versus the SPI API

От
Hannu Krosing
Дата:
On Sun, 2011-08-07 at 11:15 +0200, Hannu Krosing wrote:
> On Wed, 2011-08-03 at 15:19 -0400, Tom Lane wrote:

> > Hm, you mean reverse-engineering the parameterization of the query?
> 
> Yes, basically re-generate the query after (or while) parsing, replacing
> constants and arguments with another set of generated arguments and
> printing the list of these arguments at the end. It may be easiest to do
> This in parallel with parsing.
> 
> > Interesting thought, but I really don't see a way to make it practical.
> 
> Another place where this could be really useful is logging & monitoring
> 
> If there were an option to log the above queries as 
> 
> "SELECT a, b FROM foo WHERE c = $1", (123)
> "SELECT a, b FROM foo WHERE c = $1", (97)
> "SELECT a, b FROM foo WHERE c = $1", (236)

The main monitoring use_case would be pg_stat_statements,
http://developer.postgresql.org/pgdocs/postgres/pgstatstatements.html
which is currently pretty useless for queries without parameters 

> it would make all kinds of general performance monitoring tasks also
> much easier, not to mention that this forw would actually be something
> that kan be cached internally.
> 
> For some users this might even be worth to use this feature alone,
> without it providing Repeating Plan Recognition.
> 
> > In any case, it would amount to making up for a bad decision on the
> > application side, ie, not transmitting the query in the parameterized
> > form that presumably exists somewhere in the application.  I think
> > we'd be better served all around by encouraging app developers to rely
> > more heavily on parameterized queries ... but first we have to fix the
> > performance risks there.
> > 
> >             regards, tom lane
> > 
> 
> 
> 




Re: Transient plans versus the SPI API

От
Simon Riggs
Дата:
On Sat, Aug 6, 2011 at 7:29 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I think we'll be a lot better off with the framework discussed last
>> year: build a generic plan, as well as custom plans for the first few
>> sets of parameter values, and then observe whether there's a significant
>> reduction in estimated costs for the custom plans.
>
> Another way here would be to cache more than a single plan and to keep
> execution time samples or some other relevant runtime characteristics.
> Then what we need would be a way to switch from a plan to another at run
> time on some conditions, like realizing that the reason why the planner
> thought a nestloop would be perfect is obviously wrong, or maybe just
> based on runtime characteristics.

Tom and I discussed storing multiple sub-plans on a node back in '05
IIRC, and Tom later put in support for that.

That wasn't followed up on.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Transient plans versus the SPI API

От
Anssi Kääriäinen
Дата:
On 08/07/2011 12:25 PM, Hannu Krosing wrote:
> On Sun, 2011-08-07 at 11:15 +0200, Hannu Krosing wrote:
>> On Wed, 2011-08-03 at 15:19 -0400, Tom Lane wrote:
>>> Hm, you mean reverse-engineering the parameterization of the query?
>> Yes, basically re-generate the query after (or while) parsing, replacing
>> constants and arguments with another set of generated arguments and
>> printing the list of these arguments at the end. It may be easiest to do
>> This in parallel with parsing.
>>
>>> Interesting thought, but I really don't see a way to make it practical.
>> Another place where this could be really useful is logging&  monitoring
>>
>> If there were an option to log the above queries as
>>
>> "SELECT a, b FROM foo WHERE c = $1", (123)
>> "SELECT a, b FROM foo WHERE c = $1", (97)
>> "SELECT a, b FROM foo WHERE c = $1", (236)
> The main monitoring use_case would be pg_stat_statements,
> http://developer.postgresql.org/pgdocs/postgres/pgstatstatements.html
> which is currently pretty useless for queries without parameters

I was trying to implement something similar for pgpool-II. The user 
could configure queries for which cached plans are wanted. The 
configuration would have been a file containing lines in format "SELECT 
* FROM foo WHERE id = ?". I did not get anything implemented, as there 
were some problems. The problems were mainly with DEALLOCATE ALL called 
without pgpool-II knowing it, issues with search_path and the amount of 
work needed to implement parse tree matching.

It would be interesting if pg_stat_statements would be globally 
available with queries using generic arguments. First, there would be an 
obvious heuristic for when to cache the plan: If the average runtime of 
the query is much larger than the average planning time, there is no 
point in caching the plan. This would also give one option for cache hit 
estimation. The hit_percent is directly available. On the other hand 
pg_stat_statements could easily become a choke-point.

I would love to work on this, but I lack the needed skills. Maybe I 
could take another try for writing a proof-of-concept parse tree 
transformer and matcher, but I doubt I can produce anything useful.
 - Anssi


Re: Transient plans versus the SPI API

От
Hannu Krosing
Дата:
On Mon, 2011-08-08 at 11:39 +0300, Anssi Kääriäinen wrote:
> On 08/07/2011 12:25 PM, Hannu Krosing wrote:
> > On Sun, 2011-08-07 at 11:15 +0200, Hannu Krosing wrote:
> >> On Wed, 2011-08-03 at 15:19 -0400, Tom Lane wrote:
> >>> Hm, you mean reverse-engineering the parameterization of the query?
> >> Yes, basically re-generate the query after (or while) parsing, replacing
> >> constants and arguments with another set of generated arguments and
> >> printing the list of these arguments at the end. It may be easiest to do
> >> This in parallel with parsing.
> >>
> >>> Interesting thought, but I really don't see a way to make it practical.
> >> Another place where this could be really useful is logging&  monitoring
> >>
> >> If there were an option to log the above queries as
> >>
> >> "SELECT a, b FROM foo WHERE c = $1", (123)
> >> "SELECT a, b FROM foo WHERE c = $1", (97)
> >> "SELECT a, b FROM foo WHERE c = $1", (236)
> > The main monitoring use_case would be pg_stat_statements,
> > http://developer.postgresql.org/pgdocs/postgres/pgstatstatements.html
> > which is currently pretty useless for queries without parameters
>
> I was trying to implement something similar for pgpool-II. The user
> could configure queries for which cached plans are wanted. The
> configuration would have been a file containing lines in format "SELECT
> * FROM foo WHERE id = ?". I did not get anything implemented, as there
> were some problems. The problems were mainly with DEALLOCATE ALL called
> without pgpool-II knowing it, issues with search_path and the amount of
> work needed to implement parse tree matching.
>
> It would be interesting if pg_stat_statements would be globally
> available with queries using generic arguments. First, there would be an
> obvious heuristic for when to cache the plan: If the average runtime of
> the query is much larger than the average planning time, there is no
> point in caching the plan. This would also give one option for cache hit
> estimation. The hit_percent is directly available. On the other hand
> pg_stat_statements could easily become a choke-point.
>
> I would love to work on this, but I lack the needed skills. Maybe I
> could take another try for writing a proof-of-concept parse tree
> transformer and matcher, but I doubt I can produce anything useful.

That is why I think it is best done in the main parser - it has to parse
and analyse the query anyway and likely knows which constants are
"arguments" to the query.

If doing it outside the main backend parser, it would be best to still
use the postgreSQL lex/bison files as much as possible for this.


--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/



Re: Transient plans versus the SPI API

От
Anssi Kääriäinen
Дата:
On 08/08/2011 01:07 PM, Hannu Krosing wrote:
> That is why I think it is best done in the main parser - it has to parse
> and analyse the query anyway and likely knows which constants are
> "arguments" to the query
As far as I understand the problem, the parsing must transform table 
references to schema-qualified references. The table_foobar in "SELECT * 
FROM table_foobar WHERE id = ?" is not enough to identify a table. Using 
search_path, query_str as a key is one possibility, but the search_path 
is likely to be different for each user, and this could result in the 
same query being cached multiple times.

By the way, I checked current Git HEAD and pg_stat_statements seems to 
not handle search_path correctly. For pg_stat_statements this is not 
critical, but if the raw query string is used as plan cache key things 
will obviously break...
 - Anssi


Re: Transient plans versus the SPI API

От
Dimitri Fontaine
Дата:
Hannu Krosing <hannu@krosing.net> writes:
>> Hm, you mean reverse-engineering the parameterization of the query?
>
> Yes, basically re-generate the query after (or while) parsing, replacing
> constants and arguments with another set of generated arguments and
> printing the list of these arguments at the end. It may be easiest to do
> This in parallel with parsing.
>
>> Interesting thought, but I really don't see a way to make it practical.
>
> Another place where this could be really useful is logging & monitoring

Another big use case is full support for materialized views: we could
then optimize a query to automatically use a matview even when written
against the “usual” schema.  Now matviews are another kind of indexes.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: Transient plans versus the SPI API

От
Bruce Momjian
Дата:
Tom Lane wrote:
> > Note that the SPI functions are more or less directly exposed in PL/Perl
> > and PL/Python, and there are a number of existing idioms there that make
> > use of prepared plans.  Changing the semantics of those functions might
> > upset a lot of code.
> 
> Right, but by the same token, if we don't change the default behavior,
> there is going to be a heck of a lot of code requiring manual adjustment
> before it can make use of the (hoped-to-be) improvements.  To me it
> makes more sense to change the default and then provide ways for people
> to lock down the behavior if the heuristic doesn't work for them.

Agreed.  I think the big sticking point is that without logic on how the
replanning will happen, users are having to guess how much impact this
new default behavior will have.  I also agree that this will harm some
uses but improve a larger pool of users.  Remember, the people on this
email list are probably using this feature in a much more sophisticated
way than the average user.

Also, there is a TODO idea that the results found by executing the query
(e.g. number of rows returned at each stage) could be fed back and
affect the replanning of queries.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Transient plans versus the SPI API

От
Tom Lane
Дата:
[ getting back to the planner finally ]

Simon Riggs <simon@2ndQuadrant.com> writes:
> On Wed, Aug 3, 2011 at 8:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> I think its possible to tell automatically whether we need to replan
>>> always or not based upon the path we take through selectivity
>>> functions.

>> I don't really believe that, or at least I think it would only detect a
>> few cases.

> The problem there is which executions we build custom plans for. That
> turns the problem into a sampling issue and you'll only fix the
> problems that occur with a frequency to match your sampling pattern
> and rate. Examples of situations where it won't help.

Sure, this is not going to solve every problem we have with the
planner.  What it is intended to solve is cases where someone is trying
to use the prepared-plan mechanisms but he would be a lot better off
with parameter-value-specific plans.  In particular:

> * plans that vary by table size will be about the same in the first 5
> executions. After large number of executions, things go bad.

This is a red herring.  The plancache code already arranges to replan
every time the relevant table stats are updated by autovacuum, which
should certainly happen from time to time if the table's contents are
changing materially.  If you're thinking in terms of plans being "stale"
then you're worrying about a long-since-solved problem.
        regards, tom lane


Re: Transient plans versus the SPI API

От
Simon Riggs
Дата:
On Fri, Aug 19, 2011 at 6:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ getting back to the planner finally ]
>
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On Wed, Aug 3, 2011 at 8:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>>> I think its possible to tell automatically whether we need to replan
>>>> always or not based upon the path we take through selectivity
>>>> functions.
>
>>> I don't really believe that, or at least I think it would only detect a
>>> few cases.
>
>> The problem there is which executions we build custom plans for. That
>> turns the problem into a sampling issue and you'll only fix the
>> problems that occur with a frequency to match your sampling pattern
>> and rate. Examples of situations where it won't help.
>
> Sure, this is not going to solve every problem we have with the
> planner.  What it is intended to solve is cases where someone is trying
> to use the prepared-plan mechanisms but he would be a lot better off
> with parameter-value-specific plans.  In particular:

I just realised this is exactly the same strategy as the
no-longer-used JDBC parameter prepareThreshold.

If we treat this in a similar way. prepare_threshold currently = 0 and
you are suggesting we move the value to 5. OK.

Will this be an actual parameter? If so, it removes my objection
because I can turn it off. What would be even better would be some
other controls, like a plugin that allows us to control the mechanism
or at least experiment with it.

Maybe we can assemble enough evidence to remove it before release.

I've been arguing it won't solve all problems. It won't. But if it
solves some, so its worth having.


On another point, I'd still like a "one-shot plan" flag, so that we
can act on that knowledge and have various pieces of code take
decisions that override the plan cache. i.e. if the plan screws up
during execution we can mark the plan as a one shot so it isn't
reused.


>> * plans that vary by table size will be about the same in the first 5
>> executions. After large number of executions, things go bad.
>
> This is a red herring.  The plancache code already arranges to replan
> every time the relevant table stats are updated by autovacuum, which
> should certainly happen from time to time if the table's contents are
> changing materially.  If you're thinking in terms of plans being "stale"
> then you're worrying about a long-since-solved problem.

Fair enough.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services