Обсуждение: Caching query plan costs

Поиск
Список
Период
Сортировка

Caching query plan costs

От
Bruce Momjian
Дата:
This email thread from last year:

    https://www.postgresql.org/message-id/flat/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw%40mail.gmail.com

got stuck on when to check for certain optimizations in the planner.  It
was discussed that knowing a plan was going to be expensive _before_
planning could help determine when to try certain expensive
optimizations.

What if we globally or locally cache the _cost_ of plans, so we can
consult that cache before planning and enable certain optimizations? It
also might be useful to cache some of the unexpected things found during
execution, e.g. inaccurate cost estimates.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Caching query plan costs

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> What if we globally or locally cache the _cost_ of plans, so we can
> consult that cache before planning and enable certain optimizations?

But what would you use as cache key?  And how's this help if we haven't
seen a similar query before in the session?

            regards, tom lane


Re: Caching query plan costs

От
Bruce Momjian
Дата:
On Mon, Sep  3, 2018 at 01:30:33PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > What if we globally or locally cache the _cost_ of plans, so we can
> > consult that cache before planning and enable certain optimizations?
> 
> But what would you use as cache key?  And how's this help if we haven't

Uh, I assume we would do what pg_stat_statements does and remove the
constants an hash that.

> seen a similar query before in the session?

Well, if it was global we could use output from another session.

I guess my point is that this only used to turn on micro-optimizations
and maybe parallelism and JIT, so it doesn't have to be 100% accurate.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Caching query plan costs

От
Andres Freund
Дата:

On September 3, 2018 11:33:35 AM PDT, Bruce Momjian <bruce@momjian.us> wrote:
>On Mon, Sep  3, 2018 at 01:30:33PM -0400, Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>> > What if we globally or locally cache the _cost_ of plans, so we can
>> > consult that cache before planning and enable certain
>optimizations?
>>
>> But what would you use as cache key?  And how's this help if we
>haven't
>
>Uh, I assume we would do what pg_stat_statements does and remove the
>constants an hash that.

That's not particularly cheap... Constants heavily influence planning choices, so I don't think they actually could be
removed.


>> seen a similar query before in the session?
>
>Well, if it was global we could use output from another session.
>
>I guess my point is that this only used to turn on micro-optimizations
>and maybe parallelism

What kind of micro opts are you thinking of? The cases I remember are more in the vein of doing additional complex
optimizations(join removal, transforming ORs into UNION, more complex analysis of predicates...). 

Parallelism would definitely benefit from earlier knowledge, although I suspect some base rel analysis might be more
realistic,because it's far from guaranteed that queries are ever repeated in a similar enough manner. 



>  and JIT, so it doesn't have to be 100% accurate.

JIT decision is done after main planning, so we know the cost.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Caching query plan costs

От
Bruce Momjian
Дата:
On Mon, Sep 3, 2018 at 11:42:31AM -0700, Andres Freund wrote:
> On September 3, 2018 11:33:35 AM PDT, Bruce Momjian <bruce@momjian.us>
> wrote:
> >On Mon, Sep 3, 2018 at 01:30:33PM -0400, Tom Lane wrote:
> >> Bruce Momjian <bruce@momjian.us> writes:
> >> > What if we globally or locally cache the _cost_ of plans, so we
> >> > can consult that cache before planning and enable certain
> >optimizations?
> >>
> >> But what would you use as cache key?  And how's this help if we
> >haven't
> >
> >Uh, I assume we would do what pg_stat_statements does and remove the
> >constants an hash that.
>
> That's not particularly cheap... Constants heavily influence planning
> choices, so I don't think they actually could be removed.

Oh.

> >> seen a similar query before in the session?
> >
> >Well, if it was global we could use output from another session.
> >
> >I guess my point is that this only used to turn on
> >micro-optimizations and maybe parallelism
>
> What kind of micro opts are you thinking of? The cases I remember
> are more in the vein of doing additional complex optimizations (join
> removal, transforming ORs into UNION, more complex analysis of
> predicates...).
>
> Parallelism would definitely benefit from earlier knowledge, although
> I suspect some base rel analysis might be more realistic, because it's
> far from guaranteed that queries are ever repeated in a similar enough
> manner.

Yes, no question that we would need something that could detect a
sufficient percentage of previous queries.

> >  and JIT, so it doesn't have to be 100% accurate.
>
> JIT decision is done after main planning, so we know the cost.

Well, as I remember, we are considering disabling JIT in PG 11 because
of the use of fixed costs to trigger it.  Could executor information
help decide to use JIT?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Caching query plan costs

От
Tomas Vondra
Дата:
On 09/03/2018 08:56 PM, Bruce Momjian wrote:
> On Mon, Sep 3, 2018 at 11:42:31AM -0700, Andres Freund wrote:
>> On September 3, 2018 11:33:35 AM PDT, Bruce Momjian <bruce@momjian.us>
>> wrote:
>>> On Mon, Sep 3, 2018 at 01:30:33PM -0400, Tom Lane wrote:
>>>> Bruce Momjian <bruce@momjian.us> writes:
>>>>> What if we globally or locally cache the _cost_ of plans, so we
>>>>> can consult that cache before planning and enable certain
>>> optimizations?
>>>>
>>>> But what would you use as cache key?  And how's this help if we
>>> haven't
>>>
>>> Uh, I assume we would do what pg_stat_statements does and remove the
>>> constants an hash that.
>>
>> That's not particularly cheap... Constants heavily influence planning
>> choices, so I don't think they actually could be removed.
> 
> Oh.
> 

Yeah, it doesn't really tell you the cost for the plan, because a single
query string may use vastly different plans for different constants.
Which pretty much is why we have pg_stat_plans.

Imagine a query that matches 99% of the table for one value and 1% for
another one. That's going to produce rather different plans for each
(say, seqscan vs. index scan), with very different costs.

>>>> seen a similar query before in the session?
>>>
>>> Well, if it was global we could use output from another session.
>>>
>>> I guess my point is that this only used to turn on
>>> micro-optimizations and maybe parallelism
>>
>> What kind of micro opts are you thinking of? The cases I remember
>> are more in the vein of doing additional complex optimizations (join
>> removal, transforming ORs into UNION, more complex analysis of
>> predicates...).
>>
>> Parallelism would definitely benefit from earlier knowledge, although
>> I suspect some base rel analysis might be more realistic, because it's
>> far from guaranteed that queries are ever repeated in a similar enough
>> manner.
> 
> Yes, no question that we would need something that could detect a
> sufficient percentage of previous queries.
> 
>>>  and JIT, so it doesn't have to be 100% accurate.
>>
>> JIT decision is done after main planning, so we know the cost.
> 
> Well, as I remember, we are considering disabling JIT in PG 11 because
> of the use of fixed costs to trigger it.  Could executor information
> help decide to use JIT?
> 

Isn't that issue more about what is the right default threshold, rather
than using fixed costs in principle?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Caching query plan costs

От
Andres Freund
Дата:
On 2018-09-03 14:56:28 -0400, Bruce Momjian wrote:
> On Mon, Sep 3, 2018 at 11:42:31AM -0700, Andres Freund wrote:
> > >  and JIT, so it doesn't have to be 100% accurate.
> >
> > JIT decision is done after main planning, so we know the cost.
> 
> Well, as I remember, we are considering disabling JIT in PG 11 because
> of the use of fixed costs to trigger it.  Could executor information
> help decide to use JIT?

I don't think so. The issues with JIT planning are more that it's
costing is simplistic (for good-ish reason, to avoid increasing the
number of plans), and that there's no caching (lots of infrastructure
work needed).

Greetings,

Andres Freund


Re: Caching query plan costs

От
Bruce Momjian
Дата:
On Mon, Sep  3, 2018 at 02:53:59PM -0700, Andres Freund wrote:
> On 2018-09-03 14:56:28 -0400, Bruce Momjian wrote:
> > On Mon, Sep 3, 2018 at 11:42:31AM -0700, Andres Freund wrote:
> > > >  and JIT, so it doesn't have to be 100% accurate.
> > >
> > > JIT decision is done after main planning, so we know the cost.
> > 
> > Well, as I remember, we are considering disabling JIT in PG 11 because
> > of the use of fixed costs to trigger it.  Could executor information
> > help decide to use JIT?
> 
> I don't think so. The issues with JIT planning are more that it's
> costing is simplistic (for good-ish reason, to avoid increasing the
> number of plans), and that there's no caching (lots of infrastructure
> work needed).

Uh, yeah, that was my question.  If we knew the cost was high before we
plan, could we realistically increase the number of plans to avoid the
cost-trigger issue?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Caching query plan costs

От
Andres Freund
Дата:

On September 3, 2018 3:01:29 PM PDT, Bruce Momjian <bruce@momjian.us> wrote:
>On Mon, Sep  3, 2018 at 02:53:59PM -0700, Andres Freund wrote:
>> On 2018-09-03 14:56:28 -0400, Bruce Momjian wrote:
>> > On Mon, Sep 3, 2018 at 11:42:31AM -0700, Andres Freund wrote:
>> > > >  and JIT, so it doesn't have to be 100% accurate.
>> > >
>> > > JIT decision is done after main planning, so we know the cost.
>> >
>> > Well, as I remember, we are considering disabling JIT in PG 11
>because
>> > of the use of fixed costs to trigger it.  Could executor
>information
>> > help decide to use JIT?
>>
>> I don't think so. The issues with JIT planning are more that it's
>> costing is simplistic (for good-ish reason, to avoid increasing the
>> number of plans), and that there's no caching (lots of infrastructure
>> work needed).
>
>Uh, yeah, that was my question.  If we knew the cost was high before we
>plan, could we realistically increase the number of plans to avoid the
>cost-trigger issue?

I think there are much more pressing / more general things to do. Caching of JITed "hunks" and scaling the cost with
thenumber of JITed functions rather than one global cost.  Having to run queries multiple times for good plans just
isn'tthat interesting IMO. Especially for analytics queries, where JIT is interesting. 

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Caching query plan costs

От
Bruce Momjian
Дата:
On Mon, Sep 3, 2018 at 04:13:40PM -0700, Andres Freund wrote:
> On September 3, 2018 3:01:29 PM PDT, Bruce Momjian <bruce@momjian.us>
> wrote:
> >On Mon, Sep 3, 2018 at 02:53:59PM -0700, Andres Freund wrote:
> >> On 2018-09-03 14:56:28 -0400, Bruce Momjian wrote:
> >> > On Mon, Sep 3, 2018 at 11:42:31AM -0700, Andres Freund wrote:
> >> > > >  and JIT, so it doesn't have to be 100% accurate.
> >> > >
> >> > > JIT decision is done after main planning, so we know the cost.
> >> I don't think so. The issues with JIT planning are more that it's
> >> costing is simplistic (for good-ish reason, to avoid increasing
> >> the number of plans), and that there's no caching (lots of
> >> infrastructure work needed).
> >
> >Uh, yeah, that was my question.  If we knew the cost was high before
> >we plan, could we realistically increase the number of plans to avoid
> >the cost-trigger issue?
>
> I think there are much more pressing / more general things to
> do. Caching of JITed "hunks" and scaling the cost with the number
> of JITed functions rather than one global cost.  Having to run
> queries multiple times for good plans just isn't that interesting
> IMO. Especially for analytics queries, where JIT is interesting.

I agree it isn't useful for JIT alone but if it can be used for multiple
purposes, it might be worth it.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +