Обсуждение: Getting rid of cheap-startup-cost paths earlier

Поиск
Список
Период
Сортировка

Getting rid of cheap-startup-cost paths earlier

От
Tom Lane
Дата:
Currently, the planner keeps paths that appear to win on the grounds of
either cheapest startup cost or cheapest total cost.  It suddenly struck
me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor
fast-start preference, etc) we could know a-priori that cheapest startup
cost is not going to be interesting, and hence immediately discard any
path that doesn't win on total cost.

This would require some additional logic to detect whether the case
applies, as well as extra complexity in add_path.  So it's possible
that it wouldn't be worthwhile overall.  Still, it seems like it might
be a useful idea to investigate.

Thoughts?
        regards, tom lane



Re: Getting rid of cheap-startup-cost paths earlier

От
Simon Riggs
Дата:
On 22 May 2012 06:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Currently, the planner keeps paths that appear to win on the grounds of
> either cheapest startup cost or cheapest total cost.  It suddenly struck
> me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor
> fast-start preference, etc) we could know a-priori that cheapest startup
> cost is not going to be interesting, and hence immediately discard any
> path that doesn't win on total cost.

My experience is that most people don't provide a LIMIT explicitly
even when they know that's the desired behaviour. That's because
either they simply don't understand that SQL can return lots of rows,
or SQL knowledge isn't enough, or worse that people don't even know
that specifying it would alter query plans.

Regrettably the current planning of LIMIT clauses causes more problems
so in many cases these have been explicitly removed from SQL by
developers that know how many rows they wish to see.

I would have proposed a default-LIMIT parameter before now, but for
that last point.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Getting rid of cheap-startup-cost paths earlier

От
PostgreSQL - Hans-Jürgen Schönig
Дата:
On May 22, 2012, at 9:57 AM, Simon Riggs wrote:

> On 22 May 2012 06:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Currently, the planner keeps paths that appear to win on the grounds of
>> either cheapest startup cost or cheapest total cost.  It suddenly struck
>> me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor
>> fast-start preference, etc) we could know a-priori that cheapest startup
>> cost is not going to be interesting, and hence immediately discard any
>> path that doesn't win on total cost.
>
> My experience is that most people don't provide a LIMIT explicitly
> even when they know that's the desired behaviour. That's because
> either they simply don't understand that SQL can return lots of rows,
> or SQL knowledge isn't enough, or worse that people don't even know
> that specifying it would alter query plans.
>
> Regrettably the current planning of LIMIT clauses causes more problems
> so in many cases these have been explicitly removed from SQL by
> developers that know how many rows they wish to see.
>
> I would have proposed a default-LIMIT parameter before now, but for
> that last point.


this sounds like a total disaster to me ...
why in the world should we have a default LIMIT parameter?
i guess if somebody is not able to use LIMIT he should better not touch the DB.
we clearly cannot fix incompetence by adding parameters.
regards,
    hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de



Re: Getting rid of cheap-startup-cost paths earlier

От
Simon Riggs
Дата:
On 22 May 2012 12:12, PostgreSQL - Hans-Jürgen Schönig
<postgres@cybertec.at> wrote:
>
> On May 22, 2012, at 9:57 AM, Simon Riggs wrote:
>
>> On 22 May 2012 06:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>>> Currently, the planner keeps paths that appear to win on the grounds of
>>> either cheapest startup cost or cheapest total cost.  It suddenly struck
>>> me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor
>>> fast-start preference, etc) we could know a-priori that cheapest startup
>>> cost is not going to be interesting, and hence immediately discard any
>>> path that doesn't win on total cost.
>>
>> My experience is that most people don't provide a LIMIT explicitly
>> even when they know that's the desired behaviour. That's because
>> either they simply don't understand that SQL can return lots of rows,
>> or SQL knowledge isn't enough, or worse that people don't even know
>> that specifying it would alter query plans.
>>
>> Regrettably the current planning of LIMIT clauses causes more problems
>> so in many cases these have been explicitly removed from SQL by
>> developers that know how many rows they wish to see.
>>
>> I would have proposed a default-LIMIT parameter before now, but for
>> that last point.
>
>
> this sounds like a total disaster to me ...
> why in the world should we have a default LIMIT parameter?

Its common for people to write SQL they want to have a fast response
to, yet to not specify that in their SQL code.

> i guess if somebody is not able to use LIMIT he should better not touch the DB.
> we clearly cannot fix incompetence by adding parameters.

Not sure I understand that argument. Many things are set by default in
parameters.

Oracle allows the user to choose whether plans return rows quickly, or
execute plans efficiently overall. That is a relevant user choice, not
a hint. We simply don't know that if LIMIT is absent whether the user
wishes fast or efficient plans. If they use LIMIT we know they want
fast plan.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Getting rid of cheap-startup-cost paths earlier

От
Robert Haas
Дата:
On Tue, May 22, 2012 at 1:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Currently, the planner keeps paths that appear to win on the grounds of
> either cheapest startup cost or cheapest total cost.  It suddenly struck
> me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor
> fast-start preference, etc) we could know a-priori that cheapest startup
> cost is not going to be interesting, and hence immediately discard any
> path that doesn't win on total cost.
>
> This would require some additional logic to detect whether the case
> applies, as well as extra complexity in add_path.  So it's possible
> that it wouldn't be worthwhile overall.  Still, it seems like it might
> be a useful idea to investigate.
>
> Thoughts?

Yeah, I think we should investigate that.  Presumably you could easily
have a situation where one part of the tree is under a LIMIT or EXISTS
and therefore needs to preserve fast-start plans but the rest of the
(potentially large) tree isn't, so we need something fairly
fine-grained, I think.  Maybe we could add a flag to each RelOptInfo
indicating whether fast-start plans should be kept, or something like
that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Getting rid of cheap-startup-cost paths earlier

От
Bruce Momjian
Дата:
On Tue, May 22, 2012 at 08:29:48AM -0400, Robert Haas wrote:
> On Tue, May 22, 2012 at 1:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Currently, the planner keeps paths that appear to win on the grounds of
> > either cheapest startup cost or cheapest total cost.  It suddenly struck
> > me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor
> > fast-start preference, etc) we could know a-priori that cheapest startup
> > cost is not going to be interesting, and hence immediately discard any
> > path that doesn't win on total cost.
> >
> > This would require some additional logic to detect whether the case
> > applies, as well as extra complexity in add_path.  So it's possible
> > that it wouldn't be worthwhile overall.  Still, it seems like it might
> > be a useful idea to investigate.
> >
> > Thoughts?
> 
> Yeah, I think we should investigate that.  Presumably you could easily
> have a situation where one part of the tree is under a LIMIT or EXISTS
> and therefore needs to preserve fast-start plans but the rest of the
> (potentially large) tree isn't, so we need something fairly
> fine-grained, I think.  Maybe we could add a flag to each RelOptInfo
> indicating whether fast-start plans should be kept, or something like
> that.

Is this a TODO?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: Getting rid of cheap-startup-cost paths earlier

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, May 22, 2012 at 1:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Currently, the planner keeps paths that appear to win on the grounds of
>> either cheapest startup cost or cheapest total cost.  It suddenly struck
>> me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor
>> fast-start preference, etc) we could know a-priori that cheapest startup
>> cost is not going to be interesting, and hence immediately discard any
>> path that doesn't win on total cost.
>> 
>> This would require some additional logic to detect whether the case
>> applies, as well as extra complexity in add_path.  So it's possible
>> that it wouldn't be worthwhile overall.  Still, it seems like it might
>> be a useful idea to investigate.
>> 
>> Thoughts?

> Yeah, I think we should investigate that.  Presumably you could easily
> have a situation where one part of the tree is under a LIMIT or EXISTS
> and therefore needs to preserve fast-start plans but the rest of the
> (potentially large) tree isn't, so we need something fairly
> fine-grained, I think.  Maybe we could add a flag to each RelOptInfo
> indicating whether fast-start plans should be kept, or something like
> that.

I got around to looking at this finally.  It turns out to be a big win,
at least for queries without any LIMIT or other reason to worry about
fast-start plans.

As things currently stand, there isn't any reason to control the
decision at finer than per-subquery level.  I did it the way you suggest
above anyway, with a per-RelOptInfo flag, because add_path() is passed
only a RelOptInfo and not the PlannerInfo root structure.  We could
have changed that of course, but it would have meant changing an API
used by FDWs, which would be annoying.  It seems possible that in future
somebody might think of a way to determine this on a per-relation level,
so I thought this design might be a bit more future-proof anyway.
        regards, tom lane



Re: Getting rid of cheap-startup-cost paths earlier

От
Bruce Momjian
Дата:
On Sat, Sep  1, 2012 at 06:23:59PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Tue, May 22, 2012 at 1:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Currently, the planner keeps paths that appear to win on the grounds of
> >> either cheapest startup cost or cheapest total cost.  It suddenly struck
> >> me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor
> >> fast-start preference, etc) we could know a-priori that cheapest startup
> >> cost is not going to be interesting, and hence immediately discard any
> >> path that doesn't win on total cost.
> >> 
> >> This would require some additional logic to detect whether the case
> >> applies, as well as extra complexity in add_path.  So it's possible
> >> that it wouldn't be worthwhile overall.  Still, it seems like it might
> >> be a useful idea to investigate.
> >> 
> >> Thoughts?
> 
> > Yeah, I think we should investigate that.  Presumably you could easily
> > have a situation where one part of the tree is under a LIMIT or EXISTS
> > and therefore needs to preserve fast-start plans but the rest of the
> > (potentially large) tree isn't, so we need something fairly
> > fine-grained, I think.  Maybe we could add a flag to each RelOptInfo
> > indicating whether fast-start plans should be kept, or something like
> > that.
> 
> I got around to looking at this finally.  It turns out to be a big win,
> at least for queries without any LIMIT or other reason to worry about
> fast-start plans.

Yes, I remember from the early days how quickly the number of considred
paths can grow.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +