Re: Ordered Partitioned Table Scans

Поиск
Список
Период
Сортировка
От Julien Rouhaud
Тема Re: Ordered Partitioned Table Scans
Дата
Msg-id CAOBaU_Y+mOsCJqLs_zkae=Wf0HgWFAx0TaA-LnUon_-R6DzVEw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Ordered Partitioned Table Scans  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Fri, Mar 22, 2019 at 7:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Mar 22, 2019 at 12:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > On Fri, Mar 22, 2019 at 11:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> In cases where, say, the first child requires no sort but also doesn't
> > >> emit very many rows, while the second child requires an expensive sort,
> > >> the planner will have a ridiculously optimistic opinion of the cost of
> > >> fetching slightly more rows than are available from the first child.
> > >> This might lead it to wrongly choose a merge join over a hash for example.
> >
> > > I think this is very much a valid point, especially in view of the
> > > fact that we already choose supposedly fast-start plans too often.  I
> > > don't know whether it's a death sentence for this patch, but it should
> > > at least make us stop and think hard.
> >
> > BTW, another thing we could possibly do to answer this objection is to
> > give the ordered-Append node an artificially pessimistic startup cost,
> > such as the sum or the max of its children's startup costs.  That's
> > pretty ugly and unprincipled, but maybe it's better than not having the
> > ability to generate the plan shape at all?
>
> Yeah, I'm not sure whether that's a good idea or not.  I think one of
> the problems with a cost-based optimizer is that once you start
> putting things in with the wrong cost because you think it will give
> the right answer, you're sorta playing with fire, because you can't
> necessarily predict how things are going are going to turn out in more
> complex scenarios.  On the other hand, it may sometimes be the right
> thing to do.

I've been bitten too many times with super inefficient plans of the
form "let's use the wrong index instead of the good one because I'll
probably find there the tuple I want very quickly", due to LIMIT
assuming an even distribution.   Since those queries can end up taking
dozens of minutes instead of less a ms, without a lot of control to
fix this kind of problem I definitely don't want to introduce another
similar source of pain for users.

However, what we're talking here is still a corner case.  People
having partitioned tables with a mix of partitions with and without
indexes suitable for ORDER BY x LIMIT y queries should already have at
best average performance.  And trying to handle this case cannot hurt
cases where all partitions have suitable indexes, so that may be an
acceptable risk?

I also have mixed feeling about this artificial startup cost penalty,
but if we go this way we can for sure cumulate the startup cost of all
sorts that we think we'll have to perform (according to each path's
estimated rows and the given limit_tuples).  That probably won't be
enough though, especially with fractional paths.


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: rename labels in heapam.c?
Следующее
От: Tom Lane
Дата:
Сообщение: Re: speeding up planning with partitions