Re: planner not choosing fastest estimated plan

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: planner not choosing fastest estimated plan
Дата
Msg-id 17167.1373213504@sss.pgh.pa.us
обсуждение исходный текст
Ответ на planner not choosing fastest estimated plan  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-hackers
Jeff Janes <jeff.janes@gmail.com> writes:
> I have a weird case where the planner doesn't choose the plan that it
> itself believes to be the fastest plan.

I poked into this a bit and found where things are going wrong.  The
ideal plan for this case involves a nestloop with a parameterized inner
scan over an inheritance tree (an append relation).  The Path for that
scan has to be built by set_append_rel_pathlist(), around lines 810-860
of allpaths.c in HEAD.  And what that code is doing is taking the
cheapest path for each appendrel member, then "reparameterizing" it
if necessary, which means modifying it to include enforcement of any
available parameterized join quals that it wasn't using already.
In this case what we've got for each child is a simple seqscan path
(enforcing no quals) and a bitmap indexscan path that uses the
parameterized joinqual perlupper(a.val1)=perlupper(b.val1).  Ordinarily
the bitmap path would be the cheapest and everything would be fine.
However, Jeff's example assigns a very high cost to the UDF, and the
bitmap scan's cost estimate includes (correctly) one evaluation of the
UDF.  With a high enough UDF cost, the bare seqscan is cheaper, and so
it gets picked.  Now, once we reparameterize the seqscan, which in this
case amounts to having it evaluate the parameterized joinqual at every
row, it's hugely expensive; so we end up with a very expensive
parameterized append path that isn't going to look attractive when join
paths are made, and the planner picks a hash or merge join instead.

So in short, the error lies in assuming that the cheapest plain path
will still be the cheapest one after reparameterization.  I think that
I did recognize that as a risk when I wrote this code, but supposed that
the possible addition of extra quals to evaluate wouldn't change the
outcome.  Jeff's example shows that actually the added quals can make a
huge difference.

The simplest correct logic would involve running through all the
available paths for the child relation, reparameterizing each one, and
only then comparing their costs.  That sounds a bit expensive though.
What we could do to ameliorate the planning cost is to still search for
the overall-cheapest child path, and if it is already parameterized the
way we want (which will frequently be the case, I think), then we're
done.  Only if it requires reparameterization do we have to go through
the more complicated process.

Thoughts, better ideas?
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Add regression tests for ROLE (USER)
Следующее
От: Robins Tharakan
Дата:
Сообщение: Re: Add tests for LOCK TABLE