Re: Implied BETWEEN from join quals

Поиск
Список
Период
Сортировка
От Simon Riggs
Тема Re: Implied BETWEEN from join quals
Дата
Msg-id CA+U5nMKMEUCLNZRcguQerDf5OFi7S=J6EC8zW6LZRS1u3a8pSg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Implied BETWEEN from join quals  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 28 April 2014 15:01, Robert Haas <robertmhaas@gmail.com> wrote:

> Interesting.  This can pretty obviously produce a big win if things
> break our way.  But it'll take some effort do determine whether the
> range of possible values for the join column is narrow enough to make
> the inferred BETWEEN clause selective enough to be useful, and that
> effort will be wasted if it turns out that the answer is no.  I can
> certainly think of times when this would have helped me, so I kind of
> like the idea in theory ... but I won't be surprised if it turns out
> to be possible to construct realistic test cases where trying to
> figure this out causes too much increase in planning time.
>
> Actually, it seems possible to me that this could end up winning even
> if it doesn't enable the use of an index scan rather than a sequential
> scan, because eliminating rows during the scan is typically much
> cheaper than eliminating them during the join step.  But it also seems
> possible that it could lose heavily in that case, if the extra steps
> during the scan phase don't eliminate enough rows to reduce the join
> cost to a sufficient degree.  I'm not really sure how it will work out
> on balance.

I agree its not always a win and that the planning overhead is a
little high. That's why I suggest we only attempt to calculate such a
plan if the normal plan for the scan is already high. That way the
extra planning is worth it. I'm sure there are other optimizations
that we might only wish to consider for plans that are otherwise high
cost. This need hardly touch the main planning path for shirt queries
at all.

The extra run-time cost of the test adds 2*cpu_operator_cost to the
scan. However, it will save at least (OldSelectivity -
NewSelectivity)*(cpu_tuple_cost + cpu_operator_cost) in the join node
directly above it (and possible more above that - though without being
able to inspect the join tree we won't know that until later - and the
output plan might actually chage the later join plan anyway). So we
can calculate the threshold at which we should choose the modified
plan, though it will likely be a conservative value.

The new selectivity can be calculated just as other selectivities, I
see no problem there.

So we have a proposed way to limit planning time in cases where the
benefit would be low. Plus we have a way to estimate the reduction in
cost once we add the new tests.

The real value is whether we can use the additional tests to pick up
an index we hadn't been able to use before. Given that MinMax index
use will be a win even for fairly high selectivities (because of the
low cost of index access), it looks like this optimization will play
very nicely with MinMax.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: includedir_internal headers are not self-contained
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: allowing VACUUM to be cancelled for conflicting locks