Re: Implied BETWEEN from join quals

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Implied BETWEEN from join quals
Дата
Msg-id CA+TgmoZEpuP9TvJv+9MnJQTPEm95jrGOq1mYhV_A=kt6kujJ_g@mail.gmail.com
обсуждение исходный текст
Ответ на Implied BETWEEN from join quals  (Simon Riggs <simon@2ndQuadrant.com>)
Ответы Re: Implied BETWEEN from join quals  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
On Tue, Apr 22, 2014 at 11:44 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I was recently nudged to describe an optimisation of merge
> joins/sorts. Rather than decribe that, I've looked at the general
> case:
>
> There are some additional implications we may make when joining
> tables... a particularly interesting one is that
>
> SELECT *
> FROM Fact JOIN Dim on Fact.col = Dim.col
>
> can be rewritten as
>
> SELECT *
> FROM Fact JOIN Dim on Fact.col = Dim.col
> WHERE
> (
> Fact.col BETWEEN
>  (SELECT min(col) FROM Dim)
> AND
>  (SELECT max(col) FROM Dim)
> )
> AND
> (
> Dim.col BETWEEN
>  (SELECT min(col) FROM Fact)
> AND
>  (SELECT max(col) FROM Fact)
> )
>
> which also works similarly for anti/semi-joins.
>
> If we have indexes on A.col and B.col then these additional quals can
> be derived cheaply at run-time and could have an important effect on
> optimisation.

Interesting.  This can pretty obviously produce a big win if things
break our way.  But it'll take some effort do determine whether the
range of possible values for the join column is narrow enough to make
the inferred BETWEEN clause selective enough to be useful, and that
effort will be wasted if it turns out that the answer is no.  I can
certainly think of times when this would have helped me, so I kind of
like the idea in theory ... but I won't be surprised if it turns out
to be possible to construct realistic test cases where trying to
figure this out causes too much increase in planning time.

Actually, it seems possible to me that this could end up winning even
if it doesn't enable the use of an index scan rather than a sequential
scan, because eliminating rows during the scan is typically much
cheaper than eliminating them during the join step.  But it also seems
possible that it could lose heavily in that case, if the extra steps
during the scan phase don't eliminate enough rows to reduce the join
cost to a sufficient degree.  I'm not really sure how it will work out
on balance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: allowing VACUUM to be cancelled for conflicting locks
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Decrease MAX_BACKENDS to 2^16