Re: Implied BETWEEN from join quals

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: Implied BETWEEN from join quals
Дата	28 апреля 2014 г. 14:01:36
Msg-id	CA+TgmoZEpuP9TvJv+9MnJQTPEm95jrGOq1mYhV_A=kt6kujJ_g@mail.gmail.com обсуждение исходный текст
Ответ на	Implied BETWEEN from join quals (Simon Riggs <simon@2ndQuadrant.com>)
Ответы	Re: Implied BETWEEN from join quals
Список	pgsql-hackers

Дерево обсуждения

On Tue, Apr 22, 2014 at 11:44 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I was recently nudged to describe an optimisation of merge
> joins/sorts. Rather than decribe that, I've looked at the general
> case:
>
> There are some additional implications we may make when joining
> tables... a particularly interesting one is that
>
> SELECT *
> FROM Fact JOIN Dim on Fact.col = Dim.col
>
> can be rewritten as
>
> SELECT *
> FROM Fact JOIN Dim on Fact.col = Dim.col
> WHERE
> (
> Fact.col BETWEEN
>  (SELECT min(col) FROM Dim)
> AND
>  (SELECT max(col) FROM Dim)
> )
> AND
> (
> Dim.col BETWEEN
>  (SELECT min(col) FROM Fact)
> AND
>  (SELECT max(col) FROM Fact)
> )
>
> which also works similarly for anti/semi-joins.
>
> If we have indexes on A.col and B.col then these additional quals can
> be derived cheaply at run-time and could have an important effect on
> optimisation.

Interesting.  This can pretty obviously produce a big win if things
break our way.  But it'll take some effort do determine whether the
range of possible values for the join column is narrow enough to make
the inferred BETWEEN clause selective enough to be useful, and that
effort will be wasted if it turns out that the answer is no.  I can
certainly think of times when this would have helped me, so I kind of
like the idea in theory ... but I won't be surprised if it turns out
to be possible to construct realistic test cases where trying to
figure this out causes too much increase in planning time.

Actually, it seems possible to me that this could end up winning even
if it doesn't enable the use of an index scan rather than a sequential
scan, because eliminating rows during the scan is typically much
cheaper than eliminating them during the join step.  But it also seems
possible that it could lose heavily in that case, if the extra steps
during the scan phase don't eliminate enough rows to reduce the join
cost to a sufficient degree.  I'm not really sure how it will work out
on balance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Implied BETWEEN from join quals