Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Дата
Msg-id CA+TgmoYYMvZm1EnHrD8sBovZ=x2Piy9_yew3J1LDyCW39Ey-HQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
On Thu, Feb 17, 2022 at 4:17 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> IMHO the whole problem is we're unable to estimate the join clause as a
> conditional probability, i.e.
>
>    P(A.x = B.x | (A.x < 42) & (B.x < 42))
>
> so maybe instead of trying to generate additional RelOptInfo items we
> should think about improving that. The extra RelOptInfos don't really
> solve this, because even if you decide to join A|x<42 to B|x<42 it does
> nothing to improve the join clause estimate.

I guess I hadn't considered that angle. I think the extra RelOptInfos
(or whatever) actually do solve a problem, because enforcing a
high-selectivity join qual against both sides is potentially quite
wasteful, and you need some way to decide whether to do it on one
side, the other, or both. But it's also true that I was wrong to
assume independence ... and if we could avoid assuming that, then the
join selectivity would work itself out without any of the machinery
that I just proposed.

> It actually deals with a more general form of this case, because the
> clauses don't need to reference the same attribute - so for example this
> would work too, assuming there is extended stats object on the columns
> on each side:
>
>   P(A.c = B.d | (A.e < 42) & (B.f < 42))

That'd be cool.

> Not sure. In my experience queries with both a join clause and other
> clauses referencing the same attribute are pretty rare. But I agree if
> we can do the expensive stuff only when actually needed, with no cost in
> the 99.999% other cases, I don't see why not. Of course, code complexity
> is a cost too.

Right. I mean, we could have a planner GUC to control whether the
optimization is used even in cases where we see that it's possible.
But Tom keeps arguing that it is possible in many queries and would
benefit few queries, and I'm not seeing why that should be so. I think
it's likely to benefit many of the queries to which it applies.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM