Re: [HACKERS] New design for FK-based join selectivity estimation

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] New design for FK-based join selectivity estimation
Дата
Msg-id 23545.1481658247@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] New design for FK-based join selectivity estimation  (ronan.dunklau@dalibo.com)
Ответы Re: [HACKERS] New design for FK-based join selectivity estimation  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
ronan.dunklau@dalibo.com writes:
> On mardi 13 décembre 2016 09:10:47 CET Adrien Nayrat wrote:
>> The commit 100340e2dcd05d6505082a8fe343fb2ef2fa5b2a introduce an
>> estimation error :

> The problem is, for semi and anti joins, we assume that we have nohing to do
> (costsize.c:4253):

>         else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
>         {
>             /*
>              * For JOIN_SEMI and JOIN_ANTI, the selectivity is defined as the
>              * fraction of LHS rows that have matches.  If the referenced
>              * table is on the inner side, that means the selectivity is 1.0
>              * (modulo nulls, which we're ignoring for now).  We already
>              * covered the other case, so no work here.
>              */
>         }

> This results in assuming that the whole outerrel will match, no matter the
> selectivity of the innerrel.

Yeah.  In the terms of this example, the FK means that every outer row
would have a match, if the query were
select * from t3 where j in (select * from t4);

But actually it's
select * from t3 where j in (select * from t4 where j<10);

so of course we should not expect a match for every row.

> If I understand it correctly and the above is right, I think we should ignore
> SEMI or ANTI joins altogether when considering FKs, and keep the corresponding
> restrictinfos for later processing since they are already special-cased later
> on.

That seems like an overreaction.  While the old code happens to get this
example exactly right, eqjoinsel_semi is still full of assumptions and
approximations, and it doesn't do very well at all if it lacks MCV lists
for both sides.

I'm inclined to think that what we want to have happen in this case is
to estimate the fraction of outer rows having a match as equal to the
selectivity of the inner query's WHERE clauses, ie the semijoin
selectivity should be sizeof(inner result) divided by sizeof(inner
relation).
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kevin Grittner
Дата:
Сообщение: Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraintviolation [and 2 more messages]
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: [HACKERS] Logical Replication WIP