Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct

From Andres Freund
Subject Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
Msg-id 201201120140.35229.andres@anarazel.de
In reply to Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct  (Casey Allen Shobe <casey.shobe@messagesystems.com>)
List pgsql-bugs
On Thursday, January 12, 2012 01:01:01 AM Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:

> Also, why the asymmetry in null handling?  And why did you only touch
> one of the two code paths in eqjoinsel_semi?  They have both got this
> issue of how to estimate with inadequate stats.
This patch was purely a proof of concept, sorry if that wasn't clear. I mostly
wanted to point out that real plans regressed with this change. Digging around
a bit, I could find more examples where it caused real pain, but also cases
where the new behaviour was advantageous.
Unfortunately the pastebins where raptelan provided plans have expired by now...
Perhaps he can provide them again?

If we agree on a way to handle the stats I am happy to produce a patch that
actually tries to cover all the cases.

> > Whats your opinion on this?
> Looks pretty bogus to me.  You're essentially assuming that the side of
> the join without statistics is unique, which is a mighty dubious
> assumption.
It sure is a bit dubious. But assuming that a semijoin that has a max of n rows
on the inner side results in half of the outer side's rows (>> n) is pretty
bogus as well. Using the assumption of uniqueness for the outer side seems
sensible if it's only used as an upper limit (except in an antijoin ...).
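
To put numbers on the upper-limit argument, here is a minimal Python sketch.
It is not the eqjoinsel_semi code: the function name, its parameters and the
0.5 default are illustrative assumptions. It assumes the inner side yields at
most inner_rows rows and treats the outer side as unique, so at most
inner_rows outer rows can find a match.

```python
def semijoin_selectivity_sketch(outer_rows, inner_rows, default_sel=0.5):
    """Hypothetical helper: cap a default semijoin selectivity.

    Assumption (not a fact about the planner): the outer join column is
    unique, so each inner row can match at most one outer row.
    """
    if outer_rows <= 0:
        return 0.0
    # At most inner_rows outer rows can match under the uniqueness assumption.
    cap = min(1.0, inner_rows / outer_rows)
    # Use the cap only as an upper limit on the default estimate.
    return min(default_sel, cap)


# A large outer table against an inner side of at most 10 rows:
# the capped estimate is far below the flat default of 0.5.
small_inner = semijoin_selectivity_sketch(1_000_000, 10)

# When the inner side is larger than the outer side, the cap has no effect.
large_inner = semijoin_selectivity_sketch(100, 1000)
```

An antijoin would need the complementary treatment, since a cap on the
semijoin fraction becomes a floor on the antijoin fraction.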

Yes, my "patch" didn't even start to do this ;)

SELECT * FROM blub WHERE foo IN (SELECT something_with_aggregation); is not
exactly a fringe case, so I find it problematic that its estimates regress
quite a bit.


> (In cases where we *know* it's unique, something like this
> could be reasonable, but I believe get_variable_numdistinct already
> accounts for such cases.)
Except that we infer uniqueness from only very few things, unless I am missing
something...

Andres
