Re: disfavoring unparameterized nested loops

From Ashutosh Bapat
Subject Re: disfavoring unparameterized nested loops
Date
Msg-id CAExHW5vB+B_0zf1MDKFV_kL-FYkNQD=Rzx6Q2nUrEERC_mXg1Q@mail.gmail.com
In reply to Re: disfavoring unparameterized nested loops  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: disfavoring unparameterized nested loops  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
>
> The problem I have with this idea is that I really don't know how to
> properly calculate what the risk_factor should be set to.  It seems
> easy at first to set it to something that has the planner avoid these
> silly 1-row estimate nested loop mistakes, but I think what we'd set
> the risk_factor to would become much more important when more and more
> Path types start using it. So if we did this and just guessed the
> risk_factor, that might be fine when only 1 of the paths being
> compared had a non-zero risk_factor, but as soon as both paths have
> one set, unless they're set to something sensible, then we just end up
> comparing garbage costs to garbage costs.

The risk factor is the inverse of our confidence in an estimate: the
lower the confidence, the higher the risk. If we associate a confidence
with each selectivity estimate, or compute a confidence interval for the
estimate instead of a single number, we can associate a risk factor with
each estimate. When we combine estimates to calculate new estimates, we
also combine their confidences/confidence intervals. If my memory
serves well, confidences/confidence intervals are calculated based on
the sample size and the method used for estimation, so we should be able
to compute those during ANALYZE.
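
To make that concrete, here is a rough sketch of what I mean. It is only
an illustration, not planner code: the names SelEstimate, from_sample,
combine_and and risk() are made up for this example, the Wilson score
interval is just one possible way to get an interval from a sample, and
multiplying the bounds assumes the two clauses are independent. The
combined interval is a crude bound, and its confidence level is not
exactly that of the inputs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SelEstimate:
    """A selectivity point estimate with an interval [lo, hi] around it."""
    point: float
    lo: float
    hi: float

    def risk(self) -> float:
        # Hypothetical risk measure: the interval width.
        # A narrower interval means more confidence, hence less risk.
        return self.hi - self.lo


def from_sample(matches: int, sample_size: int, z: float = 1.96) -> SelEstimate:
    """Wilson score interval for the fraction of sampled rows that match.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = matches / sample_size
    z2 = z * z
    denom = 1.0 + z2 / sample_size
    centre = (p + z2 / (2 * sample_size)) / denom
    half = z * math.sqrt(
        p * (1.0 - p) / sample_size + z2 / (4 * sample_size ** 2)
    ) / denom
    return SelEstimate(p, max(0.0, centre - half), min(1.0, centre + half))


def combine_and(a: SelEstimate, b: SelEstimate) -> SelEstimate:
    """Selectivity of (a AND b), assuming the clauses are independent.

    All bounds lie in [0, 1], so multiplying lower bounds and upper bounds
    gives a valid interval for the product.
    """
    return SelEstimate(a.point * b.point, a.lo * b.lo, a.hi * b.hi)


if __name__ == "__main__":
    small = from_sample(1, 10)       # 10% observed in a tiny sample
    large = from_sample(100, 1000)   # 10% observed in a larger sample
    print(small, small.risk())
    print(large, large.risk())
    print(combine_and(small, large))
```

The same 10% estimate carries a much wider interval when it comes from
10 sampled rows than from 1000, and that width is what a risk factor
could be derived from.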

I have not come across many papers that leverage this idea. Googling
"selectivity estimation confidence interval" does not yield many
papers, although I found [1] to be using a similar idea. So maybe
there's no merit in this idea, though theoretically it sounds fine
to me.


[1] https://pi3.informatik.uni-mannheim.de/~moer/Publications/vldb18_smpl_synop.pdf
--
Best Wishes,
Ashutosh Bapat


