Re: Planner hints in Postgresql

From: Claudio Freire
Subject: Re: Planner hints in Postgresql
Date:
Msg-id: CAGTBQpbdqT=1NuMPUcM6RZLhWVhr9H1c8QFZiy9240n9OG8Srw@mail.gmail.com
In response to: Re: Planner hints in Postgresql  (Merlin Moncure <mmoncure@gmail.com>)
List: pgsql-hackers

On Tue, Mar 18, 2014 at 4:48 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> That alone could improve things considerably, and statistical info could be
>> propagated along expressions to make it possible to model uncertainty in
>> complex expressions as well.
>
> But how would that work?  I see no solution adumbrated there :-).

I would have to work through the SQL expression grammar for this, but I don't think it would be impossible. Most non-function expression nodes seem rather trivial. Even for CASE, as long as you have a distribution for the conditional, you can derive a distribution for the whole. User-defined functions would be another game, though. Correlation would have to be measured, and that can be troublesome; it is a weak spot of risk computation as much as it is of planning. But it could be fuzzed arbitrarily until properly computed. After all, dependency on correlation or non-correlation is a known source of risk, and accounting for it in any way is better than not accounting for it.
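
To make that a bit more concrete, here is a minimal sketch of what propagating statistics along expression nodes could look like. It is not planner code: the `Moments` type, the `add` and `case_when` helpers, and the choice to carry only a mean and a variance are all assumptions made for illustration. The CASE rule additionally assumes the branch values are independent of the condition, and the `corr` parameter stands in for the measured (or fuzzed) correlation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moments:
    """Mean and variance of an estimated quantity (hypothetical type)."""
    mean: float
    var: float


def add(a: Moments, b: Moments, corr: float = 0.0) -> Moments:
    """Moments of a + b, given an assumed correlation coefficient."""
    # Var(X + Y) = Var(X) + Var(Y) + 2 * corr * sd(X) * sd(Y)
    cov = corr * (a.var ** 0.5) * (b.var ** 0.5)
    return Moments(a.mean + b.mean, a.var + b.var + 2.0 * cov)


def case_when(p_true: float, then: Moments, otherwise: Moments) -> Moments:
    """Moments of CASE WHEN cond THEN x ELSE y END.

    p_true is the estimated probability that cond holds. The result is a
    two-component mixture; the branches are assumed independent of cond.
    """
    mean = p_true * then.mean + (1.0 - p_true) * otherwise.mean
    # Second raw moment of the mixture, then subtract mean^2 for the variance.
    second = (p_true * (then.var + then.mean ** 2)
              + (1.0 - p_true) * (otherwise.var + otherwise.mean ** 2))
    return Moments(mean, second - mean ** 2)


if __name__ == "__main__":
    x = Moments(mean=1.0, var=4.0)
    y = Moments(mean=2.0, var=9.0)
    print(add(x, y))             # uncorrelated inputs
    print(add(x, y, corr=1.0))   # worst case: fully correlated inputs
    print(case_when(0.5, Moments(0.0, 0.0), Moments(10.0, 0.0)))
```

Comparing the `corr=0.0` and `corr=1.0` results for the same inputs is one crude way to see how much of the uncertainty depends on the correlation assumption.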
 
> Let's say you change the rowcount estimate to low/bestguess/high *and*
> you only engage extra searches when there is enough disparity between
> those values: you still get exponentially more searches.

I was under the impression the planner already did an exhaustive search for some queries. So it's just a matter of picking the best plan among those (i.e., estimating cost). The case of GEQO isn't any different, except that a risk-decreasing transformation might need to be introduced, unless I'm missing something.
 
> (My thinking
> is that if the bestguess estimated execution time is some user-definable
> amount faster than low/high at any node, a more skeptical plan is
> introduced.)  All that could end up being pessimal to the general case
> though.

I think the cost estimate would be replaced by a distribution (simplified perhaps into an array of moments, or whatever is easily manipulated in the face of complex expressions). What the user would pick is a sampling method for that distribution. Then plans get measured by the user's yardstick (say: arithmetic mean, median, 90th percentile, etc.). The arithmetic mean would, I guess, be the default, and that ought to be roughly equivalent to the planner's current behavior.
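
As a rough illustration of measuring plans by a user-chosen yardstick, the sketch below represents each plan's cost distribution as a plain list of sampled costs and picks the plan that minimizes the selected statistic. Everything here is hypothetical: the plan names, the cost numbers, the `YARDSTICKS` table and the `pick_plan` helper are invented for the example and do not correspond to any existing planner setting.

```python
import statistics


def percentile(samples, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(samples)
    # ceil(q * n / 100) using integer arithmetic only
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical user-selectable ways to reduce a cost distribution to one number.
YARDSTICKS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "p90": lambda samples: percentile(samples, 90),
}


def pick_plan(plans, yardstick="mean"):
    """Return the name of the plan with the lowest score under the yardstick."""
    score = YARDSTICKS[yardstick]
    return min(plans, key=lambda name: score(plans[name]))


# Made-up cost samples: one plan is usually cheap but occasionally terrible,
# the other is consistently mediocre.
plans = {
    "nested_loop": [10, 10, 10, 10, 10, 10, 10, 10, 200, 200],
    "hash_join": [60] * 10,
}

if __name__ == "__main__":
    for name in YARDSTICKS:
        print(name, "->", pick_plan(plans, name))
```

With these invented numbers the mean and the median favor the risky plan, while the 90th percentile favors the steady one, which is the kind of trade-off the user would be choosing.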
