Re: A query planner that learns

From: AgentM
Subject: Re: A query planner that learns
Msg-id: 1B7FAD85-20E9-40AE-8018-724F94E132F8@themactionfaction.com
In reply to: Re: A query planner that learns  ("John D. Burger" <john@mitre.org>)
Responses: Re: A query planner that learns  ("Jim C. Nasby" <jim@nasby.net>)
List: pgsql-general

On Oct 13, 2006, at 11:47, John D. Burger wrote:

> Erik Jones wrote:
>
>> Forgive me if I'm way off here, as I'm not all that familiar with
>> the internals of postgres, but isn't this what the genetic query
>> optimizer discussed in one of the manual's appendixes is supposed
>> to do?
>
> No - it's not an "optimizer" in that sense.  When there is a small
> enough set of tables involved, the planner uses a dynamic
> programming algorithm to explore the entire space of all possible
> plans.  But the space grows exponentially (I think) with the number
> of tables - when this would take too long, the planner switches to
> a genetic algorithm approach, which explores a small fraction of
> the plan space, in a guided manner.
>
> But with both approaches, the planner is just using the static
> statistics gathered by ANALYZE to estimate the cost of each
> candidate plan, and these statistics are based on sampling your
> data - they may be wrong, or at least misleading.  (In particular,
> the statistic for total number of unique values is frequently =way=
> off, per a recent thread here.  I have been reading about this,
> idly thinking about how to improve the estimate.)
>
> The idea of a learning planner, I suppose, would be one that
> examines cases where these statistics lead to very misguided
> expectations.  The simplest version of a "learning" planner could
> simply bump up the statistics targets on certain columns.  A
> slightly more sophisticated idea would be for some of the
> statistics to optionally use parametric modeling (this column is a
> Gaussian, let's estimate the mean and variance, this one is a Beta
> distribution ...).  Then the smarter planner could spend some
> cycles applying more sophisticated statistical modeling to
> problematic tables/columns.

One simple first step would be to run an ANALYZE whenever a
sequential scan is executed. Is there a reason not to do this? It
could be controlled by a GUC variable in case someone wants
repeatable plans.
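
To make the idea concrete, here is a toy sketch in Python. It is not PostgreSQL code: the names Table, ColumnStats, seq_scan and the flag analyze_on_seqscan are all made up for illustration, with the flag standing in for the GUC variable. It also recomputes the statistics from every row, whereas the real ANALYZE works from a sample.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnStats:
    """Toy stand-in for planner statistics on a single column."""
    row_count: int = 0
    n_distinct: int = 0


@dataclass
class Table:
    rows: list  # one value per row, i.e. a single-column table
    stats: ColumnStats = field(default_factory=ColumnStats)


def analyze(table):
    """Recompute statistics from the full table (real ANALYZE samples instead)."""
    table.stats = ColumnStats(row_count=len(table.rows),
                              n_distinct=len(set(table.rows)))


def seq_scan(table, predicate, analyze_on_seqscan=True):
    """Return matching rows; optionally refresh statistics afterwards.

    analyze_on_seqscan is a hypothetical setting standing in for the
    proposed GUC variable. Turn it off to keep plans repeatable.
    """
    result = [row for row in table.rows if predicate(row)]
    if analyze_on_seqscan:
        analyze(table)
    return result


t = Table(rows=[1, 1, 2, 3, 3, 3])
seq_scan(t, lambda v: v > 1, analyze_on_seqscan=False)
stale = t.stats.row_count   # statistics left untouched by the scan
seq_scan(t, lambda v: v > 1)
fresh = t.stats.row_count   # statistics refreshed as a side effect
```

With the flag off, the statistics stay whatever they were before the scan, which is the repeatable-plans case.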

Further down the line, statistics could be collected during the
execution of any query, with histograms also updated on delete and
update.
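
A rough illustration of keeping a histogram current as rows change, again as a Python toy rather than anything from the PostgreSQL source. The class EquiWidthHistogram and its methods are hypothetical, and an equi-width layout is assumed only because fixed bucket boundaries are easy to update in place.

```python
class EquiWidthHistogram:
    """Toy equi-width histogram kept current as rows change."""

    def __init__(self, low, high, buckets):
        self.low = low
        self.width = (high - low) / buckets
        self.counts = [0] * buckets

    def _bucket(self, value):
        # Clamp out-of-range values into the first or last bucket.
        index = int((value - self.low) / self.width)
        return max(0, min(len(self.counts) - 1, index))

    def insert(self, value):
        self.counts[self._bucket(value)] += 1

    def delete(self, value):
        self.counts[self._bucket(value)] -= 1

    def update(self, old, new):
        # An update is treated as a delete of the old value plus an insert.
        self.delete(old)
        self.insert(new)

    def selectivity_below(self, value):
        """Estimated fraction of rows below value, counting whole buckets only."""
        total = sum(self.counts)
        if total == 0:
            return 0.0
        return sum(self.counts[:self._bucket(value)]) / total


h = EquiWidthHistogram(low=0, high=100, buckets=4)
for v in (5, 30, 60, 90):
    h.insert(v)
h.update(90, 10)   # one row moves from the top bucket to the bottom one
h.delete(60)
```

Each write costs only a bucket increment or decrement here; whether that overhead is acceptable on every delete and update in a real system is a separate question.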

-M
