Re: LIMIT confuses the planner

From marcin mank
Subject Re: LIMIT confuses the planner
Date
Msg-id b1b9fac60903221712n255525a6w8575ccb4a75d56bb@mail.gmail.com
In response to Re: LIMIT confuses the planner  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: LIMIT confuses the planner  (marcin mank <marcin.mank@gmail.com>)
Re: LIMIT confuses the planner  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
> So the bottom line here is just that the estimated n_distinct is too
> low.  We've seen before that the equation we use tends to do that more
> often than not.  I doubt that consistently erring on the high side would
> be better though :-(.  Estimating n_distinct from a limited sample of
> the population is known to be a statistically hard problem, so we'll
> probably not ever have perfect answers, but doing better is on the
> to-do list.
>

I came across an interesting paper on n_distinct estimation:

http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf

The PCSA algorithm described there requires O(1) work per
value. Page 22 describes how to handle update streams.
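
To make the O(1)-per-value claim concrete, here is a rough Python sketch of PCSA (probabilistic counting with stochastic averaging, after Flajolet and Martin). It is only an illustration, not code from the paper or from PostgreSQL: the class name, the SHA-1 based hash, and the default of 64 bitmaps are my own assumptions, and it handles inserts only, not the update streams discussed on page 22. The estimate is also known to be biased upward when there are very few distinct values per bitmap.

```python
import hashlib

# Correction constant from Flajolet and Martin's analysis.
PHI = 0.77351


class PCSA:
    """Hypothetical insert-only PCSA sketch (names are illustrative)."""

    def __init__(self, num_maps=64, bits=32):
        self.num_maps = num_maps      # number of bitmaps to average over
        self.bits = bits              # width of each bitmap
        self.bitmaps = [0] * num_maps

    def _hash(self, value):
        # Assumption: any well-mixed 64-bit hash would do; SHA-1 is used
        # here only because it is in the standard library.
        digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value):
        # Constant work per value: one hash, one bit set.
        h = self._hash(value)
        index = h % self.num_maps     # which bitmap this value updates
        rest = h // self.num_maps     # remaining hash bits
        if rest == 0:
            rho = self.bits - 1
        else:
            # Position of the least significant 1-bit, capped at the width.
            rho = min((rest & -rest).bit_length() - 1, self.bits - 1)
        self.bitmaps[index] |= 1 << rho

    def estimate(self):
        # For each bitmap, find the position of its lowest zero bit,
        # then average those positions across all bitmaps.
        total = 0
        for bitmap in self.bitmaps:
            r = 0
            while r < self.bits and bitmap & (1 << r):
                r += 1
            total += r
        return self.num_maps / PHI * 2 ** (total / self.num_maps)
```

Feeding the same value twice sets the same bit both times, so duplicates do not change the estimate. The memory used is num_maps * bits bits regardless of table size, which is what would make a single pass over the whole table affordable.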

This, I think (disclaimer: I know little about PG internals), means that
n_distinct estimation could be done at vacuum time (it would
play well with the visibility map add-on).

What do you think?

Greetings
Marcin
