Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support
Дата
Msg-id 407d949e0912210213u39b857b0qa842885d715079e6@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support  (Pavel Stehule <pavel.stehule@gmail.com>)
Ответы Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support
Список pgsql-hackers
On Mon, Dec 21, 2009 at 5:48 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Information about count of rows are not detected in planner time. This
>  have to by available in any executor's sort node. So this value will
> be available every time - index scan is problem. Interesting is Greg's
> info about O(N) algorithms. Then we can implement median as classic
> aggregate.
>...
> On second hand - any internal implementation of median will be faster,
> then currently used techniques.

Some further information now that I'm on a real keyboard.

The O(n) algorithm for median of which I'm aware is Quickselect. It's
based on Quicksort which makes it unsuitable for a disk-based
algorithm since it would have to read every block many times. Perhaps
there's some way to adapt it or there might be some other O(n)
algorithm which has better i/o patterns.

But in cases where the tupleset/tuplesort is still in memory it would
be easy to add support for pulling out the nth element and use that in
a magic median() function. It wouldn't be a "classic" aggregate, at
least as I understand it where you also need something like O(1) space
to store the state, because you definitely need access to the whole
tupleset to do the quickselect.

If we don't find a way to optimize this for on-disk tuplestores though
then I wonder whether it's worth it. It would only help in the narrow
case that you had a large enough set to see the difference between
doing quickselect and quicksort but small enough to fit in workmem. I
suppose that case is widening over time though as memory sizes get
bigger and bigger.

--
greg


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Hiroyuki Yamada
Дата:
Сообщение: Re: alpha3 release schedule?
Следующее
От: Tim Bunce
Дата:
Сообщение: Minimum perl version supported