Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support
Дата
Msg-id 162867790912210343r9ce2d53s51616392c3003042@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support  (Greg Stark <gsstark@mit.edu>)
Ответы Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support  (Greg Stark <gsstark@mit.edu>)
Список pgsql-hackers
2009/12/21 Greg Stark <gsstark@mit.edu>:
> On Mon, Dec 21, 2009 at 5:48 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Information about count of rows are not detected in planner time. This
>>  have to by available in any executor's sort node. So this value will
>> be available every time - index scan is problem. Interesting is Greg's
>> info about O(N) algorithms. Then we can implement median as classic
>> aggregate.
>>...
>> On second hand - any internal implementation of median will be faster,
>> then currently used techniques.
>
> Some further information now that I'm on a real keyboard.
>
> The O(n) algorithm for median of which I'm aware is Quickselect. It's
> based on Quicksort which makes it unsuitable for a disk-based
> algorithm since it would have to read every block many times. Perhaps
> there's some way to adapt it or there might be some other O(n)
> algorithm which has better i/o patterns.
>
> But in cases where the tupleset/tuplesort is still in memory it would
> be easy to add support for pulling out the nth element and use that in
> a magic median() function. It wouldn't be a "classic" aggregate, at
> least as I understand it where you also need something like O(1) space
> to store the state, because you definitely need access to the whole
> tupleset to do the quickselect.
>

I can check speed diferences on orafce's median implementation. But
orafce isn't correct - it ignores work_mem limit - so it usable only
for some external modules or for testing. I'll try simple median
implementation based on aggregate API. Then I'll compare speed.

> If we don't find a way to optimize this for on-disk tuplestores though
> then I wonder whether it's worth it. It would only help in the narrow
> case that you had a large enough set to see the difference between
> doing quickselect and quicksort but small enough to fit in workmem. I
> suppose that case is widening over time though as memory sizes get
> bigger and bigger.

Thank you

I have to do same test and get more info about quickselect

Pavel

>
> --
> greg
>


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Re: alpha3 release schedule?
Следующее
От: Robert Haas
Дата:
Сообщение: Re: using separate parameters in psql query execution