Re: wip: functions median and percentile

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: wip: functions median and percentile
Дата
Msg-id AANLkTikGt8Z1dEOShTmwCCCOCJvmO94UCnYn9ZLCrbi2@mail.gmail.com
обсуждение исходный текст
Ответ на Re: wip: functions median and percentile  (David Fetter <david@fetter.org>)
Ответы Re: wip: functions median and percentile  (Pavel Stehule <pavel.stehule@gmail.com>)
Список pgsql-hackers
2010/9/23 David Fetter <david@fetter.org>:
> On Thu, Sep 23, 2010 at 08:27:38PM +0200, Pavel Stehule wrote:
>> 2010/9/23 Hitoshi Harada <umi.tanuki@gmail.com>:
>> > 2010/9/23 Pavel Stehule <pavel.stehule@gmail.com>:
>> >> Hello
>> >>
>> >> 2010/9/22 Hitoshi Harada <umi.tanuki@gmail.com>:
>> >>> 2010/9/22 Pavel Stehule <pavel.stehule@gmail.com>:
>> >>>> Hello
>> >>>>
>> >>>> I found probably hard problem in cooperation with window functions :(
>> >>
>> >> maybe I was confused. I found a other possible problems.
>> >>
>> >> The problem with median function is probably inside a final function
>> >> implementation. Actually we request possibility of repetitive call of
>> >> final function. But final function call tuplesort_end function and
>> >> tuplesort_performsort. These function changes a state of tuplesort.
>> >> The most basic question is "who has to call tuplesort_end function and
>> >> when?
>> >
>> > Reading the comment in array_userfuncs.c, array_agg_finalfn() doesn't
>> > clean up its internal state at all and tells it's the executor's
>> > responsibility to clear memory. It is allowed since ArrayBuildState is
>> > only in-memory state. In the other hand, TupleSort should be cleared
>> > by calling tuplesort_end() if it has tapeset member (on file based
>> > sort) to close physical files.
>> >
>> > So 2 or 3 ways to go in my mind:
>>
>> it is little bit worse - we cannot to call tuplesort_performsort repetitive.
>>
>> >
>> > 1. call tuplesort_begin_datum with INT_MAX workMem rather than the
>> > global work_mem, to avoid it spills out sort state to files. It may
>> > sounds dangerous, but actually memory exhausting can happen in
>> > array_agg() as well.
>> >
>> > 2. add TupleSort an argument that tells not to use file at all. This
>> > results in the same as #1 but more generic approach.
>> >
>> > 3. don't use tuplesort in median() but implement its original sort
>> > management. This looks quite redundant and like maintenance problem.
>> >
>> > #2 sounds like the best in generic and consistent way. The only point
>> > is whether the change is worth for implementing median() as it's very
>> > system-wide common fundamentals.
>> >
>> > Other options?
>>
>> #4 block median under window clause
>>
>> #5 use a C array instead tuplesort under window clause. It is very
>> unpractical to use a windows clauses with large datasets, so it should
>> not be a problem. More, this can be very quick, because for C array we
>> can use a qsort function.
>>
>> Now I prefer #5 - it can be fast for using inside windows clause and
>> safe when window clause will not be used.
>
> If there's some way to do this using the same code in the windowing
> and non-windowing case, that would be much, much better from an
> architectural point of view.  Single Point of Truth and all that.

We can have a median with support a window clause, but limited to
work_mem, or we can have a unlimited median, but without window
clause. I think, I am able to minimalize a code duplicity - just to
define some envelope over  tuplesort.

The unique code isn't possible there - minimally now we have a two
variants - one for numeric result and second for double. But it is
usual - try to look how much AVG functions are in core.

Regards

Pavel

>
> Cheers,
> David.
> --
> David Fetter <david@fetter.org> http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter      XMPP: david.fetter@gmail.com
> iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dimitri Fontaine
Дата:
Сообщение: Re: Easy way to verify gitignore files?
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Why is time with timezone 12 bytes?