Re: wip: functions median and percentile
От | Pavel Stehule |
---|---|
Тема | Re: wip: functions median and percentile |
Дата | |
Msg-id | AANLkTi=wB_K0JYytVQC8g5As=xxzQAuKEfqhQcTXHaFT@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: wip: functions median and percentile (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
2010/10/10 Tom Lane <tgl@sss.pgh.pa.us>: > Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> In the meantime, the attached variation of the patch fixes the temp >> file issue and will support all 3 cases. It gives OK performance for >> (1) and (2), and poor performance for (3). That could be viewed as a >> future development task, which perhaps the window function API would >> help with. I think it would be a shame to drop support for (2) just >> because we can't do (3) efficiently yet. > > I started looking at this patch, and noticed that we got so caught up > in implementation issues that we forgot the unresolved problem of data > types. The original patch defined median(anyelement), which is only > well-defined for an odd number of inputs; for an even number of inputs > you have to take the left or right item in the central pair. I see > that this version defines > median(float8) returns float8 > median(float4) returns float8 > median(numeric) returns numeric > median(int8) returns numeric > median(int4) returns numeric > median(int2) returns numeric > which strikes me as possibly being overkill. > > It was pointed out upthread that while median isn't presently > in the standard, Oracle defines it in terms of percentile_cont(0.5) > which *is* in the standard. What I read in SQL:2008 is that > percentile_cont is defined for all numeric types (returning > approximate numeric with implementation-defined precision), > and for interval (returning interval), and not for any other > input type. So it appears to me that what we ought to support > is > median(float8) returns float8 > median(interval) returns interval > and nothing else --- we can rely on implicit casting to convert > any other numeric input type to float8. +1 > > BTW, as far as the implementation issues go, telling tuplesort that it > can use gigabytes of memory no matter what seems quite unacceptable. > Put this thing into a hash aggregation and you'll blow out your memory > in no time. I don't think it's even a good idea to use work_mem there. > I wonder whether it'd be a good idea to augment AggCheckCallContext() > so that there's a way for aggregates to find out how much memory they > ought to try to use. In a simple aggregation situation it's probably > OK to use work_mem, but in a hash aggregation you'd better use less > --- perhaps work_mem divided by the number of groups expected. > > Also, I believe that the lack-of-cleanup problem for tuplesorts spilling > to disk should be fixable by using an exprcontext shutdown callback (see > RegisterExprContextCallback). this issue is only for window aggregates, and this way is block - the problem with tuplestore is not only wit cleanup, but more in performance. But using a MemoryContext shutdown callback is good idea. Regards Pavel Stehule > > Comments? > > regards, tom lane >
В списке pgsql-hackers по дате отправления: