Обсуждение: wip: functions median and percentile
Hello I am sending a prototype implementation of functions median and percentile. This implementation is very simple and I moved it to contrib for this moment - it is more easy maintainable. Later I'll move it to core. These functions are relative simple, there are not barrier for implementation own specific mutations of this functions - so I propose move to core only basic and well known form of these to core. postgres=# select median(v) from generate_series(1,10) g(v); median ──────── 5.5 (1 row) Time: 1.475 ms postgres=# select percentile(v,50) from generate_series(1,10) g(v); percentile ──────────── 5 (1 row) Time: 0.626 ms This implementation is based on tuplesort and the speed is relative well - the result from 1000000 rows is less 1 sec. Regards Pavel Stehule
Вложения
On Thu, Aug 19, 2010 at 11:59 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > I am sending a prototype implementation of functions median and > percentile. This implementation is very simple and I moved it to > contrib for this moment - it is more easy maintainable. Later I'll > move it to core. So if the entire result set fits in memory it would be nice to use the O(n) Quickselect algorithm -- which would only be a small change to the existing Quicksort code -- instead of sorting the entire set. That should be possible both for percentile() and median(). It would be good to generalize the tuplesort api sufficiently so that you can implement this as a contrib module even if we eventually decide to integrate it in core. It's probably worth having two copies of the sort code, one for Quickselect and one for Quicksort just for speed, though I suppose it's worth benchmarking. But I'm not aware of a generalization of tapesort to allow O(n) selection with the sequential i/o properties of tapesort. It would be especially interesting, probably more so than the in-memory case. -- greg
Greg Stark <gsstark@mit.edu> writes: > On Thu, Aug 19, 2010 at 11:59 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> I am sending a prototype implementation of functions median and >> percentile. This implementation is very simple and I moved it to >> contrib for this moment - it is more easy maintainable. Later I'll >> move it to core. > So if the entire result set fits in memory it would be nice to use the > O(n) Quickselect algorithm -- which would only be a small change to > the existing Quicksort code -- instead of sorting the entire set. That seems like rather a lot of added infrastructure for functions whose popularity is at best unknown. I think we should KISS for the first implementation. regards, tom lane
On Thu, Aug 19, 2010 at 11:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Stark <gsstark@mit.edu> writes: >> On Thu, Aug 19, 2010 at 11:59 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> I am sending a prototype implementation of functions median and >>> percentile. This implementation is very simple and I moved it to >>> contrib for this moment - it is more easy maintainable. Later I'll >>> move it to core. > >> So if the entire result set fits in memory it would be nice to use the >> O(n) Quickselect algorithm -- which would only be a small change to >> the existing Quicksort code -- instead of sorting the entire set. > > That seems like rather a lot of added infrastructure for functions whose > popularity is at best unknown. I think we should KISS for the first > implementation. +1. I think the functions are useful, but the perfect is the enemy of the good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Thu, Aug 19, 2010 at 12:49:45PM -0400, Robert Haas wrote: > On Thu, Aug 19, 2010 at 11:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Greg Stark <gsstark@mit.edu> writes: > >> On Thu, Aug 19, 2010 at 11:59 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > >>> I am sending a prototype implementation of functions median and > >>> percentile. This implementation is very simple and I moved it to > >>> contrib for this moment - it is more easy maintainable. Later I'll > >>> move it to core. > > > >> So if the entire result set fits in memory it would be nice to use the > >> O(n) Quickselect algorithm -- which would only be a small change to > >> the existing Quicksort code -- instead of sorting the entire set. > > > > That seems like rather a lot of added infrastructure for functions whose > > popularity is at best unknown. I think we should KISS for the first > > implementation. > > +1. I think the functions are useful, but the perfect is the enemy of the good. Percentile is already there as NTILE, a windowing function. Median may be useful, but we pretty much can't just call it "median." Instead, we need to call it something like "left_median" or "arithmetic_median." Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> wrote: > Median may be useful, but we pretty much can't just call it > "median." Instead, we need to call it something like "left_median" > or "arithmetic_median." I think it would be reasonable, and perhaps preferable, to use just "median" for the semantics described in most dictionaries -- for example, this: http://www.merriam-webster.com/dictionary/median If you do a google search for "median" and poke around, you'll find many places where this is the only definition mentioned; the others seem to be rather infrequently used. Why not make the commone usage convenient? -Kevin
On Thu, Aug 19, 2010 at 12:12:12PM -0500, Kevin Grittner wrote: > David Fetter <david@fetter.org> wrote: > > > Median may be useful, but we pretty much can't just call it > > "median." Instead, we need to call it something like "left_median" > > or "arithmetic_median." > > I think it would be reasonable, and perhaps preferable, to use just > "median" for the semantics described in most dictionaries -- for > example, this: > > http://www.merriam-webster.com/dictionary/median > > If you do a google search for "median" and poke around, you'll find > many places where this is the only definition mentioned; the others > seem to be rather infrequently used. Why not make the commone usage > convenient? The reason not to is the same reason that MEDIAN doesn't appear in the SQL standard, namely that what's common in one field is wrong in another. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > On Thu, Aug 19, 2010 at 12:12:12PM -0500, Kevin Grittner wrote: >> http://www.merriam-webster.com/dictionary/median >> >> If you do a google search for "median" and poke around, you'll find >> many places where this is the only definition mentioned; the others >> seem to be rather infrequently used. Why not make the commone usage >> convenient? > The reason not to is the same reason that MEDIAN doesn't appear in the > SQL standard, namely that what's common in one field is wrong in > another. Hmm, do you have any *evidence* that that's why it's not in the standard? My own take on that is that it's reasonably probable that the SQL committee might standardize a function by that name someday. What we need to be worrying about is the prospect that if there are multiple definitions for the term, they might pick a different one than we did. A name like "arithmetic_median" seems much less likely to get blindsided by future standardization. regards, tom lane
2010/8/19 David Fetter <david@fetter.org>: > On Thu, Aug 19, 2010 at 12:49:45PM -0400, Robert Haas wrote: >> On Thu, Aug 19, 2010 at 11:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > Greg Stark <gsstark@mit.edu> writes: >> >> On Thu, Aug 19, 2010 at 11:59 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> >>> I am sending a prototype implementation of functions median and >> >>> percentile. This implementation is very simple and I moved it to >> >>> contrib for this moment - it is more easy maintainable. Later I'll >> >>> move it to core. >> > >> >> So if the entire result set fits in memory it would be nice to use the >> >> O(n) Quickselect algorithm -- which would only be a small change to >> >> the existing Quicksort code -- instead of sorting the entire set. >> > >> > That seems like rather a lot of added infrastructure for functions whose >> > popularity is at best unknown. I think we should KISS for the first >> > implementation. >> >> +1. I think the functions are useful, but the perfect is the enemy of the good. > > Percentile is already there as NTILE, a windowing function. Median I don't think it. The purpose is same, but usage is different - aggregate functions can be used as window func, but window functions cannot be used as aggregates. > may be useful, but we pretty much can't just call it "median." > Instead, we need to call it something like "left_median" or > "arithmetic_median." I put some time to deep searching in this area - and I talked with my kolegues mathematican and everywhere use a just median. Some longer or different name isn't barrier for me, but people can be confused. Regards Pavel > > Cheers, > David. > -- > David Fetter <david@fetter.org> http://fetter.org/ > Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter > Skype: davidfetter XMPP: david.fetter@gmail.com > iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate >
2010/8/19 David Fetter <david@fetter.org>: > On Thu, Aug 19, 2010 at 12:12:12PM -0500, Kevin Grittner wrote: >> David Fetter <david@fetter.org> wrote: >> >> > Median may be useful, but we pretty much can't just call it >> > "median." Instead, we need to call it something like "left_median" >> > or "arithmetic_median." >> >> I think it would be reasonable, and perhaps preferable, to use just >> "median" for the semantics described in most dictionaries -- for >> example, this: >> >> http://www.merriam-webster.com/dictionary/median >> >> If you do a google search for "median" and poke around, you'll find >> many places where this is the only definition mentioned; the others >> seem to be rather infrequently used. Why not make the commone usage >> convenient? > > The reason not to is the same reason that MEDIAN doesn't appear in the > SQL standard, namely that what's common in one field is wrong in > another. I think some else. The reason can be more simple - implementation of median is significantly harder then other aggregates. I looked there and Oracle11g use "median" in common sense. Regards Pavel Stehule > > Cheers, > David. > -- > David Fetter <david@fetter.org> http://fetter.org/ > Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter > Skype: davidfetter XMPP: david.fetter@gmail.com > iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate >
On Thu, Aug 19, 2010 at 01:25:36PM -0400, Tom Lane wrote: > David Fetter <david@fetter.org> writes: > > On Thu, Aug 19, 2010 at 12:12:12PM -0500, Kevin Grittner wrote: > >> http://www.merriam-webster.com/dictionary/median > >> > >> If you do a google search for "median" and poke around, you'll find > >> many places where this is the only definition mentioned; the others > >> seem to be rather infrequently used. Why not make the commone usage > >> convenient? > > > The reason not to is the same reason that MEDIAN doesn't appear in the > > SQL standard, namely that what's common in one field is wrong in > > another. > > Hmm, do you have any *evidence* that that's why it's not in the standard? Only from Joe Celko, who was on the standards committee, and has written on the subject. There's nothing I've found in the standard itself for this, if that's what you're looking for. > My own take on that is that it's reasonably probable that the SQL > committee might standardize a function by that name someday. What > we need to be worrying about is the prospect that if there are > multiple definitions for the term, they might pick a different one > than we did. Exactly :) > A name like "arithmetic_median" seems much less likely to get > blindsided by future standardization. Yep. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Pavel Stehule <pavel.stehule@gmail.com> wrote: > I looked there and Oracle11g use "median" in common sense. As does Sybase IQ. FWIW, Excel spreadsheets do, too. The chance of the SQL committee picking some other meaning for bare MEDIAN seems vanishingly small; although I have to grant that with only Oracle and Sybase as a precedent in the SQL world, it's not zero. -Kevin
David Fetter <david@fetter.org> writes: > On Thu, Aug 19, 2010 at 01:25:36PM -0400, Tom Lane wrote: >> A name like "arithmetic_median" seems much less likely to get >> blindsided by future standardization. > Yep. OTOH, if Pavel's right that Oracle already has an aggregate named median(), it seems quite unlikely that the SQL committee would pick a definition incompatible with Oracle's to standardize on. regards, tom lane
2010/8/19 Pavel Stehule <pavel.stehule@gmail.com>: > Hello > > I am sending a prototype implementation of functions median and > percentile. This implementation is very simple and I moved it to > contrib for this moment - it is more easy maintainable. Later I'll > move it to core. I've reviewed this patch. * The patch can apply cleanly and make doesn't print any errors nor warnings. But it doesn't touch contrib/Makefile so I had to make by changing dir to contrib/median. * Cosmetic coding style should be more appropriate, including trailing white spaces and indentation positions. * Since these two aggregates use tuplesort inside it, there're possible risk to cause out of memory in normal use case. I don't think this fact is critical, but at least some notation should be referred in docs. * It doesn't contain any document nor regression tests. * They should be callable in window function context; for example contrib_regression=# select median(a) over (order by a) from t1; ERROR: invalid tuplesort state or at least user-friend message should be printed. * The returned type is controversy. median(int) returns float8 as the patch intended, but avg(int) returns numeric. AFAIK only avg(float8) returns float8. * percentile() is more problematic; first, the second argument for the aggregate takes N of "N%ile" as int, like 50 if you want 50%ile, but it doesn't care about negative values or more than 100. In addition, the second argument is taken at the first non-NULL value of the first argument, but the second argument is semantically constant. For example, for (1.. 10) value of a in table t1, contrib_regression=# select percentile(a, a * 10 order by a) from t1;percentile ------------ 1 (1 row) contrib_regression=# select percentile(a, a * 10 order by a desc) from t1;percentile ------------ 10 (1 row) and if the argument comes from the subquery which doesn't contain ORDER BY clause, you cannot predict the result. That's all of my quick review up to now. Regards, -- Hitoshi Harada
Hello I found probably hard problem in cooperation with window functions :( tuplesort_begin_datum creates child context inside aggcontext. This context is used for tuplestore. But when this function is called from WindowAgg_Aggregates context - someone drops all child context every iteration, and then tuplestore state is invalid. For this moment we can block using a median function as window function, but it should be solved better - if it is possible - I don't see inside window function implementation. 2010/9/20 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/8/19 Pavel Stehule <pavel.stehule@gmail.com>: >> Hello >> >> I am sending a prototype implementation of functions median and >> percentile. This implementation is very simple and I moved it to >> contrib for this moment - it is more easy maintainable. Later I'll >> move it to core. > > I've reviewed this patch. > > * The patch can apply cleanly and make doesn't print any errors nor > warnings. But it doesn't touch contrib/Makefile so I had to make by > changing dir to contrib/median. > fixed > * Cosmetic coding style should be more appropriate, including trailing > white spaces and indentation positions. > can you specify where, please? > * Since these two aggregates use tuplesort inside it, there're > possible risk to cause out of memory in normal use case. I don't think > this fact is critical, but at least some notation should be referred > in docs. > it is same as windows function, no? So the risk is same. > * It doesn't contain any document nor regression tests. > yes, it is my fault, some median regress test appended, documentation I am will sending later - resp. if you agree with current form of patch, I'll finalize this patch and remove "median" to core. I put these functions to contrib just for simple testing of proof concept. > * They should be callable in window function context; for example > > contrib_regression=# select median(a) over (order by a) from t1; > ERROR: invalid tuplesort state > > or at least user-friend message should be printed. > the problem is in windows function implementation - partially fixed - blocked from windows context > * The returned type is controversy. median(int) returns float8 as the > patch intended, but avg(int) returns numeric. AFAIK only avg(float8) > returns float8. > fixed > * percentile() is more problematic; first, the second argument for the > aggregate takes N of "N%ile" as int, like 50 if you want 50%ile, but > it doesn't care about negative values or more than 100. In addition, > the second argument is taken at the first non-NULL value of the first > argument, but the second argument is semantically constant. For > example, for (1.. 10) value of a in table t1, > yes, I understand - I don't have a problem to modify it. Just this has same behave as string_agg. This is again deeper problem - we cannot ensure so some parameter of aggregation function will be a constant. - so it cannot be well solved now :( Have you some idea? There isn't any tool for checking if some parameter is constant or not. I removed percentile now. > contrib_regression=# select percentile(a, a * 10 order by a) from t1; > percentile > ------------ > 1 > (1 row) > > contrib_regression=# select percentile(a, a * 10 order by a desc) from t1; > percentile > ------------ > 10 > (1 row) > > and if the argument comes from the subquery which doesn't contain > ORDER BY clause, you cannot predict the result. > > That's all of my quick review up to now. > > Regards, > > -- > Hitoshi Harada >
Вложения
On Tue, Sep 21, 2010 at 4:28 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> * Cosmetic coding style should be more appropriate, including trailing >> white spaces and indentation positions. > > can you specify where, please? I noticed a lot of problems in this area when working on your \ef / \sf patch, too. Trailing whitespace is nearly always bad, and it's not hard to find - just grep the diff for it. As for indentation, try to match the surrounding code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
2010/9/21 Robert Haas <robertmhaas@gmail.com>: > On Tue, Sep 21, 2010 at 4:28 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> * Cosmetic coding style should be more appropriate, including trailing >>> white spaces and indentation positions. >> >> can you specify where, please? > > I noticed a lot of problems in this area when working on your \ef / > \sf patch, too. Trailing whitespace is nearly always bad, and it's > not hard to find - just grep the diff for it. As for indentation, try > to match the surrounding code. is updated patch better? Pavel > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise Postgres Company >
Вложения
2010/9/22 Pavel Stehule <pavel.stehule@gmail.com>: > 2010/9/21 Robert Haas <robertmhaas@gmail.com>: >> On Tue, Sep 21, 2010 at 4:28 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>> * Cosmetic coding style should be more appropriate, including trailing >>>> white spaces and indentation positions. >>> >>> can you specify where, please? >> >> I noticed a lot of problems in this area when working on your \ef / >> \sf patch, too. Trailing whitespace is nearly always bad, and it's >> not hard to find - just grep the diff for it. As for indentation, try >> to match the surrounding code. > > is updated patch better? Better, but you should be more careful, for example, + tuplesort_performsort(aggstate->sortstate); + <-- 1 + while (tuplesort_getdatum(aggstate->sortstate, + true, + &value, &isNull)) For #1, please remove horizontal tabs in empty lines. Regards, -- Hitoshi Harada
2010/9/22 Pavel Stehule <pavel.stehule@gmail.com>: > Hello > > I found probably hard problem in cooperation with window functions :( > > tuplesort_begin_datum creates child context inside aggcontext. This > context is used for tuplestore. But when this function is called from > WindowAgg_Aggregates context - someone drops all child context every > iteration, and then tuplestore state is invalid. For this moment we > can block using a median function as window function, but it should be > solved better - if it is possible - I don't see inside window function > implementation. Does it happen when the window frame starts from not UNBOUNDED PRECEDING? In those cases, nodeWindowAgg tries to discard all aggregate contexts and to initialize the aggregate state. AFAIK the memory context is brand-new although it was reset and its children deleted, so if the function is well-formed and initializes its state on NULL input, it doesn't cause a problem. Regards, -- Hitoshi Harada
2010/9/22 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/9/22 Pavel Stehule <pavel.stehule@gmail.com>: >> 2010/9/21 Robert Haas <robertmhaas@gmail.com>: >>> On Tue, Sep 21, 2010 at 4:28 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>>> * Cosmetic coding style should be more appropriate, including trailing >>>>> white spaces and indentation positions. >>>> >>>> can you specify where, please? >>> >>> I noticed a lot of problems in this area when working on your \ef / >>> \sf patch, too. Trailing whitespace is nearly always bad, and it's >>> not hard to find - just grep the diff for it. As for indentation, try >>> to match the surrounding code. >> >> is updated patch better? > > Better, but you should be more careful, for example, > > + tuplesort_performsort(aggstate->sortstate); > + <-- 1 > + while (tuplesort_getdatum(aggstate->sortstate, > + true, > + &value, &isNull)) > > For #1, please remove horizontal tabs in empty lines. ok, checked removed Pavel > > Regards, > > > -- > Hitoshi Harada >
Вложения
2010/9/22 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/9/22 Pavel Stehule <pavel.stehule@gmail.com>: >> Hello >> >> I found probably hard problem in cooperation with window functions :( >> >> tuplesort_begin_datum creates child context inside aggcontext. This >> context is used for tuplestore. But when this function is called from >> WindowAgg_Aggregates context - someone drops all child context every >> iteration, and then tuplestore state is invalid. For this moment we >> can block using a median function as window function, but it should be >> solved better - if it is possible - I don't see inside window function >> implementation. > > Does it happen when the window frame starts from not UNBOUNDED > PRECEDING? In those cases, nodeWindowAgg tries to discard all > aggregate contexts and to initialize the aggregate state. AFAIK the > memory context is brand-new although it was reset and its children > deleted, so if the function is well-formed and initializes its state > on NULL input, it doesn't cause a problem. > It is bad premise. Inside a aggregates I don't have a control over memory context. There missing one level of persistent memory context where I can put a child context - but this context should be created outside - some like group context. Pavel > > Regards, > > > -- > Hitoshi Harada >
Hello 2010/9/22 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/9/22 Pavel Stehule <pavel.stehule@gmail.com>: >> Hello >> >> I found probably hard problem in cooperation with window functions :( >> >> tuplesort_begin_datum creates child context inside aggcontext. This >> context is used for tuplestore. But when this function is called from >> WindowAgg_Aggregates context - someone drops all child context every >> iteration, and then tuplestore state is invalid. For this moment we >> can block using a median function as window function, but it should be >> solved better - if it is possible - I don't see inside window function >> implementation. > > Does it happen when the window frame starts from not UNBOUNDED > PRECEDING? In those cases, nodeWindowAgg tries to discard all > aggregate contexts and to initialize the aggregate state. AFAIK the > memory context is brand-new although it was reset and its children > deleted, so if the function is well-formed and initializes its state > on NULL input, it doesn't cause a problem. maybe I was confused. I found a other possible problems. The problem with median function is probably inside a final function implementation. Actually we request possibility of repetitive call of final function. But final function call tuplesort_end function and tuplesort_performsort. These function changes a state of tuplesort. The most basic question is "who has to call tuplesort_end function and when? Regards Pavel Stehule > > > Regards, > > > -- > Hitoshi Harada >
Hello I moved a "median" function to core. + doc part + regress test Regards Pavel Stehule 2010/9/20 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/8/19 Pavel Stehule <pavel.stehule@gmail.com>: >> Hello >> >> I am sending a prototype implementation of functions median and >> percentile. This implementation is very simple and I moved it to >> contrib for this moment - it is more easy maintainable. Later I'll >> move it to core. > > I've reviewed this patch. > > * The patch can apply cleanly and make doesn't print any errors nor > warnings. But it doesn't touch contrib/Makefile so I had to make by > changing dir to contrib/median. > > * Cosmetic coding style should be more appropriate, including trailing > white spaces and indentation positions. > > * Since these two aggregates use tuplesort inside it, there're > possible risk to cause out of memory in normal use case. I don't think > this fact is critical, but at least some notation should be referred > in docs. > > * It doesn't contain any document nor regression tests. > > * They should be callable in window function context; for example > > contrib_regression=# select median(a) over (order by a) from t1; > ERROR: invalid tuplesort state > > or at least user-friend message should be printed. > > * The returned type is controversy. median(int) returns float8 as the > patch intended, but avg(int) returns numeric. AFAIK only avg(float8) > returns float8. > > * percentile() is more problematic; first, the second argument for the > aggregate takes N of "N%ile" as int, like 50 if you want 50%ile, but > it doesn't care about negative values or more than 100. In addition, > the second argument is taken at the first non-NULL value of the first > argument, but the second argument is semantically constant. For > example, for (1.. 10) value of a in table t1, > > contrib_regression=# select percentile(a, a * 10 order by a) from t1; > percentile > ------------ > 1 > (1 row) > > contrib_regression=# select percentile(a, a * 10 order by a desc) from t1; > percentile > ------------ > 10 > (1 row) > > and if the argument comes from the subquery which doesn't contain > ORDER BY clause, you cannot predict the result. > > That's all of my quick review up to now. > > Regards, > > -- > Hitoshi Harada >
Вложения
sorry little bit fixed patch Pavel 2010/9/23 Pavel Stehule <pavel.stehule@gmail.com>: > Hello > > I moved a "median" function to core. > > + doc part > + regress test > > Regards > > Pavel Stehule > > > 2010/9/20 Hitoshi Harada <umi.tanuki@gmail.com>: >> 2010/8/19 Pavel Stehule <pavel.stehule@gmail.com>: >>> Hello >>> >>> I am sending a prototype implementation of functions median and >>> percentile. This implementation is very simple and I moved it to >>> contrib for this moment - it is more easy maintainable. Later I'll >>> move it to core. >> >> I've reviewed this patch. >> >> * The patch can apply cleanly and make doesn't print any errors nor >> warnings. But it doesn't touch contrib/Makefile so I had to make by >> changing dir to contrib/median. >> >> * Cosmetic coding style should be more appropriate, including trailing >> white spaces and indentation positions. >> >> * Since these two aggregates use tuplesort inside it, there're >> possible risk to cause out of memory in normal use case. I don't think >> this fact is critical, but at least some notation should be referred >> in docs. >> >> * It doesn't contain any document nor regression tests. >> >> * They should be callable in window function context; for example >> >> contrib_regression=# select median(a) over (order by a) from t1; >> ERROR: invalid tuplesort state >> >> or at least user-friend message should be printed. >> >> * The returned type is controversy. median(int) returns float8 as the >> patch intended, but avg(int) returns numeric. AFAIK only avg(float8) >> returns float8. >> >> * percentile() is more problematic; first, the second argument for the >> aggregate takes N of "N%ile" as int, like 50 if you want 50%ile, but >> it doesn't care about negative values or more than 100. In addition, >> the second argument is taken at the first non-NULL value of the first >> argument, but the second argument is semantically constant. For >> example, for (1.. 10) value of a in table t1, >> >> contrib_regression=# select percentile(a, a * 10 order by a) from t1; >> percentile >> ------------ >> 1 >> (1 row) >> >> contrib_regression=# select percentile(a, a * 10 order by a desc) from t1; >> percentile >> ------------ >> 10 >> (1 row) >> >> and if the argument comes from the subquery which doesn't contain >> ORDER BY clause, you cannot predict the result. >> >> That's all of my quick review up to now. >> >> Regards, >> >> -- >> Hitoshi Harada >> >
Вложения
2010/9/23 Pavel Stehule <pavel.stehule@gmail.com>: > Hello > > 2010/9/22 Hitoshi Harada <umi.tanuki@gmail.com>: >> 2010/9/22 Pavel Stehule <pavel.stehule@gmail.com>: >>> Hello >>> >>> I found probably hard problem in cooperation with window functions :( > > maybe I was confused. I found a other possible problems. > > The problem with median function is probably inside a final function > implementation. Actually we request possibility of repetitive call of > final function. But final function call tuplesort_end function and > tuplesort_performsort. These function changes a state of tuplesort. > The most basic question is "who has to call tuplesort_end function and > when? Reading the comment in array_userfuncs.c, array_agg_finalfn() doesn't clean up its internal state at all and tells it's the executor's responsibility to clear memory. It is allowed since ArrayBuildState is only in-memory state. In the other hand, TupleSort should be cleared by calling tuplesort_end() if it has tapeset member (on file based sort) to close physical files. So 2 or 3 ways to go in my mind: 1. call tuplesort_begin_datum with INT_MAX workMem rather than the global work_mem, to avoid it spills out sort state to files. It may sounds dangerous, but actually memory exhausting can happen in array_agg() as well. 2. add TupleSort an argument that tells not to use file at all. This results in the same as #1 but more generic approach. 3. don't use tuplesort in median() but implement its original sort management. This looks quite redundant and like maintenance problem. #2 sounds like the best in generic and consistent way. The only point is whether the change is worth for implementing median() as it's very system-wide common fundamentals. Other options? Regards, -- Hitoshi Harada
2010/9/23 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/9/23 Pavel Stehule <pavel.stehule@gmail.com>: >> Hello >> >> 2010/9/22 Hitoshi Harada <umi.tanuki@gmail.com>: >>> 2010/9/22 Pavel Stehule <pavel.stehule@gmail.com>: >>>> Hello >>>> >>>> I found probably hard problem in cooperation with window functions :( >> >> maybe I was confused. I found a other possible problems. >> >> The problem with median function is probably inside a final function >> implementation. Actually we request possibility of repetitive call of >> final function. But final function call tuplesort_end function and >> tuplesort_performsort. These function changes a state of tuplesort. >> The most basic question is "who has to call tuplesort_end function and >> when? > > Reading the comment in array_userfuncs.c, array_agg_finalfn() doesn't > clean up its internal state at all and tells it's the executor's > responsibility to clear memory. It is allowed since ArrayBuildState is > only in-memory state. In the other hand, TupleSort should be cleared > by calling tuplesort_end() if it has tapeset member (on file based > sort) to close physical files. > > So 2 or 3 ways to go in my mind: it is little bit worse - we cannot to call tuplesort_performsort repetitive. > > 1. call tuplesort_begin_datum with INT_MAX workMem rather than the > global work_mem, to avoid it spills out sort state to files. It may > sounds dangerous, but actually memory exhausting can happen in > array_agg() as well. > > 2. add TupleSort an argument that tells not to use file at all. This > results in the same as #1 but more generic approach. > > 3. don't use tuplesort in median() but implement its original sort > management. This looks quite redundant and like maintenance problem. > > #2 sounds like the best in generic and consistent way. The only point > is whether the change is worth for implementing median() as it's very > system-wide common fundamentals. > > Other options? #4 block median under window clause #5 use a C array instead tuplesort under window clause. It is very unpractical to use a windows clauses with large datasets, so it should not be a problem. More, this can be very quick, because for C array we can use a qsort function. Now I prefer #5 - it can be fast for using inside windows clause and safe when window clause will not be used. Regards Pavel > > > Regards, > -- > Hitoshi Harada >
On Thu, Sep 23, 2010 at 08:27:38PM +0200, Pavel Stehule wrote: > 2010/9/23 Hitoshi Harada <umi.tanuki@gmail.com>: > > 2010/9/23 Pavel Stehule <pavel.stehule@gmail.com>: > >> Hello > >> > >> 2010/9/22 Hitoshi Harada <umi.tanuki@gmail.com>: > >>> 2010/9/22 Pavel Stehule <pavel.stehule@gmail.com>: > >>>> Hello > >>>> > >>>> I found probably hard problem in cooperation with window functions :( > >> > >> maybe I was confused. I found a other possible problems. > >> > >> The problem with median function is probably inside a final function > >> implementation. Actually we request possibility of repetitive call of > >> final function. But final function call tuplesort_end function and > >> tuplesort_performsort. These function changes a state of tuplesort. > >> The most basic question is "who has to call tuplesort_end function and > >> when? > > > > Reading the comment in array_userfuncs.c, array_agg_finalfn() doesn't > > clean up its internal state at all and tells it's the executor's > > responsibility to clear memory. It is allowed since ArrayBuildState is > > only in-memory state. In the other hand, TupleSort should be cleared > > by calling tuplesort_end() if it has tapeset member (on file based > > sort) to close physical files. > > > > So 2 or 3 ways to go in my mind: > > it is little bit worse - we cannot to call tuplesort_performsort repetitive. > > > > > 1. call tuplesort_begin_datum with INT_MAX workMem rather than the > > global work_mem, to avoid it spills out sort state to files. It may > > sounds dangerous, but actually memory exhausting can happen in > > array_agg() as well. > > > > 2. add TupleSort an argument that tells not to use file at all. This > > results in the same as #1 but more generic approach. > > > > 3. don't use tuplesort in median() but implement its original sort > > management. This looks quite redundant and like maintenance problem. > > > > #2 sounds like the best in generic and consistent way. The only point > > is whether the change is worth for implementing median() as it's very > > system-wide common fundamentals. > > > > Other options? > > #4 block median under window clause > > #5 use a C array instead tuplesort under window clause. It is very > unpractical to use a windows clauses with large datasets, so it should > not be a problem. More, this can be very quick, because for C array we > can use a qsort function. > > Now I prefer #5 - it can be fast for using inside windows clause and > safe when window clause will not be used. If there's some way to do this using the same code in the windowing and non-windowing case, that would be much, much better from an architectural point of view. Single Point of Truth and all that. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
2010/9/23 David Fetter <david@fetter.org>: > On Thu, Sep 23, 2010 at 08:27:38PM +0200, Pavel Stehule wrote: >> 2010/9/23 Hitoshi Harada <umi.tanuki@gmail.com>: >> > 2010/9/23 Pavel Stehule <pavel.stehule@gmail.com>: >> >> Hello >> >> >> >> 2010/9/22 Hitoshi Harada <umi.tanuki@gmail.com>: >> >>> 2010/9/22 Pavel Stehule <pavel.stehule@gmail.com>: >> >>>> Hello >> >>>> >> >>>> I found probably hard problem in cooperation with window functions :( >> >> >> >> maybe I was confused. I found a other possible problems. >> >> >> >> The problem with median function is probably inside a final function >> >> implementation. Actually we request possibility of repetitive call of >> >> final function. But final function call tuplesort_end function and >> >> tuplesort_performsort. These function changes a state of tuplesort. >> >> The most basic question is "who has to call tuplesort_end function and >> >> when? >> > >> > Reading the comment in array_userfuncs.c, array_agg_finalfn() doesn't >> > clean up its internal state at all and tells it's the executor's >> > responsibility to clear memory. It is allowed since ArrayBuildState is >> > only in-memory state. In the other hand, TupleSort should be cleared >> > by calling tuplesort_end() if it has tapeset member (on file based >> > sort) to close physical files. >> > >> > So 2 or 3 ways to go in my mind: >> >> it is little bit worse - we cannot to call tuplesort_performsort repetitive. >> >> > >> > 1. call tuplesort_begin_datum with INT_MAX workMem rather than the >> > global work_mem, to avoid it spills out sort state to files. It may >> > sounds dangerous, but actually memory exhausting can happen in >> > array_agg() as well. >> > >> > 2. add TupleSort an argument that tells not to use file at all. This >> > results in the same as #1 but more generic approach. >> > >> > 3. don't use tuplesort in median() but implement its original sort >> > management. This looks quite redundant and like maintenance problem. >> > >> > #2 sounds like the best in generic and consistent way. The only point >> > is whether the change is worth for implementing median() as it's very >> > system-wide common fundamentals. >> > >> > Other options? >> >> #4 block median under window clause >> >> #5 use a C array instead tuplesort under window clause. It is very >> unpractical to use a windows clauses with large datasets, so it should >> not be a problem. More, this can be very quick, because for C array we >> can use a qsort function. >> >> Now I prefer #5 - it can be fast for using inside windows clause and >> safe when window clause will not be used. > > If there's some way to do this using the same code in the windowing > and non-windowing case, that would be much, much better from an > architectural point of view. Single Point of Truth and all that. We can have a median with support a window clause, but limited to work_mem, or we can have a unlimited median, but without window clause. I think, I am able to minimalize a code duplicity - just to define some envelope over tuplesort. The unique code isn't possible there - minimally now we have a two variants - one for numeric result and second for double. But it is usual - try to look how much AVG functions are in core. Regards Pavel > > Cheers, > David. > -- > David Fetter <david@fetter.org> http://fetter.org/ > Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter > Skype: davidfetter XMPP: david.fetter@gmail.com > iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate >
Hello, there is updated version - with support of window clause. The limits are work_mem for using inside window aggregate or unlimited when it is used as standard query. This patch needs a few work - can share a compare functionality with tuplesort.c, but I would to verify a concept now. Comments? Regards Pavel
Вложения
2010/9/26 Pavel Stehule <pavel.stehule@gmail.com>: > Hello, > > there is updated version - with support of window clause. The limits > are work_mem for using inside window aggregate or unlimited when it is > used as standard query. > > This patch needs a few work - can share a compare functionality with > tuplesort.c, but I would to verify a concept now. > > Comments? Sorry for delay. I read the patch and it seems the result is sane. For window function calls, I agree that the current tuplesort is not enough to implement median functions and the patch introduces its own memsort mechanism, although memsort has too much copied from tuplesort. It looks to me not so difficult to modify the existing tuplesort to guarantee staying in memory always if an option to do so is specified from caller. I think that option can be used by other cases in the core code. Regards, -- Hitoshi Harada
Hello 2010/10/1 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/9/26 Pavel Stehule <pavel.stehule@gmail.com>: >> Hello, >> >> there is updated version - with support of window clause. The limits >> are work_mem for using inside window aggregate or unlimited when it is >> used as standard query. >> >> This patch needs a few work - can share a compare functionality with >> tuplesort.c, but I would to verify a concept now. >> >> Comments? > > Sorry for delay. I read the patch and it seems the result is sane. For > window function calls, I agree that the current tuplesort is not > enough to implement median functions and the patch introduces its own > memsort mechanism, although memsort has too much copied from > tuplesort. It looks to me not so difficult to modify the existing > tuplesort to guarantee staying in memory always if an option to do so > is specified from caller. I think that option can be used by other > cases in the core code. There are two factors - a) tuplesorts should to uses only memory, b) data can be sorted again and again. I'll look on possible tuplesort modifications on weekend. Pavel > > Regards, > > > -- > Hitoshi Harada >
Hitoshi Harada <umi.tanuki@gmail.com> writes: > 2010/9/26 Pavel Stehule <pavel.stehule@gmail.com>: >> This patch needs a few work - can share a compare functionality with >> tuplesort.c, but I would to verify a concept now. > Sorry for delay. I read the patch and it seems the result is sane. For > window function calls, I agree that the current tuplesort is not > enough to implement median functions and the patch introduces its own > memsort mechanism, although memsort has too much copied from > tuplesort. It looks to me not so difficult to modify the existing > tuplesort to guarantee staying in memory always if an option to do so > is specified from caller. I think that option can be used by other > cases in the core code. If this patch tries to force the entire sort to happen in memory, it is not committable. What will happen when you get a lot of data? You need to be working on a variant that will work anyway, not working on an unacceptable lobotomization of the main sort code. regards, tom lane
2010/10/1 Tom Lane <tgl@sss.pgh.pa.us>: > Hitoshi Harada <umi.tanuki@gmail.com> writes: >> 2010/9/26 Pavel Stehule <pavel.stehule@gmail.com>: >>> This patch needs a few work - can share a compare functionality with >>> tuplesort.c, but I would to verify a concept now. > >> Sorry for delay. I read the patch and it seems the result is sane. For >> window function calls, I agree that the current tuplesort is not >> enough to implement median functions and the patch introduces its own >> memsort mechanism, although memsort has too much copied from >> tuplesort. It looks to me not so difficult to modify the existing >> tuplesort to guarantee staying in memory always if an option to do so >> is specified from caller. I think that option can be used by other >> cases in the core code. > > If this patch tries to force the entire sort to happen in memory, > it is not committable. What will happen when you get a lot of data? > You need to be working on a variant that will work anyway, not working > on an unacceptable lobotomization of the main sort code. The median function checking a calling context - under window aggregate uses a just memory sort solution - and under standard aggregate context it uses a full tuplesort. It bases on request of window aggregate function - the final function have to be called more time - and it isn't possible with tuplesort. So as window aggregate it uses just memory sort limited with work_mem. Other usage is unlimited. Second option was a block median function under window aggregates. It this design possible? We cannot use any complex source under window aggregates, because there isn't any way to unlink it. Regards Pavel > > regards, tom lane >
2010/10/1 Tom Lane <tgl@sss.pgh.pa.us>: > Hitoshi Harada <umi.tanuki@gmail.com> writes: >> 2010/9/26 Pavel Stehule <pavel.stehule@gmail.com>: >>> This patch needs a few work - can share a compare functionality with >>> tuplesort.c, but I would to verify a concept now. > >> Sorry for delay. I read the patch and it seems the result is sane. For >> window function calls, I agree that the current tuplesort is not >> enough to implement median functions and the patch introduces its own >> memsort mechanism, although memsort has too much copied from >> tuplesort. It looks to me not so difficult to modify the existing >> tuplesort to guarantee staying in memory always if an option to do so >> is specified from caller. I think that option can be used by other >> cases in the core code. > > If this patch tries to force the entire sort to happen in memory, > it is not committable. What will happen when you get a lot of data? > You need to be working on a variant that will work anyway, not working > on an unacceptable lobotomization of the main sort code. What about array_agg()? Doesn't it exceed memory even if the huge data come in? -- Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> writes: > 2010/10/1 Tom Lane <tgl@sss.pgh.pa.us>: >> If this patch tries to force the entire sort to happen in memory, >> it is not committable. > What about array_agg()? Doesn't it exceed memory even if the huge data come in? Yeah, but for array_agg the user should be expecting a result of approximately the size of the whole input, so if he overruns memory he hasn't got a lot of room to complain. There is no reason for a user to expect that median or percentile will fall over on large input, and every reason to expect them to be more robust than that. regards, tom lane
On Fri, Oct 1, 2010 at 10:11 AM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: > 2010/10/1 Tom Lane <tgl@sss.pgh.pa.us>: >> Hitoshi Harada <umi.tanuki@gmail.com> writes: >>> 2010/9/26 Pavel Stehule <pavel.stehule@gmail.com>: >>>> This patch needs a few work - can share a compare functionality with >>>> tuplesort.c, but I would to verify a concept now. >> >>> Sorry for delay. I read the patch and it seems the result is sane. For >>> window function calls, I agree that the current tuplesort is not >>> enough to implement median functions and the patch introduces its own >>> memsort mechanism, although memsort has too much copied from >>> tuplesort. It looks to me not so difficult to modify the existing >>> tuplesort to guarantee staying in memory always if an option to do so >>> is specified from caller. I think that option can be used by other >>> cases in the core code. >> >> If this patch tries to force the entire sort to happen in memory, >> it is not committable. What will happen when you get a lot of data? >> You need to be working on a variant that will work anyway, not working >> on an unacceptable lobotomization of the main sort code. > > What about array_agg()? Doesn't it exceed memory even if the huge data come in? So, if you have 512MB of RAM in the box and you build and return a 1GB array, it's going to be a problem. Period, full stop. The interim memory consumption cannot be less than the size of the final result. If you have 512MB of RAM in the box and you want to aggregate 1GB of data and return a 4 byte integer, it's only a problem if your implementation is bad. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
2010/10/1 Tom Lane <tgl@sss.pgh.pa.us>: > Hitoshi Harada <umi.tanuki@gmail.com> writes: >> 2010/10/1 Tom Lane <tgl@sss.pgh.pa.us>: >>> If this patch tries to force the entire sort to happen in memory, >>> it is not committable. > >> What about array_agg()? Doesn't it exceed memory even if the huge data come in? > > Yeah, but for array_agg the user should be expecting a result of > approximately the size of the whole input, so if he overruns memory he > hasn't got a lot of room to complain. There is no reason for a user to > expect that median or percentile will fall over on large input, and > every reason to expect them to be more robust than that. So it's particular problem of *median* but not the idea of on-memory-guaranteed tuplesort. If this way is not committable, one of alternatives is to implement median as a window function rather than an aggregate. But the big problem of this is that it's impossible to have two same-input-same-name functions of aggregate and window. AFAIK they are ambiguous at parser stage. So we have to have median() for aggregate and something like median_w() over (). This is worse idea, I feel. Another way is to modify nodeWindowAgg in some way, but I cannot wrap up how to. To call some destructor in the end of partition somehow, but this is out of the current aggregate system. The bottom-line is to throw an error from median in window aggregate, but personally I would like to see median in window aggregate, which is quite smart. Another suggestion? Regards, -- Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> writes: > Another suggestion? The implementation I would've expected to see is to do the sort and then have two code paths for retrieving the median, depending on whether the sort result is all in memory or not. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hitoshi Harada <umi.tanuki@gmail.com> writes: >> Another suggestion? > > The implementation I would've expected to see is to do the sort > and then have two code paths for retrieving the median, depending > on whether the sort result is all in memory or not. Would it make sense to accumulate value/count pairs in a hash table, along with a total count, as the tuples are encountered, and sort the (potentially smaller) hash table at the end? (Not that this helps with the memory management questions...) Large sets with any significant degree of duplication in values (say the age in years of residents of a state) would probably run significantly faster this way. -Kevin
2010/10/2 Tom Lane <tgl@sss.pgh.pa.us>: > Hitoshi Harada <umi.tanuki@gmail.com> writes: >> Another suggestion? > > The implementation I would've expected to see is to do the sort and then > have two code paths for retrieving the median, depending on whether the > sort result is all in memory or not. > Hm? The problem we encountered in the middle of the patch is there is no chance to call tuplesort_end if median is called in moving frame window aggregate because final function is called multiple times during moving. The nearest pattern was array_agg() which uses on-memory state and throw its responsibility to clear memory to executor (nodeAgg / nodeWindowAgg). Am I missing something? Regards, -- Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> writes: > 2010/10/2 Tom Lane <tgl@sss.pgh.pa.us>: >> The implementation I would've expected to see is to do the sort and then >> have two code paths for retrieving the median, depending on whether the >> sort result is all in memory or not. > Hm? The problem we encountered in the middle of the patch is there is > no chance to call tuplesort_end if median is called in moving frame > window aggregate because final function is called multiple times > during moving. Well, if you haven't got a solution for that, then this patch isn't ready for prime time. It's entirely possible that median as a window function is intractable. I'd rather have it throwing error than offer an implementation that will fall over as soon as the window gets large. regards, tom lane
2010/10/2 Tom Lane <tgl@sss.pgh.pa.us>: > Hitoshi Harada <umi.tanuki@gmail.com> writes: >> 2010/10/2 Tom Lane <tgl@sss.pgh.pa.us>: >>> The implementation I would've expected to see is to do the sort and then >>> have two code paths for retrieving the median, depending on whether the >>> sort result is all in memory or not. > >> Hm? The problem we encountered in the middle of the patch is there is >> no chance to call tuplesort_end if median is called in moving frame >> window aggregate because final function is called multiple times >> during moving. > > Well, if you haven't got a solution for that, then this patch isn't > ready for prime time. > > It's entirely possible that median as a window function is intractable. > I'd rather have it throwing error than offer an implementation that will > fall over as soon as the window gets large. Well, that sounds like the conclusion. It is a shame, but we have to throw an error from median() in the window aggregate, if Pavel does not have any better solution. And as an aggregate function only, the patch is ready if the window-related parts are removed. Regards, -- Hitoshi Harada
2010/10/2 Kevin Grittner <Kevin.Grittner@wicourts.gov>: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Hitoshi Harada <umi.tanuki@gmail.com> writes: >>> Another suggestion? >> >> The implementation I would've expected to see is to do the sort >> and then have two code paths for retrieving the median, depending >> on whether the sort result is all in memory or not. > > Would it make sense to accumulate value/count pairs in a hash table, Maybe, but it still has memory problem if the values vary, right? And I'm not familiar with the algorithm of median other than what the current patch does, so I'm not sure if hash table solution can be made easily. Regards, -- Hitoshi Harada
2010/10/1 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/10/2 Tom Lane <tgl@sss.pgh.pa.us>: >> Hitoshi Harada <umi.tanuki@gmail.com> writes: >>> 2010/10/2 Tom Lane <tgl@sss.pgh.pa.us>: >>>> The implementation I would've expected to see is to do the sort and then >>>> have two code paths for retrieving the median, depending on whether the >>>> sort result is all in memory or not. >> >>> Hm? The problem we encountered in the middle of the patch is there is >>> no chance to call tuplesort_end if median is called in moving frame >>> window aggregate because final function is called multiple times >>> during moving. >> >> Well, if you haven't got a solution for that, then this patch isn't >> ready for prime time. >> >> It's entirely possible that median as a window function is intractable. >> I'd rather have it throwing error than offer an implementation that will >> fall over as soon as the window gets large. > > Well, that sounds like the conclusion. It is a shame, but we have to > throw an error from median() in the window aggregate, if Pavel does > not have any better solution. And as an aggregate function only, the > patch is ready if the window-related parts are removed. > I am sorry - I don't have a better solution. Classic algorithm isn't well for window aggregate - it needs a sort after any append a new item. Maybe we can use a separate functionality based on estimated values for a windows. I read some articles about it. But this is work on longer time - all articles about this topic are experimental. More I am not mathematician - so I am not able to review these methods. Today or tomorrow I'll send a updated patch without support a window aggregates. Regards Pavel Stehule > Regards, > > > -- > Hitoshi Harada >
Hello updated version * memsort removed * window aggregate support blocked Regards Pavel 2010/10/1 Pavel Stehule <pavel.stehule@gmail.com>: > 2010/10/1 Hitoshi Harada <umi.tanuki@gmail.com>: >> 2010/10/2 Tom Lane <tgl@sss.pgh.pa.us>: >>> Hitoshi Harada <umi.tanuki@gmail.com> writes: >>>> 2010/10/2 Tom Lane <tgl@sss.pgh.pa.us>: >>>>> The implementation I would've expected to see is to do the sort and then >>>>> have two code paths for retrieving the median, depending on whether the >>>>> sort result is all in memory or not. >>> >>>> Hm? The problem we encountered in the middle of the patch is there is >>>> no chance to call tuplesort_end if median is called in moving frame >>>> window aggregate because final function is called multiple times >>>> during moving. >>> >>> Well, if you haven't got a solution for that, then this patch isn't >>> ready for prime time. >>> >>> It's entirely possible that median as a window function is intractable. >>> I'd rather have it throwing error than offer an implementation that will >>> fall over as soon as the window gets large. >> >> Well, that sounds like the conclusion. It is a shame, but we have to >> throw an error from median() in the window aggregate, if Pavel does >> not have any better solution. And as an aggregate function only, the >> patch is ready if the window-related parts are removed. >> > > I am sorry - I don't have a better solution. Classic algorithm isn't > well for window aggregate - it needs a sort after any append a new > item. Maybe we can use a separate functionality based on estimated > values for a windows. I read some articles about it. But this is work > on longer time - all articles about this topic are experimental. More > I am not mathematician - so I am not able to review these methods. > Today or tomorrow I'll send a updated patch without support a window > aggregates. > > Regards > > Pavel Stehule > >> Regards, >> >> >> -- >> Hitoshi Harada >> >
Вложения
2010/10/2 Pavel Stehule <pavel.stehule@gmail.com>: > Hello > > updated version > * memsort removed > * window aggregate support blocked I ran this patch and it looks good for committer. Just one thing, in src/backend/utils/adt/Makefile median.o to compile the new file is missing. And I'm now thinking about how to make median happen in window aggregate. In fact we might fix parse_func.c to distinguish the same name aggregate and window functions and median() can be implemented separately from aggregate one. But it's another story than this patch. For now we can commit the median aggregate only, without window function support. Regards, -- Hitoshi Harada
On Sun, Oct 3, 2010 at 7:06 AM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: > And I'm now thinking about how to make median happen in window > aggregate. If you were to do this by extending tuplesort what extra features would tuplesort need? Would tuplesort need the ability to insert additional records into an already sorted array and maintain the sort? Would it need the ability to remove records from the sorted set? Would it need the ability to do a partial sort (the QuickSelect algorithm)? The ability to do random access on disk sets? How do existing windowed aggregates work if you specify an order by on the aggregate? Do they resort for every output row? Does the spec give a way to run an arbitrary subselect on the current window? I wonder if we need more powerful machinery anyways to handle these cases. -- greg
Hello 2010/10/3 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/10/2 Pavel Stehule <pavel.stehule@gmail.com>: >> Hello >> >> updated version >> * memsort removed >> * window aggregate support blocked > > I ran this patch and it looks good for committer. > Just one thing, in src/backend/utils/adt/Makefile median.o to compile > the new file is missing. > > And I'm now thinking about how to make median happen in window > aggregate. In fact we might fix parse_func.c to distinguish the same > name aggregate and window functions and median() can be implemented > separately from aggregate one. But it's another story than this patch. > For now we can commit the median aggregate only, without window > function support. > > Regards, > Thank you very much for review fixed patch + missing Makefile Regards Pavel Stehule > > -- > Hitoshi Harada >
Вложения
2010/10/4 Greg Stark <gsstark@mit.edu>: > On Sun, Oct 3, 2010 at 7:06 AM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: >> And I'm now thinking about how to make median happen in window >> aggregate. > > If you were to do this by extending tuplesort what extra features > would tuplesort need? I don't think we need to extend tuplesort when we do it. IMO this can make happen by fetching all the values in the frame and performing sort once, then getting current position and calculate median. The only thing I consider is to mark and to go to the demanded position in tuplesort like tuplestore, but actually we can manage to median() without such facilities. > How do existing windowed aggregates work if you specify an order by on > the aggregate? Do they resort for every output row? I don't know the true specification of median() nor the behavior of it in other RDBs, but I bet median is median and ORDER BY in window clause doesn't affect its result. ORDER BY specifies only the frame boundary. Regards, -- Hitoshi Harada
On 4 October 2010 07:36, Hitoshi Harada <umi.tanuki@gmail.com> wrote: > 2010/10/4 Greg Stark <gsstark@mit.edu>: >> On Sun, Oct 3, 2010 at 7:06 AM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: >>> And I'm now thinking about how to make median happen in window >>> aggregate. >> >> If you were to do this by extending tuplesort what extra features >> would tuplesort need? > > I don't think we need to extend tuplesort when we do it. IMO this can > make happen by fetching all the values in the frame and performing > sort once, then getting current position and calculate median. The > only thing I consider is to mark and to go to the demanded position in > tuplesort like tuplestore, but actually we can manage to median() > without such facilities. > I dont' think that works, because the ORDER BY in the WINDOW doesn't necessarily match the sort order that the median function needs. Consider for example: select a, median(a) over(order by a desc) from generate_series(1,10) as g(a);a | median ----+--------------------10 | 10 9 | 9.5000000000000000 8 | 9 7 | 8.5000000000000000 6 | 8 5 | 7.5000000000000000 4 | 7 3 | 6.5000000000000000 2 | 6 1 | 5.5000000000000000 (10 rows) That requires a new sort for each row. I generated this with a minor tweak to Pavel's patch to just restart the tuplesort each time (the "quick-fix" solution). The problem is that performance really sucks, because it is an O(n^2 log(n)) algorithm. I don't see an easy way around that without significant new infrastructure, as Greg describes, or a completely different algorithm. >> How do existing windowed aggregates work if you specify an order by on >> the aggregate? Do they resort for every output row? > Isn't the issue that they don't need to sort, so they all work unchanged. Regards, Dean > I don't know the true specification of median() nor the behavior of it > in other RDBs, but I bet median is median and ORDER BY in window > clause doesn't affect its result. ORDER BY specifies only the frame > boundary. > > Regards, > > > > -- > Hitoshi Harada > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
On Sun, Oct 3, 2010 at 11:58 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > The problem is that performance really sucks, > because it is an O(n^2 log(n)) algorithm. I don't see an easy way > around that without significant new infrastructure, as Greg describes, > or a completely different algorithm. Resorting for each record would be O(n^2 log(n)). If you use Quickselect it would be O(n^2) because each selection would be O(n). But then we could get O(n^2) by just doing insertion-sort. The problem with both these algorithms is that I don't see how to do it on-disk. Maybe there would be some way to do insertion-sort on disk by keeping a buffer in-memory holding the last n records inserted and whenever the in-memory buffer fills do a merge against the data on disk to spill it. But that's a lot of special-case machinery. Personally I think if we need it to handle sorted aggregates over window functions it's worth it to carry it if someone feels like writing it. -- greg
On Mon, Oct 4, 2010 at 2:58 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > That requires a new sort for each row. I generated this with a minor > tweak to Pavel's patch to just restart the tuplesort each time (the > "quick-fix" solution). The problem is that performance really sucks, > because it is an O(n^2 log(n)) algorithm. Maybe that's OK. If you're doing repeated median operations on large data sets, perhaps you should expect that to be slow. I bet that people who want to use this as a window function will want one median per group, not n medians per group; and it doesn't seem like a good idea to say - we're not going to let you use this as a window function AT ALL because you might decide to do something that will be really slow. You can always hit ^C if you get tired of waiting. This seems like it's very far from being the most important thing for us to optimize, though of course it's great if we can. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On 4 October 2010 18:22, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Oct 4, 2010 at 2:58 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> That requires a new sort for each row. I generated this with a minor >> tweak to Pavel's patch to just restart the tuplesort each time (the >> "quick-fix" solution). The problem is that performance really sucks, >> because it is an O(n^2 log(n)) algorithm. > > Maybe that's OK. If you're doing repeated median operations on large > data sets, perhaps you should expect that to be slow. I bet that > people who want to use this as a window function will want one median > per group, not n medians per group; and it doesn't seem like a good > idea to say - we're not going to let you use this as a window function > AT ALL because you might decide to do something that will be really > slow. You can always hit ^C if you get tired of waiting. This seems > like it's very far from being the most important thing for us to > optimize, though of course it's great if we can. > Well FWIW, here's the quick-fix solution to make this median function support window queries. It's OK if all you want is one median per partition, but otherwise it's pretty ugly. It will copy the entire tuplesort for each row in a running median, which is horrible, but that's actually negligible compared to the subsequent sort that's going to happen for that row. I'd actually be tempted to force the tuplesort to stay in memory for a running median, on the grounds that performance will be so bad for a large partition that you'd run out of patience waiting for it long before it would run out of memory :-( Regards, Dean > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise Postgres Company >
Вложения
2010/10/5 Dean Rasheed <dean.a.rasheed@gmail.com>: > On 4 October 2010 18:22, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Oct 4, 2010 at 2:58 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >>> That requires a new sort for each row. I generated this with a minor >>> tweak to Pavel's patch to just restart the tuplesort each time (the >>> "quick-fix" solution). The problem is that performance really sucks, >>> because it is an O(n^2 log(n)) algorithm. >> >> Maybe that's OK. If you're doing repeated median operations on large >> data sets, perhaps you should expect that to be slow. I bet that >> people who want to use this as a window function will want one median >> per group, not n medians per group; and it doesn't seem like a good >> idea to say - we're not going to let you use this as a window function >> AT ALL because you might decide to do something that will be really >> slow. You can always hit ^C if you get tired of waiting. This seems >> like it's very far from being the most important thing for us to >> optimize, though of course it's great if we can. >> > > Well FWIW, here's the quick-fix solution to make this median function > support window queries. Is this safe? It seems to me that the tuplesort_end isn't called in window aggregates at the last of a partition, which results in forgetting to close temp file if tuplesort is in state of tape. Regards, -- Hitoshi Harada
On 5 October 2010 07:04, Hitoshi Harada <umi.tanuki@gmail.com> wrote: > 2010/10/5 Dean Rasheed <dean.a.rasheed@gmail.com>: >> On 4 October 2010 18:22, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Mon, Oct 4, 2010 at 2:58 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >>>> That requires a new sort for each row. I generated this with a minor >>>> tweak to Pavel's patch to just restart the tuplesort each time (the >>>> "quick-fix" solution). The problem is that performance really sucks, >>>> because it is an O(n^2 log(n)) algorithm. >>> >>> Maybe that's OK. If you're doing repeated median operations on large >>> data sets, perhaps you should expect that to be slow. I bet that >>> people who want to use this as a window function will want one median >>> per group, not n medians per group; and it doesn't seem like a good >>> idea to say - we're not going to let you use this as a window function >>> AT ALL because you might decide to do something that will be really >>> slow. You can always hit ^C if you get tired of waiting. This seems >>> like it's very far from being the most important thing for us to >>> optimize, though of course it's great if we can. >>> >> >> Well FWIW, here's the quick-fix solution to make this median function >> support window queries. > > Is this safe? It seems to me that the tuplesort_end isn't called in > window aggregates at the last of a partition, which results in > forgetting to close temp file if tuplesort is in state of tape. > Yeah, you're right. Actually I hate doing it this way, and only really suggested it because I like it even less to add an aggregate that arbitrarily doesn't support window queries. This approach will at least work well for arbitrary sized partitions without an ORDER BY clause. With an ORDER BY clause though, even in the best-case situation (values in the partition already sorted), this is going to be O(n^2) for a running median, but it seems like it would be significantly more work to come up with a better algorithm, and I'm not sure that there is sufficient demand for this function to justify that. Extrapolating from few quick timing tests, even in the best case, on my machine, it would take 7 days for the running median to use up 100MB, and 8 years for it to use 2GB. So setting the tuplesort's workMem to 2GB (only in the running median case) would actually be safe in practice, and would prevent the temp file leak (for a few years at least!). I feel dirty even suggesting that. Better ideas anyone? Regards, Dean
2010/10/5 Dean Rasheed <dean.a.rasheed@gmail.com>: > On 5 October 2010 07:04, Hitoshi Harada <umi.tanuki@gmail.com> wrote: > Extrapolating from few quick timing tests, even in the best case, on > my machine, it would take 7 days for the running median to use up > 100MB, and 8 years for it to use 2GB. So setting the tuplesort's > workMem to 2GB (only in the running median case) would actually be > safe in practice, and would prevent the temp file leak (for a few > years at least!). I feel dirty even suggesting that. Better ideas > anyone? So, I suggested to implement median as a *pure* window function aside from Pavel's aggregate function, and Greg suggested insertion capability of tuplesort. By this approach, we keep tuplesort to hold all the values in the current frame and can release it on the last of a partition (it's possible by window function API.) This is incremental addition of values and is far better than O(n^2 log(n)) although I didn't estimate the order. Only when the frame head is moving down, we should re-initialize tuplesort and it is as slow as calling aggregate version per each row (but I think we can solve it somehow if looking precisely). Regards, -- Hitoshi Harada
On 5 October 2010 13:14, Hitoshi Harada <umi.tanuki@gmail.com> wrote: > 2010/10/5 Dean Rasheed <dean.a.rasheed@gmail.com>: >> On 5 October 2010 07:04, Hitoshi Harada <umi.tanuki@gmail.com> wrote: >> Extrapolating from few quick timing tests, even in the best case, on >> my machine, it would take 7 days for the running median to use up >> 100MB, and 8 years for it to use 2GB. So setting the tuplesort's >> workMem to 2GB (only in the running median case) would actually be >> safe in practice, and would prevent the temp file leak (for a few >> years at least!). I feel dirty even suggesting that. Better ideas >> anyone? > > So, I suggested to implement median as a *pure* window function aside > from Pavel's aggregate function, and Greg suggested insertion > capability of tuplesort. By this approach, we keep tuplesort to hold > all the values in the current frame and can release it on the last of > a partition (it's possible by window function API.) This is > incremental addition of values and is far better than O(n^2 log(n)) > although I didn't estimate the order. Only when the frame head is > moving down, we should re-initialize tuplesort and it is as slow as > calling aggregate version per each row (but I think we can solve it > somehow if looking precisely). > Possibly, but that sounds like a lot of work to get an efficient algorithm. The 3 cases I see are: 1). Simple aggregate. Current algorithm is O(n log(n)) which is OK. It could be better because a full sort is not strictly needed. As already mentioned, a quickselect would be O(n). 2). Window without ORDER BY. This is actually basically the same as (1), but called once per partition. 3). Window with ORDER BY (running median). The simplest algorithm is O(n^2 log(n)). It could be tweaked to use an insertion sort, but that would still be O(n^2), which is not a lot better for all the effort that would be involved. In theory (perhaps with some kind of tree) it ought to be possible to come up with an O(n log(n)) algorithm, but that would be a lot of work. In the meantime, the attached variation of the patch fixes the temp file issue and will support all 3 cases. It gives OK performance for (1) and (2), and poor performance for (3). That could be viewed as a future development task, which perhaps the window function API would help with. I think it would be a shame to drop support for (2) just because we can't do (3) efficiently yet. Regards, Dean > Regards, > > -- > Hitoshi Harada >
Вложения
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > In the meantime, the attached variation of the patch fixes the temp > file issue and will support all 3 cases. It gives OK performance for > (1) and (2), and poor performance for (3). That could be viewed as a > future development task, which perhaps the window function API would > help with. I think it would be a shame to drop support for (2) just > because we can't do (3) efficiently yet. I started looking at this patch, and noticed that we got so caught up in implementation issues that we forgot the unresolved problem of data types. The original patch defined median(anyelement), which is only well-defined for an odd number of inputs; for an even number of inputs you have to take the left or right item in the central pair. I see that this version definesmedian(float8) returns float8median(float4) returns float8median(numeric) returns numericmedian(int8)returns numericmedian(int4) returns numericmedian(int2) returns numeric which strikes me as possibly being overkill. It was pointed out upthread that while median isn't presently in the standard, Oracle defines it in terms of percentile_cont(0.5) which *is* in the standard. What I read in SQL:2008 is that percentile_cont is defined for all numeric types (returning approximate numeric with implementation-defined precision), and for interval (returning interval), and not for any other input type. So it appears to me that what we ought to support ismedian(float8) returns float8median(interval) returns interval and nothing else --- we can rely on implicit casting to convert any other numeric input type to float8. BTW, as far as the implementation issues go, telling tuplesort that it can use gigabytes of memory no matter what seems quite unacceptable. Put this thing into a hash aggregation and you'll blow out your memory in no time. I don't think it's even a good idea to use work_mem there. I wonder whether it'd be a good idea to augment AggCheckCallContext() so that there's a way for aggregates to find out how much memory they ought to try to use. In a simple aggregation situation it's probably OK to use work_mem, but in a hash aggregation you'd better use less --- perhaps work_mem divided by the number of groups expected. Also, I believe that the lack-of-cleanup problem for tuplesorts spilling to disk should be fixable by using an exprcontext shutdown callback (see RegisterExprContextCallback). Comments? regards, tom lane
2010/10/10 Tom Lane <tgl@sss.pgh.pa.us>: > Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> In the meantime, the attached variation of the patch fixes the temp >> file issue and will support all 3 cases. It gives OK performance for >> (1) and (2), and poor performance for (3). That could be viewed as a >> future development task, which perhaps the window function API would >> help with. I think it would be a shame to drop support for (2) just >> because we can't do (3) efficiently yet. > > I started looking at this patch, and noticed that we got so caught up > in implementation issues that we forgot the unresolved problem of data > types. The original patch defined median(anyelement), which is only > well-defined for an odd number of inputs; for an even number of inputs > you have to take the left or right item in the central pair. I see > that this version defines > median(float8) returns float8 > median(float4) returns float8 > median(numeric) returns numeric > median(int8) returns numeric > median(int4) returns numeric > median(int2) returns numeric > which strikes me as possibly being overkill. > > It was pointed out upthread that while median isn't presently > in the standard, Oracle defines it in terms of percentile_cont(0.5) > which *is* in the standard. What I read in SQL:2008 is that > percentile_cont is defined for all numeric types (returning > approximate numeric with implementation-defined precision), > and for interval (returning interval), and not for any other > input type. So it appears to me that what we ought to support > is > median(float8) returns float8 > median(interval) returns interval > and nothing else --- we can rely on implicit casting to convert > any other numeric input type to float8. +1 > > BTW, as far as the implementation issues go, telling tuplesort that it > can use gigabytes of memory no matter what seems quite unacceptable. > Put this thing into a hash aggregation and you'll blow out your memory > in no time. I don't think it's even a good idea to use work_mem there. > I wonder whether it'd be a good idea to augment AggCheckCallContext() > so that there's a way for aggregates to find out how much memory they > ought to try to use. In a simple aggregation situation it's probably > OK to use work_mem, but in a hash aggregation you'd better use less > --- perhaps work_mem divided by the number of groups expected. > > Also, I believe that the lack-of-cleanup problem for tuplesorts spilling > to disk should be fixable by using an exprcontext shutdown callback (see > RegisterExprContextCallback). this issue is only for window aggregates, and this way is block - the problem with tuplestore is not only wit cleanup, but more in performance. But using a MemoryContext shutdown callback is good idea. Regards Pavel Stehule > > Comments? > > regards, tom lane >
On 10 October 2010 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It was pointed out upthread that while median isn't presently > in the standard, Oracle defines it in terms of percentile_cont(0.5) > which *is* in the standard. What I read in SQL:2008 is that > percentile_cont is defined for all numeric types (returning > approximate numeric with implementation-defined precision), > and for interval (returning interval), and not for any other > input type. So it appears to me that what we ought to support > is > median(float8) returns float8 > median(interval) returns interval > and nothing else --- we can rely on implicit casting to convert > any other numeric input type to float8. > Yeah that would be much simpler. BTW, why has percentile been removed from this patch? As the more general, and SQL standard function, that would seem to be the more useful one to include. Upthread it was mentioned that there is already an ntile window function, but actually that's a completely different thing. > BTW, as far as the implementation issues go, telling tuplesort that it > can use gigabytes of memory no matter what seems quite unacceptable. > Put this thing into a hash aggregation and you'll blow out your memory > in no time. I don't think it's even a good idea to use work_mem there. Argh! Yes that sounds like a much more serious problem. Interestingly I couldn't seem to produce this effect. Every effort I make to write a query to test this with median ends up being executed using a GroupAggregate, while the equivalent query with avg uses a HashAggregate. I don't understand why they are being treated differently. > I wonder whether it'd be a good idea to augment AggCheckCallContext() > so that there's a way for aggregates to find out how much memory they > ought to try to use. In a simple aggregation situation it's probably > OK to use work_mem, but in a hash aggregation you'd better use less > --- perhaps work_mem divided by the number of groups expected. Wouldn't that risk not allowing any memory at all the to aggregate in some cases? I don't have a better idea mind you, short of somehow not allowing hash aggregation for this function. > Also, I believe that the lack-of-cleanup problem for tuplesorts spilling > to disk should be fixable by using an exprcontext shutdown callback (see > RegisterExprContextCallback). Ah! I wasn't aware of such a callback. Sounds perfect for the job. Regards, Dean > > Comments? > > regards, tom lane >
2010/10/11 Dean Rasheed <dean.a.rasheed@gmail.com>: > On 10 October 2010 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It was pointed out upthread that while median isn't presently >> in the standard, Oracle defines it in terms of percentile_cont(0.5) >> which *is* in the standard. What I read in SQL:2008 is that >> percentile_cont is defined for all numeric types (returning >> approximate numeric with implementation-defined precision), >> and for interval (returning interval), and not for any other >> input type. So it appears to me that what we ought to support >> is >> median(float8) returns float8 >> median(interval) returns interval >> and nothing else --- we can rely on implicit casting to convert >> any other numeric input type to float8. >> > > Yeah that would be much simpler. > > BTW, why has percentile been removed from this patch? As the more > general, and SQL standard function, that would seem to be the more > useful one to include. Upthread it was mentioned that there is already > an ntile window function, but actually that's a completely different > thing. The reason for removing was impossibility to specify so some parameter must by immutable - in this case p parameter should be immutable otherwise the result is undefined. Regards Pavel Stehule > > >> BTW, as far as the implementation issues go, telling tuplesort that it >> can use gigabytes of memory no matter what seems quite unacceptable. >> Put this thing into a hash aggregation and you'll blow out your memory >> in no time. I don't think it's even a good idea to use work_mem there. > > Argh! Yes that sounds like a much more serious problem. > > Interestingly I couldn't seem to produce this effect. Every effort I > make to write a query to test this with median ends up being executed > using a GroupAggregate, while the equivalent query with avg uses a > HashAggregate. I don't understand why they are being treated > differently. > > >> I wonder whether it'd be a good idea to augment AggCheckCallContext() >> so that there's a way for aggregates to find out how much memory they >> ought to try to use. In a simple aggregation situation it's probably >> OK to use work_mem, but in a hash aggregation you'd better use less >> --- perhaps work_mem divided by the number of groups expected. > > Wouldn't that risk not allowing any memory at all the to aggregate in > some cases? I don't have a better idea mind you, short of somehow not > allowing hash aggregation for this function. > > >> Also, I believe that the lack-of-cleanup problem for tuplesorts spilling >> to disk should be fixable by using an exprcontext shutdown callback (see >> RegisterExprContextCallback). > > Ah! I wasn't aware of such a callback. Sounds perfect for the job. > > Regards, > Dean > > >> >> Comments? >> >> regards, tom lane >> >
On 11 October 2010 10:55, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> BTW, why has percentile been removed from this patch? As the more >> general, and SQL standard function, that would seem to be the more >> useful one to include. Upthread it was mentioned that there is already >> an ntile window function, but actually that's a completely different >> thing. > > The reason for removing was impossibility to specify so some parameter > must by immutable - in this case p parameter should be immutable > otherwise the result is undefined. > Could we not just make an arbitrary choice, like the last non-null value, and then document that. I can't believe that this would ever be an issue in practice. I don't think there is any way to deduce the following from behaviour from the documentation: select string_agg(i::text, repeat(',',i)) from generate_series(1,10) as g(i); string_agg -------------------------------------------------------------------1,,2,,,3,,,,4,,,,,5,,,,,,6,,,,,,,7,,,,,,,,8,,,,,,,,,9,,,,,,,,,,10 (1 row) but I don't see it as a real problem for the aggregate. Thoughts? Regards, Dean
On Sun, Oct 10, 2010 at 5:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It was pointed out upthread that while median isn't presently > in the standard, Oracle defines it in terms of percentile_cont(0.5) > which *is* in the standard. What I read in SQL:2008 is that > percentile_cont is defined for all numeric types (returning > approximate numeric with implementation-defined precision), > and for interval (returning interval), and not for any other > input type. So it appears to me that what we ought to support > is > median(float8) returns float8 > median(interval) returns interval > and nothing else --- we can rely on implicit casting to convert > any other numeric input type to float8. Isn't there a possibility of a precision loss if numeric gets cast to float8? Should we include an explicit variant from numeric? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > On 10 October 2010 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> BTW, as far as the implementation issues go, telling tuplesort that it >> can use gigabytes of memory no matter what seems quite unacceptable. >> Put this thing into a hash aggregation and you'll blow out your memory >> in no time. �I don't think it's even a good idea to use work_mem there. > Argh! Yes that sounds like a much more serious problem. > Interestingly I couldn't seem to produce this effect. Every effort I > make to write a query to test this with median ends up being executed > using a GroupAggregate, while the equivalent query with avg uses a > HashAggregate. I don't understand why they are being treated > differently. If you're using recent sources, there's some code in count_agg_clauses() that assumes that an aggregate with transtype INTERNAL will use ALLOCSET_DEFAULT_INITSIZE (ie 8K) workspace. So that'll discourage the planner from selecting HashAggregate except for a pretty small number of groups. The problem is that there's still a whole lot of daylight between 8K and 2G, so plenty of room to go wrong. The other approach that we could take here is to replace the ALLOCSET_DEFAULT_INITSIZE hack (which is certainly no more than a hack) with some way for an aggregate to declare how much space it'll eat, or more simply to mark it as "never use in HashAgg". This was discussed earlier but it would require a significant amount of dogwork and no one was real excited about doing it. Maybe it's time to bite that bullet though. Reflecting on it, I think it'd be best to allow an agg to provide an estimation function that'd be told the input data type and expected number of rows --- even on a per-aggregate basis, a constant estimate just isn't good enough. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Oct 10, 2010 at 5:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It was pointed out upthread that while median isn't presently >> in the standard, Oracle defines it in terms of percentile_cont(0.5) >> which *is* in the standard. �What I read in SQL:2008 is that >> percentile_cont is defined for all numeric types (returning >> approximate numeric with implementation-defined precision), >> and for interval (returning interval), and not for any other >> input type. �So it appears to me that what we ought to support >> is >> � � � �median(float8) returns float8 >> � � � �median(interval) returns interval >> and nothing else --- we can rely on implicit casting to convert >> any other numeric input type to float8. > Isn't there a possibility of a precision loss if numeric gets cast to > float8? So? The standard says "implementation-defined precision". We can define it as giving results that are good to float8. I find it hard to imagine an application for median() where that's not good enough; and what's more, the difference in comparison speed between float8 and numeric would render median() on numeric pretty useless anyway. regards, tom lane
On Mon, Oct 11, 2010 at 10:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, Oct 10, 2010 at 5:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> It was pointed out upthread that while median isn't presently >>> in the standard, Oracle defines it in terms of percentile_cont(0.5) >>> which *is* in the standard. What I read in SQL:2008 is that >>> percentile_cont is defined for all numeric types (returning >>> approximate numeric with implementation-defined precision), >>> and for interval (returning interval), and not for any other >>> input type. So it appears to me that what we ought to support >>> is >>> median(float8) returns float8 >>> median(interval) returns interval >>> and nothing else --- we can rely on implicit casting to convert >>> any other numeric input type to float8. > >> Isn't there a possibility of a precision loss if numeric gets cast to >> float8? > > So? The standard says "implementation-defined precision". We can > define it as giving results that are good to float8. I find it hard to > imagine an application for median() where that's not good enough; > and what's more, the difference in comparison speed between float8 and > numeric would render median() on numeric pretty useless anyway. I suppose for most applications it won't matter; I just hate losing precision, and have a deep skepticism of floats, as we've discussed previously. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11 October 2010 15:03, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> On 10 October 2010 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> BTW, as far as the implementation issues go, telling tuplesort that it >>> can use gigabytes of memory no matter what seems quite unacceptable. >>> Put this thing into a hash aggregation and you'll blow out your memory >>> in no time. I don't think it's even a good idea to use work_mem there. > >> Argh! Yes that sounds like a much more serious problem. > >> Interestingly I couldn't seem to produce this effect. Every effort I >> make to write a query to test this with median ends up being executed >> using a GroupAggregate, while the equivalent query with avg uses a >> HashAggregate. I don't understand why they are being treated >> differently. > > If you're using recent sources, there's some code in count_agg_clauses() > that assumes that an aggregate with transtype INTERNAL will use > ALLOCSET_DEFAULT_INITSIZE (ie 8K) workspace. So that'll discourage the > planner from selecting HashAggregate except for a pretty small number of > groups. The problem is that there's still a whole lot of daylight > between 8K and 2G, so plenty of room to go wrong. > > The other approach that we could take here is to replace the > ALLOCSET_DEFAULT_INITSIZE hack (which is certainly no more than a hack) > with some way for an aggregate to declare how much space it'll eat, > or more simply to mark it as "never use in HashAgg". This was discussed > earlier but it would require a significant amount of dogwork and no one > was real excited about doing it. Maybe it's time to bite that bullet > though. Reflecting on it, I think it'd be best to allow an agg to > provide an estimation function that'd be told the input data type and > expected number of rows --- even on a per-aggregate basis, a constant > estimate just isn't good enough. How good will that estimate of the number of rows be though? If they're coming from a SRF it could be a huge under-estimate, and you'd still risk eating all the memory, if you allowed a hash aggregate. Regards, Dean > > regards, tom lane >
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > On 11 October 2010 15:03, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Reflecting on it, I think it'd be best to allow an agg to >> provide an estimation function that'd be told the input data type and >> expected number of rows --- even on a per-aggregate basis, a constant >> estimate just isn't good enough. > How good will that estimate of the number of rows be though? It can't possibly be any worse than a hard-wired constant ;-) > If they're coming from a SRF it could be a huge under-estimate, and you'd > still risk eating all the memory, if you allowed a hash aggregate. If, for a particular aggregate, you're too chicken to ever allow hash aggregation, you could just return a very large number from the estimation hook function. I doubt that's a very useful behavior in the majority of cases though. regards, tom lane
On 11 October 2010 16:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> On 11 October 2010 15:03, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Reflecting on it, I think it'd be best to allow an agg to >>> provide an estimation function that'd be told the input data type and >>> expected number of rows --- even on a per-aggregate basis, a constant >>> estimate just isn't good enough. > >> How good will that estimate of the number of rows be though? > > It can't possibly be any worse than a hard-wired constant ;-) > >> If they're coming from a SRF it could be a huge under-estimate, and you'd >> still risk eating all the memory, if you allowed a hash aggregate. > > If, for a particular aggregate, you're too chicken to ever allow hash > aggregation, you could just return a very large number from the > estimation hook function. I doubt that's a very useful behavior in the > majority of cases though. > Yeah, for median I'd be too chicken :-) set work_mem = '2MB'; explain analyse select median(i) from generate_series(1,40000) as g(i) group by i; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------HashAggregate (cost=15.00..17.50 rows=200 width=4) (actual time=229.524..287.153 rows=40000 loops=1) -> Function Scan on generate_series g (cost=0.00..10.00 rows=1000 width=4) (actual time=8.738..16.595 rows=40000 loops=1)Total runtime: 811.460 ms (3 rows) The estimate of 200 x 8K is below work_mem, so it uses a hash aggregate. In reality, each tuplesort allocates around 30K initially, so it very quickly uses over 1GB. A better estimate for the aggregate wouldn't improve this situation much. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > The estimate of 200 x 8K is below work_mem, so it uses a hash > aggregate. In reality, each tuplesort allocates around 30K initially, > so it very quickly uses over 1GB. A better estimate for the aggregate > wouldn't improve this situation much. Sure it would: an estimate of 30K would keep the planner from using hash aggregation. regards, tom lane
On 11 October 2010 18:37, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> The estimate of 200 x 8K is below work_mem, so it uses a hash >> aggregate. In reality, each tuplesort allocates around 30K initially, >> so it very quickly uses over 1GB. A better estimate for the aggregate >> wouldn't improve this situation much. > > Sure it would: an estimate of 30K would keep the planner from using > hash aggregation. > Not if work_mem was 10MB. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > On 11 October 2010 18:37, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Sure it would: an estimate of 30K would keep the planner from using >> hash aggregation. > Not if work_mem was 10MB. And? If the memory requirement actually fits, you're in good shape. regards, tom lane
On 11 October 2010 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> On 11 October 2010 18:37, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Sure it would: an estimate of 30K would keep the planner from using >>> hash aggregation. > >> Not if work_mem was 10MB. > > And? If the memory requirement actually fits, you're in good shape. > Yeah but the actual memory requirement, if it uses a hash aggregate, is over 1GB, and could easily be much higher. It's also hard to kill, because it eats up that memory so quickly. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > On 11 October 2010 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> And? �If the memory requirement actually fits, you're in good shape. > Yeah but the actual memory requirement, if it uses a hash aggregate, > is over 1GB, and could easily be much higher. In that case the estimate of 30K per instance was wrong. You still haven't explained why this is impossible to estimate, or even particularly hard, as long as we can provide some code that knows specifically about the behavior of this aggregate. The amount of space needed to sort X amount of data is not unpredictable. regards, tom lane
On 11 October 2010 19:05, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> On 11 October 2010 18:48, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> And? If the memory requirement actually fits, you're in good shape. > >> Yeah but the actual memory requirement, if it uses a hash aggregate, >> is over 1GB, and could easily be much higher. > > In that case the estimate of 30K per instance was wrong. > You still haven't explained why this is impossible to estimate, > or even particularly hard, as long as we can provide some code that > knows specifically about the behavior of this aggregate. The amount > of space needed to sort X amount of data is not unpredictable. > The estimate that's wrong is the number of rows that the SRF is going to return. If I'm reading the plan right, the planner thinks that the aggregate is going to be called 200 times on groups of 5 rows. Actually, it's called 40000 times on groups of 1 row. Regards, Dean
On Sun, Oct 10, 2010 at 2:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It was pointed out upthread that while median isn't presently > in the standard, Oracle defines it in terms of percentile_cont(0.5) > which *is* in the standard. Uhmm, then why don't we implement that? We could provide median() as a short-cut but percentile_cont() doesn't sound much harder to implement than median() and more general. -- greg
> Uhmm, then why don't we implement that? We could provide median() as a > short-cut but percentile_cont() doesn't sound much harder to implement > than median() and more general. I could really use percentile_cont(0.9), actually, for query response-time analysis. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Hello 2010/10/11 Greg Stark <gsstark@mit.edu>: > On Sun, Oct 10, 2010 at 2:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It was pointed out upthread that while median isn't presently >> in the standard, Oracle defines it in terms of percentile_cont(0.5) >> which *is* in the standard. > > Uhmm, then why don't we implement that? We could provide median() as a > short-cut but percentile_cont() doesn't sound much harder to implement > than median() and more general. The problem is in interface. The original patch did it, but I removed it. We cannot to unsure immutability of some parameters now. Can we enhance a AGGREGATE to allow some mark like IMMUTABLE parameter and probably we should to support ANSI syntax: PERCENTILE_CONT ( expression1 ) WITHIN GROUP ( ORDER BY expression2 [ ASC | DESC ] ) This syntax allows to divide a muttable and immutable parameters. Regards Pavel Stehule > > -- > greg >
2010/10/12 Pavel Stehule <pavel.stehule@gmail.com>: > Hello > > 2010/10/11 Greg Stark <gsstark@mit.edu>: >> On Sun, Oct 10, 2010 at 2:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> It was pointed out upthread that while median isn't presently >>> in the standard, Oracle defines it in terms of percentile_cont(0.5) >>> which *is* in the standard. >> >> Uhmm, then why don't we implement that? We could provide median() as a >> short-cut but percentile_cont() doesn't sound much harder to implement >> than median() and more general. > > The problem is in interface. The original patch did it, but I removed > it. We cannot to unsure immutability of some parameters now. Can we > enhance a AGGREGATE to allow some mark like IMMUTABLE parameter and > probably we should to support ANSI syntax: > > PERCENTILE_CONT ( expression1 ) > WITHIN GROUP ( ORDER BY expression2 [ ASC | DESC ] ) > > This syntax allows to divide a muttable and immutable parameters. If this is only a syntax sugar for mutable/immutable parameter, then I guess it's time to take it serious to implement in our syntax, although I'm not sure if it affects more execution model than interface. Regards, -- Hitoshi Harada
2010/10/12 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/10/12 Pavel Stehule <pavel.stehule@gmail.com>: >> Hello >> >> 2010/10/11 Greg Stark <gsstark@mit.edu>: >>> On Sun, Oct 10, 2010 at 2:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> It was pointed out upthread that while median isn't presently >>>> in the standard, Oracle defines it in terms of percentile_cont(0.5) >>>> which *is* in the standard. >>> >>> Uhmm, then why don't we implement that? We could provide median() as a >>> short-cut but percentile_cont() doesn't sound much harder to implement >>> than median() and more general. >> >> The problem is in interface. The original patch did it, but I removed >> it. We cannot to unsure immutability of some parameters now. Can we >> enhance a AGGREGATE to allow some mark like IMMUTABLE parameter and >> probably we should to support ANSI syntax: >> >> PERCENTILE_CONT ( expression1 ) >> WITHIN GROUP ( ORDER BY expression2 [ ASC | DESC ] ) >> >> This syntax allows to divide a muttable and immutable parameters. > > If this is only a syntax sugar for mutable/immutable parameter, then I > guess it's time to take it serious to implement in our syntax, > although I'm not sure if it affects more execution model than > interface. I though about it, the question is an interface for PL languages. There are not problem for C. Regards Pavel Stehule > > Regards, > > > > -- > Hitoshi Harada >
Hello I am looking on SQL standard for some info about "within group" clause. This clause is necessary for functions: rank, dense_rank, cume_dist, percent_rank and percentile_disc and persentile_cont. These functions needs a clause "WITHIN GROUP". If I understand, then these functions are not simple aggregates - its some between window functions and aggregates. Questions: * is clause "WITHIN GROUP" just syntactic sugar for our aggregate with ORDER BY? I am thinking so not. There are predefined set of functions that can be used with this clause. * what is correct implementation of these functions? When I am looking on parameters, these functions are very similar to window functions. So there are two two ways for implementation. Implement it as special case of window functions or implement it as special case of aggregates. Regards Pavel Stehule 2010/10/12 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/10/12 Pavel Stehule <pavel.stehule@gmail.com>: >> Hello >> >> 2010/10/11 Greg Stark <gsstark@mit.edu>: >>> On Sun, Oct 10, 2010 at 2:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> It was pointed out upthread that while median isn't presently >>>> in the standard, Oracle defines it in terms of percentile_cont(0.5) >>>> which *is* in the standard. >>> >>> Uhmm, then why don't we implement that? We could provide median() as a >>> short-cut but percentile_cont() doesn't sound much harder to implement >>> than median() and more general. >> >> The problem is in interface. The original patch did it, but I removed >> it. We cannot to unsure immutability of some parameters now. Can we >> enhance a AGGREGATE to allow some mark like IMMUTABLE parameter and >> probably we should to support ANSI syntax: >> >> PERCENTILE_CONT ( expression1 ) >> WITHIN GROUP ( ORDER BY expression2 [ ASC | DESC ] ) >> >> This syntax allows to divide a muttable and immutable parameters. > > If this is only a syntax sugar for mutable/immutable parameter, then I > guess it's time to take it serious to implement in our syntax, > although I'm not sure if it affects more execution model than > interface. > > Regards, > > > > -- > Hitoshi Harada >
On mån, 2010-10-11 at 20:46 +0200, Pavel Stehule wrote: > The problem is in interface. The original patch did it, but I removed > it. We cannot to unsure immutability of some parameters now. How about you store the "immutable" parameter in the transition state and error out if it changes between calls?
2010/10/13 Peter Eisentraut <peter_e@gmx.net>: > On mån, 2010-10-11 at 20:46 +0200, Pavel Stehule wrote: >> The problem is in interface. The original patch did it, but I removed >> it. We cannot to unsure immutability of some parameters now. > > How about you store the "immutable" parameter in the transition state > and error out if it changes between calls? > yes, it's possibility. Now I looking there and I see the as more important problem the conformance with ANSI SQL. see my last post. There can be a kind of aggregate functions based on tuplesort. Regards Pavel >
2010/10/13 Pavel Stehule <pavel.stehule@gmail.com>: > 2010/10/13 Peter Eisentraut <peter_e@gmx.net>: >> On mån, 2010-10-11 at 20:46 +0200, Pavel Stehule wrote: >>> The problem is in interface. The original patch did it, but I removed >>> it. We cannot to unsure immutability of some parameters now. >> >> How about you store the "immutable" parameter in the transition state >> and error out if it changes between calls? >> > > yes, it's possibility. Now I looking there and I see the as more > important problem the conformance with ANSI SQL. see my last post. > There can be a kind of aggregate functions based on tuplesort. more - all these functions needs to solve same problem with planner and hash agg. So maybe is time to add a flag ISTUPLESORTED to pg_proc and add solve these functions together. Pavel > > Regards > > Pavel > > >> >
On Wed, Oct 13, 2010 at 5:54 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Or to put it more bluntly - what is the "problem with planner and hash > agg" that all of these functions need to solve? And why does it need > a flag in pg_proc? Why can't't we leave it to the individual > functions to perform a sort of one is needed? > So I think that's backwards. Why is the function doing data manipulations like sorts that we usually put in the plan? Is there some some key meta information that should be flagged in pg_proc and general functionality the executor should be providing so that this general class of problems can be solved efficiently? Otherwise I fear lots of things we would expect to be efficient such as calculating the top, median, and last items in the same sort order would require three separate sorts of the same data. We have a planner capable of comparing sort orders and estimating the costs of different plans, we should be using it. -- greg
On Wed, Oct 13, 2010 at 6:56 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2010/10/13 Pavel Stehule <pavel.stehule@gmail.com>: >> 2010/10/13 Peter Eisentraut <peter_e@gmx.net>: >>> On mån, 2010-10-11 at 20:46 +0200, Pavel Stehule wrote: >>>> The problem is in interface. The original patch did it, but I removed >>>> it. We cannot to unsure immutability of some parameters now. >>> >>> How about you store the "immutable" parameter in the transition state >>> and error out if it changes between calls? >>> >> >> yes, it's possibility. Now I looking there and I see the as more >> important problem the conformance with ANSI SQL. see my last post. >> There can be a kind of aggregate functions based on tuplesort. > > more - all these functions needs to solve same problem with planner > and hash agg. So maybe is time to add a flag ISTUPLESORTED to pg_proc > and add solve these functions together. I think that the design of this patch is still sufficiently up in the air that it is not going to be practical to get it committed during the current CommitFest, which is nearly over, so I am going to mark it as Returned with Feedback. I suggest that we continue discussing it, however, so that we can get things squared away for the next CommitFest. It seems that the fundamental question here is whether median is an instance of some more general problem, or whether it's a special case; and more importantly, if it is an instance of a more general problem, what is the shape of that general problem? Or to put it more bluntly - what is the "problem with planner and hash agg" that all of these functions need to solve? And why does it need a flag in pg_proc? Why can't't we leave it to the individual functions to perform a sort of one is needed? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2010/10/14 Robert Haas <robertmhaas@gmail.com>: > On Wed, Oct 13, 2010 at 6:56 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> 2010/10/13 Pavel Stehule <pavel.stehule@gmail.com>: >>> 2010/10/13 Peter Eisentraut <peter_e@gmx.net>: >>>> On mån, 2010-10-11 at 20:46 +0200, Pavel Stehule wrote: >>>>> The problem is in interface. The original patch did it, but I removed >>>>> it. We cannot to unsure immutability of some parameters now. >>>> >>>> How about you store the "immutable" parameter in the transition state >>>> and error out if it changes between calls? >>>> >>> >>> yes, it's possibility. Now I looking there and I see the as more >>> important problem the conformance with ANSI SQL. see my last post. >>> There can be a kind of aggregate functions based on tuplesort. >> >> more - all these functions needs to solve same problem with planner >> and hash agg. So maybe is time to add a flag ISTUPLESORTED to pg_proc >> and add solve these functions together. > > I think that the design of this patch is still sufficiently up in the > air that it is not going to be practical to get it committed during > the current CommitFest, which is nearly over, so I am going to mark it > as Returned with Feedback. I suggest that we continue discussing it, > however, so that we can get things squared away for the next > CommitFest. It seems that the fundamental question here is whether > median is an instance of some more general problem, or whether it's a > special case; and more importantly, if it is an instance of a more > general problem, what is the shape of that general problem? +1 Median implemented as special case of some special sort of functions will be better. The use case of ANSI SQL functions are more general - but it needs discussion about design. I didn't find too much consistency in standard. These functions are defined individually - not as some special kind of functions. All functions from standard has a immutable parameters - but Oracle listagg function has one parameter mutable and second immutable. More we should better to solve using these functions together with window clause. I didn't find any note about using combination in standard, but minimally Oracle allows a within_group and over clauses together. On second hand - this work can be really useful. We can get a bigger conformity with ANSI SQL 200x and with other db - DB2, Oracle, MSSQL, Sybase support this feature. > > Or to put it more bluntly - what is the "problem with planner and hash > agg" that all of these functions need to solve? And why does it need > a flag in pg_proc? Why can't't we leave it to the individual > functions to perform a sort of one is needed? These functions are hungry - It takes a 30 kb (minimum tuplesort) per group. More there is relative general pattern, that can be shared - there can be minimaly 6 functions, that just fill tuplesort in iterations - so these code can be shared, tuplesort can be reseted and used respectively. And it's question if requested sort can be used in outer optimalizations. Primary thing for solving is memory usage. Regards Pavel Stehule > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
On Wed, Oct 13, 2010 at 10:37 PM, Greg Stark <gsstark@mit.edu> wrote: > On Wed, Oct 13, 2010 at 5:54 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Or to put it more bluntly - what is the "problem with planner and hash >> agg" that all of these functions need to solve? And why does it need >> a flag in pg_proc? Why can't't we leave it to the individual >> functions to perform a sort of one is needed? >> > > So I think that's backwards. Why is the function doing data > manipulations like sorts that we usually put in the plan? Is there > some some key meta information that should be flagged in pg_proc and > general functionality the executor should be providing so that this > general class of problems can be solved efficiently? > > Otherwise I fear lots of things we would expect to be efficient such > as calculating the top, median, and last items in the same sort order > would require three separate sorts of the same data. We have a planner > capable of comparing sort orders and estimating the costs of different > plans, we should be using it. Good point. I think you're right. I'm not sure what the best design for it is, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company