Обсуждение: BUG #12228: Primary use-case of PERCENT_RANK not supported
The following bug has been logged on the website: Bug reference: 12228 Logged by: Jonathon Lachlan-Hache Email address: jonlachlan@gmail.com PostgreSQL version: 9.4beta2 Operating system: MacOSX Description: I'm using the 9.4 RC1 and trying out the WITHIN GROUP functions. I used a percentile_disc pattern successfully, however I think I came across a problem with using percent_rank(). Here is my SQL: SELECT percent_rank(datavalue) WITHIN GROUP (ORDER BY datavalue) as pct_rank, measureid FROM measuredata WHERE surveyyear=2013 GROUP BY measureid; I want to be able to run a query that programmatically displays the 'pct_rank' of datavalue. My table is organized into four columns: organizationid measureid surveyyear datavalue The 'datavalue' column is the measure, whereas all the other columns are attributes. When I run percent_rank() it needs to determine what percentile is the 'datavalue', but within the scope of the same 'measureid' and 'surveyyear'. The percent_rank() function does not appear to support this. The above query returns the following: ERROR: column "measuredata.datavalue" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT percent_rank(datavalue) WITHIN GROUP (ORDER BY datava... ^ DETAIL: Direct arguments of an ordered-set aggregate must use only grouped columns. ********** Error ********** ERROR: column "measuredata.datavalue" must appear in the GROUP BY clause or be used in an aggregate function SQL state: 42803 Detail: Direct arguments of an ordered-set aggregate must use only grouped columns. Character: 21 This doesn't make sense, because if I include 'datavalue' as a GROUP BY clause, then my group N-size is exactly 1. Basically, because the argument of percent_rank() must also be present in the GROUP BY clause, there is no way to use percent_rank() to programatically determine the percent-rank of a value within a set. And this is likely to be the primary use-case of any kind of percent-rank function, so it would make sense to include it in the new implementation. Thanks so much, Jon
jonlachlan@gmail.com writes: > I'm using the 9.4 RC1 and trying out the WITHIN GROUP functions. I used a > percentile_disc pattern successfully, however I think I came across a > problem with using percent_rank(). Here is my SQL: > SELECT percent_rank(datavalue) WITHIN GROUP (ORDER BY datavalue) as > pct_rank, measureid > FROM measuredata > WHERE surveyyear=2013 > GROUP BY measureid; That query isn't very sensible: the direct argument of percent_rank() has to be a constant over any one aggregation group, else the percentile calculation is meaningless. > I want to be able to run a query that programmatically displays the > 'pct_rank' of datavalue. I'm not 100% sure what you mean by that, but I suspect you are looking for something closer to the basic percent_rank() window function, not the hypothetical-set function. Hypothetical-set functions are for computing the measure that would be attributed to a row that's not actually present in the data. Moreover, since they're aggregates, they produce only one output per GROUP BY group, and I don't understand what you mean by "pct_rank of datavalue" at a group level. Perhaps what you want is SELECT percent_rank() OVER (PARTITION BY measureid ORDER BY datavalue) as pct_rank, measureid FROM measuredata WHERE surveyyear=2013; It could also be that what you're after will require doing a window function like that in a sub-SELECT, and then grouping and/or aggregating in the outer query. regards, tom lane
<div dir="ltr"><div class="gmail_extra">Yes, you are right that I just needed the basic window function. I must have beenconfused because I was using percentile_disc() and assumed I would be using the same WITHIN GROUP syntax. <br /><br />Iwas not clear before about the hypothetical set functions and their purpose. I have looked at the documentation at <a href="http://www.postgresql.org/docs/9.4/static/functions-aggregate.html">http://www.postgresql.org/docs/9.4/static/functions-aggregate.html</a> andnow see the difference. <br /><br /></div><div class="gmail_extra">Might it be worthwhile to include a suggestion in theerror message and/or documentation "Use of percent_rank() WITHIN GROUP is a hypothetical-set aggregate function. If youdo not intend to use a hypothetical-set aggregate function, try using a window function with PARTITION BY instead of WITHINGROUP"? Not sure if that aligns with your style guidelines, but just a thought.<br /><br /></div><div class="gmail_extra">Thanksso much for your previous response,<br /></div><div class="gmail_extra">Jon<br /></div></div>