Re: WITHIN GROUP patch

Поиск
Список
Период
Сортировка
От David Johnston
Тема Re: WITHIN GROUP patch
Дата
Msg-id 1386297738251-5782041.post@n5.nabble.com
обсуждение исходный текст
Ответ на Re: WITHIN GROUP patch  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: WITHIN GROUP patch
Список pgsql-hackers
Tom Lane-2 wrote
> Further questions about WITHIN GROUP:
> 
> I believe that the spec requires that the "direct" arguments of an inverse
> or hypothetical-set aggregate must not contain any Vars of the current
> query level.  They don't manage to say that in plain English, of course,
> but in the 
> <hypothetical set function>
>  case the behavior is defined in
> terms of this sub-select:
> 
>     ( SELECT 0, SK1, ..., SKK
>       FROM TNAME UNION ALL
>       VALUES (1, VE1, ..., VEK) )
> 
> where SKn are the sort key values, TNAME is an alias for the input table,
> and VEn are the direct arguments.  Any input-table Vars the aggregate
> might refer to are thus in scope only for the SKn not the VEn.  (This
> definition also makes it clear that the VEn are to be evaluated once,
> not once per row.)  In the 
> <inverse distribution function>
>  case, they
> resort to claiming that the value of the 
> <inverse distribution function
> argument>
>  can be replaced by a numeric literal, which again makes it clear
> that it's supposed to be evaluated just once.
> 
> So that's all fine, but the patch seems a bit confused about it.  If you
> try to cheat, you get an error message that's less than apropos:
> 
> regression=# select cume_dist(f1) within group(order by f1) from text_tbl
> ;
> ERROR:  column "text_tbl.f1" must appear in the GROUP BY clause or be used
> in an aggregate function
> LINE 1: select cume_dist(f1) within group(order by f1) from text_tbl...
>                          ^
> 
> I'd have hoped for a message along the line of "fixed arguments of
> cume_dist() must not contain variables", similar to the phrasing we
> use when you try to put a variable in LIMIT.
> 
> Also, there are some comments and code changes in nodeAgg.c that seem
> to be on the wrong page here:
>   
> +       /*
> +        * Use the representative input tuple for any references to
> +        * non-aggregated input columns in the qual and tlist.    (If we
> are not
> +        * grouping, and there are no input rows at all, we will come here
> +        * with an empty firstSlot ... but if not grouping, there can't be
> any
> +        * references to non-aggregated input columns, so no problem.)
> +        * We do this before finalizing because for ordered set functions,
> +        * finalize_aggregates can evaluate arguments referencing the
> tuple.
> +        */
> +       econtext->ecxt_outertuple = firstSlot;
> + 
> 
> The last two lines of that comment are new, all the rest was moved from
> someplace else.  Those last two lines are wrong according to the above
> reasoning, and if they were correct the argument made in the pre-existing
> part of the comment would be all wet, meaning the code could in fact crash
> when trying to evaluate a Var reference in finalize_aggregates.
> 
> So I'm inclined to undo this part of the patch (and anything else that's
> rearranging logic with an eye to Var references in finalize_aggregates),
> and to try to fix parse_agg.c so it gives a more specific error message
> for this case.

The original design references the spec as allowing a column reference if it
is a group by key:

Select cume_dist(f1) within group ( order by f1 ) from text_tbl group by f1

No example was shown where this would be useful...but nonetheless the
comment and error both presume that such is true.

I would presume the implementation would only supply rows for sorting from
the single group in question so for practical purposes the columns have a
constant value for the entirety of the evaluation and so this makes sense
and acts just like partition by on a window aggregate.

Question, are there any tests that exercise this desired behavior?  That
assuming agreement can be had that the group by behavior is indeed spec or
something custom we want to support.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Re-WITHIN-GROUP-patch-tp5773851p5782041.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Joe Conway
Дата:
Сообщение: dblink performance regression
Следующее
От: Tom Lane
Дата:
Сообщение: Re: dblink performance regression