Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
От | Noah Misch |
---|---|
Тема | Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division] |
Дата | |
Msg-id | 20130715021512.GA23930@tornado.leadboat.com обсуждение исходный текст |
Ответ на | Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division] (David Fetter <david@fetter.org>) |
Ответы |
Re: FILTER for aggregates [was Re: Department of Redundancy
Department: makeNode(FuncCall) division]
(David Fetter <david@fetter.org>)
Re: Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division] (David Fetter <david@fetter.org>) |
Список | pgsql-hackers |
On Sun, Jul 07, 2013 at 06:37:26PM -0700, David Fetter wrote: > On Sat, Jul 06, 2013 at 11:49:21AM +0100, Dean Rasheed wrote: > > Overall I think this patch offers useful additional functionality, in > > compliance with the SQL spec, which should be handy to simplify > > complex grouping queries. As I understand this feature, it is syntactic sugar for the typical case of an aggregate with a strict transition function. For example, "min(x) FILTER (WHERE y > 0)" is rigorously equivalent to "min(CASE y > 0 THEN x END)". Every SQL aggregate is strict (ISO 9075-2 section 4.15.4), so for standard queries it is *only* syntactic sugar. In PostgreSQL, it lets you do novel things with, for example, array_agg(). Is that accurate? > > I think this is ready for committer. The patch was thorough. I updated applicable comments, revisited some cosmetic choices, and made these functional changes: 1. The pg_stat_statements "query jumble" should incorporate the filter. 2. The patch did not update costing. I made it add the cost of the filter expression the same way we add the cost of the argument expressions. This makes "min(x) FILTER (WHERE y > 0)" match "min(case y > 0 THEN x end)" in terms of cost, which is a fair start. At some point, we could do better by reducing the argument cost by the filter selectivity. A few choices/standard interpretations may deserve discussion. The patch makes filter clauses contribute to the subquery level chosen to be the "aggregation query". This is observable through the behavior of these two standard-conforming queries: select (select count(outer_c) from (values (1)) t0(inner_c)) from (values (2),(3)) t1(outer_c); -- outer query is aggregation query select (select count(outer_c) filter (where inner_c <> 0) from (values (1)) t0(inner_c)) from (values (2),(3)) t1(outer_c); -- inner query is aggregation query I believe SQL (ISO 9075-2 section 6.9 SR 3,4,6) does require this. Since that still isn't crystal-clear to me, I mention it in case anyone has a different reading. Distinct from that, albeit in a similar vein, SQL does not permit outer references in a filter clause. This patch permits them; I think that qualifies as a reasonable PostgreSQL extension. > --- a/doc/src/sgml/keywords.sgml > +++ b/doc/src/sgml/keywords.sgml > @@ -3200,7 +3200,7 @@ > </row> > <row> > <entry><token>OVER</token></entry> > - <entry>reserved (can be function or type)</entry> > + <entry>non-reserved</entry> I committed this one-line correction separately. > --- a/src/backend/optimizer/plan/planagg.c > +++ b/src/backend/optimizer/plan/planagg.c > @@ -314,7 +314,7 @@ find_minmax_aggs_walker(Node *node, List **context) > ListCell *l; > > Assert(aggref->agglevelsup == 0); > - if (list_length(aggref->args) != 1 || aggref->aggorder != NIL) > + if (list_length(aggref->args) != 1 || aggref->aggorder != NIL || aggref->agg_filter != NULL) > return true; /* it couldn't be MIN/MAX */ > /* note: we do not care if DISTINCT is mentioned ... */ I twitched upon reading this, because neither ORDER BY nor FILTER preclude the aggregate being MIN or MAX. Perhaps Andrew can explain why he put aggorder there back in 2009. All I can figure is that writing max(c ORDER BY x) is so unlikely that we'd too often waste the next syscache lookup. But the same argument would apply to DISTINCT. With FILTER, the rationale is entirely different. The aggregate could well be MIN/MAX; we just haven't implemented the necessary support elsewhere in this file. See attached patch revisions. The first patch edits find_minmax_aggs_walker() per my comments just now. The second is an update of your FILTER patch with the changes to which I alluded above; it applies atop the first patch. Would you verify that I didn't ruin anything? Barring problems, will commit. Are you the sole named author of this patch? That's what the CF page says, but that wasn't fully clear to me from the list discussions. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления: