Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]

Поиск
Список
Период
Сортировка
От David Fetter
Тема Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
Дата
Msg-id 20130715032051.GB9844@fetter.org
обсуждение исходный текст
Ответ на Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]  (Noah Misch <noah@leadboat.com>)
Список pgsql-hackers
On Sun, Jul 14, 2013 at 10:15:12PM -0400, Noah Misch wrote:
> On Sun, Jul 07, 2013 at 06:37:26PM -0700, David Fetter wrote:
> > On Sat, Jul 06, 2013 at 11:49:21AM +0100, Dean Rasheed wrote:
> > > Overall I think this patch offers useful additional functionality, in
> > > compliance with the SQL spec, which should be handy to simplify
> > > complex grouping queries.
> 
> As I understand this feature, it is syntactic sugar for the typical case of an
> aggregate with a strict transition function.  For example, "min(x) FILTER
> (WHERE y > 0)" is rigorously equivalent to "min(CASE y > 0 THEN x END)".
> Every SQL aggregate is strict (ISO 9075-2 section 4.15.4), so for standard
> queries it is *only* syntactic sugar.  In PostgreSQL, it lets you do novel
> things with, for example, array_agg().  Is that accurate?
> 
> > > I think this is ready for committer.
> 
> The patch was thorough.  I updated applicable comments, revisited some
> cosmetic choices, and made these functional changes:
> 
> 1. The pg_stat_statements "query jumble" should incorporate the filter.
> 
> 2. The patch did not update costing.  I made it add the cost of the filter
> expression the same way we add the cost of the argument expressions.  This
> makes "min(x) FILTER (WHERE y > 0)" match "min(case y > 0 THEN x end)" in
> terms of cost, which is a fair start.  At some point, we could do better by
> reducing the argument cost by the filter selectivity.

Thanks!

> A few choices/standard interpretations may deserve discussion.  The patch
> makes filter clauses contribute to the subquery level chosen to be the
> "aggregation query".  This is observable through the behavior of these two
> standard-conforming queries:
> 
> select (select count(outer_c)
>         from (values (1)) t0(inner_c))
> from (values (2),(3)) t1(outer_c); -- outer query is aggregation query
> select (select count(outer_c) filter (where inner_c <> 0)
>         from (values (1)) t0(inner_c))
> from (values (2),(3)) t1(outer_c); -- inner query is aggregation query
> 
> I believe SQL (ISO 9075-2 section 6.9 SR 3,4,6) does require this.  Since that
> still isn't crystal-clear to me, I mention it in case anyone has a different
> reading.
> 
> Distinct from that, albeit in a similar vein, SQL does not permit outer
> references in a filter clause.  This patch permits them; I think that
> qualifies as a reasonable PostgreSQL extension.

Thanks again.

> > --- a/doc/src/sgml/keywords.sgml
> > +++ b/doc/src/sgml/keywords.sgml
> 
> > @@ -3200,7 +3200,7 @@
> >     </row>
> >     <row>
> >      <entry><token>OVER</token></entry>
> > -    <entry>reserved (can be function or type)</entry>
> > +    <entry>non-reserved</entry>
> 
> I committed this one-line correction separately.
> 
> > --- a/src/backend/optimizer/plan/planagg.c
> > +++ b/src/backend/optimizer/plan/planagg.c
> > @@ -314,7 +314,7 @@ find_minmax_aggs_walker(Node *node, List **context)
> >          ListCell   *l;
> >  
> >          Assert(aggref->agglevelsup == 0);
> > -        if (list_length(aggref->args) != 1 || aggref->aggorder != NIL)
> > +        if (list_length(aggref->args) != 1 || aggref->aggorder != NIL || aggref->agg_filter != NULL)
> >              return true;        /* it couldn't be MIN/MAX */
> >          /* note: we do not care if DISTINCT is mentioned ... */
> 
> I twitched upon reading this, because neither ORDER BY nor FILTER preclude the
> aggregate being MIN or MAX.  Perhaps Andrew can explain why he put aggorder
> there back in 2009.  All I can figure is that writing max(c ORDER BY x) is so
> unlikely that we'd too often waste the next syscache lookup.  But the same
> argument would apply to DISTINCT.  With FILTER, the rationale is entirely
> different.  The aggregate could well be MIN/MAX; we just haven't implemented
> the necessary support elsewhere in this file.

Excellent reasoning, and good catch.

> See attached patch revisions.  The first patch edits
> find_minmax_aggs_walker() per my comments just now.  The second is
> an update of your FILTER patch with the changes to which I alluded
> above; it applies atop the first patch.  Would you verify that I
> didn't ruin anything?  Barring problems, will commit.
> 
> Are you the sole named author of this patch?  That's what the CF
> page says, but that wasn't fully clear to me from the list
> discussions.

While Andrew's help was invaluable and pervasive, I wrote the patch,
and all flaws in it are my fault.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Add regression tests for COLLATE
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Proposal - Support for National Characters functionality