Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
Дата
Msg-id 20130715021512.GA23930@tornado.leadboat.com
обсуждение исходный текст
Ответ на Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]  (David Fetter <david@fetter.org>)
Ответы Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]  (David Fetter <david@fetter.org>)
Re: Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]  (David Fetter <david@fetter.org>)
Список pgsql-hackers
On Sun, Jul 07, 2013 at 06:37:26PM -0700, David Fetter wrote:
> On Sat, Jul 06, 2013 at 11:49:21AM +0100, Dean Rasheed wrote:
> > Overall I think this patch offers useful additional functionality, in
> > compliance with the SQL spec, which should be handy to simplify
> > complex grouping queries.

As I understand this feature, it is syntactic sugar for the typical case of an
aggregate with a strict transition function.  For example, "min(x) FILTER
(WHERE y > 0)" is rigorously equivalent to "min(CASE y > 0 THEN x END)".
Every SQL aggregate is strict (ISO 9075-2 section 4.15.4), so for standard
queries it is *only* syntactic sugar.  In PostgreSQL, it lets you do novel
things with, for example, array_agg().  Is that accurate?

> > I think this is ready for committer.

The patch was thorough.  I updated applicable comments, revisited some
cosmetic choices, and made these functional changes:

1. The pg_stat_statements "query jumble" should incorporate the filter.

2. The patch did not update costing.  I made it add the cost of the filter
expression the same way we add the cost of the argument expressions.  This
makes "min(x) FILTER (WHERE y > 0)" match "min(case y > 0 THEN x end)" in
terms of cost, which is a fair start.  At some point, we could do better by
reducing the argument cost by the filter selectivity.


A few choices/standard interpretations may deserve discussion.  The patch
makes filter clauses contribute to the subquery level chosen to be the
"aggregation query".  This is observable through the behavior of these two
standard-conforming queries:

select (select count(outer_c)
        from (values (1)) t0(inner_c))
from (values (2),(3)) t1(outer_c); -- outer query is aggregation query
select (select count(outer_c) filter (where inner_c <> 0)
        from (values (1)) t0(inner_c))
from (values (2),(3)) t1(outer_c); -- inner query is aggregation query

I believe SQL (ISO 9075-2 section 6.9 SR 3,4,6) does require this.  Since that
still isn't crystal-clear to me, I mention it in case anyone has a different
reading.

Distinct from that, albeit in a similar vein, SQL does not permit outer
references in a filter clause.  This patch permits them; I think that
qualifies as a reasonable PostgreSQL extension.

> --- a/doc/src/sgml/keywords.sgml
> +++ b/doc/src/sgml/keywords.sgml

> @@ -3200,7 +3200,7 @@
>     </row>
>     <row>
>      <entry><token>OVER</token></entry>
> -    <entry>reserved (can be function or type)</entry>
> +    <entry>non-reserved</entry>

I committed this one-line correction separately.

> --- a/src/backend/optimizer/plan/planagg.c
> +++ b/src/backend/optimizer/plan/planagg.c
> @@ -314,7 +314,7 @@ find_minmax_aggs_walker(Node *node, List **context)
>          ListCell   *l;
>
>          Assert(aggref->agglevelsup == 0);
> -        if (list_length(aggref->args) != 1 || aggref->aggorder != NIL)
> +        if (list_length(aggref->args) != 1 || aggref->aggorder != NIL || aggref->agg_filter != NULL)
>              return true;        /* it couldn't be MIN/MAX */
>          /* note: we do not care if DISTINCT is mentioned ... */

I twitched upon reading this, because neither ORDER BY nor FILTER preclude the
aggregate being MIN or MAX.  Perhaps Andrew can explain why he put aggorder
there back in 2009.  All I can figure is that writing max(c ORDER BY x) is so
unlikely that we'd too often waste the next syscache lookup.  But the same
argument would apply to DISTINCT.  With FILTER, the rationale is entirely
different.  The aggregate could well be MIN/MAX; we just haven't implemented
the necessary support elsewhere in this file.


See attached patch revisions.  The first patch edits find_minmax_aggs_walker()
per my comments just now.  The second is an update of your FILTER patch with
the changes to which I alluded above; it applies atop the first patch.  Would
you verify that I didn't ruin anything?  Barring problems, will commit.

Are you the sole named author of this patch?  That's what the CF page says,
but that wasn't fully clear to me from the list discussions.

Thanks,
nm

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: review: Non-recursive processing of AND/OR lists
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Eliminating PD_ALL_VISIBLE, take 2