Обсуждение: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquerypullup

Поиск
Список
Период
Сортировка

[BUGS] Improper const-evaluation of HAVING with grouping sets and subquerypullup

От
Heikki Linnakangas
Дата:
This query produces an incorrect result:

regression=# select four, x
   from (select four, ten, 'foo'::text as x from tenk1 ) as t
   group by grouping sets(four, x) having x = 'foo' order by four;
  four |  x
------+-----
     0 |
     1 |
     2 |
     3 |
       | foo
(5 rows)

The "having x = 'foo'" clause should've filtered out the rows where x is 
NULL, leaving only the last row as the result. Even though x is a 
constant 'foo' in the subquery, HAVING clause is supposed to be 
evaluated after grouping. What happens is that subquery pullup replaces 
x with the constant, and the "'foo' = 'foo'" qual is later 
const-evaluated to true.

I propose the attached patch to fix that. It forces the use of 
PlaceHolderVars in subquery pullup, if the parent query has grouping 
sets and HAVING. I'm not 100% sure that's the right approach or a misuse 
of the placeholder system, so comments welcome. At first, I tried to set 
wrap_non_vars=true only when processing the havingQual, so that 
placeholders would only be there. But that didn't work out, I think 
because grouping sets planning would then put both the Const, and the 
PlaceHolderVar for the Const, in the Agg's targetlist, but only one of 
them would be set to NULL when doing the grouping.

Another thing is that the check could be made much tighter, so that 
PlaceHolderVars were only used for expressions actually used in the 
HAVING. But it didn't seem worth the trouble to me.

- Heikki

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Вложения

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Tom Lane
Дата:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> This query produces an incorrect result:
> regression=# select four, x
>    from (select four, ten, 'foo'::text as x from tenk1 ) as t
>    group by grouping sets(four, x) having x = 'foo' order by four;

> The "having x = 'foo'" clause should've filtered out the rows where x is 
> NULL, leaving only the last row as the result. Even though x is a 
> constant 'foo' in the subquery, HAVING clause is supposed to be 
> evaluated after grouping. What happens is that subquery pullup replaces 
> x with the constant, and the "'foo' = 'foo'" qual is later 
> const-evaluated to true.

Ouch.

> I propose the attached patch to fix that. It forces the use of 
> PlaceHolderVars in subquery pullup, if the parent query has grouping 
> sets and HAVING. I'm not 100% sure that's the right approach or a misuse 
> of the placeholder system, so comments welcome.

Seems like the point is that grouping sets can inject null values of
columns, in more or less the same way that outer joins can.  So it
seems like using PlaceHolderVars, which were invented to account
for that property of outer joins, is a reasonable approach to a fix.
I don't have time to review the patch in detail right now though;
do you want to put it in the CF queue?

One thing I'm wondering is why only the HAVING clause would be subject
to the problem.  I'm a bit surprised that the "x" in the targetlist
didn't become a constant as well.  This may be pointing to some
klugery in the GROUPING SETS patch that we could clean up if we
use placeholders for this.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Andrew Gierth
Дата:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> One thing I'm wondering is why only the HAVING clause would beTom> subject to the problem. I'm a bit surprised
thatthe "x" in theTom> targetlist didn't become a constant as well. This may be pointingTom> to some klugery in the
GROUPINGSETS patch that we could clean upTom> if we use placeholders for this.
 

This shows that the problem can extend to the targetlist too:

select four, x || 'x' from (select four, ten, 'foo'::text as x from tenk1 ) as tgroup by grouping sets(four, x);
four | ?column? 
------+----------   3 | foox   0 | foox   1 | foox   2 | foox     | foox
(5 rows)

What seems to happen in the original case is that the 'foo' constant
ends up in the projection of the input to the aggregate node, with a Var
in the output.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Andrew Gierth
Дата:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I propose the attached patch to fix that. It forces the use of>> PlaceHolderVars in subquery pullup, if the parent
queryhas grouping>> sets and HAVING. I'm not 100% sure that's the right approach or a>> misuse of the placeholder
system,so comments welcome.
 

I've been testing Heikki's patch with the havingQual condition removed,
and I haven't found any issues yet.
Tom> One thing I'm wondering is why only the HAVING clause would beTom> subject to the problem. I'm a bit surprised
thatthe "x" in theTom> targetlist didn't become a constant as well. This may be pointingTom> to some klugery in the
GROUPINGSETS patch that we could clean upTom> if we use placeholders for this.
 

As far as I can tell, the "x" _does_ become a constant, but then in
setrefs, because it's still labelled with a sortgroupref, it gets
replaced by a Var again. I don't recall touching any of that in the GS
work, because it was already like that for plain GROUP BY.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Andrew Gierth
Дата:
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
Andrew> I've been testing Heikki's patch with the havingQual conditionAndrew> removed, and I haven't found any issues
yet.

Further experimentation reveals that this change has the effect of
blocking constant-folding in aggregate input expressions that refer to
constant outputs of the subquery.

Is it worth doing anything about that?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Tom Lane
Дата:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Further experimentation reveals that this change has the effect of
> blocking constant-folding in aggregate input expressions that refer to
> constant outputs of the subquery.
> Is it worth doing anything about that?

Uh ... but I thought the point here is that the outputs aren't really
constant in the presence of grouping sets.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Andrew Gierth
Дата:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Further experimentation reveals that this change has the effect of>> blocking constant-folding in aggregate input
expressionsthat refer>> to constant outputs of the subquery. Is it worth doing anything>> about that?
 
Tom> Uh ... but I thought the point here is that the outputs aren'tTom> really constant in the presence of grouping
sets.

select x, y, sum(z) from (select 1 as x, 2 as y, 3 as z) sgroup by grouping sets (x,y);

x and y aren't constants, but z is.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Tom Lane
Дата:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> Uh ... but I thought the point here is that the outputs aren't
>  Tom> really constant in the presence of grouping sets.

> select x, y, sum(z) from (select 1 as x, 2 as y, 3 as z) s
>  group by grouping sets (x,y);

> x and y aren't constants, but z is.

OK, but that just means we should put PHV wrapping around only the
grouping-set columns.

BTW, also need to think about GS expressions, eg

select x+y, sum(z) from (select 1 as x, 2 as y, 3 as z) sgroup by grouping sets (x+y);

Not real sure what needs to happen here.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Andrew Gierth
Дата:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>> x and y aren't constants, but z is.
Tom> OK, but that just means we should put PHV wrapping around only theTom> grouping-set columns.

Well, can we also take advantage of the fact that we know that anything
that's not in the grouping-set columns must be in an aggregate argument,
and just omit the PHV inside aggregate args? (And even if grouping
columns appear inside aggregate args, they are _not_ nulled out there.)
Tom> BTW, also need to think about GS expressions, eg
Tom> select x+y, sum(z) from (select 1 as x, 2 as y, 3 as z) sTom>  group by grouping sets (x+y);
Tom> Not real sure what needs to happen here.

That one currently works (note you have to add another grouping set to
test it, since the case of exactly one grouping set is reduced to plain
GROUP BY) because setrefs fixes up the reference after-the-fact,
replacing the outer x+y (or whatever it got munged to) with a Var based
on matching the sortgroupref. This currently fails:

select (x+y)*1, sum(z) from (select 1 as x, 2 as y, 3 as z) sgroup by grouping sets (x+y, x);

because the logic in setrefs that would normally detect that (x+y)
exists in the child tlist doesn't fire because the whole expression was
replaced by a constant.

With the patch to use PHVs it works, but I admit to some confusion over
exactly why.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets andsubquery pullup

От
Heikki Linnakangas
Дата:
Here's another interesting case, without any subqueries:

postgres=# SELECT g as newalias1, g as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3); newalias1 | newalias3
-----------+-----------         1 |         1         3 |         3         2 |         2         2 |         2
3|         3         1 |         1
 
(6 rows)

Why are there no "summary" rows with NULLs, despite the ROLLUP? If you 
replace one of the g's with (g+0), you get the expected result:

postgres=# SELECT g as newalias1, (g+0) as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3); newalias1 | newalias3
-----------+-----------         1 |         1         3 |         3         2 |         2         2 |         3 |
 1 |
 
(6 rows)

- Heikki


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Andrew Gierth
Дата:
>>>>> "Heikki" == Heikki Linnakangas <hlinnaka@iki.fi> writes:
Heikki> Here's another interesting case, without any subqueries:Heikki> postgres=# SELECT g as newalias1, g as
newalias3Heikki>FROM generate_series(1,3) gHeikki> GROUP BY newalias1, ROLLUP(newalias3);Heikki>  newalias1 |
newalias3Heikki>-----------+-----------Heikki>          1 |         1Heikki>          3 |         3Heikki>          2 |
       2Heikki>          2 |         2Heikki>          3 |         3Heikki>          1 |         1Heikki> (6 rows)
 
Heikki> Why are there no "summary" rows with NULLs, despite the ROLLUP?

To my knowledge this is the correct result. (Though neither version of
the query is legal per the SQL spec; allowing expressions and aliases in
GROUP BY are nonstandard extensions.)

Here's why it happens: after substituting for the aliases, you have

GROUP BY g, rollup(g)

which is equivalent to

GROUP BY GROUPING SETS ((g,g), (g))

which is equivalent to

GROUP BY GROUPING SETS ((g), (g))

because duplicate terms within a single grouping set are redundant just
as they are in GROUP BY.
Heikki> If you replace one of the g's with (g+0), you get the expectedHeikki> result:

Well, in this case the terms in the grouping set are no longer
duplicate; the expansion becomes

GROUP BY GROUPING SETS ((g,(g+0)), (g))

and therefore the (g+0) expression becomes null for one of the resulting
sets.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Andrew Gierth
Дата:
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
Andrew> To my knowledge this is the correct result. (Though neitherAndrew> version of the query is legal per the SQL
spec;allowingAndrew> expressions and aliases in GROUP BY are nonstandardAndrew> extensions.)
 

And neither Oracle nor MSSQL (at least the versions available on
sqlfiddle) allow either query, so there's nothing I can compare against.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Tom Lane
Дата:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Heikki" == Heikki Linnakangas <hlinnaka@iki.fi> writes:
>  Heikki> Why are there no "summary" rows with NULLs, despite the ROLLUP?

> Here's why it happens: after substituting for the aliases, you have
> GROUP BY g, rollup(g)
> which is equivalent to
> GROUP BY GROUPING SETS ((g,g), (g))
> which is equivalent to
> GROUP BY GROUPING SETS ((g), (g))

I don't think I buy this explanation, because the plan tree doesn't show
any indication that we're actually folding (g,g) to (g):

regression=# EXPLAIN SELECT g as newalias1, g as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3);                                  QUERY PLAN
--------------------------------------------------------------------------------HashAggregate  (cost=15.00..21.50
rows=400width=8)  Hash Key: g, g  Hash Key: g  ->  Function Scan on generate_series g  (cost=0.00..10.00 rows=1000
width=8)
(4 rows)

regression=# EXPLAIN SELECT g as newalias1, g+0 as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3);                                  QUERY PLAN
--------------------------------------------------------------------------------HashAggregate  (cost=17.50..25.00
rows=400width=8)  Hash Key: g, (g + 0)  Hash Key: g  ->  Function Scan on generate_series g  (cost=0.00..12.50
rows=1000width=8) 
(4 rows)

If these behave differently, why does the plan structure look the same?

I think that Heikki's expectation is the correct one, and the reason the
output looks the way it does is that setrefs.c is dropping the ball
somehow and confusing the two "g" references.  The finished plan has two
identical Var references in the Agg node's output tlist:
      :targetlist (         {TARGETENTRY          :expr             {VAR             :varno 65001             :varattno
1            :vartype 23             :vartypmod -1             :varcollid 0             :varlevelsup 0
:varnoold1             :varoattno 1             :location 7            }         :resno 1          :resname newalias1
      :ressortgroupref 1          :resorigtbl 0          :resorigcol 0          :resjunk false         }
{TARGETENTRY         :expr             {VAR             :varno 65001             :varattno 1             :vartype 23
        :vartypmod -1             :varcollid 0             :varlevelsup 0             :varnoold 1
:varoattno1             :location 23            }         :resno 2          :resname newalias3
:ressortgroupref2          :resorigtbl 0          :resorigcol 0          :resjunk false         }      ) 

The two "g" values are correctly distinguished in the FunctionScan's
tlist, however:
         :targetlist (            {TARGETENTRY             :expr                {VAR                :varno 1
   :varattno 1                :vartype 23                :vartypmod -1                :varcollid 0
:varlevelsup0                :varnoold 1                :varoattno 1                :location 7               }
  :resno 1             :resname <>             :ressortgroupref 1             :resorigtbl 0             :resorigcol 0
         :resjunk false            }            {TARGETENTRY             :expr                {VAR
:varno1                :varattno 1                :vartype 23                :vartypmod -1                :varcollid 0
             :varlevelsup 0                :varnoold 1                :varoattno 1                :location 23
    }            :resno 2             :resname <>             :ressortgroupref 2             :resorigtbl 0
:resorigcol0             :resjunk false            }         ) 

We should have used ressortgroupref matching to prevent this, but without
having checked the code right now, I think that we might only apply that
logic to non-Var tlist entries.  If the Agg output tlist had mentioned
column 2 not column 1 of the child node, I bet we'd get the right answer.

Digression: this seems like another example in which the "same" Var can
represent two different values.  I've had a bee in my bonnet for awhile
that we need to stop doing that, but I'm not entirely sure what to do
instead.  In the case of Vars that might go to null because of an outer
join, we could perhaps fix things by not smashing join alias Vars to their
referents (at least, not till much much later in the planner than we do
now).  That doesn't seem to apply here though.  Perhaps the GROUP BY
operation ought to be understood as providing its own set of output Vars?
That would mean creating an RTE to represent GROUP BY, but maybe that's
not awful.

In the nearer term, maybe we can get a back-patchable fix by being
more careful about ressortgroupref matching.
        regards, tom lane


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Tom Lane
Дата:
I wrote:
> I think that Heikki's expectation is the correct one, and the reason the
> output looks the way it does is that setrefs.c is dropping the ball
> somehow and confusing the two "g" references.  ...
> We should have used ressortgroupref matching to prevent this, but without
> having checked the code right now, I think that we might only apply that
> logic to non-Var tlist entries.  If the Agg output tlist had mentioned
> column 2 not column 1 of the child node, I bet we'd get the right answer.

Indeed, the attached patch passes all regression tests and produces the
same answers for both of Heikki's examples:

regression=# SELECT g as newalias1, g as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3);
 newalias1 | newalias3 
-----------+-----------
         1 |         1
         3 |         3
         2 |         2
         2 |          
         3 |          
         1 |          
(6 rows)


            regards, tom lane

diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1382b67..9aeaf87 100644
*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
*************** set_upper_references(PlannerInfo *root,
*** 1744,1751 ****
          TargetEntry *tle = (TargetEntry *) lfirst(l);
          Node       *newexpr;

!         /* If it's a non-Var sort/group item, first try to match by sortref */
!         if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
          {
              newexpr = (Node *)
                  search_indexed_tlist_for_sortgroupref(tle->expr,
--- 1744,1751 ----
          TargetEntry *tle = (TargetEntry *) lfirst(l);
          Node       *newexpr;

!         /* If it's a sort/group item, first try to match by sortref */
!         if (tle->ressortgroupref != 0)
          {
              newexpr = (Node *)
                  search_indexed_tlist_for_sortgroupref(tle->expr,
*************** search_indexed_tlist_for_non_var(Expr *n
*** 2113,2119 ****

  /*
   * search_indexed_tlist_for_sortgroupref --- find a sort/group expression
-  *        (which is assumed not to be just a Var)
   *
   * If a match is found, return a Var constructed to reference the tlist item.
   * If no match, return NULL.
--- 2113,2118 ----

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Andrew Gierth
Дата:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> I don't think I buy this explanation, because the plan treeTom> doesn't show any indication that we're actually
folding(g,g) toTom> (g):
 

Huh. Yes, you're right.
Tom> Digression: this seems like another example in which the "same"Tom> Var can represent two different values.

At one point in the evolution of the GS patch it had a new node type,
GroupedVar or some such name, to represent the possibly-nulled-out
values. I'm not sure that it was being introduced early enough in
planning to have prevented this problem (I suspect not, I'd have to dig
up the old code). That stuff was all ripped out very late in the
development process because as I recall, both you and Andres disliked
it.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Tom Lane
Дата:
I wrote:
>> We should have used ressortgroupref matching to prevent this, but without
>> having checked the code right now, I think that we might only apply that
>> logic to non-Var tlist entries.  If the Agg output tlist had mentioned
>> column 2 not column 1 of the child node, I bet we'd get the right answer.

> Indeed, the attached patch passes all regression tests and produces the
> same answers for both of Heikki's examples:

I did some back-porting work on this.  The patch applies back to 9.5
where grouping sets were introduced, but it only fixes the problem
in 9.6 and later --- in 9.5, you still get the wrong output :-(.

Bisecting says the behavior changed at commit 3fc6e2d7f "Make the upper
part of the planner work by generating and comparing Paths".  Ugh.
We are certainly not back-patching that into 9.5, and I'm disinclined
even to try to identify exactly why that commit changed the behavior.

Given that this is such a weird corner case, and we've not heard
complaints from actual users about it, I'm inclined just to apply
the patch in 9.6 and up and call it good.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1382b67..fa9a3f0 100644
*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
*************** set_upper_references(PlannerInfo *root,
*** 1744,1751 ****
          TargetEntry *tle = (TargetEntry *) lfirst(l);
          Node       *newexpr;

!         /* If it's a non-Var sort/group item, first try to match by sortref */
!         if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
          {
              newexpr = (Node *)
                  search_indexed_tlist_for_sortgroupref(tle->expr,
--- 1744,1751 ----
          TargetEntry *tle = (TargetEntry *) lfirst(l);
          Node       *newexpr;

!         /* If it's a sort/group item, first try to match by sortref */
!         if (tle->ressortgroupref != 0)
          {
              newexpr = (Node *)
                  search_indexed_tlist_for_sortgroupref(tle->expr,
*************** search_indexed_tlist_for_non_var(Expr *n
*** 2113,2119 ****

  /*
   * search_indexed_tlist_for_sortgroupref --- find a sort/group expression
-  *        (which is assumed not to be just a Var)
   *
   * If a match is found, return a Var constructed to reference the tlist item.
   * If no match, return NULL.
--- 2113,2118 ----
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index fd618af..833d515 100644
*** a/src/test/regress/expected/groupingsets.out
--- b/src/test/regress/expected/groupingsets.out
*************** select a, d, grouping(a,b,c)
*** 360,365 ****
--- 360,394 ----
   2 | 2 |        2
  (4 rows)

+ -- check that distinct grouping columns are kept separate
+ -- even if they are equal()
+ explain (costs off)
+ select g as alias1, g as alias2
+   from generate_series(1,3) g
+  group by alias1, rollup(alias2);
+                    QUERY PLAN
+ ------------------------------------------------
+  GroupAggregate
+    Group Key: g, g
+    Group Key: g
+    ->  Sort
+          Sort Key: g
+          ->  Function Scan on generate_series g
+ (6 rows)
+
+ select g as alias1, g as alias2
+   from generate_series(1,3) g
+  group by alias1, rollup(alias2);
+  alias1 | alias2
+ --------+--------
+       1 |      1
+       1 |
+       2 |      2
+       2 |
+       3 |      3
+       3 |
+ (6 rows)
+
  -- simple rescan tests
  select a, b, sum(v.x)
    from (values (1),(2)) v(x), gstest_data(v.x)
diff --git a/src/test/regress/sql/groupingsets.sql b/src/test/regress/sql/groupingsets.sql
index 564ebc9..2b4ab69 100644
*** a/src/test/regress/sql/groupingsets.sql
--- b/src/test/regress/sql/groupingsets.sql
*************** select a, d, grouping(a,b,c)
*** 141,146 ****
--- 141,157 ----
    from gstest3
   group by grouping sets ((a,b), (a,c));

+ -- check that distinct grouping columns are kept separate
+ -- even if they are equal()
+ explain (costs off)
+ select g as alias1, g as alias2
+   from generate_series(1,3) g
+  group by alias1, rollup(alias2);
+
+ select g as alias1, g as alias2
+   from generate_series(1,3) g
+  group by alias1, rollup(alias2);
+
  -- simple rescan tests

  select a, b, sum(v.x)

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets andsubquery pullup

От
Michael Paquier
Дата:
On Thu, Oct 26, 2017 at 4:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Given that this is such a weird corner case, and we've not heard
> complaints from actual users about it, I'm inclined just to apply
> the patch in 9.6 and up and call it good.

Tom, are you planning to finish wrapping up this one? For now I am
moving it to next CF.
-- 
Michael


Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Andrew Gierth
Дата:
>>>>> "Michael" == Michael Paquier <michael.paquier@gmail.com> writes:
Michael> Tom, are you planning to finish wrapping up this one? For nowMichael> I am moving it to next CF.

I can take it if Tom and Heikki are busy.

-- 
Andrew.


Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Tom Lane
Дата:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Michael" == Michael Paquier <michael.paquier@gmail.com> writes:
>  Michael> Tom, are you planning to finish wrapping up this one? For now
>  Michael> I am moving it to next CF.

> I can take it if Tom and Heikki are busy.

FWIW, I'm not really comfortable that the proposed patch is correct
or complete.  It may just need more study to get there.
        regards, tom lane


Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Tom Lane
Дата:
I wrote:
> FWIW, I'm not really comfortable that the proposed patch is correct
> or complete.  It may just need more study to get there.

I've done another round of study on this patch.  The attached updated
version is the same code as Heikki proposed (minus the incorrect
restriction to queries with HAVING quals), but I reworked the comments
and expanded the regression test cases.

One thing that wasn't clear to me before was whether we need wrap_non_vars
for this case or not; we don't for outer joins, so I was unconvinced about
it here.  It turns out we do: the point of the wrapper is to prevent
constant folding or other expression preprocessing from merging a
pulled-up expression with the surrounding expression, resulting in
something that won't match the grouping set expression when it comes time
to do that matching.  For instance if we have a boolean subquery output
expression, say "x = y as cond", and that gets hoisted into an upper
expression "not cond", then without the PHV wrapper we will happily
simplify that to "x <> y" which will not match the grouping set
expression.  There's a regression test below that misbehaves if you take
out the "wrap_non_vars = true" line.

I spent some time thinking about Andrew's observation that we don't really
need the wrappers everyplace.  It's true, but pullup_replace_vars is far
from being able to do the right thing there, and I'm not sure that trying
to teach it to do so is reasonable.  (I'm inclined to think that the idea
I threw out upthread about converting grouping expressions into Vars
belonging to a new RTE kind might be the way to go.)  In any case I don't
think we'd possibly come out with a patch simple enough to back-patch.
So let's leave that optimization for future work.

I think the attached is probably ready to go, though I've not checked yet
whether it will work pre-9.6.  Anyone want to do more review?

            regards, tom lane

diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 0e2a220..45d82da 100644
*** a/src/backend/optimizer/prep/prepjointree.c
--- b/src/backend/optimizer/prep/prepjointree.c
*************** pull_up_simple_subquery(PlannerInfo *roo
*** 1003,1013 ****

      /*
       * The subquery's targetlist items are now in the appropriate form to
!      * insert into the top query, but if we are under an outer join then
!      * non-nullable items and lateral references may have to be turned into
!      * PlaceHolderVars.  If we are dealing with an appendrel member then
!      * anything that's not a simple Var has to be turned into a
!      * PlaceHolderVar.  Set up required context data for pullup_replace_vars.
       */
      rvcontext.root = root;
      rvcontext.targetlist = subquery->targetList;
--- 1003,1010 ----

      /*
       * The subquery's targetlist items are now in the appropriate form to
!      * insert into the top query, except that we may need to wrap them in
!      * PlaceHolderVars.  Set up required context data for pullup_replace_vars.
       */
      rvcontext.root = root;
      rvcontext.targetlist = subquery->targetList;
*************** pull_up_simple_subquery(PlannerInfo *roo
*** 1019,1032 ****
          rvcontext.relids = NULL;
      rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
      rvcontext.varno = varno;
!     rvcontext.need_phvs = (lowest_nulling_outer_join != NULL ||
!                            containing_appendrel != NULL);
!     rvcontext.wrap_non_vars = (containing_appendrel != NULL);
      /* initialize cache array with indexes 0 .. length(tlist) */
      rvcontext.rv_cache = palloc0((list_length(subquery->targetList) + 1) *
                                   sizeof(Node *));

      /*
       * Replace all of the top query's references to the subquery's outputs
       * with copies of the adjusted subtlist items, being careful not to
       * replace any of the jointree structure. (This'd be a lot cleaner if we
--- 1016,1064 ----
          rvcontext.relids = NULL;
      rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
      rvcontext.varno = varno;
!     /* these flags will be set below, if needed */
!     rvcontext.need_phvs = false;
!     rvcontext.wrap_non_vars = false;
      /* initialize cache array with indexes 0 .. length(tlist) */
      rvcontext.rv_cache = palloc0((list_length(subquery->targetList) + 1) *
                                   sizeof(Node *));

      /*
+      * If we are under an outer join then non-nullable items and lateral
+      * references may have to be turned into PlaceHolderVars.
+      */
+     if (lowest_nulling_outer_join != NULL)
+         rvcontext.need_phvs = true;
+
+     /*
+      * If we are dealing with an appendrel member then anything that's not a
+      * simple Var has to be turned into a PlaceHolderVar.  We force this to
+      * ensure that what we pull up doesn't get merged into a surrounding
+      * expression during later processing and then fail to match the
+      * expression actually available from the appendrel.
+      */
+     if (containing_appendrel != NULL)
+     {
+         rvcontext.need_phvs = true;
+         rvcontext.wrap_non_vars = true;
+     }
+
+     /*
+      * If the parent query uses grouping sets, we need a PlaceHolderVar for
+      * anything that's not a simple Var.  Again, this ensures that expressions
+      * retain their separate identity so that they will match grouping set
+      * columns when appropriate.  (It'd be sufficient to wrap values used in
+      * grouping set columns, and do so only in non-aggregated portions of the
+      * tlist and havingQual, but that would require a lot of infrastructure
+      * that pullup_replace_vars hasn't currently got.)
+      */
+     if (parse->groupingSets)
+     {
+         rvcontext.need_phvs = true;
+         rvcontext.wrap_non_vars = true;
+     }
+
+     /*
       * Replace all of the top query's references to the subquery's outputs
       * with copies of the adjusted subtlist items, being careful not to
       * replace any of the jointree structure. (This'd be a lot cleaner if we
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 833d515..cbfdbfd 100644
*** a/src/test/regress/expected/groupingsets.out
--- b/src/test/regress/expected/groupingsets.out
*************** select g as alias1, g as alias2
*** 389,394 ****
--- 389,439 ----
        3 |
  (6 rows)

+ -- check that pulled-up subquery outputs still go to null when appropriate
+ select four, x
+   from (select four, ten, 'foo'::text as x from tenk1) as t
+   group by grouping sets (four, x)
+   having x = 'foo';
+  four |  x
+ ------+-----
+       | foo
+ (1 row)
+
+ select four, x || 'x'
+   from (select four, ten, 'foo'::text as x from tenk1) as t
+   group by grouping sets (four, x)
+   order by four;
+  four | ?column?
+ ------+----------
+     0 |
+     1 |
+     2 |
+     3 |
+       | foox
+ (5 rows)
+
+ select (x+y)*1, sum(z)
+  from (select 1 as x, 2 as y, 3 as z) s
+  group by grouping sets (x+y, x);
+  ?column? | sum
+ ----------+-----
+         3 |   3
+           |   3
+ (2 rows)
+
+ select x, not x as not_x, q2 from
+   (select *, q1 = 1 as x from int8_tbl i1) as t
+   group by grouping sets(x, q2)
+   order by x, q2;
+  x | not_x |        q2
+ ---+-------+-------------------
+  f | t     |
+    |       | -4567890123456789
+    |       |               123
+    |       |               456
+    |       |  4567890123456789
+ (5 rows)
+
  -- simple rescan tests
  select a, b, sum(v.x)
    from (values (1),(2)) v(x), gstest_data(v.x)
diff --git a/src/test/regress/sql/groupingsets.sql b/src/test/regress/sql/groupingsets.sql
index 2b4ab69..b28d821 100644
*** a/src/test/regress/sql/groupingsets.sql
--- b/src/test/regress/sql/groupingsets.sql
*************** select g as alias1, g as alias2
*** 152,157 ****
--- 152,177 ----
    from generate_series(1,3) g
   group by alias1, rollup(alias2);

+ -- check that pulled-up subquery outputs still go to null when appropriate
+ select four, x
+   from (select four, ten, 'foo'::text as x from tenk1) as t
+   group by grouping sets (four, x)
+   having x = 'foo';
+
+ select four, x || 'x'
+   from (select four, ten, 'foo'::text as x from tenk1) as t
+   group by grouping sets (four, x)
+   order by four;
+
+ select (x+y)*1, sum(z)
+  from (select 1 as x, 2 as y, 3 as z) s
+  group by grouping sets (x+y, x);
+
+ select x, not x as not_x, q2 from
+   (select *, q1 = 1 as x from int8_tbl i1) as t
+   group by grouping sets(x, q2)
+   order by x, q2;
+
  -- simple rescan tests

  select a, b, sum(v.x)

Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

От
Tom Lane
Дата:
I wrote:
> I think the attached is probably ready to go, though I've not checked yet
> whether it will work pre-9.6.  Anyone want to do more review?

I pushed this back through 9.5.  Oddly, the test case Andrew proposed,

select (x+y)*1, sum(z)
 from (select 1 as x, 2 as y, 3 as z) s
 group by grouping sets (x+y, x);

passes in unmodified 9.5.  I haven't bothered to try to figure out
exactly why the difference in behavior; I have a feeling it's related
to the upper planner pathification changes in 9.6.

However ... we're not out of the woods yet.  Shortly after pushing
it occurred to me that the basic problem, const-simplification of
expressions whose subexpressions should match grouping set expressions,
can happen without any subquery at all.  For instance:

regression=# select q1=q2, not(q1=q2), q1 from int8_tbl
group by grouping sets(q1=q2, q1);
 ?column? | ?column? |        q1        
----------+----------+------------------
 f        |          |                 
 t        |          |                 
          | t        |              123
          | t        | 4567890123456789
(4 rows)

Surely that's wrong.

So, IMO, basically what this shows is that we have to identify and replace
subexpressions that should match grouping set expressions before the
planner starts to do expression processing, rather than just leaving it to
accidentally happen in setrefs.c.  I still like the idea of representing
such subexpressions as Vars referencing a dummy RTE, but that seems
unlikely to lead to a back-patchable fix.  However, I'm not sure that any
back-patchable fix would be practical anyway, so maybe we should just
accept that this is going to stay broken in the back branches.  At least
it doesn't arise for spec-compliant queries.

            regards, tom lane