Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup
Дата
Msg-id 22850.1515710814@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [BUGS] Improper const-evaluation of HAVING with grouping sets and subquery pullup  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
I wrote:
> FWIW, I'm not really comfortable that the proposed patch is correct
> or complete.  It may just need more study to get there.

I've done another round of study on this patch.  The attached updated
version is the same code as Heikki proposed (minus the incorrect
restriction to queries with HAVING quals), but I reworked the comments
and expanded the regression test cases.

One thing that wasn't clear to me before was whether we need wrap_non_vars
for this case or not; we don't for outer joins, so I was unconvinced about
it here.  It turns out we do: the point of the wrapper is to prevent
constant folding or other expression preprocessing from merging a
pulled-up expression with the surrounding expression, resulting in
something that won't match the grouping set expression when it comes time
to do that matching.  For instance if we have a boolean subquery output
expression, say "x = y as cond", and that gets hoisted into an upper
expression "not cond", then without the PHV wrapper we will happily
simplify that to "x <> y" which will not match the grouping set
expression.  There's a regression test below that misbehaves if you take
out the "wrap_non_vars = true" line.

I spent some time thinking about Andrew's observation that we don't really
need the wrappers everyplace.  It's true, but pullup_replace_vars is far
from being able to do the right thing there, and I'm not sure that trying
to teach it to do so is reasonable.  (I'm inclined to think that the idea
I threw out upthread about converting grouping expressions into Vars
belonging to a new RTE kind might be the way to go.)  In any case I don't
think we'd possibly come out with a patch simple enough to back-patch.
So let's leave that optimization for future work.

I think the attached is probably ready to go, though I've not checked yet
whether it will work pre-9.6.  Anyone want to do more review?

            regards, tom lane

diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 0e2a220..45d82da 100644
*** a/src/backend/optimizer/prep/prepjointree.c
--- b/src/backend/optimizer/prep/prepjointree.c
*************** pull_up_simple_subquery(PlannerInfo *roo
*** 1003,1013 ****

      /*
       * The subquery's targetlist items are now in the appropriate form to
!      * insert into the top query, but if we are under an outer join then
!      * non-nullable items and lateral references may have to be turned into
!      * PlaceHolderVars.  If we are dealing with an appendrel member then
!      * anything that's not a simple Var has to be turned into a
!      * PlaceHolderVar.  Set up required context data for pullup_replace_vars.
       */
      rvcontext.root = root;
      rvcontext.targetlist = subquery->targetList;
--- 1003,1010 ----

      /*
       * The subquery's targetlist items are now in the appropriate form to
!      * insert into the top query, except that we may need to wrap them in
!      * PlaceHolderVars.  Set up required context data for pullup_replace_vars.
       */
      rvcontext.root = root;
      rvcontext.targetlist = subquery->targetList;
*************** pull_up_simple_subquery(PlannerInfo *roo
*** 1019,1032 ****
          rvcontext.relids = NULL;
      rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
      rvcontext.varno = varno;
!     rvcontext.need_phvs = (lowest_nulling_outer_join != NULL ||
!                            containing_appendrel != NULL);
!     rvcontext.wrap_non_vars = (containing_appendrel != NULL);
      /* initialize cache array with indexes 0 .. length(tlist) */
      rvcontext.rv_cache = palloc0((list_length(subquery->targetList) + 1) *
                                   sizeof(Node *));

      /*
       * Replace all of the top query's references to the subquery's outputs
       * with copies of the adjusted subtlist items, being careful not to
       * replace any of the jointree structure. (This'd be a lot cleaner if we
--- 1016,1064 ----
          rvcontext.relids = NULL;
      rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
      rvcontext.varno = varno;
!     /* these flags will be set below, if needed */
!     rvcontext.need_phvs = false;
!     rvcontext.wrap_non_vars = false;
      /* initialize cache array with indexes 0 .. length(tlist) */
      rvcontext.rv_cache = palloc0((list_length(subquery->targetList) + 1) *
                                   sizeof(Node *));

      /*
+      * If we are under an outer join then non-nullable items and lateral
+      * references may have to be turned into PlaceHolderVars.
+      */
+     if (lowest_nulling_outer_join != NULL)
+         rvcontext.need_phvs = true;
+
+     /*
+      * If we are dealing with an appendrel member then anything that's not a
+      * simple Var has to be turned into a PlaceHolderVar.  We force this to
+      * ensure that what we pull up doesn't get merged into a surrounding
+      * expression during later processing and then fail to match the
+      * expression actually available from the appendrel.
+      */
+     if (containing_appendrel != NULL)
+     {
+         rvcontext.need_phvs = true;
+         rvcontext.wrap_non_vars = true;
+     }
+
+     /*
+      * If the parent query uses grouping sets, we need a PlaceHolderVar for
+      * anything that's not a simple Var.  Again, this ensures that expressions
+      * retain their separate identity so that they will match grouping set
+      * columns when appropriate.  (It'd be sufficient to wrap values used in
+      * grouping set columns, and do so only in non-aggregated portions of the
+      * tlist and havingQual, but that would require a lot of infrastructure
+      * that pullup_replace_vars hasn't currently got.)
+      */
+     if (parse->groupingSets)
+     {
+         rvcontext.need_phvs = true;
+         rvcontext.wrap_non_vars = true;
+     }
+
+     /*
       * Replace all of the top query's references to the subquery's outputs
       * with copies of the adjusted subtlist items, being careful not to
       * replace any of the jointree structure. (This'd be a lot cleaner if we
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 833d515..cbfdbfd 100644
*** a/src/test/regress/expected/groupingsets.out
--- b/src/test/regress/expected/groupingsets.out
*************** select g as alias1, g as alias2
*** 389,394 ****
--- 389,439 ----
        3 |
  (6 rows)

+ -- check that pulled-up subquery outputs still go to null when appropriate
+ select four, x
+   from (select four, ten, 'foo'::text as x from tenk1) as t
+   group by grouping sets (four, x)
+   having x = 'foo';
+  four |  x
+ ------+-----
+       | foo
+ (1 row)
+
+ select four, x || 'x'
+   from (select four, ten, 'foo'::text as x from tenk1) as t
+   group by grouping sets (four, x)
+   order by four;
+  four | ?column?
+ ------+----------
+     0 |
+     1 |
+     2 |
+     3 |
+       | foox
+ (5 rows)
+
+ select (x+y)*1, sum(z)
+  from (select 1 as x, 2 as y, 3 as z) s
+  group by grouping sets (x+y, x);
+  ?column? | sum
+ ----------+-----
+         3 |   3
+           |   3
+ (2 rows)
+
+ select x, not x as not_x, q2 from
+   (select *, q1 = 1 as x from int8_tbl i1) as t
+   group by grouping sets(x, q2)
+   order by x, q2;
+  x | not_x |        q2
+ ---+-------+-------------------
+  f | t     |
+    |       | -4567890123456789
+    |       |               123
+    |       |               456
+    |       |  4567890123456789
+ (5 rows)
+
  -- simple rescan tests
  select a, b, sum(v.x)
    from (values (1),(2)) v(x), gstest_data(v.x)
diff --git a/src/test/regress/sql/groupingsets.sql b/src/test/regress/sql/groupingsets.sql
index 2b4ab69..b28d821 100644
*** a/src/test/regress/sql/groupingsets.sql
--- b/src/test/regress/sql/groupingsets.sql
*************** select g as alias1, g as alias2
*** 152,157 ****
--- 152,177 ----
    from generate_series(1,3) g
   group by alias1, rollup(alias2);

+ -- check that pulled-up subquery outputs still go to null when appropriate
+ select four, x
+   from (select four, ten, 'foo'::text as x from tenk1) as t
+   group by grouping sets (four, x)
+   having x = 'foo';
+
+ select four, x || 'x'
+   from (select four, ten, 'foo'::text as x from tenk1) as t
+   group by grouping sets (four, x)
+   order by four;
+
+ select (x+y)*1, sum(z)
+  from (select 1 as x, 2 as y, 3 as z) s
+  group by grouping sets (x+y, x);
+
+ select x, not x as not_x, q2 from
+   (select *, q1 = 1 as x from int8_tbl i1) as t
+   group by grouping sets(x, q2)
+   order by x, q2;
+
  -- simple rescan tests

  select a, b, sum(v.x)

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: [BUGS] BUG #14875: pg_createcluster fails to load--createclusterconf
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?