Обсуждение: Optimizer Doesn't Push Down Where Expressions on Rollups

Поиск
Список
Период
Сортировка

Optimizer Doesn't Push Down Where Expressions on Rollups

От
Logan Bowers
Дата:
Hello, 

I’m running into a performance issue where I’m trying to introduce a rollup into GROUP BY clause. I believe the query planner is missing an optimization when grouping by multiple terms. 

Here’s an example that replicates the problem: 

BEGIN;
CREATE TABLE limitation_demo (c1 integer, c2 integer, c3 integer, c4 integer);

INSERT INTO limitation_demo SELECT
    (random() * 20)::integer,
    (random() * 20)::integer,
    (random() * 20)::integer,
    (random() * 20)::integer
  FROM generate_series(0,10000) AS foo;

CREATE INDEX idx1 ON limitation_demo (c1, c2, c3);

--Good
EXPLAIN SELECT * FROM (SELECT c1,c2,c3, sum(c4) FROM limitation_demo GROUP BY c1,c2,c3) AS foo  WHERE c1 = 1 AND c2 = 5 ;
--Bad
EXPLAIN SELECT * FROM (SELECT c1,c2,c3, sum(c4) FROM limitation_demo GROUP BY c1, c2, ROLLUP(c3)) AS foo  WHERE c1 = 1 AND c2 = 5;
ROLLBACK;


Here are the respective query plans: 

                                    QUERY PLAN
-----------------------------------------------------------------------------------
 GroupAggregate  (cost=0.29..8.32 rows=1 width=20)
   Group Key: limitation_demo.c1, limitation_demo.c2, limitation_demo.c3
   ->  Index Scan using idx1 on limitation_demo  (cost=0.29..8.30 rows=1 width=16)
         Index Cond: ((c1 = 1) AND (c2 = 5))
(4 rows)

                                 QUERY PLAN
----------------------------------------------------------------------------
 HashAggregate  (cost=255.02..360.03 rows=2 width=20)
   Hash Key: limitation_demo.c1, limitation_demo.c2, limitation_demo.c3
   Hash Key: limitation_demo.c1, limitation_demo.c2
   Filter: ((limitation_demo.c1 = 1) AND (limitation_demo.c2 = 5))
   ->  Seq Scan on limitation_demo  (cost=0.00..155.01 rows=10001 width=16)
(5 rows)

Despite being semantically equivalent, I believe, the latter plan with the rollup does not use the index. I believe what is happening is that the WHERE clause is getting pushed down into the inner query without the ROLLUP, but is not with the ROLLUP. 

This is a simplified example. In my real-world use case, the inner query is a view, so I don’t have the option of moving the where clause. 

Can y’all let me know if I should submit this somewhere else? Thanks! 

Re: Optimizer Doesn't Push Down Where Expressions on Rollups

От
Richard Guo
Дата:
In your case, the WHERE clauses would get pushed down into the subquery
for both queries, with/without the ROLLUP. But since the subquery uses
grouping/grouping sets, the WHERE clauses would be put in HAVING of the
subquery.

Then when we plan for the subquery, we will decide whether a HAVING
clause can be transfered into WHERE. Usually we do not do that if there
are any nonempty grouping sets. Because if any referenced column isn't
present in all the grouping sets, moving such a clause into WHERE would
potentially change the results.

Thanks
Richard

Re: Optimizer Doesn't Push Down Where Expressions on Rollups

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> In your case, the WHERE clauses would get pushed down into the subquery
> for both queries, with/without the ROLLUP. But since the subquery uses
> grouping/grouping sets, the WHERE clauses would be put in HAVING of the
> subquery.

Right, we do successfully push the clauses into HAVING of the subquery.

> Then when we plan for the subquery, we will decide whether a HAVING
> clause can be transfered into WHERE. Usually we do not do that if there
> are any nonempty grouping sets. Because if any referenced column isn't
> present in all the grouping sets, moving such a clause into WHERE would
> potentially change the results.

Yeah.  I think that it might be safe if the proposed clause can
be proven strict for (some subset of?) the grouping columns, because
that would eliminate the rollup grouping sets where those columns
come out NULL because they aren't being grouped on.  (This could then
also factor into throwing away those grouping sets, perhaps.)

Anyway, this is not a bug; it's a proposed planner improvement, and
not a trivial one.

            regards, tom lane



Re: Optimizer Doesn't Push Down Where Expressions on Rollups

От
Logan Bowers
Дата:
Just in case it wasn’t obvious from the example, I’m talking about only cases where the all groups in the grouping set
sharea subset of columns in common and those are the columns being conditioned in the WHERE clause.  

I get it if it’s not something actionable, but it is a bummer to see query time explode when going from one grouping
setto two grouping sets. :/  

Cheers, and I’ve been a big fan of y’alls work for going on two decades now. Thank you!

> On Mar 11, 2020, at 7:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Richard Guo <guofenglinux@gmail.com> writes:
>> In your case, the WHERE clauses would get pushed down into the subquery
>> for both queries, with/without the ROLLUP. But since the subquery uses
>> grouping/grouping sets, the WHERE clauses would be put in HAVING of the
>> subquery.
>
> Right, we do successfully push the clauses into HAVING of the subquery.
>
>> Then when we plan for the subquery, we will decide whether a HAVING
>> clause can be transfered into WHERE. Usually we do not do that if there
>> are any nonempty grouping sets. Because if any referenced column isn't
>> present in all the grouping sets, moving such a clause into WHERE would
>> potentially change the results.
>
> Yeah.  I think that it might be safe if the proposed clause can
> be proven strict for (some subset of?) the grouping columns, because
> that would eliminate the rollup grouping sets where those columns
> come out NULL because they aren't being grouped on.  (This could then
> also factor into throwing away those grouping sets, perhaps.)
>
> Anyway, this is not a bug; it's a proposed planner improvement, and
> not a trivial one.
>
>             regards, tom lane




Re: Optimizer Doesn't Push Down Where Expressions on Rollups

От
Richard Guo
Дата:
On Wed, Mar 11, 2020 at 10:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> In your case, the WHERE clauses would get pushed down into the subquery
> for both queries, with/without the ROLLUP. But since the subquery uses
> grouping/grouping sets, the WHERE clauses would be put in HAVING of the
> subquery.

Right, we do successfully push the clauses into HAVING of the subquery.

> Then when we plan for the subquery, we will decide whether a HAVING
> clause can be transfered into WHERE. Usually we do not do that if there
> are any nonempty grouping sets. Because if any referenced column isn't
> present in all the grouping sets, moving such a clause into WHERE would
> potentially change the results.

Yeah.  I think that it might be safe if the proposed clause can
be proven strict for (some subset of?) the grouping columns, because
that would eliminate the rollup grouping sets where those columns
come out NULL because they aren't being grouped on.  (This could then
also factor into throwing away those grouping sets, perhaps.)

This seems correct to me. If we can prove the HAVING clause is strict
for some grouping columns, then we can throw away the grouping sets that
do not contain these grouping columns, since their results would be
eliminated by this HAVING clause. After that we can move this HAVING
clause to WHERE. I'm thinking about this example:

select c1, c2, sum(c4) from t group by
    grouping sets ((c1, c2), (c2, c3), (c1, c4)) having c2 = 2;

select c1, c2, sum(c4) from t group by
    grouping sets ((c1, c2), (c2, c3)) having c2 = 2;

select c1, c2, sum(c4) from t where c2 = 2 group by
    grouping sets ((c1, c2), (c2, c3));


For non-strict HAVING clause, if its referenced columns are present in
all the grouping sets, I think we should also be able to move it to
WHERE.

Thanks
Richard 

Re: Optimizer Doesn't Push Down Where Expressions on Rollups

От
Richard Guo
Дата:
Hi,

(cc'ing -hackers)

We used to push down clauses from HAVING to WHERE when grouping sets are
used in 61444bfb and then reverted it in a6897efa because of wrong
results issue. As now there are people suffering from performance issue
as described in [1], I'm wondering if we should give it another try.


Thanks
Richard

On Thu, Mar 12, 2020 at 6:22 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Mar 11, 2020 at 10:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> In your case, the WHERE clauses would get pushed down into the subquery
> for both queries, with/without the ROLLUP. But since the subquery uses
> grouping/grouping sets, the WHERE clauses would be put in HAVING of the
> subquery.

Right, we do successfully push the clauses into HAVING of the subquery.

> Then when we plan for the subquery, we will decide whether a HAVING
> clause can be transfered into WHERE. Usually we do not do that if there
> are any nonempty grouping sets. Because if any referenced column isn't
> present in all the grouping sets, moving such a clause into WHERE would
> potentially change the results.

Yeah.  I think that it might be safe if the proposed clause can
be proven strict for (some subset of?) the grouping columns, because
that would eliminate the rollup grouping sets where those columns
come out NULL because they aren't being grouped on.  (This could then
also factor into throwing away those grouping sets, perhaps.)

This seems correct to me. If we can prove the HAVING clause is strict
for some grouping columns, then we can throw away the grouping sets that
do not contain these grouping columns, since their results would be
eliminated by this HAVING clause. After that we can move this HAVING
clause to WHERE. I'm thinking about this example:

select c1, c2, sum(c4) from t group by
    grouping sets ((c1, c2), (c2, c3), (c1, c4)) having c2 = 2;

select c1, c2, sum(c4) from t group by
    grouping sets ((c1, c2), (c2, c3)) having c2 = 2;

select c1, c2, sum(c4) from t where c2 = 2 group by
    grouping sets ((c1, c2), (c2, c3));


For non-strict HAVING clause, if its referenced columns are present in
all the grouping sets, I think we should also be able to move it to
WHERE.

Thanks
Richard 

Re: Optimizer Doesn't Push Down Where Expressions on Rollups

От
Richard Guo
Дата:
Hi,

(cc'ing -hackers)

We used to push down clauses from HAVING to WHERE when grouping sets are
used in 61444bfb and then reverted it in a6897efa because of wrong
results issue. As now there are people suffering from performance issue
as described in [1], I'm wondering if we should give it another try.


Thanks
Richard

On Thu, Mar 12, 2020 at 6:22 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Mar 11, 2020 at 10:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> In your case, the WHERE clauses would get pushed down into the subquery
> for both queries, with/without the ROLLUP. But since the subquery uses
> grouping/grouping sets, the WHERE clauses would be put in HAVING of the
> subquery.

Right, we do successfully push the clauses into HAVING of the subquery.

> Then when we plan for the subquery, we will decide whether a HAVING
> clause can be transfered into WHERE. Usually we do not do that if there
> are any nonempty grouping sets. Because if any referenced column isn't
> present in all the grouping sets, moving such a clause into WHERE would
> potentially change the results.

Yeah.  I think that it might be safe if the proposed clause can
be proven strict for (some subset of?) the grouping columns, because
that would eliminate the rollup grouping sets where those columns
come out NULL because they aren't being grouped on.  (This could then
also factor into throwing away those grouping sets, perhaps.)

This seems correct to me. If we can prove the HAVING clause is strict
for some grouping columns, then we can throw away the grouping sets that
do not contain these grouping columns, since their results would be
eliminated by this HAVING clause. After that we can move this HAVING
clause to WHERE. I'm thinking about this example:

select c1, c2, sum(c4) from t group by
    grouping sets ((c1, c2), (c2, c3), (c1, c4)) having c2 = 2;

select c1, c2, sum(c4) from t group by
    grouping sets ((c1, c2), (c2, c3)) having c2 = 2;

select c1, c2, sum(c4) from t where c2 = 2 group by
    grouping sets ((c1, c2), (c2, c3));


For non-strict HAVING clause, if its referenced columns are present in
all the grouping sets, I think we should also be able to move it to
WHERE.

Thanks
Richard