Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

Поиск

Список

Период

Сортировка

От	Andy Fan
Тема	Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS
Дата	21 августа 2020 г. 07:15:06
Msg-id	CAKU4AWqxP2ePgxf8-RNLXPQVOG7=CbTPbfCbtV1ebiHtMT3L6Q@mail.gmail.com обсуждение исходный текст
Ответ на	Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS (Andy Fan <zhihui.fan1213@gmail.com>)
Список	pgsql-bugs

Дерево обсуждения

On Fri, Aug 21, 2020 at 11:50 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Aug 21, 2020 at 10:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> On Fri, Aug 21, 2020 at 5:51 AM Andrew Gierth <andrew@tao11.riddles.org.uk>
> wrote:
>> Unless I'm missing something, it should be safe to reference output
>> columns that are not mentioned in any grouping set,

> I think such columns usually are aggregation expr, If we want to push down
> a qual which reference to an aggregation expr, we have to push down
> to having cause, However I am not sure such pushing down really helps.

Well, they can either be aggregates, or functions of the grouping
columns. You're right that there's not much we can do (today) with
restrictions on aggregate outputs, but there can be value in pushing
down restrictions on the other sort.

As an example, consider the regression database's tenk1 table, and
for argument's sake add

regression=# create index on tenk1 (abs(hundred));
CREATE INDEX

Then we can get

regression=# explain select * from (select hundred, ten, abs(hundred) a, count(*) c from tenk1 group by 1,2) ss where a = 42;
QUERY PLAN
------------------------------------------------------------------------------------
HashAggregate (cost=225.98..227.18 rows=96 width=20)
Group Key: tenk1.hundred, tenk1.ten
-> Bitmap Heap Scan on tenk1 (cost=5.06..225.23 rows=100 width=8)
Recheck Cond: (abs(hundred) = 42)
-> Bitmap Index Scan on tenk1_abs_idx (cost=0.00..5.04 rows=100 width=0)
Index Cond: (abs(hundred) = 42)
(6 rows)

which is a lot cheaper than the pure seqscan you get with no pushed-down
condition.

One thing that I find curious is that if I alter this example to use
grouping sets, say

regression=# explain select * from (select hundred, ten, abs(hundred) a, count(*) c from tenk1 group by grouping sets (1,2)) ss where a = 42;
QUERY PLAN
-----------------------------------------------------------------
HashAggregate (cost=495.00..546.65 rows=2 width=20)
Hash Key: tenk1.hundred
Hash Key: tenk1.ten
Filter: (abs(tenk1.hundred) = 42)
-> Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=8)
(5 rows)

i.e. it's not seeing the abs() condition as pushable below the
aggregation. I'm not quite sure if that's a necessary restriction
or a missed optimization.

regards, tom lane

Both of the queries can push down the qual "a = 42" to the subquery->havingQual
since we have group-by clause, this method unify the process for aggregation call
and non-aggregation expr. . so it become to

select .. from (select .. from tenk1 group .. having (abs(hundred) = 2);

later in the subquery_planner, we will try to pull the having clause to where clause.
then the Q2 failed to do so.

/*
* In some cases we may want to transfer a HAVING clause into WHERE. We
* cannot do so if the HAVING clause contains aggregates (obviously) or
* volatile functions (since a HAVING clause is supposed to be executed
* only once per group). We also can't do this if there are any nonempty
* grouping sets; moving such a clause into WHERE would potentially change
* the results, if any referenced column isn't present in all the grouping
* sets. (If there are only empty grouping sets, then the HAVING clause
* must be degenerate as discussed below.)
*/

I'm still trying to understand the comment, though.

This should be a correct behavior, we should not push down in the Q2 case. Here is an example:

regression=# create table tgs(a int, b int);
CREATE TABLE
regression=# insert into tgs values(1, 1), (1, 2), (2, 2);
INSERT 0 3
regression=# select * from (select a, b, count(*) from tgs group by grouping sets((a), (b))) t where b = 1;
a | b | count
---+---+-------
| 1 | 1
(1 row)

regression=# select * from (select a, b, count(*) from tgs group by grouping sets((a), (b)) having b = 1) t;
a | b | count
---+---+-------
| 1 | 1
(1 row)

regression=# select * from (select a, b, count(*) from tgs where b = 1 group by grouping sets((a), (b)) ) t;
a | b | count
---+---+-------
1 | | 1
| 1 | 1

(2 rows)

At the same time, our optimizer is smart enough to handle the below case (only 1 set in group sets, which equals

group by).

regression=# explain select * from (select a, b, count(*) from tgs group by grouping sets((a, b)) ) t where b = 1;
QUERY PLAN
-----------------------------------------------------------------
GroupAggregate (cost=38.44..38.66 rows=11 width=16)
Group Key: tgs.a, tgs.b
-> Sort (cost=38.44..38.47 rows=11 width=8)
Sort Key: tgs.a
-> Seq Scan on tgs (cost=0.00..38.25 rows=11 width=8)
Filter: (b = 1)
(6 rows)

Best Regards

Andy Fan

В списке pgsql-bugs по дате отправления:

Предыдущее

От: Andy Fan
Дата: 21 августа 2020 г., 06:50:10
Сообщение: Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

Следующее

От: Andrew Gierth
Дата: 21 августа 2020 г., 16:04:50
Сообщение: Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: BUG #16585: Wrong filtering on a COALESCE field after using GROUPING SETS

Предыдущее

Следующее