Re: Partial aggregates pushdown

Поиск
Список
Период
Сортировка
От Alexander Pyhalov
Тема Re: Partial aggregates pushdown
Дата
Msg-id e2e23366a3897f2fd89ac4a43dd3b82c@postgrespro.ru
обсуждение исходный текст
Ответ на RE: Partial aggregates pushdown  ("Fujii.Yuki@df.MitsubishiElectric.co.jp" <Fujii.Yuki@df.MitsubishiElectric.co.jp>)
Ответы Re: Partial aggregates pushdown
Список pgsql-hackers
Fujii.Yuki@df.MitsubishiElectric.co.jp писал 2023-07-19 03:43:
> Hi Mr.Pyhalov, hackers.

> 3)
> I modified the patch to safely do a partial aggregate pushdown for
> queries which contain having clauses.
> 

Hi.
Sorry, but I don't see how it could work.
For example, the attached test returns wrong result:

CREATE FUNCTION f() RETURNS INT AS $$
begin
   return 10;
end
$$ LANGUAGE PLPGSQL;

SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) < f() ORDER BY 
1;
  b  | sum
----+-----
   0 |   0
  10 |   0
  20 |   0
  30 |   0
  40 |   0
+(5 rows)

In fact the above query should have returned 0 rows, as

SELECT b, sum(a) FROM pagg_tab GROUP BY b ORDER BY 1;
  b  | sum
----+------
   0 |  600
   1 |  660
   2 |  720
   3 |  780
   4 |  840
   5 |  900
   6 |  960
   7 | 1020
   8 | 1080
   9 | 1140
  10 |  600
  11 |  660
  12 |  720
....
shows no such rows.

Or, on the same data

SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) > 660 ORDER BY 
1;

You'll get 0 rows.

But
SELECT b, sum(a) FROM pagg_tab GROUP BY b;
  b  | sum
----+------
  42 |  720
  29 | 1140
   4 |  840
  34 |  840
  41 |  660
   0 |  600
  40 |  600
gives.

The issue is that you can't calculate "partial" having. You should 
compare full aggregate in filter, but it's not possible on the level of 
one partition.
And you have this in plans

  Finalize GroupAggregate
    Output: pagg_tab.b, avg(pagg_tab.a), max(pagg_tab.a), count(*)
    Group Key: pagg_tab.b
    Filter: (sum(pagg_tab.a) < 700)
    ->  Sort
          Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL 
max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
          Sort Key: pagg_tab.b
          ->  Append
                ->  Foreign Scan
                      Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), 
(PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
                      Filter: ((PARTIAL sum(pagg_tab.a)) < 700)   !!!! 
<--- here we can't compare anything yet, sum is incomplete.
                      Relations: Aggregate on (public.fpagg_tab_p1 
pagg_tab)
                      Remote SQL: SELECT b, avg_p_int4(a), max(a), 
count(*), sum(a) FROM public.pagg_tab_p1 GROUP BY 1
                ->  Foreign Scan
                      Output: pagg_tab_1.b, (PARTIAL avg(pagg_tab_1.a)), 
(PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)), (PARTIAL 
sum(pagg_tab_1.a))
                      Filter: ((PARTIAL sum(pagg_tab_1.a)) < 700)
                      Relations: Aggregate on (public.fpagg_tab_p2 
pagg_tab_1)
                      Remote SQL: SELECT b, avg_p_int4(a), max(a), 
count(*), sum(a) FROM public.pagg_tab_p2 GROUP BY 1
                ->  Foreign Scan
                      Output: pagg_tab_2.b, (PARTIAL avg(pagg_tab_2.a)), 
(PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)), (PARTIAL 
sum(pagg_tab_2.a))
                      Filter: ((PARTIAL sum(pagg_tab_2.a)) < 700)
                      Relations: Aggregate on (public.fpagg_tab_p3 
pagg_tab_2)
                      Remote SQL: SELECT b, avg_p_int4(a), max(a), 
count(*), sum(a) FROM public.pagg_tab_p3 GROUP BY 1

In foreign_grouping_ok()
6586                         if (IsA(expr, Aggref))
6587                         {
6588                                 if (partial)
6589                                 {
6590                                         mark_partial_aggref((Aggref 
*) expr, AGGSPLIT_INITIAL_SERIAL);
6591                                         continue;
6592                                 }
6593                                 else if (!is_foreign_expr(root, 
grouped_rel, expr))
6594                                         return false;
6595
6596                                 tlist = add_to_flat_tlist(tlist, 
list_make1(expr));
6597                         }

at least you shouldn't do anything with expr, if is_foreign_expr() 
returned false. If we restrict pushing down queries with havingQuals, 
I'm not quite sure how Aggref can appear in local_conds.

As for changes in planner.c (setGroupClausePartial()) I have several 
questions.

1) Why don't we add non_group_exprs to pathtarget->exprs when 
partial_target->exprs is not set?

2) We replace extra->partial_target->exprs with partial_target->exprs 
after processing. Why are we sure that after this tleSortGroupRef is 
correct?

-- 
Best regards,
Alexander Pyhalov,
Postgres Professional
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: inconsistency between the VM page visibility status and the visibility status of the page
Следующее
От: shveta malik
Дата:
Сообщение: Re: Synchronizing slots from primary to standby