Re: Partial aggregates pushdown

Поиск
Список
Период
Сортировка
От Alexander Pyhalov
Тема Re: Partial aggregates pushdown
Дата
Msg-id 31ab8f7b91cef6e837b3e31ea1e35a9a@postgrespro.ru
обсуждение исходный текст
Ответ на RE: Partial aggregates pushdown  ("Fujii.Yuki@df.MitsubishiElectric.co.jp" <Fujii.Yuki@df.MitsubishiElectric.co.jp>)
Ответы Re: Partial aggregates pushdown  (Alexander Pyhalov <a.pyhalov@postgrespro.ru>)
Список pgsql-hackers
Fujii.Yuki@df.MitsubishiElectric.co.jp писал(а) 2024-03-16 05:28:
> Hi. Mr.Pyhalov.
>> >>
>> >> I don't see it that way. What we would push to the foreign server
>> >> would be something like SELECT count(a) FROM t. Then, after we get
>> >> the results back and combine the various partial counts locally, we
>> >> would locally evaluate the HAVING clause afterward. That is, partial
>> >> aggregation is a barrier to pushing down HAVING clause itself, but it
>> >> doesn't preclude pushing down the aggregation.
>> > I have made modifications in the attached patch to ensure that when
>> > the HAVING clause is present, the HAVING clause is executed locally
>> > while the partial aggregations are pushed down.
>> >
>> >
>> 
>> Sorry, I don't see how it works. When we have partial aggregates and 
>> having clause, foreign_grouping_ok() returns false and
>> add_foreign_grouping_paths() adds no paths.
>> I'm not saying it's necessary to fix this in the first patch version.
> Our sincere apologies. I had attached an older version before this 
> modification.
> 

Hi.

In foreign_grouping_ok() having qual is added to local conds here:

6635                         if (is_foreign_expr(root, grouped_rel, 
expr) && !partial)
6636                                 fpinfo->remote_conds = 
lappend(fpinfo->remote_conds, rinfo);
6637                         else
6638                                 fpinfo->local_conds = 
lappend(fpinfo->local_conds, rinfo);
6639                 }
6640         }


This is incorrect. If you look at plan for query in postgres_fdw.sql


-- Partial aggregates are safe to push down when there is a HAVING 
clause
EXPLAIN (VERBOSE, COSTS OFF)
SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING 
sum(a) < 700 ORDER BY 1;
                                                                      
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------
  Finalize GroupAggregate
    Output: pagg_tab.b, avg(pagg_tab.a), max(pagg_tab.a), count(*)
    Group Key: pagg_tab.b
    Filter: (sum(pagg_tab.a) < 700)
    ->  Sort
          Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL 
max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
          Sort Key: pagg_tab.b
          ->  Append
                ->  Foreign Scan
                      Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), 
(PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
                      Filter: ((sum(pagg_tab.a)) < 700)
                      Relations: Aggregate on (public.fpagg_tab_p1 
pagg_tab)
                      Remote SQL: SELECT b, avg(PARTIAL_AGGREGATE a), 
max(a), count(*), sum(a), sum(a) FROM public.pagg_tab_p1 GROUP BY 1
                ->  Foreign Scan
                      Output: pagg_tab_1.b, (PARTIAL avg(pagg_tab_1.a)), 
(PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)), (PARTIAL 
sum(pagg_tab_1.a))
                      Filter: ((sum(pagg_tab_1.a)) < 700)
                      Relations: Aggregate on (public.fpagg_tab_p2 
pagg_tab_1)
                      Remote SQL: SELECT b, avg(PARTIAL_AGGREGATE a), 
max(a), count(*), sum(a), sum(a) FROM public.pagg_tab_p2 GROUP BY 1
                ->  Foreign Scan
                      Output: pagg_tab_2.b, (PARTIAL avg(pagg_tab_2.a)), 
(PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)), (PARTIAL 
sum(pagg_tab_2.a))
                      Filter: ((sum(pagg_tab_2.a)) < 700)
                      Relations: Aggregate on (public.fpagg_tab_p3 
pagg_tab_2)
                      Remote SQL: SELECT b, avg(PARTIAL_AGGREGATE a), 
max(a), count(*), sum(a), sum(a) FROM public.pagg_tab_p3 GROUP BY 1


You can see that filter is applied before append. The result is correct 
only by chance, as sum in every partition is actually < 700. If you 
lower this bound, let's say, to 200, you'll start getting wrong results 
as data is filtered prior to aggregation.

It seems, however, that in partial case you should just avoid pulling 
conditions from having qual at all, all filters will be applied on upper 
level. Something like

diff --git a/contrib/postgres_fdw/postgres_fdw.c 
b/contrib/postgres_fdw/postgres_fdw.c
index 42eb17ae7c0..54918b9f1a4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6610,7 +6610,7 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo 
*grouped_rel,
          * Classify the pushable and non-pushable HAVING clauses and 
save them in
          * remote_conds and local_conds of the grouped rel's fpinfo.
          */
-       if (extra->havingQual)
+       if (extra->havingQual && !partial)
         {
                 foreach(lc, (List *) extra->havingQual)
                 {
@@ -6632,7 +6632,7 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo 
*grouped_rel,
                                                                          
  grouped_rel->relids,
                                                                          
  NULL,
                                                                          
  NULL);
-                       if (is_foreign_expr(root, grouped_rel, expr) && 
!partial)
+                       if (is_foreign_expr(root, grouped_rel, expr))
                                 fpinfo->remote_conds = 
lappend(fpinfo->remote_conds, rinfo);
                         else
                                 fpinfo->local_conds = 
lappend(fpinfo->local_conds, rinfo);

>> From: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
>> Sent: Wednesday, February 28, 2024 10:43 PM
>> contrib/postgres_fdw/deparse.c: comment before appendFunctionName() 
>> has gone, this seems to be wrong.
> Fixed.
> 
>> From: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
>> Sent: Wednesday, February 28, 2024 10:43 PM
>> In finalize_aggregate()
>> 
>> 1079         /*
>> 1080          * Apply the agg's finalfn if one is provided, else 
>> return
>> transValue.
>> 1081          */
>> 
>> Comment should be updated to note behavior for agg_partial aggregates.
> Fixed.

Comment in nodeAgg.c seems to be strange:

1079         /*
1080          * If the agg's finalfn is provided and PARTIAL_AGGREGATE 
keyword is
1081          * not specified, apply the agg's finalfn.
1082          * If PARTIAL_AGGREGATE keyword is specified and the 
transValue type
1083          * is internal, apply the agg's serialfn. In this case, if 
the agg's
1084          * serialfn must not be invalid. Otherwise return 
transValue.
1085          */

Likely, you mean:

... In this case the agg'ss serialfn must not be invalid...


Lower, in the same file, please, correct error message:

1136                 if(!OidIsValid(peragg->serialfn_oid))
1137                         elog(ERROR, "serialfunc is note provided 
for partial aggregate");

it should be "serialfunc is not provided for partial aggregate"

Also something is wrong with the following test :

  SELECT /* aggregate <> partial aggregate */
      array_agg(c_int4array), array_agg(b),
      avg(b::int2), avg(b::int4), avg(b::int8), avg(c_interval),
      avg(b::float4), avg(b::float8),
      corr(b::float8, (b * b)::float8),
      covar_pop(b::float8, (b * b)::float8),
      covar_samp(b::float8, (b * b)::float8),
      regr_avgx((2 * b)::float8, b::float8),
.....

Its results have changed since last patch. Do they depend on daylight 
saving time?

-- 
Best regards,
Alexander Pyhalov,
Postgres Professional



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Introduce XID age and inactive timeout based replication slot invalidation
Следующее
От: Alexander Korotkov
Дата:
Сообщение: Re: Add Index-level REINDEX with multiple jobs