Re: Partial aggregates pushdown
От | Bruce Momjian |
---|---|
Тема | Re: Partial aggregates pushdown |
Дата | |
Msg-id | ZC95C0+PVhVP3iax@momjian.us обсуждение исходный текст |
Ответ на | RE: Partial aggregates pushdown ("Fujii.Yuki@df.MitsubishiElectric.co.jp" <Fujii.Yuki@df.MitsubishiElectric.co.jp>) |
Ответы |
RE: Partial aggregates pushdown
("Fujii.Yuki@df.MitsubishiElectric.co.jp" <Fujii.Yuki@df.MitsubishiElectric.co.jp>)
|
Список | pgsql-hackers |
On Fri, Mar 31, 2023 at 05:49:21AM +0000, Fujii.Yuki@df.MitsubishiElectric.co.jp wrote: > Hi Mr.Momjian > > > First, am I correct? > Yes, you are correct. This patch uses new special aggregate functions for partial aggregate > (then we call this partialaggfunc). First, my apologies for not addressing this sooner. I was so focused on my own tasks that I didn't realize this very important patch was not getting attention. I will try my best to get it into PG 17. What amazes me is that you didn't need to create _any_ actual aggregate functions. Rather, you just needed to hook existing functions into the aggregate tables for partial FDW execution. > > Second, how far away is this from being committable > > and/or what work needs to be done to get it committable, either for PG 16 or 17? > I believe there are three: 1. and 2. are not clear if they are necessary or not; 3. are clearly necessary. > I would like to hear the opinions of the development community on whether or not 1. and 2. need to be addressed. > > 1. Making partialaggfunc user-defined function > In v17, I make partialaggfuncs as built-in functions. > Because of this approach, v17 changes specification of BKI file and pg_aggregate. > For now, partialaggfuncs are needed by only postgres_fdw which is just an extension of PostgreSQL. > In the future, when revising the specifications for BKI files and pg_aggregate when modifying existing PostgreSQL functions, > It is necessary to align them with this patch's changes. > I am concerned that this may be undesirable. > So I am thinking that v17 should be modified to making partialaggfunc as user defined function. I think we have three possible cases for aggregates pushdown to FDWs: 1) Postgres built-in aggregate functions 2) Postgres user-defined & extension aggregate functions 3) aggregate functions calls to non-PG FDWs Your patch handles #1 by checking that the FDW Postgres version is the same as the calling Postgres version. However, it doesn't check for extension versions, and frankly, I don't see how we could implement that cleanly without significant complexity. I suggest we remove the version check requirement --- instead just document that the FDW Postgres version should be the same or newer than the calling Postgres server --- that way, we can assume that whatever is in the system catalogs of the caller is in the receiving side. We should add a GUC to turn off this optimization for cases where the FDW Postgres version is older than the caller. This handles case 1-2. For case 3, I don't even know how much pushdown those do of _any_ aggregates to non-PG servers, let along parallel FDW ones. Does anyone know the details? > 2. Automation of creating definition of partialaggfuncs > In development of v17, I manually create definition of partialaggfuncs for avg, min, max, sum, count. > I am concerned that this may be undesirable. > So I am thinking that v17 should be modified to automate creating definition of partialaggfuncs > for all built-in aggregate functions. Are there any other builtin functions that need this? I think we can just provide documention for extensions on how to do this. > 3. Documentation > I need add explanation of partialaggfunc to documents on postgres_fdw and other places. I can help with that once we decide on the above. I think 'partialaggfn' should be named 'aggpartialfn' to match other columns in pg_aggregate. I am confused by these changes to pg_aggegate: +{ aggfnoid => 'sum_p_int8', aggtransfn => 'int8_avg_accum', + aggfinalfn => 'int8_avg_serialize', aggcombinefn => 'int8_avg_combine', + aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize', + aggtranstype => 'internal', aggtransspace => '48' }, ... +{ aggfnoid => 'sum_p_numeric', aggtransfn => 'numeric_avg_accum', + aggfinalfn => 'numeric_avg_serialize', aggcombinefn => 'numeric_avg_combine', + aggserialfn => 'numeric_avg_serialize', + aggdeserialfn => 'numeric_avg_deserialize', + aggtranstype => 'internal', aggtransspace => '128' }, Why are these marked as 'sum' but use 'avg' functions? It would be good to explain exactly how this is diffent from background worker parallelism. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Вложения
В списке pgsql-hackers по дате отправления: