Обсуждение: CTE push down
Hi. Currently PostgreSQL supports CTE push down for SELECT statements, but it is implemented as turning each CTE reference into subquery. When CTE is referenced multiple times, we have choice - to materialize CTE (and disable quals distribution to the CTE query) or inline it (and so run CTE query multiple times, which can be inefficient, for example, when CTE references foreign tables). I was looking if it is possible to collect quals referencing CTE, combine in OR qual and add them to CTE query. So far I consider the following changes. 1) Modify SS_process_ctes() to add a list of RestrictInfo* to PlannerInfo - one NULL RestrictInfo pointer per CTE (let's call this list cte_restrictinfos for now)/ 2) In distribute_restrictinfo_to_rels(), when we get rel of RTE_CTE relkind and sure that can safely pushdown restrictinfo, preserve restrictinfo in cte_restrictinfos, converting multiple restrictions to "OR" RestrictInfos. 3) In the end of subquery_planner() (after inheritance_planner() or grouping_planner()) we can check if cte_restrictinfos contain some non-null RestrictInfo pointers and recreate plan for corresponding CTEs, distributing quals to relations inside CTE queries. For now I'm not sure how to handle vars mapping when we push restrictinfos to the level of cte root or when we push it down to the cte plan, but properly mapping vars seems seems to be doable. Is there something else I miss? Does somebody work on alternative solution or see issues in such approach? -- Best regards, Alexander Pyhalov, Postgres Professional
On Tue, Apr 13, 2021 at 6:58 PM Alexander Pyhalov <a.pyhalov@postgrespro.ru> wrote: > > Hi. > > Currently PostgreSQL supports CTE push down for SELECT statements, but > it is implemented as turning each CTE reference into subquery. > > When CTE is referenced multiple times, we have choice - to materialize > CTE (and disable quals distribution to the CTE query) or inline it (and > so run CTE query multiple times, > which can be inefficient, for example, when CTE references foreign > tables). > > I was looking if it is possible to collect quals referencing CTE, > combine in OR qual and add them to CTE query. > > So far I consider the following changes. > > 1) Modify SS_process_ctes() to add a list of RestrictInfo* to > PlannerInfo - one NULL RestrictInfo pointer per CTE (let's call this > list cte_restrictinfos for now)/ > 2) In distribute_restrictinfo_to_rels(), when we get rel of RTE_CTE > relkind and sure that can safely pushdown restrictinfo, preserve > restrictinfo in cte_restrictinfos, converting multiple restrictions to > "OR" RestrictInfos. > 3) In the end of subquery_planner() (after inheritance_planner() or > grouping_planner()) we can check if cte_restrictinfos contain some > non-null RestrictInfo pointers and recreate plan for corresponding CTEs, > distributing quals to relations inside CTE queries. > > For now I'm not sure how to handle vars mapping when we push > restrictinfos to the level of cte root or when we push it down to the > cte plan, but properly mapping vars seems seems to be doable. I think similar mapping happens when we push quals that reference a named JOIN down to join rels. I didn't take a look at it, but I think it happens before planning time. But some similar machinary might help in this case. I believe step2 is needed to avoid materializing rows which will never be selected. That would be a good improvement. However, care needs to be taken for volatile quals. I think, the quals on CTE will be evaluated twice, once when materializing the CTE result and second time when scanning the materialized result. volatile quals may produce different results when run multiple times. > > Is there something else I miss? > Does somebody work on alternative solution or see issues in such > approach? IMO, a POC patch will help understand your idea. -- Best Wishes, Ashutosh Bapat
Ashutosh Bapat писал 2021-04-14 16:01: > On Tue, Apr 13, 2021 at 6:58 PM Alexander Pyhalov > <a.pyhalov@postgrespro.ru> wrote: > I believe step2 is needed to avoid materializing rows which will never > be selected. That would be a good improvement. However, care needs to > be taken for volatile quals. I think, the quals on CTE will be > evaluated twice, once when materializing the CTE result and second > time when scanning the materialized result. volatile quals may produce > different results when run multiple times. > >> >> Is there something else I miss? >> Does somebody work on alternative solution or see issues in such >> approach? > > IMO, a POC patch will help understand your idea. Hi. I have a POC patch, which allows to distribute restrictinfos inside CTEs. However, I found I can't efficiently do partition pruning. When CTE replan stage happens, plans are already done. I can create alternative paths for relations, for example, like in Try-prune-partitions patch. However, new paths are not propagated to finalrel (UPPER_REL). I'm not sure how to achieve this and need some advice. Should we redo part of work, done by grouping_planner(), in the end of SS_replan_ctes()? Should we rely on executor partition pruning (with current patches it doesn't work)? Should we create init plans for ctes after grouping_planner(), not before? -- Best regards, Alexander Pyhalov, Postgres Professional