Обсуждение: CTE push down

Поиск
Список
Период
Сортировка

CTE push down

От
Alexander Pyhalov
Дата:
Hi.

Currently PostgreSQL supports CTE push down for SELECT statements, but 
it is implemented as turning each CTE reference into subquery.

When CTE is referenced multiple times, we have choice - to materialize 
CTE (and disable quals distribution to the CTE query) or inline it (and 
so run CTE query multiple times,
which can be inefficient, for example, when CTE references foreign 
tables).

I was looking if it is possible to collect quals referencing CTE, 
combine in OR qual and add them to CTE query.

So far I consider the following changes.

1) Modify SS_process_ctes() to add a list of RestrictInfo* to 
PlannerInfo - one NULL RestrictInfo pointer per CTE (let's call this 
list cte_restrictinfos for now)/
2) In distribute_restrictinfo_to_rels(), when we get rel of RTE_CTE 
relkind and sure that can safely pushdown restrictinfo, preserve 
restrictinfo in cte_restrictinfos, converting multiple restrictions to 
"OR" RestrictInfos.
3) In the end of subquery_planner() (after inheritance_planner() or  
grouping_planner()) we can check if cte_restrictinfos contain some 
non-null RestrictInfo pointers and recreate plan for corresponding CTEs, 
distributing quals to relations inside CTE queries.

For now I'm not sure how to handle vars mapping when we push 
restrictinfos to the level of cte root or when we push it down to the 
cte plan, but properly mapping vars seems seems to be doable.

Is there something else I miss?
Does somebody work on alternative solution or see issues in such 
approach?

-- 
Best regards,
Alexander Pyhalov,
Postgres Professional



Re: CTE push down

От
Ashutosh Bapat
Дата:
On Tue, Apr 13, 2021 at 6:58 PM Alexander Pyhalov
<a.pyhalov@postgrespro.ru> wrote:
>
> Hi.
>
> Currently PostgreSQL supports CTE push down for SELECT statements, but
> it is implemented as turning each CTE reference into subquery.
>
> When CTE is referenced multiple times, we have choice - to materialize
> CTE (and disable quals distribution to the CTE query) or inline it (and
> so run CTE query multiple times,
> which can be inefficient, for example, when CTE references foreign
> tables).
>
> I was looking if it is possible to collect quals referencing CTE,
> combine in OR qual and add them to CTE query.
>
> So far I consider the following changes.
>
> 1) Modify SS_process_ctes() to add a list of RestrictInfo* to
> PlannerInfo - one NULL RestrictInfo pointer per CTE (let's call this
> list cte_restrictinfos for now)/
> 2) In distribute_restrictinfo_to_rels(), when we get rel of RTE_CTE
> relkind and sure that can safely pushdown restrictinfo, preserve
> restrictinfo in cte_restrictinfos, converting multiple restrictions to
> "OR" RestrictInfos.
> 3) In the end of subquery_planner() (after inheritance_planner() or
> grouping_planner()) we can check if cte_restrictinfos contain some
> non-null RestrictInfo pointers and recreate plan for corresponding CTEs,
> distributing quals to relations inside CTE queries.
>
> For now I'm not sure how to handle vars mapping when we push
> restrictinfos to the level of cte root or when we push it down to the
> cte plan, but properly mapping vars seems seems to be doable.

I think similar mapping happens when we push quals that reference a
named JOIN down to join rels. I didn't take a look at it, but I think
it happens before planning time. But some similar machinary might help
in this case.

I believe step2 is needed to avoid materializing rows which will never
be selected. That would be a good improvement. However, care needs to
be taken for volatile quals. I think, the quals on CTE will be
evaluated twice, once when materializing the CTE result and second
time when scanning the materialized result. volatile quals may produce
different results when run multiple times.

>
> Is there something else I miss?
> Does somebody work on alternative solution or see issues in such
> approach?

IMO, a POC patch will help understand your idea.

-- 
Best Wishes,
Ashutosh Bapat



Re: CTE push down

От
Alexander Pyhalov
Дата:
Ashutosh Bapat писал 2021-04-14 16:01:
> On Tue, Apr 13, 2021 at 6:58 PM Alexander Pyhalov
> <a.pyhalov@postgrespro.ru> wrote:

> I believe step2 is needed to avoid materializing rows which will never
> be selected. That would be a good improvement. However, care needs to
> be taken for volatile quals. I think, the quals on CTE will be
> evaluated twice, once when materializing the CTE result and second
> time when scanning the materialized result. volatile quals may produce
> different results when run multiple times.
> 
>> 
>> Is there something else I miss?
>> Does somebody work on alternative solution or see issues in such
>> approach?
> 
> IMO, a POC patch will help understand your idea.

Hi.

I have a POC patch, which allows to distribute restrictinfos inside 
CTEs.
However, I found I can't efficiently do partition pruning.
When CTE replan stage happens, plans are already done. I can create 
alternative paths for relations,
for example, like in Try-prune-partitions patch.

However, new paths are not propagated to finalrel (UPPER_REL).
I'm not sure how to achieve this and need some advice.
Should we redo part of work, done by grouping_planner(), in the end of 
SS_replan_ctes()?
Should we rely on executor partition pruning (with current patches it 
doesn't work)?
Should we create init plans for ctes after grouping_planner(), not 
before?

-- 
Best regards,
Alexander Pyhalov,
Postgres Professional
Вложения