Re: Parallel Inserts in CREATE TABLE AS
От | Zhihong Yu |
---|---|
Тема | Re: Parallel Inserts in CREATE TABLE AS |
Дата | |
Msg-id | CALNJ-vTMt4AhR61TrQ9VYUeMQHRLoPR=KFMZRehT0Lv+-agigg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Parallel Inserts in CREATE TABLE AS (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Ответы |
Re: Parallel Inserts in CREATE TABLE AS
(Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
|
Список | pgsql-hackers |
Hi,
+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
...
+ attrList = lappend(attrList, col);
Should attrList be freed when ereport is called ?
+ query->CTASParallelInsInfo &= CTAS_PARALLEL_INS_UNDEF;
Since CTAS_PARALLEL_INS_UNDEF is 0, isn't the above equivalent to assigning the value of 0 ?
Cheers
On Wed, Dec 9, 2020 at 5:43 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Dec 9, 2020 at 10:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Dec 8, 2020 at 6:24 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> >
> > > > I'm not quite sure how to address this. Can we not allow the planner
> > > > to consider that the select is for CTAS and check only after the
> > > > planning is done for the Gather node and other checks?
> > > >
> > >
> > > IIUC, you are saying that we should not influence the cost of gather node
> > > even when the insertion would be done by workers? I think that should be
> > > our fallback option anyway but that might miss some paths to be considered
> > > parallel where the cost becomes more due to parallel_tuple_cost (aka tuple
> > > transfer cost). I think the idea is we can avoid the tuple transfer cost
> > > only when Gather is the top node because only at that time we can push
> > > insertion down, right? How about if we have some way to detect the same
> > > before calling generate_useful_gather_paths()? I think when we are calling
> > > apply_scanjoin_target_to_paths() in grouping_planner(), if the
> > > query_level is 1, it is for CTAS, and it doesn't have a chance to create
> > > UPPER_REL (doesn't have grouping, order, limit, etc clause) then we can
> > > probably assume that the Gather will be top_node. I am not sure about this
> > > but I think it is worth exploring.
> > >
> >
> > I took a look at the parallel insert patch and have the same idea.
> > https://commitfest.postgresql.org/31/2844/
> >
> > * Consider generating Gather or Gather Merge paths. We must only do this
> > * if the relation is parallel safe, and we don't do it for child rels to
> > * avoid creating multiple Gather nodes within the same plan. We must do
> > * this after all paths have been generated and before set_cheapest, since
> > * one of the generated paths may turn out to be the cheapest one.
> > */
> > if (rel->consider_parallel && !IS_OTHER_REL(rel))
> > generate_useful_gather_paths(root, rel, false);
> >
> > IMO Gatherpath created here seems the right one which can possible ignore parallel cost if in CTAS.
> > But We need check the following parse option which will create path to be the parent of Gatherpath here.
> >
> > if (root->parse->rowMarks)
> > if (limit_needed(root->parse))
> > if (root->parse->sortClause)
> > if (root->parse->distinctClause)
> > if (root->parse->hasWindowFuncs)
> > if (root->parse->groupClause || root->parse->groupingSets || root->parse->hasAggs || root->root->hasHavingQual)
> >
>
> Yeah, and as I pointed earlier, along with this we also need to
> consider that the RelOptInfo must be the final target(top level rel).
>
Attaching v10 patch set that includes the change suggested above for
ignoring parallel tuple cost and also few more test cases. I split the
patch as per Amit's suggestion. v10-0001 contains parallel inserts
code without planner tuple cost changes and test cases. v10-0002 has
required changes for ignoring planner tuple cost calculations.
Please review it further.
After the review and addressing all the comments, I plan to make some
code common so that it can be used for Parallel Inserts in REFRESH
MATERIALIZED VIEW. Thoughts?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
В списке pgsql-hackers по дате отправления:
Следующее
От: "tsunakawa.takay@fujitsu.com"Дата:
Сообщение: RE: Parallel INSERT (INTO ... SELECT ...)