Re: Parallel Inserts in CREATE TABLE AS

From Amit Kapila
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id CAA4eK1KdsYKdzGXocEkNoZ=meExf9NZA7DdoBZd323605b-YaQ@mail.gmail.com
In response to RE: Parallel Inserts in CREATE TABLE AS  ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Mon, Dec 7, 2020 at 11:32 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi
>
> +       /*
> +        * Flag to let the planner know that the SELECT query is for CTAS. This is
> +        * used to calculate the tuple transfer cost from workers to gather node(in
> +        * case parallelism kicks in for the SELECT part of the CTAS), to zero as
> +        * each worker will insert its share of tuples in parallel.
> +        */
> +       if (IsParallelInsertInCTASAllowed(into, NULL))
> +               query->isForCTAS = true;
>
>
> +       /*
> +        * We do not compute the parallel_tuple_cost for CTAS because the number of
> +        * tuples that are transferred from workers to the gather node is zero as
> +        * each worker, in parallel, inserts the tuples that are resulted from its
> +        * chunk of plan execution. This change may make the parallel plan cheap
> +        * among all other plans, and influence the planner to consider this
> +        * parallel plan.
> +        */
> +       if (!(root->parse->isForCTAS &&
> +               root->query_level == 1))
> +               run_cost += parallel_tuple_cost * path->path.rows;
>
> I noticed that the parallel_tuple_cost will still be ignored
> when Gather is not the top node.
>
> Example:
>         Create table test(i int);
>         insert into test values(generate_series(1,10000000,1));
>         explain create table ntest3 as select * from test where i < 200 limit 10000;
>
>                                   QUERY PLAN
> -------------------------------------------------------------------------------
>  Limit  (cost=1000.00..97331.33 rows=1000 width=4)
>    ->  Gather  (cost=1000.00..97331.33 rows=1000 width=4)
>          Workers Planned: 2
>          ->  Parallel Seq Scan on test  (cost=0.00..96331.33 rows=417 width=4)
>                Filter: (i < 200)
>
>
> isForCTAS will be true because the statement is a [create table as], and
> query_level is always 1 because there is no subquery.
> So even if Gather is not the top node, the parallel tuple cost will still be ignored.
>
> Does that work as expected?
>

I don't think that is expected, and it is not the case without this patch.
The cost shouldn't change for existing cases where the write is
not pushed down to the workers.
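
To illustrate the point, here is a minimal standalone sketch of the intended rule: the tuple transfer cost is dropped only when the insert can actually be pushed to the workers, which this sketch assumes requires Gather to be the top node of the plan. The struct, its fields (in particular gather_is_top_node) and both function names are hypothetical stand-ins, not the patch's code or PostgreSQL's real planner structures.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the planner state discussed above. */
typedef struct SketchGatherCostInput
{
    bool   is_for_ctas;          /* SELECT belongs to a CREATE TABLE AS */
    int    query_level;          /* 1 for the outermost query */
    bool   gather_is_top_node;   /* assumption: nothing (e.g. Limit) sits above Gather */
    double rows;                 /* rows expected to come out of Gather */
    double parallel_tuple_cost;  /* per-tuple worker-to-leader transfer cost */
} SketchGatherCostInput;

/* Assumed rule: workers can insert only if Gather is the top plan node. */
static bool
sketch_insert_pushed_to_workers(const SketchGatherCostInput *in)
{
    return in->is_for_ctas &&
           in->query_level == 1 &&
           in->gather_is_top_node;
}

/* Transfer cost to add to run_cost for the Gather path. */
static double
sketch_gather_transfer_cost(const SketchGatherCostInput *in)
{
    /* No tuples reach the leader when each worker inserts its own share. */
    if (sketch_insert_pushed_to_workers(in))
        return 0.0;

    /* Existing behavior: every row is charged for the transfer. */
    return in->parallel_tuple_cost * in->rows;
}
```

With this rule, the Limit-over-Gather plan from the example above would keep its usual transfer cost, while a plan with Gather at the top would be charged nothing for it.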

-- 
With Regards,
Amit Kapila.


