Re: INSERT INTO SELECT, Why Parallelism is not selected?

From: Amit Kapila
Subject: Re: INSERT INTO SELECT, Why Parallelism is not selected?
Date:
Msg-id: CAA4eK1Ks8Sqs29VHPS6koNj5E9YQdkGCzgGsSrQMeUbQfe28yg@mail.gmail.com
In response to: Re: INSERT INTO SELECT, Why Parallelism is not selected?  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List: pgsql-hackers
On Tue, Aug 18, 2020 at 1:37 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Jul 14, 2020 at 1:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Jul 13, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > I think we can do more than this by
> > > parallelizing the Insert part of this query as well as we have lifted
> > > group locking restrictions related to RelationExtension and Page lock
> > > in PG13.  It would be really cool to do that unless we see any
> > > fundamental problems with it.
> >
> > +1, I think it will be cool to support the insert part here, as well
> > as the insert part of 'Create Table As Select ..'.
> >
>
> +1 to parallelize inserts. Currently, ExecInsert() and CTAS use
> table_tuple_insert(). If we parallelize these parts, each worker will
> be inserting its tuples (one tuple at a time) into the same data page
> until its space runs out; once a new data page is obtained by any of
> the workers, the others might start inserting into it. This way, will
> there be lock contention on data pages?
>

It is possible, but we need to check how much of a bottleneck that is,
because it should not be a big part of the operation. And it won't be
any worse than inserts via multiple backends. I think it is important
to do it that way; otherwise, some of the pages can remain half-empty.
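
Once a prototype exists, a rough way to sanity-check that would be to
sample wait events while several sessions insert into the same table
concurrently; this is only an illustrative query, not a claim about
which events will dominate:

-- Sample wait events of active backends while the insert workload runs.
-- Contention on data pages or on relation extension would show up here
-- (buffer content / relation extend waits); the exact event names vary
-- a bit across releases.
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE state = 'active' AND wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY count(*) DESC;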

Right now, the plan for Insert ... Select looks like this:
Insert on <tbl_x>
   ->  Seq Scan on <tbl_y>
         ....

In the above, the scan could be an index scan as well. What we want is:
Gather
  -> Insert on <tbl_x>
       ->  Seq Scan on <tbl_y>
         ....
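
For reference, a minimal script to look at the plan shape (the tables
and data here are made up purely for illustration):

-- Hypothetical tables, only to reproduce the plan shapes above.
CREATE TABLE tbl_y (a int, b text);
CREATE TABLE tbl_x (LIKE tbl_y);
INSERT INTO tbl_y SELECT g, md5(g::text) FROM generate_series(1, 1000000) g;
ANALYZE tbl_y;

-- Today this gives the serial Insert -> Seq Scan plan shown above; with
-- the proposed change, a Gather on top of the Insert could be chosen
-- when the planner estimates it to be cheaper.
EXPLAIN (COSTS OFF) INSERT INTO tbl_x SELECT * FROM tbl_y WHERE a % 10 = 0;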

> Do we also need to make
> inserts use table_multi_insert() (the way COPY does) instead
> of table_tuple_insert()?
>

I am not sure at this stage, but if it turns out to be a big problem,
then we might think of inventing some way to allow individual workers
to operate on different pages. I think even without that we should be
able to make a big gain, as reads, filtering, etc. can be done in
parallel.
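
For comparison, the read side of the same query already parallelizes
today, which is where most of that gain would come from (using the same
hypothetical tables as above; the worker count depends on configuration):

-- The SELECT part on its own typically gets a parallel plan like:
--   Gather
--     Workers Planned: 2
--     ->  Parallel Seq Scan on tbl_y
--           Filter: ((a % 10) = 0)
EXPLAIN (COSTS OFF) SELECT * FROM tbl_y WHERE a % 10 = 0;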

-- 
With Regards,
Amit Kapila.


