RE: Perform streaming logical transactions by background workers and parallel apply

From: houzj.fnst@fujitsu.com
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Msg-id: OS0PR01MB5716511AC4851592868E2D0594979@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to: Re: Perform streaming logical transactions by background workers and parallel apply  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses: RE: Perform streaming logical transactions by background workers and parallel apply  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List: pgsql-hackers
On Tuesday, July 26, 2022 5:34 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Tue, Jul 26, 2022 at 2:30 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Jul 22, 2022 at 8:27 AM wangw.fnst@fujitsu.com
> > <wangw.fnst@fujitsu.com> wrote:
> > >
> > > On Tue, Jul 19, 2022 at 10:29 AM I wrote:
> > > > Attach the new patches.
> > >
> > > Not able to apply the patches cleanly because of a change in HEAD
> > > (366283961a).  Therefore, I rebased the patch based on the changes in
> > > HEAD.
> > >
> > > Attach the new patches.
> >
> > +    /* Check the foreign keys. */
> > +    fkeys = RelationGetFKeyList(entry->localrel);
> > +    if (fkeys)
> > +        entry->parallel_apply = PARALLEL_APPLY_UNSAFE;
> >
> > So if there is a foreign key on any of the tables which are part of a
> > subscription, then we do not allow changes for that subscription to be
> > applied in parallel?  I think this is a big limitation, because having
> > a foreign key on a table is very normal, right?  I agree that if we
> > allow them then there could be failures due to out-of-order apply,
> > but IMHO we should not add the restriction; instead, let it fail
> > if there is ever such a conflict, because on conflict the
> > transaction will be sent again.  Do we see that there could be wrong
> > or inconsistent results if we allow such things to be executed in
> > parallel?  If not, then IMHO just to avoid some corner-case failures
> > we are restricting very normal cases.
> 
> some more comments..
> 1.
> +            /*
> +             * If we have found a free worker or if we are already
> applying this
> +             * transaction in an apply background worker, then we
> pass the data to
> +             * that worker.
> +             */
> +            if (first_segment)
> +                apply_bgworker_send_data(stream_apply_worker, s->len,
> + s->data);
> 
> The comment says that if we have found a free worker, or if we are already
> applying in a worker, then we pass the changes to the worker; but as per the
> code here we only pass them in the first_segment case?
> 
> I think what you are trying to say is that if it is first segment then send the
> 
> 2.
> +        /*
> +         * This is the main apply worker. Check if there is any free apply
> +         * background worker we can use to process this transaction.
> +         */
> +        if (first_segment)
> +            stream_apply_worker = apply_bgworker_start(stream_xid);
> +        else
> +            stream_apply_worker = apply_bgworker_find(stream_xid);
> 
> So currently, whenever we get a new streamed transaction we try to start a
> new background worker for it.  Why do we need to start/stop the background
> apply worker every time we get a new streamed transaction?  I mean, we could
> keep the worker in a pool for the time being, and if a new transaction is
> looking for a worker then we can find one there.  Starting a worker is a
> costly operation, and since we are using parallelism here we expect frequent
> streamed transactions needing a parallel apply worker, so why not let a
> worker wait for a certain amount of time: if the load is low it will stop
> anyway, and if the load is high it will be reused for the next streamed
> transaction.

It seems the function name was a bit misleading. Currently, the started apply
bgworker does not exit after applying the transaction, and
apply_bgworker_start first tries to choose a free worker; it starts a new
worker only if no free worker is available.
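To make the reuse behavior concrete, here is a minimal, self-contained C sketch of the free-worker-first lookup described above. The names mirror the functions quoted in this thread (apply_bgworker_start, apply_bgworker_find), but the bodies, the ApplyBgworker struct, the fixed-size pool, and apply_bgworker_free are simplified stand-ins for illustration, not the actual patch code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId 0
#define MAX_APPLY_BGWORKERS 4          /* illustrative pool size */

typedef struct ApplyBgworker
{
    bool          launched;            /* has a backend been started for this slot? */
    TransactionId xid;                 /* xid being applied, or InvalidTransactionId */
} ApplyBgworker;

static ApplyBgworker pool[MAX_APPLY_BGWORKERS];

/* Find the worker already applying the given streamed transaction. */
static ApplyBgworker *
apply_bgworker_find(TransactionId xid)
{
    for (int i = 0; i < MAX_APPLY_BGWORKERS; i++)
        if (pool[i].launched && pool[i].xid == xid)
            return &pool[i];
    return NULL;
}

/*
 * First try to reuse an already-launched idle worker; only launch a new
 * one when none is free, since starting a worker is a costly operation.
 */
static ApplyBgworker *
apply_bgworker_start(TransactionId xid)
{
    /* Prefer an already-launched idle worker. */
    for (int i = 0; i < MAX_APPLY_BGWORKERS; i++)
        if (pool[i].launched && pool[i].xid == InvalidTransactionId)
        {
            pool[i].xid = xid;
            return &pool[i];
        }

    /* Otherwise launch a new worker if a slot is available. */
    for (int i = 0; i < MAX_APPLY_BGWORKERS; i++)
        if (!pool[i].launched)
        {
            pool[i].launched = true;   /* stand-in for the real worker launch */
            pool[i].xid = xid;
            return &pool[i];
        }

    return NULL;                       /* pool exhausted: caller applies serially */
}

/* Mark a worker idle again once its transaction commits or aborts. */
static void
apply_bgworker_free(ApplyBgworker *w)
{
    w->xid = InvalidTransactionId;
}
```

With this shape, a second streamed transaction arriving after the first finishes reuses the idle slot rather than launching a fresh backend, which is the point being made about avoiding repeated worker startup cost.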

> 3.
> Why are we restricting parallel apply workers to only the streamed
> transactions?  Streaming depends upon the size of logical_decoding_work_mem,
> so making streaming and parallel apply tightly coupled seems too
> restrictive to me.  Do we see some obvious problems in applying other
> transactions in parallel?

We thought there could be some conflict failures and deadlocks if we
parallel-apply normal transactions, which would need a transaction dependency
check[1]. But I will do some more research on this and share the result soon.

[1] https://www.postgresql.org/message-id/CAA4eK1%2BwyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw%40mail.gmail.com
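To illustrate the kind of dependency check being alluded to, here is a minimal C sketch (all names hypothetical, not from the patch): a transaction whose changes depend on rows written by an earlier, still-uncommitted transaction (e.g. a child row referencing a parent row via foreign key) cannot safely be handed to a second apply worker until its dependency commits.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical bookkeeping for in-flight transactions on the subscriber. */
typedef struct Txn
{
    TransactionId xid;
    TransactionId depends_on;   /* 0 = no dependency on an in-flight txn */
    bool          committed;
} Txn;

/*
 * A transaction may be dispatched to a parallel apply worker only when
 * every transaction it depends on has already committed; otherwise the
 * apply could fail (FK violation) or deadlock on the dependent rows.
 */
static bool
can_apply_in_parallel(const Txn *txn, const Txn *all, int n)
{
    if (txn->depends_on == 0)
        return true;
    for (int i = 0; i < n; i++)
        if (all[i].xid == txn->depends_on)
            return all[i].committed;
    return true;                /* dependency not in flight: safe */
}
```

The sketch only captures the ordering constraint itself; detecting the dependencies (and doing so cheaply enough) is the open question being deferred to further research above.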

Best regards,
Hou zj

In the pgsql-hackers list, by sending date:

Previous
From: Dilip Kumar
Message: Re: making relfilenodes 56 bits
Next
From: Matthias van de Meent
Message: Re: [PATCH] Compression dictionaries for JSONB