Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Дата
Msg-id CAA4eK1+BmkT3jjUnejOqNFW_78D5PiRHNNAwLmfBSb7cj8QSrg@mail.gmail.com
обсуждение исходный текст
Ответ на RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Ответы Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (vignesh C <vignesh21@gmail.com>)
Список pgsql-hackers
On Wed, Aug 9, 2023 at 8:28 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Thursday, August 3, 2023 7:30 PM Melih Mutlu <m.melihmutlu@gmail.com>  wrote:
>
> > Right. I attached the v26 as you asked.
>
> Thanks for posting the patches.
>
> While reviewing the patch, I noticed one rare case that it's possible that there
> are two table sync worker for the same table in the same time.
>
> The patch relies on LogicalRepWorkerLock to prevent concurrent access, but the
> apply worker will start a new worker after releasing the lock. So, at the point[1]
> where the lock is released and the new table sync worker has not been started,
> it seems possible that another old table sync worker will be reused for the
> same table.
>
>                                 /* Now safe to release the LWLock */
>                                 LWLockRelease(LogicalRepWorkerLock);
> *[1]
>                                 /*
>                                  * If there are free sync worker slot(s), start a new sync
>                                  * worker for the table.
>                                  */
>                                 if (nsyncworkers < max_sync_workers_per_subscription)
>                                 ...
>                                                 logicalrep_worker_launch(MyLogicalRepWorker->dbid,
>

Yeah, this is a problem. I think one idea to solve this is by
extending the lock duration till we launch the tablesync worker but we
should also consider changing this locking scheme such that there is a
better way to indicate that for a particular rel, tablesync is in
progress. Currently, the code in TablesyncWorkerMain() also acquires
the lock in exclusive mode even though the tablesync for a rel is in
progress which I guess could easily heart us for larger values of
max_logical_replication_workers. So, that could be another motivation
to think for a different locking scheme.

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Incorrect handling of OOM in WAL replay leading to data loss
Следующее
От: Pavel Luzanov
Дата:
Сообщение: Re: PG 16 draft release notes ready