RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication

Поиск
Список
Период
Сортировка
От Zhijie Hou (Fujitsu)
Тема RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Дата
Msg-id OS0PR01MB5716C00FD3B8A0EFA51800909412A@OS0PR01MB5716.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Melih Mutlu <m.melihmutlu@gmail.com>)
Ответы Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Thursday, August 3, 2023 7:30 PM Melih Mutlu <m.melihmutlu@gmail.com>  wrote:

> Right. I attached the v26 as you asked. 

Thanks for posting the patches.
 
While reviewing the patch, I noticed one rare case that it's possible that there
are two table sync worker for the same table in the same time.

The patch relies on LogicalRepWorkerLock to prevent concurrent access, but the
apply worker will start a new worker after releasing the lock. So, at the point[1]
where the lock is released and the new table sync worker has not been started,
it seems possible that another old table sync worker will be reused for the
same table.

                /* Now safe to release the LWLock */
                LWLockRelease(LogicalRepWorkerLock);
*[1]
                /*
                 * If there are free sync worker slot(s), start a new sync
                 * worker for the table.
                 */
                if (nsyncworkers < max_sync_workers_per_subscription)
                ...
                        logicalrep_worker_launch(MyLogicalRepWorker->dbid,

I can reproduce it by using gdb.

Steps:
1. set max_sync_workers_per_subscription to 1 and setup pub/sub which publishes
   two tables(table A and B).
2. when the table sync worker for the table A started, use gdb to block it
   before being reused for another table.
3. set max_sync_workers_per_subscription to 2 and use gdb to block the apply
   worker at the point after releasing the LogicalRepWorkerLock and before
   starting another table sync worker for table B.
4. release the blocked table sync worker, then we can see the table sync worker
   is also reused for table B.
5. release the apply worker, then we can see the apply worker will start
   another table sync worker for the same table(B).

I think it would be better to prevent this case from happening as this case
will give some unexpected ERROR or LOG. Note that I haven't checked if it would
cause worse problems like duplicate copy or others.

Best Regards,
Hou zj

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Smith
Дата:
Сообщение: Re: Handle infinite recursion in logical replication setup
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?