Re: 024_add_drop_pub.pl might fail due to deadlock
From | vignesh C |
---|---|
Subject | Re: 024_add_drop_pub.pl might fail due to deadlock |
Date | |
Msg-id | CALDaNm14FkrASB8jj27k6MSgrDpOJSZpVv=y=BHvhAoz5B7rNw@mail.gmail.com |
In reply to | Re: 024_add_drop_pub.pl might fail due to deadlock (Ajin Cherian <itsajin@gmail.com>) |
List | pgsql-hackers |
On Mon, 14 Jul 2025 at 15:46, Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Tue, Jul 8, 2025 at 8:41 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Patch with fix attached.
> > I'll continue investigating whether this issue also affects HEAD.
> >
>
> While debugging whether this problem can occur on HEAD, I found that on
> HEAD it is mostly the tablesync worker that drops the origin, and
> since the tablesync worker does not attempt to update the
> SubscriptionRel state in that process, there doesn't seem to be any
> possibility of a deadlock. But there is a rare situation where the
> tablesync worker could crash or hit an error just before dropping
> the origin; the origin is then dropped by the apply worker (this is
> explained in the comments in process_syncing_tables_for_sync()). If
> the origin has to be dropped in the apply worker, then the same
> deadlock can happen in HEAD code as well. I was able to simulate this
> by using an injection point to raise an error in the tablesync worker,
> and the same deadlock then happens on HEAD as well. Attaching a
> patch that fixes this on HEAD as well.

I was able to reproduce the deadlock on HEAD as well, using the attached
patch, which adds a sleep of a few seconds in the tablesync worker just
before it drops the replication origin. During this delay, the apply
worker also attempts to drop the replication origin. If an ALTER
SUBSCRIPTION command is executed concurrently, a deadlock frequently
occurs:

2025-07-14 15:59:53.572 IST [141100] DETAIL: Process 141100 waits for AccessExclusiveLock on object 2 of class 6000 of database 0; blocked by process 140974.
        Process 140974 waits for AccessShareLock on object 16396 of class 6100 of database 0; blocked by process 141100.
        Process 141100: alter subscription sub1 drop publication pub1
        Process 140974: <command string not enabled>

After applying the attached patch, create a logical replication setup
with a publication pub1 containing table t1, and then run the
following commands in a loop:

    alter subscription sub1 drop publication pub1;
    alter subscription sub1 add publication pub1;
    sleep 4

Regards,
Vignesh
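The reproduction loop described in the message could be scripted roughly as follows. This is a minimal sketch, not part of the attached patch. It assumes the delay patch is applied, psql reaches the subscriber through its default connection settings (PGHOST, PGPORT, etc.), and sub1/pub1 exist as described. It also assumes sub1 subscribes to at least one other publication, since, as far as I know, ALTER SUBSCRIPTION refuses to drop a subscription's last publication. The function name run_repro_loop and the PSQL and PAUSE variables are hypothetical, for illustration only.

```shell
# Sketch of the reproduction loop (hypothetical helper, not from the patch).
# Assumptions: the delay patch is applied on the subscriber, psql reaches
# the subscriber with default connection settings, and sub1 still has
# another publication besides pub1 (so dropping pub1 is allowed).
# run_repro_loop, PSQL and PAUSE are illustrative names only.

# run_repro_loop ITERATIONS
#   Repeatedly drops and re-adds pub1 on sub1, pausing between rounds.
#   Returns non-zero at the first failing psql command, e.g. when the
#   ALTER SUBSCRIPTION session itself reports "deadlock detected".
run_repro_loop() {
    iterations=$1
    psql_cmd=${PSQL:-psql}   # psql binary to use; override via PSQL
    pause=${PAUSE:-4}        # seconds between rounds, as in "sleep 4"
    i=0
    while [ "$i" -lt "$iterations" ]; do
        "$psql_cmd" -X -c "ALTER SUBSCRIPTION sub1 DROP PUBLICATION pub1" ||
            { echo "iteration $i: DROP PUBLICATION failed" >&2; return 1; }
        "$psql_cmd" -X -c "ALTER SUBSCRIPTION sub1 ADD PUBLICATION pub1" ||
            { echo "iteration $i: ADD PUBLICATION failed" >&2; return 1; }
        sleep "$pause"
        i=$((i + 1))
    done
}
```

For example, source the file and run `run_repro_loop 100`, then search the subscriber's server log for "deadlock detected". Each statement is sent in its own `psql -c` call so that it runs outside an explicit transaction block. A deadlock reported to a worker rather than to the ALTER SUBSCRIPTION session would only show up in the log, not as a loop failure.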
Attachments