Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
От | Masahiko Sawada |
---|---|
Тема | Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION |
Дата | |
Msg-id | CAD21AoAHMoTL_9CEtKKne_AqoOF304k5mR7PpyxghrnGeGTH_A@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION (Fujii Masao <masao.fujii@gmail.com>) |
Ответы |
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
|
Список | pgsql-hackers |
On Wed, Feb 8, 2017 at 12:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Tue, Feb 7, 2017 at 2:13 AM, Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: >> On 06/02/17 17:33, Fujii Masao wrote: >>> On Sun, Feb 5, 2017 at 5:11 AM, Petr Jelinek >>> <petr.jelinek@2ndquadrant.com> wrote: >>>> On 03/02/17 19:38, Fujii Masao wrote: >>>>> On Sat, Feb 4, 2017 at 12:49 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>>> On Fri, Feb 3, 2017 at 10:56 AM, Kyotaro HORIGUCHI >>>>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>>>>>> At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in <CAHGQGwHqQVHmQ7wM=eLNnp1_oy-GVSSAcaJXWjE4nc2twSqXRg@mail.gmail.com> >>>>>>>> On Thu, Feb 2, 2017 at 2:36 PM, Michael Paquier >>>>>>>> <michael.paquier@gmail.com> wrote: >>>>>>>>> On Thu, Feb 2, 2017 at 2:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>>>>>>> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: >>>>>>>>>>> Then, the reason for the TRY-CATCH cluase is that I found that >>>>>>>>>>> some functions called from there can throw exceptions. >>>>>>>>>> >>>>>>>>>> Yes, but all LWLocks should be released by normal error recovery. >>>>>>>>>> It should not be necessary for this code to clean that up by hand. >>>>>>>>>> If it were necessary, there would be TRY-CATCH around every single >>>>>>>>>> LWLockAcquire in the backend, and we'd have an unreadable and >>>>>>>>>> unmaintainable system. Please don't add a TRY-CATCH unless it's >>>>>>>>>> *necessary* -- and you haven't explained why this one is. >>>>>>>> >>>>>>>> Yes. >>>>>>> >>>>>>> Thank you for the suggestion. I minunderstood that. >>>>>>> >>>>>>>>> Putting hands into the code and at the problem, I can see that >>>>>>>>> dropping a subscription on a node makes it unresponsive in case of a >>>>>>>>> stop. And that's just because calls to LWLockRelease are missing as in >>>>>>>>> the patch attached. A try/catch problem should not be necessary. >>>>>>>> >>>>>>>> Thanks for the patch! >>>>>>>> >>>>>>>> With the patch, LogicalRepLauncherLock is released at the end of >>>>>>>> DropSubscription(). But ISTM that the lock should be released just after >>>>>>>> logicalrep_worker_stop() and there is no need to protect the removal of >>>>>>>> replication slot with the lock. >>>>>>> >>>>>>> That's true. logicalrep_worker_stop returns after confirmig that >>>>>>> worker->proc is cleard, so no false relaunch cannot be caused. >>>>>>> After all, logicalrep_worker_stop is surrounded by >>>>>>> LWLockAcquire/Relase pair. So it can be moved into the funciton >>>>>>> and make the lock secrion to be more narrower. >>>>> >>>>> If we do this, Assert(LWLockHeldByMe(LogicalRepLauncherLock)) should be >>>>> removed and the comment for logicalrep_worker_stop() should be updated. >>>>> >>>>> Your approach may cause the deadlock. The launcher takes LogicalRepWorkerLock >>>>> while holding LogicalRepLauncherLock. OTOH, with your approach, >>>>> logicalrep_worker_stop() takes LogicalRepLauncherLock while holding >>>>> LogicalRepWorkerLock. >>>>> >>>>> Therefore I pushed the simple patch which adds LWLockRelease() just after >>>>> logicalrep_worker_stop(). >>>>> >>>>> Another problem that I found while reading the code is that the launcher can >>>>> start up the worker with the subscription that DROP SUBSCRIPTION just removed. >>>>> That is, DROP SUBSCRIPTION removes the target entry from pg_subscription, >>>>> but the launcher can see it and start new worker until the transaction for >>>>> DROP has been committed. >>>>> >>>> >>>> That was the reason why DropSubscription didn't release the lock in the >>>> first place. It was supposed to be released at the end of the >>>> transaction though. >>> >>> OK, I understood why you used the lock in that way. But using LWLock >>> for that purpose is not valid. >>> >> >> Yeah, I just tried to avoid what we are doing now really hard :) >> >>>>> To fix this issue, I think that DROP SUBSCRIPTION should take >>>>> AccessExclusiveLock on pg_subscription, instead of RowExclusiveLock, >>>>> so that the launcher cannot see the entry to be being removed. >>>>> >>>> >>>> The whole point of having LogicalRepLauncherLock is to avoid having to >>>> do this, so if we do this we could probably get rid of it. >>> >>> Yes, let's remove LogicalRepLauncherLock and lock pg_subscription >>> with AccessExclusive mode at the beginning of DROP SUBSCRIPTION. >>> Attached patch does this. >>> >> >> Okay, looks reasonable to me. > > Thanks for the review! > But ISMT that I should suspend committing the patch until we fix the issue > that Sawada reported in other thread. That bugfix may change the related > code and design very much. > https://www.postgresql.org/message-id/CAD21AoD+VO93zZ4ZQtZQb-jZ_wMko3OgGdx1MXO4T+8q_zHDDA@mail.gmail.com > That patch has been committed. And this issue still happens. Should we add this to the open item list so it doesn't get missed? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Amit KapilaДата:
Сообщение: Re: [HACKERS] Parallel seq. plan is not coming against inheritance orpartition table