Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION

Поиск
Список
Период
Сортировка
От Petr Jelinek
Тема Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
Дата
Msg-id dd53e623-901c-3d64-0631-41b619596380@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION  (Fujii Masao <masao.fujii@gmail.com>)
Ответы Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION  (Fujii Masao <masao.fujii@gmail.com>)
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 03/02/17 19:38, Fujii Masao wrote:
> On Sat, Feb 4, 2017 at 12:49 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Fri, Feb 3, 2017 at 10:56 AM, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>> At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwHqQVHmQ7wM=eLNnp1_oy-GVSSAcaJXWjE4nc2twSqXRg@mail.gmail.com>
>>>> On Thu, Feb 2, 2017 at 2:36 PM, Michael Paquier
>>>> <michael.paquier@gmail.com> wrote:
>>>>> On Thu, Feb 2, 2017 at 2:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>>> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
>>>>>>> Then, the reason for the TRY-CATCH cluase is that I found that
>>>>>>> some functions called from there can throw exceptions.
>>>>>>
>>>>>> Yes, but all LWLocks should be released by normal error recovery.
>>>>>> It should not be necessary for this code to clean that up by hand.
>>>>>> If it were necessary, there would be TRY-CATCH around every single
>>>>>> LWLockAcquire in the backend, and we'd have an unreadable and
>>>>>> unmaintainable system.  Please don't add a TRY-CATCH unless it's
>>>>>> *necessary* -- and you haven't explained why this one is.
>>>>
>>>> Yes.
>>>
>>> Thank you for the suggestion. I minunderstood that.
>>>
>>>>> Putting hands into the code and at the problem, I can see that
>>>>> dropping a subscription on a node makes it unresponsive in case of a
>>>>> stop. And that's just because calls to LWLockRelease are missing as in
>>>>> the patch attached. A try/catch problem should not be necessary.
>>>>
>>>> Thanks for the patch!
>>>>
>>>> With the patch, LogicalRepLauncherLock is released at the end of
>>>> DropSubscription(). But ISTM that the lock should be released just after
>>>> logicalrep_worker_stop() and there is no need to protect the removal of
>>>> replication slot with the lock.
>>>
>>> That's true. logicalrep_worker_stop returns after confirmig that
>>> worker->proc is cleard, so no false relaunch cannot be caused.
>>> After all, logicalrep_worker_stop is surrounded by
>>> LWLockAcquire/Relase pair. So it can be moved into the funciton
>>> and make the lock secrion to be more narrower.
> 
> If we do this, Assert(LWLockHeldByMe(LogicalRepLauncherLock)) should be
> removed and the comment for logicalrep_worker_stop() should be updated.
> 
> Your approach may cause the deadlock. The launcher takes LogicalRepWorkerLock
> while holding LogicalRepLauncherLock. OTOH, with your approach,
> logicalrep_worker_stop() takes LogicalRepLauncherLock while holding
> LogicalRepWorkerLock.
> 
> Therefore I pushed the simple patch which adds LWLockRelease() just after
> logicalrep_worker_stop().
> 
> Another problem that I found while reading the code is that the launcher can
> start up the worker with the subscription that DROP SUBSCRIPTION just removed.
> That is, DROP SUBSCRIPTION removes the target entry from pg_subscription,
> but the launcher can see it and start new worker until the transaction for
> DROP has been committed.
> 

That was the reason why DropSubscription didn't release the lock in the
first place. It was supposed to be released at the end of the
transaction though.

> To fix this issue, I think that DROP SUBSCRIPTION should take
> AccessExclusiveLock on pg_subscription, instead of RowExclusiveLock,
> so that the launcher cannot see the entry to be being removed.
> 

The whole point of having LogicalRepLauncherLock is to avoid having to
do this, so if we do this we could probably get rid of it.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fujii Masao
Дата:
Сообщение: Re: [HACKERS] Provide list of subscriptions and publications inpsql's completion
Следующее
От: Petr Jelinek
Дата:
Сообщение: Re: [HACKERS] Logical Replication and Character encoding