Re: Assertion failure in SnapBuildInitialSnapshot()

From: Amit Kapila
Subject: Re: Assertion failure in SnapBuildInitialSnapshot()
Date:
Msg-id: CAA4eK1LryoznTWB_Y7x=oXwpmJ-w4=3JHL8CRAHh0EuJuRUs3A@mail.gmail.com
In reply to: Re: Assertion failure in SnapBuildInitialSnapshot()  (Andres Freund <andres@anarazel.de>)
Responses: Re: Assertion failure in SnapBuildInitialSnapshot()  (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers
On Wed, Feb 8, 2023 at 1:19 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2023-02-01 11:23:57 +0530, Amit Kapila wrote:
> > On Tue, Jan 31, 2023 at 6:08 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > Attached updated patches.
> > >
> >
> > Thanks, Andres, others, do you see a better way to fix this problem? I
> > have reproduced it manually and the steps are shared at [1] and
> > Sawada-San also reproduced it, see [2].
> >
> > [1] - https://www.postgresql.org/message-id/CAA4eK1KDFeh%3DZbvSWPx%3Dir2QOXBxJbH0K8YqifDtG3xJENLR%2Bw%40mail.gmail.com
> > [2] - https://www.postgresql.org/message-id/CAD21AoDKJBB6p4X-%2B057Vz44Xyc-zDFbWJ%2Bg9FL6qAF5PC2iFg%40mail.gmail.com
>
> Hm. It's worrisome to now hold ProcArrayLock exclusively while iterating over
> the slots. ReplicationSlotsComputeRequiredXmin() can be called at a
> non-negligible frequency.  Callers like CreateInitDecodingContext(), that pass
> already_locked=true worry me a lot less, because obviously that's not a very
> frequent operation.
>
> This is particularly not great because we need to acquire
> ReplicationSlotControlLock while already holding ProcArrayLock.
>
>
> But clearly there's a pretty large hole in the lock protection right now. I'm
> a bit confused about why we (Robert and I, or just I) thought it's ok to do it
> this way.
>
>
> I wonder if we could instead invert the locks, and hold
> ReplicationSlotControlLock until after ProcArraySetReplicationSlotXmin(), and
> acquire ProcArrayLock just for ProcArraySetReplicationSlotXmin().
>

Along with inverting the order, doesn't this mean that we need to
acquire ReplicationSlotControlLock in exclusive mode rather than in
shared mode? My understanding of the above locking scheme is as
follows. In CreateInitDecodingContext(), we acquire
ReplicationSlotControlLock in exclusive mode before acquiring
ProcArrayLock in exclusive mode, and release it after releasing
ProcArrayLock. ReplicationSlotsComputeRequiredXmin() then acquires
ReplicationSlotControlLock in exclusive mode only when already_locked
is false, and releases it after the call to
ProcArraySetReplicationSlotXmin(). ProcArraySetReplicationSlotXmin()
itself won't change.

I don't think that just inverting the order without changing the lock
mode will solve the problem, because the apply worker would still be
able to overwrite the replication_slot_xmin value.

-- 
With Regards,
Amit Kapila.


