Re: Assertion failure in SnapBuildInitialSnapshot()

From: vignesh C
Subject: Re: Assertion failure in SnapBuildInitialSnapshot()
Msg-id: CALDaNm2sjbP9dNymzA5w=Xk33MDi+Ap7wj=fnktjxYZj3d_dxw@mail.gmail.com
In response to: Re: Assertion failure in SnapBuildInitialSnapshot()  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: Assertion failure in SnapBuildInitialSnapshot()
List: pgsql-hackers

On Thu, 9 Feb 2023 at 12:02, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Feb 8, 2023 at 1:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Feb 8, 2023 at 1:19 AM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > On 2023-02-01 11:23:57 +0530, Amit Kapila wrote:
> > > > On Tue, Jan 31, 2023 at 6:08 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > Attached updated patches.
> > > > >
> > > >
> > > > Thanks, Andres, others, do you see a better way to fix this problem? I
> > > > have reproduced it manually and the steps are shared at [1] and
> > > > Sawada-San also reproduced it, see [2].
> > > >
> > > > [1] -
https://www.postgresql.org/message-id/CAA4eK1KDFeh%3DZbvSWPx%3Dir2QOXBxJbH0K8YqifDtG3xJENLR%2Bw%40mail.gmail.com
> > > > [2] -
https://www.postgresql.org/message-id/CAD21AoDKJBB6p4X-%2B057Vz44Xyc-zDFbWJ%2Bg9FL6qAF5PC2iFg%40mail.gmail.com
> > >
> > > > Hm. It's worrisome to now hold ProcArrayLock exclusively while iterating over
> > > > the slots. ReplicationSlotsComputeRequiredXmin() can be called at a
> > > > non-negligible frequency.  Callers like CreateInitDecodingContext(), that pass
> > > already_locked=true worry me a lot less, because obviously that's not a very
> > > frequent operation.
> > >
> > > This is particularly not great because we need to acquire
> > > ReplicationSlotControlLock while already holding ProcArrayLock.
> > >
> > >
> > > But clearly there's a pretty large hole in the lock protection right now. I'm
> > > a bit confused about why we (Robert and I, or just I) thought it's ok to do it
> > > this way.
> > >
> > >
> > > I wonder if we could instead invert the locks, and hold
> > > ReplicationSlotControlLock until after ProcArraySetReplicationSlotXmin(), and
> > > acquire ProcArrayLock just for ProcArraySetReplicationSlotXmin().
> > >
> >
> > Along with inverting, doesn't this mean that we need to acquire
> > ReplicationSlotControlLock in Exclusive mode instead of acquiring it
> > in shared mode? My understanding of the above locking scheme is that
> > in CreateInitDecodingContext(), we acquire ReplicationSlotControlLock
> > in Exclusive mode before acquiring ProcArrayLock in Exclusive mode and
> > release it after releasing ProcArrayLock. Then,
> > ReplicationSlotsComputeRequiredXmin() acquires
> > ReplicationSlotControlLock in Exclusive mode only when already_locked
> > is false and releases it after a call to
> > ProcArraySetReplicationSlotXmin(). ProcArraySetReplicationSlotXmin()
> > won't change.
>
> I've attached the patch of this idea for discussion. In
> GetOldestSafeDecodingTransactionId() called by
> CreateInitDecodingContext(), we hold ReplicationSlotControlLock,
> ProcArrayLock, and XidGenLock at a time. So we would need to be
> careful about the ordering.
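
For readers following the lock-ordering discussion quoted above, here is a
minimal, self-contained sketch of the inverted scheme as Amit describes it.
It is not the attached patch and not PostgreSQL code: the pthread mutexes
stand in for the two LWLocks (modeling exclusive mode only), and every
toy_* name, the ToySlot struct, the fixed slot array, and the plain "<"
comparison (instead of wraparound-aware xid comparison) are assumptions
made for illustration. The real ProcArraySetReplicationSlotXmin() also
takes a catalog_xmin argument, which is omitted here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define MAX_SLOTS 4

/* Toy stand-ins for ReplicationSlotControlLock and ProcArrayLock. */
static pthread_mutex_t slot_control_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t proc_array_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical, simplified slot: only the fields this sketch needs. */
typedef struct ToySlot
{
    bool          in_use;
    TransactionId xmin;
} ToySlot;

static ToySlot       slots[MAX_SLOTS];
static TransactionId replication_slot_xmin = InvalidTransactionId;

/*
 * Stand-in for ProcArraySetReplicationSlotXmin(): ProcArrayLock is taken
 * only around the store, unless the caller already holds it.
 */
static void
toy_set_replication_slot_xmin(TransactionId xmin, bool already_locked)
{
    if (!already_locked)
        pthread_mutex_lock(&proc_array_lock);
    replication_slot_xmin = xmin;
    if (!already_locked)
        pthread_mutex_unlock(&proc_array_lock);
}

/*
 * Stand-in for ReplicationSlotsComputeRequiredXmin(). The slot control
 * lock is held across both the scan and the store, so the published
 * value cannot be based on a stale view of the slots.
 * already_locked means the caller holds BOTH toy locks.
 */
static void
toy_compute_required_xmin(bool already_locked)
{
    TransactionId agg_xmin = InvalidTransactionId;

    if (!already_locked)
        pthread_mutex_lock(&slot_control_lock);

    for (int i = 0; i < MAX_SLOTS; i++)
    {
        if (!slots[i].in_use || slots[i].xmin == InvalidTransactionId)
            continue;
        /* Plain "<" is a simplification; it ignores xid wraparound. */
        if (agg_xmin == InvalidTransactionId || slots[i].xmin < agg_xmin)
            agg_xmin = slots[i].xmin;
    }

    toy_set_replication_slot_xmin(agg_xmin, already_locked);

    if (!already_locked)
        pthread_mutex_unlock(&slot_control_lock);
}

/*
 * Stand-in for the CreateInitDecodingContext() path: slot control lock
 * first, then ProcArrayLock, released in the reverse order. The xmin
 * argument is hypothetical; this sketch does not model how it is chosen.
 */
static void
toy_create_init_decoding_context(int slot_index, TransactionId xmin)
{
    assert(slot_index >= 0 && slot_index < MAX_SLOTS);

    pthread_mutex_lock(&slot_control_lock);
    pthread_mutex_lock(&proc_array_lock);

    slots[slot_index].in_use = true;
    slots[slot_index].xmin = xmin;
    toy_compute_required_xmin(true);

    pthread_mutex_unlock(&proc_array_lock);
    pthread_mutex_unlock(&slot_control_lock);
}
```

Both paths take the slot control lock before ProcArrayLock, so they agree
on a single ordering. XidGenLock, which Sawada-san mentions as the third
lock held in GetOldestSafeDecodingTransactionId(), is not modeled here.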

I have changed the status of the patch to "Waiting on Author" as
Robert's issues have not been addressed yet. Feel free to change the
status accordingly after addressing them.

Regards,
Vignesh


