RE: Assertion failure in SnapBuildInitialSnapshot()

From Hayato Kuroda (Fujitsu)
Subject RE: Assertion failure in SnapBuildInitialSnapshot()
Date
Msg-id TYAPR01MB586690B7DE22E362A4A65F8CF5CD9@TYAPR01MB5866.jpnprd01.prod.outlook.com
In reply to Re: Assertion failure in SnapBuildInitialSnapshot()  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Assertion failure in SnapBuildInitialSnapshot()  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Dear Amit, Sawada-san,

I have also reproduced the failure on PG15 with some debug logging added, and I agree that
some process is resetting procArray->replication_slot_xmin to InvalidTransactionId.

> > The same assertion failure has been reported on another thread[1].
> > Since I could reproduce this issue several times in my environment
> > I've investigated the root cause.
> >
> > I think there is a race condition of updating
> > procArray->replication_slot_xmin by CreateInitDecodingContext() and
> > LogicalConfirmReceivedLocation().
> >
> > What I observed in the test was that a walsender process called:
> > SnapBuildProcessRunningXacts()
> >   LogicalIncreaseXminForSlot()
> >     LogicalConfirmReceivedLocation()
> >       ReplicationSlotsComputeRequiredXmin(false).
> >
> > In ReplicationSlotsComputeRequiredXmin() it acquired the
> > ReplicationSlotControlLock and got 0 as the minimum xmin since there
> > was no wal sender having effective_xmin.
> >
> 
> What about the current walsender process which is processing
> running_xacts via SnapBuildProcessRunningXacts()? Isn't that walsender
> slot's effective_xmin have a non-zero value? If not, then why?

Normal walsenders, i.e. those not used for tablesync, create their replication slot with
the NOEXPORT_SNAPSHOT option. I think that in this case CreateInitDecodingContext() is
called with need_full_snapshot = false, so slot->effective_xmin is not updated.
It is set to InvalidTransactionId in ReplicationSlotCreate(), and no function updates it
afterwards. Hence the slot acquired by the walsender may have an invalid effective_xmin.
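
To illustrate the point above, here is a minimal, self-contained sketch. It is a
simplified model and not the PostgreSQL source: ModelSlot and the model_* functions
are hypothetical names, the xid comparison ignores wraparound, and all locking is
omitted. It only shows that a slot initialized with need_full_snapshot = false
keeps an invalid effective_xmin, so the computed minimum xmin can be 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical, stripped-down stand-in for a replication slot. */
typedef struct ModelSlot
{
    bool          in_use;
    TransactionId effective_xmin;         /* data horizon */
    TransactionId effective_catalog_xmin; /* catalog horizon */
} ModelSlot;

/* Models slot creation: both horizons start out invalid. */
static ModelSlot
model_slot_create(void)
{
    ModelSlot slot = { true, InvalidTransactionId, InvalidTransactionId };

    return slot;
}

/*
 * Models the initial decoding setup as described in this mail: the catalog
 * horizon is always set, the data horizon only for a full snapshot.
 */
static void
model_init_decoding(ModelSlot *slot, TransactionId xmin_horizon,
                    bool need_full_snapshot)
{
    assert(slot != NULL);

    slot->effective_catalog_xmin = xmin_horizon;
    if (need_full_snapshot)
        slot->effective_xmin = xmin_horizon;
}

/*
 * Returns the smallest valid effective_xmin over all in-use slots, or
 * InvalidTransactionId if no slot has one. Assumption: plain "<" is used
 * here, without any wraparound-aware comparison.
 */
static TransactionId
model_compute_required_xmin(const ModelSlot *slots, size_t nslots)
{
    TransactionId agg_xmin = InvalidTransactionId;

    for (size_t i = 0; i < nslots; i++)
    {
        TransactionId xmin = slots[i].effective_xmin;

        if (!slots[i].in_use || xmin == InvalidTransactionId)
            continue;
        if (agg_xmin == InvalidTransactionId || xmin < agg_xmin)
            agg_xmin = xmin;
    }
    return agg_xmin;
}
```

In this model, if every in-use slot was set up with need_full_snapshot = false,
model_compute_required_xmin() returns InvalidTransactionId. It returns a valid
xid only once some slot has been set up with need_full_snapshot = true.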

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

