Re: BUG #17072: Assert for clogGroupNext failed due to a race condition in TransactionGroupUpdateXidStatus()

From Amit Kapila
Subject Re: BUG #17072: Assert for clogGroupNext failed due to a race condition in TransactionGroupUpdateXidStatus()
Date
Msg-id CAA4eK1+yUrD=xxxRQWiH_dFo8go_W-R-C3FsbvwkPnMtdKe74A@mail.gmail.com
In response to Re: BUG #17072: Assert for clogGroupNext failed due to a race condition in TransactionGroupUpdateXidStatus()  (Alexander Lakhin <exclusion@gmail.com>)
Responses Re: BUG #17072: Assert for clogGroupNext failed due to a race condition in TransactionGroupUpdateXidStatus()
List pgsql-bugs
On Fri, Jun 25, 2021 at 4:30 PM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> Hello Amit,
> 25.06.2021 12:55, Amit Kapila wrote:
> > On Fri, Jun 25, 2021 at 12:20 AM PG Bug reporting form
> > <noreply@postgresql.org> wrote:
> >> The offending process (the one that left a "valid" clogGroupNext) is
> >> 60d48c2d.ea21. It looks like it got from
> >> pg_atomic_compare_exchange_u32() the nextidx value that was written to
> >> clogGroupFirst by the process 60d48c2e.ebc5, and exited just after that.
> >>
> > Your analysis seems to be going in the right direction. Can you try
> > setting clogGroupNext to INVALID_PGPROCNO
> > (pg_atomic_write_u32(&proc->clogGroupNext, INVALID_PGPROCNO);) before
> > we return false in the first while(true) loop in function
> > TransactionGroupUpdateXidStatus()?
> With this modification that assert is not triggered, all 100 iterations
> pass fine (triple checked).
>

Okay, please find the patch for this attached.

> > I think this should be reproducible on all branches from HEAD back to
> > v11. Have you tried any other branch? I'll also try to reproduce
> > it.
> I've reproduced it on REL_11_STABLE, REL_12_STABLE, REL_13_STABLE, and
> master.
>

Could you please verify whether the attached patch fixes it on all the
branches? I have also reproduced it in a slightly different way, using a
debugger with three sessions trying to commit at the same time. After
the first session becomes the first group member, let the second
session check whether it can become a member, and stop it in the
debugger just before it does. Then let the first session complete its
transaction and let the third session become the group leader (the
first group member). When the second session then tries to become a
member, it notices that the leader has changed and tries again to
join the new leader's group. At that point I used the debugger to
force the second session to return false and perform the commit by
itself. Finally, disconnect and reconnect in the second session, and
the assertion failure you reported appears. The attached patch fixes
the assertion failure.

-- 
With Regards,
Amit Kapila.
