Re: [BUG] failed assertion in EnsurePortalSnapshotExists()

Поиск
Список
Период
Сортировка
От Drouvot, Bertrand
Тема Re: [BUG] failed assertion in EnsurePortalSnapshotExists()
Дата
Msg-id 71af37bc-fc25-ed86-acf2-176d153ffb9b@amazon.com
обсуждение исходный текст
Ответ на Re: [BUG] failed assertion in EnsurePortalSnapshotExists()  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [BUG] failed assertion in EnsurePortalSnapshotExists()  (Ranier Vilela <ranier.vf@gmail.com>)
Список pgsql-hackers
Hi,

On 9/28/21 6:50 PM, Tom Lane wrote:
> "Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
>> Does it make sense (as it is currently) to set the ActiveSnapshot to
>> NULL and not ensuring the same is done for ActivePortal->portalSnapshot?
> I think that patch is just a kluge :-(
right, sounded too simple to be "true".
>
> After tracing through this I've figured out what is happening, and
> why you need to put the exception block inside a loop to make it
> happen.  The first iteration goes fine, and we end up with an empty
> ActiveSnapshot stack (and no portalSnapshots) because that's how
> the COMMIT leaves it.  In the second iteration, we try to
> re-establish the portal snapshot, but at the point where
> EnsurePortalSnapshotExists is called (from the first INSERT
> command) we are already inside a subtransaction thanks to the
> plpgsql exception block.  This means that the stacked ActiveSnapshot
> has as_level = 2, although the Portal owning it belongs to the
> outer transaction.  So at the next exception, AtSubAbort_Snapshot
> zaps the stacked ActiveSnapshot, but the Portal stays alive and
> now it has a dangling portalSnapshot pointer.

Thanks for the explanation!

> Basically there seem to be two ways to fix this, both requiring
> API additions to snapmgr.c (hence, cc'ing Alvaro for opinion):
>
> 1. Provide a variant of PushActiveSnapshot that allows the caller
> to specify the as_level to use, and then have
> EnsurePortalSnapshotExists, as well as other places that create
> Portal-associated snapshots, use this with as_level equal to the
> Portal's createSubid.  The main hazard I see here is that this could
> theoretically allow the ActiveSnapshot stack to end up with
> out-of-order as_level values, causing issues.  For the moment we
> could probably just add assertions to check that the passed as_level
> is >= next level, or maybe even that this variant is only called with
> empty ActiveSnapshot stack.
I think we may get a non empty ActiveSnapshot stack with prepared 
statements, so tempted to do the assertion on the levels.
>
> 2. Provide a way for AtSubAbort_Portals to detect whether a
> portalSnapshot pointer points to a snap that's going to be
> deleted by AtSubAbort_Snapshot, and then just have it clear any
> portalSnapshots that are about to become dangling.  (This'd amount
> to accepting the possibility that portalSnapshot is of a different
> subxact level from the portal, and dealing with the situation.)
>
> I initially thought #2 would be the way to go, but it turns out
> to be a bit messy since what we have is a Snapshot pointer not an
> ActiveSnapshotElt pointer.  We'd have to do something like search the
> ActiveSnapshot stack looking for pointer equality to the caller's
> Snapshot pointer, which seems fragile --- do we assume as_snap is
> unique for any other purpose?
>
> That being the case, I'm now leaning to #1.  Thoughts?

I'm also inclined to #1.

Please find attached a patch proposal for #1 that also adds a new test.

Thanks

Bertrand

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Added schema level support for publication.
Следующее
От: Juan José Santamaría Flecha
Дата:
Сообщение: Re: stat() vs ERROR_DELETE_PENDING, round N + 1