Re: [BUG] failed assertion in EnsurePortalSnapshotExists()

Поиск
Список
Период
Сортировка
От Ranier Vilela
Тема Re: [BUG] failed assertion in EnsurePortalSnapshotExists()
Дата
Msg-id CAEudQApS-u7PppZtp+3R-DWfT7HRHg=P4Lwy7f5E+p-_q-Jv=w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [BUG] failed assertion in EnsurePortalSnapshotExists()  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
Ответы Re: [BUG] failed assertion in EnsurePortalSnapshotExists()  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
Список pgsql-hackers
Em qua., 29 de set. de 2021 às 06:55, Drouvot, Bertrand <bdrouvot@amazon.com> escreveu:
Hi,

On 9/28/21 6:50 PM, Tom Lane wrote:
> "Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
>> Does it make sense (as it is currently) to set the ActiveSnapshot to
>> NULL and not ensuring the same is done for ActivePortal->portalSnapshot?
> I think that patch is just a kluge :-(
right, sounded too simple to be "true".
>
> After tracing through this I've figured out what is happening, and
> why you need to put the exception block inside a loop to make it
> happen.  The first iteration goes fine, and we end up with an empty
> ActiveSnapshot stack (and no portalSnapshots) because that's how
> the COMMIT leaves it.  In the second iteration, we try to
> re-establish the portal snapshot, but at the point where
> EnsurePortalSnapshotExists is called (from the first INSERT
> command) we are already inside a subtransaction thanks to the
> plpgsql exception block.  This means that the stacked ActiveSnapshot
> has as_level = 2, although the Portal owning it belongs to the
> outer transaction.  So at the next exception, AtSubAbort_Snapshot
> zaps the stacked ActiveSnapshot, but the Portal stays alive and
> now it has a dangling portalSnapshot pointer.

Thanks for the explanation!

> Basically there seem to be two ways to fix this, both requiring
> API additions to snapmgr.c (hence, cc'ing Alvaro for opinion):
>
> 1. Provide a variant of PushActiveSnapshot that allows the caller
> to specify the as_level to use, and then have
> EnsurePortalSnapshotExists, as well as other places that create
> Portal-associated snapshots, use this with as_level equal to the
> Portal's createSubid.  The main hazard I see here is that this could
> theoretically allow the ActiveSnapshot stack to end up with
> out-of-order as_level values, causing issues.  For the moment we
> could probably just add assertions to check that the passed as_level
> is >= next level, or maybe even that this variant is only called with
> empty ActiveSnapshot stack.
I think we may get a non empty ActiveSnapshot stack with prepared
statements, so tempted to do the assertion on the levels.
>
> 2. Provide a way for AtSubAbort_Portals to detect whether a
> portalSnapshot pointer points to a snap that's going to be
> deleted by AtSubAbort_Snapshot, and then just have it clear any
> portalSnapshots that are about to become dangling.  (This'd amount
> to accepting the possibility that portalSnapshot is of a different
> subxact level from the portal, and dealing with the situation.)
>
> I initially thought #2 would be the way to go, but it turns out
> to be a bit messy since what we have is a Snapshot pointer not an
> ActiveSnapshotElt pointer.  We'd have to do something like search the
> ActiveSnapshot stack looking for pointer equality to the caller's
> Snapshot pointer, which seems fragile --- do we assume as_snap is
> unique for any other purpose?
>
> That being the case, I'm now leaning to #1.  Thoughts?

I'm also inclined to #1.
I have a stupid question, why duplicate PushActiveSnapshot?
Wouldn't one function be better?

PushActiveSnapshot(Snapshot snap, int as_level);

Sample calls:
PushActiveSnapshot(GetTransactionSnapshot(), GetCurrentTransactionNestLevel());
PushActiveSnapshot(queryDesc->snapshot, GetCurrentTransactionNestLevel());
PushActiveSnapshot(GetTransactionSnapshot(), portal->createSubid);

regards,
Ranier Vilela

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Failed transaction statistics to measure the logical replication progress
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: POC: Cleaning up orphaned files using undo logs