Re: AtEOXact_Snapshot timing

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: AtEOXact_Snapshot timing
Дата
Msg-id CA+TgmobOry9CwwNcrO61uhYod6j8MgVueuRX1tkbryrPTytfHA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: AtEOXact_Snapshot timing  (Andres Freund <andres@anarazel.de>)
Ответы Re: AtEOXact_Snapshot timing  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Thu, Sep 5, 2019 at 5:32 PM Andres Freund <andres@anarazel.de> wrote:
> On 2019-09-05 14:50:50 -0400, Robert Haas wrote:
> > The best way that I've been able to come up with to enforce this rule
> > after a little bit of thought is to add Assert(IsTransactionState())
> > to a bunch of functions in snapmgr.c, most notably
> > GetTransactionSnapshot and GetCatalogSnapshot.
>
> Wonder if there's a risk that callers might still have a snapshot
> around, in independent memory?  It could make sense to add such an
> assert to a visibility routine or two, maybe?
>
> I suspect that we should add non-assert condition to a central place
> like GetSnapshotData(). It's not hard to imagine extensions just using
> that directly, and that we'd never notice that with assert only
> testing. It's also hard to imagine a single if
> (unlikely(IsTransactionState())) to be expensive enough to matter
> compared to GetSnapshotData()'s own cost.

I guess we could do that, but I feel like it might be overkill.  It
seems unlikely to me that an extension would call GetSnapshotData()
directly, but if it does, it's probably some kind of advanced wizardry
and I'm happy to trust that the extension author knows what she's
doing (or she can keep both pieces if it breaks). For my $0.02,
calling GetTransactionSnapshot() in an aborted transaction is the kind
of thing that's much more likely to be an unintentional goof.

> > @@ -2732,6 +2732,18 @@ AbortTransaction(void)
> >               pgstat_report_xact_timestamp(0);
> >       }
> >
> > +     /*
> > +      * Any snapshots taken by this transaction were unsafe to use even at the
> > +      * time when we entered AbortTransaction(), since we might have, for
> > +      * example, inserted a heap tuple and failed while inserting index tuples.
> > +      * They were even more unsafe after ProcArrayEndTransaction(), since after
> > +      * that point tuples we inserted could be pruned by other backends.
> > +      * However, we postpone the cleanup until this point in the sequence
> > +      * because it has to be done after ResourceOwnerRelease() has finishing
> > +      * unregistering snapshots.
> > +      */
> > +     AtEOXact_Snapshot(false, true);
> > +
> >       /*
> >        * State remains TRANS_ABORT until CleanupTransaction().
> >        */
>
> Hm. This means that
>                 if (is_parallel_worker)
>                         CallXactCallbacks(XACT_EVENT_PARALLEL_ABORT);
>                 else
>                         CallXactCallbacks(XACT_EVENT_ABORT);
> which, together with a few of the other functions, could plausibly try
> to use snapshot related logic, may end up continuing to use an existing
> snapshot without us detecting the problem? I think?  Especially with the
> asserts present ISTM that we really should kill the existing snapshots
> directly adjacent to ProcArrayEndTransaction(). As you say, after that
> the snapshots aren't correct anymore.  And with the right assertions we
> should be safe againsts accidental reintroduction of catalog access in
> the following code?

Well, I don't really see how to make those things precisely adjacent
without a lot more rejiggering of the code, and that doesn't seem
worth it to me.  I think the main risk here is that someone tries to
use a snapshot after AbortTransaction() and before
CleanupTransaction(), and that's what I want to try to block.  The
risk that somebody's going to try to use a snapshot within
AbortTransaction() seems lower to me, although obviously not zero.
You can't do anything that will plausibly fail in this code path, and
that's generally a tighter restriction than the one we're trying to
enforce here.

I agree with you that it would be nicer if we could put the killing of
the old snapshots directly adjacent to ProcArrayEndTransaction(), and
that was the first thing I tried, but it doesn't work, because
resource owner cleanup has to run first.  It's possible that we could
split AtEOXact_Snapshot() into two pieces that run at different times,
but I think we'd be guarding against a vulnerability that is mostly
theoretical, and I don't really see the point.  The only kind of
access that somebody's likely to attempt during AbortTransaction() is
catalog access, and that's likely to trip either the assertion I added
in GetCatalogSnapshot() or the existing one in SearchCatCache().
Non-catalog access would probably fail the assertion in
GetTransactionSnapshot() or GetLatestSnapshot().  I guess we should
probably add a similar check in GetActiveSnapshot(); the attached
patch adds that.

I might be missing something here, so feel free to point out to me if
there's a plausible coding pattern I'm missing where the things you
are proposing would actually be a problem.  To me, it just feels like
you're worried about a scenario that would require writing the code in
a very unnatural way, like saving a snapshot in a global variable or
calling GetSnapshotData() directly. The fact that there's no existing
code that does those things should be enough to warn you off of doing
them.  If it's not, I doubt that a mere assertion is going to stand in
your way...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Planning counters in pg_stat_statements (using pgss_store)
Следующее
От: Alvaro Herrera from 2ndQuadrant
Дата:
Сообщение: Re: [HACKERS] CLUSTER command progress monitor