Re: ERROR during end-of-xact/FATAL

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Re: ERROR during end-of-xact/FATAL
Дата
Msg-id 20131223195208.GA1446991@tornado.leadboat.com
обсуждение исходный текст
Ответ на Re: ERROR during end-of-xact/FATAL  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Fri, Nov 15, 2013 at 09:51:32AM -0500, Robert Haas wrote:
> On Wed, Nov 13, 2013 at 11:04 AM, Noah Misch <noah@leadboat.com> wrote:
> >> So, in short, ERROR + ERROR*10 = PANIC, but FATAL + ERROR*10 = FATAL.
> >> That's bizarre.
> >
> > Quite so.
> >
> >> Given that that's where we are, promoting an ERROR during FATAL
> >> processing to PANIC doesn't seem like it's losing much; we're
> >> essentially already doing that in the (probably more likely) case of a
> >> persistent ERROR during ERROR processing.  But since PANIC sucks, I'd
> >> rather go the other direction: let's make an ERROR during ERROR
> >> processing promote to FATAL.  And then let's do what you write above:
> >> make sure that there's a separate on-shmem-exit callback for each
> >> critical shared memory resource and that we call of those during FATAL
> >> processing.
> >
> > Many of the factors that can cause AbortTransaction() to fail can also cause
> > CommitTransaction() to fail, and those would still PANIC if the transaction
> > had an xid.  How practical might it be to also escape from an error during
> > CommitTransaction() with a FATAL instead of PANIC?  There's more to fix up in
> > that case (sinval, NOTIFY), but it may be within reach.  If such a technique
> > can only reasonably fix abort, though, I have doubts it buys us enough.
> 
> The critical stuff that's got to happen after
> RecordTransactionCommit() appears to be ProcArrayEndTransaction() and
> AtEOXact_Inval(). Unfortunately, the latter is well after the point
> when we're supposed to only be doing "non-critical resource cleanup",
> nonwithstanding which it appears to be critical.
> 
> So here's a sketch.  Hoist the preparatory logic in
> RecordTransactionCommit() - smgrGetPendingDeletes,
> xactGetCommittedChildren, and xactGetCommittedInvalidationMessages up
> into the caller and do it before setting TRANS_COMMIT.  If any of that
> stuff fails, we'll land in AbortTransaction() which must cope.  As
> soon as we exit the commit critical section, set a flag somewhere
> (where?) indicating that we have written our commit record; when that
> flag is set, (a) promote any ERROR after that point through the end of
> commit cleanup to FATAL and (b) if we enter AbortTransaction(), don't
> try to RecordTransactionAbort().
> 
> I can't see that the notification stuff requires fixup in this case;
> AFAICS, it is just adjusting backend-local state, and it's OK to
> disregard any problems there during a FATAL exit.  Do you see
> something to the contrary?

The part not to miss is SignalBackends() in ProcessCompletedNotifies().  It
happens outside CommitTransaction(), just before the backend goes idle.
Without that, listeners could miss our notification until some other NOTIFY
transaction signals them.  That said, since ProcessCompletedNotifies() already
accepts the possibility of failure, it would not be crazy to overlook this.

> But invalidation messages are a problem:
> if we commit and exit without sending our queued-up invalidation
> messages, Bad Things Will Happen.  Perhaps we could arrange things so
> that in that case only, we just PANIC.   That would allow most write
> transactions to get by with FATAL, promoting to PANIC only in the case
> of transactions that have modified system catalogs and only until the
> invalidations have actually been sent.  Avoiding the PANIC in that
> case seems to require some additional wizardry which is not entirely
> clear to me at this time.

Agreed.

> I think we'll have to approach the various problems in this area
> stepwise, or we'll never make any progress.

That's one option.  The larger point is that allowing ERROR-late-in-COMMIT and
FATAL + ERROR scenarios to get away with FATAL requires this sort of analysis
of every exit callback and all the later stages of CommitTransaction().  The
current bug level shows such analysis hasn't been happening, and the dearth of
bug reports suggests the scenarios are rare.  I don't think the project stands
to gain enough from the changes contemplated here and, perhaps more notably,
from performing such analysis during review of future patches that touch
CommitTransaction() and the exit sequence.  It remains my recommendation to
run proc_exit_prepare() and the post-CLOG actions of CommitTransaction() and
AbortTransaction() in critical sections, giving the scenarios blunt but
reliable treatment.  If reports of excess PANIC arise, the thing to fix is the
code causing the late error.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thom Brown
Дата:
Сообщение: Re: ALTER SYSTEM SET command to change postgresql.conf parameters
Следующее
От: Robert Haas
Дата:
Сообщение: Re: better atomics - v0.2