Обсуждение: BUG #6218: TRAP: FailedAssertion("!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)

Поиск
Список
Период
Сортировка

BUG #6218: TRAP: FailedAssertion("!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)

От
"YAMAMOTO Takashi"
Дата:
The following bug has been logged online:

Bug reference:      6218
Logged by:          YAMAMOTO Takashi
Email address:      yamt@mwd.biglobe.ne.jp
PostgreSQL version: 9.2
Operating system:   NetBSD
Description:        TRAP: FailedAssertion("!(owner->nsnapshots == 0)", File:
"resowner.c", Line: 365)
Details:

9.2devel (2562dcea811eb642e1c5442e1ede9fe268278157)

ERROR:  could not serialize access due to concurrent update
STATEMENT:  UPDATE file SET atime = current_timestamp WHERE fileid = $1
TRAP: FailedAssertion("!(owner->nsnapshots == 0)", File: "resowner.c", Line:
365)
LOG:  server process (PID 16832) was terminated by signal 6: Abort trap
LOG:  terminating any other active server processes


Core was generated by `postgres'.
Program terminated with signal 6, Aborted.
#0  0xbbb9c547 in _lwp_kill () from /usr/lib/libc.so.12
(gdb) bt
#0  0xbbb9c547 in _lwp_kill () from /usr/lib/libc.so.12
#1  0xbbb9c503 in raise (s=6) at /siro/nbsd/src/lib/libc/gen/raise.c:48
#2  0xbbb9bced in abort () at /siro/nbsd/src/lib/libc/stdlib/abort.c:74
#3  0x08400d8b in ExceptionalCondition (
    conditionName=0x85a28f3 "!(owner->nsnapshots == 0)",
    errorType=0x85a280b "FailedAssertion", fileName=0x85a2800 "resowner.c",
    lineNumber=365) at assert.c:57
#4  0x084272a7 in ResourceOwnerDelete (owner=0x99be23f0) at resowner.c:365
#5  0x080d96f3 in CleanupTransaction () at xact.c:2374
#6  0x080d98d9 in CommitTransactionCommand () at xact.c:2538
#7  0x08318802 in finish_xact_command () at postgres.c:2378
#8  0x08317c19 in exec_execute_message (portal_name=0x99b2f01c "",
    max_rows=2147483647) at postgres.c:1915
#9  0x0831a6cf in PostgresMain (argc=2, argv=0xbb9117b0,
    username=0xbb911704 "takashi") at postgres.c:3909
#10 0x082c47b0 in BackendRun (port=0xbb93d0f0) at postmaster.c:3512
#11 0x082c3e12 in BackendStartup (port=0xbb93d0f0) at postmaster.c:3197
#12 0x082c0e94 in ServerLoop () at postmaster.c:1333
#13 0xbb93d0f0 in ?? ()
#14 0x085f8850 in UnBlockSig ()
#15 0x00000000 in ?? ()
(gdb) fr 4
#4  0x084272a7 in ResourceOwnerDelete (owner=0x99be23f0) at resowner.c:365
365             Assert(owner->nsnapshots == 0);
(gdb) p *owner
$1 = {parent = 0x0, firstchild = 0x99b2c174, nextchild = 0x0,
  name = 0x844bfb2 "TopTransaction", nbuffers = 0, buffers = 0x0,
  maxbuffers = 0, ncatrefs = 0, catrefs = 0x99b85f88, maxcatrefs = 16,
  ncatlistrefs = 0, catlistrefs = 0x0, maxcatlistrefs = 0, nrelrefs = 0,
  relrefs = 0x0, maxrelrefs = 0, nplanrefs = 0, planrefs = 0x0,
  maxplanrefs = 0, ntupdescs = 0, tupdescs = 0x0, maxtupdescs = 0,
  nsnapshots = 1, snapshots = 0xbb9f4fa4, maxsnapshots = 16, nfiles = 0,
  files = 0x0, maxfiles = 0}
(gdb) p *owner->snapshots[0]
$7 = {satisfies = 0x8435138 <HeapTupleSatisfiesMVCC>, xmin = 991, xmax =
991,
  xcnt = 0, xip = 0x0, subxcnt = 0, subxip = 0x0, suboverflowed = 0 '\0',
  takenDuringRecovery = 0 '\0', curcid = 1, active_count = 0, regd_count =
1,
  copied = 1 '\001'}
(gdb) p CurrentTransactionState
$8 = (TransactionState) 0x85bea00
(gdb) p *CurrentTransactionState
$9 = {transactionId = 990, subTransactionId = 1, name = 0x0,
  savepointLevel = 0, state = TRANS_ABORT, blockState = TBLOCK_ABORT_END,
  nestingLevel = 1, gucNestLevel = 1, curTransactionContext = 0xbb912a40,
  curTransactionOwner = 0x99be23f0, childXids = 0x0, nChildXids = 0,
  maxChildXids = 0, prevUser = 10, prevSecContext = 0,
  prevXactReadOnly = 0 '\0', startedInRecovery = 0 '\0', parent = 0x0}
(gdb) p debug_query_string
$12 = 0x99b2f428 "ROLLBACK"
(gdb)

Re: BUG #6218: TRAP: FailedAssertion("!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)

От
Tom Lane
Дата:
"YAMAMOTO Takashi" <yamt@mwd.biglobe.ne.jp> writes:
> 9.2devel (2562dcea811eb642e1c5442e1ede9fe268278157)

> ERROR:  could not serialize access due to concurrent update
> STATEMENT:  UPDATE file SET atime = current_timestamp WHERE fileid = $1
> TRAP: FailedAssertion("!(owner->nsnapshots == 0)", File: "resowner.c", Line:
> 365)
> LOG:  server process (PID 16832) was terminated by signal 6: Abort trap

There isn't terribly much we can do with this report unless you can
provide a complete test case to reproduce it.

            regards, tom lane

Re: BUG #6218: TRAP: FailedAssertion( "!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)

От
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
Дата:
hi,

> "YAMAMOTO Takashi" <yamt@mwd.biglobe.ne.jp> writes:
>> 9.2devel (2562dcea811eb642e1c5442e1ede9fe268278157)
>
>> ERROR:  could not serialize access due to concurrent update
>> STATEMENT:  UPDATE file SET atime = current_timestamp WHERE fileid = $1
>> TRAP: FailedAssertion("!(owner->nsnapshots == 0)", File: "resowner.c", Line:
>> 365)
>> LOG:  server process (PID 16832) was terminated by signal 6: Abort trap
>
> There isn't terribly much we can do with this report unless you can
> provide a complete test case to reproduce it.
>
>             regards, tom lane

after some investigation, i think it is triggered by protocol-level prepare
of ROLLBACK in an aborted transaction.  does the following patch make sense?

YAMAMOTO Takashi

diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 9cccc87..d1da7df 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -520,6 +520,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource)
      * don't have invalidatable plans, so we'd not get here for such a
      * command.
      */
+    Assert(analyze_requires_snapshot(plansource->raw_parse_tree));
     snapshot_set = false;
     if (!ActiveSnapshotSet())
     {
@@ -757,13 +758,11 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,

     /*
      * If a snapshot is already set (the normal case), we can just use
-     * that for parsing/planning.  But if it isn't, install one.  Note: no
-     * point in checking whether parse analysis requires a snapshot;
-     * utility commands don't have invalidatable plans, so we'd not get
-     * here for such a command.
+     * that for parsing/planning.  But if it isn't, install one.
      */
     snapshot_set = false;
-    if (!ActiveSnapshotSet())
+    if (!ActiveSnapshotSet() &&
+        analyze_requires_snapshot(plansource->raw_parse_tree))
     {
         PushActiveSnapshot(GetTransactionSnapshot());
         snapshot_set = true;

Re: BUG #6218: TRAP: FailedAssertion( "!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)

От
Tom Lane
Дата:
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
>> There isn't terribly much we can do with this report unless you can
>> provide a complete test case to reproduce it.

> after some investigation, i think it is triggered by protocol-level prepare
> of ROLLBACK in an aborted transaction.  does the following patch make sense?

Maybe, but I'd still like to see a test case, because I can't reproduce
any such problem by preparing ROLLBACK in an aborted transaction.

            regards, tom lane

Re: BUG #6218: TRAP: FailedAssertion( "!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)

От
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
Дата:
hi,

> yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
>>> There isn't terribly much we can do with this report unless you can
>>> provide a complete test case to reproduce it.
>
>> after some investigation, i think it is triggered by protocol-level prepare
>> of ROLLBACK in an aborted transaction.  does the following patch make sense?
>
> Maybe, but I'd still like to see a test case, because I can't reproduce
> any such problem by preparing ROLLBACK in an aborted transaction.

reading GetTransactionSnapshot, it seems that the problem happens
only with IsolationUsesXactSnapshot() true.
the attached program triggers the assertion failure reliably for me.

YAMAMOTO Takashi

>
>             regards, tom lane

Re: BUG #6218: TRAP: FailedAssertion( "!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)

От
Tom Lane
Дата:
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
>> Maybe, but I'd still like to see a test case, because I can't reproduce
>> any such problem by preparing ROLLBACK in an aborted transaction.

> reading GetTransactionSnapshot, it seems that the problem happens
> only with IsolationUsesXactSnapshot() true.

Hmm.  I'm inclined to think that this demonstrates a bug in snapshot
management, not so much in plancache.  We have plancache doing

        PushActiveSnapshot(GetTransactionSnapshot());

and then later

        PopActiveSnapshot();

and at this point surely it is not plancache's fault if there is any
remaining refcount for the snapshot.  There is, though, because
GetTransactionSnapshot saved a refcount in TopTransactionResourceOwner.
I think it's snapmgr.c's responsibility to make sure that that's cleaned
up, and it's not doing so.

The place where that refcount normally gets dropped is
AtEarlyCommit_Snapshot, but that isn't going to be called at all in
aborted-transaction cleanup.  Worse, if we just transposed it over to be
called in a place in AbortTransaction comparable to where it's called
during commit, that still wouldn't fix the problem, because when the
ROLLBACK happens, we've already aborted the transaction.

I think that AtEarlyCommit_Snapshot is misdesigned, and that far from
being done "early" in commit/abort, it needs to be done "late", like
somewhere not very long before the
ResourceOwnerDelete(TopTransactionResourceOwner) calls.  There is no
very good reason to think that someone might not ask for a snapshot
during commit processing.

Alvaro, do you happen to remember why this got designed as an "early"
transaction shutdown action, rather than delaying it as long as
possible?

            regards, tom lane

Re: BUG #6218: TRAP: FailedAssertion( "!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)

От
Alvaro Herrera
Дата:
Excerpts from Tom Lane's message of lun sep 26 13:26:37 -0300 2011:
>
> yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
> >> Maybe, but I'd still like to see a test case, because I can't reproduce
> >> any such problem by preparing ROLLBACK in an aborted transaction.
>
> > reading GetTransactionSnapshot, it seems that the problem happens
> > only with IsolationUsesXactSnapshot() true.
>
> Hmm.  I'm inclined to think that this demonstrates a bug in snapshot
> management, not so much in plancache.  We have plancache doing
>
>         PushActiveSnapshot(GetTransactionSnapshot());
>
> and then later
>
>         PopActiveSnapshot();
>
> and at this point surely it is not plancache's fault if there is any
> remaining refcount for the snapshot.  There is, though, because
> GetTransactionSnapshot saved a refcount in TopTransactionResourceOwner.
> I think it's snapmgr.c's responsibility to make sure that that's cleaned
> up, and it's not doing so.

Agreed.

> The place where that refcount normally gets dropped is
> AtEarlyCommit_Snapshot, but that isn't going to be called at all in
> aborted-transaction cleanup.  Worse, if we just transposed it over to be
> called in a place in AbortTransaction comparable to where it's called
> during commit, that still wouldn't fix the problem, because when the
> ROLLBACK happens, we've already aborted the transaction.

... ouch.

> I think that AtEarlyCommit_Snapshot is misdesigned, and that far from
> being done "early" in commit/abort, it needs to be done "late", like
> somewhere not very long before the
> ResourceOwnerDelete(TopTransactionResourceOwner) calls.  There is no
> very good reason to think that someone might not ask for a snapshot
> during commit processing.
>
> Alvaro, do you happen to remember why this got designed as an "early"
> transaction shutdown action, rather than delaying it as long as
> possible?

As far as I remember, the only principle was that it had to run before
ResourceOwner cleanup.  Commit 7b640b0345dc4fbd39ff568700985b432f6afa07
introduces that "early" call; ResOwner support had been introduced 10
days before in 6bbef4e5383c99d93aa974e2c79d328cfbd1c4a9.  I probably
just tried it out and noticed that resowner.c complained if I didn't
drop the refcount prior to its own cleanup.

I don't think I ever considered the scenario of calls in aborted
transactions.

Shall I work on a fix?  I expect you are plenty busy with commitfest
stuff, but please let me know otherwise.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #6218: TRAP: FailedAssertion( "!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of lun sep 26 13:26:37 -0300 2011:
>> The place where that refcount normally gets dropped is
>> AtEarlyCommit_Snapshot, but that isn't going to be called at all in
>> aborted-transaction cleanup.  Worse, if we just transposed it over to be
>> called in a place in AbortTransaction comparable to where it's called
>> during commit, that still wouldn't fix the problem, because when the
>> ROLLBACK happens, we've already aborted the transaction.

> ... ouch.

> Shall I work on a fix?  I expect you are plenty busy with commitfest
> stuff, but please let me know otherwise.

I have what-I-think-is-the-fix pretty clear in my own mind, so let me
give it a try.  If it doesn't work I'll bounce it back to you.

            regards, tom lane