Обсуждение: BUG #6218: TRAP: FailedAssertion("!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)
BUG #6218: TRAP: FailedAssertion("!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)
От
"YAMAMOTO Takashi"
Дата:
The following bug has been logged online: Bug reference: 6218 Logged by: YAMAMOTO Takashi Email address: yamt@mwd.biglobe.ne.jp PostgreSQL version: 9.2 Operating system: NetBSD Description: TRAP: FailedAssertion("!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365) Details: 9.2devel (2562dcea811eb642e1c5442e1ede9fe268278157) ERROR: could not serialize access due to concurrent update STATEMENT: UPDATE file SET atime = current_timestamp WHERE fileid = $1 TRAP: FailedAssertion("!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365) LOG: server process (PID 16832) was terminated by signal 6: Abort trap LOG: terminating any other active server processes Core was generated by `postgres'. Program terminated with signal 6, Aborted. #0 0xbbb9c547 in _lwp_kill () from /usr/lib/libc.so.12 (gdb) bt #0 0xbbb9c547 in _lwp_kill () from /usr/lib/libc.so.12 #1 0xbbb9c503 in raise (s=6) at /siro/nbsd/src/lib/libc/gen/raise.c:48 #2 0xbbb9bced in abort () at /siro/nbsd/src/lib/libc/stdlib/abort.c:74 #3 0x08400d8b in ExceptionalCondition ( conditionName=0x85a28f3 "!(owner->nsnapshots == 0)", errorType=0x85a280b "FailedAssertion", fileName=0x85a2800 "resowner.c", lineNumber=365) at assert.c:57 #4 0x084272a7 in ResourceOwnerDelete (owner=0x99be23f0) at resowner.c:365 #5 0x080d96f3 in CleanupTransaction () at xact.c:2374 #6 0x080d98d9 in CommitTransactionCommand () at xact.c:2538 #7 0x08318802 in finish_xact_command () at postgres.c:2378 #8 0x08317c19 in exec_execute_message (portal_name=0x99b2f01c "", max_rows=2147483647) at postgres.c:1915 #9 0x0831a6cf in PostgresMain (argc=2, argv=0xbb9117b0, username=0xbb911704 "takashi") at postgres.c:3909 #10 0x082c47b0 in BackendRun (port=0xbb93d0f0) at postmaster.c:3512 #11 0x082c3e12 in BackendStartup (port=0xbb93d0f0) at postmaster.c:3197 #12 0x082c0e94 in ServerLoop () at postmaster.c:1333 #13 0xbb93d0f0 in ?? () #14 0x085f8850 in UnBlockSig () #15 0x00000000 in ?? () (gdb) fr 4 #4 0x084272a7 in ResourceOwnerDelete (owner=0x99be23f0) at resowner.c:365 365 Assert(owner->nsnapshots == 0); (gdb) p *owner $1 = {parent = 0x0, firstchild = 0x99b2c174, nextchild = 0x0, name = 0x844bfb2 "TopTransaction", nbuffers = 0, buffers = 0x0, maxbuffers = 0, ncatrefs = 0, catrefs = 0x99b85f88, maxcatrefs = 16, ncatlistrefs = 0, catlistrefs = 0x0, maxcatlistrefs = 0, nrelrefs = 0, relrefs = 0x0, maxrelrefs = 0, nplanrefs = 0, planrefs = 0x0, maxplanrefs = 0, ntupdescs = 0, tupdescs = 0x0, maxtupdescs = 0, nsnapshots = 1, snapshots = 0xbb9f4fa4, maxsnapshots = 16, nfiles = 0, files = 0x0, maxfiles = 0} (gdb) p *owner->snapshots[0] $7 = {satisfies = 0x8435138 <HeapTupleSatisfiesMVCC>, xmin = 991, xmax = 991, xcnt = 0, xip = 0x0, subxcnt = 0, subxip = 0x0, suboverflowed = 0 '\0', takenDuringRecovery = 0 '\0', curcid = 1, active_count = 0, regd_count = 1, copied = 1 '\001'} (gdb) p CurrentTransactionState $8 = (TransactionState) 0x85bea00 (gdb) p *CurrentTransactionState $9 = {transactionId = 990, subTransactionId = 1, name = 0x0, savepointLevel = 0, state = TRANS_ABORT, blockState = TBLOCK_ABORT_END, nestingLevel = 1, gucNestLevel = 1, curTransactionContext = 0xbb912a40, curTransactionOwner = 0x99be23f0, childXids = 0x0, nChildXids = 0, maxChildXids = 0, prevUser = 10, prevSecContext = 0, prevXactReadOnly = 0 '\0', startedInRecovery = 0 '\0', parent = 0x0} (gdb) p debug_query_string $12 = 0x99b2f428 "ROLLBACK" (gdb)
"YAMAMOTO Takashi" <yamt@mwd.biglobe.ne.jp> writes: > 9.2devel (2562dcea811eb642e1c5442e1ede9fe268278157) > ERROR: could not serialize access due to concurrent update > STATEMENT: UPDATE file SET atime = current_timestamp WHERE fileid = $1 > TRAP: FailedAssertion("!(owner->nsnapshots == 0)", File: "resowner.c", Line: > 365) > LOG: server process (PID 16832) was terminated by signal 6: Abort trap There isn't terribly much we can do with this report unless you can provide a complete test case to reproduce it. regards, tom lane
Re: BUG #6218: TRAP: FailedAssertion( "!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)
От
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
Дата:
hi, > "YAMAMOTO Takashi" <yamt@mwd.biglobe.ne.jp> writes: >> 9.2devel (2562dcea811eb642e1c5442e1ede9fe268278157) > >> ERROR: could not serialize access due to concurrent update >> STATEMENT: UPDATE file SET atime = current_timestamp WHERE fileid = $1 >> TRAP: FailedAssertion("!(owner->nsnapshots == 0)", File: "resowner.c", Line: >> 365) >> LOG: server process (PID 16832) was terminated by signal 6: Abort trap > > There isn't terribly much we can do with this report unless you can > provide a complete test case to reproduce it. > > regards, tom lane after some investigation, i think it is triggered by protocol-level prepare of ROLLBACK in an aborted transaction. does the following patch make sense? YAMAMOTO Takashi diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c index 9cccc87..d1da7df 100644 --- a/src/backend/utils/cache/plancache.c +++ b/src/backend/utils/cache/plancache.c @@ -520,6 +520,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource) * don't have invalidatable plans, so we'd not get here for such a * command. */ + Assert(analyze_requires_snapshot(plansource->raw_parse_tree)); snapshot_set = false; if (!ActiveSnapshotSet()) { @@ -757,13 +758,11 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist, /* * If a snapshot is already set (the normal case), we can just use - * that for parsing/planning. But if it isn't, install one. Note: no - * point in checking whether parse analysis requires a snapshot; - * utility commands don't have invalidatable plans, so we'd not get - * here for such a command. + * that for parsing/planning. But if it isn't, install one. */ snapshot_set = false; - if (!ActiveSnapshotSet()) + if (!ActiveSnapshotSet() && + analyze_requires_snapshot(plansource->raw_parse_tree)) { PushActiveSnapshot(GetTransactionSnapshot()); snapshot_set = true;
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes: >> There isn't terribly much we can do with this report unless you can >> provide a complete test case to reproduce it. > after some investigation, i think it is triggered by protocol-level prepare > of ROLLBACK in an aborted transaction. does the following patch make sense? Maybe, but I'd still like to see a test case, because I can't reproduce any such problem by preparing ROLLBACK in an aborted transaction. regards, tom lane
Re: BUG #6218: TRAP: FailedAssertion( "!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)
От
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
Дата:
hi, > yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes: >>> There isn't terribly much we can do with this report unless you can >>> provide a complete test case to reproduce it. > >> after some investigation, i think it is triggered by protocol-level prepare >> of ROLLBACK in an aborted transaction. does the following patch make sense? > > Maybe, but I'd still like to see a test case, because I can't reproduce > any such problem by preparing ROLLBACK in an aborted transaction. reading GetTransactionSnapshot, it seems that the problem happens only with IsolationUsesXactSnapshot() true. the attached program triggers the assertion failure reliably for me. YAMAMOTO Takashi > > regards, tom lane
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes: >> Maybe, but I'd still like to see a test case, because I can't reproduce >> any such problem by preparing ROLLBACK in an aborted transaction. > reading GetTransactionSnapshot, it seems that the problem happens > only with IsolationUsesXactSnapshot() true. Hmm. I'm inclined to think that this demonstrates a bug in snapshot management, not so much in plancache. We have plancache doing PushActiveSnapshot(GetTransactionSnapshot()); and then later PopActiveSnapshot(); and at this point surely it is not plancache's fault if there is any remaining refcount for the snapshot. There is, though, because GetTransactionSnapshot saved a refcount in TopTransactionResourceOwner. I think it's snapmgr.c's responsibility to make sure that that's cleaned up, and it's not doing so. The place where that refcount normally gets dropped is AtEarlyCommit_Snapshot, but that isn't going to be called at all in aborted-transaction cleanup. Worse, if we just transposed it over to be called in a place in AbortTransaction comparable to where it's called during commit, that still wouldn't fix the problem, because when the ROLLBACK happens, we've already aborted the transaction. I think that AtEarlyCommit_Snapshot is misdesigned, and that far from being done "early" in commit/abort, it needs to be done "late", like somewhere not very long before the ResourceOwnerDelete(TopTransactionResourceOwner) calls. There is no very good reason to think that someone might not ask for a snapshot during commit processing. Alvaro, do you happen to remember why this got designed as an "early" transaction shutdown action, rather than delaying it as long as possible? regards, tom lane
Excerpts from Tom Lane's message of lun sep 26 13:26:37 -0300 2011: > > yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes: > >> Maybe, but I'd still like to see a test case, because I can't reproduce > >> any such problem by preparing ROLLBACK in an aborted transaction. > > > reading GetTransactionSnapshot, it seems that the problem happens > > only with IsolationUsesXactSnapshot() true. > > Hmm. I'm inclined to think that this demonstrates a bug in snapshot > management, not so much in plancache. We have plancache doing > > PushActiveSnapshot(GetTransactionSnapshot()); > > and then later > > PopActiveSnapshot(); > > and at this point surely it is not plancache's fault if there is any > remaining refcount for the snapshot. There is, though, because > GetTransactionSnapshot saved a refcount in TopTransactionResourceOwner. > I think it's snapmgr.c's responsibility to make sure that that's cleaned > up, and it's not doing so. Agreed. > The place where that refcount normally gets dropped is > AtEarlyCommit_Snapshot, but that isn't going to be called at all in > aborted-transaction cleanup. Worse, if we just transposed it over to be > called in a place in AbortTransaction comparable to where it's called > during commit, that still wouldn't fix the problem, because when the > ROLLBACK happens, we've already aborted the transaction. ... ouch. > I think that AtEarlyCommit_Snapshot is misdesigned, and that far from > being done "early" in commit/abort, it needs to be done "late", like > somewhere not very long before the > ResourceOwnerDelete(TopTransactionResourceOwner) calls. There is no > very good reason to think that someone might not ask for a snapshot > during commit processing. > > Alvaro, do you happen to remember why this got designed as an "early" > transaction shutdown action, rather than delaying it as long as > possible? As far as I remember, the only principle was that it had to run before ResourceOwner cleanup. Commit 7b640b0345dc4fbd39ff568700985b432f6afa07 introduces that "early" call; ResOwner support had been introduced 10 days before in 6bbef4e5383c99d93aa974e2c79d328cfbd1c4a9. I probably just tried it out and noticed that resowner.c complained if I didn't drop the refcount prior to its own cleanup. I don't think I ever considered the scenario of calls in aborted transactions. Shall I work on a fix? I expect you are plenty busy with commitfest stuff, but please let me know otherwise. -- Ãlvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tom Lane's message of lun sep 26 13:26:37 -0300 2011: >> The place where that refcount normally gets dropped is >> AtEarlyCommit_Snapshot, but that isn't going to be called at all in >> aborted-transaction cleanup. Worse, if we just transposed it over to be >> called in a place in AbortTransaction comparable to where it's called >> during commit, that still wouldn't fix the problem, because when the >> ROLLBACK happens, we've already aborted the transaction. > ... ouch. > Shall I work on a fix? I expect you are plenty busy with commitfest > stuff, but please let me know otherwise. I have what-I-think-is-the-fix pretty clear in my own mind, so let me give it a try. If it doesn't work I'll bounce it back to you. regards, tom lane