Обсуждение: Snapshot management, final
Here is the patch for snapshot management I currently have. Below is a complete description of the changes in this patch. The most important change is on the use of CopySnapshot and the games with ActiveSnapshot. On the new code, whenever a piece of code needs a snapshot to persist, it is "registered" by using RegisterSnapshot. As soon as it is done with it, it must be unregistered with UnregisterSnapshot. For ActiveSnapshot, we deal with a stack structure. Callers needing an Active snap set call PushActiveSnapshot, and PopActiveSnapshot when they no longer need it. GetSnapshotData still creates initial versions of each snapshot in static memory. There's a new boolean on the SnapshotData struct (set to false by GetSnapshotData), which indicates whether the snapshot has been copied out of static memory. When a snapshot is Registered or PushedActive, this bit is checked: if false, then a copy of the snapshot is made on the dedicated SnapshotContext, and the "copied" flag is flipped. SnapshotData also carries two new reference counts: one for the Register list, another one for the ActiveSnapshot stack. Whenever a snap is Unregistered or Popped, both refcounts are checked. If both are found to be zero then memory can be released. On Unregister and Pop, after deleting the snapshot, we also verify the whole lists. If they are empty, it means no more live snapshot remain for this transaction. In this case we can reset MyProc->xmin to InvalidXid. Thus, GetSnapshotData must now recalculate MyProc->xmin each time it finds xmin to be Invalid, which can not only happen for the serializable snapshot but at any time. A serializable transaction Registers its snapshot as first thing, and keeps it registered until transaction end. Note: I had previously made noises about changing the semantics of MyProc->xmin or introducing a new VacuumXmin. I have refrained from doing so in this patch. Those are the high-level changes. Some coding changes: - SerializableSnapshot and LatestSnapshot as global symbols are no more. - A couple of PG_TRY blocks have been removed. Particularly, on plancache.c this means the do_planning() routine is now pointless and has been folded back into its sole caller. - Three CopySnapshot call sites remain outside snapmgr.c: DoCopy() on copy.c, ExplainOnePlan() on explain.c and _SPI_execute_plan() on spi.c. They are there because they grab the current ActiveSnapshot, modify it, and then use the resulting snapshot. There is no corresponding FreeSnapshot, because it's not needed. - CommandCounterIncrement now calls into snapmgr.c to set the CID of the static snapshots. - CREATE INDEX CONCURRENTLY, VACUUM FULL, and CLUSTER must now explicitely Pop the ActiveSnapshot set by PortalRunUtility before calling CommitTransactionCommand. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Вложения
On Tue, 2008-04-22 at 15:49 -0400, Alvaro Herrera wrote: > - Three CopySnapshot call sites remain outside snapmgr.c: DoCopy() on > copy.c, ExplainOnePlan() on explain.c and _SPI_execute_plan() on spi.c. > They are there because they grab the current ActiveSnapshot, modify it, > and then use the resulting snapshot. There is no corresponding > FreeSnapshot, because it's not needed. Not needed? How can we be certain that the modified snapshot does not outlive its original source? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote: > On Tue, 2008-04-22 at 15:49 -0400, Alvaro Herrera wrote: > > > - Three CopySnapshot call sites remain outside snapmgr.c: DoCopy() on > > copy.c, ExplainOnePlan() on explain.c and _SPI_execute_plan() on spi.c. > > They are there because they grab the current ActiveSnapshot, modify it, > > and then use the resulting snapshot. There is no corresponding > > FreeSnapshot, because it's not needed. > > Not needed? How can we be certain that the modified snapshot does not > outlive its original source? It's not CopySnapshot that's not needed, but FreeSnapshot. The point here is that the snapshot will be freed automatically as soon as it is PopActiveSnapshot'd out of existance. CopySnapshot creates a new, separate copy of the passed snapshot, and each of them will be freed (separately) as soon as their refcounts reach zero. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Tue, 2008-04-22 at 17:50 -0400, Alvaro Herrera wrote: > Simon Riggs wrote: > > On Tue, 2008-04-22 at 15:49 -0400, Alvaro Herrera wrote: > > > > > - Three CopySnapshot call sites remain outside snapmgr.c: DoCopy() on > > > copy.c, ExplainOnePlan() on explain.c and _SPI_execute_plan() on spi.c. > > > They are there because they grab the current ActiveSnapshot, modify it, > > > and then use the resulting snapshot. There is no corresponding > > > FreeSnapshot, because it's not needed. > > > > Not needed? How can we be certain that the modified snapshot does not > > outlive its original source? > > It's not CopySnapshot that's not needed, but FreeSnapshot. The point > here is that the snapshot will be freed automatically as soon as it is > PopActiveSnapshot'd out of existance. CopySnapshot creates a new, > separate copy of the passed snapshot, and each of them will be freed > (separately) as soon as their refcounts reach zero. OK, so it can;t be copied to a longer lived memory context? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote: > OK, so it can;t be copied to a longer lived memory context? CopySnapshot always copies snapshots to SnapshotContext, which is a context that lives until transaction end. There's no mechanism for copying a snapshot into another context, because I don't see the need. If you need that ability, please explain. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Simon Riggs wrote: >> OK, so it can;t be copied to a longer lived memory context? > CopySnapshot always copies snapshots to SnapshotContext, which is a > context that lives until transaction end. There's no mechanism for > copying a snapshot into another context, because I don't see the need. The only reason we have memory contexts at all is to avoid the need to track individual palloc'd objects. Since we're instituting exactly such tracking for snapshots, there's no value in placing them in general-purpose memory contexts. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > > CopySnapshot always copies snapshots to SnapshotContext, which is a > > context that lives until transaction end. There's no mechanism for > > copying a snapshot into another context, because I don't see the need. > > The only reason we have memory contexts at all is to avoid the need to > track individual palloc'd objects. Since we're instituting exactly such > tracking for snapshots, there's no value in placing them in > general-purpose memory contexts. The problem is that we reuse snapshots, and not all uses have the same longevity. If a context goes away from under a snapshot and there are other references to it, the result is a dangling pointer somewhere. That's why we have reference counts on snaps: we know we can free one when its refcounts are zero. At the same time, the snapshots all go away at transaction end with TopTransactionContext. The other possible approach to this problem is creating a separate copy each time a snapshot is reused, but this just causes extra palloc'ing for no gain at all. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Tue, 2008-04-22 at 18:13 -0400, Alvaro Herrera wrote: > Simon Riggs wrote: > > > OK, so it can;t be copied to a longer lived memory context? > > CopySnapshot always copies snapshots to SnapshotContext, which is a > context that lives until transaction end. There's no mechanism for > copying a snapshot into another context, because I don't see the need. > > If you need that ability, please explain. No, I wish to prevent that, not enable it. Perhaps put the TransactionId on each snapshot and then an Assert can check it before its used. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote: > On Tue, 2008-04-22 at 18:13 -0400, Alvaro Herrera wrote: > > Simon Riggs wrote: > > > > > OK, so it can;t be copied to a longer lived memory context? > > > > If you need that ability, please explain. > > No, I wish to prevent that, not enable it. I see. Sure, we don't have that problem. In fact, we didn't have it before either, so I'm not sure I see your point :-) > Perhaps put the TransactionId on each snapshot and then an Assert can > check it before its used. There's no need for that -- all snapshots go away at transaction end. An attempt to use one would cause a prompt crash (at least on an assert-enabled build.) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Tom Lane wrote: > The only reason we have memory contexts at all is to avoid the need to > track individual palloc'd objects. Since we're instituting exactly such > tracking for snapshots, there's no value in placing them in > general-purpose memory contexts. FWIW I noticed yesterday after going to bed that SnapshotContext serves no useful purpose -- we can just remove it and store snaps in TopTransactionContext. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, 2008-04-23 at 08:21 -0400, Alvaro Herrera wrote: > Simon Riggs wrote: > > On Tue, 2008-04-22 at 18:13 -0400, Alvaro Herrera wrote: > > > Simon Riggs wrote: > > > > > > > OK, so it can;t be copied to a longer lived memory context? > > > > > > If you need that ability, please explain. > > > > No, I wish to prevent that, not enable it. > > I see. Sure, we don't have that problem. In fact, we didn't have it > before either, so I'm not sure I see your point :-) You originally said "because its not needed" but didn't explain why. I wanted to make sure there was no loophole. I'm not trying to make any other point, just checking. Forgive me for being dense, but what is there to stop you using a CopySnapshot in TopMemoryContext? If you did, there would be no way to free it, nor would we notice it had been done, AFAICS. Not anything I'm thinking about doing, though. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote: > Forgive me for being dense, but what is there to stop you using a > CopySnapshot in TopMemoryContext? If you did, there would be no way to > free it, nor would we notice it had been done, AFAICS. Not anything I'm > thinking about doing, though. Well, TopMemoryContext is no good because we want to free snapshots in a fell swoop at transaction abort. TopTransactionContext would be OK, as I just said in the parallel subthread. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote: > FWIW I noticed yesterday after going to bed that SnapshotContext serves > no useful purpose -- we can just remove it and store snaps in > TopTransactionContext. ... which is what the attached patch does. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Вложения
Alvaro Herrera <alvherre@commandprompt.com> writes: > Simon Riggs wrote: >> On Tue, 2008-04-22 at 15:49 -0400, Alvaro Herrera wrote: >>> - Three CopySnapshot call sites remain outside snapmgr.c: DoCopy() on >>> copy.c, ExplainOnePlan() on explain.c and _SPI_execute_plan() on spi.c. >>> They are there because they grab the current ActiveSnapshot, modify it, >>> and then use the resulting snapshot. There is no corresponding >>> FreeSnapshot, because it's not needed. >> >> Not needed? How can we be certain that the modified snapshot does not >> outlive its original source? > It's not CopySnapshot that's not needed, but FreeSnapshot. The point > here is that the snapshot will be freed automatically as soon as it is > PopActiveSnapshot'd out of existance. CopySnapshot creates a new, > separate copy of the passed snapshot, and each of them will be freed > (separately) as soon as their refcounts reach zero. I looked over this patch and I think it still needs work. I concur with Simon's discomfort about the external CopySnapshot calls: they are reference leaks waiting to happen. As an example, you have the pattern snap = CopySnapshot(...); do some stuff; RegisterSnapshot(); but what if the "stuff" fails? In most of these code paths there's at least one palloc in between, so a failure is certainly possible. In the current form of the patch, a failure might not cost more than some leaked memory in TopTransactionContext, but it's still pretty ugly. I think it would be best if it were simply not possible to do CopySnapshot without simultaneously putting the snap into either the registered or active lists. In the COPY and EXPLAIN cases I think you misunderstood the point of the original code anyway: the modified snapshot is still the active snapshot for the duration of the command. So I think the right approach for both of these is the equivalent of what you put into spi.c: snap = CopySnapshot(snapshot); if (!read_only) snap->curcid = GetCurrentCommandId(false); PushActiveSnapshot(snap); and then a Pop at the end. It might be worth encapsulating the above series as a single copy-modify-and-push subroutine in snapmgr.c, so that you don't have to expose CopySnapshot publicly, nor expose adjustment of a snap's curcid outside snapmgr. Also, I don't think the subtransaction management is correct at all. How can it handle cases where a subtransaction and a sub-sub-transaction both take out reference counts on the same snap? If the sub-sub-xact aborts, you have no way to determine how many refcounts should go away. I think it would work better to keep the subxact level indicators in the ActiveSnapshotElt/RegdSnapshotElt list members. This would mean multiple RegdSnapshotElt members pointing at the same snapshot in any case where two different subxact levels actually had refs to the same snapshot. The RegdSnapshotElt members would each count registration counts for their own subtransaction, and the underlying snapshot would just count how many RegdSnapshotElts were pointing at it. This would allow the same amount of verification in subxact commit as you have in xact commit, ie there should be no counts left for the current subtransaction. Also, I think that the whole snapshot-sharing mechanism is not working as intended except for the serializable case; otherwise sequences like x = RegisterSnapshot(GetTransactionSnapshot()); y = RegisterSnapshot(GetTransactionSnapshot()); will result in x and y being separate copies. Or are you assuming that this just isn't worth optimizing? Some minor points: It seems odd that snapshot cleanup is late in main xact commit/abort and early in subxact commit/abort. Unless there's a really good reason for that, keep the ordering of module cleanup calls consistent across the various cases in xact.c. GetSnapshotData's serializable parameter is obsolete and should be removed. I'm not entirely comfortable with the reference counts being only uint16 wide. int32 seems safer and it isn't really saving anything much. Push/PopActiveSnap leaks the ActiveSnapshotElt. PopActiveSnapshot should probably assert active_count > 0 before decrementing, likewise in UnregisterSnapshot. regards, tom lane
I wrote: > Also, I don't think the subtransaction management is correct at all. BTW, maybe it would make more sense to keep the reference count management inside ResourceOwners, instead of having a largely duplicative set of logic in snapmgr.c. regards, tom lane
Tom Lane wrote: > I looked over this patch and I think it still needs work. Thanks for the thorough review. I'll be working on these problems soon. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
[Reposting with compressed patch] Okay, I think I've fixed most of the issues in the reviewed patch. Updated patch attached. The most interesting change is that I've done away with CopySnapshot as a public routine in favor of a new PushUpdatedSnapshot which does the copy-update-push sequence. Also I added a refcount to RegdSnapshotElt as suggested, and changed the subxact logic to substract that exact amount on abort. There's something I'm not sure what to do about: Tom Lane wrote: > Also, I think that the whole snapshot-sharing mechanism is not working > as intended except for the serializable case; otherwise sequences > like > x = RegisterSnapshot(GetTransactionSnapshot()); > y = RegisterSnapshot(GetTransactionSnapshot()); > will result in x and y being separate copies. Or are you assuming > that this just isn't worth optimizing? It's not that I don't think it's worth optimizing, but I think it's a bit away from the scope of this patch. The problem here is how to notice that two consecutive GetTransactionSnapshot calls should really return different snapshots, considering that shared state may change in between. Perhaps there's an easy way to optimize that; I don't know. What does work is to get (say) a registered snapshot and push it as active snapshot. That results in a successfully shared snapshot. For example PortalStart does that for cursors, etc. (FWIW another thing which is probably worth rethinking is the handling of snapshots around PortalStart. Some callers pass the currently active snapshot; Others pass InvalidSnapshot. Another passes an arbitrary snapshot. When it's Invalid, PortalStart calls GetTransactionSnapshot, otherwise it uses the passed snap for PushActiveSnapshot. So this is all a bit confusing and wasteful and could use some clean up. This is material for a new patch however.) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Вложения
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> Also, I think that the whole snapshot-sharing mechanism is not working >> as intended except for the serializable case; otherwise sequences >> like >> x = RegisterSnapshot(GetTransactionSnapshot()); >> y = RegisterSnapshot(GetTransactionSnapshot()); >> will result in x and y being separate copies. Or are you assuming >> that this just isn't worth optimizing? > It's not that I don't think it's worth optimizing, but I think it's a > bit away from the scope of this patch. The problem here is how to > notice that two consecutive GetTransactionSnapshot calls should really > return different snapshots, considering that shared state may change in > between. Perhaps there's an easy way to optimize that; I don't know. Yeah, in general you could only optimize it if no other backend had changed state, and there doesn't seem any real simple way to know that. Maybe we could teach GetSnapshotData to test for it but it's a bit doubtful that it's worth the cycles. I'm fine with leaving this as-is. I have a few other gripes though: The UnregisterSnapshot call at line 631 of indexcmds.c is definitely too early: the snapshot is touched in the very next statement. I'd be inclined to move it down to right after the Pop at line 696; there's no point in unregistering till you pop anyway. In FreeQueryDesc, the unregister calls should be after the Assert. Drop this comment fragment in plancache.c, it's not relevant anymore: * Having to ! * replan is an unusual case, and it seems a really bad idea for ! * RevalidateCachedPlan to affect the snapshot only in unusual ! * cases. ! */ s_level and as_level fields must be int not uint32, because they are being compared to nesting levels that are declared as int. You risk getting the wrong comparison semantics. (Not that it should ever matter in this code, but mixing signed and unsigned arithmetic is just bad form.) Why Assert(snap->satisfies == HeapTupleSatisfiesMVCC) in PushActiveSnapshot and RegisterSnapshot? AFAICT this code will work fine on non-MVCC snapshots. Seems a bit odd that RegisterSnapshot and PushActiveSnapshot do their palloc's in opposite orders. I think making the list elt first is probably better; in either case, if the second palloc fails then you're gonna leak the first one, so it's better to make the smaller allocation first. Shouldn't UnregisterSnapshot insist that s_level be equal to current xact nest level? AtSubCommit_Snapshot can leave us with multiple RegdSnapshotElt's for the same snap and s_level, which seems a bit bogus. I don't think the unregister code will go wrong in its current form, but you at least need some comments about the invariants that are expected to hold for the list data structure. Example: the code depends on the assumption that elements are in nonincreasing s_level order, but that's nowhere stated. AtSubAbort_Snapshot has Assert(tofree->s_snap->active_count == 0) which seems wrong: couldn't the snap be active in an outer subxact? regards, tom lane
Tom Lane wrote: I'm revising the patch; this comment is flawed though: > Shouldn't UnregisterSnapshot insist that s_level be equal to current > xact nest level? It can't check that; consider begin; savepoint foo; declare cur cursor for select (1), (2), (3); savepoint bar; close cur; commit; Thanks again for the review. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> Shouldn't UnregisterSnapshot insist that s_level be equal to current >> xact nest level? > It can't check that; consider > begin; > savepoint foo; > declare cur cursor for select (1), (2), (3); > savepoint bar; > close cur; > commit; Hmm ... but that "close" can't unregister the snapshot immediately, because you'd lose if the 2nd savepoint gets rolled back, no? Is the handling of this case even correct at the moment? ISTM correct handling of this example would require that the "close" not really discard the snap until commit. Then, given proper ordering of the cleanup operations at commit, you might be able to still have the cross-check about s_level in UnregisterSnapshot. (IOW, maybe having snapshot cleanup be late in the commit sequence wasn't such a good choice...) regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Tom Lane wrote: > >> Shouldn't UnregisterSnapshot insist that s_level be equal to current > >> xact nest level? > > > It can't check that; consider > > > begin; > > savepoint foo; > > declare cur cursor for select (1), (2), (3); > > savepoint bar; > > close cur; > > commit; > > Hmm ... but that "close" can't unregister the snapshot immediately, > because you'd lose if the 2nd savepoint gets rolled back, no? Is the > handling of this case even correct at the moment? No, CLOSE is not rolled back: alvherre=# begin; BEGIN alvherre=# savepoint foo; SAVEPOINT alvherre=# declare cur cursor for select (1), (2), (3); DECLARE CURSOR alvherre=# savepoint bar; SAVEPOINT alvherre=# close cur; CLOSE CURSOR alvherre=# rollback to bar; ROLLBACK alvherre=# fetch all from cur; ERREUR: le curseur « cur » n'existe pas Maybe this is possible to fix, but again I think it's outside the scope of this patch. > ISTM correct handling of this example would require that the "close" > not really discard the snap until commit. Then, given proper ordering > of the cleanup operations at commit, you might be able to still have the > cross-check about s_level in UnregisterSnapshot. (IOW, maybe having > snapshot cleanup be late in the commit sequence wasn't such a good > choice...) Right -- I'll move them earlier. I don't think it's trivial to fix the un-rollback-ability of CLOSE however. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> Hmm ... but that "close" can't unregister the snapshot immediately, >> because you'd lose if the 2nd savepoint gets rolled back, no? Is the >> handling of this case even correct at the moment? > No, CLOSE is not rolled back: > ... > Maybe this is possible to fix, but again I think it's outside the scope > of this patch. I'd forgotten that ... seems a bit bogus, and it's certainly not documented on the CLOSE reference page. >> ISTM correct handling of this example would require that the "close" >> not really discard the snap until commit. Then, given proper ordering >> of the cleanup operations at commit, you might be able to still have the >> cross-check about s_level in UnregisterSnapshot. (IOW, maybe having >> snapshot cleanup be late in the commit sequence wasn't such a good >> choice...) > Right -- I'll move them earlier. Well, without a clear idea of where to place them instead, you might as well leave it alone for the moment. I'd like to see this revisited sometime though. regards, tom lane
Tom Lane wrote: Patch committed, with most of these fixed. On this item: > AtSubCommit_Snapshot can leave us with multiple RegdSnapshotElt's for > the same snap and s_level, which seems a bit bogus. I agree it is bogus. I punted though, and left the code as-is, with a comment stating the problem. We can revisit this problem later if it's ever an issue. Many thanks for the extensive help. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support