Обсуждение: MultiXact truncation, startup et al.
Hi, Heikki's comments about StartupMultiXact() being executed too late in http://archives.postgresql.org/message-id/528C9392.8000004%40vmware.com made me look at how multixact truncation works, since that's where SimpleLruTruncate() would trigger an error because of an unitialized latest_page_number. Turns out, we don't ever truncate pg_multixact during recovery since 9dc842f0832fd71eda826349a0c17ecf8ae93b84 because multixact truncations, in contrast to clog, aren't WAL logged themselves. Disabling probably was fair game back then since it wasn't too likely to remain in crash recovery forever. But at the very least since the addition of Hot Standby that's really not the case anymore. If I calculate correctly currently you'd end up with ~34GB(<9.3)/38GB of pg_multixact which seems a bit much. I am not 100% sure, but it looks like things could actually continue to work despite having an slru wraparound into existing data. But that's certainly nothing I'd want to rely on and looks mostly like lucky happenstance, especially in 9.3. If this were a master only issue, I'd say WAL-logging mxact truncation would be the way to go, but we can't really do that in the back branches since multixact_redo() would throw a fit if we were to introduce a new type of wal record and somebody would upgrade a primary first. So, what I think we need to do is to split StartupMultiXact() into two parts, StartupMultiXact() which only sets the offset's, members's shared->latest_page_number and TrimMultiXact() which does the remainder of the work, executed when finishing crash recovery at the current location of StartupMultiXact(). At this point <9.3 and 9.3+ diverge: <9.3 we can just re-enable doing TruncateMultiXact() in CheckPointMultiXact() since the content of the mxacts isn't important anymore - we just need the WAL records to be sure to have advanced the nextMXact, nextOffset counters to be sure they are big enough. >= 9.3 multixact checkpoints don't actually perform the truncation anymore, but vacuum does, via vac_truncate_clog(). Which obviously isn't executed during recovery. So we need to re-add truncation there, possibly only during recovery (everything else should be superflous work)? Any comments? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hi, On 2013-11-21 20:38:47 +0100, Andres Freund wrote: > Turns out, we don't ever truncate pg_multixact during recovery since > 9dc842f0832fd71eda826349a0c17ecf8ae93b84 because multixact truncations, > in contrast to clog, aren't WAL logged themselves. Disabling probably > was fair game back then since it wasn't too likely to remain in crash > recovery forever. > But at the very least since the addition of Hot Standby that's really > not the case anymore. If I calculate correctly currently you'd end up > with ~34GB(<9.3)/38GB of pg_multixact which seems a bit much. > > I am not 100% sure, but it looks like things could actually continue to > work despite having an slru wraparound into existing data. But that's > certainly nothing I'd want to rely on and looks mostly like lucky > happenstance, especially in 9.3. > > If this were a master only issue, I'd say WAL-logging mxact truncation > would be the way to go, but we can't really do that in the back branches > since multixact_redo() would throw a fit if we were to introduce a new > type of wal record and somebody would upgrade a primary first. > > So, what I think we need to do is to split StartupMultiXact() into two > parts, StartupMultiXact() which only sets the offset's, members's > shared->latest_page_number and TrimMultiXact() which does the remainder > of the work, executed when finishing crash recovery at the current > location of StartupMultiXact(). So, I've done this for 9.3+ for now. Testing around that turned up that our current way to schedule anti mxid wraparounds doesn't really work: 1) autovacuum.c knows about such vacuums, but vacuum.c doesn't. Leading to a long cycle of partial vacuums that don't increase relminmxid. 2) Parts of the code used 200mio as a hardcoded constant, others used autovacuum_freeze_max_age. 0001 fixes the vacuum scheduling and is applicable to 9.3+, 0002 re-adds pg_multixact truncation during crash recovery. The current code will only work on 9.3+, but if it's deemed acceptable I can backport it to earlier versions. I am not sure if it's worth backporting it 9.0 given it has neither HS nor SR? 0003 is a debugging only patch adding the useful pg_burn_multixact(num) function to pageinspect (plus some core changes to make that fast) and allows for low autovacuum_freeze_max_age settings. Not sure if it's really worth adding MultiXactIdPrecedesOrEquals in 0002, but I didn't want to differ in the scan_all logic normal xids ids and mxids. I think it'd also be fine to change the logic for xids to use TransactionIdPrecedes(), but I didn't want to touch that logic unnecessarily. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Wed, Nov 27, 2013 at 5:42 PM, Andres Freund <andres@2ndquadrant.com> wrote: > So, I've done this for 9.3+ for now. Testing around that turned up that > our current way to schedule anti mxid wraparounds doesn't really work: > 1) autovacuum.c knows about such vacuums, but vacuum.c doesn't. Leading > to a long cycle of partial vacuums that don't increase relminmxid. > 2) Parts of the code used 200mio as a hardcoded constant, others used > autovacuum_freeze_max_age. > > 0001 fixes the vacuum scheduling and is applicable to 9.3+, multiTableLimit is a bad name for whatever concept this is supposed to represent. It does not involve multiple tables. vacuum_set_xid_limits now has multiXactCutoff, multiTableLimit, and mxLimit, and there's no explanation of what the are. > 0002 re-adds pg_multixact truncation during crash recovery. The current > code will only work on 9.3+, but if it's deemed acceptable I can > backport it to earlier versions. I am not sure if it's worth backporting > it 9.0 given it has neither HS nor SR? Huh? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-11-28 19:23:29 -0500, Robert Haas wrote: > On Wed, Nov 27, 2013 at 5:42 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > So, I've done this for 9.3+ for now. Testing around that turned up that > > our current way to schedule anti mxid wraparounds doesn't really work: > > 1) autovacuum.c knows about such vacuums, but vacuum.c doesn't. Leading > > to a long cycle of partial vacuums that don't increase relminmxid. > > 2) Parts of the code used 200mio as a hardcoded constant, others used > > autovacuum_freeze_max_age. > > > > 0001 fixes the vacuum scheduling and is applicable to 9.3+, > > multiTableLimit is a bad name for whatever concept this is supposed to > represent. It does not involve multiple tables. It's the freezeTableLimit equivalent for multixacts. I haven't named it multiFreezeTableLimit because multixacts don't actually have a concept of freezing - and in that context I didn't see much danger for it to be understood to apply to multiple tables. I'll try to think of a better name. > vacuum_set_xid_limits now has multiXactCutoff, multiTableLimit, and > mxLimit, and there's no explanation of what the are. Well, neither have the plain xid variants. Not that that's good, but it's not this patches fault. I'd be happy to provide a separate patch that adds documentation for each of them - I've wondered often enough about their meaning to make it rather worthwhile. > > 0002 re-adds pg_multixact truncation during crash recovery. The current > > code will only work on 9.3+, but if it's deemed acceptable I can > > backport it to earlier versions. I am not sure if it's worth backporting > > it 9.0 given it has neither HS nor SR? > > Huh? That should have been a < 9.0 aka. 8.4. Patch 02 has changed its shape slightly since the version I posted here, because it's a prerequisite for the fix for the multixact bugs around heap_freeze_tuple() and heap_tuple_needs_freeze() I've written about nearby. I think Alvaro plans to work over my fixes to make them look nice enough and commit them soonish. Should somebody want to look into it http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/multixact-handling contains the fixes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund escribió: > Patch 02 has changed its shape slightly since the version I posted here, > because it's a prerequisite for the fix for the multixact bugs around > heap_freeze_tuple() and heap_tuple_needs_freeze() I've written about > nearby. I think Alvaro plans to work over my fixes to make them look > nice enough and commit them soonish. > Should somebody want to look into it > http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/multixact-handling > contains the fixes. Here's a tweaked version of the freeze patch. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
Hi, Thanks, looks saner than my version ;) On 2013-11-29 01:00:35 -0300, Alvaro Herrera wrote: > ! /* > ! * FIXME this calls TransactionIdDidAbort() internally, > ! * falsifying the claim in the comment at the top ... > ! */ > ! update_xid = HeapTupleGetUpdateXid(tuple); Yea. I think we should remove that behaviour from HeapTupleGetUpdateXid() - we don't need to rely on it here. > ! /* > ! * XXX we rely here on HeapTupleGetUpdateXid returning > ! * Invalid for aborted updates. > ! */ > ! if (!TransactionIdIsValid(update_xid)) > ! freeze_xmax = true; I've just added this case because HeapTupleGetUpdateXid() currently can return InvalidTransactionId - I'd be perfectly fine to just place the aborted xid in there. > ! * FIXME -- what if it's a committed update which has recent > ! * new locker transaction? The tuple wouldn't have been > ! * removed in that case (HeapTupleSatisfiesVacuum returns > ! * RECENTLY_DEAD). > ! */ Afaics we should be protected against that by virtue of GetOldestMultiXactId(). > *************** > *** 5645,5668 **** heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid, > ! /* we don't care about lockers */ > ! if (members[i].status != MultiXactStatusNoKeyUpdate || > ! members[i].status == MultiXactStatusUpdate) > ! continue; Dangerous typo alert. Should also be replaced by ISUPDATE_from_mxstatus(). Alternatively, if we decide to make HeapTupleGetUpdateXid() not make that DidAbort() check, we can simply get rid of doing the loop ourselves alltogethers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund escribió: > Hi, > > Thanks, looks saner than my version ;) Here are my versions of these patches. One thing not yet addressed in these patches is the change to HeapTupleGetUpdateXid() that has been mentioned repeatedly in these threads (we'll work on that next). Note I still have the FIXME comments in the freeze patch previously mentioned. I added some comments about the output parameters of vacuum_set_xid_limits, per Robert's complaint. Please give that a look. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On 2013-11-29 12:49:32 -0300, Alvaro Herrera wrote: > + * - xidFullScanLimit (also known as the table freeze age) represents the > + * minimum Xid age past which a vacuum will be turned into a full-table one, > + * to freeze tuples across the whole table. Vacuuming a table younger than > + * this can use a partial scan. Imo "age" isn't really appropriate, since it's a concrete xid that's the cutoff. It's determined by vacuum_freeze_table_age, sure, but at that point it's an absolute value. > + * - mxactFullScanLimit (as xidFullScanLimit) represents the minimum MultiXact > + * age past which a vacuum will be turned into a full-table one, as with > + * xidFullScanLimit. Not an age again. > + scan_all |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid, > + mxactFullScanLimit); Hah. That's cute ;). > @@ -1906,6 +1931,7 @@ CheckPointMultiXact(void) > SimpleLruFlush(MultiXactOffsetCtl, true); > SimpleLruFlush(MultiXactMemberCtl, true); > > + > TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true); > } My fault, but superflous newline added. > @@ -8619,6 +8623,22 @@ CreateRestartPoint(int flags) > } > LWLockRelease(ControlFileLock); > > + Also an inconsistent newline, again by me :( > - multi = HeapTupleHeaderGetRawXmax(tuple); > - if (MultiXactIdPrecedes(multi, cutoff_multi)) > - return true; > + nmembers = GetMultiXactIdMembers(xid, &members, true); > + for (i = 0; i < nmembers; i++) > + { > + TransactionId member = members[i].xid; > + Assert(TransactionIdIsNormal(member)); > + > + /* we don't care about lockers */ > + if (ISUPDATE_from_mxstatus(members[i].status)) > + continue; Isn't there a ! missing? > diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c > index c389bf3..518c22d 100644 > --- a/src/backend/access/transam/multixact.c > +++ b/src/backend/access/transam/multixact.c > @@ -445,6 +445,10 @@ MultiXactIdExpand(MultiXactId multi, TransactionId xid, MultiXactStatus status) > > for (i = 0, j = 0; i < nmembers; i++) > { > + /* > + * FIXME: is it possible that we copy over too old updater xids > + * here? > + */ > if (TransactionIdIsInProgress(members[i].xid) || > ((members[i].status > MultiXactStatusForUpdate) && > TransactionIdDidCommit(members[i].xid))) That's not really a new thing though, so I am fine with leaving that as is for now. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
New versions of all these patches, plus one more patch which removes the behavior that HeapTupleGetUpdateXid checks for aborted updates. (Turns out this was necessary to get freezing right.) My previous patch to avoid InvalidXid as page prune point is reverted in there, too (no longer necessary.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
- 0001-Replace-hardcoded-200000000-with-autovacuum_freeze_m.patch
- 0002-Fix-full-table-vacuum-request-mechanism-for-MultiXac.patch
- 0003-Truncate-pg_multixact-s-contents-during-crash-recove.patch
- 0004-Don-t-TransactionIdDidAbort-in-HeapTupleGetUpdateXid.patch
- 0005-Fix-a-couple-of-bugs-in-MultiXactId-freezing.patch
On 2013-11-29 16:30:08 -0300, Alvaro Herrera wrote: > New versions of all these patches, plus one more patch which removes the > behavior that HeapTupleGetUpdateXid checks for aborted updates. > From 0dce0b75da2732ca93f4c451b9bae6d4416794c3 Mon Sep 17 00:00:00 2001 > From: Alvaro Herrera <alvherre@alvh.no-ip.org> > Date: Fri, 29 Nov 2013 16:08:06 -0300 > Subject: [PATCH 4/5] Don't TransactionIdDidAbort in HeapTupleGetUpdateXid > > It is dangerous to do so, because some code expects to be able to see what's > the true Xmax even if it is aborted (particularly while traversing HOT > chains). So don't do it, and instead rely on the callers to verify for > abortedness, if necessary. > > Several race conditions and bugs fixed in the process. The current version of that additional patch causes two failures in isolationtester's delete-abort-savept.spec/out. But afaics the old behaviour was a bug: An updater seems to have waited for an aborted subtransaction. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2013-11-29 16:30:08 -0300, Alvaro Herrera wrote: > As a second bug, heap_freeze_tuple() didn't properly handle multixacts > that need to be frozen according to cutoff_multi, but whose updater xid > is still alive. Instead of preserving the update Xid, it just set Xmax > invalid, which leads to both old and new tuple versions becoming > visible. This is pretty rare in practice, but a real threat > nonetheless. Existing corrupted rows, unfortunately, cannot be repaired > in an automated fashion. I think this bug is worth mentioning here explicitly. As released, if you have a table where some rows are updated using an xmax as multi, and you freeze it you're pretty likely to experience corruption where you see both the old and the new version of a tuple as live. I haven't seen this one in the wild but just noticed it while looking at the other freezing bug, but it's really quite easy to reproduce. As demonstrated in the attached isolationtester spec which doubles the row count via an UPDATE in 9.3/HEAD. > + * Note the update Xid cannot possibly be older than > + * cutoff_xid; if it were, we wouldn't be here: if committed, > + * the tuple would have been pruned, and if aborted, the Xmax > + * would have been marked Invalid by HeapTupleSatisfiesVacuum. > + * (Not in-progress either, because then cutoff_xid would be > + * newer.) s/newer/older/? > @@ -5644,24 +5729,54 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid, > TransactionIdPrecedes(xid, cutoff_xid)) > return true; Maybe add a comment referring to heap_freeze_tuple() for justification of individual parts and that it needs to be kept in sync? > + nmembers = GetMultiXactIdMembers(xid, &members, true); > + for (i = 0; i < nmembers; i++) > + { > + TransactionId member = members[i].xid; > + > + Assert(TransactionIdIsNormal(member)); > + > + /* we don't care about lockers */ > + if (!ISUPDATE_from_mxstatus(members[i].status)) > + continue; > + > + if (TransactionIdPrecedes(member, cutoff_xid)) > + { > + pfree(members); > + return true; > + } > + } This now can use GetUpdateXid() as well. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > New versions of all these patches, plus one more patch which > removes the behavior that HeapTupleGetUpdateXid checks for > aborted updates. (Turns out this was necessary to get freezing > right.) My previous patch to avoid InvalidXid as page prune > point is reverted in there, too (no longer necessary.) Does this mean that when HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DELETE_IN_PROGRESS it is no longer possible for HeapTupleHeaderGetUpdateXid() to return InvalidTransactionId? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-11-29 13:26:00 -0800, Kevin Grittner wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > New versions of all these patches, plus one more patch which > > removes the behavior that HeapTupleGetUpdateXid checks for > > aborted updates. (Turns out this was necessary to get freezing > > right.) My previous patch to avoid InvalidXid as page prune > > point is reverted in there, too (no longer necessary.) > > Does this mean that when HeapTupleSatisfiesVacuum() returns > HEAPTUPLE_DELETE_IN_PROGRESS it is no longer possible for > HeapTupleHeaderGetUpdateXid() to return InvalidTransactionId? Correct. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Okay, I have pushed all these patches, including the fixes suggested here and then some. Hats off to Andres, who handled all the bug analysis, figured out the test cases that tickled things in all the wrong ways, and came up with the patches. Whenever we meet again, let's make sure to find a real good restaurant and let me invite you a nice dinner. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Okay, I have pushed all these patches, including the fixes suggested > here and then some. Not sure exactly which patch caused it, but I'm getting a warning in 9.0 through 9.2: multixact.c:1553: warning: no previous prototype for 'TrimMultiXact' regards, tom lane
On Fri, 2013-11-29 at 22:15 -0300, Alvaro Herrera wrote: > Okay, I have pushed all these patches, including the fixes suggested > here and then some. Something in these patches is causing a new compiler warning in the 9.2 branch: multixact.c:1553:1: warning: no previous prototype for ‘TrimMultiXact’ [-Wmissing-prototypes] http://pgci.eisentraut.org/jenkins/job/postgresql_9.2_world/404/warnings19Result/new/? Something might be wrong there, because that new function doesn't appear to be called anywhere.
On 2013-11-30 10:57:43 -0500, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Okay, I have pushed all these patches, including the fixes suggested > > here and then some. > > Not sure exactly which patch caused it, but I'm getting a warning > in 9.0 through 9.2: > > multixact.c:1553: warning: no previous prototype for 'TrimMultiXact' Hm. There's been a mistake during the back-patching of that patch. Likely due to the difference in how multixacts work between 9.2 and 9.3. Let me provide a fixup patch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2013-11-30 17:06:31 +0100, Andres Freund wrote: > On 2013-11-30 10:57:43 -0500, Tom Lane wrote: > > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > > Okay, I have pushed all these patches, including the fixes suggested > > > here and then some. > > > > Not sure exactly which patch caused it, but I'm getting a warning > > in 9.0 through 9.2: > > > > multixact.c:1553: warning: no previous prototype for 'TrimMultiXact' > > Hm. There's been a mistake during the back-patching of that > patch. Likely due to the difference in how multixacts work between 9.2 > and 9.3. > Let me provide a fixup patch. Attached. I've tested that pg_multixact gets cleaned up on both primary and a HS standby, and that things continue work after failover. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services