While working on the reported pg_upgrade failure at multixid wraparound
[1], I bumped into another bug related to multixid wraparound. If you
run vacuum freeze, and it advances oldestMultiXactId, and nextMulti has
just wrapped around to 0, you get this in the log:
> LOG: MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
Culprit: TruncateMultiXact does this:
    LWLockAcquire(MultiXactGenLock, LW_SHARED);
    nextMulti = MultiXactState->nextMXact;
    nextOffset = MultiXactState->nextOffset;
    oldestMulti = MultiXactState->oldestMultiXactId;
    LWLockRelease(MultiXactGenLock);
    Assert(MultiXactIdIsValid(oldestMulti));
    ...
    /*
     * First, compute the safe truncation point for MultiXactMember. This is
     * the starting offset of the oldest multixact.
     *
     * Hopefully, find_multixact_start will always work here, because we've
     * already checked that it doesn't precede the earliest MultiXact on disk.
     * But if it fails, don't truncate anything, and log a message.
     */
    if (oldestMulti == nextMulti)
    {
        /* there are NO MultiXacts */
        oldestOffset = nextOffset;
    }
    else if (!find_multixact_start(oldestMulti, &oldestOffset))
    {
        ereport(LOG,
                (errmsg("oldest MultiXact %u not found, earliest MultiXact %u, skipping truncation",
                        oldestMulti, earliest)));
        LWLockRelease(MultiXactTruncationLock);
        return;
    }
Scenario 1: In the buggy scenario, oldestMulti is 1 and nextMulti is 0.
We should take the "there are NO MultiXacts" codepath in that case,
because we skip over 0 when assigning multixids. Instead, we call
find_multixact_start with oldestMulti==1, which returns false because
multixid 1 hasn't been assigned and the SLRU segment doesn't exist yet.
There's a similar bug in SetOffsetVacuumLimit().
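
To illustrate the comparison that scenario 1 calls for, here is a rough, standalone sketch. It is not the attached patch: the helper names next_assignable_multi() and no_multixacts_exist() are hypothetical, and the typedef and constants are redeclared locally so that the snippet compiles on its own. The idea is just that a next-multixid counter sitting at 0 should be read as 1 before it is compared against oldestMulti.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

#define InvalidMultiXactId ((MultiXactId) 0)
#define FirstMultiXactId   ((MultiXactId) 1)

/*
 * Hypothetical helper: map the raw next-multixid counter to the multixid
 * that would actually be handed out next.  0 is never assigned, so a
 * counter that has just wrapped to 0 effectively means 1.
 */
MultiXactId
next_assignable_multi(MultiXactId nextMulti)
{
    return (nextMulti == InvalidMultiXactId) ? FirstMultiXactId : nextMulti;
}

/*
 * Hypothetical check: true if no multixids exist between oldestMulti and
 * the next one to be assigned, taking the skipped value 0 into account.
 */
bool
no_multixacts_exist(MultiXactId oldestMulti, MultiXactId nextMulti)
{
    /* mirrors the Assert(MultiXactIdIsValid(oldestMulti)) quoted above */
    assert(oldestMulti != InvalidMultiXactId);

    return oldestMulti == next_assignable_multi(nextMulti);
}
```

With oldestMulti == 1 and nextMulti == 0 this sketch reports that there are no multixacts, whereas the plain oldestMulti == nextMulti test does not.
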
Scenario 2: In scenario 1 we just fail to truncate the SLRUs and you get
the log message. But I think there might be more serious variants of
this. If the SLRU segment exists but the offset for multixid 1 hasn't
been set yet, find_multixact_start() will succeed and report offset 0
instead of failing, and we will proceed with the truncation based on the
incorrect oldestOffset==0 value, possibly removing SLRU segments that are
still needed.
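
The hazard in scenario 2 can be shown with a toy model. This is only a sketch under assumptions: the array stands in for a single offsets page, toy_find_start() and toy_find_start_checked() are hypothetical names, and treating a zero offset as "not yet set" is an assumption made for this illustration, not a description of how the attached patch works.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

/* Toy stand-in for one offsets page: entry i holds the start offset of multixid i. */
#define TOY_ENTRIES 8

/*
 * Naive lookup: succeeds whenever the "page" exists, even if the slot for
 * this multixid is still zero because nothing has been recorded there yet.
 */
bool
toy_find_start(const MultiXactOffset *page, MultiXactId multi,
               MultiXactOffset *result)
{
    if (page == NULL || multi >= TOY_ENTRIES)
        return false;           /* models "segment does not exist" */

    *result = page[multi];
    return true;
}

/*
 * Stricter variant: additionally reports failure when the stored offset is
 * 0, which this sketch assumes to mean "not yet set".
 */
bool
toy_find_start_checked(const MultiXactOffset *page, MultiXactId multi,
                       MultiXactOffset *result)
{
    MultiXactOffset off;

    if (!toy_find_start(page, multi, &off) || off == 0)
        return false;

    *result = off;
    return true;
}
```

On a zero-filled page the naive lookup for multixid 1 succeeds with offset 0, which a caller would then use as a truncation point; the checked variant fails instead, so the caller can skip truncation.
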
Attached is a fix for scenarios 1 and 2, and a test case for scenario 1.
Scenario 3: I also noticed that the above code isn't prepared for the
race condition where the offset corresponding to 'oldestMulti' hasn't
been stored in the SLRU yet, even without wraparound. That could
theoretically happen if the backend executing
MultiXactIdCreateFromMembers() gets stuck for a long time between the
calls to GetNewMultiXactId() and RecordNewMultiXact(), but I think we're
saved by the fact that we only create new multixids while holding a lock
on a heap page, and a system-wide VACUUM FREEZE that would advance
oldestMulti would need to lock the heap page too. It's scary though,
because it could also lead to truncating away members SLRU segments that
are still needed. The attached patch does *not* address this scenario.
[1] https://www.postgresql.org/message-id/CACG%3DezaApSMTjd%3DM2Sfn5Ucuggd3FG8Z8Qte8Xq9k5-%2BRQis-g@mail.gmail.com
- Heikki