On Thu, Oct 28, 2021 at 3:48 PM Andres Freund <andres@anarazel.de> wrote:
> That wouldn't protect against e.g. a logic bug in ZFS.
> Not saying that that is the most likely explanation, just something worth
> checking.
True. It's too early to rule that out. Though note that a full
pg_amcheck of the database mostly didn't complain about anything -- it
was just a handful of indexes, associated with just 2 tables. And this
is mediawiki, which has lots of tables. None of the new heapam
verification functionality found any problems (as with the older
index-matches-table heapallindexed stuff).
> Didn't 14 change the logic when index vacuums are done? That could cause
> previously existing issues to manifest with a higher likelihood.
I don't follow. The new logic that skips index vacuuming kicks in 1)
in an anti-wraparound vacuum emergency, and 2) when there are very few
LP_DEAD line pointers in the heap. We can rule 1 out, I think, because
the XIDs we see are in the low millions, and our starting point was a
database that was upgraded via a dump and reload.
The second criterion for skipping index vacuuming (the "less than 2% of
heap pages have any LP_DEAD items" thing) might well have been hit on
these tables -- it is after all very common. But I don't see how that
could matter. We're never going to get to a code path inside
vacuumlazy.c that sets LP_DEAD items from VACUUM's dead_tuples array
to LP_UNUSED (how could we reach such a code path without also index
vacuuming, given the way things are set up inside lazy_vacuum()?).
We're always going to have the opportunity to do index vacuuming with
any left-behind LP_DEAD line pointers in the next VACUUM -- right
after the later VACUUM successfully returns from
lazy_vacuum_all_indexes().
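
As a rough illustration of that argument, here is a toy Python model of
the decision made inside lazy_vacuum(). It is a sketch, not the actual
vacuumlazy.c code: the VacuumState class, its fields, and
lazy_vacuum_model() are hypothetical names, the 2% constant is taken from
the description above, and real-world details (such as the failsafe
triggering partway through index vacuuming) are deliberately left out.
The point is only that LP_DEAD items become LP_UNUSED on a path that has
already gone through index vacuuming.

```python
from dataclasses import dataclass

# The "less than 2% of heap pages have any LP_DEAD items" rule described above.
BYPASS_THRESHOLD_PAGES = 0.02


@dataclass
class VacuumState:
    """Hypothetical, heavily simplified stand-in for VACUUM's per-table state."""
    rel_pages: int                    # total heap pages in the table
    lpdead_item_pages: int            # heap pages with at least one LP_DEAD item
    failsafe_triggered: bool = False  # anti-wraparound emergency


def lazy_vacuum_model(state: VacuumState) -> tuple[bool, bool]:
    """Return (index_vacuumed, lp_dead_set_unused) for one VACUUM pass."""
    if state.lpdead_item_pages == 0:
        # Nothing to do in either the indexes or the heap.
        return (False, False)

    if state.failsafe_triggered:
        # Case 1: emergency. Skip the indexes AND leave LP_DEAD items alone.
        return (False, False)

    if state.lpdead_item_pages < state.rel_pages * BYPASS_THRESHOLD_PAGES:
        # Case 2: very few LP_DEAD items. Skip the indexes AND leave the
        # LP_DEAD items in place for the next VACUUM to deal with.
        return (False, False)

    # Normal path: index vacuuming happens first, and only afterwards are
    # the heap's LP_DEAD items set to LP_UNUSED.
    return (True, True)


if __name__ == "__main__":
    for pages in (0, 10, 50):
        print(pages, lazy_vacuum_model(VacuumState(1000, pages)))
```

In this model there is no return value of the form (False, True), which
is the combination that would leave index tuples pointing at reusable
line pointers.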
--
Peter Geoghegan