Andres Freund wrote:
> On 2013-12-09 16:00:32 -0300, Alvaro Herrera wrote:
> > As a note, the SlruScanDirectory code has a flaw because it only looks
> > at four-digit files; the reason only files up to 0xFFFF are missing and
> > not the following ones is because those got ignored. This needs a fix
> > as well.
>
> While I agree it's a bug, I don't think it's relevant for the case at
> hand. For offsets there's no following page (or exactly 1, not sure
> about the math offhand), and we only use SlruScanDirectory() for
> offsets not for members.
Sure we do for members, through SimpleLruTruncate which calls
SlruScanDirectory underneath.
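
To illustrate the four-digit problem, here is a rough model in Python (not
the actual slru.c code). It assumes segment files are named with a "%04X"
format and that the directory scan only accepts names of exactly four
uppercase hex digits; the function names are made up for the example.

```python
HEX_DIGITS = "0123456789ABCDEF"


def segment_file_name(segno):
    """Name a segment file; assumed to mirror a "%04X" format."""
    return "%04X" % segno


def four_digit_match(name):
    """Model of the flawed filter: only exactly four hex digits pass."""
    return len(name) == 4 and all(c in HEX_DIGITS for c in name)


def visible_segments(names):
    """Segment numbers that a four-digit-only scan would ever consider."""
    return [int(n, 16) for n in names if four_digit_match(n)]


# Segments past 0xFFFF get five-digit names, so the scan never sees them
# and truncation can never remove them.
names = [segment_file_name(s) for s in (0x0000, 0xFFFF, 0x10000, 0x14078)]
seen = visible_segments(names)
```

With these inputs the scan sees only segments 0x0000 and 0xFFFF; the two
five-digit files are silently skipped, which matches the symptom that only
files up to 0xFFFF went missing.
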
> > > I've recently remarked that I find it dangerous that we only do
> > > anti-wraparound stuff for pg_multixact/offsets, not for /members. So,
> > > here we have the proof that that's bad.
> >
> > It's hard to see how to add this post-facto, though. I mean, I am
> > thinking we would need some additional pg_control info etc. We'd better
> > figure out a way to add such controls without having to add that.
>
> Couldn't we just get the oldest multi, check where in offsets it points
> to, and compare that with nextOffset? That should be doable without
> additional data.
Hmm, that seems a sensible approach ...
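
Something along these lines, perhaps. This is only a sketch of the idea in
Python, assuming member offsets form a 32-bit counter that wraps around; the
dict standing in for pg_multixact/offsets and all the names are hypothetical,
and details of the real on-disk layout are ignored.

```python
OFFSET_SPACE = 2 ** 32  # assumption: member offsets are 32-bit and wrap


def members_in_use(oldest_offset, next_offset):
    """Member slots between the oldest live offset and nextOffset, modulo wrap."""
    return (next_offset - oldest_offset) % OFFSET_SPACE


def would_overrun(offsets, oldest_multi, next_offset, nmembers):
    """True if allocating nmembers at next_offset would reach the members
    still referenced by the oldest multixact.

    offsets is a toy stand-in for the offsets SLRU: multixact id -> offset
    of its first member.
    """
    oldest_offset = offsets[oldest_multi]
    used = members_in_use(oldest_offset, next_offset)
    # Keep strictly below the full space so "empty" and "full" stay distinct.
    return used + nmembers >= OFFSET_SPACE


# nextOffset has already wrapped and sits 100 slots behind the oldest
# multi's members.
offsets = {42: 200}
ok = would_overrun(offsets, 42, 100, 50)    # 50 slots still fit
bad = would_overrun(offsets, 42, 100, 100)  # would reach the oldest members
```

The point is that nothing beyond the oldest multi and nextOffset is needed,
so no new pg_control fields would be required for the check itself.
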
> > > I think problems should be preventable if you issue a systemwide VACUUM
> > > FREEZE, but please let others chime in before you execute it.
> >
> > I wouldn't freeze anything just yet, at least until the patch to fix
> > multixact freezing is in.
>
> Well, it seems better than getting errors because of multixact members
> that are gone.
> Maybe PGOPTIONS='-c vacuum_freeze_table_age=0 -c vacuum_freeze_min_age=1000000' vacuumdb -a
> - that ought not to cause problems with current data and should freeze
> enough to get rid of problematic multis?
TBH I don't feel comfortable predicting what it will freeze with
the broken code.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services