Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

From Alvaro Herrera
Subject Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts
Date
Msg-id 20140618225131.GN18688@eldon.alvh.no-ip.org
In response to Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts  (Bruce Momjian <bruce@momjian.us>)
Responses Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts  (Bruce Momjian <bruce@momjian.us>)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts  (Bruce Momjian <bruce@momjian.us>)
List pgsql-bugs
Bruce Momjian wrote:
> On Fri, May 30, 2014 at 02:16:31PM +0200, Andres Freund wrote:
> > Hi,
> >
> > When upgrading a < 9.3 cluster pg_upgrade doesn't bother to keep the old
> > multixacts around because they won't be read after the upgrade (and
> > aren't compatible). It just resets the new cluster's nextMulti to the
> > old + 1.
> > Unfortunately that means that there'll be an offsets/0000 file created by
> > initdb around. Sounds harmless enough, but that'll actually cause
> > problems if the old cluster had a nextMulti that's bigger than that
> > page.

I've been playing with this a bit.  Here's a patch that just does
rmtree() of the problematic file during pg_upgrade, as proposed by
Andres, which solves the problem.  Note that this patch removes the 0000
file in both cases: when upgrading from 9.2, and when upgrading from
9.3+.  The former case is the bug that Andres reported.  In the second
case, we overwrite the files with the ones from the old cluster; if
there's a lingering 0000 file in the new cluster, it would cause
problems as well.  (Really, I don't see any reason to think that these
two cases are any different.)
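
As a rough illustration (not part of the patch), here is how a multixact
ID maps to an offsets segment file.  The constants are assumptions for a
default build with 8 kB blocks, 4-byte offsets and 32 pages per segment;
the helper names are made up for this sketch.  Under those assumptions,
the 0000 file covers multis 0 through 65535, so any old cluster whose
nextMulti lies beyond that range will not be using the file that initdb
left behind.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed values for a default build; not taken from the patch. */
#define BLCKSZ              8192
#define OFFSETS_PER_PAGE    (BLCKSZ / sizeof(uint32_t))     /* 2048 */
#define PAGES_PER_SEGMENT   32
#define OFFSETS_PER_SEGMENT (OFFSETS_PER_PAGE * PAGES_PER_SEGMENT)

/* Segment number of the offsets file that holds this multixact ID. */
static uint32_t
offsets_segment(uint32_t multi)
{
    return (uint32_t) (multi / OFFSETS_PER_SEGMENT);
}

/* Format the segment number as a four-hex-digit file name, e.g. "0000". */
static void
offsets_segment_name(uint32_t multi, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%04X", (unsigned) offsets_segment(multi));
}
```

For example, with these assumed constants a nextMulti of 70000 lands in
segment 0001, while the file initdb created is 0000.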

> This is a bug in 9.3 pg_upgrade as well?  Why has no one reported it
> before?

I think one reason is that not all upgrades see an issue here; for old
clusters that haven't gone beyond the 0000 offset file, there is no
problem.  For clusters that have gone beyond 0000 but not by much, the
file will be deleted during the first truncation.  It only becomes a
problem if the cluster is close enough to 2^31.  Another thing to keep
in mind is that initdb initializes all databases' datminmxid to
1.  If the old cluster was past the 2^31 point, it means the datminmxid
doesn't move from 1 until the actual wraparound.
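
To make the 2^31 remark concrete, here is a minimal sketch of the
wraparound-aware comparison as I understand it; the function name is
hypothetical and this is an illustration, not the server code.  Two IDs
are compared by the sign of their 32-bit difference, so a value of 1 no
longer looks "older" than an ID that is more than 2^31 ahead of it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if multixact ID a is logically older than b, treating the ID
 * space as circular (modulo 2^32).  The unsigned-to-signed conversion
 * relies on the usual two's-complement behavior.
 */
static bool
multi_precedes(uint32_t a, uint32_t b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}
```

With this comparison, multi_precedes(1, 100000) is true, but
multi_precedes(1, 0x80000000u + 10) is false: seen from a cluster that
is past 2^31, multi 1 appears to be in the future rather than in the
past.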


I noticed another problem while trying to reproduce it.  It only happens
in assert-enabled builds: when FreezeMultiXactId() sees an old multi, it
asserts that it mustn't be running anymore, which is fine except that if
the multi value is one that survived a pg_upgrade cycle from 9.2 or
older (i.e. it was a share-locked tuple in the old cluster), an error is
raised when the assertion is checked.  Since it doesn't make sense to
examine its running status at that point, the attached patch simply
disables the assertion in that case.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
