Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

From: Bruce Momjian
Subject: Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts
Msg-id: 20140620171738.GB29143@momjian.us
In reply to: Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List: pgsql-bugs

On Thu, Jun 19, 2014 at 06:04:25PM -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
>
> > I wasn't happy with having that delete code added there when we do
> > directory delete in the function above.  I instead broke apart the
> > delete and copy code and called the delete code where needed, in the
> > attached patch.
>
> Makes sense, yeah.  I didn't look closely enough to realize that the
> function that does the copying also does the rmtree part.

OK.  Should I apply my patch so at least pg_upgrade is good going
forward?

> I also now realize why the other case (upgrade from 9.3 to 9.4) does not
> have a bug: we are already deleting the files in that path.

Right, and I think the patch makes it clearer why we need those 'rm'
function calls because they mirror the 'copy' ones.
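
To illustrate the "rm mirrors copy" structure described above, here is a minimal sketch in Python. It is not the pg_upgrade patch (which is C); the function names, the directory names used in the comments, and the choice to remove a whole directory are assumptions made only for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def remove_subdir(new_cluster: Path, subdir: str) -> None:
    """Delete new_cluster/subdir if it exists (the 'rm' half)."""
    shutil.rmtree(new_cluster / subdir, ignore_errors=True)


def copy_subdir(old_cluster: Path, new_cluster: Path, subdir: str) -> None:
    """Replace new_cluster/subdir with a copy of old_cluster/subdir."""
    # The delete is a separate helper, so it can also be called on its own.
    remove_subdir(new_cluster, subdir)
    shutil.copytree(old_cluster / subdir, new_cluster / subdir)


# Hypothetical usage: copy where the on-disk format is compatible,
# and only remove where the old files must not be carried over.
#   copy_subdir(old, new, "pg_clog")
#   remove_subdir(new, "pg_multixact/offsets")
```

With the delete split out, every code path that skips a copy can still call the matching remove, which makes a missing cleanup step easier to spot.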

> > OK, so the xid has to be beyond 2^31 during pg_upgrade to trigger a
> > problem?  That might explain the rare reporting of this bug.  What would
> > the test query look like so we can tell people when to remove the '0000'
> > files?  Would we need to see the existence of '0000' and high-numbered
> > files?  How high?  What does a 2^31 file look like?
>
> I misspoke.
>
> I ran a few more upgrades, and then tried vacuuming all databases, which
> is when the truncate code is run.  Say the original cluster had an
> oldestmulti of 10 million.  If you just run VACUUM in the new cluster
> after the upgrade, the 0000 file is not deleted: it's not yet old enough
> in terms of multixact age.  An error is not thrown, because we're still
> not attempting a truncate.  But if you lower the
> vacuum_multixact_freeze_table_age to 10 million minus one, then we will
> try the deletion and that will raise the error.
>
> I think (didn't actually try) if you just let 150 million multixacts be
> generated, that's the first time you will get the error.
>
> Now if you run a VACUUM FREEZE after the upgrade, the file will be
> deleted with no error.
>
> I now think that the reason most people haven't hit the problem is that
> they don't generate enough multis after upgrading a database that had
> enough multis in the old database.  This seems a bit curious

OK, that does make more sense.  A user would need to have advanced
past 0000 to the point where it was removed from their old cluster,
and _then_ run far enough past the freeze horizon to again require file
removal.  This explains why we are seeing the bug only now, and why a
quick minor release with a query to fix this will get us out of the
problem with minimal impact.
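
For anyone wondering which pg_multixact/offsets file a given multixact value falls into, the following rough sketch does the arithmetic. It assumes a default build (8 kB pages, 4-byte offset entries, 32 pages per segment file, four-digit hex file names); the function name is made up for this example.

```python
# Constants assumed from a default PostgreSQL build.
BLCKSZ = 8192            # page size in bytes
OFFSET_SIZE = 4          # each offsets entry is a 32-bit value
PAGES_PER_SEGMENT = 32   # SLRU pages per segment file

MULTIS_PER_SEGMENT = (BLCKSZ // OFFSET_SIZE) * PAGES_PER_SEGMENT  # 65536


def offsets_segment_name(multi: int) -> str:
    """Return the offsets segment file name expected to hold `multi`."""
    return format(multi // MULTIS_PER_SEGMENT, "04X")


for multi in (1, 10_000_000, 2**31):
    print(multi, offsets_segment_name(multi))
# 1 -> 0000, 10000000 -> 0098, 2147483648 -> 8000
```

Under these assumptions, an old cluster whose oldest multixact is around 10 million would have its oldest offsets file near 0098, so a 0000 file left in the new cluster is far behind the point where truncation expects to start.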

> > Also, is there a reason you didn't remove the 'members/0000' file in your
> > patch?  I have removed it in my version.
>
> There's no point.  That file is the starting point for new multis
> anyway, and it's compatible with the new format (because it's all
> zeroes).

I think it should be done for consistency with the 'copy' function calls
above.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
