Thread: pg_internal.init is hazardous to your health

pg_internal.init is hazardous to your health

From: Tom Lane
Date:
Dirk Lutzebaeck and I just spent a tense couple of hours trying to
figure out why a large database Down Under wasn't coming up after being
reloaded from a base backup plus PITR recovery.  The symptoms were that
the recovery went fine, but backend processes would fail at startup or
soon after with "could not open relation XX/XX/XX: No such file" type of
errors.

The answer that ultimately emerged was that they'd been running a
nightly maintenance script that did REINDEX SYSTEM (among other things
I suppose).  The PITR base backup included pg_internal.init files that
were appropriate when it was taken, and the PITR recovery process did
nothing whatsoever to update 'em :-(.  So incoming backends picked up
init files with obsolete relfilenode values.

We don't actually need to *update* the file, per se, we only need to
remove it if no longer valid --- the next incoming backend will rebuild
it.  I could see fixing this by making WAL recovery run around and zap
all the .init files (only problem is to find 'em), or we could add a new
kind of WAL record saying "remove the .init file for database XYZ"
to be emitted whenever someone removes the active one.  Thoughts?

Meanwhile, if you're trying to recover from a PITR backup and it's not
working, try removing any pg_internal.init files you can find.
        regards, tom lane
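
The workaround above (find and remove every pg_internal.init) can be sketched in Python. This is a minimal sketch, not a PostgreSQL tool: the function names, the `dry_run` flag, and the `follow_links` flag are all hypothetical, and it assumes the server is stopped and that you pass in your own data directory path.

```python
import os
from pathlib import Path

INIT_FILE_NAME = "pg_internal.init"


def find_init_files(data_dir, follow_links=False):
    """Return the sorted paths of every pg_internal.init under data_dir.

    os.walk does not follow symbolic links unless asked to, so directories
    reachable only through a symlink are skipped when follow_links is False.
    """
    found = []
    for root, _dirs, files in os.walk(data_dir, followlinks=follow_links):
        if INIT_FILE_NAME in files:
            found.append(Path(root) / INIT_FILE_NAME)
    return sorted(found)


def remove_init_files(data_dir, dry_run=True, follow_links=False):
    """List the init files and delete them only when dry_run is False.

    Assumption of this sketch: the server is not running, so no backend
    is reading or rebuilding the files while they are being removed.
    """
    paths = find_init_files(data_dir, follow_links=follow_links)
    if not dry_run:
        for path in paths:
            path.unlink()
    return paths
```

Calling `remove_init_files(data_dir)` with the default `dry_run=True` only reports what would be deleted, which seems a sensible first step on a cluster you are trying to rescue. As noted in the message, a removed file is rebuilt by the next incoming backend.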


Re: pg_internal.init is hazardous to your health

From: Gavin Sherry
Date:
On Tue, 17 Oct 2006, Tom Lane wrote:

> Dirk Lutzebaeck and I just spent a tense couple of hours trying to
> figure out why a large database Down Under wasn't coming up after being
> reloaded from a base backup plus PITR recovery.  The symptoms were that
> the recovery went fine, but backend processes would fail at startup or
> soon after with "could not open relation XX/XX/XX: No such file" type of
> errors.
>
> The answer that ultimately emerged was that they'd been running a
> nightly maintenance script that did REINDEX SYSTEM (among other things
> I suppose).  The PITR base backup included pg_internal.init files that
> were appropriate when it was taken, and the PITR recovery process did
> nothing whatsoever to update 'em :-(.  So incoming backends picked up
> init files with obsolete relfilenode values.

Ouch.

> We don't actually need to *update* the file, per se, we only need to
> remove it if no longer valid --- the next incoming backend will rebuild
> it.  I could see fixing this by making WAL recovery run around and zap
> all the .init files (only problem is to find 'em), or we could add a new
> kind of WAL record saying "remove the .init file for database XYZ"
> to be emitted whenever someone removes the active one.  Thoughts?

The latter seems the Right Way except, I guess, that the decision to
remove the file is buried deep inside inval.c.

Thanks,

Gavin


Re: pg_internal.init is hazardous to your health

From: "Simon Riggs"
Date:
On Tue, 2006-10-17 at 22:29 -0400, Tom Lane wrote:
> Dirk Lutzebaeck and I just spent a tense couple of hours trying to
> figure out why a large database Down Under wasn't coming up after being
> reloaded from a base backup plus PITR recovery.  The symptoms were that
> the recovery went fine, but backend processes would fail at startup or
> soon after with "could not open relation XX/XX/XX: No such file" type of
> errors.

Understand the tension...

> The answer that ultimately emerged was that they'd been running a
> nightly maintenance script that did REINDEX SYSTEM (among other things
> I suppose).  The PITR base backup included pg_internal.init files that
> were appropriate when it was taken, and the PITR recovery process did
> nothing whatsoever to update 'em :-(.  So incoming backends picked up
> init files with obsolete relfilenode values.

OK, I'm looking at this now for later discussion.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: pg_internal.init is hazardous to your health

From: "Simon Riggs"
Date:
On Wed, 2006-10-18 at 12:49 +1000, Gavin Sherry wrote:

> > We don't actually need to *update* the file, per se, we only need to
> > remove it if no longer valid --- the next incoming backend will rebuild
> > it.  I could see fixing this by making WAL recovery run around and zap
> > all the .init files (only problem is to find 'em), or we could add a new
> > kind of WAL record saying "remove the .init file for database XYZ"
> > to be emitted whenever someone removes the active one.  Thoughts?

Yes, that assessment seems good.

> The latter seems the Right Way except, I guess, that the decision to
> remove the file is buried deep inside inval.c.

I'd prefer the zap everything approach, but emitting a WAL record looks
mostly straightforward and just as good.

RelationCacheInitFileInvalidate() can easily emit a WAL record. It is
called twice in succession, so we would emit WAL on the
RelationCacheInitFileInvalidate(true) call only. I'll work out a patch
for that, with a new record type named something like
XLOG_XACT_RELCACHE_INVALIDATE.

RelationCacheInitFileInvalidate() is also called on each
FinishPreparedTransaction(). If that is called 100% of the time, then we
can skip writing an additional record for prepared transactions by
triggering the removal of pg_internal.init when we see a
XLOG_XACT_COMMIT_PREPARED during replay. 
Not sure whether we need to do that, Heikki? Anyone?
I'm guessing no, but it seems sensible to check.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: pg_internal.init is hazardous to your health

From: Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> RelationCacheInitFileInvalidate() is also called on each
> FinishPreparedTransaction().

Surely not...
        regards, tom lane


Re: pg_internal.init is hazardous to your health

From: "Simon Riggs"
Date:
On Wed, 2006-10-18 at 13:24 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > RelationCacheInitFileInvalidate() is also called on each
> > FinishPreparedTransaction().
> 
> Surely not...

I take that to mean there's nothing special about prepared transactions
and invalidating the rel cache, so we *do* need to have a separate WAL
record in all cases.

OK, I'll write up a patch later today (working in the US for a few days).

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: pg_internal.init is hazardous to your health

From: "Heikki Linnakangas"
Date:
Simon Riggs wrote:
> RelationCacheInitFileInvalidate() is also called on each
> FinishPreparedTransaction(). 

It's only called if the prepared transaction invalidated the init file.

> If that is called 100% of the time, then we
> can skip writing an additional record for prepared transactions by
> triggering the removal of pg_internal.init when we see a
> XLOG_XACT_COMMIT_PREPARED during replay. 
> Not sure whether we need to do that, Heikki? Anyone?
> I'm guessing no, but it seems sensible to check.

If you write the WAL record in RelationCacheInitFileInvalidate(true), 
that's enough. No extra handling for prepared transactions is needed.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com
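
The WAL-record approach discussed in the messages above can be illustrated with a toy model in Python. This is only a sketch of the idea, not PostgreSQL code: `ToyCluster`, `invalidate_init_file`, `replay`, and the `"RELCACHE_INVALIDATE"` record tag are hypothetical names, and the model assumes that both of the two successive invalidation calls remove the file while only the first one writes a record.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Stand-in for a cluster: which databases currently have an init file."""
    init_files: set = field(default_factory=set)
    wal: list = field(default_factory=list)

    def invalidate_init_file(self, db, before_send):
        # Assumption of this sketch: each of the two calls removes the file.
        self.init_files.discard(db)
        if before_send:
            # Log only on the first call, so that one invalidation
            # produces exactly one record in the toy WAL.
            self.wal.append(("RELCACHE_INVALIDATE", db))


def replay(wal, restored_init_files):
    """Apply toy WAL records to the init files found in a restored backup."""
    remaining = set(restored_init_files)
    for record_type, db in wal:
        if record_type == "RELCACHE_INVALIDATE":
            # The stale file is dropped; a later backend would rebuild it.
            remaining.discard(db)
    return remaining
```

For example, if a base backup captures init files for `db1` and `db2`, and the primary then invalidates the `db1` file (one call with `before_send=True`, one with `False`), replaying the toy WAL over the backup leaves only the `db2` file. Without the record, the restored copy for `db1` would survive recovery, which is the failure described at the start of the thread.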


Re: pg_internal.init is hazardous to your health

From: "Simon Riggs"
Date:
On Wed, 2006-10-18 at 15:56 +0100, Simon Riggs wrote:
> On Tue, 2006-10-17 at 22:29 -0400, Tom Lane wrote:
> > The answer that ultimately emerged was that they'd been running a
> > nightly maintenance script that did REINDEX SYSTEM (among other things
> > I suppose).  The PITR base backup included pg_internal.init files that
> > were appropriate when it was taken, and the PITR recovery process did
> > nothing whatsoever to update 'em :-(.  So incoming backends picked up
> > init files with obsolete relfilenode values.
> 
> OK, I'm looking at this now for later discussion.

I've coded a patch and am just testing now.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com