Обсуждение: BUG #5011: Standby recovery unable to follow timeline change

Поиск
Список
Период
Сортировка

BUG #5011: Standby recovery unable to follow timeline change

От
"James Bardin"
Дата:
The following bug has been logged online:

Bug reference:      5011
Logged by:          James Bardin
Email address:      jbardin@bu.edu
PostgreSQL version: 8.4.0-1
Operating system:   Centos 5.3
Description:        Standby recovery unable to follow timeline change
Details:

This is another use case that fails with what looks like the same issue as
BUG #4796.
http://archives.postgresql.org/pgsql-bugs/2009-05/msg00060.php
(Sorry if this bug is redundant, I couldn't find any way to contribute to
that thread directly)

I'm working on a system where the master and standby servers are expected to
be able to swap roles repeatedly. The first failover works fine, but the
ex-master, now standby, can't recover using the shipped logs.

Using recovery_target_timeline='latest' finds the new history file, and
pg_standby looks good until recovery is attempted. Then we log errors like:

LOG:  unexpected timeline ID 0 in log file 0, segment 1, offset 0
LOG:  invalid primary checkpoint record

and any changes made after the first failover are lost.

Is this currently possible, or do I have to send a full file-level backup to
sync the ex-master server with the new master?

Re: BUG #5011: Standby recovery unable to follow timeline change

От
"Kevin Grittner"
Дата:
"James Bardin" <jbardin@bu.edu> wrote:

> Is this currently possible, or do I have to send a full file-level
> backup to sync the ex-master server with the new master?

I believe you have to get a new base backup from the new master to the
new standby.  Consider rsync, which might do it *really* fast if not
much has changed yet.

-Kevin

Re: BUG #5011: Standby recovery unable to follow timeline change

От
Heikki Linnakangas
Дата:
James Bardin wrote:
> I'm working on a system where the master and standby servers are expected to
> be able to swap roles repeatedly. The first failover works fine, but the
> ex-master, now standby, can't recover using the shipped logs.
>
> Using recovery_target_timeline='latest' finds the new history file, and
> pg_standby looks good until recovery is attempted. Then we log errors like:
>
> LOG:  unexpected timeline ID 0 in log file 0, segment 1, offset 0
> LOG:  invalid primary checkpoint record
>
> and any changes made after the first failover are lost.
>
> Is this currently possible, or do I have to send a full file-level backup to
> sync the ex-master server with the new master?

That should work. (Note that you do need to restore the ex-master from
the old base backup; you can't just copy recovery.conf to the old
master's data directory.)

I can reproduce that, it's clearly a bug. Thanks for the report!

Our last-minute changes in 8.4 to allow checkpoint record to be created,
while forbidding other WAL insertions, missed that CreateCheckPoint()
calls AdvanceXLInsertBuffer() which requires a valid ThisTimeLineID to
be set. We need to initialize ThisTimeLineID before we call
AdvanceXLInsertBuffer().

I wonder if we should add an XLogInsertAllowed() cross-check to
AdvanceXLInsertBuffer() to catch this kind of bugs in the future.
Writing a new empty WAL page is more or less the same thing as writing a
new WAL record. OTOH, all other AdvanceXLInsertBuffer() calls are from
XLogInsert(), which already checks that, and those calls are in quite
performance-critical paths.

Attached is a straightforward fix which initializes ThisTimeLineID
before the AdvanceXLInsertBuffer() call. Barring objections, I'll commit
 that.

BTW, I'm not sure if the AdvanceXLInsertBuffer() call is really
necessary there. It's just to round up the redo-pointer in the
checkpoint record to where the next WAL record will be, but ISTM the end
location of the last record would work just as well.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cc6be16..88dc987 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6455,6 +6455,8 @@ CreateCheckPoint(int flags)
     freespace = INSERT_FREESPACE(Insert);
     if (freespace < SizeOfXLogRecord)
     {
+        /* AdvanceXLInsertBuffer() needs a valid ThisTimeLineID */
+        InitXLOGAccess();
         (void) AdvanceXLInsertBuffer(false);
         /* OK to ignore update return flag, since we will do flush anyway */
         freespace = INSERT_FREESPACE(Insert);

Re: BUG #5011: Standby recovery unable to follow timeline change

От
Tom Lane
Дата:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Our last-minute changes in 8.4 to allow checkpoint record to be created,
> while forbidding other WAL insertions, missed that CreateCheckPoint()
> calls AdvanceXLInsertBuffer() which requires a valid ThisTimeLineID to
> be set. We need to initialize ThisTimeLineID before we call
> AdvanceXLInsertBuffer().

Ah-hah ...

> Attached is a straightforward fix which initializes ThisTimeLineID
> before the AdvanceXLInsertBuffer() call. Barring objections, I'll commit
>  that.

... but this solution is astonishingly ugly.  I think we should move up
the LocalSetXLogInsertAllowed call, instead.

            regards, tom lane

Re: BUG #5011: Standby recovery unable to follow timeline change

От
james bardin
Дата:
On Wed, Aug 26, 2009 at 4:40 AM, Heikki
Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:
>> Is this currently possible, or do I have to send a full file-level backup to
>> sync the ex-master server with the new master?
>
> That should work. (Note that you do need to restore the ex-master from
> the old base backup; you can't just copy recovery.conf to the old
> master's data directory.)

I'm relying on an rsync for the data directories after the recovery to
bring up the ex-master.
This works fine, but I'd rather be able to simply rely on wall
shipping to keep them synced without the extra backup procedures.
Thanks for looking into this.

Is this the type of fix that would make it into the next 8.4.x release?

-jim

Re: BUG #5011: Standby recovery unable to follow timeline change

От
Tom Lane
Дата:
I wrote:
> ... but this solution is astonishingly ugly.  I think we should move up
> the LocalSetXLogInsertAllowed call, instead.

Specifically, I propose this patch instead.  I'm not set up to try the
test case though, can you do that?

            regards, tom lane

Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.348
diff -c -r1.348 xlog.c
*** src/backend/access/transam/xlog.c    12 Aug 2009 20:53:30 -0000    1.348
--- src/backend/access/transam/xlog.c    26 Aug 2009 14:40:28 -0000
***************
*** 6445,6450 ****
--- 6445,6461 ----
      }

      /*
+      * An end-of-recovery checkpoint is created before anyone is allowed to
+      * write WAL. To allow us to write the checkpoint record, temporarily
+      * enable XLogInsertAllowed.  (This also ensures ThisTimeLineID is
+      * initialized, which we need here and in AdvanceXLInsertBuffer.)
+      */
+     if (flags & CHECKPOINT_END_OF_RECOVERY)
+         LocalSetXLogInsertAllowed();
+
+     checkPoint.ThisTimeLineID = ThisTimeLineID;
+
+     /*
       * Compute new REDO record ptr = location of next XLOG record.
       *
       * NB: this is NOT necessarily where the checkpoint record itself will be,
***************
*** 6567,6586 ****
      START_CRIT_SECTION();

      /*
-      * An end-of-recovery checkpoint is created before anyone is allowed to
-      * write WAL. To allow us to write the checkpoint record, temporarily
-      * enable XLogInsertAllowed.
-      */
-     if (flags & CHECKPOINT_END_OF_RECOVERY)
-         LocalSetXLogInsertAllowed();
-
-     /*
-      * This needs to be done after LocalSetXLogInsertAllowed(), else
-      * ThisTimeLineID might still be uninitialized.
-      */
-     checkPoint.ThisTimeLineID = ThisTimeLineID;
-
-     /*
       * Now insert the checkpoint record into XLOG.
       */
      rdata.data = (char *) (&checkPoint);
--- 6578,6583 ----

Re: BUG #5011: Standby recovery unable to follow timeline change

От
Heikki Linnakangas
Дата:
Tom Lane wrote:
> I wrote:
>> ... but this solution is astonishingly ugly.  I think we should move up
>> the LocalSetXLogInsertAllowed call, instead.
>
> Specifically, I propose this patch instead.

It looks better, but leaves the door open for WAL insertions for a much
longer period. Particularly, there's the call to CheckpointGuts(), which
does a lot of things. Maybe I'm just too paranoid about keeping that
sanity check as tight as possible...

> I'm not set up to try the test case though, can you do that?

Sure.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: BUG #5011: Standby recovery unable to follow timeline change

От
Heikki Linnakangas
Дата:
james bardin wrote:
> Is this the type of fix that would make it into the next 8.4.x release?

Yes. this fix will go into the next 8.4.x release.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: BUG #5011: Standby recovery unable to follow timeline change

От
Tom Lane
Дата:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> Specifically, I propose this patch instead.

> It looks better, but leaves the door open for WAL insertions for a much
> longer period. Particularly, there's the call to CheckpointGuts(), which
> does a lot of things. Maybe I'm just too paranoid about keeping that
> sanity check as tight as possible...

Well, I'd prefer to go through the LocalSetXLogInsertAllowed/
reset LocalXLogInsertAllowed dance twice rather than have this code
calling InitXLOGAccess directly (and unconditionally, which was
even worse IMHO).  But I don't actually see anything wrong with
having CheckpointGuts enabled to write WAL.  I could even see that
being *necessary* in some future iteration of the system --- who's
to say that a checkpoint involves adding only one WAL entry?

            regards, tom lane

Re: BUG #5011: Standby recovery unable to follow timeline change

От
Heikki Linnakangas
Дата:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> Specifically, I propose this patch instead.
>
>> It looks better, but leaves the door open for WAL insertions for a much
>> longer period. Particularly, there's the call to CheckpointGuts(), which
>> does a lot of things. Maybe I'm just too paranoid about keeping that
>> sanity check as tight as possible...
>
> Well, I'd prefer to go through the LocalSetXLogInsertAllowed/
> reset LocalXLogInsertAllowed dance twice rather than have this code
> calling InitXLOGAccess directly (and unconditionally, which was
> even worse IMHO).  But I don't actually see anything wrong with
> having CheckpointGuts enabled to write WAL.  I could even see that
> being *necessary* in some future iteration of the system --- who's
> to say that a checkpoint involves adding only one WAL entry?

Yeah. maybe it's OK. There isn't anything strictly wrong about writing
WAL records at that time - we already allowed it for the rmgr cleanup
routines. CheckpointGuts is shared by recovery points, though, so any
WAL insertions in there would be conditional (like "if (in recovery)
xloginsert()").

I'll commit your patch then.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com