Re: PANIC during crash recovery of a recently promoted standby

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: PANIC during crash recovery of a recently promoted standby
Дата
Msg-id 20180607.195829.164100769.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: PANIC during crash recovery of a recently promoted standby  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: PANIC during crash recovery of a recently promoted standby  (Michael Paquier <michael@paquier.xyz>)
Re: PANIC during crash recovery of a recently promoted standby  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
Hello.

At Thu, 24 May 2018 16:57:07 +0900, Michael Paquier <michael@paquier.xyz> wrote in
<20180524075707.GE15445@paquier.xyz>
> On Mon, May 14, 2018 at 01:14:22PM +0530, Pavan Deolasee wrote:
> > Looks like I didn't understand Alvaro's comment when he mentioned it to me
> > off-list. But I now see what Michael and Alvaro mean and that indeed seems
> > like a problem. I was thinking that the test for (ControlFile->state ==
> > DB_IN_ARCHIVE_RECOVERY) will ensure that minRecoveryPoint can't be updated
> > after the standby is promoted. While that's true for a DB_IN_PRODUCTION,  the
> > RestartPoint may finish after we have written end-of-recovery record, but
> > before we're in production and thus the minRecoveryPoint may again be set.
> 
> Yeah, this has been something I considered as well first, but I was not
> confident enough that setting up minRecoveryPoint to InvalidXLogRecPtr
> was actually a safe thing for timeline switches.
> 
> So I have spent a good portion of today testing and playing with it to
> be confident enough that this was right, and I have finished with the
> attached.  The patch adds a new flag to XLogCtl which marks if the
> control file has been updated after the end-of-recovery record has been
> written, so as minRecoveryPoint does not get updated because of a
> restart point running in parallel.
> 
> I have also reworked the test case you sent, removing the manuals sleeps
> and replacing them with correct wait points.  There is also no point to
> wait after promotion as pg_ctl promote implies a wait.  Another
> important thing is that you need to use wal_log_hints = off to see a 
> crash, which is something that allows_streaming actually enables.
> 
> Comments are welcome.

As the result of some poking around, my dignosis is different
from yours.

(I believe that) By definition recovery doesn't end until the
end-of-recovery check point ends so from the viewpoint I think it
is wrong to clear ControlFile->minRecoveryPoint before the end.

Invalid-page checking during crash recovery is hamful rather than
useless. It is done by CheckRecoveryConsistency even in crash
recovery against its expectation because there's a case where
minRecoveryPoint is valid but InArchiveRecovery is false. The two
variable there seems to be in contradiction with each other.

The immediate cause of the contradition is that StartXLOG wrongly
initializes local minRecoveryPoint from control file's value even
under crash recovery. updateMinRecoveryPoint also should be
turned off during crash recovery. The case of crash after last
checkpoint end has been treated in UpdateMinRecoveryPoint but
forgot to consider this case, where crash recovery has been
started while control file is still holding valid
minRecoveryPoint.

Finally, the attached patch seems fixing the issue. It passes
015_promotion.pl and the problem doesn't happen with
014_promotion_bug.pl.  Also this passes the existing tests
002-014.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 192c36bc96135d9108c499d01016f7eb6fc28fd1 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 7 Jun 2018 16:02:41 +0900
Subject: [PATCH] Fix wrong behavior during crash recovery

After being killed before end-of-recovery checkpoint ends, server
restarts in crash recovery mode. However it can wrongly performs
invalid-page checks of WAL records, which is not expected during crash
recovery, then results in PANIC.

This patch prevents the wrong checking by suppressing needless updates
of minRecoveryPoint during crash recovery.
---
 src/backend/access/transam/xlog.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index adbd6a2126..283c26cb6c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -821,8 +821,11 @@ static XLogSource XLogReceiptSource = 0;    /* XLOG_FROM_* code */
 static XLogRecPtr ReadRecPtr;    /* start of last record read */
 static XLogRecPtr EndRecPtr;    /* end+1 of last record read */
 
-static XLogRecPtr minRecoveryPoint; /* local copy of
-                                     * ControlFile->minRecoveryPoint */
+/*
+ * local copy of ControlFile->minRecoveryPoint. InvalidXLogRecPtr means we're
+ * in crash recovery
+ */
+static XLogRecPtr minRecoveryPoint = InvalidXLogRecPtr;
 static TimeLineID minRecoveryPointTLI;
 static bool updateMinRecoveryPoint = true;
 
@@ -2717,14 +2720,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
     minRecoveryPoint = ControlFile->minRecoveryPoint;
     minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-    /*
-     * An invalid minRecoveryPoint means that we need to recover all the WAL,
-     * i.e., we're doing crash recovery.  We never modify the control file's
-     * value in that case, so we can short-circuit future checks here too.
-     */
-    if (minRecoveryPoint == 0)
-        updateMinRecoveryPoint = false;
-    else if (force || minRecoveryPoint < lsn)
+    if (force || minRecoveryPoint < lsn)
     {
         XLogRecPtr    newMinRecoveryPoint;
         TimeLineID    newMinRecoveryPointTLI;
@@ -6841,6 +6837,7 @@ StartupXLOG(void)
                                 ControlFile->checkPointCopy.ThisTimeLineID,
                                 recoveryTargetTLI)));
             ControlFile->state = DB_IN_CRASH_RECOVERY;
+            updateMinRecoveryPoint = false;
         }
         ControlFile->checkPoint = checkPointLoc;
         ControlFile->checkPointCopy = checkPoint;
@@ -6852,6 +6849,10 @@ StartupXLOG(void)
                 ControlFile->minRecoveryPoint = checkPoint.redo;
                 ControlFile->minRecoveryPointTLI = checkPoint.ThisTimeLineID;
             }
+
+            /* also initialize our local copies */
+            minRecoveryPoint = ControlFile->minRecoveryPoint;
+            minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
         }
 
         /*
@@ -6889,10 +6890,6 @@ StartupXLOG(void)
         /* No need to hold ControlFileLock yet, we aren't up far enough */
         UpdateControlFile();
 
-        /* initialize our local copy of minRecoveryPoint */
-        minRecoveryPoint = ControlFile->minRecoveryPoint;
-        minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
-
         /*
          * Reset pgstat data, because it may be invalid after recovery.
          */
@@ -7858,6 +7855,8 @@ CheckRecoveryConsistency(void)
     if (XLogRecPtrIsInvalid(minRecoveryPoint))
         return;
 
+    Assert(InArchiveRecovery);
+
     /*
      * assume that we are called in the startup process, and hence don't need
      * a lock to read lastReplayedEndRecPtr
-- 
2.16.3


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ashutosh Bapat
Дата:
Сообщение: Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
Следующее
От: Magnus Hagander
Дата:
Сообщение: Re: Typo in planner README