Re: Make mesage at end-of-recovery less scary.

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Make mesage at end-of-recovery less scary.
Дата
Msg-id 20211109.162751.1330179292938994347.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Make mesage at end-of-recovery less scary.  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы Re: Make mesage at end-of-recovery less scary.  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Список pgsql-hackers
Thank you for the comments!

At Tue, 9 Nov 2021 09:53:15 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Mon, Nov 08, 2021 at 02:59:46PM +0900, Kyotaro Horiguchi wrote:
> > While checking the behavior for the case of missing-contrecord, I
> > noticed that emode_for_corrupt_record() doesn't work as expected since
> > readSource is reset to XLOG_FROM_ANY after a read failure. We could
> > remember the last failed source but pg_wal should have been visited if
> > page read error happened so I changed the function so that it treats
> > XLOG_FROM_ANY the same way with XLOG_FROM_PG_WAL.
> 
> FWIW, I am not much a fan of assuming that it is fine to use
> XLOG_FROM_ANY as a condition here.  The comments on top of
> emode_for_corrupt_record() make it rather clear what the expectations
> are, and this is the default readSource.

The readSource is expected by the function to be the failed source but
it goes back to XLOG_FROM_ANY on page read failure.  So the function
*is* standing on the wrong assumption.  I noticed that currentSource
holds the last accessed source (but forgot about that). So it is
exactly what we need here.  No longer need to introduce the unclear
assumption by using it.

> > (Otherwise we see "LOG: reached end-of-WAL at .." message after
> >  "WARNING: missing contrecord at.." message.)
> 
> +      /* broken record found */
> +      ereport(WARNING,
> +                      (errmsg("redo is skipped"),
> +                       errhint("This suggests WAL file corruption. You might need to check the database.")));
> This looks rather scary to me, FWIW, and this could easily be reached

Yes, the message is intentionally scary, since we don't come here in
the case of clean WAL:)

> if one forgets about EndOfWAL while hacking on xlogreader.c.
> Unlikely so, still.

I don't understand.  Isn't it the case of almost every feature?

The patch compells hackers to maintain the condition for recovery
being considered cleanly ended.  If the last record doesn't meet the
condition, the WAL file should be considered having a
problem. However, I don't see the condition expanded to have another
term in future.

Even if someone including myself broke that condition, we will at
worst unwantedly see a scary message.  And I believe almost all
hackers can easily find it a bug from the DETAILED message shown along
aside.  I'm not sure such bugs could be found in development phase,
though..

> +       report_invalid_record(state,
> +                             "missing contrecord at %X/%X",
> +                             LSN_FORMAT_ARGS(RecPtr));
> Isn't there a risk here to break applications checking after error
> messages stored in the WAL reader after seeing a contrecord?

I'm not sure you are mentioning the case where no message is stored
previously, or the case where already a message is stored. The former
is fine as the record is actually broken. But I was missing the latter
case.  In this version I avoided to overwite the error message.

> +   if (record->xl_tot_len == 0)
> +   {
> +       /* This is strictly not an invalid state, so phrase it as so. */
> +       report_invalid_record(state,
> +                             "record length is 0 at %X/%X",
> +                             LSN_FORMAT_ARGS(RecPtr));
> +       state->EndOfWAL = true;
> +       return false;
> +   }
> This assumes that a value of 0 for xl_tot_len is a synonym of the end
> of WAL, but cannot we have also a corrupted record in this case in the
> shape of xl_tot_len being 0?  We validate the full record after
> reading the header, so it seems to me that we should not assume that
> things are just ending as proposed in this patch.

Yeah, it's the most serious concern to me. So I didn't hide the
detailed message in the "end-of-wal reached message".

> LOG:  reached end of WAL at 0/512F198 on timeline 1 in pg_wal during crash recovery
> DETAIL:  record length is 0 at 0/512F210

I believe everyone regards zero record length as fine unless something
wrong is seen afterwards.  However, we can extend the check to the
whole record header. I think it is by far nearer to the perfect for
almost all cases. The attached emits the following message for the
good (true end-of-WAL) case.

> LOG:  reached end of WAL at 0/512F4A0 on timeline 1 in pg_wal during crash recovery
> DETAIL:  empty record header found at 0/512F518

If garbage bytes are found in the header area, the following log will
be left. I think we can have a better message here.

> WARNING:  garbage record header at 0/2095458
> LOG:  redo done at 0/2095430 system usage: CPU: user: 0.03 s, system: 0.01 s, elapsed: 0.04 s


This is the updated version.

- emode_for_currupt_record() now uses currentSource instead of
  readSource.

- If zero record length is faced, make sure the whole header is zeroed
  before deciding it as the end-of-WAL.

- Do not overwrite existig message when missing contrecord is
  detected.  The message added here is seen in the TAP test log
  026_overwrite_contrecord_standby.log

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 1d5f6e707f8d67172eea79689c8a5f4d86889d3e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 28 Feb 2020 15:52:58 +0900
Subject: [PATCH v6] Make End-Of-Recovery error less scary

When recovery in any type ends, we see a bit scary error message like
"invalid record length" that suggests something serious is
happening. Actually if recovery meets a record with length = 0, that
usually means it finished applying all available WAL records.

Make this message less scary as "reached end of WAL". Instead raise
the error level for other kind of WAL failure to WARNING.
---
 src/backend/access/transam/xlog.c       | 89 +++++++++++++++++++------
 src/backend/access/transam/xlogreader.c | 42 ++++++++++++
 src/backend/replication/walreceiver.c   |  3 +-
 src/include/access/xlogreader.h         |  1 +
 4 files changed, 111 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5cda30836f..e90c69810b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4477,6 +4477,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
     for (;;)
     {
         char       *errormsg;
+        XLogRecPtr    ErrRecPtr = InvalidXLogRecPtr;
 
         record = XLogReadRecord(xlogreader, &errormsg);
         ReadRecPtr = xlogreader->ReadRecPtr;
@@ -4494,6 +4495,16 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
             {
                 abortedRecPtr = xlogreader->abortedRecPtr;
                 missingContrecPtr = xlogreader->missingContrecPtr;
+                ErrRecPtr = abortedRecPtr;
+            }
+            else
+            {
+                /*
+                 * NULL ReadRecPtr means we could not read a record at
+                 * beginning. In that case EndRecPtr is storing the LSN of the
+                 * record we tried to read.
+                 */
+                ErrRecPtr = ReadRecPtr ? ReadRecPtr : EndRecPtr;
             }
 
             if (readFile >= 0)
@@ -4503,13 +4514,12 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
             }
 
             /*
-             * We only end up here without a message when XLogPageRead()
-             * failed - in that case we already logged something. In
-             * StandbyMode that only happens if we have been triggered, so we
-             * shouldn't loop anymore in that case.
+             * If we get here for other than end-of-wal, emit the error message
+             * right now. Otherwise the message if any is shown as a part of
+             * the end-of-WAL message below.
              */
-            if (errormsg)
-                ereport(emode_for_corrupt_record(emode, EndRecPtr),
+            if (!xlogreader->EndOfWAL && errormsg)
+                ereport(emode_for_corrupt_record(emode, ErrRecPtr),
                         (errmsg_internal("%s", errormsg) /* already translated */ ));
         }
 
@@ -4540,11 +4550,12 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
             /* Great, got a record */
             return record;
         }
-        else
+
+        /* No valid record available from this source */
+        lastSourceFailed = true;
+
+        if (!fetching_ckpt)
         {
-            /* No valid record available from this source */
-            lastSourceFailed = true;
-
             /*
              * If archive recovery was requested, but we were still doing
              * crash recovery, switch to archive recovery and retry using the
@@ -4557,11 +4568,17 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
              * we'd have no idea how far we'd have to replay to reach
              * consistency.  So err on the safe side and give up.
              */
-            if (!InArchiveRecovery && ArchiveRecoveryRequested &&
-                !fetching_ckpt)
+            if (!InArchiveRecovery && ArchiveRecoveryRequested)
             {
+                /*
+                 * We don't report this as LOG, since we don't stop recovery
+                 * here
+                 */
                 ereport(DEBUG1,
-                        (errmsg_internal("reached end of WAL in pg_wal, entering archive recovery")));
+                        (errmsg_internal("reached end of WAL at %X/%X on timeline %u in %s during crash recovery,
enteringarchive recovery",
 
+                                         LSN_FORMAT_ARGS(ErrRecPtr),
+                                         replayTLI,
+                                         xlogSourceNames[currentSource])));
                 InArchiveRecovery = true;
                 if (StandbyModeRequested)
                     StandbyMode = true;
@@ -4609,12 +4626,33 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
                 continue;
             }
 
-            /* In standby mode, loop back to retry. Otherwise, give up. */
-            if (StandbyMode && !CheckForStandbyTrigger())
-                continue;
-            else
-                return NULL;
+            /*
+             *  If we haven't emit an error message, we have safely reached the
+             *  end-of-WAL.
+             */
+            if (emode_for_corrupt_record(LOG, ErrRecPtr) == LOG)
+            {
+                char *fmt;
+
+                if (StandbyMode)
+                    fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during standby mode");
+                else if (InArchiveRecovery)
+                    fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during archive recovery");
+                else
+                    fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during crash recovery");
+
+                ereport(LOG,
+                        (errmsg(fmt, LSN_FORMAT_ARGS(ErrRecPtr), replayTLI,
+                                xlogSourceNames[currentSource]),
+                         (errormsg ? errdetail_internal("%s", errormsg) : 0)));
+            }
         }
+
+        /* In standby mode, loop back to retry. Otherwise, give up. */
+        if (StandbyMode && !CheckForStandbyTrigger())
+            continue;
+        else
+            return NULL;
     }
 }
 
@@ -7544,7 +7582,7 @@ StartupXLOG(void)
         else
         {
             /* just have to read next record after CheckPoint */
-            record = ReadRecord(xlogreader, LOG, false, ThisTimeLineID);
+            record = ReadRecord(xlogreader, WARNING, false, ThisTimeLineID);
         }
 
         if (record != NULL)
@@ -7781,7 +7819,7 @@ StartupXLOG(void)
                 }
 
                 /* Else, try to fetch the next WAL record */
-                record = ReadRecord(xlogreader, LOG, false, ThisTimeLineID);
+                record = ReadRecord(xlogreader, WARNING, false, ThisTimeLineID);
             } while (record != NULL);
 
             /*
@@ -7841,13 +7879,20 @@ StartupXLOG(void)
 
             InRedo = false;
         }
-        else
+        else if (xlogreader->EndOfWAL)
         {
             /* there are no WAL records following the checkpoint */
             ereport(LOG,
                     (errmsg("redo is not required")));
 
         }
+        else
+        {
+            /* broken record found */
+            ereport(WARNING,
+                    (errmsg("redo is skipped"),
+                     errhint("This suggests WAL file corruption. You might need to check the database.")));
+        }
 
         /*
          * This check is intentionally after the above log messages that
@@ -13147,7 +13192,7 @@ emode_for_corrupt_record(int emode, XLogRecPtr RecPtr)
 {
     static XLogRecPtr lastComplaint = 0;
 
-    if (readSource == XLOG_FROM_PG_WAL && emode == LOG)
+    if (currentSource == XLOG_FROM_PG_WAL && emode <= WARNING)
     {
         if (RecPtr == lastComplaint)
             emode = DEBUG1;
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index f39f8044a9..273b927cd9 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -121,6 +121,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir,
         pfree(state);
         return NULL;
     }
+    state->EndOfWAL = false;
     state->errormsg_buf[0] = '\0';
 
     /*
@@ -292,6 +293,7 @@ XLogReadRecord(XLogReaderState *state, char **errormsg)
     /* reset error state */
     *errormsg = NULL;
     state->errormsg_buf[0] = '\0';
+    state->EndOfWAL = false;
 
     ResetDecoder(state);
     state->abortedRecPtr = InvalidXLogRecPtr;
@@ -588,6 +590,16 @@ err:
          */
         state->abortedRecPtr = RecPtr;
         state->missingContrecPtr = targetPagePtr;
+
+        /*
+         * If the messages is not set yet, that means we failed to load the
+         * page for the record.  Otherwise do not hide the existing message at
+         * it should be more detailed.
+         */
+        if (state->errormsg_buf[0] == '\0')
+            report_invalid_record(state,
+                                  "missing contrecord at %X/%X",
+                                  LSN_FORMAT_ARGS(RecPtr));
     }
 
     /*
@@ -730,6 +742,36 @@ ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
                       XLogRecPtr PrevRecPtr, XLogRecord *record,
                       bool randAccess)
 {
+    if (record->xl_tot_len == 0)
+    {
+        /*
+         * We are almost sure reaching the end of WAL, make sure that the whole
+         * header is zeroed.
+         */
+        char   *p = (char *)record;
+        char   *pe = (char *)record + SizeOfXLogRecord;
+
+        while (*p == 0 && p < pe)
+            p++;
+
+        if (p == pe)
+        {
+            /* it is completely zeroed, call it a day  */
+            report_invalid_record(state, "empty record header found at %X/%X",
+                                  LSN_FORMAT_ARGS(RecPtr));
+
+            /* notify end-of-wal to callers */
+            state->EndOfWAL = true;
+        }
+        else
+        {
+            /* Otherwise we found a garbage header.. */
+            report_invalid_record(state, "garbage record header at %X/%X",
+                                  LSN_FORMAT_ARGS(RecPtr));
+        }
+
+        return false;
+    }
     if (record->xl_tot_len < SizeOfXLogRecord)
     {
         report_invalid_record(state,
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 7a7eb3784e..ba3c4bd550 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -471,8 +471,7 @@ WalReceiverMain(void)
                         else if (len < 0)
                         {
                             ereport(LOG,
-                                    (errmsg("replication terminated by primary server"),
-                                     errdetail("End of WAL reached on timeline %u at %X/%X.",
+                                    (errmsg("replication terminated by primary server on timeline %u at %X/%X.",
                                                startpointTLI,
                                                LSN_FORMAT_ARGS(LogstreamResult.Write))));
                             endofwal = true;
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index de6fd791fe..1241b85838 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -174,6 +174,7 @@ struct XLogReaderState
      */
     XLogRecPtr    ReadRecPtr;        /* start of last record read */
     XLogRecPtr    EndRecPtr;        /* end+1 of last record read */
+    bool        EndOfWAL;        /* the last attempt was EOW? */
 
     /*
      * Set at the end of recovery: the start point of a partial record at the
-- 
2.27.0


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Skipping logical replication transactions on subscriber side
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: Skipping logical replication transactions on subscriber side