Re: Make mesage at end-of-recovery less scary.

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Make mesage at end-of-recovery less scary.
Дата
Msg-id 20220127.103547.185630769052774136.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: Make mesage at end-of-recovery less scary.  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы Re: Make mesage at end-of-recovery less scary.  (Pavel Borisov <pashkin.elfe@gmail.com>)
Список pgsql-hackers
At Tue, 25 Jan 2022 17:34:56 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> This v8 is changed in...
> 
> - Added tests to 011_crash_recovery.pl
> 
> - Fixed a bug that server emits "end-of-wal" messages even if it have
>   emitted an error message for the same LSN.
> 
> - Changed XLogReaderValidatePageHeader() so that it recognizes an
>   empty page as end-of-WAL.
> 
> - Made pg_waldump conscious of end-of-wal.
> 
> While doing the last item, I noticed that pg_waldump shows the wrong
> LSN as the error position.  Concretely it emits the LSN of the last
> sound WAL record as the error position.  I will post a bug-fix patch
> for the issue after confirmation.

I noticed that I added a useless error message "garbage record
header", but it is a kind of invalid record length.  So I removed the
message. That change makes the logic for EOW in ValidXLogRecordHeader
and XLogReaderValidatePageHeader share the same flow.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 57cb251f7cacbb96066ead4543b9f12f5b3c7062 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 28 Feb 2020 15:52:58 +0900
Subject: [PATCH v9] Make End-Of-Recovery error less scary

When recovery in any type ends, we see a bit scary error message like
"invalid record length" that suggests something serious is
happening. Actually if recovery meets a record with length = 0, that
usually means it finished applying all available WAL records.

Make this message less scary as "reached end of WAL". Instead, raise
the error level for other kind of WAL failure to WARNING.
---
 src/backend/access/transam/xlog.c         |  91 +++++++++++++-----
 src/backend/access/transam/xlogreader.c   |  61 ++++++++++++
 src/backend/replication/walreceiver.c     |   3 +-
 src/bin/pg_waldump/pg_waldump.c           |  13 ++-
 src/include/access/xlogreader.h           |   1 +
 src/test/recovery/t/011_crash_recovery.pl | 110 +++++++++++++++++++++-
 6 files changed, 251 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dfe2a0bcce..5727e0939f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4480,6 +4480,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
     for (;;)
     {
         char       *errormsg;
+        XLogRecPtr    ErrRecPtr = InvalidXLogRecPtr;
 
         record = XLogReadRecord(xlogreader, &errormsg);
         if (record == NULL)
@@ -4495,6 +4496,18 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
             {
                 abortedRecPtr = xlogreader->abortedRecPtr;
                 missingContrecPtr = xlogreader->missingContrecPtr;
+                ErrRecPtr = abortedRecPtr;
+            }
+            else
+            {
+                /*
+                 * NULL ReadRecPtr means we could not read a record at the
+                 * beginning. In that case EndRecPtr is storing the LSN of the
+                 * record we tried to read.
+                 */
+                ErrRecPtr =
+                    xlogreader->ReadRecPtr ?
+                    xlogreader->ReadRecPtr : xlogreader->EndRecPtr;
             }
 
             if (readFile >= 0)
@@ -4504,13 +4517,12 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
             }
 
             /*
-             * We only end up here without a message when XLogPageRead()
-             * failed - in that case we already logged something. In
-             * StandbyMode that only happens if we have been triggered, so we
-             * shouldn't loop anymore in that case.
+             * If we get here for other than end-of-wal, emit the error message
+             * right now. Otherwise the message if any is shown as a part of
+             * the end-of-WAL message below.
              */
-            if (errormsg)
-                ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
+            if (!xlogreader->EndOfWAL && errormsg)
+                ereport(emode_for_corrupt_record(emode, ErrRecPtr),
                         (errmsg_internal("%s", errormsg) /* already translated */ ));
         }
 
@@ -4541,11 +4553,12 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
             /* Great, got a record */
             return record;
         }
-        else
+
+        /* No valid record available from this source */
+        lastSourceFailed = true;
+
+        if (!fetching_ckpt)
         {
-            /* No valid record available from this source */
-            lastSourceFailed = true;
-
             /*
              * If archive recovery was requested, but we were still doing
              * crash recovery, switch to archive recovery and retry using the
@@ -4558,11 +4571,17 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
              * we'd have no idea how far we'd have to replay to reach
              * consistency.  So err on the safe side and give up.
              */
-            if (!InArchiveRecovery && ArchiveRecoveryRequested &&
-                !fetching_ckpt)
+            if (!InArchiveRecovery && ArchiveRecoveryRequested)
             {
+                /*
+                 * We don't report this as LOG, since we don't stop recovery
+                 * here
+                 */
                 ereport(DEBUG1,
-                        (errmsg_internal("reached end of WAL in pg_wal, entering archive recovery")));
+                        (errmsg_internal("reached end of WAL at %X/%X on timeline %u in %s during crash recovery,
enteringarchive recovery",
 
+                                         LSN_FORMAT_ARGS(ErrRecPtr),
+                                         replayTLI,
+                                         xlogSourceNames[currentSource])));
                 InArchiveRecovery = true;
                 if (StandbyModeRequested)
                     StandbyMode = true;
@@ -4610,12 +4629,33 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
                 continue;
             }
 
-            /* In standby mode, loop back to retry. Otherwise, give up. */
-            if (StandbyMode && !CheckForStandbyTrigger())
-                continue;
-            else
-                return NULL;
+            /*
+             *  If we haven't emit an error message, we have safely reached the
+             *  end-of-WAL.
+             */
+            if (emode_for_corrupt_record(LOG, ErrRecPtr) == LOG)
+            {
+                char *fmt;
+
+                if (StandbyMode)
+                    fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during standby mode");
+                else if (InArchiveRecovery)
+                    fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during archive recovery");
+                else
+                    fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during crash recovery");
+
+                ereport(LOG,
+                        (errmsg(fmt, LSN_FORMAT_ARGS(ErrRecPtr), replayTLI,
+                                xlogSourceNames[currentSource]),
+                         (errormsg ? errdetail_internal("%s", errormsg) : 0)));
+            }
         }
+
+        /* In standby mode, loop back to retry. Otherwise, give up. */
+        if (StandbyMode && !CheckForStandbyTrigger())
+            continue;
+        else
+            return NULL;
     }
 }
 
@@ -7544,7 +7584,7 @@ StartupXLOG(void)
         else
         {
             /* just have to read next record after CheckPoint */
-            record = ReadRecord(xlogreader, LOG, false, replayTLI);
+            record = ReadRecord(xlogreader, WARNING, false, replayTLI);
         }
 
         if (record != NULL)
@@ -7782,7 +7822,7 @@ StartupXLOG(void)
                 }
 
                 /* Else, try to fetch the next WAL record */
-                record = ReadRecord(xlogreader, LOG, false, replayTLI);
+                record = ReadRecord(xlogreader, WARNING, false, replayTLI);
             } while (record != NULL);
 
             /*
@@ -7842,13 +7882,20 @@ StartupXLOG(void)
 
             InRedo = false;
         }
-        else
+        else if (xlogreader->EndOfWAL)
         {
             /* there are no WAL records following the checkpoint */
             ereport(LOG,
                     (errmsg("redo is not required")));
 
         }
+        else
+        {
+            /* broken record found */
+            ereport(WARNING,
+                    (errmsg("redo is skipped"),
+                     errhint("This suggests WAL file corruption. You might need to check the database.")));
+        }
 
         /*
          * This check is intentionally after the above log messages that
@@ -13097,7 +13144,7 @@ emode_for_corrupt_record(int emode, XLogRecPtr RecPtr)
 {
     static XLogRecPtr lastComplaint = 0;
 
-    if (readSource == XLOG_FROM_PG_WAL && emode == LOG)
+    if (currentSource == XLOG_FROM_PG_WAL && emode <= WARNING)
     {
         if (RecPtr == lastComplaint)
             emode = DEBUG1;
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 35029cf97d..418fb66ef2 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -121,6 +121,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir,
         pfree(state);
         return NULL;
     }
+    state->EndOfWAL = false;
     state->errormsg_buf[0] = '\0';
 
     /*
@@ -292,6 +293,7 @@ XLogReadRecord(XLogReaderState *state, char **errormsg)
     /* reset error state */
     *errormsg = NULL;
     state->errormsg_buf[0] = '\0';
+    state->EndOfWAL = false;
 
     ResetDecoder(state);
     state->abortedRecPtr = InvalidXLogRecPtr;
@@ -588,6 +590,15 @@ err:
          */
         state->abortedRecPtr = RecPtr;
         state->missingContrecPtr = targetPagePtr;
+
+        /*
+         * If the message is not set yet, that means we failed to load the
+         * page for the record.  Otherwise do not hide the existing message.
+         */
+        if (state->errormsg_buf[0] == '\0')
+            report_invalid_record(state,
+                                  "missing contrecord at %X/%X",
+                                  LSN_FORMAT_ARGS(RecPtr));
     }
 
     /*
@@ -730,6 +741,31 @@ ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
                       XLogRecPtr PrevRecPtr, XLogRecord *record,
                       bool randAccess)
 {
+    if (record->xl_tot_len == 0)
+    {
+        /*
+         * We are almost sure reaching the end of WAL, make sure that the whole
+         * header is filled with zeroes.
+         */
+        char   *p = (char *)record;
+        char   *pe = (char *)record + SizeOfXLogRecord;
+
+        while (*p == 0 && p < pe)
+            p++;
+
+        if (p == pe)
+        {
+            /* it is completely zeroed, call it a day  */
+            report_invalid_record(state, "empty record header found at %X/%X",
+                                  LSN_FORMAT_ARGS(RecPtr));
+
+            /* notify end-of-wal to callers */
+            state->EndOfWAL = true;
+            return false;
+        }
+
+        /* The same condition will be caught as invalid record length */
+    }
     if (record->xl_tot_len < SizeOfXLogRecord)
     {
         report_invalid_record(state,
@@ -836,6 +872,31 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
 
     XLogSegNoOffsetToRecPtr(segno, offset, state->segcxt.ws_segsize, recaddr);
 
+    StaticAssertStmt(XLOG_PAGE_MAGIC != 0, "XLOG_PAGE_MAGIC is zero");
+
+    if (hdr->xlp_magic == 0)
+    {
+        /* Regard an empty page as End-Of-WAL */
+        int    i;
+
+        for (i = 0 ; i < XLOG_BLCKSZ && phdr[i] == 0 ; i++);
+        if (i == XLOG_BLCKSZ)
+        {
+            char        fname[MAXFNAMELEN];
+
+            XLogFileName(fname, state->seg.ws_tli, segno,
+                         state->segcxt.ws_segsize);
+
+            report_invalid_record(state,
+                                  "empty page in log segment %s, offset %u",
+                                  fname,
+                                  offset);
+            state->EndOfWAL = true;
+            return false;
+        }
+
+        /* The same condition will be caught as invalid magic number */
+    }
     if (hdr->xlp_magic != XLOG_PAGE_MAGIC)
     {
         char        fname[MAXFNAMELEN];
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index b39fce8c23..3034f8281e 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -471,8 +471,7 @@ WalReceiverMain(void)
                         else if (len < 0)
                         {
                             ereport(LOG,
-                                    (errmsg("replication terminated by primary server"),
-                                     errdetail("End of WAL reached on timeline %u at %X/%X.",
+                                    (errmsg("replication terminated by primary server on timeline %u at %X/%X.",
                                                startpointTLI,
                                                LSN_FORMAT_ARGS(LogstreamResult.Write))));
                             endofwal = true;
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index a6251e1a96..3745e76488 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -1176,9 +1176,16 @@ main(int argc, char **argv)
         exit(0);
 
     if (errormsg)
-        fatal_error("error in WAL record at %X/%X: %s",
-                    LSN_FORMAT_ARGS(xlogreader_state->ReadRecPtr),
-                    errormsg);
+    {
+        if (xlogreader_state->EndOfWAL)
+            pg_log_info("end of WAL at %X/%X: %s",
+                        LSN_FORMAT_ARGS(xlogreader_state->EndRecPtr),
+                        errormsg);
+        else
+            fatal_error("error in WAL record at %X/%X: %s",
+                        LSN_FORMAT_ARGS(xlogreader_state->EndRecPtr),
+                        errormsg);
+    }
 
     XLogReaderFree(xlogreader_state);
 
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 477f0efe26..3eeba220a1 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -174,6 +174,7 @@ struct XLogReaderState
      */
     XLogRecPtr    ReadRecPtr;        /* start of last record read */
     XLogRecPtr    EndRecPtr;        /* end+1 of last record read */
+    bool        EndOfWAL;        /* the last attempt was EOW? */
 
     /*
      * Set at the end of recovery: the start point of a partial record at the
diff --git a/src/test/recovery/t/011_crash_recovery.pl b/src/test/recovery/t/011_crash_recovery.pl
index 3892aba3e5..67d264df26 100644
--- a/src/test/recovery/t/011_crash_recovery.pl
+++ b/src/test/recovery/t/011_crash_recovery.pl
@@ -10,9 +10,11 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 use Config;
+use IPC::Run;
 
-plan tests => 3;
+plan tests => 11;
 
+my $reached_eow_pat = "reached end of WAL at ";
 my $node = PostgreSQL::Test::Cluster->new('primary');
 $node->init(allows_streaming => 1);
 $node->start;
@@ -50,7 +52,15 @@ is($node->safe_psql('postgres', qq[SELECT pg_xact_status('$xid');]),
 
 # Crash and restart the postmaster
 $node->stop('immediate');
+my $logstart = get_log_size($node);
 $node->start;
+my $max_attempts = 360;
+while ($max_attempts-- >= 0)
+{
+    last if (find_in_log($node, $reached_eow_pat, $logstart));
+    sleep 0.5;
+}
+ok ($max_attempts >= 0, "end-of-wal is logged");
 
 # Make sure we really got a new xid
 cmp_ok($node->safe_psql('postgres', 'SELECT pg_current_xact_id()'),
@@ -62,3 +72,101 @@ is($node->safe_psql('postgres', qq[SELECT pg_xact_status('$xid');]),
 
 $stdin .= "\\q\n";
 $tx->finish;    # wait for psql to quit gracefully
+
+my $segsize = $node->safe_psql('postgres',
+       qq[SELECT setting FROM pg_settings WHERE name = 'wal_segment_size';]);
+
+# make sure no records afterwards go to the next segment
+$node->safe_psql('postgres', qq[
+                 SELECT pg_switch_wal();
+                 CHECKPOINT;
+                 CREATE TABLE t();
+]);
+$node->stop('immediate');
+
+# identify REDO WAL file
+my $cmd = "pg_controldata -D " . $node->data_dir();
+my $chkptfile;
+$cmd = ['pg_controldata', '-D', $node->data_dir()];
+$stdout = '';
+$stderr = '';
+IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+ok($stdout =~ /^Latest checkpoint's REDO WAL file:[ \t] *(.+)$/m,
+   "checkpoint file is identified");
+my $chkptfile = $1;
+
+# identify the last record
+my $walfile = $node->data_dir() . "/pg_wal/$chkptfile";
+$cmd = ['pg_waldump', $walfile];
+$stdout = '';
+$stderr = '';
+my $lastrec;
+IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+foreach my $l (split(/\r?\n/, $stdout))
+{
+    $lastrec = $l;
+}
+ok(defined $lastrec, "last WAL record is extracted");
+ok($stderr =~ /end of WAL at ([0-9A-F\/]+): .* at \g1/,
+   "pg_waldump emits the correct ending message");
+
+# read the last record LSN excluding leading zeroes
+ok ($lastrec =~ /, lsn: 0\/0*([1-9A-F][0-9A-F]+),/,
+    "LSN of the last record identified");
+my $lastlsn = $1;
+
+# corrupt the last record
+my $offset = hex($lastlsn) % $segsize;
+open(my $segf, '+<', $walfile) or die "failed to open $walfile\n";
+seek($segf, $offset, 0);  # halfway break the last record
+print $segf "\0\0\0\0";
+close($segf);
+
+# pg_waldump complains about the corrupted record
+$stdout = '';
+$stderr = '';
+IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+ok($stderr =~ /fatal: error in WAL record at 0\/$lastlsn: .* at 0\/$lastlsn/,
+   "pg_waldump emits the correct error message");
+
+# also server complains
+$logstart = get_log_size($node);
+$node->start;
+$max_attempts = 360;
+while ($max_attempts-- >= 0)
+{
+    last if (find_in_log($node, "WARNING:  invalid record length at 0/$lastlsn: wanted [0-9]+, got 0",
+                         $logstart));
+    sleep 0.5;
+}
+ok($max_attempts >= 0, "header error is logged at $lastlsn");
+
+# and the end-of-wal messages shouldn't be seen
+# the same message has been confirmed in the past
+ok(!find_in_log($node, $reached_eow_pat, $logstart),
+   "false log message is not emitted");
+
+$node->stop('immediate');
+
+#### helper routines
+# return the size of logfile of $node in bytes
+sub get_log_size
+{
+    my ($node) = @_;
+
+    return (stat $node->logfile)[7];
+}
+
+# find $pat in logfile of $node after $off-th byte
+sub find_in_log
+{
+    my ($node, $pat, $off) = @_;
+
+    $off = 0 unless defined $off;
+    my $log = PostgreSQL::Test::Utils::slurp_file($node->logfile);
+    return 0 if (length($log) <= $off);
+
+    $log = substr($log, $off);
+
+    return $log =~ m/$pat/;
+}
-- 
2.27.0


В списке pgsql-hackers по дате отправления:

Предыдущее
От: TAO TANG
Дата:
Сообщение: Question on partition pruning involving stable operator: timestamptz_ge_date
Следующее
От: "Imseih (AWS), Sami"
Дата:
Сообщение: Re: Add index scan progress to pg_stat_progress_vacuum