Re: Race condition in recovery?

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Race condition in recovery?
Дата
Msg-id 20210518.155207.2282568443886608104.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: Race condition in recovery?  (Dilip Kumar <dilipbalaut@gmail.com>)
Ответы Re: Race condition in recovery?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: Race condition in recovery?  (Dilip Kumar <dilipbalaut@gmail.com>)
Список pgsql-hackers
At Mon, 17 May 2021 10:46:24 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> On Mon, May 17, 2021 at 10:09 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, May 17, 2021 at 8:50 AM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > >
> > > Before the commit expectedTLEs is always initialized with just one
> > > entry for the TLI of the last checkpoint record.
> >
> > Right
> >
> > > (1) If XLogFileReadAnyTLI() found the segment but no history file
> > > found, that is, using the dummy TLE-list, expectedTLEs is initialized
> > > with the dummy one-entry list. So there's no behavioral change in this
> > > aspect.
> >
> > Yeah, you are right.
> 
> But do you agree that one line entry will always be a checkpoint
> timeline entry?  Because if you notice below code[1] in function
> "readRecoveryCommandFile();", then you will realize that once we come
> out of this function either the "recoveryTargetTLI" is checkpoint TL
> wherever it was before calling this function or we must have the
> history file.  That means after exiting this function if we execute
> this line (expectedTLEs = readTimeLineHistory(recoveryTargetTLI);)
> that means either "expectedTLEs" could point to one dummy entry which
> will be nothing but the checkpoint TL entry or it will be holding
> complete history.

Right.

> The patch is trying to say that without the history file the
> checkpoint TL will not be found in "expectedTLEs" because the older TL
> (checkpoint TL) is not the ancestor of the target
> timeline(recoveryTargetTLI).  But ideally, either the target timeline
> should be the same as the checkpoint timeline or we must have the
> history file as I stated in the above paragraph.  Am I missing
> something?

Yeah, that has been the most mysterious point here.  So I searched for
a situation the one-entry expectedTLEs does not work.

I vaguely believed that there's a case where REDO point of a
checkpoint is in the timeline previous to the record of the
checkpoint. The previous discussion is based on this case, but that
doesn't seem to happen.  The last replayed checkpoint (that causes a
restartpoint) record is found before protmotion and the first
checkpoint starts after promotion.

A little while ago I tried to make a situation where a checkpoint
record is placed in the previous timeline of the tli written in the
control file. But control file is always written after checkpoint
record is flushed.


I rebooted myself from this:

ee994272ca:
>    There is at least one scenario where this makes a difference: if you take
>    a base backup from a standby server right after a timeline switch, the
>    WAL segment containing the initial checkpoint record will begin with an
>    older timeline ID. Without the timeline history file, recovering that file

And finally I think I could reach the situation the commit wanted to fix.

I took a basebackup from a standby just before replaying the first
checkpoint of the new timeline (by using debugger), without copying
pg_wal.  In this backup, the control file contains checkPointCopy of
the previous timeline.

I modified StartXLOG so that expectedTLEs is set just after first
determining recoveryTargetTLI, then started the grandchild node.  I
have the following error and the server fails to continue replication.

[postmaster] LOG:  starting PostgreSQL 14beta1 on x86_64-pc-linux-gnu...
[startup] LOG:  database system was interrupted while in recovery at log...
[startup] LOG:  set expectedtles tli=6, length=1
[startup] LOG:  Probing history file for TLI=7
[startup] LOG:  entering standby mode
[startup] LOG:  scanning segment 3 TLI 6, source 0
[startup] LOG:  Trying fetching history file for TLI=6
[walreceiver] LOG:  fetching timeline history file for timeline 5 from pri...
[walreceiver] LOG:  fetching timeline history file for timeline 6 from pri...
[walreceiver] LOG:  started streaming ... primary at 0/3000000 on timeline 5
[walreceiver] DETAIL:  End of WAL reached on timeline 5 at 0/30006E0.
[startup] LOG:  unexpected timeline ID 1 in log segment 000000050000000000000003, offset 0
[startup] LOG:  Probing history file for TLI=7
[startup] LOG:  scanning segment 3 TLI 6, source 0
(repeats forever)

This seems like the behavior the patch wanted to fix. (I'm not sure
precisely what happened at the time of the "unexpected timeline ID
1..", though. The line is seen only just after the first conection.)

>    will fail as the older timeline ID is not recognized to be an ancestor of
>    the target timeline. If you try to recover from such a backup, using only
>    streaming replication to fetch the WAL, this patch is required for that to
>    work.

After I reverted the modification, I got the following behavior
instead from the same backup.

[postmaster] LOG:  starting PostgreSQL 14beta1 on x86_64-...
[startup] JST LOG:  database system was interrupted while in recovery at log time 2021-05-18 13:45:59 JST
[startup] JST LOG:  Probing history file for TLI=7
[startup] JST LOG:  entering standby mode
[startup] JST LOG:  Loading history file for TLI=6 (2)
[startup] JST LOG:  Trying reading history file for TLI=6
[startup] JST LOG:  scanning segment 3 TLI 6, source 0
[startup] JST LOG:  Trying fetching history file for TLI=6
[walreceiver] JST LOG:  fetching timeline history file for timeline 5 fro...
[walreceiver] JST LOG:  fetching timeline history file for timeline 6 fro...
[walreceiver] JST LOG:  started streaming ... primary at 0/3000000 on timeline 5
[walreceiver] JST LOG:  replication terminated by primary server
[walreceiver] JST DETAIL:  End of WAL reached on timeline 5 at 0/30006E0.
[startup] LOG:  Loading expectedTLEs for 5
[startup] LOG:  Trying reading history file for TLI=5
[startup] LOG:  Checkpoint record: TLI=5, 0/3000668, rectargetTLI=6, exptles=0x3322a60
[startup] FATAL:  requested timeline 6 does not contain minimum recovery point 0/30007C0 on timeline 6
[postmaster] LOG:  startup process (PID 76526) exited with exit code 1
[postmaster] LOG:  aborting startup due to startup process failure
[postmaster] LOG:  database system is shut down

Aborts.. Yeah, this is the same issue with what is railed here.  So,
still I'm not sure I confirmed the case exactly (since the problem is
still seen.. but I don't want to bother building the version.)...
Anyway reading history file for recoveryTargetTLI instead of
receiveTLI fixes that.

FWIW, you could be get a problematic base backup by the following steps.

0. (make sure /tmp/hoge is removed)
1. apply the attached patch
2. create a primary then start
3. create a standby then start
4. place standby.signal to the primary, then restart it.
5. place the file /tmp/hoge.
6. promote the "primary".
7. You will see a log line like this
   LOG:  WAIT START: CHECKPOINT_ONLINE: TLI=2
8. Take a base backup (without copying WAL files)


> [1]
> if (rtli)
> {
>   /* Timeline 1 does not have a history file, all else should */
>   if (rtli != 1 && !existsTimeLineHistory(rtli))
>    ereport(FATAL,
>   (errmsg("recovery target timeline %u does not exist",
>   rtli)));
>   recoveryTargetTLI = rtli;
>   recoveryTargetIsLatest = false;
> }
> else
> {
>   /* We start the "latest" search from pg_control's timeline */
>   recoveryTargetTLI = findNewestTimeLine(recoveryTargetTLI);
>   recoveryTargetIsLatest = true;
> }

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index 8d0903c175..9483fd055c 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -55,6 +55,7 @@ restoreTimeLineHistoryFiles(TimeLineID begin, TimeLineID end)
 
     for (tli = begin; tli < end; tli++)
     {
+        elog(LOG, "Trying restoring history file for TLI=%d", tli);
         if (tli == 1)
             continue;
 
@@ -95,6 +96,7 @@ readTimeLineHistory(TimeLineID targetTLI)
 
     if (ArchiveRecoveryRequested)
     {
+        elog(LOG, "Trying reading history file for TLI=%d", targetTLI);
         TLHistoryFileName(histfname, targetTLI);
         fromArchive =
             RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false);
@@ -231,6 +233,7 @@ existsTimeLineHistory(TimeLineID probeTLI)
 
     if (ArchiveRecoveryRequested)
     {
+        elog(LOG, "Probing history file for TLI=%d", probeTLI);
         TLHistoryFileName(histfname, probeTLI);
         RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false);
     }
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8d163f190f..afd6a0ce0a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3726,7 +3726,7 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
             snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
                      xlogfname);
             set_ps_display(activitymsg);
-
+            elog(LOG, "Trying fetching history file for TLI=%d", tli);
             restoredFromArchive = RestoreArchivedFile(path, xlogfname,
                                                       "RECOVERYXLOG",
                                                       wal_segment_size,
@@ -3820,7 +3820,10 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
     if (expectedTLEs)
         tles = expectedTLEs;
     else
+    {
+        elog(LOG, "Loading history file for TLI=%d (2)", recoveryTargetTLI);
         tles = readTimeLineHistory(recoveryTargetTLI);
+    }
 
     foreach(cell, tles)
     {
@@ -3834,6 +3837,8 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
          * Skip scanning the timeline ID that the logfile segment to read
          * doesn't belong to
          */
+        elog(LOG, "scanning segment %lX TLI %d, source %d", segno, tli, source);
+
         if (hent->begin != InvalidXLogRecPtr)
         {
             XLogSegNo    beginseg = 0;
@@ -3860,6 +3865,7 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
                               XLOG_FROM_ARCHIVE, true);
             if (fd != -1)
             {
+                elog(LOG, "found segment %lX TLI %d, from archive", segno, tli);
                 elog(DEBUG1, "got WAL segment from archive");
                 if (!expectedTLEs)
                     expectedTLEs = tles;
@@ -3873,6 +3879,7 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
                               XLOG_FROM_PG_WAL, true);
             if (fd != -1)
             {
+                elog(LOG, "found segment %lX TLI %d, from PG_WAL", segno, tli);
                 if (!expectedTLEs)
                     expectedTLEs = tles;
                 return fd;
@@ -6577,6 +6584,8 @@ StartupXLOG(void)
     else
         recoveryTargetTLI = ControlFile->checkPointCopy.ThisTimeLineID;
 
+    expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
+    elog(LOG, "set expectedtles %d, %d", recoveryTargetTLI, list_length(expectedTLEs));
     /*
      * Check for signal files, and if so set up state for offline recovery
      */
@@ -6866,11 +6875,19 @@ StartupXLOG(void)
     if (!XLogRecPtrIsInvalid(ControlFile->minRecoveryPoint) &&
         tliOfPointInHistory(ControlFile->minRecoveryPoint - 1, expectedTLEs) !=
         ControlFile->minRecoveryPointTLI)
+    {
+        ListCell *lc;
+        foreach (lc, expectedTLEs)
+        {
+            TimeLineHistoryEntry *tle = (TimeLineHistoryEntry *) lfirst(lc);
+            elog(LOG, "TLE %d {%X/%X - %X/%X}", tle->tli, LSN_FORMAT_ARGS(tle->begin), LSN_FORMAT_ARGS(tle->end));
+        }
         ereport(FATAL,
                 (errmsg("requested timeline %u does not contain minimum recovery point %X/%X on timeline %u",
                         recoveryTargetTLI,
                         LSN_FORMAT_ARGS(ControlFile->minRecoveryPoint),
                         ControlFile->minRecoveryPointTLI)));
+    }
 
     LastRec = RecPtr = checkPointLoc;
 
@@ -8396,7 +8413,7 @@ ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr,
 
     XLogBeginRead(xlogreader, RecPtr);
     record = ReadRecord(xlogreader, LOG, true);
-
+    elog(LOG, "Checkpoint record: TLI=%d, %X/%X, rectargetTLI=%d, exptles=%p", xlogreader->seg.ws_tli,
LSN_FORMAT_ARGS(xlogreader->ReadRecPtr),recoveryTargetTLI, expectedTLEs);
 
     if (record == NULL)
     {
         if (!report)
@@ -10211,6 +10228,19 @@ xlog_redo(XLogReaderState *record)
         CheckPoint    checkPoint;
 
         memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
+        {
+            struct stat b;
+            bool f = true;
+            while (stat("/tmp/hoge", &b) == 0)
+            {
+                if (f)
+                    elog(LOG, "WAIT START: CHECKPOINT_ONLINE: TLI=%d", checkPoint.ThisTimeLineID);
+                f = false;
+                sleep(1);
+            }
+            if (!f)
+                elog(LOG, "WAIT END: CHECKPOINT_ONLINE");
+        }
         /* In an ONLINE checkpoint, treat the XID counter as a minimum */
         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
         if (FullTransactionIdPrecedes(ShmemVariableCache->nextXid,
@@ -12595,7 +12625,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                              * TLI, rather than the position we're reading.
                              */
                             tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
-
+                            elog(LOG, "%X/%X is on TLI %X", LSN_FORMAT_ARGS(tliRecPtr), tli);
                             if (curFileTLI > 0 && tli < curFileTLI)
                                 elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but
previousrecovered WAL file came from timeline %u",
 
                                      LSN_FORMAT_ARGS(tliRecPtr),
@@ -12662,7 +12692,11 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                         if (readFile < 0)
                         {
                             if (!expectedTLEs)
-                                expectedTLEs = readTimeLineHistory(receiveTLI);
+                            {
+                                
+                                elog(LOG, "Loading expectedTLEs for %d (%d)", recoveryTargetTLI, receiveTLI);
+                                expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
+                            }
                             readFile = XLogFileRead(readSegNo, PANIC,
                                                     receiveTLI,
                                                     XLOG_FROM_STREAM, false);

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: Forget close an open relation in ReorderBufferProcessTXN()
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: Race condition in recovery?