Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Дата
Msg-id 20220208.165822.2155503452120234158.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: Add checkpoint and redo LSN to LogCheckpointEnd log message  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Ответы Re: Add checkpoint and redo LSN to LogCheckpointEnd log message  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Список pgsql-hackers
At Mon, 7 Feb 2022 13:51:31 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2022/02/07 12:02, Kyotaro Horiguchi wrote:
> > - If any later checkpoint/restartpoint has been established, just skip
> >    remaining task then return false. (!chkpt_was_latest)
> >    (I'm not sure this can happen, though.)
> > - we update control file only when archive recovery is still ongoing.
> 
> This comment seems to conflict with what your PoC patch does. Because
> with the patch, ControlFile->checkPoint and
> ControlFile->checkPointCopy seem to be updated even when
> ControlFile->state != DB_IN_ARCHIVE_RECOVERY.

Ah, yeah, by "update" I meant that "move forward". Sorry for confusing
wording.

> I agree with what your PoC patch does for now. When we're not in
> archive recovery state, checkpoint and REDO locations in pg_control
> should be updated but min recovery point should be reset to invalid
> one (which instruments that subsequent crash recovery should replay
> all available WAL files).

Yes.  All buffers before the last recovery point's end have been
flushed out so the recovery point is valid as a checkpoint.  On the
other hand minRecoveryPoint is no longer needed and actually is
ignored at the next crash recovery.  We can leave it alone but it is
consistent that it is cleared.

> > - Otherwise reset minRecoveryPoint then continue.
> > Do you have any thoughts or opinions?
> 
> Regarding chkpt_was_latest, whether the state is
> DB_IN_ARCHIVE_RECOVERY or not, if checkpoint and redo locations in
> pg_control are updated, IMO we don't need to skip the "remaining
> tasks". Since those locations are updated and subsequent crash
> recovery will start from that redo location, for example, ISTM that we
> can safely delete old WAL files prior to the redo location as the
> "remaining tasks". Thought?

If I read you correctly, the PoC works that way. It updates pg_control
if the restart point is latest then performs the remaining cleanup
tasks in that case. Recovery state doesn't affect this process.

I reexamined about the possibility of concurrent checkpoints.

Both CreateCheckPoint and CreateRestartPoint are called from
checkpointer loop, shutdown handler of checkpointer and standalone
process.  So I can't see a possibility of concurrent checkpoints.

In the past we had a time when startup process called CreateCheckPoint
directly in the crash recovery case where checkpoint is not running
but since 7ff23c6d27 checkpoint is started before startup process
starts.  So I conclude that that cannot happen.

So the attached takes away the path for the case where the restart
point is overtaken by a concurrent checkpoint.

Thus.. the attached removes the ambiguity of of the proposed patch
about the LSNs in the restartpoint-ending log message.

Thoughts?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 43888869846b6de00fbddb215300a8ff774bbc04 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 8 Feb 2022 16:42:53 +0900
Subject: [PATCH v1] Get rid of unused path to handle concurrent checkpoints

CreateRestartPoint considered the case a concurrent checkpoint is
running. But after 7ff23c6d27 we no longer launch multiple checkpoints
simultaneously.  That code path, if it is passed, may leave
unrecoverable database by removing WAL segments that are required by
the last established restartpoint.
---
 src/backend/access/transam/xlog.c | 53 +++++++++++++++++++------------
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 958220c495..01a345e8bd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9752,29 +9752,30 @@ CreateRestartPoint(int flags)
     PriorRedoPtr = ControlFile->checkPointCopy.redo;
 
     /*
-     * Update pg_control, using current time.  Check that it still shows
-     * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
-     * this is a quick hack to make sure nothing really bad happens if somehow
-     * we get here after the end-of-recovery checkpoint.
+     * Update pg_control, using current time.
      */
     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
-    if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
-        ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
+
+    /* We mustn't have a concurrent checkpoint that advances checkpoint LSN */
+    Assert(lastCheckPoint.redo > ControlFile->checkPointCopy.redo);
+
+    ControlFile->checkPoint = lastCheckPointRecPtr;
+    ControlFile->checkPointCopy = lastCheckPoint;
+
+    /*
+     * Ensure minRecoveryPoint is past the checkpoint record while archive
+     * recovery is still ongoing.  Normally, this will have happened already
+     * while writing out dirty buffers, but not necessarily - e.g. because no
+     * buffers were dirtied.  We do this because a non-exclusive base backup
+     * uses minRecoveryPoint to determine which WAL files must be included in
+     * the backup, and the file (or files) containing the checkpoint record
+     * must be included, at a minimum. Note that for an ordinary restart of
+     * recovery there's no value in having the minimum recovery point any
+     * earlier than this anyway, because redo will begin just after the
+     * checkpoint record.
+     */
+    if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
     {
-        ControlFile->checkPoint = lastCheckPointRecPtr;
-        ControlFile->checkPointCopy = lastCheckPoint;
-
-        /*
-         * Ensure minRecoveryPoint is past the checkpoint record.  Normally,
-         * this will have happened already while writing out dirty buffers,
-         * but not necessarily - e.g. because no buffers were dirtied.  We do
-         * this because a non-exclusive base backup uses minRecoveryPoint to
-         * determine which WAL files must be included in the backup, and the
-         * file (or files) containing the checkpoint record must be included,
-         * at a minimum. Note that for an ordinary restart of recovery there's
-         * no value in having the minimum recovery point any earlier than this
-         * anyway, because redo will begin just after the checkpoint record.
-         */
         if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
         {
             ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
@@ -9786,8 +9787,18 @@ CreateRestartPoint(int flags)
         }
         if (flags & CHECKPOINT_IS_SHUTDOWN)
             ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
-        UpdateControlFile();
     }
+    else
+    {
+        /*
+         * We have exited from archive-recovery mode after this restartpoint
+         * started. Crash recovery ever after should always recover to the end
+         * of WAL.
+         */
+        ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
+        ControlFile->minRecoveryPointTLI = 0;
+    }
+    UpdateControlFile();
     LWLockRelease(ControlFileLock);
 
     /*
-- 
2.27.0


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dilip Kumar
Дата:
Сообщение: Re: [BUG]Update Toast data failure in logical replication
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: RFC: Logging plan of the running query