pgsql: Fix control file update done in restartpoints still running afte

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема pgsql: Fix control file update done in restartpoints still running afte
Дата
Msg-id E1nnqWU-000eto-D9@gemulon.postgresql.org
обсуждение исходный текст
Список pgsql-committers
Fix control file update done in restartpoints still running after promotion

If a cluster is promoted (aka the control file shows a state different
than DB_IN_ARCHIVE_RECOVERY) while CreateRestartPoint() is still
processing, this function could miss an update of the control file for
"checkPoint" and "checkPointCopy" but still do the recycling and/or
removal of the past WAL segments, assuming that the to-be-updated LSN
values should be used as reference points for the cleanup.  This causes
a follow-up restart attempting crash recovery to fail with a PANIC on a
missing checkpoint record if the end-of-recovery checkpoint triggered by
the promotion did not complete while the cluster abruptly stopped or
crashed before the completion of this checkpoint.  The PANIC would be
caused by the redo LSN referred in the control file as located in a
segment already gone, recycled by the previous restartpoint with
"checkPoint" out-of-sync in the control file.

This commit fixes the update of the control file during restartpoints so
as "checkPoint" and "checkPointCopy" are updated even if the cluster has
been promoted while a restartpoint is running, to be on par with the set
of WAL segments actually recycled in the end of CreateRestartPoint().

This problem exists in all the stable branches.  However, commit
7ff23c6, by removing the last call of CreateCheckPoint() from the
startup process, has made this bug much easier to reason about as
concurrent checkpoints are not possible anymore.  No backpatch is done
yet, mostly out of caution from me as a point release is close by, but
we need to think harder about the case of concurrent checkpoints at
promotion if the bgwriter is not considered as running by the startup
process in ~v14, so this change is done only on HEAD for the moment.

Reported-by: Fujii Masao, Rui Zhao
Author: Kyotaro Horiguchi
Reviewed-by: Nathan Bossart, Michael Paquier
Discussion: https://postgr.es/m/20220316.102444.2193181487576617583.horikyota.ntt@gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/7863ee4def653f2c2193cb0b0cf4a8f0f3ca6c56

Modified Files
--------------
src/backend/access/transam/xlog.c | 54 ++++++++++++++++++++++++---------------
1 file changed, 33 insertions(+), 21 deletions(-)


В списке pgsql-committers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: pgsql: Release notes for 14.3, 13.7, 12.11, 11.16, 10.21.
Следующее
От: Andres Freund
Дата:
Сообщение: pgsql: Disable 031_recovery_conflict.pl until after minor releases.