Re: Use of backup_label not noted in log

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Use of backup_label not noted in log
Дата
Msg-id ZVsZJzDGBzUyMCrH@paquier.xyz
обсуждение исходный текст
Ответ на Re: Use of backup_label not noted in log  (Andres Freund <andres@anarazel.de>)
Ответы Re: Use of backup_label not noted in log  (Laurenz Albe <laurenz.albe@cybertec.at>)
Re: Use of backup_label not noted in log  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Sat, Nov 18, 2023 at 01:49:15PM -0800, Andres Freund wrote:
> Note that the LSN in the "continuing" case is the one the backup started at,
> not where recovery will start.
>
> I've wondered whether it's worth also adding an explicit message just after
> ReachedEndOfBackup(), but it seems far less urgent due to the existing
> "consistent recovery state reached at %X/%X" message.

Upgrading the surrounding DEBUG1 to a LOG is another option, but I
agree that I've seen less that as being an actual problem in the field
compared to the famous I-removed-a-backup-label-and-I-m-still-up,
until this user sees signs of corruption after recovery was finished,
sometimes days after putting back an instance online.

> Playing around with this a bit, I'm wondering if we instead should remove that
> message, and emit something more informative earlier on. If there's a problem,
> you kinda want the information before we finally get to the loop in
> PerformWalLRecovery(). If e.g. there's no WAL you'll only get
> LOG:  invalid checkpoint record
> PANIC:  could not locate a valid checkpoint record

I was looking at this code a few weeks ago and have on my stack of
list to do an item about sending a patch to make this exact message
PANIC more talkative as there are a lot of instances with
log_min_messages > log.

> which is delightfully lacking in details.

With a user panicking as much as the server itself, that's even more
tasty.

+       if (ControlFile->backupStartPoint != InvalidXLogRecPtr)
+           ereport(LOG,
+                   (errmsg("continuing to start from base backup with redo LSN %X/%X",
+                           LSN_FORMAT_ARGS(ControlFile->backupStartPoint))));

"Continuing to start" sounds a bit weird to me, though, considering
that there are a few LOGs that say "starting" when there is a signal
file, but I don't have a better idea on top of my mind.  So that
sounds OK here.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Simplify if/else logic of walsender CreateReplicationSlot
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock