Re: Use of backup_label not noted in log

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Use of backup_label not noted in log
Дата
Msg-id 20231118180142.gaitteticpeccg5g@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: Use of backup_label not noted in log  (Laurenz Albe <laurenz.albe@cybertec.at>)
Ответы Re: Use of backup_label not noted in log  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Hi,

On 2023-11-17 06:41:46 +0100, Laurenz Albe wrote:
> On Thu, 2023-11-16 at 20:18 -0800, Andres Freund wrote:
> > I've often had to analyze what caused corruption in PG instances, where the
> > symptoms match not having had backup_label in place when bringing on the
> > node. However that's surprisingly hard - the only log messages that indicate
> > use of backup_label are at DEBUG1.
> >
> > Given how crucial use of backup_label is and how frequently people do get it
> > wrong, I think we should add a LOG message - it's not like use of backup_label
> > is a frequent thing in the life of a postgres instance and is going to swamp
> > the log.  And I think we should backpatch that addition.
>
> +1
>
> I am not sure about the backpatch: it is not a bug, and we should not wantonly
> introduce new log messages in a minor release.  Some monitoring system may
> get confused.

I think log monitoring need (and do) handle unknown log messages
gracefully. You're constantly encountering them.  If were to change an
existing log message in the back branches it'd be a different story.

The reason for backpatching is that this is by far the most common reason for
corrupted systems in the wild that I have seen. And there's no way to
determine from the logs whether something has gone right or wrong - not really
a bug, but a pretty substantial weakness. And we're going to have to deal with
< 17 releases for 5 years, so making this at least somewhat diagnosable seems
like a good idea.


> What about adding it to the "redo starts at" message, something like
>
>   redo starts at 12/12345678, taken from control file
>
> or
>
>   redo starts at 12/12345678, taken from backup label

I think it'd make sense to log use of backup_label earlier than that - the
locations from backup_label might end up not being available in the archive,
the primary or locally, and we'll error out with "could not locate a valid
checkpoint record".

I'd probably just do it within the if (read_backup_label()) block in
InitWalRecovery(), *before* the ReadCheckpointRecord().

I do like the idea of expanding the "redo starts at" message
though. E.g. including minRecoveryLSN, ControlFile->backupStartPoint,
ControlFile->backupEndPoint would provide information about when the node
might become consistent.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Schema variables - new implementation for Postgres 15
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Use of backup_label not noted in log