Обсуждение: Add checkpoint/redo LSNs to recovery errors.

Поиск
Список
Период
Сортировка

Add checkpoint/redo LSNs to recovery errors.

От
David Steele
Дата:
Hackers,

This patch adds checkpoint/redo LSNs to recovery error messages where 
they may be useful for debugging.

When backup_label is present the LSNs are already output in a log 
message, but it still seems like a good idea to repeat them.

When backup_label is not present, the checkpoint LSN is not logged 
unless backup recovery is in progress or the checkpoint is found. In the 
case where a backup is restored but the backup_label is missing, the 
checkpoint LSN is not logged so it is useful for debugging to have it in 
the error message.

Regards,
-David
Вложения

Re: Add checkpoint/redo LSNs to recovery errors.

От
Michael Paquier
Дата:
On Thu, Feb 29, 2024 at 10:53:15AM +1300, David Steele wrote:
> This patch adds checkpoint/redo LSNs to recovery error messages where they
> may be useful for debugging.

Thanks for following up on that!

> When backup_label is not present, the checkpoint LSN is not logged unless
> backup recovery is in progress or the checkpoint is found. In the case where
> a backup is restored but the backup_label is missing, the checkpoint LSN is
> not logged so it is useful for debugging to have it in the error message.

             ereport(PANIC,
-                    (errmsg("could not locate a valid checkpoint record")));
+                    (errmsg("could not locate a valid checkpoint record at %X/%X",
+                            LSN_FORMAT_ARGS(CheckPointLoc))));
         }

I've seen this one in the field occasionally, so that's really a
welcome addition IMO.

I've scanned a bit xlogrecovery.c, and I have spotted a couple of that
could gain more information.

    ereport(PANIC,
            (errmsg("invalid redo record in shutdown checkpoint")));
[...]
    ereport(PANIC,
            (errmsg("invalid redo in checkpoint record")));
These two could mention CheckPointLoc and checkPoint.redo.

    ereport(PANIC,
            (errmsg("invalid next transaction ID")));
Perhaps some XID information could be added here?

    ereport(FATAL,
            (errmsg("WAL ends before consistent recovery point")));
[...]
    ereport(FATAL,
            (errmsg("WAL ends before end of online backup"),

These two are in xlog.c, and don't mention backupStartPoint for the
first one.  Perhaps there's a point in adding some information about
EndOfLog and LocalMinRecoveryPoint as well?
--
Michael

Вложения

Re: Add checkpoint/redo LSNs to recovery errors.

От
David Steele
Дата:
On 2/29/24 16:42, Michael Paquier wrote:
> On Thu, Feb 29, 2024 at 10:53:15AM +1300, David Steele wrote:
>> This patch adds checkpoint/redo LSNs to recovery error messages where they
>> may be useful for debugging.
> 
> I've scanned a bit xlogrecovery.c, and I have spotted a couple of that
> could gain more information.
> 
>      ereport(PANIC,
>              (errmsg("invalid redo record in shutdown checkpoint")));
> [...]
>      ereport(PANIC,
>              (errmsg("invalid redo in checkpoint record")));
> These two could mention CheckPointLoc and checkPoint.redo.
> 
>      ereport(PANIC,
>              (errmsg("invalid next transaction ID")));
> Perhaps some XID information could be added here?
> 
>      ereport(FATAL,
>              (errmsg("WAL ends before consistent recovery point")));
> [...]
>      ereport(FATAL,
>              (errmsg("WAL ends before end of online backup"),
> 
> These two are in xlog.c, and don't mention backupStartPoint for the
> first one.  Perhaps there's a point in adding some information about
> EndOfLog and LocalMinRecoveryPoint as well?

For now I'd like to just focus on these three messages that are related 
to a missing backup_label or a misconfiguration of restore_command when 
backup_label is present.

No doubt there are many other recovery log messages that could be 
improved, but I'd rather not bog down the patch.

Regards,
-David



Re: Add checkpoint/redo LSNs to recovery errors.

От
Michael Paquier
Дата:
On Sun, Mar 10, 2024 at 04:58:19PM +1300, David Steele wrote:
> No doubt there are many other recovery log messages that could be improved,
> but I'd rather not bog down the patch.

Fair argument, and these additions are useful when taken
independently.  I've applied your suggestions for now.
--
Michael

Вложения