Re: [patch] Fix checksum verification in base backups for zero page headers

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: [patch] Fix checksum verification in base backups for zero page headers
Дата
Msg-id 20201007081852.GC30037@paquier.xyz
обсуждение исходный текст
Ответ на Re: [patch] Fix checksum verification in base backups for zero page headers  (Michael Banck <michael.banck@credativ.de>)
Ответы Re: [patch] Fix checksum verification in base backups for zero page headers  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Список pgsql-hackers
On Fri, Sep 25, 2020 at 08:53:27AM +0200, Michael Banck wrote:
> Oh right, I've fixed up the commit message in the attached V4.

Not much a fan of what's proposed here, for a couple of reasons:
- If the page is not new, we should check if the header is sane or
not.
- It may be better to actually count checksum failures if these are
repeatable in a given block, and report them.
- The code would be more simple with a "continue" added for a block
detected as new or with a LSN newer than the backup start.
- The new error messages are confusing, and I think that these are
hard to translate in a meaningful way.

So I think that we should try to use PageIsVerified() directly in the
code path of basebackup.c, and this requires a couple of changes to
make the routine more extensible:
- Addition of a dboid argument for pgstat_report_checksum_failure(),
whose call needs to be changed to use the *_in_db() flavor.
- Addition of an option to decide if a log should be generated or
not.
- Addition of an option to control if a checksum failure should be
reported to pgstat or not, even if this is actually linked to the
previous point, I have seen cases where being able to control both
separately would be helpful, particularly the log part.

For the last two ones, I would use an extensible argument based on
bits32 with a set of flags that the caller can set at will.  Something
else I would recommend here is to use an error context for the case
where we want to log a retry number in the base backup loop that
accepts false positives, with the blkno and the physical file name
when we find failures after retries.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: Add a log message on recovery startup before syncing datadir
Следующее
От: Greg Nancarrow
Дата:
Сообщение: Re: Parallel INSERT (INTO ... SELECT ...)