Re: pg_basebackup misses to report checksum error

Поиск
Список
Период
Сортировка
От Ashwin Agrawal
Тема Re: pg_basebackup misses to report checksum error
Дата
Msg-id CALfoeiuHshubwom94EMug2h8v5y9JAGi-twkrMf6-xaM6kmHOQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: pg_basebackup misses to report checksum error  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: pg_basebackup misses to report checksum error  (Stephen Frost <sfrost@snowman.net>)
Список pgsql-hackers
On Wed, May 6, 2020 at 3:02 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 6, 2020 at 5:48 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:
> If pg_basebackup is not able to read BLCKSZ content from file, then it
> just emits a warning "could not verify checksum in file "____" block
> X: read buffer size X and page size 8192 differ" currently but misses
> to error with "checksum error occurred". Only if it can read 8192 and
> checksum mismatch happens will it error in the end.

I don't think it's a good idea to conflate "hey, we can't checksum
this because the size is strange" with "hey, the checksum didn't
match". Suppose the a file has 1000 full blocks and a partial block.
All 1000 blocks have good checksums. With your change, ISTM that we'd
first emit a warning saying that the checksum couldn't be verified,
and then we'd emit a second warning saying that there was 1 checksum
verification failure, which would also be reported to the stats
system. I don't think that's what we want.

I feel the intent of reporting "total checksum verification failure" is to report corruption. Which way is the secondary piece of the puzzle. Not being able to read checksum itself to verify is also corruption and is checksum verification failure I think. WARNINGs will provide fine grained clarity on what type of checksum verification failure it is, so I am not sure we really need fine grained clarity in "total numbers" to differentiate these two types.

Not reporting anything to the stats system and at end reporting there is checksum failure would be more weird, right? Or will say ERRCODE_DATA_CORRUPTED with some other message and not checksum verification failure.

There might be an argument
for making this code trigger...

        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("checksum verification failure during base backup")));

...but I wouldn't for that reason inflate the number of blocks that
are reported as having failures.

When checksum verification is turned on and the issue is detected, I strongly feel ERROR must be triggered as silently reporting success doesn't seem right.
I can introduce one more variable just to capture and track files with such cases. But will we report them separately to stats? How? Also, do we want to have separate WARNING for the total number of files in this category? Those all seem slight complications but if wind is blowing in that direction, I am ready to fly that way.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Korotkov
Дата:
Сообщение: Re: PG 13 release notes, first draft
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: pg_basebackup misses to report checksum error