Обсуждение: pg_basebackup misses to report checksum error

Поиск
Список
Период
Сортировка

pg_basebackup misses to report checksum error

От
Ashwin Agrawal
Дата:
If pg_basebackup is not able to read BLCKSZ content from file, then it
just emits a warning "could not verify checksum in file "____" block
X: read buffer size X and page size 8192 differ" currently but misses
to error with "checksum error occurred". Only if it can read 8192 and
checksum mismatch happens will it error in the end.

Repro is pretty simple:
/usr/local/pgsql/bin/initdb -k /tmp/postgres
/usr/local/pgsql/bin/pg_ctl -D /tmp/postgres -l /tmp/logfile start
# just create random file of size not in multiple of 8192
echo "corruption" > /tmp/postgres/base/12696/44444

Without the fix:
$ /usr/local/pgsql/bin/pg_basebackup -D /tmp/dummy
WARNING:  could not verify checksum in file "./base/12696/44444", block 0: read buffer size 11 and page size 8192 differ
$ echo $?
0

With the fix:
$ /usr/local/pgsql/bin/pg_basebackup -D /tmp/dummy
WARNING:  could not verify checksum in file "./base/12696/44444", block 0: read buffer size 11 and page size 8192 differ
pg_basebackup: error: checksum error occurred
$ echo $?
1


I think it's an important case to be handled and should not be silently skipped,
unless I am missing something. This one liner should fix it:

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index fbdc28ec39..68febbedf0 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1641,6 +1641,7 @@ sendFile(const char *readfilename, const char *tarfilename,
                                                        "differ",
                                                        readfilename, blkno, (int) cnt, BLCKSZ)));
                        verify_checksum = false;
+                      checksum_failures++;
                }
 
                if (verify_checksum)

Re: pg_basebackup misses to report checksum error

От
Robert Haas
Дата:
On Wed, May 6, 2020 at 5:48 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:
> If pg_basebackup is not able to read BLCKSZ content from file, then it
> just emits a warning "could not verify checksum in file "____" block
> X: read buffer size X and page size 8192 differ" currently but misses
> to error with "checksum error occurred". Only if it can read 8192 and
> checksum mismatch happens will it error in the end.

I don't think it's a good idea to conflate "hey, we can't checksum
this because the size is strange" with "hey, the checksum didn't
match". Suppose the a file has 1000 full blocks and a partial block.
All 1000 blocks have good checksums. With your change, ISTM that we'd
first emit a warning saying that the checksum couldn't be verified,
and then we'd emit a second warning saying that there was 1 checksum
verification failure, which would also be reported to the stats
system. I don't think that's what we want. There might be an argument
for making this code trigger...

        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("checksum verification failure during base backup")));

...but I wouldn't for that reason inflate the number of blocks that
are reported as having failures.

YMMV, of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_basebackup misses to report checksum error

От
Ashwin Agrawal
Дата:
On Wed, May 6, 2020 at 3:02 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 6, 2020 at 5:48 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:
> If pg_basebackup is not able to read BLCKSZ content from file, then it
> just emits a warning "could not verify checksum in file "____" block
> X: read buffer size X and page size 8192 differ" currently but misses
> to error with "checksum error occurred". Only if it can read 8192 and
> checksum mismatch happens will it error in the end.

I don't think it's a good idea to conflate "hey, we can't checksum
this because the size is strange" with "hey, the checksum didn't
match". Suppose the a file has 1000 full blocks and a partial block.
All 1000 blocks have good checksums. With your change, ISTM that we'd
first emit a warning saying that the checksum couldn't be verified,
and then we'd emit a second warning saying that there was 1 checksum
verification failure, which would also be reported to the stats
system. I don't think that's what we want.

I feel the intent of reporting "total checksum verification failure" is to report corruption. Which way is the secondary piece of the puzzle. Not being able to read checksum itself to verify is also corruption and is checksum verification failure I think. WARNINGs will provide fine grained clarity on what type of checksum verification failure it is, so I am not sure we really need fine grained clarity in "total numbers" to differentiate these two types.

Not reporting anything to the stats system and at end reporting there is checksum failure would be more weird, right? Or will say ERRCODE_DATA_CORRUPTED with some other message and not checksum verification failure.

There might be an argument
for making this code trigger...

        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("checksum verification failure during base backup")));

...but I wouldn't for that reason inflate the number of blocks that
are reported as having failures.

When checksum verification is turned on and the issue is detected, I strongly feel ERROR must be triggered as silently reporting success doesn't seem right.
I can introduce one more variable just to capture and track files with such cases. But will we report them separately to stats? How? Also, do we want to have separate WARNING for the total number of files in this category? Those all seem slight complications but if wind is blowing in that direction, I am ready to fly that way.

Re: pg_basebackup misses to report checksum error

От
Stephen Frost
Дата:
Greetings,

* Ashwin Agrawal (aagrawal@pivotal.io) wrote:
> On Wed, May 6, 2020 at 3:02 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Wed, May 6, 2020 at 5:48 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:
> > > If pg_basebackup is not able to read BLCKSZ content from file, then it
> > > just emits a warning "could not verify checksum in file "____" block
> > > X: read buffer size X and page size 8192 differ" currently but misses
> > > to error with "checksum error occurred". Only if it can read 8192 and
> > > checksum mismatch happens will it error in the end.
> >
> > I don't think it's a good idea to conflate "hey, we can't checksum
> > this because the size is strange" with "hey, the checksum didn't
> > match". Suppose the a file has 1000 full blocks and a partial block.
> > All 1000 blocks have good checksums. With your change, ISTM that we'd
> > first emit a warning saying that the checksum couldn't be verified,
> > and then we'd emit a second warning saying that there was 1 checksum
> > verification failure, which would also be reported to the stats
> > system. I don't think that's what we want.
>
> I feel the intent of reporting "total checksum verification failure" is to
> report corruption. Which way is the secondary piece of the puzzle. Not
> being able to read checksum itself to verify is also corruption and is
> checksum verification failure I think. WARNINGs will provide fine grained
> clarity on what type of checksum verification failure it is, so I am not
> sure we really need fine grained clarity in "total numbers" to
> differentiate these two types.

Are we absolutely sure that there's no way for a partial block to end up
being seen by pg_basebackup, which is just doing routine filesystem
read() calls, during normal operation though..?  Across all platforms?

We certainly don't want to end up reporting a false positive by saying
that there's been corruption when it was just that the file was getting
extended and a read() happened to catch an incomplete write(), or
something along those lines.

Thanks,

Stephen

Вложения

Re: pg_basebackup misses to report checksum error

От
Ashwin Agrawal
Дата:
On Thu, May 7, 2020 at 10:25 AM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* Ashwin Agrawal (aagrawal@pivotal.io) wrote:
> On Wed, May 6, 2020 at 3:02 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Wed, May 6, 2020 at 5:48 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:
> > > If pg_basebackup is not able to read BLCKSZ content from file, then it
> > > just emits a warning "could not verify checksum in file "____" block
> > > X: read buffer size X and page size 8192 differ" currently but misses
> > > to error with "checksum error occurred". Only if it can read 8192 and
> > > checksum mismatch happens will it error in the end.
> >
> > I don't think it's a good idea to conflate "hey, we can't checksum
> > this because the size is strange" with "hey, the checksum didn't
> > match". Suppose the a file has 1000 full blocks and a partial block.
> > All 1000 blocks have good checksums. With your change, ISTM that we'd
> > first emit a warning saying that the checksum couldn't be verified,
> > and then we'd emit a second warning saying that there was 1 checksum
> > verification failure, which would also be reported to the stats
> > system. I don't think that's what we want.
>
> I feel the intent of reporting "total checksum verification failure" is to
> report corruption. Which way is the secondary piece of the puzzle. Not
> being able to read checksum itself to verify is also corruption and is
> checksum verification failure I think. WARNINGs will provide fine grained
> clarity on what type of checksum verification failure it is, so I am not
> sure we really need fine grained clarity in "total numbers" to
> differentiate these two types.

Are we absolutely sure that there's no way for a partial block to end up
being seen by pg_basebackup, which is just doing routine filesystem
read() calls, during normal operation though..?  Across all platforms?

Okay, that's a good point, I didn't think about it. This comment to skip verifying checksum, I suppose convinces, can't be sure and hence can't report partial blocks as corruption.

/*
 * Only check pages which have not been modified since the
  * start of the base backup. Otherwise, they might have been
  * written only halfway and the checksum would not be valid.
  * However, replaying WAL would reinstate the correct page in
  * this case. We also skip completely new pages, since they
  * don't have a checksum yet.
  */

Might be nice to have a similar comment for the partial block case to document why we can't report it as corruption. Thanks.