Re: [PATCH] Verify Checksums during Basebackups

Поиск
Список
Период
Сортировка
От Michael Banck
Тема Re: [PATCH] Verify Checksums during Basebackups
Дата
Msg-id 1521759951.15036.21.camel@credativ.de
обсуждение исходный текст
Ответ на Re: [PATCH] Verify Checksums during Basebackups  (Magnus Hagander <magnus@hagander.net>)
Список pgsql-hackers
Hi Magnus,

thanks a lot for looking at my patch!

Am Donnerstag, den 22.03.2018, 15:07 +0100 schrieb Magnus Hagander:
> On Sat, Mar 17, 2018 at 10:34 PM, Michael Banck <michael.banck@credativ.de> wrote:
> > On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:
> > > Possibly open questions:
> > >
> > > 1. I have not so far changed the replication protocol to make verifying
> > > checksums optional. I can go about that next if the consensus is that we
> > > need such an option (and cannot just check it everytime)?
> > 
> > I think most people (including those I had off-list discussions about
> > this with) were of the opinion that such an option should be there, so I
> > added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
> > replication command and also an option -k / --no-verify-checksums to
> > pg_basebackup to trigger this.
> > 
> > Updated patch attached
> > 
> 
> Notes:
> 
> +   if (checksum_failure == true)
> 
> Really, just if(checksum_failure)
> 
> +           errmsg("checksum mismatch during basebackup")));
> 
> Should be "base backup" in messages.

I've changed both.

> +static const char *skip[] = {
> 
> I think that needs a much better name than just "skip". Skip for what?
> In particular since we are just skipping it for checksums, and not for
> the actual basebackup, that name is actively misinforming.

I have copied that verbatim from the online checksum patch, but of
course this is in src/backend/replication and not src/bin so warrants
more scrutiny. If you plan to commit both for v11, it might make sense
to have that separated out to a more central place?

But I guess what we mean is a test for "is a heap file". Do you have a
good suggestion where it should end up so that pg_verify_checksums can
use it as well?

In the meantime, I've changed the skip[] array to no_heap_files[] and
the skipfile() function to is_heap_file(), also reversing the logic. If
it helps pg_verify_checksums, we could make is_not_a_heap_file()
instead.

> +   filename = basename(pstrdup(readfilename));
> +   if (!noverify_checksums && DataChecksumsEnabled() &&
> +       !skipfile(filename) &&
> +       (strncmp(readfilename, "./global/", 9) == 0 ||
> +       strncmp(readfilename, "./base/", 7) == 0 ||
> +       strncmp(readfilename, "/", 1) == 0))
> +       verify_checksum = true;
> 
> I would include the checks for global, base etc into the skipfile()
> function as well (also renamed).

Check. I had to change the way (the previous) skipfile() works a bit,
because it was expecting a filename as argument, while we check
pathnames in the above.

> +                * Only check pages which have not been modified since the
> +                * start of the base backup.
> 
> I think this needs a description of why, as well (without having read
> this thread, this is a pretty subtle case).

I tried to expand on this some more.

> +system_or_bail 'dd', 'conv=notrunc', 'oflag=seek_bytes', 'seek=4000', 'bs=9', 'count=1', 'if=/dev/zero',
"of=$pgdata/$pg_class";
> 
> This part of the test will surely fail on Windows, not having a
> /dev/zero. Can we easily implement this part natively in perl perhaps?

Right, this was one of the open questions. I now came up with a perl 4-
liner that seems to do the trick, but I can't test it on Windows.

> Most of that stuff is trivial and can be cleaned up at commit time. Do
> you want to send an updated patch with a few of those fixes, or should
> I clean it?

I've attached a new patch, but I have not addressed the question whether
skipfile()/is_heap_file() should be moved somewhere else yet.

I found one more cosmetic issue: if there is an external tablespace, and
pg_basebackup encounters corruption, you would get the message
"pg_basebackup: changes to tablespace directories will not be undone"
from cleanup_directories_atexit(), which I now also suppress in case of
checksum failures.

> The test thing is a stopper until we figure that one out though. And
> while at it -- it seems we don't have any tests for the checksum
> feature in general. It would probably make sense to consider that at
> the same time as figuring out the right way to do this one.

I don't want to deflect work, but it seems to me the online checksums
patch would be in a better position to generally test checksums while
it's at it. Or did you mean something related to pg_basebackup?



Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dmitry Dolgov
Дата:
Сообщение: Re: [HACKERS] [PATCH] Generic type subscripting
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: [HACKERS] MERGE SQL Statement for PG11