Re: pgsql: Validate page level checksums in base backups

From: Magnus Hagander
Subject: Re: pgsql: Validate page level checksums in base backups
Msg-id: CABUevEz0uBqg_uyfA-yFiL6Wo=kjn5kE9tEQrJ-vwWug9Vrwfw@mail.gmail.com
In reply to: Re: pgsql: Validate page level checksums in base backups  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: pgsql: Validate page level checksums in base backups  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: pgsql: Validate page level checksums in base backups  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: pgsql: Validate page level checksums in base backups  (Michael Banck <michael.banck@credativ.de>)
Re: pgsql: Validate page level checksums in base backups  (Michael Banck <michael.banck@credativ.de>)
List: pgsql-hackers


On Tue, Apr 3, 2018 at 8:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > Yeah, there's clearly a second problem here.
>
> I think this test script is broken in many ways.
>
> It's scribbling on the source cluster's disk files and assuming that that
> translates one-for-one to what gets sent to the slave server --- but what
> if some of the blocks that it modifies on-disk are resident in the
> source's shared buffers?  I think you'd have to shut down the source and
> then apply the corruption if you want stable results.

It doesn't actually use a slave server as part of the tests.

And base backups don't read from the source's shared buffers, but they *do* read from the kernel buffers.


> I'd bet a good lunch that nondefault BLCKSZ would break it, as well,
> since the way in which the corruption is induced is just guessing
> as to where page boundaries are.

Yeah, that might be a problem. Those should be calculated from the block size.
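
As a rough illustration of calculating the offsets from the block size instead of hard-coding them, here is a minimal Python sketch. The test itself is not written in Python, `page_offset` and `pages_in_file` are hypothetical helper names, and the block size is assumed to come from the server (for example via `SHOW block_size`):

```python
def page_offset(block_number: int, block_size: int, within_page: int = 0) -> int:
    """Byte offset, inside a relation file, of a position within a given block.

    block_size is assumed to be the server's BLCKSZ, not a hard-coded 8192.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if block_number < 0:
        raise ValueError("block_number must not be negative")
    if not 0 <= within_page < block_size:
        raise ValueError("within_page must fall inside the block")
    return block_number * block_size + within_page


def pages_in_file(file_size: int, block_size: int) -> int:
    """Number of whole pages contained in a file of file_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return file_size // block_size
```

With helpers like these, the position to corrupt follows the page boundaries for any BLCKSZ rather than relying on a guess that only holds for the default.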


> Also, scribbling on tables as sensitive as pg_class is just asking for
> trouble IMO.  I don't see anything in this test, for example, that
> prevents autovacuum from running and causing a PANIC before the test
> can complete.  Even with AV off, there's a good chance that clobber-
> cache-always animals will fall over because they do so many more
> physical accesses to the system catalogs.  I'd suggest inducing the
> corruption in some user table(s) that we can more tightly constrain
> the source server's accesses to.

Yeah, that seems like a good idea. And probably also shut the server down while writing the corruption, just in case.
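
The ordering described here (stop the server, corrupt a user table's file, start the server again) could be sketched roughly as follows. This is an illustration only, not the actual test code: `server_stopped`, `corrupt_block`, and the `stop`/`start` callables are hypothetical, and the bytes written are arbitrary garbage placed in the middle of one block:

```python
import os
from contextlib import contextmanager


@contextmanager
def server_stopped(stop, start):
    """Run the body while the server is down, then start it again.

    stop and start are caller-supplied callables (hypothetical), for
    example thin wrappers around whatever the test harness provides.
    """
    stop()
    try:
        yield
    finally:
        start()


def corrupt_block(path, block_number, block_size, garbage=b"\xde\xad\xbe\xef"):
    """Overwrite a few bytes in the middle of one block of a relation file."""
    if (block_number + 1) * block_size > os.path.getsize(path):
        raise ValueError("block lies beyond the end of the file")
    offset = block_number * block_size + block_size // 2
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(garbage)
        f.flush()
        os.fsync(f.fileno())  # make sure the change reaches the file
```

A caller would wrap the write as `with server_stopped(stop, start): corrupt_block(path, 0, block_size)`, so that nothing in the server can touch the table's file while it is being modified.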

I'll put looking into that on my todo list for when I'm back, unless someone beats me to it. Michael, do you want to take a stab at it?

//Magnus
