Re: Online verification of checksums

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Online verification of checksums
Дата
Msg-id 1daf5fb0-adee-2a65-772a-579cbdd063c0@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Online verification of checksums  (Stephen Frost <sfrost@snowman.net>)
Ответы Re: Online verification of checksums  (Stephen Frost <sfrost@snowman.net>)
Список pgsql-hackers
On 09/17/2018 04:46 PM, Stephen Frost wrote:
> Greetings,
> 
> * Michael Banck (michael.banck@credativ.de) wrote:
>> On Mon, Sep 03, 2018 at 10:29:18PM +0200, Tomas Vondra wrote:
>>> Obviously, pg_verify_checksums can't do that easily because it's
>>> supposed to run from outside the database instance. 
>>
>> It reads pg_control anyway, so couldn't we just take
>> ControlFile->checkPoint?
>>
>> Other than that, basebackup.c seems to only look at pages which haven't
>> been changed since the backup starting checkpoint (see above if
>> statement). That's reasonable for backups, but is it just as reasonable
>> for online verification?
> 
> Right, basebackup doesn't need to look at other pages.
> 
>>> The pg_verify_checksum apparently ignores this skip logic, because on
>>> the retry it simply re-reads the page again, verifies the checksum and
>>> reports an error. Which is broken, because the newly read page might be
>>> torn again due to a concurrent write.
>>
>> Well, ok.
> 
> The newly read page will have an updated LSN though then on the re-read,
> in which case basebackup can know that what happened was a rewrite of
> the page and it no longer has to care about the page and can skip it.
> 
> I haven't looked, but if basebackup isn't checking the LSN again for the
> newly read page then that'd be broken, but I believe it does (at least,
> that's the algorithm we came up with for pgBackRest, and I know David
> shared that when the basebackup code was being written).
> 

Yes, basebackup does check the LSN on re-read, and skips the page if it
changed on re-read (because it eliminates the consistency guarantees
provided by the checkpoint).

>>> So IMHO this should do something similar to basebackup - check the page
>>> LSN, and if it changed then skip the page.
>>>
>>> I'm afraid this requires using the last checkpoint LSN, the way startptr
>>> is used in basebackup. In particular we can't simply remember LSN from
>>> the first read, because we might actually read [B1,A2] on the first try,
>>> and then [B1,B2] or [B1,C2] on the retry. (Actually, the page may be
>>> torn in various other ways, not necessarily at the 4kB boundary - it
>>> might be torn right after the LSN, for example).
>>
>> I'd prefer to come up with a plan where we don't just give up once we
>> see a new LSN, if possible. If I run a modified pg_verify_checksums
>> which skips on newer pages in a tight benchmark, basically everything
>> gets skipped as checkpoints don't happen often enough.
> 
> I'm really not sure how you expect to be able to do something different
> here.  Even if we started poking into shared buffers, all you'd be able
> to see is that there's a bunch of dirty pages- and we don't maintain the
> checksums in shared buffers, so it's not like you could verify them
> there.
> 
> You could possibly have an option that says "force a checkpoint" but,
> honestly, that's really not all that interesting either- all you'd be
> doing is forcing all the pages to be written out from shared buffers
> into the kernel cache and then reading them from there instead, it's not
> like you'd actually be able to tell if there was a disk/storage error
> because you'll only be looking at the kernel cache.
> 

Yeah.

>> So how about we do check every page, but if one fails on retry, and the
>> LSN is newer than the checkpoint, we then skip it? Is that logic sound?
> 
> I thought that's what basebackup did- if it doesn't do that today, then
> it really should.
> 

The crucial distinction here is that the trick is not in comparing LSNs
from the two page reads, but comparing it to the checkpoint LSN. If it's
greater, the page may be torn or broken, and there's no way to know
which case it is - so basebackup simply skips it.

>> In any case, if we decide we really should skip the page if it is newer
>> than the checkpoint, I think it makes sense to track those skipped pages
>> and print their number out at the end, if there are any.
> 
> Not sure what the point of this is.  If we wanted to really do something
> to cross-check here, we'd track the pages that were skipped and then
> look through the WAL to make sure that they're there.  That's something
> we've talked about doing with pgBackRest, but don't currently.
> 

I agree simply printing the page numbers seems rather useless. What we
could do is remember which pages we skipped and then try checking them
after another checkpoint. Or something like that.

>>> FWIW I also don't understand the purpose of pg_sleep(), it does not seem
>>> to protect against anything, really.
>>
>> Well, I've noticed that without it I get sporadic checksum failures on
>> reread, so I've added it to make them go away. It was certainly a
>> phenomenological decision that I am happy to trade for a better one.
> 
> That then sounds like we really aren't re-checking the LSN, and we
> really should be, to avoid getting these sporadic checksum failures on
> reread..
> 

Again, it's not enough to check the LSN against the preceding read. We
need a checkpoint LSN or something like that.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Jonathan S. Katz"
Дата:
Сообщение: Re: Stored procedures and out parameters
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Stored procedures and out parameters