Re: Online verification of checksums

Поиск
Список
Период
Сортировка
От Stephen Frost
Тема Re: Online verification of checksums
Дата
Msg-id CAOuzzgrZyz_vgRjd8GrLen39B7KWXY-qTyobDE5nP2mSTtGOgQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Online verification of checksums  (Michael Banck <michael.banck@credativ.de>)
Список pgsql-hackers
Greetings,

On Tue, Mar 19, 2019 at 04:15 Michael Banck <michael.banck@credativ.de> wrote:
Am Montag, den 18.03.2019, 16:11 +0800 schrieb Stephen Frost:
> On Mon, Mar 18, 2019 at 15:52 Michael Banck <michael.banck@credativ.de> wrote:
> > Am Montag, den 18.03.2019, 03:34 -0400 schrieb Stephen Frost:
> > > Thanks for that.  Reading through the code though, I don't entirely
> > > understand why we're making things complicated for ourselves by trying
> > > to seek and re-read the entire block, specifically this:
> >
> > [...]
> >
> > > I would think that we could just do:
> > >
> > >   insert_location = 0;
> > >   r = read(BLCKSIZE - insert_location);
> > >   if (r < 0) error();
> > >   if (r == 0) EOF detected, move to next
> > >   if (r < (BLCKSIZE - insert_location)) {
> > >     insert_location += r;
> > >     continue;
> > >   }
> > >
> > > At this point, we should have a full block, do our checks...
> >
> > Well, we need to read() into some buffer which you have ommitted.
>
> Surely there’s a buffer the read in the existing code is passing in,
> you just need to offset by the current pointer, sorry for not being
> clear.
>
> In other words the read would look more like:
>
> read(fd,buf + insert_ptr, BUFSZ - insert_ptr)
>
> And then you have to reset insert_ptr once you have a full block.

Ok, thanks for clearing that up.

I've tried to do that now in the attached, does that suit you?

Yes, that’s what I was thinking.  I’m honestly not entirely convinced that the lseek() efforts still need to be put in- I would have thought it’d be fine to simply check the LSN on a checksum failure and mark it as skipped if the LSN is past the current checkpoint.  That seems like it would make things much simpler, but I’m also not against keeping that logic now that it’s in, provided it doesn’t cause issues

> Yes, the question was more like this: have you ever seen a read return
> a partial result when you know you’re in the middle somewhere of an
> existing file and the length of the file hasn’t been changed by
> something else..?

I don't think I've seen that, but that wouldn't turn up in regular
testing anyway I guess but only in pathological cases?  I guess we are
probably dealing with this in the current version of the patch, but I
can't say for certain as it sounds pretty difficult to test.

Yeah, a lot of things in this area are unfortunately difficult to test.  I’m glad to hear that it doesn’t sound like you’ve seen it though. 

I have also added a paragraph to the documentation about possilby
skipping new or recently updated pages:

+   If the cluster is online, pages that have been (re-)written since the last
+   checkpoint will not count as checksum failures if they cannot be read or
+   verified correctly.

I would flip this around:

——-
In an online cluster, pages are being concurrently written to the files while the check is being run, leading to possible torn pages or partial reads.  When the tool detects a concurrently written page, indicated by the page’s LSN being beyond the checkpoint the tool started at, that page will be reported as skipped.  Note that in a crash scenario, any pages written since the last checkpoint will be replayed from the WAL.
——-

Now here’s the $64 question- have you tested this latest version under load..?  If not, could you?  And when you do, can you report back what the results are?  Do you still see any actual checksum failures?  Do the number of skipped pages seem reasonable in your tests or is there a concern there?

If you still see actual checksum failures which aren’t because the LSN is higher than the checkpoint, or because of a short read, then we need to investigate further but hopefully that isn’t happening now.  I think a lot of the concerns raised on this thread about wanting to avoid false positives are because the torn page (with higher LSN than current checkpoint) and short read cases were previously reported as failures when they really are expected.  Let’s test this as much as we can and make sure we aren’t seeing false positives anymore.

Thanks!

Stephen

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Banck
Дата:
Сообщение: Re: Progress reporting for pg_verify_checksums
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: partitioned tables referenced by FKs