Re: Online verification of checksums

From Stephen Frost
Subject Re: Online verification of checksums
Date
Msg-id 20190318073403.GL6197@tamriel.snowman.net
In response to Re: Online verification of checksums  (Michael Banck <michael.banck@credativ.de>)
Responses Re: Online verification of checksums
List pgsql-hackers
Greetings,

* Michael Banck (michael.banck@credativ.de) wrote:
> Am Montag, den 18.03.2019, 02:38 -0400 schrieb Stephen Frost:
> > * Michael Paquier (michael@paquier.xyz) wrote:
> > > On Mon, Mar 18, 2019 at 01:43:08AM -0400, Stephen Frost wrote:
> > > > To be clear, I agree completely that we don't want to be reporting false
> > > > positives or "this might mean corruption!" to users running the tool,
> > > > but I haven't seen a good explaination of why this needs to involve the
> > > > server to avoid that happening.  If someone would like to point that out
> > > > to me, I'd be happy to go read about it and try to understand.
> > >
> > > The mentions on this thread that the server has all the facility in
> > > place to properly lock a buffer and make sure that a partial read
> > > *never* happens and that we *never* have any kind of false positives,
> >
> > Uh, we are, of course, going to have partial reads- we just need to
> > handle them appropriately, and that's not hard to do in a way that we
> > never have false positives.
>
> I think the current patch (V13 from https://www.postgresql.org/message-i
> d/1552045881.4947.43.camel@credativ.de) does that, modulo possible bugs.

I think the question here is- do you ever see false positives with this
latest version..?  If you are, then that's an issue and we should
discuss and try to figure out what's happening.  If you aren't seeing
false positives, then it seems like we're done here, right?

> > I do not understand, at all, the whole sub-thread argument that we have
> > to avoid partial reads.  We certainly don't worry about that when doing
> > backups, and I don't see why we need to avoid it here.  We are going to
> > have partial reads- and that's ok, as long as it's because we're at the
> > end of the file, and that's easy enough to check by just doing another
> > read to see if we get back zero bytes, which indicates we're at the end
> > of the file, and then we move on, no need to coordinate anything with
> > the backend for this.
>
> Well, I agree with you, but we don't seem to have consensus on that.

I feel like everyone is concerned that we'd report an acceptable partial
read as a failure, hence it would be a false positive, and I agree
entirely that we don't want false positives, but the answer to that
seems to be that we shouldn't report partial reads as failures, solving
the problem in a simple way that doesn't involve the server and doesn't
materially reduce the check that's being performed.

> > > directly preventing the set of issues we are trying to implement
> > > workarounds for in a frontend tool are rather good arguments in my
> > > opinion (you can grep for BufferDescriptorGetIOLock() on this thread
> > > for example).
> >
> > Sure the backend has those facilities since it needs to, but these
> > frontend tools *don't* need that to *never* have any false positives, so
> > why are we complicating things by saying that this frontend tool and the
> > backend have to coordinate?
> >
> > If there's an explanation of why we can't avoid having false positives
> > in the frontend tool, I've yet to see it.  I definitely understand that
> > we can get partial reads, but a partial read isn't a failure, and
> > shouldn't be reported as such.
>
> It is not in the current patch, it should just get reported as a skipped
> block in the end.  If the cluster is online that is, if it is offline,
> we do consider it a failure.

Ok, that sounds fine- and do we ever see false positives now?

> I have now rebased that patch on top of the pg_verify_checksums ->
> pg_checksums renaming, see attached.

Thanks for that.  Reading through the code though, I don't entirely
understand why we're making things complicated for ourselves by trying
to seek and re-read the entire block, specifically this:

>          if (r != BLCKSZ)
>          {
> -            fprintf(stderr, _("%s: could not read block %u in file \"%s\": read %d of %d\n"),
> -                    progname, blockno, fn, r, BLCKSZ);
> -            exit(1);
> +            if (online)
> +            {
> +                if (block_retry)
> +                {
> +                    /* We already tried once to reread the block, skip to the next block */
> +                    skippedblocks++;
> +                    if (lseek(f, BLCKSZ-r, SEEK_CUR) == -1)
> +                    {
> +                        skippedfiles++;
> +                        fprintf(stderr, _("%s: could not lseek to next block in file \"%s\": %m\n"),
> +                                progname, fn);
> +                        return;
> +                    }
> +                    continue;
> +                }
> +
> +                /*
> +                 * Retry the block. It's possible that we read the block while it
> +                 * was extended or shrunk, so it ends up looking torn to us.
> +                 */
> +
> +                /*
> +                 * Seek back by the amount of bytes we read to the beginning of
> +                 * the failed block.
> +                 */
> +                if (lseek(f, -r, SEEK_CUR) == -1)
> +                {
> +                    skippedfiles++;
> +                    fprintf(stderr, _("%s: could not lseek in file \"%s\": %m\n"),
> +                            progname, fn);
> +                    return;
> +                }
> +
> +                /* Set flag so we know a retry was attempted */
> +                block_retry = true;
> +
> +                /* Reset loop to validate the block again */
> +                blockno--;
> +
> +                continue;
> +            }

I would think that we could just do:

  insert_location = 0;
  r = read(BLCKSIZE - insert_location);
  if (r < 0) error();
  if (r == 0) EOF detected, move to next
  if (r < (BLCKSIZE - insert_location)) {
    insert_location += r;
    continue;
  }

At this point, we should have a full block, do our checks...

Have you seen cases where the kernel will actually return a partial read
for something that isn't at the end of the file, and where you could
actually lseek past that point and read the next block?  I'd be really
curious to see that if you can reproduce it...  I've definitely seen
empty pages come back with a claim that the full amount was read, but
that's a very different thing.

Obviously the same goes for anywhere else we're trying to handle a
partial read return.

Thanks!

Stephen

Attachments
