Re: Fixing a couple of buglets in how VACUUM sets visibility map bits

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Fixing a couple of buglets in how VACUUM sets visibility map bits
Дата
Msg-id CAH2-WzkBOWc1eXb6atLdRkaTsXbKKofG_-+SLxjdt472OerZBw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Fixing a couple of buglets in how VACUUM sets visibility map bits  (Andres Freund <andres@anarazel.de>)
Ответы Re: Fixing a couple of buglets in how VACUUM sets visibility map bits  (Peter Geoghegan <pg@bowt.ie>)
Re: Fixing a couple of buglets in how VACUUM sets visibility map bits  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Mon, Jan 9, 2023 at 11:57 AM Andres Freund <andres@anarazel.de> wrote:
> Afaict we'll need to backpatch this all the way?

I thought that we probably wouldn't need to, at first. But I now think
that we really have to.

I didn't realize that affected visibilitymap_set() calls could
generate useless set-VM WAL records until you pointed it out. That's
far more likely to happen than the race condition that I described --
it has nothing at all to do with concurrency. That's what clinches it
for me.

> > One of the calls to visibilitymap_set() during VACUUM's initial heap
> > pass could unset a page's all-visible bit during the process of setting
> > the same page's all-frozen bit.
>
> As just mentioned upthread, this just seems wrong.

I don't know why this sentence ever made sense to me. Anyway, it's not
important now.

> Do you have a reproducer for this?

No, but I'm quite certain that the race can happen.

If it's important to have a reproducer then I can probably come up
with one. I could likely figure out a way to write an isolation test
that reliably triggers the issue. It would have to work by playing
games with cleanup lock/buffer pin waits, since that's the only thing
that the test can hook into to make things happen in just the
right/wrong order.

> >                       elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation
\"%s\"page %u",
 
> >                                vacrel->relname, blkno);
>
> Hm. The message gets a bit less accurate with the change. Perhaps OK? OTOH, it
> might be useful to know what bit was wrong when debugging problems.

Theoretically it might change again, if we call
visibilitymap_get_status() again. Maybe I should just broaden the
error message a bit instead?

> I still think such changes are inappropriate for a bugfix, particularly one
> that needs to be backpatched.

I'll remove the changes that are inessential in the next revision. I
wouldn't have done it if I'd fully understood the seriousness of the
issue from the start.

If you're really concerned about diff size then I should point out
that the changes to lazy_vacuum_heap_rel() aren't strictly necessary,
and probably shouldn't be backpatched. I deemed that in scope because
it's part of the same overall problem of updating the visibility map
based on potentially stale information. It makes zero sense to check
with the visibility map before updating it when we already know that
the page is all-visible. I mean, are we trying to avoid the work of
needlessly updating the visibility map in cases where its state was
corrupt, but then became uncorrupt (relative to the heap page) by
mistake?

--
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Fixing a couple of buglets in how VACUUM sets visibility map bits
Следующее
От: Melanie Plageman
Дата:
Сообщение: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)