Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Дата
Msg-id CAH2-WzmGdE_ccJCoTxFik_opCdJrYxi76=xEfROP-6WnmrftuQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
Список pgsql-bugs
On Fri, Mar 18, 2022 at 7:53 AM Andres Freund <andres@anarazel.de> wrote:
> > Agreed. This is why I find your emphasis on as-recent-as-possible
> > initial horizons confusing. It seems as if you're giving almost equal
> > emphasis to both issues, even though one issue (more eager pruning
> > during VACUUM) is of obvious practical value, while the other
> > (as-recent-as-possible initial horizons) is pretty theoretical and
> > academic -- at best.
>
> I'm not intending to weigh them the same. I think using a more recent horizon
> is more important than you describe here, but "computed horizons" will be a
> considerably larger win.

Okay, got it.

My relfrozenxid and freezing patch series patch v10-0002-* took a new
approach to avoiding the visibility map aggressive VACUUM special
case, that seems like it might be relevant here (surprisingly enough).
The latest version totally embraces making all visibility map access
inside lazy_scan_heap use local state describing a range of skippable
blocks. Not that different, really, but still something that hints at
an interesting future direction.

What if we built a palloc'd array of these ranges up front, right
after establishing OldestXmin? We could even build a complete picture
of the entire table before we'd even scanned the first block. I don't
think it would require too much space. We'd then be operating against
a "snapshot of the visibility map just after OldestXmin was
established", even in a multi-hour VACUUM operation. This is kind of a
move in the opposite direction, but it's not necessarily the wrong
direction; at least this way we're not going to dirty concurrently
modified pages from the largest tables, just to set hint bits or clean
up one or two concurrently aborted transactions.

I also think that this would help with your aio work. Having a
palloc'd array like this would be an ideal target for prefetching. I
actually checked what you were doing here. Evidently the visibility
map stuff at the top of lazy_scan_heap was already a problem for aio:


https://github.com/anarazel/postgres/blob/29d3a3a3ddd47906d9128bc68c218664ab15a2c6/src/backend/access/heap/vacuumlazy.c#L927

I'm posting a more refined version of just that patch here, since it's
closer to the ideal for this kind of thing; it takes all of the
decisions away from lazy_scan_heap, which need only execute skipping
by following instructions built by the new helper function.

--
Peter Geoghegan

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: snapshot recovery conflict despite hot_standby_feedback set to on