Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From Peter Geoghegan
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date
Msg-id CAH2-Wzm63tpX3o81X_J6Do8ZX63YTuwmi=G1gJifdgv0Ruf4CA@mail.gmail.com
In reply to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
On Wed, Nov 10, 2021 at 4:57 PM Andres Freund <andres@anarazel.de> wrote:
> Yes. I don't think it's problematic right now, because the redirect would, I
> think, in all cases have to point to the chain element before those tuples,
> because the preceding value would just have to be DELETE_IN_PROGRESS, which
> we don't follow in heap_prune_chain().

Actually, I was more worried about versions before Postgres 14 here --
versions that don't have that new DELETE_IN_PROGRESS pruning behavior
you mentioned.

> > I'm asking because I notice that the fragile "We need this primarily
> > to handle aborted HOT updates" precheck for
> > HeapTupleHeaderIsHeapOnly() doesn't just check if the heap-only tuple
> > is DEAD before deciding to mark it LP_UNUSED. It also checks
> > HeapTupleHeaderIsHotUpdated() against the target tuple -- that's
> > another condition of the tuple being marked unused. Of course, whether
> > or not a given tuple is considered HeapTupleHeaderIsHotUpdated() can
> > change from true to false when an updater concurrently aborts. Could
> > that have race conditions?
>
> I wondered about that too, but I couldn't *quite* come up with a problematic
> scenario, because I don't think any of the cases that can change
> HeapTupleHeaderIsHotUpdated() would have allowed the redirect to be set to
> a subsequent chain element.

It's pretty complicated.

> > In other words: what if the aforementioned "aborted HOT updates"
> > precheck code doesn't deal with a DEAD tuple, imagining that it's not
> > a relevant tuple, while at the same time the later HOT-chain-chasing
> > code *also* doesn't get to the tuple? What if they each assume that
> > the other will/has taken care of it, due to a race?
>
> Then we'd just end up not pruning the tuple, I think. Which should be fine, as
> it could only happen for fairly new tuples.

It's pretty far from ideal if the situation cannot correct itself in a
timely manner. Offhand I wonder if this might have something to do
with remaining suspected cases where we restart pruning during VACUUM,
based on the belief that an inserter concurrently aborted. If the DEAD
tuple is unreachable by pruning *forever*, then we're in trouble.

The good news is that the fix for missing DEAD tuples (a hypothetical
problem) is the same as the fix for the known, concrete problem:
process whole HOT chains first, and only later (in a "second pass"
over the page) process remaining heap-only tuples -- those heap-only
tuples that cannot be located any other way (because we tried and
failed).

You probably noticed that my patch does *not* have the same
"!HeapTupleHeaderIsHotUpdated()" check for these
unreachable-via-HOT-chain heap-only tuples. The mere fact that they're
unreachable is a good enough reason to consider them DEAD (and so mark
them LP_UNUSED). We do check heap_prune_satisfies_vacuum() for these
tuples too, but that is theoretically unnecessary. This feels pretty
watertight to me -- contradictory interpretations of which heap-only
tuple is in which HOT chain (if any) now seem impossible.
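
To make the two-pass idea concrete, here is a toy Python model. It is only a sketch under simplifying assumptions, not PostgreSQL code and not the patch itself: the names Item and prune_page, the boolean "dead" flag (a stand-in for a DEAD verdict from heap_prune_satisfies_vacuum()), and the string line pointer states are all invented for illustration. The first pass walks each HOT chain from its root item; the second pass treats any heap-only tuple that no chain walk reached as dead, without consulting anything like HeapTupleHeaderIsHotUpdated().

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Item:
    """Hypothetical stand-in for one heap tuple on a page."""
    heap_only: bool                 # True for a heap-only (HOT) tuple version
    dead: bool                      # stand-in for a DEAD visibility verdict
    next_off: Optional[int] = None  # offset of the next chain member, if any


def prune_page(page: Dict[int, Item]) -> Dict[int, str]:
    """Return a toy line pointer state for every offset on the page."""
    state = {off: "NORMAL" for off in page}
    visited = set()

    # Pass 1: process whole HOT chains, starting only from root items.
    for off in sorted(page):
        if page[off].heap_only:
            continue
        chain = []
        cur = off
        while cur is not None and cur in page and cur not in visited:
            visited.add(cur)
            chain.append(cur)
            cur = page[cur].next_off

        # Count the dead prefix of the chain.
        ndead = 0
        for member in chain:
            if not page[member].dead:
                break
            ndead += 1
        if ndead == 0:
            continue

        # Dead heap-only members are freed outright.
        for member in chain[1:ndead]:
            state[member] = "UNUSED"
        # The root either redirects to the first survivor or becomes DEAD.
        if ndead == len(chain):
            state[off] = "DEAD"
        else:
            state[off] = f"REDIRECT->{chain[ndead]}"

    # Pass 2: heap-only tuples that no chain walk reached. Being
    # unreachable is treated as sufficient reason to free them.
    for off in sorted(page):
        if page[off].heap_only and off not in visited:
            state[off] = "UNUSED"

    return state


if __name__ == "__main__":
    demo = {
        1: Item(heap_only=False, dead=True, next_off=2),
        2: Item(heap_only=True, dead=True, next_off=3),
        3: Item(heap_only=True, dead=False),
        4: Item(heap_only=True, dead=True),   # orphan, e.g. aborted HOT update
        5: Item(heap_only=False, dead=False),
    }
    for off, st in prune_page(demo).items():
        print(off, st)
```

In this model every heap-only tuple is handled exactly once, either as a member of a chain in the first pass or as an orphan in the second, so the two passes cannot each assume the other took care of it.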

I benchmarked the patch, and it looks like there is a consistent
reduction in latency -- which you probably won't find too surprising.
That isn't the goal, but it is nice to see that new performance
regressions are unlikely to be a problem for the bug fix.

-- 
Peter Geoghegan


