Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Andres Freund
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Msg-id: 20211111014325.sc2s4pzx2bbnslkn@alap3.anarazel.de
In reply to: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
List: pgsql-bugs
Hi,

On 2021-11-10 17:19:14 -0800, Peter Geoghegan wrote:
> On Wed, Nov 10, 2021 at 4:57 PM Andres Freund <andres@anarazel.de> wrote:
> > Yes. I don't think it's problematic right now, because the redirect would, I
> > think, in all cases have to point to the chain element before those tuples,
> > because the preceding value would just have to be DELETE_IN_PROGRESS, which
> > we don't follow in heap_prune_chain().
> 
> Actually, I was more worried about versions before Postgres 14 here --
> versions that don't have that new DELETE_IN_PROGRESS pruning behavior
> you mentioned.

I think we're actually saying the same thing. I.e. that we, so far, don't have
a concrete reason to worry about pre-14 versions.


> > > In other words: what if the aforementioned "aborted HOT updates"
> > > precheck code doesn't deal with a DEAD tuple, imagining that it's not
> > > a relevant tuple, while at the same time the later HOT-chain-chasing
> > > code *also* doesn't get to the tuple? What if they each assume that
> > > the other will/has taken care of it, due to a race?
> >
> > Then we'd just end up not pruning the tuple, I think. Which should be fine, as
> > it could only happen for fairly new tuples.
> 
> It's pretty far from ideal if the situation cannot correct itself in a
> timely manner.

Wouldn't it be corrected immediately when called from vacuum, due to the DEAD
check? AFAICT this is a one-way ratchet, so the next prune is guaranteed to
get it?


> Offhand I wonder if this might have something to do with remaining suspected
> cases where we restart pruning during VACUUM, based on the belief that an
> inserter concurrently aborted. If the DEAD tuple is unreachable by pruning
> *forever*, then we're in trouble.

But why would it be unreachable forever? The next hot prune will see
!HeapTupleHeaderIsHotUpdated() and process it via the "If the tuple is DEAD
and doesn't chain to anything else" block?
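
A toy Python model of the ratchet argument may help. It is a hypothetical sketch, not PostgreSQL code: ToyStatus, ToyTuple, advance() and prune() are made-up names, hot_updated merely stands in for HeapTupleHeaderIsHotUpdated(), and unused for LP_UNUSED. It assumes, as the argument does, that the status a prune pass computes for a tuple only ever moves towards DEAD, and it models only the "DEAD and doesn't chain to anything else" case.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class ToyStatus(IntEnum):
    """Hypothetical, simplified vacuum status; the ordering matters."""
    LIVE = 0
    RECENTLY_DEAD = 1
    DEAD = 2


@dataclass
class ToyTuple:
    status: ToyStatus
    hot_updated: bool = False  # stand-in for HeapTupleHeaderIsHotUpdated()
    unused: bool = False       # stand-in for LP_UNUSED


def advance(t: ToyTuple, new: ToyStatus) -> None:
    """One-way ratchet: the status can only move towards DEAD."""
    if new > t.status:
        t.status = new


def prune(page: list[ToyTuple]) -> int:
    """Model only the 'DEAD and doesn't chain to anything else' case."""
    removed = 0
    for t in page:
        if not t.unused and t.status == ToyStatus.DEAD and not t.hot_updated:
            t.unused = True
            removed += 1
    return removed


page = [ToyTuple(ToyStatus.LIVE), ToyTuple(ToyStatus.RECENTLY_DEAD)]
first = prune(page)                   # item 1 is not DEAD yet: nothing removed
advance(page[1], ToyStatus.DEAD)      # its status later ratchets to DEAD
advance(page[1], ToyStatus.LIVE)      # ignored: it cannot move backwards
second = prune(page)                  # the next pass picks it up
print(first, second, page[1].unused)  # 0 1 True
```

In this model a tuple missed by one pass is never lost, only deferred, which is the property the question relies on.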


> You probably noticed that my patch does *not* have the same
> "!HeapTupleHeaderIsHotUpdated()" check for these
> unreachable-via-HOT-chain heap-only tuples. The mere fact that they're
> unreachable is a good enough reason to consider them DEAD (and so mark
> them LP_UNUSED). We do check heap_prune_satisfies_vacuum() for these
> tuples too, but that is theoretically unnecessary. This feels pretty
> watertight to me -- contradictory interpretations of what a heap-only
> tuple is, and of which HOT chain (if any) it belongs to, now seem impossible.

Hm. I guess you *have* to actually process them regardless of
!HeapTupleHeaderIsHotUpdated() with the new approach, because otherwise we'd
potentially process only the tail item of a disconnected chain in each
heap_hot_prune() call? Although that case might be unreachable, because the
HTSV calls likely break the chain in all disconnected cases (though perhaps
not with multixacts, since there could be remaining lockers).
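
For illustration only, a toy Python model of the reachability-based approach discussed above. This is not the code from the patch: ToyItem, reachable(), prune_unreachable(), make_page() and require_tail are hypothetical names. The model assumes a page is a plain list of items, treats every non-heap-only item as a chain root (ignoring redirects), and omits the heap_prune_satisfies_vacuum() check. Setting require_tail=True mimics additionally requiring !HeapTupleHeaderIsHotUpdated(), under which a single pass removes only the tail of a disconnected chain.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyItem:
    heap_only: bool             # stand-in for HeapTupleHeaderIsHeapOnly()
    next: Optional[int] = None  # successor in the HOT chain, if HOT-updated
    unused: bool = False        # stand-in for LP_UNUSED

    @property
    def hot_updated(self) -> bool:
        return self.next is not None


def reachable(page: list[ToyItem]) -> set[int]:
    """Offsets reachable by walking HOT chains from root (non-heap-only) items."""
    seen: set[int] = set()
    for off, item in enumerate(page):
        if item.unused or item.heap_only:
            continue
        cur: Optional[int] = off
        while cur is not None and cur not in seen:
            seen.add(cur)
            nxt = page[cur].next
            # Only follow the chain into heap-only members.
            cur = nxt if nxt is not None and page[nxt].heap_only else None
    return seen


def prune_unreachable(page: list[ToyItem], require_tail: bool = False) -> list[int]:
    """Mark every unreachable heap-only item unused; return their offsets.

    With require_tail=True, also skip items that are still HOT-updated,
    mimicking an extra "!HeapTupleHeaderIsHotUpdated()" filter.
    """
    seen = reachable(page)
    removed = []
    for off, item in enumerate(page):
        if item.unused or not item.heap_only or off in seen:
            continue
        if require_tail and item.hot_updated:
            continue
        item.unused = True
        removed.append(off)
    return removed


def make_page() -> list[ToyItem]:
    # Offset 0 is a root whose chain continues to heap-only offset 1.
    # Offsets 2 -> 3 -> 4 form a heap-only chain that no root points to.
    return [
        ToyItem(heap_only=False, next=1),
        ToyItem(heap_only=True),
        ToyItem(heap_only=True, next=3),
        ToyItem(heap_only=True, next=4),
        ToyItem(heap_only=True),
    ]


print(prune_unreachable(make_page()))                     # [2, 3, 4]
print(prune_unreachable(make_page(), require_tail=True))  # [4]
```

In this model the unconditional variant clears the whole disconnected chain in one pass, while the filtered variant only gets its tail.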

Greetings,

Andres Freund


