Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Andres Freund
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date:
Msg-id: 20211112222919.e7fkfpbpcoje6hsj@alap3.anarazel.de
In reply to: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-bugs

Hi,

On 2021-11-12 13:11:54 -0800, Peter Geoghegan wrote:
> It also addresses the separate issue of DEAD vs RECENTLY_DEAD
> disconnected tuples -- that was the other unresolved question. This
> revision takes a harder line on the state of disconnected heap-only
> tuples. Andres said that he doesn't know for sure that disconnected
> heap-only tuples cannot be DELETE/INSERT_IN_PROGRESS -- "I'm not
> actually sure the Assert is unreachable. I can imagine cases where
> we'd see e.g. DELETE/INSERT_IN_PROGRESS due to a concurrent
> subtransaction abort or such". But I don't see how that's possible. In
> fact, I don't even see how it's possible for these items to be
> RECENTLY_DEAD -- I think that they must always be DEAD (or we're in
> big trouble anyway).
>
> These are not just any heap-only tuples. They're heap-only tuples that
> cannot possibly be accessed from a HOT chain. And so it's just
> physically impossible for them to be returned by index scans -- this
> is a certainty. How could they not be DEAD, in every possible sense?
> How could they not come from an aborted transaction, specifically?

With subtransactions, abort is a bit more complicated than with plain
transactions. I'm not at all sure a problematic scenario exists, but I
wouldn't want to rely on there not being one.

Especially if suboverflowed comes into play, there can be scenarios where one
backend uses TransactionIdDidAbort() + SubTransGetTopmostTransaction() for
in-progress determination while another just relies on the procarray. Those
sources aren't updated atomically with respect to each other.
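
To make the concern above concrete, here is a minimal toy model, not PostgreSQL
code. Everything in it is an assumption made for illustration: the class
ToyTxnState, its fields, the xids 100/101, and in particular the order of the
two update steps. It only shows the general shape of the problem: when an abort
is recorded in one place and the "running" advertisement is removed in another,
two observers that each consult a different source can disagree in between.
Whether such a window leads to a real problematic scenario in PostgreSQL is
exactly what is left open here.

```python
class ToyTxnState:
    """Toy stand-in for two separately maintained sources of xid state."""

    def __init__(self):
        self.abort_log = set()  # xids with a recorded abort (toy commit log)
        self.parent = {}        # subxid -> parent xid (toy subtransaction map)
        self.running = set()    # xids advertised as running (toy process array)

    def begin(self, xid, parent=None):
        self.running.add(xid)
        if parent is not None:
            self.parent[xid] = parent

    def topmost(self, xid):
        # Follow parent links up to the top-level transaction.
        while xid in self.parent:
            xid = self.parent[xid]
        return xid

    def in_progress_via_log(self, xid):
        # Observer A: recorded abort first, then the topmost parent's state.
        if xid in self.abort_log:
            return False
        return self.topmost(xid) in self.running

    def in_progress_via_running(self, xid):
        # Observer B: looks only at the advertised running set.
        return xid in self.running

    def abort_steps(self, xid):
        # Two separate updates (order assumed); yield after each one so a
        # caller can observe the intermediate state.
        self.abort_log.add(xid)
        yield "abort recorded"
        self.running.discard(xid)
        yield "removed from running set"


def observe(state, xid):
    return (state.in_progress_via_log(xid), state.in_progress_via_running(xid))


state = ToyTxnState()
state.begin(100)              # top-level transaction
state.begin(101, parent=100)  # subtransaction of 100

timeline = [("before abort", observe(state, 101))]
for step in state.abort_steps(101):
    timeline.append((step, observe(state, 101)))

for label, (via_log, via_running) in timeline:
    print(f"{label:26} log-based={via_log!s:5} running-set-based={via_running}")
```

In this toy, the two observers agree before and after the abort, and disagree
only after the first step: the log-based check already reports "not in
progress" while the running-set check still reports "in progress".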

Also, heap_update()'s wait = true path uses yet another, slightly different
logic to wait for other backends, compared with what HeapTupleSatisfiesVacuum()
ends up with.


> Naturally, I also went through the exercise of trying to find a
> counterexample, where pruning doesn't see a disconnected tuple as DEAD
> in its HTSV. I could not get the assertion to fail with Alexander's
> test case, nor with make check-world.

I don't think that provides meaningful coverage. Alexander's test has a
quite limited set of operations (which e.g. doesn't include any subxacts), and our
own tests around subtransactions, and particularly concurrent subtransaction-heavy
work, are quite, uh, minimal.

Greetings,

Andres Freund


