Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Peter Geoghegan
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date:
Msg-id: CAH2-WzkpG9KLQF5sYHaOO_dSVdOjM+dv=nTEn85oNfMUTk836Q@mail.gmail.com
In reply to: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Peter Geoghegan <pg@bowt.ie>)
Replies: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Andres Freund <andres@anarazel.de>)
List: pgsql-bugs
On Tue, Nov 9, 2021 at 3:31 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached is a WIP fix for the bug. The idea here is to follow all HOT
> chains in an initial pass over the page, while even following LIVE
> heap-only tuples. Any heap-only tuples that we don't determine are
> part of some valid HOT chain (following an initial pass over the whole
> heap page) will now be processed in a second pass over the page.

I realized that I could easily go further than in v1, and totally get
rid of the "marked" array (which tracks whether we have decided to
mark an item as LP_DEAD/LP_UNUSED/a new LP_REDIRECT/newly pointed to
by another LP_REDIRECT). In my v1 from earlier today we already had an
array that records whether or not each item is part of any known valid
chain, which is strictly better than knowing whether or not they were
"marked" earlier. So why bother with the "marked" array at all, even
for assertions? It is less robust (not to mention less efficient) than
just using the new "fromvalidchain" array.

Attached is v2, which gets rid of the "marked" array as described. It
also has better worked out comments and assertions. The patch has
stood up to a fair amount of stress-testing. I repeated Alexander's
original test case for over an hour with this. Getting the test case
to cause an assertion failure would usually take about 5 minutes
without any fix.

I have yet to do any work on validating the performance of this patch,
though that definitely needs to happen.

Anybody have any thoughts on how far this should be backpatched? We'll
probably need to do that for Postgres 14. Less sure about other
branches, which haven't been directly demonstrated to be affected by
the bug so far. Haven't tried to break earlier branches with
Alexander's test case, though I will note again that Alexander
couldn't do that when he tried.

-- 
Peter Geoghegan

Attachments
