Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Дата
Msg-id CAH2-Wzka8u1Y+GNP41TQiHjLMP7mYG9h3i+5jq-FzSKgWXBk+Q@mail.gmail.com
обсуждение исходный текст
Ответ на BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (PG Bug reporting form <noreply@postgresql.org>)
Ответы Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Dmitry Dolgov <9erthalion6@gmail.com>)
Список pgsql-bugs
On Mon, Nov 8, 2021 at 9:28 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> Interesting, I don't think I've observed those errors. In fact after the
> recent changes (I've compiled here from 39a31056) around assertion logic
> and index_delete_check_htid now I'm getting another type of crashes
> using your scripts. This time heap_page_prune_execute stumbles upon a
> non heap-only tuple trying to update unused line pointers:

It looks like the new heap_page_prune_execute() assertions catch the
same problem earlier. It's hard to not suspect the code in pruneheap.c
itself. Whatever else may have happened, the code in pruneheap.c ought
to not even try to set a non-heap-only tuple item to LP_UNUSED. ISTM
that it should explicitly look out for and avoid doing that.

Offhand, I wonder if we just need to have an additional check in
heap_prune_chain(), which is like the existing "If we find a redirect
somewhere else, stop --- it must not be same chain" handling used for
LP_REDIRECT items that aren't at the start of a HOT chain:

diff --git a/src/backend/access/heap/pruneheap.c
b/src/backend/access/heap/pruneheap.c
index c7331d810..3c72cdf67 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -646,6 +646,9 @@ heap_prune_chain(Buffer buffer, OffsetNumber
rootoffnum, PruneState *prstate)
             !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
             break;

+        if (nchain > 0 && !HeapTupleHeaderIsHeapOnly(htup))
+            break;      /* not at start of chain */
+
         /*
          * OK, this tuple is indeed a member of the chain.
          */

I would expect the "Check the tuple XMIN against prior XMAX" test to
do what we need in the vast vast majority of cases, but why should it
be 100% effective?

-- 
Peter Geoghegan



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
Следующее
От: Andres Freund
Дата:
Сообщение: Re: BUG #17268: Possible corruption in toast index after reindex index concurrently