Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Andres Freund
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date:
Msg-id: 20211030213948.3i3f7cieqxzhd7kq@alap3.anarazel.de
In response to: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Peter Geoghegan <pg@bowt.ie>)
Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Dmitry Dolgov <9erthalion6@gmail.com>)
List: pgsql-bugs
Hi,

On 2021-10-29 16:55:32 +0200, Dmitry Dolgov wrote:
> > On Fri, Oct 29, 2021 at 07:00:01AM +0000, PG Bug reporting form wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference:      17255
> > Logged by:          Alexander Lakhin
> > Email address:      exclusion@gmail.com
> > PostgreSQL version: 14.0
> > Operating system:   Ubuntu 20.04
> > Description:
> >
> > with the following stack:
> > Core was generated by `postgres: law regression [local] CREATE INDEX
> >                         '.
> > Program terminated with signal SIGABRT, Aborted.
> > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > 50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> > (gdb) bt
> > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > #1  0x00007f8a7f97a859 in __GI_abort () at abort.c:79
> > #2  0x0000562dabb49700 in index_delete_sort_cmp (deltid2=<synthetic
> > pointer>, deltid1=<optimized out>) at heapam.c:7582
> > #3  index_delete_sort (delstate=0x7fff6f609f10, delstate=0x7fff6f609f10) at
> > heapam.c:7623
> > #4  heap_index_delete_tuples (rel=0x7f8a76523e08, delstate=0x7fff6f609f10)
> > at heapam.c:7296
> > #5  0x0000562dabc5519a in table_index_delete_tuples
> > (delstate=0x7fff6f609f10, rel=0x562dac23d6c2)
> >     at ../../../../src/include/access/tableam.h:1327
> > #6  _bt_delitems_delete_check (rel=rel@entry=0x7f8a7652cc80,
> > buf=buf@entry=191, heapRel=heapRel@entry=0x7f8a76523e08,
> >     delstate=delstate@entry=0x7fff6f609f10) at nbtpage.c:1541
> > #7  0x0000562dabc4dbe1 in _bt_simpledel_pass (maxoff=<optimized out>,
> > minoff=<optimized out>, newitem=<optimized out>,
> >     ndeletable=55, deletable=0x7fff6f609f30, heapRel=0x7f8a76523e08,
> > buffer=191, rel=0x7f8a7652cc80)
> >     at nbtinsert.c:2899
> > ...
> >
> > Discovered while hunting for another bug related to autovacuum (unfortunately
> > I still can't produce a reliable reproduction script for that).
> 
> Thanks for reporting (in fact I'm impressed by how many issues you've
> discovered, hopefully there are at least some t-shirts "I've found X
> bugs in PostgreSQL" available as a reward) and for putting effort into the
> reproduction steps. I believe I've managed to reproduce at least a
> similar crash with the same trace.
> 
> In my case it crashed on pg_unreachable (which is an abort when asserts
> are enabled) inside index_delete_sort_cmp. It seems the two item pointers
> being compared have the same block and offset number. In view of
> the recent discussions I was thinking it could be somehow related to the
> issues with duplicated TIDs, but delstate->deltids doesn't in fact have
> any duplicated entries -- so I'm not sure about that; still investigating
> the core dump.
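The assumption that trips here can be illustrated with a small standalone C sketch. It is not PostgreSQL's code: `SketchTid`, `sketch_tid_cmp_strict()` and `sketch_has_duplicate_tids()` are hypothetical names, and `abort()` stands in for `pg_unreachable()` in an assert-enabled build. The comparator orders TIDs by block number, then offset number, and treats two equal TIDs as impossible; the second function is the kind of non-aborting duplicate check described above for `delstate->deltids`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a heap TID (ItemPointerData). */
typedef struct SketchTid
{
    uint32_t block;   /* heap block number */
    uint16_t offset;  /* line pointer offset within that block */
} SketchTid;

/*
 * Orders TIDs by block, then by offset. Like the comparator discussed in
 * the thread, it assumes no two entries are ever equal: falling through
 * both checks means the caller passed duplicate TIDs, so we abort, much
 * as pg_unreachable() does when asserts are enabled.
 */
static int
sketch_tid_cmp_strict(const void *a, const void *b)
{
    const SketchTid *t1 = a;
    const SketchTid *t2 = b;

    if (t1->block != t2->block)
        return (t1->block < t2->block) ? -1 : 1;
    if (t1->offset != t2->offset)
        return (t1->offset < t2->offset) ? -1 : 1;

    abort();  /* "unreachable": equal TIDs violate the comparator's assumption */
}

/*
 * Non-aborting debugging aid: reports whether the same TID appears twice.
 * Quadratic, which is acceptable for inspecting a small array by hand.
 */
static bool
sketch_has_duplicate_tids(const SketchTid *tids, size_t n)
{
    assert(tids != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (tids[i].block == tids[j].block &&
                tids[i].offset == tids[j].offset)
                return true;
    return false;
}
```

With distinct TIDs such as (1,5) and (2,1) the comparator returns -1 or 1; handed two copies of the same TID it aborts, which is the SIGABRT shape seen in the backtrace. As Dmitry notes, though, the array itself showed no duplicates, so a check like `sketch_has_duplicate_tids()` passing does not by itself explain the crash.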

I suspect this is the same bug as #17245. Could you check if it's fixed by
https://www.postgresql.org/message-id/CAH2-WzkN5aESSLfK7-yrYgsXxYUi__VzG4XpZFwXm98LUtoWuQ%40mail.gmail.com

The crash involves pg_class, which the test also VACUUMs manually; that could
trigger the issue we found in the other thread. The likely reason the loop in
the repro is needed is that it pushes one of the indexes on pg_class over the
512kB min_parallel_index_scan_size boundary, at which point VACUUM starts
using parallel vacuum.
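The size boundary can be modeled with a short, simplified C sketch. It assumes the default 8 kB block size and the default min_parallel_index_scan_size of 512kB, which works out to 64 blocks; the helper names are hypothetical. As we understand it, an index with fewer blocks than the threshold is not handed to a parallel worker, so growing a pg_class index to 64 blocks or more is what would bring parallel vacuum into play. The server's real decision involves further conditions that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the default 8 kB block size; builds can be configured otherwise. */
#define SKETCH_BLCKSZ 8192u

/* Hypothetical helper: converts a size setting in kB (e.g. 512kB) to blocks. */
static uint32_t
sketch_kb_to_blocks(uint32_t kb)
{
    assert(kb <= UINT32_MAX / 1024u);  /* avoid overflow in the byte count */
    return (kb * 1024u) / SKETCH_BLCKSZ;
}

/*
 * Simplified model of the size test only: an index smaller than
 * min_parallel_index_scan_size is skipped for parallel processing,
 * so it becomes a candidate once it reaches the threshold.
 */
static bool
sketch_index_parallel_eligible(uint32_t index_blocks, uint32_t min_size_kb)
{
    return index_blocks >= sketch_kb_to_blocks(min_size_kb);
}
```

For example, `sketch_index_parallel_eligible(63, 512)` is false and `sketch_index_parallel_eligible(64, 512)` is true, matching the idea that the repro's loop only matters once an index crosses that boundary.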

Greetings,

Andres Freund


