Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
From | Dmitry Dolgov |
---|---|
Subject | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
Date | |
Msg-id | 20211113150640.vk5zhjangylufxaa@localhost |
In response to | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
List | pgsql-bugs |
> On Fri, Nov 12, 2021 at 02:46:22PM -0800, Peter Geoghegan wrote:
> On Fri, Nov 12, 2021 at 2:29 PM Andres Freund <andres@anarazel.de> wrote:
> > > Naturally, I also went through the exercise of trying to find a
> > > counterexample, where pruning doesn't see a disconnected tuple as DEAD
> > > in its HTSV. I could not get the assertion to fail with Alexander's
> > > test case, nor with make check-world.
> >
> > I don't think that provides a meaningful coverage. Alexander's test has a
> > quite limited set operations (which e.g. doesn't include an subxacts), and our
> > own tests around subtransactions, and particularly concurrent subtransaction
> > heavy work, is quite, uh, minimal.
>
> It's a start.
>
> We need to be pragmatic here. There is some uncertainty about what
> HTSV might say about a disconnected tuple in the absence of
> corruption, or there is a risk of a new problem like that coming up in
> the future -- let's work within those confines, then. What do you want
> to do about that? There aren't that many choices, since, to repeat,
> the tuple is "morally" DEAD no matter what. Even with corruption, even
> without corruption in the presence of some unanticipated corner case
> with HTSV -- this is fundamental.

I got curious whether modifying Alexander's test case could reveal
something interesting, and sprinkled it with savepoints and rollbacks.
Almost immediately a new problem manifested itself, although the crash
has nothing to do with the disconnected tuples as far as I can tell --
still probably worth mentioning. In this case vacuum invoked
lazy_scan_prune, and during the first scan one of the chains had a
HEAPTUPLE_DEAD tuple at the third position. The processing flow fell
through to heap_prune_record_prunable and crashed on an assert, because
the xid passed in was InvalidTransactionId:

#3 0x000055a2b260d1f9 in heap_prune_record_prunable (prstate=0x7ffd0c0ecdf0, xid=0) at pruneheap.c:872
#4 0x000055a2b260ca72 in heap_prune_chain (buffer=2117, rootoffnum=150, prstate=0x7ffd0c0ecdf0) at pruneheap.c:695
#5 0x000055a2b260bcd6 in heap_page_prune (relation=0x7fb98e217e20, buffer=2117, vistest=0x55a2b31d2d60 <GlobalVisCatalogRels>, old_snap_xmin=0, old_snap_ts=0, report_stats=false, off_loc=0x55a2b3e6a0cc) at pruneheap.c:288
#6 0x000055a2b261309c in lazy_scan_prune (vacrel=0x55a2b3e6a060, buf=2117, blkno=192, page=0x7fb97856bf80 "", vistest=0x55a2b31d2d60 <GlobalVisCatalogRels>, prunestate=0x7ffd0c0ee9d0) at vacuumlazy.c:1739

Calling heap_prune_record_prunable only if TransactionIdIsNormal holds
for the xid seems to help. The original implementation didn't reach
heap_prune_record_prunable in this case either, and it doesn't crash.
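
To make the workaround concrete, here is a minimal standalone sketch of the guard. It is a simplified model, not the actual pruneheap.c change: PruneStateSketch, record_prunable_sketch and record_prunable_if_normal are hypothetical names, the assertion stands in for the one that fired in the backtrace, and a plain `<` replaces TransactionIdPrecedes, so wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions in access/transam.h. */
typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsValid(xid)	((xid) != InvalidTransactionId)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/* Hypothetical cut-down prune state: only the field that matters here. */
typedef struct PruneStateSketch
{
	TransactionId new_prune_xid;
} PruneStateSketch;

/*
 * Model of heap_prune_record_prunable: remember the oldest prunable xid.
 * The assertion stands in for the one that fired with xid=0.
 */
static void
record_prunable_sketch(PruneStateSketch *prstate, TransactionId xid)
{
	assert(TransactionIdIsNormal(xid));

	/* Assumption: plain '<' instead of TransactionIdPrecedes (no wraparound). */
	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
		xid < prstate->new_prune_xid)
		prstate->new_prune_xid = xid;
}

/*
 * The guard described above: skip recording when the xid is not normal.
 * Returns true if the xid was passed on, false if it was skipped.
 */
static bool
record_prunable_if_normal(PruneStateSketch *prstate, TransactionId xid)
{
	if (!TransactionIdIsNormal(xid))
		return false;

	record_prunable_sketch(prstate, xid);
	return true;
}
```

With this guard an InvalidTransactionId is silently skipped instead of tripping the assertion. Whether skipping is the right behaviour for the real heap_prune_chain is a separate question that this sketch does not settle.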