Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
От | Peter Geoghegan |
---|---|
Тема | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
Дата | |
Msg-id | CAH2-Wzm9_2s=_6eKm5bNfz6-X9QYmWPd2h=iF_1wBhZufQ4WQw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
(Peter Geoghegan <pg@bowt.ie>)
|
Список | pgsql-bugs |
On Fri, Dec 10, 2021 at 8:57 PM Andres Freund <andres@anarazel.de> wrote: > Given the lack of further discussion, let's go with the "minimal" fix and your > stuff later. Attached is further polished patch - no significant changes, > plenty copy-editing stuff. Makes sense. > 0001 is the bugfix LGTM. > 0002 adds the additional assertions - I'm wondering about only committing that > in HEAD? I don't think that it makes much difference, since these are only assertions. I probably wouldn't bother with that myself, but I also have zero concern about it breaking anything. > I think your patch should easily be able to use prstate->htsv? Let's get this committed, to 14 and HEAD. And then leave it to me to rebase, as additional HEAD-only hardening. > 0003 removes the redundant RelationGetNumberOfBlocks() calls. As 0001 does not > change the total number of RelationGetNumberOfBlocks() calls I'm inclined to > just apply this to HEAD? Yeah, that can be part of my follow-up hardening patch, which (to repeat) will be HEAD-only. > 0004 is a patch that tries to address your point about the GlobalVisTestFor() > placement, sort of > > My earlier point that moving it to earlier would be a bad idea because it'd > make the horizon "older" was bogus - GlobalVisTestFor() doesn't itself do > any horizon determination. However moving the GlobalVisTestFor() earlier > still seems wrong - imo the vacuum_set_xid_limits() call should be moved to > later. The attached patch moves both, wrapped in a new function, to just > before the scan in lazy_scan_heap(). I think that this is the right direction, and I like the comments a lot. But I don't like the idea of putting this initialization into lazy_scan_heap(), for purely structural reasons. I plan to get rid of the VacuumParams argument to lazy_scan_heap() completely, which would make "VacuumParams setup" the sole responsibility of its caller, heap_vacuum_rel(). The draft version of my "remove special cases to advance relfrozenxid" patch (the one you've reviewed) already does this. In general I really think it's important to make the state in vacuumlazy.c as simple as possible. I have no objection to delaying the lazy_scan_heap_limits() stuff until right before lazy_scan_heap() is called. However, I do think that we should always know certain basic "immutable" facts about a VACUUM at the point that we call lazy_scan_heap(), which is not the case with this patch. Honestly, I'm surprised that you see much value in delaying the lazy_scan_heap_limits() stuff until the very last microsecond. How many microseconds could we possibly delay it by? > I tried pretty hard to get the test to be reliable enough to commit it. But in > the end I think using the index locking as a "control" mechanism just isn't > great - as evidenced by 0003 breaking the test entirely. I think unfortunately > there's not much hope for testing this unless we get wait points - which seems > not too close :(. Hopefully somebody will seriously commit to that at some point. It's important, but not very glamorous. -- Peter Geoghegan
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Strahinja KustudićДата:
Сообщение: Re: BUG #17330: EXPLAIN hangs and very long query plans
Следующее
От: Peter GeogheganДата:
Сообщение: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum