Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
From: Andres Freund
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Msg-id: 20211111030821.2xxl75qfkynaga4j@alap3.anarazel.de
In reply to: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Peter Geoghegan <pg@bowt.ie>)
           Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Andres Freund <andres@anarazel.de>)
List: pgsql-bugs
Hi,

On 2021-11-09 19:07:02 -0800, Peter Geoghegan wrote:
> diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
> index c7331d810..31fa4d2a3 100644
> --- a/src/backend/access/heap/pruneheap.c
> +++ b/src/backend/access/heap/pruneheap.c
> @@ -31,6 +31,7 @@
>  typedef struct
>  {
>      Relation    rel;
> +    BlockNumber targetblkno;

It's a good optimization to remove the redundant BufferGetBlockNumber(), but
perhaps it's worth committing that separately?

> @@ -256,10 +260,8 @@ heap_page_prune(Relation relation, Buffer buffer,
>           offnum <= maxoff;
>           offnum = OffsetNumberNext(offnum))
>      {
> -        ItemId      itemid;
> -
>          /* Ignore items already processed as part of an earlier chain */
> -        if (prstate.marked[offnum])
> +        if (prstate.fromvalidchain[offnum])
>              continue;
>
>          /*

ISTM that we should now call heap_prune_chain() only for the start of actual
chains (i.e. redirect or non-hot tuple). I think it'd be good to have a
comment above the loop explaining that we're just iterating over chains
starting at a non-HOT element.

> @@ -269,15 +271,29 @@ heap_page_prune(Relation relation, Buffer buffer,
>          if (off_loc)
>              *off_loc = offnum;
>
> -        /* Nothing to do if slot is empty or already dead */
> -        itemid = PageGetItemId(page, offnum);
> -        if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
> -            continue;
> -
>          /* Process this item or chain of items */
>          ndeleted += heap_prune_chain(buffer, offnum, &prstate);
>      }

Why did you move this into heap_prune_chain()?
> +    /*
> +     * Scan the page again, processing any heap-only tuples that we now know
> +     * are not part of any valid HOT chain
> +     */
> +    for (offnum = FirstOffsetNumber;
> +         offnum <= maxoff;
> +         offnum = OffsetNumberNext(offnum))
> +    {
> +        /* Ignore items already processed as part of a known valid chain */
> +        if (prstate.fromvalidchain[offnum])
> +            continue;
> +
> +        if (off_loc)
> +            *off_loc = offnum;
> +
> +        /* Process this disconnected heap-only tuple */
> +        ndeleted += heap_prune_heaponly(buffer, offnum, &prstate);
> +    }
> +
>      /* Clear the offset information once we have processed the given page. */
>      if (off_loc)
>          *off_loc = InvalidOffsetNumber;

> @@ -383,19 +399,8 @@ heap_page_prune(Relation relation, Buffer buffer,
>      pgstat_update_heap_dead_tuples(relation, ndeleted - prstate.ndead);
>
>      /*
> -     * XXX Should we update the FSM information of this page ?
> -     *
> -     * There are two schools of thought here. We may not want to update FSM
> -     * information so that the page is not used for unrelated UPDATEs/INSERTs
> -     * and any free space in this page will remain available for further
> -     * UPDATEs in *this* page, thus improving chances for doing HOT updates.
> -     *
> -     * But for a large table and where a page does not receive further UPDATEs
> -     * for a long time, we might waste this space by not updating the FSM
> -     * information. The relation may get extended and fragmented further.
> -     *
> -     * One possibility is to leave "fillfactor" worth of space in this page
> -     * and update FSM with the remaining space.
> +     * Don't update the FSM information on this page. We leave it up to our
> +     * caller to decide what to do about that.
>      */

I don't think this should be part of the commit?
>      offnum = rootoffnum;
> +    nchain = 0;
>
>      /* while not end of the chain */
>      for (;;)
>      {
>          ItemId      lp;
> -        bool        tupdead,
> -                    recent_dead;
> +        HeapTupleHeader htup;
> +        HeapTupleData tup;
> +        bool        tupdead;
>
>          /* Sanity check (pure paranoia) */
>          if (offnum < FirstOffsetNumber)
> @@ -601,15 +603,11 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
>              break;
>
>          /* If item is already processed, stop --- it must not be same chain */
> -        if (prstate->marked[offnum])
> +        if (nchain != 0 && prstate->fromvalidchain[offnum])
>              break;

I think we should make it an error to reach the same tuple multiple ways. It's
not *quite* as trivial as making this an assert/error though, as we can only
raise an error after checking that the xmin/xmax thing passes.

>          /*
>           * If we are looking at the redirected root line pointer, jump to the
>           * first normal tuple in the chain. If we find a redirect somewhere
> @@ -619,42 +617,63 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
>          {
>              if (nchain > 0)
>                  break;          /* not at start of chain */
> +            Assert(rootisredirect);
>              chainitems[nchain++] = offnum;
>              offnum = ItemIdGetRedirect(rootlp);
>              continue;
>          }

Why are we handling this inside the loop?

> -        /*
> -         * Likewise, a dead line pointer can't be part of the chain. (We
> -         * already eliminated the case of dead root tuple outside this
> -         * function.)
> -         */
> -        if (ItemIdIsDead(lp))
> +        /* LP_UNUSED or LP_DEAD items obviously not part of the chain */
> +        if (!ItemIdIsUsed(lp) || ItemIdIsDead(lp))
> +        {
> +            /*
> +             * XXX What if we just came from root item, which is a plain heap
> +             * tuple? Do we need assert that notices when we reach an LP_DEAD
> +             * or LP_UNUSED item having started from such a root item?
> +             */
> +            Assert(!rootisredirect || nchain > 1);

I don't think we can assert that. It's perfectly possible to have a non-hot
update chain where the following element can be vacuumed, but the preceding
element can't (e.g. when the updater has an older xid that prevents it from
being pruned).

>          chainitems[nchain++] = offnum;
> +        prstate->fromvalidchain[offnum] = true;
>
>          /*
>           * Check tuple's visibility status.
>           */
> -        tupdead = recent_dead = false;
> +        tupdead = false;
>
>          switch (heap_prune_satisfies_vacuum(prstate, &tup, buffer))
>          {
> @@ -663,7 +682,6 @@
>                  break;
>
>              case HEAPTUPLE_RECENTLY_DEAD:
> -                recent_dead = true;
>
>                  /*
>                   * This tuple may soon become DEAD. Update the hint field so
> @@ -679,6 +697,7 @@
>                   * This tuple may soon become DEAD. Update the hint field so
>                   * that the page is reconsidered for pruning in future.
>                   */
> +                Assert(!tupdead);
>                  heap_prune_record_prunable(prstate,
>                                             HeapTupleHeaderGetUpdateXid(htup));
>                  break;
> @@ -692,6 +711,7 @@
>                   * But we don't. See related decisions about when to mark the
>                   * page prunable in heapam.c.
>                   */
> +                Assert(!tupdead);
>                  break;
>
>              default:

I don't understand these new asserts? We just set tupdead to false above?

> @@ -700,11 +720,18 @@
>          }
>
>          /*
> -         * Remember the last DEAD tuple seen. We will advance past
> -         * RECENTLY_DEAD tuples just in case there's a DEAD one after them;
> -         * but we can't advance past anything else. We have to make sure that
> -         * we don't miss any DEAD tuples, since DEAD tuples that still have
> -         * tuple storage after pruning will confuse VACUUM.
> +         * Remember the last DEAD tuple seen from this HOT chain.
> +         *
> +         * We expect to sometimes find a RECENTLY_DEAD tuple after a DEAD
> +         * tuple. When this happens, the RECENTLY_DEAD tuple will be treated
> +         * as if it was DEAD all along. Manage that now.

I now actually wonder why this is correct.
There's another comment about it:

> * We also prune any RECENTLY_DEAD tuples preceding a DEAD tuple.
> * This is OK because a RECENTLY_DEAD tuple preceding a DEAD tuple is really

But that doesn't justify very much. What prevents the scenario that some other
backend e.g. has a snapshot with xmin=xmax=RECENTLY_DEAD-row? If the
RECENTLY_DEAD row has an xid that is later than the DEAD row's, this afaict
would make it perfectly legal to prune the DEAD row, but *not* the
RECENTLY_DEAD one.

> +         * We're not just interested in DEAD and RECENTLY_DEAD tuples. We
> +         * need to traverse each and every HOT chain exhaustively, in order to
> +         * determine which heap-only tuples are part of a valid HOT chain.
> +         * Heap-only tuples that cannot be located like this must not be part
> +         * of a valid HOT chain. They are therefore processed during our
> +         * second pass over the page.
>          */
>          if (tupdead)
>          {

This comment can't really be understood without the historical behaviour of
the function.

> +static int
> +heap_prune_heaponly(Buffer buffer, OffsetNumber offnum, PruneState *prstate)
> +{
> +    Page        dp = (Page) BufferGetPage(buffer);
> +    ItemId      lp;
> +    HeapTupleHeader htup;
> +    HeapTupleData tup;
> +    HTSV_Result res;
> +
> +    lp = PageGetItemId(dp, offnum);
> +    Assert(ItemIdIsNormal(lp));
> +    htup = (HeapTupleHeader) PageGetItem(dp, lp);
> +
> +    /*
> +     * Caller must make sure that the tuple at 'offnum' is in fact a heap-only
> +     * tuple that is disconnected from its HOT chain (i.e. isn't reachable
> +     * through a HOT traversal that starts at any plausible-looking root item
> +     * on the page).
> +     */
> +    Assert(!prstate->fromvalidchain[offnum]);
> +    Assert(HeapTupleHeaderIsHeapOnly(htup));
> +
> +    /*
> +     * We expect that disconnected heap-only tuples must be from aborted
> +     * transactions. Any RECENTLY_DEAD tuples we see here are really DEAD,
> +     * but the heap_prune_satisfies_vacuum test is too coarse to detect it.
> +     */
> +    tup.t_len = ItemIdGetLength(lp);
> +    tup.t_tableOid = RelationGetRelid(prstate->rel);
> +    tup.t_data = htup;
> +    ItemPointerSet(&(tup.t_self), prstate->targetblkno, offnum);
> +    res = heap_prune_satisfies_vacuum(prstate, &tup, buffer);
> +    if (res == HEAPTUPLE_DEAD || res == HEAPTUPLE_RECENTLY_DEAD)
> +    {
> +        heap_prune_record_unused(prstate, offnum);
> +        HeapTupleHeaderAdvanceLatestRemovedXid(htup,
> +                                               &prstate->latestRemovedXid);
> +    }
> +    else
> +        Assert(false);
> +
> +    /*
> +     * Should always be DEAD. A DEAD heap-only tuple is always counted in
> +     * top-level ndeleted counter for pruning operation.
> +     */
> +    return 1;
> +}

It seems weird to have the if (res == HEAPTUPLE_DEAD ..) branch, but then to
unconditionally return 1.

I'm not actually sure the Assert is unreachable. I can imagine cases where
we'd see e.g. DELETE/INSERT_IN_PROGRESS due to a concurrent subtransaction
abort or such.

Greetings,

Andres Freund