Issues with hash and GiST LP_DEAD setting for kill_prior_tuple
From | Peter Geoghegan
Subject | Issues with hash and GiST LP_DEAD setting for kill_prior_tuple
Date |
Msg-id | CAH2-Wz=3eeujcHi3P_r+L8n-vDjdue9yGa+ytb95zh--S9kWfA@mail.gmail.com
Replies | Re: Issues with hash and GiST LP_DEAD setting for kill_prior_tuple
List | pgsql-hackers
Both hash and GiST indexes set LP_DEAD bits for kill_prior_tuple, using an approach based on that of nbtree. hash gained this ability in commit 6977b8b7f4, while GiST gained it in commit 013ebc0a7b. gistkillitems() and _hash_kill_items() both have similar bugs:

* gistkillitems() correctly checks whether the page's LSN changed between when we initially read the leaf page and when (or after) we accessed the heap. But (unlike nbtree) it fails to account for unlogged relations, where the LSN never changes, even when there was unsafe concurrent heap TID recycling by VACUUM. (A sketch of the missing guard appears at the end of this mail.)

This bug is likely related to the way that GiST and SP-GiST never hold on to buffer pins on leaf pages. There are other known bugs [2][3] where index-only scans are broken for the same reason -- we also need to hold on to a pin to make those safe. (Using an MVCC snapshot is necessary but not sufficient to make both of these things safe. This isn't acknowledged by the amgettuple sgml docs, but is nevertheless true.)

* _hash_kill_items() makes roughly the opposite mistake: while hash might hang on to "leaf" page buffer pins (which makes it okay to set LP_DEAD bits even when the page has been modified), there appear to be cases where we don't hang on to a pin. When we don't, we just acquire the pin within _hash_kill_items() for ourselves. This is wrong: it neglects to compare a stashed page LSN against the leaf page's now-current LSN in those cases where no pin was held throughout, which is what nbtree does (that's the only way we can safely drop a pin eagerly like this). There is no stashed LSN to compare in HashScanPosData. (A second sketch at the end of this mail shows what such a check might look like.)

FWIW _hash_readpage() has a comment about a stashed LSN, so it seems as if this was barely missed by the work on hash indexes around 2017:

    /*
     * Remember next and previous block numbers for scrollable
     * cursors to know the start position and return false
     * indicating that no more matching tuples were found. Also,
     * don't reset currPage or lsn, because we expect
     * _hash_kill_items to be called for the old page after this
     * function returns.
     */
    so->currPos.prevPage = prev_blkno;
    so->currPos.nextPage = InvalidBlockNumber;
    so->currPos.buf = buf;
    return false;

* Minor issue: aren't the calls to _hash_kill_items() that happen in _hash_readpage() useless/impossible to reach? I'm referring to this:

    /*
     * Could not find any matching tuples in the current page, move to
     * the next page. Before leaving the current page, deal with any
     * killed items.
     */
    if (so->numKilled > 0)
        _hash_kill_items(scan);

If we couldn't find any matching tuples, then there won't have been any visibility checks of any tuple/TID. So how can we possibly have any LP_DEAD items to set in passing?

Background info: I looked into this kill_prior_tuple infrastructure as part of work that will move most of the logic shared by index AMs that implement amgettuple out of individual index AMs, and into indexam.c [1]. My hope is that centralizing this kind of logic in indexam.c will make these kinds of problems impossible.
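To make the first issue concrete, here is a minimal sketch of the missing guard, written as a hypothetical helper for gistget.c -- gist_lpdead_is_safe() is a name I made up, though so->curPageLSN is the real stashed LSN that gistkillitems() already compares:

    #include "access/gist_private.h"
    #include "access/relscan.h"
    #include "storage/bufmgr.h"
    #include "utils/rel.h"

    /*
     * Hypothetical helper for gistkillitems(): decide whether it is safe
     * to set LP_DEAD bits on the leaf page.  The existing curPageLSN test
     * can only detect concurrent modifications when the relation is
     * WAL-logged; for unlogged (and temp) relations the page LSN never
     * advances, so the test proves nothing there.
     */
    static bool
    gist_lpdead_is_safe(IndexScanDesc scan, GISTScanOpaque so, Buffer buffer)
    {
        /* Without WAL the LSN cannot move, so the check below is no check */
        if (!RelationNeedsWAL(scan->indexRelation))
            return false;

        /* Page modified since we read it: stashed offsets may be stale */
        if (BufferGetLSNAtomic(buffer) != so->curPageLSN)
            return false;

        return true;
    }

nbtree never needs an escape hatch like this, because it simply holds on to the leaf page pin whenever the relation isn't WAL-logged (the pin is what blocks VACUUM from recycling the heap TIDs). GiST can't do the same today, since it never retains leaf page buffer pins in the first place.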
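And a matching sketch for the hash side. This one assumes a new lsn field in HashScanPosData, stashed by _hash_readpage() (via BufferGetLSNAtomic(), while the page is still locked) before the pin is dropped -- no such field exists today, and _hash_reacquire_killpage() is likewise a made-up helper for _hash_kill_items():

    #include "access/hash.h"
    #include "storage/bufmgr.h"

    /*
     * Hypothetical helper for _hash_kill_items(), for the case where no
     * pin was held throughout: reacquire the page, then verify that it
     * wasn't modified (with heap TIDs possibly recycled by VACUUM) in
     * the interim.  so->currPos.lsn is the assumed new HashScanPosData
     * field.
     */
    static bool
    _hash_reacquire_killpage(Relation rel, HashScanOpaque so, Buffer *bufp)
    {
        Buffer      buf;

        buf = _hash_getbuf(rel, so->currPos.currPage, HASH_READ,
                           LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);

        /* Same idea as nbtree's _bt_killitems(): LSN changed means punt */
        if (BufferGetLSNAtomic(buf) != so->currPos.lsn)
        {
            _hash_relbuf(rel, buf);
            return false;
        }

        *bufp = buf;
        return true;
    }

Callers would only take this path when HashScanPosIsPinned(so->currPos) is false; when the pin was held all along, no LSN comparison is needed, just as in nbtree.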
[1] https://commitfest.postgresql.org/patch/5721/
[2] https://commitfest.postgresql.org/patch/5542/
[3] https://postgr.es/m/CAH2-WzmVvU2KmKyq8sUeYXrc_roA5LfOgDE1-vdtorpk_M3DfA@mail.gmail.com

--
Peter Geoghegan