Re: Issues with hash and GiST LP_DEAD setting for kill_prior_tuple

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Issues with hash and GiST LP_DEAD setting for kill_prior_tuple
Дата
Msg-id CAH2-Wz=+=QzveoEPpAEmXks+C6Y2ji-=Zi6tRaPE-KknJe-TQQ@mail.gmail.com
обсуждение исходный текст
Ответ на Issues with hash and GiST LP_DEAD setting for kill_prior_tuple  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Tue, Jul 15, 2025 at 2:19 PM Peter Geoghegan <pg@bowt.ie> wrote:
> * gistkillitems() correctly checks if the page's LSN has changed in
> the period between when we initially read the leaf page and the period
> when/after we accessed the heap. But (unlike nbtree), it fails to
> account for unlogged relations, where the LSN won't ever change, even
> when there was unsafe concurrent heap TID recycling by VACUUM at all.

On second thought, I think that GiST is okay here: the gistGetFakeLSN
stuff (added by commit 2edc5cd493 and improved in commit 62401db45c)
is sufficient to make the approach taken by gistkillitems() correct.
All modifications to GiST pages that are logged with a logged relation
will get LSNs that work for gistkillitems' purposes when it deals with
an unlogged relation.

Aside: Obviously, we don't use gistGetFakeLSN in nbtree. That's why we
currently cannot set "BTScanOpaqueData.dropPin = true" with an
unlogged relation. I wonder if it would make sense to make nbtree
follow gist here. We could generalize gistGetFakeLSN, and use it in
nbtree, too. That way, we could remove the
"RelationNeedsWAL(scan->indexRelation)" special case when setting
BTScanOpaqueData.dropPin at the start of each scan. This approach is
arguably simpler, and comes with a performance benefit: nbtree
wouldn't block progress by VACUUM when scanning unlogged relations by
holding on to its pin across amgettuple calls, exactly like with
logged relations. In general, I see no reason why amgettuple index AMs
shouldn't all take exactly the same approach to holding on to/dropping
leaf page buffer pins.

> * _hash_kill_items makes roughly the opposite mistake: while hash
> might hang on to "leaf" page buffer pins (which makes it okay to set
> LP_DEAD bits even when the page has been modified), there appears to
> be cases where we don't hang on to a pin. When we don't, we just
> acquire the pin within _hash_kill_items for ourselves. This is wrong:
> it neglects to compare a stashed page LSN against the leaf page's
> now-current LSN in those cases where no pin was held throughout, which
> is what nbtree does (that's the only way we can safely drop a pin
> eagerly like this). There is no stashed LSN to compare in
> HashScanPosData.

Still no reason to doubt that _hash_kill_items is faulty, though

--
Peter Geoghegan



В списке pgsql-hackers по дате отправления: