Issues with hash and GiST LP_DEAD setting for kill_prior_tuple
From | Peter Geoghegan
Subject | Issues with hash and GiST LP_DEAD setting for kill_prior_tuple
Date |
Msg-id | CAH2-Wz=3eeujcHi3P_r+L8n-vDjdue9yGa+ytb95zh--S9kWfA@mail.gmail.com
Replies | Re: Issues with hash and GiST LP_DEAD setting for kill_prior_tuple
List | pgsql-hackers
Both hash and GiST indexes set LP_DEAD bits for kill_prior_tuple, using an approach based on that of nbtree. hash gained this ability in commit 6977b8b7f4, while GiST gained it in commit 013ebc0a7b. gistkillitems() and _hash_kill_items() both have similar bugs:

* gistkillitems() correctly checks whether the page's LSN changed between when we initially read the leaf page and when (or after) we accessed the heap. But (unlike nbtree) it fails to account for unlogged relations, where the LSN never changes, even when there was unsafe concurrent heap TID recycling by VACUUM. (A sketch of the missing guard appears at the end of this mail.)

This bug is likely related to the way that GiST and SP-GiST never hold on to buffer pins on leaf pages. There are other known bugs [2][3] where index-only scans are broken for the same reason -- we also need to hold on to a pin to make those safe. (Using an MVCC snapshot is necessary but not sufficient to make both of these things safe. This isn't acknowledged by the amgettuple sgml docs, but is nevertheless true.)

* _hash_kill_items() makes roughly the opposite mistake: while hash might hang on to "leaf" page buffer pins (which makes it okay to set LP_DEAD bits even when the page has been modified), there appear to be cases where we don't hang on to a pin. When we don't, we just acquire the pin within _hash_kill_items() for ourselves. This is wrong: it neglects to compare a stashed page LSN against the leaf page's now-current LSN in those cases where no pin was held throughout, which is what nbtree does (that's the only way we can safely drop a pin eagerly like this). There is no stashed LSN to compare in HashScanPosData. (A second sketch at the end of this mail shows what such a check might look like.)

FWIW _hash_readpage() has a comment about a stashed LSN, so it seems as if this was barely missed by the work on hash indexes around 2017:

    /*
     * Remember next and previous block numbers for scrollable
     * cursors to know the start position and return false
     * indicating that no more matching tuples were found. Also,
     * don't reset currPage or lsn, because we expect
     * _hash_kill_items to be called for the old page after this
     * function returns.
     */
    so->currPos.prevPage = prev_blkno;
    so->currPos.nextPage = InvalidBlockNumber;
    so->currPos.buf = buf;
    return false;

* Minor issue: aren't the calls to _hash_kill_items() that happen in _hash_readpage() useless/impossible to reach? I'm referring to this:

    /*
     * Could not find any matching tuples in the current page, move to
     * the next page. Before leaving the current page, deal with any
     * killed items.
     */
    if (so->numKilled > 0)
        _hash_kill_items(scan);

If we couldn't find any matching tuples, then there won't have been any visibility checks of any tuple/TID. So how can we possibly have any LP_DEAD items to set in passing?

Background info: I looked into this kill_prior_tuple infrastructure as part of work that will move most of the logic shared by index AMs that implement amgettuple out of individual index AMs, and into indexam.c [1]. My hope is that centralizing this kind of logic in indexam.c will make these kinds of problems impossible.
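To make the first issue concrete, here is a minimal sketch of the missing guard, written as a hypothetical helper for gistget.c -- gist_lpdead_is_safe() is a name I made up, though so->curPageLSN is the real stashed LSN that gistkillitems() already compares:

    #include "access/gist_private.h"
    #include "access/relscan.h"
    #include "storage/bufmgr.h"
    #include "utils/rel.h"

    /*
     * Hypothetical helper for gistkillitems(): decide whether it is safe
     * to set LP_DEAD bits on the leaf page.  The existing curPageLSN test
     * can only detect concurrent modifications when the relation is
     * WAL-logged; for unlogged (and temp) relations the page LSN never
     * advances, so the test proves nothing there.
     */
    static bool
    gist_lpdead_is_safe(IndexScanDesc scan, GISTScanOpaque so, Buffer buffer)
    {
        /* Without WAL the LSN cannot move, so the check below is no check */
        if (!RelationNeedsWAL(scan->indexRelation))
            return false;

        /* Page modified since we read it: stashed offsets may be stale */
        if (BufferGetLSNAtomic(buffer) != so->curPageLSN)
            return false;

        return true;
    }

nbtree never needs an escape hatch like this, because it simply holds on to the leaf page pin whenever the relation isn't WAL-logged (the pin is what blocks VACUUM from recycling the heap TIDs). GiST can't do the same today, since it never retains leaf page buffer pins in the first place.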
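And a matching sketch for the hash side. This one assumes a new lsn field in HashScanPosData, stashed by _hash_readpage() (via BufferGetLSNAtomic(), while the page is still locked) before the pin is dropped -- no such field exists today, and _hash_reacquire_killpage() is likewise a made-up helper for _hash_kill_items():

    #include "access/hash.h"
    #include "storage/bufmgr.h"

    /*
     * Hypothetical helper for _hash_kill_items(), for the case where no
     * pin was held throughout: reacquire the page, then verify that it
     * wasn't modified (with heap TIDs possibly recycled by VACUUM) in
     * the interim.  so->currPos.lsn is the assumed new HashScanPosData
     * field.
     */
    static bool
    _hash_reacquire_killpage(Relation rel, HashScanOpaque so, Buffer *bufp)
    {
        Buffer      buf;

        buf = _hash_getbuf(rel, so->currPos.currPage, HASH_READ,
                           LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);

        /* Same idea as nbtree's _bt_killitems(): LSN changed means punt */
        if (BufferGetLSNAtomic(buf) != so->currPos.lsn)
        {
            _hash_relbuf(rel, buf);
            return false;
        }

        *bufp = buf;
        return true;
    }

Callers would only take this path when HashScanPosIsPinned(so->currPos) is false; when the pin was held all along, no LSN comparison is needed, just as in nbtree.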
[1] https://commitfest.postgresql.org/patch/5721/
[2] https://commitfest.postgresql.org/patch/5542/
[3] https://postgr.es/m/CAH2-WzmVvU2KmKyq8sUeYXrc_roA5LfOgDE1-vdtorpk_M3DfA@mail.gmail.com

--
Peter Geoghegan