Re: Returning nbtree posting list TIDs in DESC order during backwards scans
От | Andy Fan |
---|---|
Тема | Re: Returning nbtree posting list TIDs in DESC order during backwards scans |
Дата | |
Msg-id | 87h5yx92ci.fsf@163.com обсуждение исходный текст |
Ответ на | Returning nbtree posting list TIDs in DESC order during backwards scans (Peter Geoghegan <pg@bowt.ie>) |
Список | pgsql-hackers |
Hi, > On Thu, Jul 17, 2025 at 2:26 PM Peter Geoghegan <pg@bowt.ie> wrote: >> The loop has an early check for this (for non-itemDead entries) here: >> >> /* Quickly skip over items never ItemDead-set by btgettuple */ >> if (!kitem->itemDead) >> continue; >> >> I really doubt that this matters > I'll need to think about issues around adding the new kitem->itemDead > bitfield. I'm not really worried about _bt_killitems; more so about > the routines called by _bt_readpage, which must make sure that the bit > is unset every time a TID is saved in so->currPos.items[]. Yes, otherwise the live tuple maybe marked as dead, which is terrible. I think you have unset the bit on all the needed places, including _bt_saveitem, _bt_savepostingitem and _bt_setuppostingitems for the first item in postlist item. I can't find out anything is missed. (I thought another place for this work might be _bt_returnitem, this might be a more centralized place to set. Later I think it is totally wrong because for the queries like SELECT * FROM t WHERE idx_col = 3 LIMIT 3; Some items in so->currPos.items[] were filled in _bt_readpage but maybe never returned to heap at all, and later _bt_killitems also run against it on the whole, so unsetting the bit on _bt_returnitem is too late...) I think this patch does two things. one is refactoring the data struct for _bt_killitems, the other one is scaning the postlist in the backward way for the backward scan. then does the below changes belongs to any of them? Is it an intentional change? _bt_killitems: if (offnum < minoff) - continue; /* pure paranoia */ + { + /* + * Page must have originally been the rightmost page, but has + * since split + */ + Assert(!so->dropPin); + offnum = minoff; + } At last, I can get the same test result as you. The buffer hits go back to 23 in the test case, thank for working on this! > I think that this patch isn't too far off being committable. I think so. -- Best Regards Andy Fan
В списке pgsql-hackers по дате отправления: