Re: Returning nbtree posting list TIDs in DESC order during backwards scans
| От | Peter Geoghegan |
|---|---|
| Тема | Re: Returning nbtree posting list TIDs in DESC order during backwards scans |
| Дата | |
| Msg-id | CAH2-Wznq0zWu5bMEmaFZ+s3mkNpHZm-nAD=7yx_eS4t3=jZEUA@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: Returning nbtree posting list TIDs in DESC order during backwards scans (Victor Yegorov <vyegorov@gmail.com>) |
| Ответы |
Re: Returning nbtree posting list TIDs in DESC order during backwards scans
|
| Список | pgsql-hackers |
On Wed, Dec 3, 2025 at 10:18 AM Victor Yegorov <vyegorov@gmail.com> wrote: > Patch looks fine, applies and compiles cleanly, passes tests. This patch was trickier than initially expected. I paid no attention to the possible downside of changing the posting list iteration code in _bt_readpage (i.e. from scan posting lists in descending order for backwards scans), which was an oversight. It seems that we're very sensitive to how the compiler allocates registers within _bt_readpage, which led to v4 of the patch (and presumably all earlier versions) not storing itemIndex in a register. With v4, the compiler spilled the itemIndex comparison to the stack (at least on my machine, which uses GCC 15.2 from Debian unstable), and so had to read itemIndex from memory on each loop iteration. This regressed pgbench's SELECT workload by about 1% of total TPS (on my Zen 3 CPU). Attached v5 avoids the regression by tweaking _bt_readpage. I will commit this version soon (I really mean it this time!). I'm not sure why these changes have the intended effect -- I mostly figured all this out through trial and error. Though I can say that my testing showed that _not_ changing the posting list iteration code in the _bt_readpage forwards scan loop makes the crucial difference. The other tweaks probably aren't strictly necessary, but they seem to make the compiler generate ever so slightly faster code (at least with pgbench SELECT), so I kept them in. I also gave up on the idea of using a bitmapset for v5 -- the issue with regressing _bt_readpage discouraged me from adding code that allocates memory (more than once per scan) within btgettuple, which is a performance-critical path. We can simply sort and unique-ify the killedItems array from within _bt_killitems instead -- which is largely the same way that my original v1 did it. That way it won't matter what order we append to killItems in, relative to the scan direction. (The downside of sticking with an array for killedItems is that we can still run out of array space in extreme cases involving scrollable cursors that return many killable tuples, and repeatedly append the same TID to killedItems[], but that doesn't seem like much of a loss to me -- that almost never happens in practice.) > I'd like to point out a missing space after the dot in the 2nd para of the commit message, > falls out of style. Fixed that too. Thanks -- Peter Geoghegan
Вложения
В списке pgsql-hackers по дате отправления: