Re: [HACKERS] memory layouts for binary search in nbtree

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: [HACKERS] memory layouts for binary search in nbtree
Дата
Msg-id CAH2-WzmW9LjTfzp7uY4CHsF3NH0YM-_pow3LXRh18ORLgeu48A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: memory layouts for binary search in nbtree  (Peter Geoghegan <pg@heroku.com>)
Ответы Re: [HACKERS] memory layouts for binary search in nbtree  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Thu, May 19, 2016 at 7:28 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Abbreviated keys in indexes are supposed to help with this. Basically,
> the ItemId array is made to be interlaced with small abbreviated keys
> (say one or two bytes), only in the typically less than 1% of pages
> that are internal (leaf page binary searches don't change).

I looked at this again recently. I wrote a patch to prove to myself
that we can fairly easily reclaim 15 bits from every nbtree internal
page ItemId, and put an abbreviated key there instead. The patch has
nbtree tell PageAdditem() to store an abbreviated key within lp_len
(actually, we call PageAddItemAbbrev() now). I didn't realize that
stealing lp_len might be feasible until recently, but it appears to be
-- this is a lot simpler than other approaches might be. I already
implemented a rudimentary, POC encoding scheme for int4, but text is
the datatype that I'd expect to see a real benefit for.

While this POC patch of mine could easily have bugs, it is at least
true that "make check-world" passes for me. For nbtree, reclaiming the
lp_len space was just a matter of teaching a small number of existing
places to go to the IndexTuple to get a tuple's length, rather than
trusting an ItemId's lp_len field (that is, rather than calling
ItemIdGetLength()). Most nbtree related code already gets the length
from the index tuple header today, since it's pretty rare to care
about the length of a tuple but not its contents. This did require
updating some generic routines within bufpage.c, too, but that wasn't
so bad.

Can you suggest a workload/hardware to assess the benefits of an
optimization like this, Andres? Is there a profile available somewhere
in the archives that shows many cache misses within _bt_binsrch()?

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Petr Jelinek
Дата:
Сообщение: Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table
Следующее
От: Andres Freund
Дата:
Сообщение: Re: [HACKERS] memory layouts for binary search in nbtree