Re: B-tree cache prefetches

Поиск
Список
Период
Сортировка
От Andrey Borodin
Тема Re: B-tree cache prefetches
Дата
Msg-id 74574D24-15C2-4B41-B484-5D84252BBF0B@yandex-team.ru
обсуждение исходный текст
Ответ на Re: B-tree cache prefetches  (Thomas Munro <thomas.munro@enterprisedb.com>)
Список pgsql-hackers

> 15 окт. 2018 г., в 2:38, Thomas Munro <thomas.munro@enterprisedb.com> написал(а):
>
> On Mon, Oct 15, 2018 at 12:03 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>> 31 авг. 2018 г., в 2:40, Thomas Munro <thomas.munro@enterprisedb.com> написал(а):
>> A related topic is the cache-unfriendliness of traditional binary
>> searches of sorted data.  Check out "Array Layouts for
>> Comparison-Based Searching"[1] if you haven't already.  It says that
>> if your array fits in L2 cache, as our btree pages do, then binary
>> search still wins against Eytzinger et al, which I was a bit
>> disappointed
>>
>> I've re-read that paper. Their results are not about our case, here's a quote:
>>> For arrays small enough _to be kept_ in L2 cache, the branch-free binary search code listed in Listing 2 is the
fastestalgorithm 
>>
>> Surely, we cannot keep all pages in L2. Eytzinger layout could be realistic possibility, except one simple problem:
Ido not see how can we insert items into this layout. 
>
> I don't know.  Aren't we talking about binary search within one 8KB
> page here, anyway?
The paper discuss multiple searches inside one 8Kb region, while we are doing single search in all-uncached page and
moveaway. That's the difference. 

Binserach is optimal, if the page is in L1 before we start search.

>
> Thinking about some other ideas for cache prefetching in btree code,
> here's an idea to improve pipelining of index scans (a bit like the
> two-step pipeline idea from the hash join prefetch thread).  We know
> the TIDs of heap tuples we'll be looking up in the near future, so can
> we prefetch all the way to the tuple?   There are three random
> accesses that follow soon after reading an index tuple:
> BufTableLookup(), then page header, then the tuple itself.  At the
> logical extreme, we could anticipate the probe of SharedBufHash for
> the TID from index tuple i + 3 (perhaps dynahash could expose a
> hash_prefetch_for_hash() facility), and anticipate the read of the
> page header for the tuple pointed to by index tuple i + 2 (perhaps a
> dirty read of SharedBufHash that is only prepared to look at the first
> entry in the bucket would be OK for this), and anticipate the read of
> the tuple for the TID from index tuple i + 1 (perhaps a dirty read of
> the page item table).  Or some less ambitious subset, if that pipeline
> turns out to be too deep; it seems like the simplest experiment would
> be to try prefetching just SharedBufHash lookups.  This may all be
> completely crazy, I haven't tried any of it.
This idea also looks good to me. One thing bothers me. if we do __builtin_prefetch(&a[b[c[d]]],0,1), and a, b, c and d
allare not in cache - will we incur CPU wait time for fetching a,b and c? 
This [0] guys are expoiting C++ coroutines for prefetching this kind of stuff, but it seems like too much of hassle.
>
>> Also, that research is quite distant from B-tree binsearch: thier data items are small and do not represent
referenceto actual data. Eytzinger exploits the fact that two posiible future keys share same cache line. But it is
hardto provide, given we place items at quite random place of the page. 
>
> Based on the the above quotes, I'm not suggesting we should use
> Eytzinger for search-within-page.  But out of curiosity, I checked,
> and saw that you can actually get a pair of index tuples holding a
> six-character text key or an integer key into a 64 byte cache line.

Well, this proves that in theory Eytzinger is an opportunity.

Best regards, Andrey Borodin.

[0] http://www.vldb.org/pvldb/vol11/p1702-jonathan.pdf

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: WIP: Avoid creation of the free space map for small tables
Следующее
От: Sandeep Thakkar
Дата:
Сообщение: Re: PostgreSQL 11 RC1 + GA Dates