Re: RFC: Improve CPU cache locality of syscache searches

Поиск

Список

Период

Сортировка

От	Andres Freund
Тема	Re: RFC: Improve CPU cache locality of syscache searches
Дата	4 августа 2021 г. 22:44:44
Msg-id	20210804194444.gneyztouva3fmngg@alap3.anarazel.de обсуждение исходный текст
Ответ на	RFC: Improve CPU cache locality of syscache searches (John Naylor <john.naylor@enterprisedb.com>)
Ответы	Re: RFC: Improve CPU cache locality of syscache searches (John Naylor <john.naylor@enterprisedb.com>)
Список	pgsql-hackers

Дерево обсуждения

Hi,

On 2021-08-04 12:39:29 -0400, John Naylor wrote:
> CPU caches have multiple levels, so I had an idea to use that concept in
> the syscaches.

I do think we loose a good bit to syscache efficiency in real workloads, so I
think it's worth investing time into it.

However:

> Imagine if catcache buckets were cacheline-sized. In that case, we would
> store the most recently accessed hash values and pointers to catctup, in
> addition to the dlist_head:
>
> typedef struct cc_bucket
> {
>   uint32 hashes[4];
>   catctup *ct[4];
>   dlist_head;
> };

I'm not convinced that the above the right idea though. Even if the hash
matches, you're still going to need to fetch at least catctup->keys[0] from
a separate cacheline to be able to return the cache entry.

If the hashes we use are decent and we're resizing at reasonable thresholds,
we shouldn't constantly lookup up values that are later in a collision chain -
particularly because we move cache hits to the head of the bucket. So I don't
think a scheme as you described would actually result in a really better cache
hit ratio?

ISTM that something like

struct cc_bucket_1
{
    uint32 hashes[3]; // 12
    // 4 bytes alignment padding
    Datum key0s[3]; // 24
    catctup *ct[3]; // 24
    // cacheline boundary
    dlist_head conflicts; // 16
};

would be better for 1 key values?

It's obviously annoying to need different bucket types for different key
counts, but given how much 3 unused key Datums waste, it seems worth paying
for?

One issue with stuff like this is that aset.c won't return cacheline aligned
values...

It's possible that it'd be better to put the catctup pointers onto a
neigboring cacheline (so you still get TLB benefits, as well as making it easy
for prefetchers) and increase the number of hashes/keys that fit on one
cacheline.

If we stuffed four values into one bucket we could potentially SIMD the hash
and Datum comparisons ;)

Greetings,

Andres Freund

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Robert Haas
Дата: 04 августа 2021 г., 22:37:36
Сообщение: Re: A varint implementation for PG?

Следующее

От: Andres Freund
Дата: 04 августа 2021 г., 22:45:58
Сообщение: Re: A varint implementation for PG?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: RFC: Improve CPU cache locality of syscache searches

Предыдущее

Следующее