Re: RFC: Improve CPU cache locality of syscache searches

Поиск
Список
Период
Сортировка
От John Naylor
Тема Re: RFC: Improve CPU cache locality of syscache searches
Дата
Msg-id CAFBsxsEtCJkO8vOjfBzfHdsG0TDGjt5CGRE3RG_1+0WOpyw3sg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: RFC: Improve CPU cache locality of syscache searches  (Andres Freund <andres@anarazel.de>)
Ответы Re: RFC: Improve CPU cache locality of syscache searches  (John Naylor <john.naylor@enterprisedb.com>)
Список pgsql-hackers

On Thu, Aug 5, 2021 at 4:12 PM Andres Freund <andres@anarazel.de> wrote:
> I have wondered before whether we should have syscache definitions generate
> code specific to each syscache definition. I do think that'd give a good bit
> of performance boost. But I don't see a trivial way to get there without
> notational overhead.
>
> We could define syscaches in a header using a macro that's defined differently
> in syscache.c than everywhere else. The header would declare a set of
> functions for each syscache, syscache.c would define them to call an
> always_inline function with the relevant constants.
>
> Or perhaps we should move syscache definitions into the pg_*.h headers, and
> generate the relevant code as part of their processing. That seems like it
> could be nice from a modularity POV alone. And it could do better than the
> current approach, because we could hardcode the types for columns in the
> syscache definition without increasing notational overhead.

I had code laying around to generate the info needed for initialization, but I apparently deleted it and never emailed it. :-(

If we went that far, I wonder if we could specialize the cc_bucket for the actual types, rather than just number of Datums, which would further save on space. More complex, though.

Also, if comparisons got cheaper, maybe we could get away with not storing the full hash in the bucket in favor of a tag of the 16 highest bits. (The catctup would still need the full hash for invalidation, at least).

Another idea I came across while researching in-memory key-value stores is "bulk chaining" -- see [1] for an example. In that case, our example 2-cacheline bucket would have a dlist_head pointing not to the catctups, but to another bucket. So that throws away my L1/L2 analogy and uses a chain of buckets, each of which contain multiple sets of hashes/keys/pointers. That's closer to a normal collision chain, but with better cache locality. If we did that, I imagine the catctup could dispense with storing its Datum array and its dlist_node entirely.

[1] https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-lim.pdf

--
John Naylor
EDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "tanghy.fnst@fujitsu.com"
Дата:
Сообщение: [PATCH]Remove obsolete macro CHECKFLOATVAL in btree_gist
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Alias collision in `refresh materialized view concurrently`