Allow simplehash to use already-calculated hash values

From: Jeff Davis
Subject: Allow simplehash to use already-calculated hash values
Msg-id: 48abe675e1330f0c264ab2fe0d4ff23eb244f9ef.camel@j-davis.com
Responses: Re: Allow simplehash to use already-calculated hash values  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

The attached small patch adds new entry points to simplehash.h that
allow the caller to pass in the already-calculated hash value, so that
simplehash doesn't need to recalculate it.

This is helpful for Memory-Bounded Hash Aggregation[1], which uses the
hash value for multiple purposes. For instance, if the hash table is
full and the group is not already present in the hash table, it needs
to spill the tuple to disk. In that case, it would use the hash value
first for the initial lookup, and then to select the right spill partition.
Later, when it processes the batch, it will again need the same hash
value to perform a lookup. By separating the hash value calculation
from where it's used, we can avoid needlessly recalculating it for each
of these steps.
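
To illustrate the flow described above, here is a minimal, self-contained
sketch. It is not simplehash.h and not the Hash Aggregation code: every
name in it (toy_hash, toy_lookup_hash, toy_insert_hash, toy_advance, the
TOY_* constants) is hypothetical, and the table is a tiny fixed-size
open-addressing table chosen only to keep the example short. The point is
that the hash is computed once per tuple and then reused for the lookup,
the insert, and the choice of spill partition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NBUCKETS        8   /* power of two (assumed for masking) */
#define TOY_MAX_GROUPS      4   /* stand-in for the memory limit */
#define TOY_PARTITION_BITS  2
#define TOY_NPARTITIONS     (1 << TOY_PARTITION_BITS)

typedef struct ToyEntry
{
    bool        used;
    uint32_t    hash;
    int         key;
    long        count;
} ToyEntry;

typedef struct ToyTable
{
    ToyEntry    buckets[TOY_NBUCKETS];
    int         nused;
} ToyTable;

/* Counts hash computations so the "computed once" claim can be checked. */
static int  toy_hash_calls = 0;

/* Stand-in hash function (an integer mixer), not PostgreSQL's. */
static uint32_t
toy_hash(int key)
{
    uint32_t    h = (uint32_t) key;

    toy_hash_calls++;
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Lookup with a caller-supplied hash; returns NULL if the key is absent. */
static ToyEntry *
toy_lookup_hash(ToyTable *tb, int key, uint32_t hash)
{
    for (int i = 0; i < TOY_NBUCKETS; i++)
    {
        ToyEntry   *e = &tb->buckets[(hash + i) & (TOY_NBUCKETS - 1)];

        if (!e->used)
            return NULL;        /* empty bucket ends the probe sequence */
        if (e->hash == hash && e->key == key)
            return e;
    }
    return NULL;
}

/* Insert with a caller-supplied hash; the key must not be present yet. */
static ToyEntry *
toy_insert_hash(ToyTable *tb, int key, uint32_t hash)
{
    assert(tb->nused < TOY_MAX_GROUPS);
    for (int i = 0; i < TOY_NBUCKETS; i++)
    {
        ToyEntry   *e = &tb->buckets[(hash + i) & (TOY_NBUCKETS - 1)];

        if (!e->used)
        {
            e->used = true;
            e->hash = hash;
            e->key = key;
            e->count = 0;
            tb->nused++;
            return e;
        }
    }
    return NULL;                /* not reached: MAX_GROUPS < NBUCKETS */
}

/*
 * Process one input key. Returns -1 if it was aggregated in memory,
 * otherwise the spill partition number chosen from the same hash value.
 */
static int
toy_advance(ToyTable *tb, int key)
{
    uint32_t    hash = toy_hash(key);   /* the only hash computation */
    ToyEntry   *e = toy_lookup_hash(tb, key, hash);

    if (e == NULL && tb->nused < TOY_MAX_GROUPS)
        e = toy_insert_hash(tb, key, hash);
    if (e != NULL)
    {
        e->count++;
        return -1;
    }
    /* Table full and group absent: reuse the hash (high bits) to spill. */
    return (int) (hash >> (32 - TOY_PARTITION_BITS));
}
```

In this sketch the spilled tuple could also carry its hash value along, so
that processing the batch later would not need to compute it again; that
detail is an assumption of the sketch, not something stated here.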

There is already an option for simplehash to cache the calculated hash
value and return it with the entry, but that doesn't quite fit the
need. The hash value is needed in cases where the lookup fails, because
that is when the tuple must be spilled; but if the lookup fails, it
returns NULL, discarding the calculated hash value.

I am including this patch separately from Hash Aggregation because it
is a small and independently reviewable change.

In theory, this could add overhead for "SH_SCOPE extern" for callers
not specifying their own hash value, because it adds an extra external
function call. I looked at the generated LLVM and it's a simple tail
call, and I looked at the generated assembly and it's just an extra
jmp. I tested by doing a hash aggregation of 30M zeroes, which should
exercise that path a lot, and I didn't see any difference. Also, once
we actually use this for hash aggregation, there will be no "SH_SCOPE
extern" callers that don't specify the hash value anyway.
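
As a rough illustration of the shape being discussed (not the patch
itself), the existing entry point can become a thin wrapper that computes
the hash and hands off to the new one. The names below (demo_hash,
demo_lookup, demo_lookup_hash, demo_insert_hash) are hypothetical and the
table is a toy; the functions have external linkage only to mirror the
"SH_SCOPE extern" case, where the wrapper's call is the potential extra
cost and a compiler may turn it into a tail call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NBUCKETS 16        /* power of two (assumed for masking) */

typedef struct DemoEntry
{
    int         used;
    uint32_t    hash;
    int         key;
} DemoEntry;

typedef struct DemoTable
{
    DemoEntry   buckets[DEMO_NBUCKETS];
} DemoTable;

/* Stand-in multiplicative hash function, not PostgreSQL's. */
uint32_t
demo_hash(int key)
{
    return (uint32_t) key * 2654435761u;
}

/* New-style entry point: the caller supplies the hash value. */
DemoEntry *
demo_lookup_hash(DemoTable *tb, int key, uint32_t hash)
{
    assert(tb != NULL);
    for (int i = 0; i < DEMO_NBUCKETS; i++)
    {
        DemoEntry  *e = &tb->buckets[(hash + i) & (DEMO_NBUCKETS - 1)];

        if (!e->used)
            return NULL;        /* empty bucket: key is not present */
        if (e->hash == hash && e->key == key)
            return e;
    }
    return NULL;
}

/* Old-style entry point: compute the hash, then delegate. */
DemoEntry *
demo_lookup(DemoTable *tb, int key)
{
    /* The call is in tail position, so it can compile to a plain jump. */
    return demo_lookup_hash(tb, key, demo_hash(key));
}

/* Insert with a caller-supplied hash; returns NULL if the table is full. */
DemoEntry *
demo_insert_hash(DemoTable *tb, int key, uint32_t hash)
{
    assert(tb != NULL);
    for (int i = 0; i < DEMO_NBUCKETS; i++)
    {
        DemoEntry  *e = &tb->buckets[(hash + i) & (DEMO_NBUCKETS - 1)];

        if (e->used && e->hash == hash && e->key == key)
            return e;           /* already present */
        if (!e->used)
        {
            e->used = 1;
            e->hash = hash;
            e->key = key;
            return e;
        }
    }
    return NULL;
}
```

Callers that already hold the hash value call demo_lookup_hash directly
and skip the wrapper entirely.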

Regards,
    Jeff Davis

[1] https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel%40j-davis.com
