Re: Macro customizable hashtable / bitmapscan & aggregation perf

From: Andres Freund
Subject: Re: Macro customizable hashtable / bitmapscan & aggregation perf
Date:
Msg-id: 20161003183826.tpmvemicsemkd4ct@alap3.anarazel.de
In response to: Re: Macro customizable hashtable / bitmapscan & aggregation perf (Arthur Silva <arthurprs@gmail.com>)
List: pgsql-hackers
Hi,


On 2016-10-03 13:26:09 +0200, Arthur Silva wrote:
> On Sat, Oct 1, 2016 at 2:44 AM, Andres Freund <andres@anarazel.de> wrote:
> A couple of comments.
> * 80% occupancy is a bit conservative for RH hashing, it works well up to
> 90% if you use the early stops due to distance. So that TODO is worth
> pursuing.

I found 90% occupancy a tiny bit slower during modifications, due to the
increased movement of cells.  I actually implemented that TODO at some
point; it's not a very clear win for narrow elements and mid-sized
tables - the additional instructions for the distance computations cost.

I've played with a different version of robin hood hashing, where the
buckets are ordered by hash value. I.e. the hash value is shifted right,
instead of being masked, to get the hash bucket. That makes the
hashtable ordered by hash value, so distance checks simply become >=.
The problem with that is that it only works if you have an "overflow"
area at the end of the table, which is hard to size correctly.


> * An optimized memory layout for RH hashmap is along the lines of
> HHHKVKVKV, using one bit of the hash as the present/absent flag. That plays
> specially well with large occupancies (~90%). Due to RH the probes on the
> H array are very short and within a single cache line. Even with a 31-bit
> hash it's really rare to have to probe more than 1 entry in the KV array.
> Essentially guaranteeing that the 99th percentile of lookups will only hit
> a couple of cache lines (if you ignore crossing cache lines and other
> details).

That seems likely to end up being noticeably more complicated - I'm not
sure the cpu overhead of the more complex lookups outweighs the
benefits.  I'd like to get the basics in; we can optimize further later
on.  Based on my instrumentation, the majority of time is currently
spent in the cache miss for the initial bucket; everything else inside
the hash table code is quite small. After that it's hash value
computations in the execGrouping case.

Greetings,

Andres Freund


