Re: Change GUC hashtable to use simplehash?
От | Ants Aasma |
---|---|
Тема | Re: Change GUC hashtable to use simplehash? |
Дата | |
Msg-id | CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1+PC0a_w1fwY4iGMqUA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Change GUC hashtable to use simplehash? (Jeff Davis <pgsql@j-davis.com>) |
Ответы |
Re: Change GUC hashtable to use simplehash?
(John Naylor <johncnaylorls@gmail.com>)
|
Список | pgsql-hackers |
On Sun, 21 Jan 2024 at 03:06, Jeff Davis <pgsql@j-davis.com> wrote: > Yes, thank you. I don't think we need to change the algorithm. Jumping in here at a random point just to share my findings from poking around this on and off. I am concentrating here on cstring hashing as that is the most complicated one. One thing that caught my eye in testing was that the unaligned cstring code was unexpectedly faster for short strings (3-18B uniform distribution). Looking into it the cause was fasthash_accum() called in the final iteration. In the unaligned case compiler (clang-15) unrolled the inner loop which allowed it to jump directly into the correct place in the switch. In the unaligned case clang decided to use a data dependent jump which then mispredicts all of the time. But given that we know the data length and we have it in a register already, it's easy enough to just mask out data past the end with a shift. See patch 1. Performance benefit is about 1.5x Measured on a small test harness that just hashes and finalizes an array of strings, with a data dependency between consecutive hashes (next address depends on the previous hash output). Unaligned case can actually take advantage of the same trick as the aligned case, it just has to shuffle the data from two consecutive words before applying the combine function. Patch 2 implements this. It makes the unaligned case almost as fast as the aligned one, both on short and long strings. 10% benefit on short strings, 50% on long ones. Not sure if the second one is worth the extra code. A different approach would be to use the simple word at a time hashing for the unaligned case too and handle word accesses that straddle a page boundary as a special case. Obviously this only makes sense for platforms that support unaligned access. On x86 unaligned access within a cache line is basically free, and across cache lines is only slightly more expensive. On benchmarks calling the aligned code on unaligned strings only has a 5% penalty on long strings, short ones are indistinguishable. I also took a look at using SIMD for implementing the hash using the same aligned access + shuffle trick. The good news is that the shuffling works well enough that neither it nor checking for string end are the longest chain. The bad news is that the data load, alignment, zero finding and masking form a big dependency chain on the first iteration. Mixing and finalization is even worse, fasthash uses 64bit imul instruction that has a 3 cycle latency, the iteration to iteration chain is imul + xor, for 4 cycles or 2 B/cycle (in practice a bit less due to ALU port contention). In SIMD registers there is no 64bit multiply, and 32 bit multiply has a terrible 10 cycle latency on Intel. AES instructions are an interesting option, but it seems that 2 are needed for good enough mixing, at 4 cycles each, we again end up at 2B/cycle. Finalization needs another 3 AES instructions, a shuffle and a xor fold to pass SMHasher, for 17 cycles. The mix latency issue could be worked around by doing more mixing in parallel, potentially up to 8x faster, but this does not help short strings at all and would make the code way bigger. SIMD code does use fewer instructions so it interleaves better with nearby code that is not dependent on it, not sure if that matters anywhere. The short version is that for very long (4k+) strings the attached SIMD code is 35% faster, for short strings it is 35% slower, and this is very much x86-64-v3 only and would need a fallback when AVX and AES-NI are not available. Basically a dead end for the use cases this hash function is used for. Regards, Ants Aasma
Вложения
В списке pgsql-hackers по дате отправления: