Re: Patch: fix lock contention for HASHHDR.mutex
From | Aleksander Alekseev
---|---
Subject | Re: Patch: fix lock contention for HASHHDR.mutex
Date |
Msg-id | 20151229184851.1bb7d1bd@fujitsu
In reply to | Re: Patch: fix lock contention for HASHHDR.mutex (Aleksander Alekseev <a.alekseev@postgrespro.ru>)
Responses | Re: Patch: fix lock contention for HASHHDR.mutex (Aleksander Alekseev <a.alekseev@postgrespro.ru>); Re: Patch: fix lock contention for HASHHDR.mutex (Alvaro Herrera <alvherre@2ndquadrant.com>)
List | pgsql-hackers
Good news, everyone! I discovered that NUM_LOCK_PARTITIONS is a bottleneck for the last patch. Long story short, NUM_LOCK_PARTITIONS = (1 << 7) gives the best performance:

PN = 16, AN = 8, NUM_LOCK_PARTITIONS = (1 << 7): 4782.9
PN = 16, AN = 4, NUM_LOCK_PARTITIONS = (1 << 7): 4089.9
PN = 16, AN = 2, NUM_LOCK_PARTITIONS = (1 << 7): 2822.5
PN =  8, AN = 8, NUM_LOCK_PARTITIONS = (1 << 7): 4391.7
PN =  8, AN = 4, NUM_LOCK_PARTITIONS = (1 << 7): 3665.0
PN =  8, AN = 2, NUM_LOCK_PARTITIONS = (1 << 7): 2205.7
PN =  4, AN = 8, NUM_LOCK_PARTITIONS = (1 << 7): 4031.9
PN =  4, AN = 4, NUM_LOCK_PARTITIONS = (1 << 7): 3378.2
PN =  4, AN = 2, NUM_LOCK_PARTITIONS = (1 << 7): 2142.4

I also tried to optimize the global_free_list_* procedures as I planned. The optimized version is asymptotically faster but runs about 5% slower in practice because of the additional operations it has to perform. The patch is attached to this message in case you would like to examine the code.

Here is a funny thing: the benchmark results I shared on 22.12.2015 are wrong, because I forgot to run `make clean` after changing lwlock.h (autotools doesn't rebuild the project properly after a .h file changes). Here are the correct results:

60-core server, pgbench -j 64 -c 64 -f pgbench.sql -T 50 -P 1 my_database

NUM_LOCK_  | master   | no locks | lwlock   | spinlock | spinlock
PARTITIONS | (99ccb2) |          |          |          | array
-----------|----------|----------|----------|----------|----------
1 << 4     |    660.4 |   2296.3 |   1441.8 |    454.4 |   1597.8
-----------|----------|----------|----------|----------|----------
1 << 5     |    536.6 |   3006.5 |   1632.3 |    381.8 |   1896.8
-----------|----------|----------|----------|----------|----------
1 << 6     |    469.9 |   4083.5 |   1693.4 |    427.7 |   2271.5
-----------|----------|----------|----------|----------|----------
1 << 7     |    430.8 |   4653.0 |   2016.3 |    361.1 |   3277.1

As you can see, the "spinlock array" version works quite well. It is almost 5 times faster than the current dynahash implementation and only about 30% slower than the "no locks" version.
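For readers less familiar with this corner of the code, here is a minimal sketch of how NUM_LOCK_PARTITIONS enters the picture. The constants mirror the definitions in PostgreSQL's lwlock.h and the partition macro in lock.h, but this is a standalone illustration, not the patched code itself; the helper name lock_hash_partition is mine.

```c
#include <assert.h>

/* In PostgreSQL's lwlock.h the partition count is derived from a log2
 * constant; (1 << 7) = 128 is the best-performing value in the benchmark
 * above. Since this lives in a header, every file that includes it must
 * be rebuilt after a change -- hence the `make clean` mistake. */
#define LOG2_NUM_LOCK_PARTITIONS 7
#define NUM_LOCK_PARTITIONS (1 << LOG2_NUM_LOCK_PARTITIONS)

/* A hash code is mapped to its lock partition by a simple modulo,
 * as lock.h's LockHashPartition() macro does. */
static unsigned
lock_hash_partition(unsigned hashcode)
{
    return hashcode % NUM_LOCK_PARTITIONS;
}
```

More partitions spread contending backends across more locks, which is why throughput climbs with the partition count in every column of the table above.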
It also guarantees that we will use all available memory. I experimented with various improvements to "spinlock array" as well, e.g. borrowing elements in batches instead of one at a time, but none of them improved performance. If you believe that 5 times faster is not good enough, we can also use sharded-global-free-list.patch with appropriate AN and PN values. Still, that patch doesn't guarantee that all available memory will be used (the "no locks" patch doesn't guarantee it either).

Considering that the benchmark above covers a very specific case rather than all possible ones, that the "spinlock array" patch has much better guarantees than the "no locks" patch, and that lock-free algorithms are complicated and error-prone, I suggest we call the "spinlock array" solution good enough, at least until someone complains again that dynahash is a bottleneck. Naturally, I will first clean up the code and write more comments, then re-check once again with the usual pgbench that there is no performance degradation.
Attachments