[HACKERS] LWLock optimization for multicore Power machines

From Alexander Korotkov
Subject [HACKERS] LWLock optimization for multicore Power machines
Date
Msg-id CAPpHfdsKrh7c7P8-5eG-qW3VQobybbwqH=gL5Ck+dOES-gBbFg@mail.gmail.com
Replies Re: [HACKERS] LWLock optimization for multicore Power machines  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Re: [HACKERS] LWLock optimization for multicore Power machines  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] LWLock optimization for multicore Power machines  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi everybody!

During the FOSDEM/PGDay 2017 developer meeting I said that I have a special assembly optimization for multicore Power machines.  From the answers of other hackers I realized the following.
  1. There are some big Power machines running PostgreSQL in production.  Not as many as Intel machines, but they exist.
  2. The community could be interested in a special assembly optimization for Power machines despite the cost of maintaining it.
Power processors implement atomic operations in a specific way, which is a kind of optimistic locking.  The 'lwarx' instruction (load word and reserve indexed) places a reservation on a memory location, but that reservation can be lost by the time the matching 'stwcx.' (store word conditional indexed) executes, and then we have to retry.  So, for instance, a CAS operation on a Power processor is itself a loop, and a loop of CAS operations becomes a two-level nested loop.  Benchmarks showed that this becomes a real problem for LWLockAttemptLock().  However, one can actually put arbitrary logic between 'lwarx' and 'stwcx.' and turn it into a single loop.  The downside is that this logic has to be implemented in assembly.  See [1] for experiment details.
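
To make the nested loop concrete, here is a minimal sketch in portable C11 of a CAS-based lock attempt in the spirit of LWLockAttemptLock().  It is not PostgreSQL's code: the names (attempt_lock, LOCK_EXCLUSIVE, LOCK_SHARED_UNIT), the bit layout and the return convention are all hypothetical.  On Power, a strong compare-and-exchange is typically compiled into an lwarx/stwcx. retry loop of its own, so the for loop below is the outer level of the nesting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define LOCK_EXCLUSIVE   ((uint32_t) 1 << 24)   /* set while held exclusively */
#define LOCK_SHARED_UNIT ((uint32_t) 1)         /* one shared holder */

/*
 * Try to take the lock without blocking.
 * Returns true if the lock was acquired, false if the caller would have to wait.
 */
static bool
attempt_lock(_Atomic uint32_t *state, bool exclusive)
{
    uint32_t old = atomic_load(state);

    for (;;)                    /* outer loop: retry if the state changed under us */
    {
        uint32_t desired;

        if (exclusive)
        {
            if (old != 0)
                return false;   /* any holder blocks an exclusive request */
            desired = LOCK_EXCLUSIVE;
        }
        else
        {
            if (old & LOCK_EXCLUSIVE)
                return false;   /* an exclusive holder blocks shared requests */
            desired = old + LOCK_SHARED_UNIT;
        }

        /* inner level: on Power this call is itself a reserve/store-conditional loop */
        if (atomic_compare_exchange_strong(state, &old, desired))
            return true;
        /* CAS failed: 'old' now holds the current value, so re-run the checks */
    }
}

/* Small usage example. */
static void
attempt_lock_demo(void)
{
    _Atomic uint32_t state = 0;

    assert(attempt_lock(&state, false));    /* first shared holder */
    assert(attempt_lock(&state, false));    /* second shared holder */
    assert(!attempt_lock(&state, true));    /* exclusive request is refused */
    assert(atomic_load(&state) == 2 * LOCK_SHARED_UNIT);
}
```

The assembly variant discussed in this mail would move the "is the lock free?" test between the reserve and the conditional store, so that only one loop remains.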

The results in [1] contain a lot of junk that isn't relevant anymore, which is why I drew a separate graph.

power8-lwlock-asm-ro.png shows the results of a read-only pgbench test on an IBM E880, which has 32 physical cores and 256 virtual threads via SMT.  The curves have the following meaning.
 * 9.5: unpatched PostgreSQL 9.5
 * pinunpin-cas: PostgreSQL 9.5 + earlier version of 48354581
 * pinunpin-lwlock-asm: PostgreSQL 9.5 + earlier version of 48354581 + LWLock implementation in assembly.

lwlock-power-1.patch is the patch with the assembly implementation of LWLock that I used at that time, rebased to current master.

Using assembly directly in lwlock.c looks rough.  This is why I refactored it by introducing a new atomic operation, pg_atomic_fetch_mask_add_u32 (see lwlock-power-2.patch).  It checks that all masked bits are clear and, if so, adds a value to the variable.  This atomic has a special assembly implementation for Power, and a generic implementation based on a loop of CAS for other platforms.  We would probably have other implementations for other architectures in the future.  This level of abstraction is the best I managed to invent.
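
As an illustration of those semantics, a generic CAS-loop fallback could look roughly like the sketch below.  It is written with C11 atomics rather than PostgreSQL's pg_atomic_* API and is not taken from lwlock-power-2.patch; the function name fetch_mask_add_u32 and the convention of returning the previously seen value (so the caller can test it against the mask) are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * If none of the bits in 'mask' are set in *ptr, atomically add 'add' to *ptr.
 * Returns the value observed before the (possible) addition; the addition
 * happened if and only if (result & mask) == 0.
 */
static uint32_t
fetch_mask_add_u32(_Atomic uint32_t *ptr, uint32_t mask, uint32_t add)
{
    uint32_t old = atomic_load(ptr);

    while ((old & mask) == 0 &&
           !atomic_compare_exchange_strong(ptr, &old, old + add))
    {
        /* CAS failed: 'old' was refreshed, the loop condition re-tests the mask */
    }
    return old;
}

/* Usage example: a shared-lock attempt with a hypothetical exclusive flag bit. */
static void
fetch_mask_add_demo(void)
{
    const uint32_t excl = (uint32_t) 1 << 24;   /* hypothetical flag bit */
    _Atomic uint32_t state = 0;

    /* No exclusive holder: the shared count is incremented. */
    assert((fetch_mask_add_u32(&state, excl, 1) & excl) == 0);
    assert(atomic_load(&state) == 1);

    /* With the exclusive bit set, the add is refused and the state is unchanged. */
    atomic_fetch_or(&state, excl);
    assert((fetch_mask_add_u32(&state, excl, 1) & excl) != 0);
    assert(atomic_load(&state) == (excl | 1));
}
```

A Power-specific version would perform the mask test and the addition between 'lwarx' and 'stwcx.', which is what removes the nested loop.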

Unfortunately, I have no big enough Power machine at hand to reproduce those results.  Actually, I have no Power machine at hand at all.  So, lwlock-power-2.patch was written "blindly".  I would very much appreciate it if someone could help me with testing and benchmarking.


------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company