Re: [HACKERS] HACKERS[PROPOSAL] split ProcArrayLock into multiple parts

From: Sokolov Yura
Subject: Re: [HACKERS] HACKERS[PROPOSAL] split ProcArrayLock into multiple parts
Date:
Msg-id: 1677284f35c40af909c317c072339492@postgrespro.ru
In reply to: Re: [HACKERS] HACKERS[PROPOSAL] split ProcArrayLock into multiple parts ("Jim Van Fleet" <vanfleet@us.ibm.com>)
List: pgsql-hackers
Good day Robert, Jim, and everyone.

On 2017-06-08 00:06, Jim Van Fleet wrote:
> Robert Haas <robertmhaas@gmail.com> wrote on 06/07/2017 12:12:02 PM:
> 
>> > OK -- would love the feedback and any suggestions on how to
>> > mitigate the low end problems.
>> 
>> Did you intend to attach a patch?
> Yes I do -- tomorrow or Thursday -- needs a little cleaning up ...
> 
>> > Sokolov Yura has a patch which, to me, looks good for pgbench rw
>> > performance.  Does not do so well with hammerdb (about the same as
>> > base) on single socket and two socket.
>> 
>> Any idea why?  I think we will have to understand *why* certain
>> things help in some situations and not others, not just *that* they
>> do, in order to come up with a good solution to this problem.

My patch improves acquisition of a contended (blocking) LWLock on NUMA
because:

a. the patched procedure generates far fewer writes, especially because
taking the WaitListLock is unified with acquiring the lock itself.
Accessing modified memory is very expensive on NUMA, so fewer writes lead
to less wasted time.

b. it spins several times on lock->state, trying to acquire the lock,
before it starts trying to queue itself on the wait list. This is really
the cause of some of the speedup; without spinning, the patch only
removes the degradation under contention. I don't know why spinning
doesn't improve single-socket performance, though :-) Probably it is
still because all the algorithmic overhead (system calls, sleeping and
waking processes) is not too expensive until NUMA is involved.
 

> Looking at the data now -- LWLockAquire philosophy is different -- at
> first glance I would have guessed "about the same" as the base, but I
> can not yet explain why we have super pgbench rw performance and "the
> same" hammerdb performance

My patch improves only blocking contention, i.e. when a lot of EXCLUSIVE
locks are involved. pgbench rw generates a lot of write traffic, so there
is a lot of contention and waiting on WALInsertLocks (in XLogInsertRecord,
and while waiting in XLogFlush), WalWriteLock (in XLogFlush), and
CLogControlLock (in TransactionIdSetTreeStatus).

The case where SHARED locks are much more common than EXCLUSIVE ones is
not affected by the patch, because a SHARED lock is then acquired on the
fast path in both the original and the patched versions.
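The SHARED fast path can be sketched in C11 as a single compare-and-swap that bumps a holder count when no exclusive holder is present. As before, the lock-word layout and all names (LW_FLAG_EXCLUSIVE, LW_SHARED_ONE, sketch_lwlock, the functions) are hypothetical and not taken from lwlock.c. Note that even this fast path writes to the one shared lock word, which is exactly the cost that matters on NUMA.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LW_FLAG_EXCLUSIVE ((uint32_t)1 << 31) /* hypothetical exclusive bit  */
#define LW_SHARED_ONE     ((uint32_t)1)       /* one unit of shared holders */

typedef struct { _Atomic uint32_t state; } sketch_lwlock;

/* Fast path: bump the shared count with one CAS if nobody holds it exclusively. */
static bool try_acquire_shared(sketch_lwlock *lock)
{
    uint32_t old = atomic_load_explicit(&lock->state, memory_order_relaxed);

    while (!(old & LW_FLAG_EXCLUSIVE))
    {
        if (atomic_compare_exchange_weak_explicit(&lock->state, &old,
                                                  old + LW_SHARED_ONE,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;  /* no queueing needed */
        /* CAS failed (e.g. another shared holder raced us); `old` refreshed */
    }
    return false;  /* exclusive holder present: caller takes the slow path */
}

static void release_shared(sketch_lwlock *lock)
{
    uint32_t prev = atomic_fetch_sub_explicit(&lock->state, LW_SHARED_ONE,
                                              memory_order_release);
    assert((prev & ~LW_FLAG_EXCLUSIVE) > 0);  /* we really were a shared holder */
}
```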

So it looks like hammerdb doesn't produce much EXCLUSIVE contention on
LWLocks, and therefore it is not improved by the patch.

Splitting ProcArrayLock helps with acquiring a SHARED lock on NUMA in the
absence of EXCLUSIVE locks for the same reason my patch improves
acquisition of a blocking lock: fewer writes to the same memory. Since
each process writes to only one part of ProcArrayLock, there are far
fewer writes to each part, so acquiring a SHARED lock pays less for
accessing modified memory on NUMA.
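A minimal C11 sketch of the splitting idea, assuming a generic partitioned reader-writer lock rather than the actual patch: NUM_PARTS, the 64-byte cache-line size, PART_EXCL and all function names are hypothetical. A SHARED acquirer writes only to the part selected by its process number, while an EXCLUSIVE acquirer must take every part, always in the same order so two exclusive lockers cannot deadlock.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PARTS  4                   /* hypothetical number of parts  */
#define PART_EXCL  ((uint32_t)1 << 31) /* part held exclusively         */
#define CACHE_LINE 64                  /* assumed cache-line size       */

/* Each part sits on its own cache line, so writing one part does not
 * invalidate the lines holding the other parts. */
typedef struct {
    alignas(CACHE_LINE) _Atomic uint32_t state; /* PART_EXCL + shared count */
} lock_part;

typedef struct { lock_part parts[NUM_PARTS]; } split_lock;

static void split_lock_init(split_lock *l)
{
    for (int i = 0; i < NUM_PARTS; i++)
        atomic_init(&l->parts[i].state, 0);
}

/* SHARED: touch only the part chosen by this (non-negative) process number. */
static bool try_acquire_shared(split_lock *l, int proc_no)
{
    _Atomic uint32_t *s = &l->parts[proc_no % NUM_PARTS].state;
    uint32_t old = atomic_load_explicit(s, memory_order_relaxed);

    while (!(old & PART_EXCL))
        if (atomic_compare_exchange_weak_explicit(s, &old, old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
    return false;  /* an exclusive holder owns this part */
}

static void release_shared(split_lock *l, int proc_no)
{
    atomic_fetch_sub_explicit(&l->parts[proc_no % NUM_PARTS].state, 1,
                              memory_order_release);
}

/* EXCLUSIVE: take every part in a fixed order, spinning until each is idle. */
static void acquire_exclusive(split_lock *l)
{
    for (int i = 0; i < NUM_PARTS; i++)
    {
        _Atomic uint32_t *s = &l->parts[i].state;
        uint32_t expected = 0;

        while (!atomic_compare_exchange_weak_explicit(s, &expected, PART_EXCL,
                                                      memory_order_acquire,
                                                      memory_order_relaxed))
            expected = 0;  /* part still busy: retry until it drains */
    }
}

static void release_exclusive(split_lock *l)
{
    for (int i = NUM_PARTS - 1; i >= 0; i--)
        atomic_store_explicit(&l->parts[i].state, 0, memory_order_release);
}
```

The trade-off is visible in the code: SHARED acquisition writes one cache line that only a fraction of the processes share, but EXCLUSIVE acquisition now costs NUM_PARTS atomic operations, and in this simple sketch a writer can be starved by a steady stream of readers.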

Perhaps I'm mistaken somewhere.

> 
>> 
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>> 

-- 
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


