Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA

From Jesper Pedersen
Subject Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
Msg-id 8e1dce77-6d4c-cfdb-00e0-8e9b3321aaba@redhat.com
In reply to Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA  (Sokolov Yura <funny.falcon@postgrespro.ru>)
Responses Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA  (Jesper Pedersen <jesper.pedersen@redhat.com>)
List pgsql-hackers
Hi,

On 09/08/2017 03:35 PM, Sokolov Yura wrote:
>> I'm seeing
>>
>> -M prepared: Up to 11% improvement
>> -M prepared -S: No improvement, no regression ("noise")
>> -M prepared -N: Up to 12% improvement
>>
>> For all runs, the improvement shows up as the client count approaches
>> the number of CPU threads, or exceeds it. Although I'm not seeing the
>> same improvements as you at very large client counts, there are
>> definitely improvements :)
> 
> It is expected:
> - the patch "fixes NUMA": for example, it gives no improvement on 1 socket
>    at all (I've tested it using numactl to bind to 1 socket)
> - it certainly gives less improvement on 2 sockets than on 4 sockets
>    (and 28 cores vs 72 cores also makes a difference),
> - one of the hot spots was CLogControlLock, and it was fixed by
>    "Group mode for CLOG updates" [1]
>

I'm planning to re-test that patch.

>>
>> +static inline bool
>> +LWLockAttemptLockOrQueueSelf(LWLock *lock, LWLockMode mode, LWLockMode waitmode)
>>
>> I'll leave it to the Committer to decide if this method is too big to
>> be "inline".
> 
> GCC 4.9 doesn't inline it without the directive, and the function call
> is then noticeable in the profile.
> 
> The attachment contains a version with all suggestions applied except
> the removal of "inline".
>

Yes, ideally the method will be kept "inline".
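Judging only by its name and this thread, the function under discussion appears to fold the lock attempt and the wait-list handling into one atomic step. The following is a hypothetical sketch of that general idea using C11 atomics, not the patch's code: every name, flag bit, and layout here is invented for illustration. A single compare-and-swap loop either takes the lock or, if the lock is held in a conflicting mode, takes a wait-list bit so the caller can enqueue itself. The `always_inline` attribute (GCC/Clang) shows one way to force inlining regardless of the compiler's heuristics, which is the GCC 4.9 concern raised above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state-word layout, for illustration only (not PostgreSQL's). */
#define SK_SHARED_MASK ((uint32_t) 0x00FFFFFF) /* number of shared holders */
#define SK_EXCLUSIVE   ((uint32_t) 1 << 24)    /* held exclusively */
#define SK_WAITLIST    ((uint32_t) 1 << 28)    /* wait list is being modified */

typedef enum { SK_MODE_SHARED, SK_MODE_EXCLUSIVE } SkLockMode;

typedef struct { _Atomic uint32_t state; } SkLock;

/* Force inlining on GCC/Clang; plain "static inline" is only a hint. */
#if defined(__GNUC__)
#define SK_ALWAYS_INLINE static inline __attribute__((always_inline))
#else
#define SK_ALWAYS_INLINE static inline
#endif

/*
 * Hypothetical combined step. Returns true if the lock was acquired.
 * Returns false if it is held in a conflicting mode; the caller then owns
 * SK_WAITLIST, must enqueue itself, and must clear SK_WAITLIST afterwards.
 */
SK_ALWAYS_INLINE bool
sk_attempt_lock_or_lock_waitlist(SkLock *lock, SkLockMode mode)
{
    assert(mode == SK_MODE_SHARED || mode == SK_MODE_EXCLUSIVE);
    uint32_t old = atomic_load(&lock->state);

    for (;;)
    {
        bool can_acquire = (mode == SK_MODE_EXCLUSIVE)
            ? (old & (SK_EXCLUSIVE | SK_SHARED_MASK)) == 0
            : (old & SK_EXCLUSIVE) == 0;
        uint32_t desired;

        if (can_acquire)
            desired = old + (mode == SK_MODE_EXCLUSIVE ? SK_EXCLUSIVE : 1);
        else if ((old & SK_WAITLIST) == 0)
            desired = old | SK_WAITLIST;
        else
        {
            /* Someone else is editing the wait list: re-read and retry
             * (a real implementation would back off here). */
            old = atomic_load(&lock->state);
            continue;
        }

        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&lock->state, &old, desired))
            return can_acquire;
    }
}
```

A real implementation would use a spin delay instead of busy re-reading while another process holds the wait-list bit, and would re-check the lock after enqueueing so a release that happened in between is not missed.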

>> Open questions:
>> ---------------
>> * spins_per_delay as extern
>> * Calculation of skip_wait_list
> 
> Currently the calculation of skip_wait_list is mostly empirical (i.e.,
> the best I measured).
>

Ok, good to know.

> I strongly think that, instead of spins_per_delay, something dependent
> on the specific lock should be used. I tried storing it in the LWLock
> itself, but it was worse.

Yes, LWLock should be kept as small as possible, and cache-line aligned
due to the cache storms, as shown by perf c2c.

> Recently I realized it should be stored in an array indexed by tranche,
> but I haven't implemented it yet, and therefore haven't measured it.
> 

Different constants for the LWLock could have an impact, but the
constants would also depend on the machine setup and workload.

Thanks for working on this !

Best regards, Jesper


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
