Hi Yura,
On 11/27/2017 07:41 AM, Юрий Соколов wrote:
>>> I looked at the assembly and remembered that the last commit simplifies
>>> `init_local_spin_delay` to just two or three writes of zeroes (it looks
>>> like the compiler combines two 4-byte writes into one 8-byte write).
>>> Compared to the surrounding code (especially in LWLockAcquire itself),
>>> this overhead is negligible.
>>>
>>> However, I found that there is a benefit in calling LWLockAttemptLockOnce
>>> before entering the loop with calls to LWLockAttemptLockOrQueue in
>>> LWLockAcquire (if there is not much contention). This way, the `inline`
>>> decorator for LWLockAttemptLockOrQueue can be omitted. Given that clang
>>> doesn't want to inline this function, this could be the best way.
>>
>> Attached is a version with LWLockAcquireOnce called before entering the
>> loop in LWLockAcquire.
>>
>
> Oh... there was a stupid error in the previous file.
> The fixed version is attached.
>
I can reconfirm my performance findings with this patch; the system is the
same as up-thread.
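
For anyone following the thread, the fast-path idea quoted above (one
attempt before entering the retry loop) can be sketched roughly as below.
This is a toy illustration only, not the code from the patch: `toy_lock`
and all `toy_*` functions are hypothetical names, the lock is a plain
C11 atomic flag rather than a real LWLock, and the slow path simply retries
where real code would spin with backoff and then queue.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock; NOT PostgreSQL's LWLock. */
typedef struct
{
    atomic_uint state;          /* 0 = free, 1 = held exclusively */
} toy_lock;

/* One compare-and-swap attempt; returns true if the lock was taken. */
static bool
toy_attempt_once(toy_lock *lock)
{
    unsigned    expected = 0;

    return atomic_compare_exchange_strong(&lock->state, &expected, 1);
}

/*
 * Slow path: keep retrying until the lock is taken.  Real code would
 * spin with backoff and eventually queue; here we only count attempts.
 */
static unsigned
toy_acquire_slow(toy_lock *lock)
{
    unsigned    attempts = 0;

    do
    {
        attempts++;
    } while (!toy_attempt_once(lock));

    return attempts;
}

/*
 * Acquire: try once up front, so the uncontended case never enters the
 * loop.  Returns the number of slow-path attempts (0 = fast path hit).
 */
static unsigned
toy_acquire(toy_lock *lock)
{
    if (toy_attempt_once(lock))
        return 0;
    return toy_acquire_slow(lock);
}

static void
toy_release(toy_lock *lock)
{
    atomic_store(&lock->state, 0);
}
```

With this shape the slow-path function is only reached under contention,
so whether the compiler inlines it matters much less for the common case.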
Thanks!
Best regards, Jesper