Improving spin-lock implementation on ARM.
* Spin-locks are known to have a significant effect on performance
as scalability increases.
* The existing spin-lock implementation for ARM is sub-optimal due to
its use of TAS (test-and-set).
* TAS is implemented on ARM as a load-store sequence, so even if the lock
is not free, the store still executes and rewrites the same value.
This redundant operation (mainly the store) is costly.
* CAS is implemented on ARM as load-check-store-check, which means that if
the lock is not free, the check after the load causes the loop to
exit, thereby saving the costlier store operation. [1]
* x86 uses the optimized xchg operation.
ARM also started supporting it (using the Large System Extension) with
ARM-v8.1, but since it is not supported on ARM-v8, GCC by default tends
to emit the more generic load-store assembly code.
* From gcc-9.4 onwards there is support for outline-atomics, which can emit
both variants of the code (load-store and cas/swp), with the proper variant
used based on the underlying supported architecture, but a lot
of distros still don't ship GCC-9.4 as the default compiler.
* In light of this, we would like to propose a CAS-based approach. Our
local testing has shown an improvement in the range of 10-40%
(graph attached).
* The patch enables the CAS-based approach if CAS is supported, depending on
the existing compile flag HAVE_GCC__ATOMIC_INT32_CAS.
(Thanks to Amit Khandekar for rigorously performance testing this patch
with different combinations).
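
To make the TAS vs. CAS difference above concrete, here is a minimal sketch using GCC builtins. It is only an illustration, not the patch itself: the names demo_slock_t, demo_tas_swap, demo_tas_cas and demo_unlock are hypothetical, and the memory orderings are assumptions chosen for the example. Both acquire functions return 0 when the lock was taken and non-zero when it was already held.

```c
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef int demo_slock_t;

/*
 * Swap-based acquire: unconditionally writes 1 and returns the old value.
 * Returns 0 if the lock was free (and is now ours), non-zero if it was held.
 */
static inline int
demo_tas_swap(volatile demo_slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/*
 * CAS-based acquire: writes 1 only if the current value is 0.
 * When the lock is already held the comparison fails, so a
 * load/compare/store sequence can skip the store entirely.
 */
static inline int
demo_tas_cas(volatile demo_slock_t *lock)
{
    int expected = 0;

    return !__atomic_compare_exchange_n(lock, &expected, 1, false,
                                        __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
}

/* Release the lock by storing 0 with release semantics. */
static inline void
demo_unlock(volatile demo_slock_t *lock)
{
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}
```

A build could choose between the two acquire functions with a preprocessor check on a flag such as HAVE_GCC__ATOMIC_INT32_CAS, falling back to the swap-based version when CAS is not available.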
[1]: https://godbolt.org/z/jqbEsa

P.S: Sorry if I missed any standard pgsql protocol since I am just starting with pgsql.