Krunal Bauskar <krunalbauskar@gmail.com> writes:
> On Mon, 30 Nov 2020 at 10:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The results I posted at [1] seem to contradict this for Apple's new
>> machines.
> For the results you saw on the Mac mini, was LSE enabled by default?
Hmm, I don't know how to get Apple's clang to admit what its default
settings are ... anybody?
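One thing that might answer this, assuming Apple's clang follows the ARM C Language Extensions and defines __ARM_FEATURE_ATOMICS whenever LSE is enabled for the target, is a compile-time check along these lines. The function name is made up for illustration; this is a sketch, not something I have verified against Apple's toolchain.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: reports whether this translation unit was compiled
 * with LSE atomics enabled.  Assumes the compiler defines the ACLE feature
 * macro __ARM_FEATURE_ATOMICS when LSE is part of the target.
 */
bool
lse_enabled_at_compile_time(void)
{
#if defined(__ARM_FEATURE_ATOMICS)
	return true;
#else
	return false;
#endif
}
```

Calling this from a trivial main() and printing the result, once with plain -O2 and once with -march=armv8-a+lse added, should show whether the flag changes anything relative to the default.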
However, it does accept "-march=armv8-a+lse", and that seems to
not be the default, because I get different results from my spinlock-
pounding test than I did yesterday. Abbreviating into a table:
            --- CFLAGS=-O2 ---    --- CFLAGS="-O2 -march=armv8-a+lse" ---
TPS         HEAD    CAS patch     HEAD    CAS patch

clients=1   2127    2174          2612    2722
clients=2   1816     859           892     950
clients=4    714     519           610     468
clients=8      -       -           108     185
Unfortunately, that still doesn't lead me to think that either LSE
or CAS is a net win on this hardware. It's quite clear that LSE
makes the uncontended case a good bit faster (2612 vs 2127 TPS on
HEAD with one client), but the contended case is a lot worse (892
vs 1816 with two clients), so is that really a tradeoff we want?
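For anyone following along, the two lock-acquisition styles at issue can be sketched with C11 atomics roughly as below. This assumes "CAS patch" means swapping an unconditional atomic exchange for a compare-and-swap; the type and function names are hypothetical, and this is not the actual s_lock.h code from either HEAD or the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration; not PostgreSQL's slock_t. */
typedef struct
{
	atomic_int	locked;			/* 0 = free, 1 = held */
} demo_lock;

void
demo_lock_init(demo_lock *lock)
{
	atomic_init(&lock->locked, 0);
}

/* Swap style: always write 1; we got the lock if the old value was 0. */
bool
demo_try_swap(demo_lock *lock)
{
	return atomic_exchange_explicit(&lock->locked, 1,
									memory_order_acquire) == 0;
}

/* CAS style: write 1 only if the current value is 0. */
bool
demo_try_cas(demo_lock *lock)
{
	int			expected = 0;

	return atomic_compare_exchange_strong_explicit(&lock->locked,
												   &expected, 1,
												   memory_order_acquire,
												   memory_order_relaxed);
}

void
demo_unlock(demo_lock *lock)
{
	atomic_store_explicit(&lock->locked, 0, memory_order_release);
}
```

Both functions return true when the lock was acquired. With LSE enabled the compiler can emit single-instruction atomics for these operations; without it, it would normally fall back to a load-exclusive/store-exclusive loop.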
> * I would also suggest, if possible, trying with higher scalability (more
> than 4) to check whether CAS outperforms as scalability increases.
As I said yesterday, running more than 4 processes is just going
to bring the low-performance cores into the equation, which is likely
to swamp any interesting comparison. I did run the test with "-c 8"
today, as shown in the right-hand columns, and the results seem
to bear that out.
regards, tom lane