Re: Improving spin-lock implementation on ARM.

From	Tom Lane
Subject	Re: Improving spin-lock implementation on ARM.
Date	December 1, 2020, 06:01:20
Msg-id	1367116.1606802480@sss.pgh.pa.us
In reply to	Re: Improving spin-lock implementation on ARM. (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: Improving spin-lock implementation on ARM.
List	pgsql-hackers

Alexander Korotkov <aekorotkov@gmail.com> writes:
> 2) None of the patches considered in this thread give a clear
> advantage for PostgreSQL built with LSE.

Yeah, I think so.

> To further confirm this let's wait for Kunpeng 920 tests by Krunal
> Bauskar and Amit Khandekar.  Also it would be nice if someone will run
> benchmarks similar to [1] on Apple M1.

I did what I could in this department.  It's late and I'm not going to
have time to run read/write benchmarks before bed, but here are some
results for the "pgbench -S" cases.  I tried to match your testing
choices, but could not entirely:

* Configure options are --enable-debug, --disable-cassert, no other
special configure options or CFLAG choices.

* I have not been able to find a way to make Apple's compiler not
use the LSE spinlock instructions, so all of these correspond to
your LSE cases.

* I used shared_buffers = 1GB, because this machine only has 16GB
RAM so 32GB is clearly out of reach.  Also I used pgbench scale
factor 100 not 1000.  Since we're trying to measure contention
effects not I/O speed, I don't think a huge test case is appropriate.

* I still haven't gotten pgbench to work with -j settings above 128,
so these runs use -j equal to half -c.  Shouldn't really affect
conclusions about scaling.  (BTW, I see a similar limitation on
macOS Catalina x86_64, so whatever that is, it's not new.)

* Otherwise, the first plot shows median of three results from
"pgbench -S -M prepared -T 120 -c $n -j $j", as you had it.
The right-hand plot shows all three of the values in yerrorbars
format, just to give a sense of the noise level.
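As a rough sketch of the procedure in the list above (not the script actually used for these runs), the following Python builds the pgbench command line for a given client count and reduces three TPS results to the median plus the low/high values that a yerrorbars-style plot needs. The function names, and the clamp of -j to at least 1, are assumptions made for illustration.

```python
from statistics import median


def pgbench_args(clients, duration_s=120):
    """Build the argv for one select-only run with -j equal to half -c.

    The max(1, ...) clamp for a single client is an assumption; the
    message does not say what -j was used when -c is 1.
    """
    threads = max(1, clients // 2)
    return [
        "pgbench", "-S", "-M", "prepared",
        "-T", str(duration_s),
        "-c", str(clients),
        "-j", str(threads),
    ]


def summarize(tps_runs):
    """Reduce repeated TPS results to the median and the low/high bounds."""
    return {
        "median": median(tps_runs),
        "low": min(tps_runs),
        "high": max(tps_runs),
    }
```

Each argv list could be handed to something like subprocess.run three times per client count, with the reported TPS values passed to summarize().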

Clearly, there is something weird going on at -c 4.  There's a cluster
of results around 180K TPS, and another cluster around 210-220K TPS,
and nothing in between.  I suspect that the scheduler is sometimes
putting pgbench onto the slow cores.
Anyway, I believe that the apparent gap between HEAD and the other
curves at -c 4 is probably an artifact: HEAD had two 180K-ish results
and one 220K-ish result, while the other curves had the reverse, so
the medians are different but the difference is most likely just
chance.
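To make the median argument concrete, here is a small Python sketch. The numbers are round placeholders at the two cluster positions mentioned above (180K and 220K), not the measured values, and the variable and function names are made up for the illustration.

```python
from statistics import median

# Placeholder values at the two cluster positions, NOT the measured results.
head_runs = [180_000, 180_000, 220_000]      # two slow-cluster runs, one fast
patched_runs = [180_000, 220_000, 220_000]   # the reverse split


def median_gap(a, b):
    """Difference between the medians of two sets of runs."""
    return median(b) - median(a)


# With only three runs drawn from a bimodal distribution, the median is
# whichever cluster happened to occur twice, so the gap equals the full
# distance between the clusters.
gap = median_gap(head_runs, patched_runs)
```

With three runs per point, one run landing in the other cluster is enough to move the median by the full distance between the clusters, so a gap of that size says little about the patches themselves.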

Bottom line is that these patches don't appear to do much of
anything on M1, as you surmised.

            regards, tom lane

Attachments

m1-results.png
