Re: Improving spin-lock implementation on ARM.
От | Alexander Korotkov |
---|---|
Тема | Re: Improving spin-lock implementation on ARM. |
Дата | |
Msg-id | CAPpHfdtQmq1uLK0P3Bo0yErYvYA0gY_79Z0CR7PcmynLYyDViQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Improving spin-lock implementation on ARM. (Krunal Bauskar <krunalbauskar@gmail.com>) |
Список | pgsql-hackers |
On Wed, Dec 2, 2020 at 6:58 AM Krunal Bauskar <krunalbauskar@gmail.com> wrote: > 1. CAS patch (applied on the baseline) > - Kunpeng: 10-45% improvement observed [1] > - Graviton2: 30-50% improvement observed [2] What does lower boundary of improvement mean? Does it mean minimal improvement observed? Obviously not, because there is no improvement with a low number of clients or read-only benchmark. > - M1: Only select results are available cas continue to maintain a marginal gain but not significant. [3] Read-only benchmark doesn't involve spinlocks (I've just rechecked this). So, this difference is purely speculative. Also, Tom observed the regression [1]. The benchmark is artificial, but it may correspond to some real workload with heavily-loaded spinlocks. And that might have an explanation. ldrex/strex themselves don't work as memory barrier (this is why compiler adds explicit memory barrier afterwards). And I bet ldrex unpaired with strex could see an outdated value. On high-contended spinlocks that may cause too pessimistic waits. > 2. Let's ignore CAS for a sec and just think of LSE independently > - Kunpeng: regression observed Yeah, it's sad that there are ARM chips, where making the same action in a single instruction is slower than the same actions in multiple instructions. > - Graviton2: gain observed Have to say it's 5.5x improvement, which is dramatically more than CAS-loop patch can give. > - M1: regression observed > [while lse probably is default explicitly enabling it with +lse causes regression on the head itself [4]. > client=2/4: 1816/714 ---- vs ---- 892/610] This is plain false. LSE was enabled in all the versions tested in M1 [2]. So not LSE instructions or +lse option caused the regression, but lack of other options enabled by default in Apple's clang. > Let me know what do you think about this analysis and any specific direction that we should consider to help move forward. In order to move forward with ARM-optimized spinlocks I would investigate other options (at least [3]). Regarding CAS spinlock patch I can propose the following steps to clarify the things. 1. Run Tom's spinlock test on ARM machines other than M1. 2. Try to construct a pgbench script, which produces heavily-loaded spinlock without custom C-function, and also run it across various ARM machines. Links 1. https://www.postgresql.org/message-id/741389.1606530957%40sss.pgh.pa.us 2. https://www.postgresql.org/message-id/1274781.1606760475%40sss.pgh.pa.us 3. https://linux-concepts.blogspot.com/2018/05/spinlock-implementation-in-arm.html ------ Regards, Alexander Korotkov
В списке pgsql-hackers по дате отправления: