Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture
From | Andres Freund |
---|---|
Subject | Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture |
Date | |
Msg-id | fgsf5ofxte7er3z6t2womog6t3nlhiwklyy5bg6jfshj3maln2@enb6qeculxlm |
In response to | Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture (Nathan Bossart <nathandbossart@gmail.com>) |
Responses |
Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture
Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture |
List | pgsql-hackers |
Hi,

On 2025-08-15 12:57:52 -0500, Nathan Bossart wrote:
> On Fri, Aug 15, 2025 at 01:39:52PM -0400, Andres Freund wrote:
> > On 2025-08-14 11:29:08 +0200, Álvaro Herrera wrote:
> >> However, changing that spinlock to an lwlock doesn't look easy, because of
> >> the way each pgss entry is created as a dynahash entry, and then deallocated
> >> from there. With spinlocks we can just reinit the spinlock each time, but
> >> that doesn't work with lwlocks. We have no easy way to associate then
> >> disassociate each entry from a specific lwlock.
> >
> > I'm not following? The lwlock can just be inside the struct, just like the
> > spinlock is? "Association" is just LWLockInitialize() and deassociation is not
> > needed.
>
> Indeed. I rebased an old patch that I had lying around to demonstrate. If
> my past testing [0] is to be trusted, this actually hurts performance,
> unfortunately.

FWIW, a rather interesting result from briefly testing the patch: On my older
workstation, the patch is a substantial *gain* when there's a lot of
contention. But on my newer workstation it's a *loss*. The penalty from
enabling pg_stat_statements for read-only pgbench on the newer workstation is
rather bad - about 1/3 the throughput.

I think the main reason that lwlocks lose on the newer machine is that we lose
spinning. The newer machine has more cores and more NUMA domains, and the
fairer locks lead to more cacheline pingpong...

IMO, the only way to actually make pg_stat_statements scale is to move to a
model much more like our regular stats. I.e. accumulate counters in backend
local memory and only occasionally update the shared stats. Even if you were
to move pgss successfully to atomics, the cacheline contention still would be
terrible for performance.

FWIW, I'd not be surprised if moving to atomics would often cause *slowdowns*
compared to using the spinlocks. You'd replace one atomic operation with
dozens, to update all those fields individually. With loads of cacheline
pingpong in between.

Greetings,

Andres Freund
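
For readers who want a concrete picture of the "accumulate locally, flush occasionally" model described in the message, here is a minimal standalone sketch in C. It is not PostgreSQL code and not the patch under discussion: the struct names, the field set, the `FLUSH_EVERY` threshold, and the toy `atomic_flag` spinlock are all assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared entry, standing in for one pg_stat_statements entry. */
typedef struct SharedCounters
{
    atomic_flag lock;           /* toy spinlock, not PostgreSQL's slock_t */
    uint64_t    calls;
    uint64_t    rows;
    double      total_time_ms;
} SharedCounters;

/* Hypothetical per-backend accumulator, touched without any locking. */
typedef struct LocalCounters
{
    uint64_t    calls;
    uint64_t    rows;
    double      total_time_ms;
    uint32_t    pending;        /* executions recorded since the last flush */
} LocalCounters;

/* Assumed flush threshold; a real design might flush on a timer instead. */
#define FLUSH_EVERY 64

void
shared_init(SharedCounters *s)
{
    atomic_flag_clear(&s->lock);
    s->calls = 0;
    s->rows = 0;
    s->total_time_ms = 0.0;
}

/* Fold the local deltas into the shared entry under one lock acquisition. */
void
flush_local(SharedCounters *s, LocalCounters *l)
{
    assert(s && l);
    if (l->pending == 0)
        return;

    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;                       /* spin until the lock is free */

    s->calls += l->calls;
    s->rows += l->rows;
    s->total_time_ms += l->total_time_ms;

    atomic_flag_clear_explicit(&s->lock, memory_order_release);

    *l = (LocalCounters){0};    /* reset the local accumulator */
}

/* Record one execution locally; touch shared memory only every FLUSH_EVERY. */
void
record_execution(SharedCounters *s, LocalCounters *l,
                 uint64_t rows, double elapsed_ms)
{
    assert(s && l);
    l->calls++;
    l->rows += rows;
    l->total_time_ms += elapsed_ms;

    if (++l->pending >= FLUSH_EVERY)
        flush_local(s, l);
}
```

With this shape, the shared cacheline is written once per `FLUSH_EVERY` executions rather than once per execution, which is the property the message argues for. The trade-off is that readers of the shared entry see slightly stale numbers until each backend flushes, and a final `flush_local()` call is needed at backend exit.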