Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum)
From | Olga Antonova |
---|---|
Subject | Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum) |
Date | |
Msg-id | db4aca5c-c22b-4eb5-850d-212768f4fcac@postgrespro.ru |
In reply to | Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum) (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
Hi,

On 7/16/25 18:54, Andres Freund wrote:
> That was not in reply to the changed patch, but about the performance numbers
> you relayed. We had no repro, and even with the repro that Sergey has now
> delivered, we don't see similar levels of what you reported as contention.

We investigated this issue in detail and were able to reproduce the spinlock contention in SIGetDataEntries. The problem is most evident on multiprocessor systems with multiple NUMA nodes, but it also occurs on a single node, albeit less pronounced. The same is probably true for high-frequency CPUs.

We ran tests on two bare-metal servers:

- 4 NUMA nodes × 24 CPUs, Intel(R) Xeon(R) Gold 6348H CPU @ 2.30GHz. PostgreSQL was running on 3 nodes (72 CPUs).
- 2 NUMA nodes × 32 CPUs, Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz. PostgreSQL was running on a single node (32 CPUs).

and two PostgreSQL builds: one from the master branch and one with the patch v5-0001-Read-Write-optimistic-spin-lock.patch applied.

To generate frequent cache invalidations, we executed a background workload that repeatedly created and dropped temporary tables with indexes in a loop:

    do $$
    begin
      for i in 1..1000000 loop
        create temp table tt1 (
          f0 bigserial primary key,
          f1 int, f2 int, f3 int, f4 int, f5 int,
          f6 int, f7 int, f8 int, f9 int, f10 int);
        CREATE INDEX ON tt1(f1);
        CREATE INDEX ON tt1(f2);
        CREATE INDEX ON tt1(f3);
        CREATE INDEX ON tt1(f4);
        CREATE INDEX ON tt1(f5);
        CREATE INDEX ON tt1(f6);
        CREATE INDEX ON tt1(f7);
        CREATE INDEX ON tt1(f8);
        CREATE INDEX ON tt1(f9);
        CREATE INDEX ON tt1(f10);
        drop table tt1;
        commit;
      end loop;
    end;
    $$;

As a benchmark, we used a pgbench select-only scenario with 64 clients:

    pgbench -U postgres -c 64 -j 32 -T 200 -s 100 -M prepared -b select-only postgres -n

For convenience, the test is included as test.sh (attached), with a description and setup instructions provided in the README.

During the test, we ran perf for 10 seconds using the command

    perf record -F 99 -a -g --call-graph=dwarf -o perf_data sleep 10

and then generated flame graphs from the collected data.

1. Three NUMA nodes (72 CPUs)

According to the flame graph (fg_3numa_nopatch.xml), about 34% of exec_bind_message is spent in SIGetDataEntries, >90% of which is spinlock wait. With the patch, the share of SIGetDataEntries decreases to ~6.6%, the main waiting shifts to LWLockAcquire, and RWOptSpinReadStart accounts for only ~1.1% (fg_3numa_patch.xml).

TPS improvement: +6–8% (over 5 runs).

    Without patch: TPS = 731171.336542
    With patch:    TPS = 786077.155196

2. Single NUMA node (32 CPUs)

In this case the problem is less pronounced, but SIGetDataEntries still takes 10.1% of exec_bind_message, of which 82.3% is spinlock wait (fg_1numa_nopatch.xml). With the patch we observed a stable 1.5–2% TPS increase (5 runs).

    Without patch: TPS = 518941.051825
    With patch:    TPS = 528768.641836

The flame graphs do not show absolute time, but the relative distribution confirms contention on the spinlock in SIGetDataEntries. The problem exists and is a bottleneck under high load, especially on multiprocessor NUMA systems. The patch mitigates this contention and improves performance.

---
Best regards,
Olga Antonova
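For readers who want to check the quoted percentages against the TPS figures listed above, here is a minimal sketch of the arithmetic. The function name `relative_gain_percent` is hypothetical and used only for illustration, and the inputs are the single pairs of TPS values quoted in this message, not the full set of 5 runs.

```python
def relative_gain_percent(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after / before - 1.0) * 100.0


# TPS values copied from the message (one pair per configuration).
three_numa = relative_gain_percent(731171.336542, 786077.155196)
single_numa = relative_gain_percent(518941.051825, 528768.641836)

print(f"3 NUMA nodes (72 CPUs): {three_numa:.2f}% TPS gain")
print(f"1 NUMA node (32 CPUs):  {single_numa:.2f}% TPS gain")
```

Both results fall inside the ranges reported in the message (+6–8% and 1.5–2%, respectively).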
Attachments