Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum)
From: Sergey Shinderuk
Subject: Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum)
Msg-id: e960e889-f85c-4be8-819c-acd6ca299ce2@postgrespro.ru
In reply to: Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum) (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 16.06.2025 17:41, Andres Freund wrote:
> TBH, I don't see a point in continuing with this thread without something that
> others can test. I rather doubt that the right fix here is to just change the
> lock model over, but without a repro I can't evaluate that.

Hello,

I think I can reproduce the issue with pgbench on a multi-core server. I start a
regular select-only test with 64 clients, and while it is running, I start a
plpgsql loop that creates and drops temporary tables from a single psql session.
I observe a ~25% drop in the tps reported by pgbench until I cancel the query in
psql.

$ pgbench -n -S -c64 -j64 -T300 -P1
progress: 10.0 s, 1249724.7 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 11.0 s, 1248289.0 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 12.0 s, 1246001.0 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 13.0 s, 1247832.5 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 14.0 s, 1248205.8 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 15.0 s, 1247737.3 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 16.0 s, 1219444.3 tps, lat 0.052 ms stddev 0.039, 0 failed
progress: 17.0 s, 893943.4 tps, lat 0.071 ms stddev 0.159, 0 failed
progress: 18.0 s, 927861.3 tps, lat 0.069 ms stddev 0.150, 0 failed
progress: 19.0 s, 886317.1 tps, lat 0.072 ms stddev 0.163, 0 failed
progress: 20.0 s, 877200.1 tps, lat 0.073 ms stddev 0.164, 0 failed
progress: 21.0 s, 875424.4 tps, lat 0.073 ms stddev 0.163, 0 failed
progress: 22.0 s, 877693.0 tps, lat 0.073 ms stddev 0.165, 0 failed
progress: 23.0 s, 897202.8 tps, lat 0.071 ms stddev 0.158, 0 failed
progress: 24.0 s, 917853.4 tps, lat 0.070 ms stddev 0.153, 0 failed
progress: 25.0 s, 907865.1 tps, lat 0.070 ms stddev 0.154, 0 failed

Here I started the following loop in psql at around 17 s, and tps dropped by ~25%:

do $$
begin
    for i in 1..1000000 loop
        create temp table tt1 (a bigserial primary key, b text);
        drop table tt1;
        commit;
    end loop;
end;
$$;

Now, if I simply remove the spinlock in SIGetDataEntries, I see a drop of just
~6% under concurrent DDL. I think this strongly suggests that the spinlock is
the bottleneck.

Before that, I had tried removing the `if (!hasMessages) return` optimization in
SIGetDataEntries to stress the spinlock, and observed a ~35% drop in the tps of
the select-only test with an empty sinval queue (no DDL running in the
background). Then I also removed the spinlock in SIGetDataEntries, and the loss
was just ~4%, which may be noise. I think this also suggests that the spinlock
could be the bottleneck.

I'm running this on a 2-socket AMD EPYC 9654 96-core server, with postgres and
pgbench bound to distinct CPUs. PGDATA is placed on tmpfs, postgres is running
with the default settings, the pgbench tables are of scale 1, and pgbench
connects over the loopback interface (127.0.0.1).

Does this sound convincing?

Best regards,

--
Sergey Shinderuk
https://postgrespro.com/
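P.S. To make concrete what I mean by "removing the spinlock" above, here is a
minimal sketch of the maxMsgNum read in SIGetDataEntries(): roughly what the
reader side looks like today, and what a lock-free read could look like. The
"after" part is only an illustration under the assumption that maxMsgNum were
turned into a pg_atomic_uint32 (as the subject of this thread proposes); it is
not the actual patch, and the barrier placement is just my reading of how the
read side would have to pair with the writer.

    /* Today: maxMsgNum is read under the msgnumLock spinlock. */
    SpinLockAcquire(&segP->msgnumLock);
    max = segP->maxMsgNum;
    SpinLockRelease(&segP->msgnumLock);

    /*
     * Hypothetical lock-free variant, assuming maxMsgNum is declared as a
     * pg_atomic_uint32 (pg_atomic_read_u32 and pg_read_barrier come from
     * port/atomics.h).  The read barrier is intended to pair with a
     * pg_write_barrier() issued by the writer before it advances maxMsgNum,
     * so that a backend observing the new value also sees the message data
     * written before it.
     */
    max = (int) pg_atomic_read_u32(&segP->maxMsgNum);
    pg_read_barrier();

On x86 an aligned 4-byte load is atomic anyway, so the interesting part is less
the load itself than the barrier pairing and the behaviour on weaker memory
models.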