Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum)

From: Sergey Shinderuk
Subject: Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum)
Msg-id: e960e889-f85c-4be8-819c-acd6ca299ce2@postgrespro.ru
In reply to: Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum)  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 16.06.2025 17:41, Andres Freund wrote:
> TBH, I don't see a point in continuing with this thread without something that
> others can test.  I rather doubt that the right fix here is to just change the
> lock model over, but without a repro I can't evaluate that.


Hello,

I think I can reproduce the issue with pgbench on a multi-core server. I 
start a regular select-only test with 64 clients, and while it's 
running, I start a plpgsql loop that creates and drops temporary tables 
from a single psql session. I observe a ~25% drop in the tps reported by 
pgbench until I cancel the query in psql.


$ pgbench -n -S -c64 -j64 -T300 -P1

progress: 10.0 s, 1249724.7 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 11.0 s, 1248289.0 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 12.0 s, 1246001.0 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 13.0 s, 1247832.5 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 14.0 s, 1248205.8 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 15.0 s, 1247737.3 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 16.0 s, 1219444.3 tps, lat 0.052 ms stddev 0.039, 0 failed
progress: 17.0 s, 893943.4 tps, lat 0.071 ms stddev 0.159, 0 failed
progress: 18.0 s, 927861.3 tps, lat 0.069 ms stddev 0.150, 0 failed
progress: 19.0 s, 886317.1 tps, lat 0.072 ms stddev 0.163, 0 failed
progress: 20.0 s, 877200.1 tps, lat 0.073 ms stddev 0.164, 0 failed
progress: 21.0 s, 875424.4 tps, lat 0.073 ms stddev 0.163, 0 failed
progress: 22.0 s, 877693.0 tps, lat 0.073 ms stddev 0.165, 0 failed
progress: 23.0 s, 897202.8 tps, lat 0.071 ms stddev 0.158, 0 failed
progress: 24.0 s, 917853.4 tps, lat 0.070 ms stddev 0.153, 0 failed
progress: 25.0 s, 907865.1 tps, lat 0.070 ms stddev 0.154, 0 failed

Here I started the following loop in psql at around 17 s, and tps 
dropped by ~25%:

do $$
begin
   for i in 1..1000000 loop
     create temp table tt1 (a bigserial primary key, b text);
     drop table tt1;
     commit;
   end loop;
end;
$$;

Now, if I simply remove the spinlock in SIGetDataEntries, I see a drop 
of just ~6% under concurrent DDL. I think this strongly suggests that 
the spinlock is the bottleneck.

Before that, I tried removing the `if (!hasMessages) return` 
optimization in SIGetDataEntries to stress the spinlock, and observed a 
~35% drop in the tps of select-only with an empty sinval queue (no DDL 
running in the background). Then I also removed the spinlock in 
SIGetDataEntries, and the loss was just ~4%, which may be noise. I 
think this also suggests that the spinlock could be the bottleneck.

I'm running this on a 2-socket AMD EPYC 9654 96-core server with 
postgres and pgbench bound to distinct CPUs. PGDATA is placed on tmpfs, 
postgres is running with the default settings, the pgbench tables are 
at scale 1, and pgbench connects via loopback (127.0.0.1).

Does this sound convincing?

Best regards,

-- 
Sergey Shinderuk        https://postgrespro.com/



