$ pgbench -S -T 10 -c 32 -j 32 -M prepared -P 1 pgbench
pgbench (15.3 (Ubuntu 15.3-1.pgdg23.04+1))
progress: 1.0 s, 1015713.0 tps, lat 0.031 ms stddev 0.007, 0 failed
progress: 2.0 s, 1083780.4 tps, lat 0.029 ms stddev 0.007, 0 failed...
progress: 8.0 s, 1084574.1 tps, lat 0.029 ms stddev 0.001, 0 failed
progress: 9.0 s, 1082665.1 tps, lat 0.029 ms stddev 0.001, 0 failed
tps = 1077739.910163 (without initial connection time)
Which even seems a whole 0.9% faster than 14 on this hardware! The wonders never cease.
On Thu, Jun 8, 2023 at 9:21 PM Andres Freund <andres@anarazel.de> wrote:
You might need to add --no-children to the perf report invocation, otherwise
it'll show you the call graph inverted.
My problem was that kernel symbols weren't being written out; I was only getting raw addresses for some reason. This worked:
sudo perf record -g --call-graph dwarf -d --phys-data -a sleep 1
perf report --stdio
And once I looked at the stack trace I immediately saw the problem, fixed the config option, and this report is now closed as PEBKAC on my part. Somehow I hadn't noticed that the 15 installs on both systems had log_min_duration_statement=0, which is why the performance kept dropping *only* on the fastest runs.
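For anyone wanting to check their own installs, the offending setting can be confirmed and put back to its default without a restart (a sketch; adjust for however you manage your config):

```sql
-- 0 means every statement's duration gets logged, which turns the
-- log file into a write hotspot under a high-TPS benchmark.
SHOW log_min_duration_statement;

-- -1 (the default) disables per-statement duration logging.
ALTER SYSTEM SET log_min_duration_statement = -1;
SELECT pg_reload_conf();
```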
What I've learned today, then, is that if someone sees osq_lock in a simple perf top on an oddly slow server, it's possible they are overloading a device writing out log file data. Leaving out the boring parts, the call trace you might see is:
EmitErrorReport
__GI___libc_write
ksys_write
__fdget_pos
mutex_lock
__mutex_lock_slowpath
__mutex_lock.constprop.0
71.20% osq_lock
Everyone was stuck trying to find the end of the log file to write to it, and that was the entirety of the problem. Hope that call trace and info helps out some future goofball making the same mistake. I'd wager this will come up again.
Thanks to everyone who helped out and I'm looking forward to PG16 testing now that I have this rusty, embarrassing warm-up out of the way.