Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15

From: Gregory Smith
Subject: Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15
Date:
Msg-id: CAHLJuCX0NC7HOZPD-AOXjfQGE8j++sxXkLCcDkWecM_wMJoxzg@mail.gmail.com
In response to: Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15  (Andres Freund <andres@anarazel.de>)
Responses: Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15  (Gurjeet Singh <gurjeet@singh.im>)
Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15  (Melanie Plageman <melanieplageman@gmail.com>)
List: pgsql-hackers
Let me start with the happy ending to this thread:

$ pgbench -S -T 10 -c 32 -j 32 -M prepared -P 1 pgbench
pgbench (15.3 (Ubuntu 15.3-1.pgdg23.04+1))
progress: 1.0 s, 1015713.0 tps, lat 0.031 ms stddev 0.007, 0 failed
progress: 2.0 s, 1083780.4 tps, lat 0.029 ms stddev 0.007, 0 failed...
progress: 8.0 s, 1084574.1 tps, lat 0.029 ms stddev 0.001, 0 failed
progress: 9.0 s, 1082665.1 tps, lat 0.029 ms stddev 0.001, 0 failed
tps = 1077739.910163 (without initial connection time)

Which even seems a whole 0.9% faster than 14 on this hardware!  The wonders never cease.

On Thu, Jun 8, 2023 at 9:21 PM Andres Freund <andres@anarazel.de> wrote:
You might need to add --no-children to the perf report invocation, otherwise
it'll show you the call graph inverted.

My problem was that kernel symbols weren't being written out; I was only getting addresses for some reason.  This worked:

  sudo perf record -g --call-graph dwarf -d --phys-data -a sleep 1
  perf report --stdio
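
For reference, folding in the flag Andres mentioned would make that last step (just the stock perf report option, nothing beyond what's already in this thread):

  perf report --no-children --stdio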

And once I looked at the stack trace I immediately saw the problem, fixed the config option, and this report is now closed as PEBKAC on my part.  Somehow I didn't notice that the PG15 installs on both systems had log_min_duration_statement=0, which is why the performance kept dropping *only* on the fastest runs.
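
For anyone else chasing the same foot-gun, here's a minimal sketch of checking and resetting it, assuming superuser access via psql to the pgbench database:

  # -1 is the default and turns duration-based statement logging back off
  psql pgbench -c "SHOW log_min_duration_statement;"
  psql pgbench -c "ALTER SYSTEM SET log_min_duration_statement = -1;"
  psql pgbench -c "SELECT pg_reload_conf();"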

What I've learned today, then, is that if someone sees osq_lock in a simple perf top on an oddly slow server, it's possible they are overloading a device writing out log file data.  Leaving out the boring parts, the call trace you might see is:

EmitErrorReport
 __GI___libc_write
  ksys_write
   __fdget_pos
    mutex_lock
     __mutex_lock_slowpath
      __mutex_lock.constprop.0
       71.20% osq_lock

Everyone was stuck trying to find the end of the log file to write to it, and that was the entirety of the problem.  Hope that call trace and info helps out some future goofball making the same mistake.  I'd wager this will come up again.
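
One cheap way to confirm you're in this situation is to watch how fast the server log grows while the benchmark runs; a SELECT-only workload shouldn't be writing much of anything.  The path below is just the Ubuntu/PGDG stderr-log default, so treat it as an assumption and adjust for logging_collector setups:

  watch -n 1 'ls -lh /var/log/postgresql/postgresql-15-main.log'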

Thanks to everyone who helped out and I'm looking forward to PG16 testing now that I have this rusty, embarrassing warm-up out of the way.

--
Greg Smith  greg.smith@crunchydata.com
Director of Open Source Strategy
