pgbench prints suspect tps numbers

From Daniel Wood
Subject pgbench prints suspect tps numbers
Date
Msg-id 1486586885.129235.1561428677354@connect.xfinity.com
Responses Re: pgbench prints suspect tps numbers  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers

Short benchmark runs are bad if the runs aren't long enough to produce consistent results.

Having to do long runs because a benchmarking tool 'converges to reality' over time in reporting a tps number, due to miscalculation, is also bad.


I want to measure TPS at a particular connection count.  A fully cached Select Only pgbench produces fairly consistent numbers over short runs of a few minutes.

pgbench's "including connections establishing" number is polluted by the fact that, for many seconds, the benchmark is running with fewer than the expected number of connections.  I thought that was why the 'excluding' number was also printed, and I had been relying on that number.

pgbench's "excluding connections establishing" number seems to be a total garbage number which can be way, way over the actual tps.  During a period when I had a bug causing slow connections, I noticed a consistent value of about 100K tps over the measurement intervals.  At the end of a 5 minute run it reported 450K tps!  At no point during the benchmark did it run anywhere near that number.


I had been using 'excluding' because it 'seemed' perhaps right in the past.  It was only when I got crazy numbers that I looked at the calculation and found:


    tps_exclude = total->cnt / (time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));


The 'cnt' is the total across the entire run including the period when connections are ramping up.  I don't see how dividing by the total time minus the average connection time produces the correct result.
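To make the concern concrete, here is a minimal sketch of the same arithmetic using plain doubles instead of pgbench's instr_time types. The function name tps_exclude and its parameter list are assumptions made for illustration, not pgbench's actual code structure. The scenario in the usage note also assumes the server is saturated, so that 'cnt' accrues at a roughly steady rate for the whole run even while connections are still being made.

```c
#include <assert.h>

/*
 * Hypothetical stand-alone version of the quoted expression.
 *   cnt              total transactions over the whole run
 *   time_include     total wall-clock run time, in seconds
 *   conn_total_time  summed connection time across clients, in seconds
 *   nclients         number of clients
 */
static double
tps_exclude(double cnt, double time_include,
            double conn_total_time, int nclients)
{
    assert(nclients > 0);

    /* Average connection time is subtracted from the run time ... */
    double avg_conn_time = conn_total_time / nclients;

    /* ... but cnt still contains every transaction from the full run. */
    return cnt / (time_include - avg_conn_time);
}
```

With illustrative inputs (not measurements): a 300 second run, 1000 clients, an average connection time of 200 seconds, and a count accumulated at a steady 100,000 tps, the call tps_exclude(100000.0 * 300.0, 300.0, 200.0 * 1000, 1000) evaluates to 300000, three times the assumed sustained rate, while cnt / time_include gives 100000.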


Even without buggy slow connections, when connecting 1000 clients, I've wondered why the 'excluding' number seemed a bit higher than any given reporting interval's number over a 5 minute run.  I now understand why.  NOTE: When the system hits 100% CPU utilization (after about the first 100 connections), on a fully cached Select Only pgbench, further connections can struggle to get connected, which really skews the results.


How about a patch which offered the option to wait on an advisory lock, as a mechanism to let the main thread delay the start of the workload until all clients have connected and entered a READY state?  This would produce a much cleaner number.
