Re: pgbench prints suspect tps numbers

From: Fabien COELHO
Subject: Re: pgbench prints suspect tps numbers
Msg-id: alpine.DEB.2.21.1906250655110.2477@lancre
In reply to: pgbench prints suspect tps numbers (Daniel Wood <hexexpert@comcast.net>)
List: pgsql-hackers
Hello Daniel,

> I want to measure TPS at a particular connection count. [...]
>
> pgbench's "including connections establishing" number is polluted by the
> fact that for many seconds the benchmark is running with fewer than the
> expected number of connections.  I thought that was why the 'excluding'
> number was also printed, and I had been relying on that number.
>
> pgbench's "excluding connections establishing" number seems to be a 
> total garbage number which can be way way over the actual tps.  During a 
> period when I had a bug causing slow connections I noticed a consistent 
> value of about 100K tps over the measurement intervals.  At the end of a 
> 5 minute run it reported 450K tps!  There was no point anywhere during 
> the benchmark that it ran anywhere near that number.

Could you report the precise version, settings and hardware?

In particular, how many threads, clients and what is the underlying 
hardware?

Are you reconnecting on each transaction?

> I had been using 'excluding' because it 'seemed' perhaps right in the 
> past.  It was only when I got crazy numbers I looked at the calculation 
> to find:
>
>    tps_exclude = total->cnt / (time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
>
>
> The 'cnt' is the total across the entire run including the period when 
> connections are ramping up.

Yep. The threads are running independently, so there is no p

> I don't see how dividing by the total time minus the average connection 
> time produces the correct result.

The above formula looks okay to me, at least at 7 AM :-) Maybe the variables
could be given better names.
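To make the quoted formula concrete, here is a small C sketch that evaluates it next to the plain "including connections" rate. The function names and all input values are hypothetical (a 300 s run, 30 million transactions, 1000 clients), not measurements from this thread, and conn_total_time is treated as an opaque input rather than a claim about how pgbench accumulates it. What it shows: cnt always counts every transaction of the run, ramp-up included, while the denominator shrinks by the average connection time, so the "excluding" figure grows sharply as that average approaches the run length.

```c
/*
 * Sketch of the tps arithmetic quoted above. The tps_exclude expression is
 * the one from the message; the helper names are hypothetical.
 *
 *   cnt             : transactions completed over the whole run
 *   time_include    : wall-clock duration of the run, in seconds
 *   conn_total_time : total connection time, in seconds (opaque input here)
 *   nclients        : number of clients
 */
static double tps_include(long cnt, double time_include)
{
    /* Every transaction divided by the full wall-clock time. */
    return cnt / time_include;
}

static double tps_exclude(long cnt, double time_include,
                          double conn_total_time, int nclients)
{
    /*
     * Same numerator, but the run time is reduced by the average
     * per-client connection time. As that average approaches
     * time_include, the denominator approaches zero.
     */
    return cnt / (time_include - conn_total_time / nclients);
}
```

With these inputs, tps_include(30000000, 300.0) gives 100000, tps_exclude(30000000, 300.0, 60000.0, 1000) gives 125000 (an average of 60 s subtracted), and raising conn_total_time to 240000.0 (an average of 240 s) gives 500000, even though the transaction count is unchanged.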

> Even without buggy slow connections, when connecting 1000 clients,

That is a lot. Really.

> I've wondered why the 'excluding' number seemed a bit higher than any
> individual reporting-interval number over a 5 minute run.  I now understand
> why.  NOTE: When the system hits 100% cpu utilization (after about the
> first 100 connections),

Obviously.

> on a fully cached select-only pgbench run, further connections can struggle
> to get connected, which really skews the results.

Sure, with 1000 clients the system can only be highly overloaded.

> How about a patch which offered the option to wait on an advisory lock 
> as a mechanism to let the main thread delay the start of the workload 
> after all clients have connected and entered a READY state?  This would 
> produce a much cleaner number.

A barrier could be implemented, but it would be pretty useless because
without reconnections the connection time is expected to be negligible.
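For reference, the barrier variant of the proposed READY state could look roughly like the C sketch below. It is a minimal illustration, not pgbench code: it assumes POSIX threads with pthread_barrier_t (available on Linux, not on macOS), the names bench_state, worker and run_with_barrier are hypothetical, and "connecting" is simulated by incrementing a counter where a real client would call something like PQconnectdb. The barrier counts every worker plus the main thread, so the main thread is released, and would start its clock, only once all workers have connected.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared state for this sketch; not pgbench's structures. */
typedef struct {
    pthread_barrier_t start_barrier; /* released once all workers + main arrive */
    pthread_mutex_t   lock;          /* protects 'connected' */
    int               connected;     /* workers that finished "connecting" */
} bench_state;

typedef struct {
    bench_state *state;
    int          seen_at_start;      /* 'connected' as observed after the barrier */
} worker_arg;

static void *worker(void *p)
{
    worker_arg  *arg = p;
    bench_state *st = arg->state;

    /* Placeholder for establishing the client connection. */
    pthread_mutex_lock(&st->lock);
    st->connected++;
    pthread_mutex_unlock(&st->lock);

    /* READY: block until every worker and the main thread have arrived. */
    pthread_barrier_wait(&st->start_barrier);

    /* The measured workload would start here; we only record what we see. */
    pthread_mutex_lock(&st->lock);
    arg->seen_at_start = st->connected;
    pthread_mutex_unlock(&st->lock);
    return NULL;
}

/* Returns how many workers saw all nworkers connected when they started. */
static int run_with_barrier(int nworkers)
{
    bench_state st;
    pthread_t  *tids = malloc(sizeof(*tids) * nworkers);
    worker_arg *args = malloc(sizeof(*args) * nworkers);
    int ok = 0;

    st.connected = 0;
    pthread_mutex_init(&st.lock, NULL);
    /* +1 so the main thread also waits before "starting the clock". */
    pthread_barrier_init(&st.start_barrier, NULL, (unsigned) nworkers + 1);

    for (int i = 0; i < nworkers; i++) {
        args[i].state = &st;
        args[i].seen_at_start = -1;
        pthread_create(&tids[i], NULL, worker, &args[i]);
    }

    pthread_barrier_wait(&st.start_barrier);
    /* The benchmark clock would be started here, after all connections. */

    for (int i = 0; i < nworkers; i++) {
        pthread_join(tids[i], NULL);
        if (args[i].seen_at_start == nworkers)
            ok++;
    }

    pthread_barrier_destroy(&st.start_barrier);
    pthread_mutex_destroy(&st.lock);
    free(args);
    free(tids);
    return ok;
}
```

run_with_barrier(8) should return 8: no worker can get past pthread_barrier_wait until all eight have incremented the counter. An advisory-lock version, as Daniel proposed, would do the same gating on the server instead; the barrier keeps the synchronization inside the client process.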

-- 
Fabien


