pgbench calculates summary numbers a wrong way.
| От | Kyotaro Horiguchi |
|---|---|
| Тема | pgbench calculates summary numbers a wrong way. |
| Дата | |
| Msg-id | 20200917.174154.1612932702426612205.horikyota.ntt@gmail.com обсуждение исходный текст |
| Ответы |
Re: pgbench calculates summary numbers a wrong way.
|
| Список | pgsql-hackers |
Hello.
While running pgbench, I found tps excluding connection somewhat
strange. To emphasize the behavior, I inserted some sleep at the end
of doConnect() and ran pgbench with several times.
$ pgbench -T 5 -j 1 -c 4 (with 1s sleep in doConnect())
> number of transactions actually processed: 409
! latency average = 48.980 ms
> tps = 81.665671 (including connections establishing)
> tps = 102.323285 (excluding connections establishing)
$ pgbench -T 5 -j 4 -c 4 (with 1s sleep in doConnect())
> number of transactions actually processed: 1628
! latency average = 12.303 ms
> tps = 325.123448 (including connections establishing)
> tps = 422.671015 (excluding connections establishing)
Why the latencies are so different? If I tell pgbench to measure the
latencies, it is about 8-9ms in both cases.
$ pgbench -T 5 -j 1 -c 4 -L100 (with 1s sleep in doConnect())
> number of transactions actually processed: 419
> number of transactions above the 100.0 ms latency limit: 0/419 (0.000 %)
! latency average = 8.984 ms
> latency stddev = 3.259 ms
> tps = 83.633694 (including connections establishing)
> tps = 104.909130 (excluding connections establishing)
$ pgbench -T 5 -j 4 -c 4 -L100 (with 1s sleep in doConnect())
> number of transactions actually processed: 2047
> number of transactions above the 100.0 ms latency limit: 0/2047 (0.000 %)
! latency average = 7.807 ms
> latency stddev = 2.916 ms
> tps = 408.677024 (including connections establishing)
> tps = 511.837509 (excluding connections establishing)
That's reasonable. So the latency estimation is apparently a kind of
bogus. It seems to me that the calculation is wrong to use
time_include. I think it should use time_exclude instead.
$ pgbench -T 5 -j 1 -c 4 (with 1s sleep in doConnect())
> number of transactions actually processed: 407
! latency average = 11.674 ms
> tps = 81.262992 (including connections establishing)
> tps = 101.724507 (excluding connections establishing)
>
$ pgbench -T 5 -j 4 -c 4 (with 1s sleep in doConnect())
> number of transactions actually processed: 1718
! latency average = 11.674 ms
> tps = 342.652333 (including connections establishing)
> tps = 428.850338 (excluding connections establishing)
Looks far better, but "tps excluding connection establishing" looks
strange. The reason seems to me threadRun() counts connection time a
wrong way. thread->conn_time is assumed to be the sum of the time took
by each client until it runs the first transaction. If each
connection took 1 second, conn_time of a thread that runs three
clients is calculated as 3 seconds. But in reality, if each connection
took 1 second, all clients on the thread waits for 3 seconds before
starting a transaction so it should be 3s * 3(clients/thread) = 9
seconds.
We could multiply conn_time by nstate on each thread, but the same
happens for -C (is_connect) mode. When a client took 1 second to
establish a connection, no other clients can do anything meanwhile.
Somewhat different from the !is_connect case precisely, but basically
the same at the point that making a connection of a client actually
blocks all other clients on the same thread.
Therefore I changed nclients in the calculation of time_exclude to
nthreads.
$ pgbench -T 5 -j 1 -c 4 (with 1s sleep in doConnect())
> number of transactions actually processed: 399
> latency average = 9.402 ms
> tps = 79.671581 (including connections establishing)
> tps = 425.460179 (excluding connections establishing)
$ pgbench -T 5 -j 4 -c 4 (with 1s sleep in doConnect())
> number of transactions actually processed: 1903
> latency average = 8.330 ms
> tps = 379.944770 (including connections establishing)
> tps = 480.211639 (excluding connections establishing)
Looks fine. Also the -C cases looks sainer.
(- is unpatched, + is patched)
> $ pgbench -T 5 -j 1 -c 4 -C (with 100ms sleep in doConnect())
### time_include=5.1, conn_total_time=4.9
- number of transactions actually processed: 43
- latency average = 469.850 ms
- tps = 8.513360 (including connections establishing)
- tps = 11.246110 (excluding connections establishing)
+ number of transactions actually processed: 38
+ latency average = 16.735 ms
+ tps = 7.493800 (including connections establishing)
+ tps = 239.022345 (excluding connections establishing)
> $ pgbench -T 5 -j 4 -c 4 -C (with 100ms sleep in doConnect())
### time_include=5.1, conn_total_time=18.3
-number of transactions actually processed: 172
-latency average = 117.571 ms
-tps = 34.021873 (including connections establishing)
-tps = 346.296591 (excluding connections establishing)
+ number of transactions actually processed: 172
+ latency average = 11.671 ms
+ tps = 33.851599 (including connections establishing)
+ tps = 342.716081 (excluding connections establishing)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From 117118699e24085db209ab66698cb33c7a70fd21 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Thu, 17 Sep 2020 17:20:10 +0900
Subject: [PATCH] Fix latency and tps calculation of pgbench
Fix the calculation for "latency average" and the "tps excluding
connections establishing" not to wrongly affected by connections
establishing time.
---
src/bin/pgbench/pgbench.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 332eabf637..adb522ed77 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -5194,16 +5194,35 @@ printResults(StatsData *total, instr_time total_time,
instr_time conn_total_time, int64 latency_late)
{
double time_include,
+ time_exclude,
tps_include,
tps_exclude;
int64 ntx = total->cnt - total->skipped;
time_include = INSTR_TIME_GET_DOUBLE(total_time);
+ /*
+ * conn_total_time is the sum of the time each clients took to establish a
+ * connection. In the multi-threaded case, all clients run on a thread
+ * waits for all the clients to establish a connection. So the actual total
+ * connection time of a thread is thread->conn_time * thread->nstate. So
+ * the total time took for connection establishment is:
+ *
+ * sum(thread->conn_time * thread->nstate) / nclients
+ *
+ * Assuming all client took the same time to connection establishment and
+ * clients are distributed equally to threads, the expression is
+ * approximated as:
+ *
+ * thread->conn_time * (nclients/nthreads) / nclients
+ * = conn_total_time / nthreads
+ */
+ time_exclude = (time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) /
+ nthreads));
+
/* tps is about actually executed transactions */
tps_include = ntx / time_include;
- tps_exclude = ntx /
- (time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
+ tps_exclude = ntx / time_exclude;
/* Report test parameters. */
printf("transaction type: %s\n",
@@ -5249,7 +5268,7 @@ printResults(StatsData *total, instr_time total_time,
{
/* no measurement, show average latency computed from run time */
printf("latency average = %.3f ms\n",
- 1000.0 * time_include * nclients / total->cnt);
+ 1000.0 * time_exclude * nclients / total->cnt);
}
if (throttle_delay)
--
2.18.4
В списке pgsql-hackers по дате отправления: