Hello Kuroda-san,
On Fri, 11 Jun 2021 08:58:45 +0000
"kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com> wrote:
> Hi Hackers,
>
> I played pgbench with wrong parameters, and I found bug-candidate.
> The latest commit in my source is 3a09d75.
>
> 1. Do initdb and start.
> 2. Initialize schema and data with "scale factor" = 1.
> 3. execute following command many times:
>
> $ pgbench -c 101 -j 10 postgres
>
> Then, sometimes the negative " initial connection time" was returned.
> Lateyncy average is also strange.
>
> ```
> $ pgbench -c 101 -j 10 postgres
> starting vacuum...end.
> pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: sorry, too many clients already
> pgbench (PostgreSQL) 14.0
> transaction type: <builtin: TPC-B (sort of)>
> scaling factor: 1
> query mode: simple
> number of clients: 101
> number of threads: 10
> number of transactions per client: 10
> number of transactions actually processed: 910/1010
> latency average = 41387686.662 ms
> initial connection time = -372896921.586 ms
> tps = 0.002440 (without initial connection time)
> ```
>
> I sought pgbench.c and found a reason.
> When a thread failed to get some connections, they do not fill any values to thread->bench_start in threadRun().
> And if the failure is caused in the final thread (this means threads[nthreads - 1]->bench_start is zero),
> the following if-statement sets bench_start to zero.
>
> ```
> 6494 /* first recorded benchmarking start time */
> 6495 if (bench_start == 0 || thread->bench_start < bench_start)
> 6496 bench_start = thread->bench_start;
> ```
>
> The wrong bench_start propagates to printResult() and then some invalid values are appered.
>
> ```
> 6509 printResults(&stats, pg_time_now() - bench_start, conn_total_duration,
> 6510 bench_start - start_time, latency_late);
> ```
>
> I cannot distinguish whether we have to fix it, but I attache the patch.
> This simply ignores a result when therad->bench_start is zero.
+ /* skip if the thread faild to get connection */
+ if (thread->bench_start == 0)
+ continue;
It detects if a thread failed to get the initial connection by thread->bench_start == 0, but this assumes the initial
valueis zero. For ensuring this, I think it is better to initialize it in an early state, for example like this.
@@ -6419,6 +6419,7 @@ main(int argc, char **argv)
initRandomState(&thread->ts_throttle_rs);
initRandomState(&thread->ts_sample_rs);
thread->logfile = NULL; /* filled in later */
+ thread->bench_start = 0;
thread->latency_late = 0;
initStats(&thread->stats, 0)
typo: faild -> failed
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>