Re: pgbench logging broken by time logic changes
| От | Fabien COELHO | 
|---|---|
| Тема | Re: pgbench logging broken by time logic changes | 
| Дата | |
| Msg-id | alpine.DEB.2.22.394.2106230802100.1117163@pseudo обсуждение исходный текст | 
| Ответ на | Re: pgbench logging broken by time logic changes (Michael Paquier <michael@paquier.xyz>) | 
| Ответы | Re: pgbench logging broken by time logic changes | 
| Список | pgsql-hackers | 
Hello,
> +# note: this test is time sensitive, and may fail on a very
> +#       loaded host.
> +# note: --progress-timestamp is not tested
> +my $delay = pgbench(
> +    '-T 2 -P 1 -l --aggregate-interval=1 -S -b se@2'
> +    . ' --rate=20 --latency-limit=1000 -j ' . $nthreads
> +    . ' -c 3 -r',
> +    0,
> +    [   qr{type: multiple},
> +        qr{clients: 3},
> +        qr{threads: $nthreads},
> +        qr{duration: 2 s},
> +        qr{script 1: .* select only},
> +        qr{script 2: .* select only},
> +        qr{statement latencies in milliseconds},
> +        qr{FROM pgbench_accounts} ],
> +    [ qr{vacuum}, qr{progress: 1\b} ],
> +    'pgbench progress', undef,
> +    "--log-prefix=$bdir/001_pgbench_log_1");
> Could we make that shorter at 1s?  That will shorten the duration of
> the test run.  It is easy to miss that this test is for
> --aggregate-interval (aka the logAgg() path) without a comment.
It is for -T, -P and --aggregate-interval. The units of all these is 
seconds, the minimum is 1, I put 2 so that It is pretty sure to get some 
output. We could try 1, but I'm less confident that an output is ensured 
in all cases on a very slow host which may decide to shorten the run 
before having shown a progress for instance.
> +# cool check that we are around 2 seconds
> +# The rate may results in an unlucky schedule which triggers
> +# an early exit, hence the loose bound.
> +ok(0.0 < $delay && $delay < 4.0, "-T 2 run around 2 seconds");
>
> The command itself would not fail, but we would just fail on a machine
> where the difference in start/stop time is higher than 4 seconds,
> right?
Yep. It could detect a struck pgbench process which was one of the 
reported issue. Maybe there should be a timeout added.
> On RPI-level machines, this could fail reliably.
Dunno, Not sure what RPI means. Probably not "Retail Price Index"… maybe 
Rasberry-Pi? In that case, the O-4 second leeway is intended to be loose 
enough to accomodate such hosts, but I cannot test that.
> I am not completely sure what's the additional value we can get from 
> that extra test, to be honest.
This would be to detect a somehow stuck process. It could be looser if 
necessary. Or removed, or preferably commented out, or enabled with some 
options (eg an environment variable? configure option?). Such control 
could also skip all 3 calls, in which case the 2 seconds is not an issue.
> +# $nthreads threads, 2 seconds, but due to timing imprecision we might get
> +# only 1 or as many as 3 progress reports per thread.
> +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
> +    qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
> +
> Now this one is good and actually stable thanks to the fact that we'd
> get at least the final logs, and the main complain we got to discuss
> about on this thread was the format of the logs.
Yep, this test would have probably detected the epoch issue reported by 
Greg.
> I would say to give up on the first test, and keep the second.
You mean remove the time check and keep the log check. I'd like to keep a 
time check, or at least have it in comment so that I can enable it simply.
> But, is this regex correct?  Shouldn't we check for six integer fields 
> only with the first one having a minimal number of digits for the epoch?
Epoch (seconds since 1970-01-01?) is currently 10 digits. Not sure how 
well it would work if some host have another zero start date.
Given the options of the bench run, there are that many fields in the 
log output, I'm not sure why we would want to check for less?
-- 
Fabien.
		
	В списке pgsql-hackers по дате отправления: