Re: pgbench progress report improvements
От | Noah Misch |
---|---|
Тема | Re: pgbench progress report improvements |
Дата | |
Msg-id | 20130921044353.GA395477@tornado.leadboat.com обсуждение исходный текст |
Ответ на | pgbench progress report improvements (Fabien <coelho@cri.ensmp.fr>) |
Ответы |
Re: pgbench progress report improvements
(Fabien COELHO <coelho@cri.ensmp.fr>)
Re: pgbench progress report improvements - split 1 (Fabien COELHO <coelho@cri.ensmp.fr>) Re: pgbench progress report improvements - split 2 (Fabien <coelho@cri.ensmp.fr>) Re: pgbench progress report improvements - split 3 (Fabien <coelho@cri.ensmp.fr>) |
Список | pgsql-hackers |
For others following along, Pavel Stehule reviewed this on a different thread: http://www.postgresql.org/message-id/CAFj8pRAVYUhv8x3fEEwaSGGvDcijOgVLZSAVd7rmWn9Mw8RANA@mail.gmail.com On Tue, Aug 06, 2013 at 10:47:14AM +0200, Fabien wrote: > Improve pgbench measurements & progress report These changes are loosely coupled; please separate them into several patch files: > - Use progress option both under init & bench. > > Activate progress report by default, every 5 seconds. > When initializing, --quiet reverts to the old every 100,000 insertions > behavior... Patch (1): Change the default from --progress=0 to --progress=5 This has a disadvantage of causing two extra gettimeofday() calls per transaction. That's not cheap on all platforms; a user comparing pgbench results across versions will need to make a point of disabling this again to make his results comparable. Given that threat and uncertainty about which default would be more popular, I think we should drop this part. Patch (2): Make --initialize mode respect --progress. The definition you've chosen for --quiet makes that option contrary to its own name: message-per-100k-tuples is typically more voluminous than your proposed new default of message-per-5s. In fact, since --quiet currently switches from message-per-100k-tuples to message-per-5s, your patch gives it the exact opposite of its present effect. During the 9.3 development cycle, we _twice_ made changes pertaining to --initialize message frequency: http://www.postgresql.org/message-id/flat/20120620.170427.347012755716399568.t-ishii@sraoss.co.jp http://www.postgresql.org/message-id/flat/1346472039.18010.10.camel@vanquo.pezone.net That gives me pause about working through yet another change; we keep burning time on this minor issue. > - Measure transaction latency instead of computing it. > > The previous computed figure does not make sense under throttling, > as sleep throttling time was included in the figures. > The latency and its standard deviation are printed out under progress > and in the final report. Patch (3): Report the standard deviation of latency. Seems potentially useful; see inline comments below. In my limited testing, the interesting latency cases involved stray clusters of transactions showing 10x-100x mean latency. If that's typical, I'd doubt mean/stddev is a great analytical tool. But I have little reason to believe that my experience here was representative. Patch (4): Redefine "latency" as reported by pgbench and report "lag" more. If we do this, should we also augment the --log output to contain the figures necessary to reconstruct this calculation? I think that would mean adding fields for the time when the first command of the transaction was sent. > > - Take thread start time at the beginning of the thread. > > Otherwise it includes pretty slow thread/fork system start times in > the measurements. May help with bug #8326. Patch (5) That theory is sound, but I would like at least one report reproducing that behavior and finding that this change improved it. > - Reduce and compensate throttling underestimation bias. > > This is a consequence of relying on an integer random generator. > It was about 0.5% with 1000 distinct values. Multiplier added to > compensate the 0.05% bias with 10000 distinct integer values. Patch (6) See inline comment below. > - Updated documentation & help message. Include any relevant documentation and --help updates with the patch necessitating them. If there are errors in the documentation today, put fixes for those in a separate patch (7). Additionally, this patch contains numerous whitespace-only changes. Do not mix those into a patch that makes a functional change. Most of them would be done or undone by the next pgindent run, so I suggest simply dropping them. See the first entry here: https://wiki.postgresql.org/wiki/Submitting_a_Patch#Reasons_your_patch_might_be_returned Patches (5) and (6) are nicely small and free of UI questions. At a minimum, let's try to get those done in this CommitFest. I propose dropping patches (1) and (2). Patches (3) and (4) have open questions, but I think they have good value potential. > Sample output under benchmarking with --progress=1 > progress: 36.0 s, 10.9 tps, 91.864 +- 124.874 ms lat > progress: 37.0 s, 115.2 tps, 8.678 +- 1.792 ms lat "x +- y" does not convey that y is a standard deviation. I suggest getting the term "stddev" in there somehow, maybe like this: progress: 37.0 s, 115.2 tps, latency avg=8.678 ms stddev=1.792 > --- a/contrib/pgbench/pgbench.c > +++ b/contrib/pgbench/pgbench.c > - " -R, --rate=SPEC target rate in transactions per second\n" > + " -R, --rate=NUM target rate in transactions per second\n" This would fall under patch (7) if you feel it's needed. > @@ -928,14 +931,18 @@ top: > * Use inverse transform sampling to randomly generate a delay, such > * that the series of delays will approximate a Poisson distribution > * centered on the throttle_delay time. > - * > - * 1000 implies a 6.9 (-log(1/1000)) to 0.0 (log 1.0) delay multiplier. > + * > + * 10000 implies a 9.2 (-log(1/10000)) to 0.0 (log 1) delay multiplier, > + * and results in a 0.055 % target underestimation bias: > + * > + * SELECT 1.0/AVG(-LN(i/10000.0)) FROM generate_series(1,10000) AS i; > + * = 1.000552717032611116335474 > * > * If transactions are too slow or a given wait is shorter than > * a transaction, the next transaction will start right away. > */ > - int64 wait = (int64) > - throttle_delay * -log(getrand(thread, 1, 1000)/1000.0); > + int64 wait = (int64) (throttle_delay * > + 1.00055271703 * -log(getrand(thread, 1, 10000)/10000.0)); This quantizes the random number, then dequantizes it, then corrects for the resolution lost in the round trip. Shouldn't it just be: int64 wait = (int64) (throttle_delay * -log(1 - pg_erand48(thread->random_state))); > - if ((!use_quiet) && (j % 100000 == 0)) > + if (!progress && (j % 100000 == 0)) Project style is "progress == 0" rather than "!progress". > @@ -2143,8 +2174,15 @@ printResults(int ttype, int normal_xacts, int nclients, > * transaction. The measured lag may be caused by thread/client load, > * the database load, or the Poisson throttling process. > */ > - printf("average rate limit schedule lag: %.3f ms (max %.3f ms)\n", > - 0.001 * throttle_lag / normal_xacts, 0.001 * throttle_lag_max); > + double latency = 0.001 * total_latencies / normal_xacts; > + double sqlat = (double) total_sqlats / normal_xacts; > + > + printf("average latency under rate limit: %.3f +- %.3f ms\n" > + "average rate limit lag: %.3f ms (max %.3f ms)\n", > + latency, > + 0.001 * sqrt(sqlat - 1000000*latency*latency), > + 0.001 * throttle_lag / normal_xacts, > + 0.001 * throttle_lag_max); This output only happens when using --rate. That has made sense heretofore, because in the absence of rate limiting, average latency can be trivially calculated from tps. Since we will now track stddev as well, let's print something like this even when --rate is not in use. > - float tps, total_run, latency; > + double tps, total_run, latency, sqlat, stdev, lag; Why these variable type changes? nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
В списке pgsql-hackers по дате отправления: