Обсуждение: pgbench small bug fix
While testing for something else I encountered two small bugs under very low rate (--rate=0.1). The attached patches fixes these. - when a duration (-T) is specified, ensure that pgbench ends at that time (i.e. do not wait for a transaction beyondthe end of the run). - when there is a progress (-P) report, ensure that all progress reports are shown even if no more transactions are schedule. -- Fabien.
On Wed, Jan 27, 2016 at 2:31 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > While testing for something else I encountered two small bugs under very low > rate (--rate=0.1). The attached patches fixes these. > > - when a duration (-T) is specified, ensure that pgbench ends at that > time (i.e. do not wait for a transaction beyond the end of the run). Why does this use INSTR_TIME_GET_DOUBLE() and not INSTR_TIME_GET_MICROSEC()? Also, why do we really need this change? Won't the timer expiration stop the thread at the right time anyway? I mean, sure, in theory it's wasteful for the thread to sit around doing nothing waiting for the timer to expire, but it's not evident to me that hurts anything, really. > - when there is a progress (-P) report, ensure that all progress > reports are shown even if no more transactions are schedule. That's pretty ugly - it would be easy for the test at the top of the loop to be left out of sync with the similar test inside the loop by some future patch. And again, I wonder why this is really a bug. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
<Ooops, resent, wrong "From" again... sorry :-( > Hello Robert, > > While testing for something else I encountered two small bugs under very low > > rate (--rate=0.1). The attached patches fixes these. > > > > - when a duration (-T) is specified, ensure that pgbench ends at that > > time (i.e. do not wait for a transaction beyond the end of the run). > > Why does this use INSTR_TIME_GET_DOUBLE() and not INSTR_TIME_GET_MICROSEC()? I choose that because duration is in seconds, but MICROSEC would work fine as well. See attached version. > Also, why do we really need this change? Won't the timer expiration > stop the thread at the right time anyway? Not always. When several threads are used, the time expiration in non-zero threads is currently triggered after the last schedule transaction, even if this transaction is long after the expiration, which means it runs overtime: sh> time pgbench -T 5 -R 0.1 -P 1 -c 2 -j 2 starting vacuum...end. progress: 1.0 s, 0.0 tps, lat -nan ms stddev -nan, lag-nan ms progress: 2.0 s, 1.0 tps, lat 12.129 ms stddev 0.000, lag 1.335 ms progress: 3.0 s, 0.0 tps, lat -nan ms stddev-nan, lag -nan ms progress: 4.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms progress: 5.0 s, 0.0 tps, lat -nanms stddev -nan, lag -nan ms < 6 seconds wait for a schedule transaction in thread 1 no report because thread 0 expired...>transaction type: TPC-B (sort of) scaling factor: 1 query mode: simple number of clients: 2 number of threads:2 duration: 5 s number of transactions actually processed: 2 latency average: 14.648 ms latency stddev: 2.518 msrate limit schedule lag: avg 5.598 (max 9.861) ms tps = 0.180517 (including connections establishing) tps = 0.180566 (excludingconnections establishing) real 0m11.158s ^^ 11 seconds, not 5. user 0m0.041s sys 0m0.013s > I mean, sure, in theory it's wasteful for the thread to sit around doing > nothing waiting for the timer to expire, but it's not evident to me that hurts > anything, really. I compute stats on the output, including the progress report, to check for responsiveness (eg it is not stuck somewhere because of a checkpoint which triggered some IO storm or whatever), for that purpose it is better for the trace to be there as expected. > > - when there is a progress (-P) report, ensure that all progress > > reports are shown even if no more transactions are schedule. > > That's pretty ugly - it would be easy for the test at the top of the > loop to be left out of sync with the similar test inside the loop by > some future patch. Hmmm. Cannot help it. A cleaner fix would be to have the main thread do only the progress and periodic stats aggregation, while other threads would do actual transactions, however that would break pgbench working without threads, so I think this is a no go. > And again, I wonder why this is really a bug. Well, if you are fine to set "-T 5 -P 1" and the run not to last 5 seconds nor print any report every second, then it is not a bug: sh> time ./pgbench -T 5 -P 1 -R 0.1 -c 2 -j 2 starting vacuum...end. < UNLUCKY, NO PROGRESS REPORT AT ALL...> transactiontype: <builtin: TPC-B (sort of)> scaling factor: 1 query mode: simple number of clients: 2 number of threads:2 duration: 5 s number of transactions actually processed: 1 latency average = 16.198 ms latency stddev = 0.000 msrate limit schedule lag: avg 4.308 (max 4.308) ms tps = 0.236361 (including connections establishing) tps = 0.236771 (excludingconnections establishing) real 0m4.256s -- Fabien.
> time pgbench -T 5 -R 0.1 -P 1 -c 2 -j 2 On my laptop this command executes 25 seconds instead of 5. I'm pretty sure it IS a bug. Probably a minor one though. I tested this patch on Ubuntu 14.04 LTS with GCC 4.8. It applies cleanly on master branch (c7111d11) and solves a described problem. No compilation warnings. Everything else works as before. Still I wonder if code could be patched more cleanly. Instead of: if(someint) if(somebool) ... you should probably write: if(someint > 0) if(somebool == TRUE) Also I suggest to introduce a few new boolean variables with meaningful names instead of writing all these long expressions right inside of if( ... ). As a side note I noticed that pgbench.c is not pgindent'ed. Since you are modifying this file anyway probably you cold solve this issue too? As a separate patch perhaps.
On Thu, Mar 3, 2016 at 7:23 AM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: >> time pgbench -T 5 -R 0.1 -P 1 -c 2 -j 2 > > On my laptop this command executes 25 seconds instead of 5. I'm pretty > sure it IS a bug. Probably a minor one though. > > I tested this patch on Ubuntu 14.04 LTS with GCC 4.8. It applies > cleanly on master branch (c7111d11) and solves a described problem. > No compilation warnings. Everything else works as before. > > Still I wonder if code could be patched more cleanly. Instead of: > > if(someint) > if(somebool) > > ... you should probably write: > > if(someint > 0) > if(somebool == TRUE) I think our usual style is to test Booleans directly; that is if (somebool). But for other types, we typically include an explicit comparison, like if (someint != 0) or if (someint > 0). > As a side note I noticed that pgbench.c is not pgindent'ed. Since you > are modifying this file anyway probably you cold solve this issue too? > As a separate patch perhaps. That's not really this patch's job. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Aleksander,
Thanks for the look at the patch.
>> time pgbench -T 5 -R 0.1 -P 1 -c 2 -j 2
>
> On my laptop this command executes 25 seconds instead of 5.
> I'm pretty sure it IS a bug. Probably a minor one though.
Sure.
> [...] you should probably write:
>
> if(someint > 0)
Ok.
> if(somebool == TRUE)
I like "if (somebool)", the "== TRUE" looks like a tautology, and the 
short version is also the current practice in the project.
> Also I suggest to introduce a few new boolean variables with meaningful
> names instead of writing all these long expressions right inside of
> if( ... ).
I agree about the lisibility, but there are semantics issues to consider:
    if (short-A && pretty-long-B)
If short-A is false then pretty-long-B is not evaluated, which is a win 
because it also costs, I try to order conditions... If I move 
pretty-long-B before then the cost is always paid. Now I could write:
    if (short-A) {      bool b = pretty-long-B;      if (b) {        ...
But this looks contrived and people would raise other questions about such
a strange construct for implementing && in 3 lines, 2 if and 1 variable...
> As a side note I noticed that pgbench.c is not pgindent'ed. Since you
> are modifying this file anyway probably you cold solve this issue too?
> As a separate patch perhaps.
As Robert said, not the purpose of this patch.
Attached is a v3 which test integers more logically. I'm a lazy programmer 
who tends to minimize the number of key strokes.
-- 
Fabien.
			
		> Attached is a v3 which test integers more logically. I'm a lazy > programmer who tends to minimize the number of key strokes. Well. From what I can tell this patch is Ready for Committer.
Aleksander Alekseev wrote: > > Attached is a v3 which test integers more logically. I'm a lazy > > programmer who tends to minimize the number of key strokes. > > Well. From what I can tell this patch is Ready for Committer. I'm not a fan of this approach either. Would it be too complicated if we had a global variable that indicates which thread is the progress reporter? We start that with thread 0, but if the reporter thread finishes its transactions then it elects some other thread which hasn't yet finished. For this to work, each thread would have to maintain in a global variable whether it has finished or not. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Alvaro, >>> Attached is a v3 which test integers more logically. I'm a lazy >>> programmer who tends to minimize the number of key strokes. >> >> Well. From what I can tell this patch is Ready for Committer. > > I'm not a fan of this approach either. Would it be too complicated if > we had a global variable that indicates which thread is the progress > reporter? We start that with thread 0, but if the reporter thread > finishes its transactions then it elects some other thread which hasn't > yet finished. For this to work, each thread would have to maintain in a > global variable whether it has finished or not. Hmmm. Probably it is possible, but it will sure need more that one little condition to be achieved... I do not think that introducing a non trivial distributed election algorithm involving locks and so would be a good decision for this very little matter. My advice is "keep it simple". If this is a blocker, I can sure write such an algorithm, when I have some spare time, but I'm not sure that the purpose is worth it. -- Fabien.
Fabien COELHO wrote: > Probably it is possible, but it will sure need more that one little > condition to be achieved... I do not think that introducing a non trivial > distributed election algorithm involving locks and so would be a good > decision for this very little matter. > > My advice is "keep it simple". > > If this is a blocker, I can sure write such an algorithm, when I have some > spare time, but I'm not sure that the purpose is worth it. You're probably right, but TBH I'm pretty unsure about this whole thing. I will leave it alone for the time being. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> Probably it is possible, but it will sure need more that one little >> condition to be achieved... I do not think that introducing a non trivial >> distributed election algorithm involving locks and so would be a good >> decision for this very little matter. >> >> My advice is "keep it simple". >> >> If this is a blocker, I can sure write such an algorithm, when I have some >> spare time, but I'm not sure that the purpose is worth it. > > You're probably right, but TBH I'm pretty unsure about this whole thing. If the question is "is there a bug", then answer is yes. The progress report may disappear if thread 0 happens to stop, even of all other threads go on. Obviously it only concerns slow queries, but there is no reason why pgbench should not work with slow queries. I can imagin good reason to do that, say to check the impact of such queries on an OLTP load. The bug can be kept instead, and it can be called a feature. > I will leave it alone for the time being. Maybe you could consider pushing the first part of the patch, which stops if a transaction is scheduled after the end of the run? Or is this part bothering you as well? -- Fabien.
Fabien COELHO wrote: > > >>Probably it is possible, but it will sure need more that one little > >>condition to be achieved... I do not think that introducing a non trivial > >>distributed election algorithm involving locks and so would be a good > >>decision for this very little matter. > >> > >>My advice is "keep it simple". > >> > >>If this is a blocker, I can sure write such an algorithm, when I have some > >>spare time, but I'm not sure that the purpose is worth it. > > > >You're probably right, but TBH I'm pretty unsure about this whole thing. > > If the question is "is there a bug", then answer is yes. The progress report > may disappear if thread 0 happens to stop, even of all other threads go on. > Obviously it only concerns slow queries, but there is no reason why pgbench > should not work with slow queries. I can imagin good reason to do that, say > to check the impact of such queries on an OLTP load. > > The bug can be kept instead, and it can be called a feature. No, I agree that this looks like a bug and that we should fix it; for example, if all connections from thread 0 terminate for some reason, there will be no more reports, even if the other threads continue. That's bad too. What I'm unsure about is the proposed fix. > >I will leave it alone for the time being. > > Maybe you could consider pushing the first part of the patch, which stops if > a transaction is scheduled after the end of the run? Or is this part > bothering you as well? So there are *two* bugs here? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>> You're probably right, but TBH I'm pretty unsure about this whole thing. >> >> If the question is "is there a bug", then answer is yes. The progress report >> may disappear if thread 0 happens to stop, even of all other threads go on. >> Obviously it only concerns slow queries, but there is no reason why pgbench >> should not work with slow queries. I can imagin good reason to do that, say >> to check the impact of such queries on an OLTP load. >> >> The bug can be kept instead, and it can be called a feature. > > No, I agree that this looks like a bug and that we should fix it; for > example, if all connections from thread 0 terminate for some reason, > there will be no more reports, even if the other threads continue. > That's bad too. > > What I'm unsure about is the proposed fix. > >>> I will leave it alone for the time being. >> >> Maybe you could consider pushing the first part of the patch, which stops if >> a transaction is scheduled after the end of the run? Or is this part >> bothering you as well? > > So there are *two* bugs here? Hmmm... AFAICR, maybe fixing the first creates the second issue, i.e. maybe the second issue is currently hidden by the thread going on after the end of the run, so the second is just a latent bug that cannot be encountered. I'm not sure whether I'm very clear:-) -- Fabien.
On Wed, Jan 27, 2016 at 2:31 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > - when a duration (-T) is specified, ensure that pgbench ends at that > time (i.e. do not wait for a transaction beyond the end of the run). Every other place where doCustom() returns false is implemented as return clientDone(...). I think this should probably do the same. I also think that we should probably store the end time someplace instead of recomputing it in multiple places (this patch adds two such places). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Wed, Jan 27, 2016 at 2:31 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> - when a duration (-T) is specified, ensure that pgbench ends at that >> time (i.e. do not wait for a transaction beyond the end of the run). > > Every other place where doCustom() returns false is implemented as > return clientDone(...). I think this should probably do the same. Why not. clientDone() second arguments is totally ignored, I put true because it looks better. > I also think that we should probably store the end time someplace > instead of recomputing it in multiple places (this patch adds two such > places). Why not. Attached is a v4. -- Fabien.
On Wed, Mar 9, 2016 at 4:12 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> On Wed, Jan 27, 2016 at 2:31 PM, Fabien COELHO <coelho@cri.ensmp.fr> >> wrote: >>> >>> - when a duration (-T) is specified, ensure that pgbench ends at that >>> time (i.e. do not wait for a transaction beyond the end of the run). >> >> >> Every other place where doCustom() returns false is implemented as >> return clientDone(...). I think this should probably do the same. > > > Why not. clientDone() second arguments is totally ignored, I put true > because it looks better. > >> I also think that we should probably store the end time someplace >> instead of recomputing it in multiple places (this patch adds two such >> places). > > > Why not. > > Attached is a v4. OK, I've committed the fix for the -T part. It didn't back-patch cleanly, and it is a minor bug, so I'm not inclined to worry about it further. I didn't commit the fix for the -P part, because Alvaro objected to the proposed method of fixing it as ugly, and I think he's right. Unless you can come up with a nicer-looking fix, I think that part is going to stay unfixed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> OK, I've committed the fix for the -T part. It didn't back-patch > cleanly, and it is a minor bug, so I'm not inclined to worry about it > further. I agree that it is a very minor bug and not necessary worth back-patching. > I didn't commit the fix for the -P part, because Alvaro objected to > the proposed method of fixing it as ugly, and I think he's right. > Unless you can come up with a nicer-looking fix, I think that part is > going to stay unfixed. A bug kept on esthetical ground, that's a first! -- Fabien.