Discussion: Why do we still have commit_delay and commit_siblings?
This code is our pre-9.2 group commit implementation, pretty much in its entirety:

    if (CommitDelay > 0 && enableFsync &&
        MinimumActiveBackends(CommitSiblings))
        pg_usleep(CommitDelay);

This code is placed directly before the RecordTransactionCommit() call of XLogFlush(). It seeks to create a delay of commit_delay immediately prior to flushing when MinimumActiveBackends(CommitSiblings), in the hope that the delay will be enough that when we do eventually reach XLogFlush(), we can fastpath out of it, because Postgres will by then have already flushed up to the XLogRecPtr we need flushed. In this way, commits can piggyback off each other and there won't be what are effectively duplicate write/fsync requests, or at least that's the idea.

It is hardly surprising that the practical advice surrounding commit_delay and commit_siblings is only subtly different from "don't use in production". There is even a big fat caveat attached to this code in a comment: "This needs work still, because on most Unixen, the minimum select() delay is 10msec or more, which is way too long".

The original group commit patch that Simon and I submitted deprecated these settings, and I fail to understand why the committed patch didn't do that too. These days, the only way this code could possibly result in a fastpath is here, at transam/xlog.c:2094, with the potentially stale (and likely too stale to be useful when the new group commit is in play) LogwrtResult value:

    /* done already? */
    if (XLByteLE(record, LogwrtResult.Flush))
        break;

Now, previously, we could also fastpath a bit later, at about transam/xlog.c:2114:

    /* Got the lock */
    LogwrtResult = XLogCtl->LogwrtResult;
    if (!XLByteLE(record, LogwrtResult.Flush))
    {
        .....
    }
    /* control might have fastpathed and missed the above block. We're done now. */

...but now control won't even reach here unless we have the exclusive lock on WALWriteLock (i.e. we're the leader backend during 9.2's group commit), so fastpathing out of XLogFlush() becomes very rare in 9.2.

One illuminating way of quickly explaining the new group commit code is that it also inserts a delay at approximately the same place (well, more places now, since the delay was previously inserted only at the xact.c callsite of XLogFlush(), and there are plenty more such sites than that), only that delay is now just about perfectly adaptive. That isn't quite the full truth, since we *also* have the benefit of *knowing* that there is an active leader/flusher at all times we're delayed. The unusual semantics of LWLockAcquireOrWait() mean that the new 9.2 code almost always just causes a delay for most backends when group commit matters (that is, a delay before they reach their final, "productive" iteration of the for(;;) loop for the commit, usually without them ever actually acquiring the exclusive lock or XLogWrite()'ing).

Have I missed something? Why do we keep around this foot-gun that now appears to me to be at best useless and at worst harmful? I can see why the temptation to keep this setting around used to exist, since it probably wasn't too hard to get good numbers from extremely synthetic pgbench runs, but I cannot see why the new adaptive implementation wouldn't entirely shadow the old one even in that situation.

-- 
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On Sun, May 13, 2012 at 7:17 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> Have I missed something? Why do we keep around this foot-gun that now appears to me to be at best useless and at worst harmful? I can see why the temptation to keep this setting around used to exist, since it probably wasn't too hard to get good numbers from extremely synthetic pgbench runs, but I cannot see why the new adaptive implementation wouldn't entirely shadow the old one even in that situation.

It seems that, with the new code, when there are a lot of people trying to commit very frequently, they tend to divide themselves into two gangs: everybody in one gang commits, then everyone in the other gang commits, then everyone in the first gang commits again, and so on. Assuming that the transactions themselves require negligible processing time, this provides 50% of the theoretically optimal throughput.

For example, with two backends, one transaction commits first, and the other transaction must then wait for that WAL flush to complete before initiating its own flush. It will start its flush the very instant the first flush completes, before there is adequate time for the first backend to complete another transaction. Of course, by the time it finishes its flush, the other transaction will be ready again, so the flush will ping-pong back and forth between those two backends forever, and they will never manage to group commit.

Now, potentially, these settings solve this problem by letting the first backend wait just a very little while for the second backend to also be ready to commit, so that every head rotation commits a transaction in each backend, rather than a transaction in one backend or the other. I'm not sure whether it can actually be made to work, but I'm not willing to assume that it can't on the basis of a theoretical argument not involving actual testing.

If we get to the point that commit performance with 100 concurrent clients is 100x the single-client performance rather than 50x, then obviously these settings will have outlived their usefulness. But we're not there yet, so we probably only want to remove these settings if we can demonstrate that they are useless.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 14 May 2012 00:45, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, May 13, 2012 at 7:17 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> Have I missed something? Why do we keep around this foot-gun that now appears to me to be at best useless and at worst harmful? I can see why the temptation to keep this setting around used to exist, since it probably wasn't too hard to get good numbers from extremely synthetic pgbench runs, but I cannot see why the new adaptive implementation wouldn't entirely shadow the old one even in that situation.
>
> It seems that, with the new code, when there are a lot of people trying to commit very frequently, they tend to divide themselves into two gangs: everybody in one gang commits, then everyone in the other gang commits, then everyone in the first gang commits again, and so on. Assuming that the transactions themselves require negligible processing time, this provides 50% of the theoretically optimum throughput.

Keeping a parameter without any clue as to whether it has benefit is just wasting people's time. We don't ADD parameters based on supposition, so why should we avoid removing parameters that have no measured benefit?

-- 
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, May 14, 2012 at 2:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Keeping a parameter without any clue as to whether it has benefit is just wasting people's time.

No, arguing that we should remove a parameter because it's useless when you haven't bothered to test whether or not it actually is useless is wasting people's time.

> We don't ADD parameters based on supposition, why should we avoid removing parameters that have no measured benefit?

If they have no actual benefit, of course we should remove them. If they have no measured benefit because no one has bothered to measure, that's not a reason to remove them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 14.05.2012 02:17, Peter Geoghegan wrote:
> One illuminating way of quickly explaining the new group commit code is that it also inserts a delay at approximately the same place (well, more places now, since the delay was previously inserted only at the xact.c callsite of XLogFlush(), and there are plenty more sites than that), only that delay is now just about perfectly adaptive. That isn't quite the full truth, since we *also* have the benefit of *knowing* that there is an active leader/flusher at all times we're delayed. The unusual semantics of LWLockAcquireOrWait() result in the new 9.2 code almost always just causing a delay for most backends when group commit matters (that is, a delay before they reach their final, "productive" iteration of the for(;;) loop for the commit, usually without them ever actually acquiring the exclusive lock/XlogWrite()'ing).

That doesn't seem like an accurate explanation of how the code works. It doesn't insert a deliberate delay anywhere. At a high level, the algorithm is exactly the same as before; the new code just improves the concurrency of noticing that the WAL has been flushed. If you had a machine where context switches were infinitely fast and there was zero contention from accessing shared memory, the old and new code would behave the same. It was an impressive improvement, but the mechanism is completely different from commit_delay and commit_siblings.

That said, I wouldn't mind removing commit_delay and commit_siblings. They're pretty much impossible to tune correctly, assuming they work as advertised. Some hard data would be nice, though, as Robert suggested.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 14 May 2012 07:30, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> That said, I wouldn't mind removing commit_delay and commit_siblings. They're pretty much impossible to tune correctly, assuming they work as advertised. Some hard data would be nice, though, as Robert suggested.

Those parameters were already hard to get any benefit from, even in a benchmark. In a wide range of cases/settings they produce clear degradation. Any thorough testing would involve a range of different hardware types, so it's unlikely ever to occur. So let's just move on.

-- 
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, May 14, 2012 at 8:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, May 14, 2012 at 2:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Keeping a parameter without any clue as to whether it has benefit is just wasting people's time.
>
> No, arguing that we should remove a parameter because it's useless when you haven't bothered to test whether or not it actually is useless is wasting people's time.

It's most certainly not, IMHO. Discussing it here is *not* a waste of time, or if it is, it's a waste of time for a couple of people. If we leave it in, and it's useless, we waste the time of thousands of users. The choice between those two should be obvious.

>> We don't ADD parameters based on supposition, why should we avoid removing parameters that have no measured benefit?
>
> If they have no actual benefit, of course we should remove them. If they have no measured benefit because no one has bothered to measure, that's not a reason to remove them.

Another option might be, as a first step, to remove them from the .conf file, or perhaps to have a "deprecated" section there. But if we do that, people aren't likely to use them anyway...

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Sun, May 13, 2012 at 4:17 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> This code is our pre-9.2 group commit implementation, pretty much in its entirety:
>
>     if (CommitDelay > 0 && enableFsync &&
>         MinimumActiveBackends(CommitSiblings))
>         pg_usleep(CommitDelay);

A semantic issue, I guess, but I would say that all the code that keeps xlogctl->LogwrtRqst updated and reads from it is also part of the group commit implementation.

> This code is placed directly before the RecordTransactionCommit() call of XLogFlush(). It seeks to create a delay of commit_delay immediately prior to flushing when MinimumActiveBackends(CommitSiblings), in the hope that that delay will be enough that when we do eventually reach XLogFlush(), we can fastpath out of it due to Postgres by then having already flushed up to the XLogRecPtr we need flushed.

I think it is more useful to put it the other way around. The hope is that when we eventually reach XLogFlush, we will find that other people have added their commit records, so that we can fsync theirs as well as ours, letting them fast-path out later on.

Really, if someone else is already doing the sleep, there is no reason for the current process to do so as well. All that will do is delay the time before the current process wakes up and realizes it has already been fsynced. Instead it should block on the other sleeping process. But that didn't help much, if any, when I tried it a couple of years ago. It might work better now.

...

> The original group commit patch that Simon and I submitted deprecated these settings, and I fail to understand why the committed patch didn't do that too. These days, the only way this code could possibly result in a fastpath is here, at transam/xlog.c:2094, with the potentially stale (and likely too stale to be useful when new group commit is in play) LogwrtResult value:
>
>     /* done already? */
>     if (XLByteLE(record, LogwrtResult.Flush))
>         break;

Why is that likely too stale to be useful? It was updated just four lines earlier.

> Now, previously, we could also fastpath a bit later, at about transam/xlog.c:2114:
>
>     /* Got the lock */
>     LogwrtResult = XLogCtl->LogwrtResult;
>     if (!XLByteLE(record, LogwrtResult.Flush))
>     {
>         .....
>     }
>     /* control might have fastpathed and missed the above block. We're done now. */
>
> ...but now control won't even reach here unless we have the exclusive lock on WALWriteLock (i.e. we're the leader backend during 9.2's group commit), so fastpathing out of XLogFlush() becomes very rare in 9.2.
>
> One illuminating way of quickly explaining the new group commit code is that it also inserts a delay at approximately the same place (well, more places now, since the delay was previously inserted only at the xact.c callsite of XLogFlush(), and there are plenty more sites than that), only that delay is now just about perfectly adaptive. That isn't quite the full truth, since we *also* have the benefit of *knowing* that there is an active leader/flusher at all times we're delayed.

I don't get what you are saying. The new code does not have any more delays than the old code did; there is still only one pg_usleep(CommitDelay), and it is still in the same place.

Cheers,

Jeff
On Sun, May 13, 2012 at 11:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Keeping a parameter without any clue as to whether it has benefit is just wasting people's time.
>
> We don't ADD parameters based on supposition, why should we avoid removing parameters that have no measured benefit?

Using pgbench -T30 -c 2 -j 2 on a 2-core laptop system, with a scale that fits in shared_buffers:

--commit-delay=2000 --commit-siblings=0
tps = 162.924783 (excluding connections establishing)

--commit-delay=0 --commit-siblings=0
tps = 89.237578 (excluding connections establishing)

Cheers,

Jeff
On Mon, May 14, 2012 at 3:15 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Mon, May 14, 2012 at 8:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, May 14, 2012 at 2:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Keeping a parameter without any clue as to whether it has benefit is just wasting people's time.
>>
>> No, arguing that we should remove a parameter because it's useless when you haven't bothered to test whether or not it actually is useless is wasting people's time.
>
> It's most certainly not, IMHO. Discussing it here is *not* a waste of time. Or if any, it's a waste of time for a couple of people. If we leave it in, and it's useless, we waste the time of thousands of users. The choice between those two should be obvious.

Discussing it in general is not a waste of time, but the argument that we should remove it because there's no evidence we should keep it is completely backwards. We should add OR remove things based on evidence, not the absence of evidence. There is certainly room for discussion about what amount of evidence is adequate, but I do not think zero is the right number.

Now, interestingly, Jeff Janes just did some testing, and it shows almost a 2x speedup. I think that's a much better starting point for a productive discussion. Does that change your mind at all? Is it too small a boost to be relevant? Too artificial in some other way?

It doesn't seem impossible to me that the recent group commit changes made it *easier* to get a benefit out of these settings than it was before. It may be that with the old implementation it was hopeless to get any kind of improvement out of these settings, but it no longer is. Or maybe they're still hopeless. I don't have a strong opinion about that, and welcome discussion. But I'm always going to be opposed to adding or removing things on the basis of what we didn't test.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 14 May 2012 15:09, Robert Haas <robertmhaas@gmail.com> wrote:
> I don't have a strong opinion about that, and welcome discussion. But I'm always going to be opposed to adding or removing things on the basis of what we didn't test.

The subject of the thread is "Why do we still have commit_delay and commit_siblings?". I don't believe that anyone asserted that we should remove the settings without some amount of due-diligence testing. Simon said that thorough testing on many types of hardware was not practical, which, considering that commit_delay is probably hardly ever (never?) used in production, I'd have to agree with. With all due respect, for someone that doesn't have a strong opinion on the efficacy of commit_delay in 9.2, you seemed to have a strong opinion on the standard that would have to be met in order to deprecate it.

I think we all could stand to give each other the benefit of the doubt more.

-- 
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On Mon, May 14, 2012 at 10:24 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 14 May 2012 15:09, Robert Haas <robertmhaas@gmail.com> wrote:
>> I don't have a strong opinion about that, and welcome discussion. But I'm always going to be opposed to adding or removing things on the basis of what we didn't test.
>
> The subject of the thread is "Why do we still have commit_delay and commit_siblings?". I don't believe that anyone asserted that we should remove the settings without some amount of due-diligence testing. Simon said that thorough testing on many types of hardware was not practical, which, considering that commit_delay is probably hardly ever (never?) used in production, I'd have to agree with. With all due respect, for someone that doesn't have a strong opinion on the efficacy of commit_delay in 9.2, you seemed to have a strong opinion on the standard that would have to be met in order to deprecate it.
>
> I think we all could stand to give each other the benefit of the doubt more.

I am a bit perplexed by this thread. It appeared to me that you were saying that these settings could not ever possibly be useful and therefore we ought to remove them right now, and I said we should gather some data first, because the current behavior, without using these settings, appears to be about 50% of the optimum. If you agree we need to gather some data first, then apparently we don't disagree about anything, but that wasn't mentioned in your original email or in Simon's reply to my post. There are certainly many instances where we've made changes quickly without gathering much data first, so I feel that it wasn't ridiculous on my part to think that might be the proposal on the table.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, May 14, 2012 at 8:42 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Using pgbench -T30 -c 2 -j 2 on a 2 core laptop system, with a scale that fits in shared_buffers:
>
> --commit-delay=2000 --commit-siblings=0
> tps = 162.924783 (excluding connections establishing)
>
> --commit-delay=0 --commit-siblings=0
> tps = 89.237578 (excluding connections establishing)

These results are astonishingly good, and I can't reproduce them. I spent some time this morning messing around with this on the IBM POWER7 machine and my MacBook Pro. Neither of these has exceptionally good fsync performance, and in particular the MacBook Pro has really, really bad fsync performance.

On the IBM POWER7 machine, I'm not able to demonstrate any performance improvement at all from fiddling with commit_delay. I tried tests at 2 clients, 32 clients, and 80 clients, and I'm getting... nothing. No improvement at all. Zip. I tried a few different settings for commit_delay, too.

On the MacBook Pro, with wal_sync_method=obscenely_slow^Wfsync_writethrough, I can't demonstrate any improvement at 2 clients, but at 80 clients I observe a roughly 1.8x performance gain (~50 tps -> ~90 tps). Whether this is anything to get excited about is another matter, since you'd hope to get more than 1.1 transactions per second per client no matter how slow fsync is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 15 May 2012 15:17, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, May 14, 2012 at 10:24 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> The subject of the thread is "Why do we still have commit_delay and commit_siblings?". I don't believe that anyone asserted that we should remove the settings without some amount of due-diligence testing. Simon said that thorough testing on many types of hardware was not practical, which, considering that commit_delay is probably hardly ever (never?) used in production, I'd have to agree with.
>
> I am a bit perplexed by this thread. It appeared to me that you were saying that these settings could not ever possibly be useful and therefore we ought to remove them right now, and I said we should gather some data first, because the current behavior, without using these settings, appears to be about 50% of the optimum. If you agree we need to gather some data first, then apparently we don't disagree about anything, but that wasn't mentioned in your original email or in Simon's reply to my post.

We don't have enough evidence to show that there are any gains to be had here in a real-world situation. Few if any benchmarks show anything of value, and when they do, it is because they are too regular and not very realistic.

My comments were appropriate: if I tried to suggest we add commit_delay as a feature, it would be rejected, and rightly so. Some caution in its removal is appropriate, but since we've been discussing it since before your first post to hackers, probably even before mine, I figure that is way past long enough.

I beg you to prove me wrong and demonstrate the value of commit_delay, since we will all benefit from that.

-- 
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, May 15, 2012 at 7:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> These results are astonishingly good, and I can't reproduce them. I spent some time this morning messing around with this on the IBM POWER7 machine and my MacBook Pro. Neither of these have exceptionally good fsync performance, and in particular the MacBook Pro has really, really bad fsync performance.

Did you also set --commit-siblings=0? Are you using -i -s 1, and therefore serializing on the sole entry in pgbench_branches? Could you instrument the call to pg_usleep and see if it is actually being called? (Simply strace-ing the process would probably tell you that.)

> On the IBM POWER7 machine, I'm not able to demonstrate any performance improvement at all from fiddling with commit delay. I tried tests at 2 clients, 32 clients, and 80 clients, and I'm getting... nothing. No improvement at all. Zip. I tried a few different settings for commit_delay, too. On the MacBook Pro, with wal_sync_method=obscenely_slow^Wfsync_writethrough,

If one of the methods gives sync times that match the rotational speed of your disks, that is the one I would use. If the sync is artificially slow because something in the kernel is gummed up, maybe whatever the problem is also interferes with other things (although I wouldn't expect it to; that is just a theory). I have a 5400 rpm drive, so 89 single-client TPS is almost exactly what is to be expected.

> I can't demonstrate any improvement at 2 clients, but at 80 clients I observe a roughly 1.8x performance gain (~50 tps -> ~90 tps). Whether this is anything to get excited about is another matter, since you'd hope to get more than 1.1 transactions per second no matter how slow fsync is.

Yeah, you've got something much worse going on there than commit_delay can solve. With the improved group-commit code, or whatever we are calling it, if you get 50 tps single-client then at 80 clients you should get almost 40x50 tps, assuming the scale is large enough to not block on row locks.

Cheers,

Jeff
On Tue, May 15, 2012 at 11:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> My comments were appropriate: if I tried to suggest we add commit_delay as a feature, it would be rejected and rightly so.

Fair point.

> Some caution in its removal is appropriate, but since we've been discussing it since before your first post to hackers, probably even before mine, I figure that is way past long enough.
>
> I beg you to prove me wrong and demonstrate the value of commit_delay, since we will all benefit from that.

Interestingly, we seem to have had this same argument 7 years ago, with different participants:

http://archives.postgresql.org/pgsql-hackers/2005-06/msg01463.php

What's really bothering me here is that a LOT has changed in 9.2. Besides the LWLockAcquireOrWait stuff, which improves fsync scalability quite a bit, we have also whacked around the WAL writer behavior somewhat. It's not necessarily the case that things which didn't work well before still won't work well now. On the other hand, I'll grant you that our current implementation of commit_delay is pretty boneheaded.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, May 15, 2012 at 12:07 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> These results are astonishingly good, and I can't reproduce them. I spent some time this morning messing around with this on the IBM POWER7 machine and my MacBook Pro. Neither of these have exceptionally good fsync performance, and in particular the MacBook Pro has really, really bad fsync performance.
>
> Did you also set --commit-siblings=0?

No.

> Are you using -i -s 1, and therefore serializing on the sole entry in pgbench_branches?

No. Scale factor is 10.

> Could you instrument the call to pg_usleep and see if it is actually being called? (Or, simply strace-ing the process would probably tell you that.)

I'm pretty sure it is. It was on the IBM POWER7 machine, anyway, because the pg_usleep calls showed up in the perf call graph I took.

> If one of the methods gives sync times that matches the rotational speed of your disks, that is the one that I would use. If the sync is artificially slow because something in the kernel is gummed up, maybe whatever the problem is also interferes with other things. (Although I wouldn't expect it to, that is just a theory.) I have a 5400 rpm drive, so 89 single-client TPS is almost exactly to be expected.
>
> Yeah, you've got something much worse going on there than commit_delay can solve.
>
> With the improved group-commit code, or whatever we are calling it, if you get 50tps single-client then at 80 clients you should get almost 40x50 tps, assuming the scale is large enough to not block on row locks.

I am definitely not getting that. Let's try this again. Increase scale factor to 40. Decrease commit_siblings to 0. With 10 clients and commit_delay=5000, I get 109-132 tps. With commit_delay=0, I get 58-71 tps.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company