Обсуждение: 8.3 beta testing suggestions welcome
I've been lobbying management here for us to allocate some resources to testing 8.3 once it hits beta. If it is approved,it might happen on a time frame too short to get much feedback before the tests, so I'm throwing the question outhere now: what would people like us to bang on? The box most likely to be used for the testing is a bit old, but still, it is SMP and we would be throwing real-world trafficat it, so it should be of some value. It has 4 2 GHz Xeon MP CPUs, 6 GB RAM, and a RAID controller with 256 MB battery-backedRAM cache. The 230 GB database would be sitting on a 407 GB RAID 5 array. In addition to the PostgreSQL instancethere would be two Java middle tiers running on the box. One middle tier is for modifying data based on transactions received from 72 source databases; this load is about 1 milliondatabase transactions on a typical work day, with an average of maybe 20 INSERT, UPDATE, and DELETE statements pertransaction. (We don't typically have many deletes.) The other middle tier uses a login which only has SELECT rightsto support our web site. We have about 2 million web hits per day generating about 10 million database transactions. We can play the actual HTTP requests from our log through a bank of renderers to get a real mix of queriesfrom production. We're particularly interested in seeing what configuration changes we may have to make to achieve optimal performance withthe checkpoints and background writer in the new release. When we first went to PostgreSQL our biggest problem was thatdirty buffers would accumulate in shared memory until a checkpoint, and then overrun the controllers cache. This wouldcause disk reads to queue up behind the writes, and queries which normally ran in a millisecond or two were timing outat our renderers' 20 second limit. The problem went away completely when we used a very aggressive background writerconfiguration, to put the dirty pages in front of the OS file system right away, so that its algorithms and the controllercache could deal with things before they got out of hand. We could run some tests with just the read-only web load, if that is useful, or push the update load alone. We could paceinput. My guess is that the most useful tests would involve letting both run as fast as the machine can handle it withvarious configurations and see what throughput and timeout counts we get. Any thoughts or suggestions welcome, particularly about what configurations to try. -Kevin
Kevin Grittner wrote: > I've been lobbying management here for us to allocate some resources to testing 8.3 once it hits beta. If it is approved,it might happen on a time frame too short to get much feedback before the tests, so I'm throwing the question outhere now: what would people like us to bang on? That would be great! > We're particularly interested in seeing what configuration changes we may have to make to achieve optimal performance withthe checkpoints and background writer in the new release. When we first went to PostgreSQL our biggest problem was thatdirty buffers would accumulate in shared memory until a checkpoint, and then overrun the controllers cache. This wouldcause disk reads to queue up behind the writes, and queries which normally ran in a millisecond or two were timing outat our renderers' 20 second limit. The problem went away completely when we used a very aggressive background writerconfiguration, to put the dirty pages in front of the OS file system right away, so that its algorithms and the controllercache could deal with things before they got out of hand. Yes, the load distributed checkpoints definitely should help with that. I'd like to see how well it works for you with the default bgwriter settings. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > When we first went to PostgreSQL our biggest problem was that dirty buffers > would accumulate in shared memory until a checkpoint, and then overrun the > controllers cache. This would cause disk reads to queue up behind the > writes, and queries which normally ran in a millisecond or two were timing > out at our renderers' 20 second limit. The problem went away completely when > we used a very aggressive background writer configuration, to put the dirty > pages in front of the OS file system right away, so that its algorithms and > the controller cache could deal with things before they got out of hand. Sounds like a tailor-mode use case for precisely what Heikki was complaining about. He couldn't find a case in 8.3 where tuning the bgwriter to be more aggressive helped at all. With the load distributed checkpoints I think the symptoms would be different but the disease may still be there. Since checkpoints will try not to swamp your i/o bandwidth any longer you shouldn't get these terrible spikes. However the theory with bgwriter is that setting it to be very aggressive will reduce the response time even outside the checkpoints by avoiding the need for individual backends to evict dirty pages. So it would be interesting to know with 8.3 whether the average response time even outside of checkpoints is reduced by having a more aggressive bgwriter policy. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Mon, 20 Aug 2007, Kevin Grittner wrote: > The problem went away completely when we used a very aggressive > background writer configuration, to put the dirty pages in front of the > OS file system right away, so that its algorithms and the controller > cache could deal with things before they got out of hand. Can you share what the aggressive BGW settings you settled on were? Knowing that information would allow giving better suggestions on how to translate what you're currently doing into the very different settings 8.3 uses in this area. Also be helpful to know about how often you have a checkpoint with your current configuration, and how big your shared_buffers is to get a general context for the size/frequency of potential checkpoint problems. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>>> On Tue, Aug 21, 2007 at 2:36 AM, in message <Pine.GSO.4.64.0708210326370.2093@westnet.com>, Greg Smith <gsmith@gregsmith.com> wrote: > On Mon, 20 Aug 2007, Kevin Grittner wrote: > >> The problem went away completely when we used a very aggressive >> background writer configuration, to put the dirty pages in front of the >> OS file system right away, so that its algorithms and the controller >> cache could deal with things before they got out of hand. > > Can you share what the aggressive BGW settings you settled on were? Under 8.2.4, the postgresql.conf file, minus blank lines and comments: listen_addresses = '*' max_connections = 200 shared_buffers = 160MB temp_buffers = 50MB work_mem = 10MB maintenance_work_mem = 160MB max_fsm_pages = 800000 bgwriter_lru_percent = 20.0 bgwriter_lru_maxpages = 200 bgwriter_all_percent = 10.0 bgwriter_all_maxpages = 600 wal_buffers = 160kB checkpoint_segments = 10 random_page_cost = 2.0 effective_cache_size = 5GB redirect_stderr = on log_line_prefix = '[%m] %p %q<%u %d %r> ' stats_block_level = on stats_row_level = on autovacuum = on autovacuum_naptime = 10s autovacuum_vacuum_threshold = 1 autovacuum_analyze_threshold = 1 datestyle = 'iso, mdy' lc_messages = 'C' lc_monetary = 'C' lc_numeric = 'C' lc_time = 'C' escape_string_warning = off standard_conforming_strings = on sql_inheritance = off > Also be helpful to know about how often you have a > checkpoint with your current configuration Since our problems associated with checkpoints have been eliminated I haven't been tracking them very closely. A scan of our log files on this machine show that there were two episodes in the last two months where we had 20 to 40 checkpoints in a day which were less than 30 seconds after the prior one. These were associated with times when this central copy had fallen significantly behind the source databases and replication was running full out, catching up with the sources. Outside of those events we seem to have on to four episodes on scattered days. A quick check shows that all but a few coincide with a weekly run which passes massive amounts of data to rebuild a table of cases which should not show on the public web site according to the Supreme Court's records retention rules. Is there anything you would like me to do to capture more information about the checkpoints? -Kevin
On Tue, 21 Aug 2007, Kevin Grittner wrote: > shared_buffers = 160MB > effective_cache_size = 5GB > bgwriter_lru_percent = 20.0 > bgwriter_lru_maxpages = 200 > bgwriter_all_percent = 10.0 > bgwriter_all_maxpages = 600 > checkpoint_segments = 10 The other thing I realized you haven't mentioned yet is what operating system you're using. What you've done here is make your shared_buffers smaller than even the write cache on your disk controller (256MB), so that anytime a checkpoint hits it should be able to cache writing even the entire buffer cache out if necessary. You're letting the OS handle caching everything else, which is memory not involved in the checkpoint process and therefore doesn't impact your problem situation. With checkpoint_segments at 10, you can't do much activity (relative to how fast this powerful of a server can dirty pages) before a checkpoint happens, which also limits how much any one checkpoint has to clean up. And your background writer settings are aggressive enough that the entire pool is getting scanned constantly, which is wasting lots of CPU resources but you have them to spare. This is reasonable solution for working around checkpoint issues in 8.2.4, but the fact that you're still having checkpoints spaced 30 seconds apart sometimes is one weak spot you already have seen. PostgreSQL operating with a tiny shared_buffers cache is another, and you could expect overall performance to improve if it were possible to increase that without getting killed by checkpoints. In 8.3, it should be possible for you to increase both shared_buffers and checkpoint_segments significantly and still have checkpoints happen in an orderly fashion. There is no all-scan anymore, instead there's a parameter called checkpoint_completion_target; see http://developer.postgresql.org/pgdocs/postgres/wal-configuration.html My suggestion for a starting 8.3 configuration for you would be adjusting these settings as follows: shared_buffers=1GB checkpoint_segments = 50 And then try replaying your data with checkpoint_completion_target at 0.5 (default), 0.7, and 0.9 and see how each of those works out for you (0.7 is the least useful of those if you only did two). Hopefully the data replay you have can be setup to invoke the same tightly spaced checkpoint behavior you commented about. Based on the results of those tests, it may be possible to further increase shared_buffers, and checkpoint_segments/checkpoint_timeout may need some adjustment one way or another. Note that some/all of the bgwriter_lru parameters may be going away before 8.3 is done as well, that's a loose end I'm working on right now. If your project gets approved, that's what I think would be a useful test to run. That should get some good results for the community as large as well as research how upgrading to the new version might positively impact your application. You're actually in one of the situations I'm a little concerned about. All the tests that have been done by people here have suggested using checkpoint_completion_target and removing the all scan are always net positive compared to the 8.2.4 behavior, but your situation (where you're heavily using bgwriter_all_percent = 10.0 and bgwriter_all_maxpages = 600) is one where it's possible 8.3 may be a step backwards. Not likely, just possible, and it would be great to get another data point on this during the beta. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>>> On Tue, Aug 21, 2007 at 1:12 PM, in message <Pine.GSO.4.64.0708211316210.16511@westnet.com>, Greg Smith <gsmith@gregsmith.com> wrote: > > The other thing I realized you haven't mentioned yet is what operating > system you're using. Linux version 2.6.5-7.286-bigsmp (geeko@buildhost) (gcc version 3.3.3 (SuSE Linux)) #1 SMP Thu May 31 10:12:58 UTC 2007 SUSE LINUX Enterprise Server 9 (i586) VERSION = 9 PATCHLEVEL = 3 I know this is old, but I was happy just to get them off of Windows. :-/ > My suggestion for a starting 8.3 configuration for you would be adjusting > these settings as follows: > > shared_buffers=1GB > checkpoint_segments = 50 > > And then try replaying your data with checkpoint_completion_target at 0.5 > (default), 0.7, and 0.9 and see how each of those works out for you (0.7 > is the least useful of those if you only did two). Hopefully the data > replay you have can be setup to invoke the same tightly spaced checkpoint > behavior you commented about. Based on the results of those tests, it may > be possible to further increase shared_buffers, and > checkpoint_segments/checkpoint_timeout may need some adjustment one way or > another. > > Note that some/all of the bgwriter_lru parameters may be going away before > 8.3 is done as well, that's a loose end I'm working on right now. Thanks much. If I get the approval, I'll be sure to run these three tests. It probably makes sense to get a baseline with the current config first. I assume you want me to do these with both the update and query streams running full out? Anyone else have a configuration to suggest? -Kevin
>>> On Tue, Aug 21, 2007 at 1:12 PM, in message <Pine.GSO.4.64.0708211316210.16511@westnet.com>, Greg Smith <gsmith@gregsmith.com> wrote: > My suggestion for a starting 8.3 configuration for you would be adjusting > these settings as follows: > > shared_buffers=1GB > checkpoint_segments = 50 I'm not into the most meaningful tests yet, but I'm trying to garner what useful information I can from the load of the 8.2.5 database into a server running the cvs tip copy taken a few hours ago. After letting the cache settle in, while in the middle of a COPY of a large table, I see the attached through a couple checkpoint cycles. The pattern is pretty consistent with this over other the other checkpoint cycles I've examined. Note that the system can sustain over 20000 writes per second without the disk being a bottleneck. The current tip behavior is smoother than 8.2.x without background writer, for sure. It still isn't as smooth as 8.2.x with an aggressive backgroud writer, at least for loading a dump. Note how it lounges around at as low as 5500 writes per second for prolonged periods with occassional spikes into the 55000 range, causing I/O wait time, although these only last a few seconds. Without more rigorous tests for comparison, I'd say that this is probably a net loss for this operation -- I seem to recall seeing a pretty steady 20000 bo under 8.2.x. More info to follow as testing progresses. -Kevin
Вложения
On Fri, 30 Nov 2007, Kevin Grittner wrote: >> checkpoint_segments = 50 Here's how I'm reading your data: 16:43:11 : Checkpoint start 16:44:23 : Checkpoint ends [ 1:22 long] 1:01 passes 16:45:24 : Checkpoint start 16:46:36 : checkpoint ends [1:22 long] If you're getting a checkpoint every minute or so, the number of checkpoint segments you're using is still far too low to really take advantage of the new checkpoint spreading features. You want to make that high enough that it's several minutes between them. It's not unusual to set a much higher checkpoint_segments during bulk loading than you'd necessarily want for actual production use. One thing that would make your reports more useful: in the spots in your log where you're labeling [checkpoint starting] etc., if you could include the new message spit out by turning on checkpoint_log there it would help. That's the one that says how much data was written. Did you ever write something to save snapshots of pg_stat_bgwriter? Those would be interesting to see on the same time scale as well. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>>> On Tue, Aug 21, 2007 at 1:12 PM, in message <Pine.GSO.4.64.0708211316210.16511@westnet.com>, Greg Smith <gsmith@gregsmith.com> wrote: > My suggestion for a starting 8.3 configuration for you would be adjusting > these settings as follows: > > shared_buffers=1GB > checkpoint_segments = 50 > > And then try replaying your data with checkpoint_completion_target at 0.5 > (default), 0.7, and 0.9 and see how each of those works out for you (0.7 > is the least useful of those if you only did two). Hopefully the data > replay you have can be setup to invoke the same tightly spaced checkpoint > behavior you commented about. Under 8.2.4, 8.2.5, 8.3beta4 and a recent compile of the CVS tip, the test machine (4 2 GHz Xeons with 6 GB RAM) was unable to create the clusters of timeouts which we saw on the production box (8 3GHz Xeons with 12 GB RAM), in spite of concerted efforts to recreate the conditions. It's possible that updates to PostgreSQL or the OS have made the difference, although I think it's more likely that the larger OS cache and/or the higher CPU power of the production box are needed to create the problem on this type of disk array. We're working on getting a couple new boxes on line, which will allow us to test using the current production box, but that's weeks away. Performance under 8.3 is better in general, and there is a very noticeable smoothing of the output spikes at the OS level. I didn't notice much difference in vmstat output with the different checkpoint_completion_target settings. Our testing methoodology involved playing transaction streams from both our replication engine and our web site, at the same time, against the test server, through the normal middle tier software we use in production. We have timestamps on all of it, which allowed us to play the streams at various percentages of the production load. Based on the timeouts from the web renderers (which cancel a request if the middle tier hasn't responded within 20 seconds), the middle tier metrics of actual run time once the request is pulled from the request queue, and the vmstat output, 8.3 shows significant performance improvement as long as the fix for the OUTER JOIN optimizer regression is used (that is, with a recent checkout from the CVS tip, or beta/RC with Tom's patch applied). -Kevin