Discussion: Recommended optimisations slow down PostgreSQL 8.4
I have PostgreSQL 8.4.8 on Ubuntu Linux x64. The server is a Core i7 950 with 6GB of RAM. 2GB of RAM is used by Java, some small amount by the kernel / services, and the rest is available to PostgreSQL. The hard drive is a single 7200 RPM SATA 1TB Caviar Black HDD. No other applications / processes are running when I perform my tests.

I have an application that performs about 80% reads and 20% writes for a specific bill run. It takes about 60 minutes to complete, and I can have it perform precisely the same queries repeatedly. I have consistently shown that when shared_buffers = 24MB (the default) and wal_buffers = 64kB, the system completes the process in 50 minutes. When I bump shared_buffers to 1500MB, the system slows down and takes 60 minutes to complete the same process. Changing that back to 24MB, but then changing wal_buffers to 16MB, has the same impact - performance drops from 50 minutes to about 61 minutes. Changing those two parameters back to the defaults returns the time to 50 minutes.

fsync = off for these tests - not sure if it is relevant. All other settings are at their defaults.

Please explain why the system is slower with the recommended values for these two settings. The DB is about 74GB, and the largest table has 180 million rows.
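For reference, the two configurations being compared look like this in postgresql.conf (values exactly as reported above; everything else at defaults, fsync = off in both runs; note that changing shared_buffers requires a server restart):

    # baseline (~50 minutes)
    shared_buffers = 24MB
    wal_buffers = 64kB

    # "recommended" values (~60-61 minutes)
    shared_buffers = 1500MB
    wal_buffers = 16MB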
On Thu, Aug 11, 2011 at 04:35:34PM -0700, Waldo Nell wrote:
> I have PostgreSQL 8.4.8 on Ubuntu Linux x64. The server is a Core i7 950 with 6GB of RAM. 2GB of RAM is used by Java, some small amount by the kernel / services, and the rest is available to PostgreSQL. The hard drive is a single 7200 RPM SATA 1TB Caviar Black HDD. No other applications / processes are running when I perform my tests.
>
> I have an application that performs about 80% reads and 20% writes for a specific bill run. It takes about 60 minutes to complete, and I can have it perform precisely the same queries repeatedly. I have consistently shown that when shared_buffers = 24MB (the default) and wal_buffers = 64kB, the system completes the process in 50 minutes. When I bump shared_buffers to 1500MB, the system slows down and takes 60 minutes to complete the same process. Changing that back to 24MB, but then changing wal_buffers to 16MB, has the same impact - performance drops from 50 minutes to about 61 minutes. Changing those two parameters back to the defaults returns the time to 50 minutes.
>
> fsync = off for these tests - not sure if it is relevant. All other settings are at their defaults.
>
> Please explain why the system is slower with the recommended values for these two settings. The DB is about 74GB, and the largest table has 180 million rows.

One guess is that you are using the defaults for the other costing parameters and they do not accurately reflect your system. That means it will be a crap shoot as to whether a plan is faster or slower, and what will affect the timing.

Regards,
Ken
On 2011-08-11, at 17:18 , ktm@rice.edu wrote:
> One guess is that you are using the defaults for the other costing parameters and they
> do not accurately reflect your system. That means it will be a crap shoot as
> to whether a plan is faster or slower, and what will affect the timing.

Ok, but I thought the way to best optimise PostgreSQL is to start with the parameters having the biggest impact and work from there. Adjusting multiple parameters at once would not give a clear indication of the benefit of each, as they may cancel each other out.

To test your theory, what other parameters should I be looking at? Here are some more with their current values:

random_page_cost = 4.0
effective_cache_size = 128MB

Remember this runs on SATA, so random seeks are not as fast as, say, SSD.
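One way to check which parameters are still sitting at their defaults is to query pg_settings; a minimal sketch (the list of names here is illustrative, not exhaustive):

    SELECT name, setting, unit, source
    FROM pg_settings
    WHERE name IN ('shared_buffers', 'wal_buffers', 'checkpoint_segments',
                   'random_page_cost', 'seq_page_cost',
                   'effective_cache_size', 'work_mem', 'fsync');

Rows whose source column reads "default" have never been changed in postgresql.conf.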
On 08/11/2011 07:35 PM, Waldo Nell wrote:
> Please explain why the system is slower with the recommended values for these two settings.

If the other parameters are at their defaults, the server is probably executing a checkpoint every few seconds while running your test. I'd wager your log is filled with checkpoint warnings about them executing too frequently.

Each time a checkpoint happens, all of shared_buffers is dumped out. So if you increase shared_buffers but don't space the checkpoints out more, it just ends up writing the same data over and over again. Using a smaller shared_buffers lets the OS deal with that problem instead, so it actually ends up being more efficient. Basically, you can't increase shared_buffers usefully without also increasing checkpoint_segments. All three of shared_buffers, wal_buffers, and checkpoint_segments have to go up before you'll see the expected benefit from raising any of them; you can't change one parameter at a time and expect an improvement.

Try this:

shared_buffers = 512MB
wal_buffers = 16MB
checkpoint_segments = 64

And see how that does. If that finally beats your 50 minute record, you can see whether further increases to shared_buffers and checkpoint_segments continue to help from there. Effective upper limits on your server are probably around 2GB for shared_buffers and around 256 for checkpoint_segments; they could be lower if your application uses a lot of transitory data (read/written once, so the database cache is no better than the OS one).

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
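A sketch of how to confirm whether checkpoints really are firing that often, using only what ships with 8.4 (log_checkpoints in postgresql.conf, plus the pg_stat_bgwriter view):

    # postgresql.conf: log every checkpoint with timing details
    log_checkpoints = on

    -- sample before and after a test run; a high checkpoints_req
    -- relative to checkpoints_timed means checkpoints are being
    -- forced by exhausting checkpoint_segments, not by the timeout
    SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
    FROM pg_stat_bgwriter;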
On 2011-08-11, at 18:17 , Greg Smith wrote:
> shared_buffers = 512MB
> wal_buffers = 16MB
> checkpoint_segments = 64

Thanks for the advice. I tried these values... and it is even worse - it went up to 63 minutes (from 60 minutes). Like I said, this load is read-mostly. My 80 / 20% split might be a bit inaccurate; it could be more like 90% reads and 10% writes. I do not see any checkpoint warnings in my logs.

I guess that means the OS cache is better for this particular use case than the PostgreSQL cache?
On Thu, Aug 11, 2011 at 7:27 PM, Waldo Nell <pwnell@telkomsa.net> wrote:
>
> On 2011-08-11, at 17:18 , ktm@rice.edu wrote:
>
>> One guess is that you are using the defaults for the other costing parameters and they
>> do not accurately reflect your system. That means it will be a crap shoot as
>> to whether a plan is faster or slower, and what will affect the timing.
>
> Ok, but I thought the way to best optimise PostgreSQL is to start with the parameters having the biggest impact and work from there. Adjusting multiple parameters at once would not give a clear indication of the benefit of each, as they may cancel each other out.

A couple points:

*) shared_buffers is a highly nuanced setting that is very workload dependent. It mainly affects write-heavy loads, and the pattern of writing is very important in terms of the benefits you may or may not see. It also changes checkpoint behavior -- this will typically manifest as a negative change when raising buffers, but it can be mitigated. If your I/O becomes very bursty after raising this setting, it's a red flag that more tuning is required.

*) fsync = off: throw the book out on traditional tuning advice. With this (dangerously) set, the OS is essentially responsible for I/O patterns, so you should focus your tuning efforts there. The benefits of raising shared_buffers don't play as much in this case.

> To test your theory, what other parameters should I be looking at? Here are some more with their current values:
>
> random_page_cost = 4.0
> effective_cache_size = 128MB

*) These settings affect query plans. Changing them could have no effect or a dramatic effect, depending on your specific queries and whether they are being chosen badly due to overly conservative default settings. The PostgreSQL planner has gotten pretty accurate over the years, in the sense that you will want to tune these to be as close to reality as possible.

In my opinion, before looking at postgresql.conf you need to make sure your queries and their plans are good. Fire up pgfouine and see where those 60 minutes are getting spent. Maybe you have a problem query that demands optimization.

merlin
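pgfouine works off the server log, so statement logging has to be enabled first; a minimal sketch of the relevant postgresql.conf settings (the log_line_prefix shown is the stderr format pgfouine's documentation asks for - double-check it against your pgfouine version):

    logging_collector = on
    log_destination = 'stderr'
    log_min_duration_statement = 0        # log every statement and its duration
    log_line_prefix = '%t [%p]: [%l-1] '  # format pgfouine expects for stderr logs

Setting log_min_duration_statement = 0 for a full bill run generates a lot of log volume; a higher threshold (say 100ms) narrows the output down to just the slow statements.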
On 2011-08-12, at 09:32 , Merlin Moncure wrote:
> In my opinion, before looking at postgresql.conf you need to make sure
> your queries and their plans are good. Fire up pgfouine and see where
> those 60 minutes are getting spent. Maybe you have a problem query
> that demands optimization.

Thanks for your advice, I will be sure to look into what you said. I might just add some background information. The process used to take 266 minutes to complete - which I got down to 49 minutes. I spent a LOT of time optimising queries, implementing multithreading to utilise the extra cores and place more load on the DB that way, etc. So, that being done as best I can for now, I am focussing on the DB itself. I am a firm believer that the best place to optimise is first the queries / code, THEN the hardware / fine-tuning the parameters.

The fsync = off was because the production system runs on an uber-expensive SAN system with multipathing over Fibre Channel, it is on UPS and backup generators in a secure datacenter, and we have daily backups we can fall back to.
Waldo Nell <pwnell@telkomsa.net> wrote:
> The fsync = off was because the production system runs on an uber-
> expensive SAN system with multipathing over Fibre Channel, it is
> on UPS and backup generators in a secure datacenter, and we have
> daily backups we can fall back to.

Turning fsync off in production may be OK as long as those daily backups aren't in the same building as the uber-expensive SAN, and it's really OK to fall back on a daily backup if the database server crashes or locks up. By the way, I never trust a backup until I have successfully restored from it; you should probably do that at least on some periodic basis if you can't do it every time.

The other thing I would point out is that if you are tuning with different table sizes, RAM sizes, or I/O performance characteristics from production, the database tuning in one environment may not have much to do with what works best in the other environment.

As for why the recommended settings are having paradoxical effects in this environment -- this advice is generally based on systems with fsync on and a RAID controller with battery-backed cache configured for write-back. I don't know how well the advice generalizes to a single spindle with fsync off.

-Kevin
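A minimal sketch of that kind of periodic restore test, against a scratch database (the database and table names here are illustrative):

    # take the backup in custom format so pg_restore can be used
    pg_dump -Fc -f billing.dump billing

    # restore into a scratch database and spot-check it
    createdb billing_restore_test
    pg_restore -d billing_restore_test billing.dump
    psql -d billing_restore_test -c "SELECT count(*) FROM some_large_table;"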
On 08/12/2011 12:28 PM, Waldo Nell wrote:
> I guess that means the OS cache is better for this particular use case
> than the PostgreSQL cache?

There you go. It's not magic; the database cache has some properties that work very well for some workloads. And for others, you might as well let the OS deal with it. The fact that you have the option of adjusting the proportions here is a controversial design point, but it does let you tune to your workload in this sort of case.

> The fsync = off was because the production system runs on an uber-expensive SAN system with multipathing over Fibre Channel, it is
> on UPS and backup generators in a secure datacenter, and we have daily backups we can fall back to.

The safer alternative is to turn synchronous_commit off and increase wal_writer_delay. That may not be quite as fast as turning fsync off, but the risk level is a lot lower too. The first time someone accidentally unplugs a part of your server, you'll realize that the UPS and generators don't really provide very much protection against the things that actually happen in a data center. Having backups is great, but needing to restore from them is no fun.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
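A sketch of that alternative in postgresql.conf (the wal_writer_delay value is illustrative; since 8.3, synchronous_commit can also be turned off per session rather than globally):

    fsync = on                   # keep crash safety
    synchronous_commit = off     # a crash can lose the last few commits, but cannot corrupt the DB
    wal_writer_delay = 1000ms    # default is 200ms; larger values batch more WAL writes

    -- or scoped to just the bill-run session:
    SET synchronous_commit = off;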
On 2011-08-12, at 10:10 , Kevin Grittner wrote:
> Turning fsync off in production may be OK as long as those daily
> backups aren't in the same building as the uber-expensive SAN, and
> it's really OK to fall back on a daily backup if the database server
> crashes or locks up. By the way, I never trust a backup until I
> have successfully restored from it; you should probably do that at
> least on some periodic basis if you can't do it every time.

Yes, we have daily tape backups that are taken off site. And since we refresh QA from prod at least 4 times a month, we know the backups are good on a frequent basis. Very valid points.

> The other thing I would point out is that if you are tuning with
> different table sizes, RAM sizes, or I/O performance characteristics
> from production, the database tuning in one environment may not have
> much to do with what works best in the other environment.

My DB is an exact duplicate I took from production, however my testing is on different hardware, and since I cannot afford a SAN for testing purposes, I am testing on a 7200 RPM SATA drive, so yeah, I guess that is true... I will need to performance test on the production environment.

> As for why the recommended settings are having paradoxical effects
> in this environment -- this advice is generally based on systems
> with fsync on and a RAID controller with battery-backed cache
> configured for write-back. I don't know how well the advice
> generalizes to a single spindle with fsync off.

Thanks, I will be sure to carry on testing on production.
Waldo Nell <pwnell@telkomsa.net> writes:
> I have PostgreSQL 8.4.8 on Ubuntu Linux x64. The server is a Core i7 950
> with 6GB of RAM. 2GB of RAM is used by Java, some small amount by the
> kernel / services, and the rest is available to PostgreSQL.

[ and the DB is 74GB, and things get slower when raising shared_buffers from 24MB to 1500MB ]

One other point here is that with the DB so much larger than available RAM, you are almost certainly doing lots of I/O (unless your test case has lots of locality of reference). With small shared_buffers, the space for kernel disk cache amounts to 3 or so GB, and that's your primary buffer against duplicate I/Os. When you crank shared_buffers up to half that, you now have two buffer pools of about the same size independently trying to cache the most-used parts of the DB. This is likely to not work too well and result in much more I/O. You save some shared-buffers-to-kernel-buffers transfers with more shared_buffers, but if the amount of disk I/O goes up a lot in consequence, you'll come out way behind.

regards, tom lane
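One way to see what is actually being kept in shared_buffers is the pg_buffercache contrib module (in 8.4 it is installed by running its SQL script from contrib, not via CREATE EXTENSION); a sketch of the usual per-relation summary:

    -- which relations in the current database occupy the most buffer pages
    SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = c.relfilenode
    WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                                WHERE datname = current_database()))
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;

If the same hot relations also dominate the OS page cache, that is the double-buffering Tom describes.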
* Waldo Nell (pwnell@telkomsa.net) wrote:
> The fsync = off was because the production system runs on an uber-expensive SAN system with multipathing over Fibre Channel, it is on UPS and backup generators in a secure datacenter, and we have daily backups we can fall back to.

So, two points:

#1 - the uber-expensive SAN should make twiddling fsync have much less of an effect on performance than in a non-SAN/non-BBWC environment, so you might validate that you really need it off.

#2 - the SAN, FC, UPS, etc. will be of no help if the OS or PG crashes. Seems pretty harsh to resort back to a daily backup in the event the OS reboots due to some wacky NMI, or the ASR going haywire...

Thanks,
Stephen