Discussion: Backup causing poor performance - suggestions
PostgreSQL: 8.2.4
We currently back up all of our database tables per schema using pg_dump every half hour. We have been noticing that the database performance has been very poor during the backup process. How can I improve the performance?
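A half-hourly per-schema dump job like the one described might look roughly like the sketch below. The database name, schema list, and backup path are all invented for illustration, and `echo` keeps it a dry run; drop the `echo` (and point the path somewhere real) to actually take the dumps.

```shell
#!/bin/sh
# Hypothetical sketch of a half-hourly per-schema backup job
# (e.g. run from cron as: */30 * * * * /usr/local/bin/dump_schemas.sh).
# DB name, schema names, and paths are made up for illustration.
DB=proddb
STAMP=$(date +%Y%m%d_%H%M)
BACKUP_DIR=/tmp/backups/$STAMP
mkdir -p "$BACKUP_DIR"

CMDS=""
for schema in public accounting webapp; do
    # pg_dump -n restricts the dump to a single schema;
    # -Fc produces the compressed custom-format archive.
    CMD="pg_dump -Fc -n $schema -f $BACKUP_DIR/$schema.dump $DB"
    CMDS="$CMDS$CMD
"
    echo "$CMD"   # dry run: print instead of executing
done
```

Each `pg_dump` run holds a transaction snapshot open for the whole dump, which is part of why running this every 30 minutes hurts concurrent writes.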
Server Specs:
Dedicated DB server
Database takes up 8.0 Gig of disk space
2 Xeon 5160 dual-core CPUs at 3.0 GHz
16 GB of memory
Two disks in RAID 1 are used for the OS, database and backups; SAS 10,000 RPM drives.
OS: Linux AS 4.x 64 bit
shared_buffers = 1 GB
work_mem = 20MB
max_fsm_pages = 524288
random_page_cost=1.0
effective_cache_size=16GB
max_connections=150
All other settings are the default settings.
I have tried doing backups to a second set of disks but the performance only improved somewhat.
Does anyone have advice on how to improve my performance during backup? Would adding two quad core processors improve performance?
Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
On Mon, May 5, 2008 at 8:59 AM, Campbell, Lance <lance@uiuc.edu> wrote:
> PostgreSQL: 8.2.4

You should update to 8.2.7 as a matter of periodic maintenance. It's a very short and easy update.

> Two disks in raid 1 are used for the OS, database and backups. SAS 10,000 RPM drives.

So, what kind of RAID controller are you using? And can you add more drives and/or battery-backed cache to it?
Scott,
The server is a Dell PowerEdge 2900 II with the standard Perc 6/I SAS controller with 256 MB cache.

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Monday, May 05, 2008 10:06 AM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Backup causing poor performance - suggestions
On Mon, May 5, 2008 at 9:10 AM, Campbell, Lance <lance@uiuc.edu> wrote:
> The server is a Dell PowerEdge 2900 II with the standard Perc 6/I SAS controller with 256 MB cache.

It's probably not gonna win any awards, but it's not too terrible. What does vmstat 1 say during your backups / normal operation? The last four or five columns have the most useful data for troubleshooting.
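The columns Scott is pointing at are the I/O and CPU fields at the right edge of `vmstat` output (`bi`/`bo` for blocks in/out, `id` for idle, `wa` for I/O wait). A quick sketch of pulling out the `wa` column to spot backup-induced stalls; the sample lines below are invented, and on a live server you would pipe in real data with `vmstat 1 | awk ...` instead:

```shell
#!/bin/sh
# Extract the "wa" (I/O wait %) column from vmstat output.
# The sample below is fabricated for illustration; a sustained high
# "wa" during backups means the box is I/O bound, not CPU bound.
SAMPLE='procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 201344  93828 14721344  0    0     3    18  105  120  4  1 94  1
 0  5      0 198200  93840 14720112  0    0     0  8832  450  610  3  2 40 55'

# On 8.2-era Linux, "wa" is the last field; skip the two header lines.
WA=$(printf '%s\n' "$SAMPLE" | awk 'NR>2 {print $NF}')
echo "$WA"
```

In the fabricated second sample line, heavy blocks-out (`bo 8832`) with 55% I/O wait and 5 processes blocked (`b 5`) is the classic signature of writes hanging behind a dump.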
Campbell, Lance wrote:
> We currently backup all of our database tables per schema using pg_dump every half hour. We have been noticing that the database performance has been very poor during the backup process. How can I improve the performance?

It sounds like the goal is to have frequent, near-real-time backups of your databases for recovery purposes. Maybe instead of looking at pg_dump's performance, a better solution would be a replication system such as Slony, or a "warm backup" using Skype Tools.

Backing up the database every half hour puts a large load on the system during the dump, and means you are re-dumping the same data, 48 times per day. If you use a replication solution, the backup process is continuous (spread out through the day), and you're not re-dumping static data; the only data that moves around is the new data.

I've used Slony with mixed success; depending on the complexity and size of your database, it can be quite effective. I've heard very good reports about Skype Tools, which has both a Slony-like replicator (not as configurable as Slony, but easier to set up and use), plus an entirely separate set of scripts that simplifies "warm standby" using WAL logging.

Craig
On Mon, May 5, 2008 at 10:11 AM, Campbell, Lance <lance@uiuc.edu> wrote:
> The last 6 entries are when the system is not backing up. The system was running fine. But the other entries are when it was backing up. Reads seem to be fine but any operations that need to write data just hang.

Could you repost that as an attachment? The wrapping applied by your email client makes it very hard to read.

Just perusing it, it doesn't look like you're CPU bound, but I/O bound. As Craig mentioned, you may do better with some form of replication solution here than pg_dumps.

Given that your db can fit in memory (when you say it's 8G do you mean ON DISK, or in a backup? Big diff) then the only thing the backups should be slowing down are update queries. Select queries shouldn't even notice. However, there's a LOT of wait state, and only blocks out, not really many in, so I'm guessing that you've got a fair bit of writing going on at the same time as your backups.
On Mon, 2008-05-05 at 09:10 -0700, Craig James wrote:
> It sounds like the goal is to have frequent, near-real-time backups of your databases for recovery purposes. Maybe instead of looking at pg_dump's performance, a better solution would be a replication system such as Slony, or a "warm backup" using Skype Tools.

I think we should mention Warm Standby via pg_standby, which is part of core software and documentation. Seems strange not to mention it at all.

--
Simon Riggs
2ndQuadrant
http://www.2ndQuadrant.com
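For reference, a warm-standby setup of the kind Simon mentions is mostly a two-line affair: WAL archiving on the primary, and a restore command on the standby. This is only a sketch; the archive path is hypothetical, and `pg_standby` itself ships with 8.3 contrib (on 8.2 people typically use an equivalent wait-and-copy script).

```
# On the primary (postgresql.conf) -- archive path is invented:
archive_command = 'cp %p /archive/wal/%f'

# On the standby (recovery.conf), using contrib/pg_standby from 8.3:
restore_command = 'pg_standby /archive/wal %f %p'
```

The base backup is taken once (filesystem-level, between pg_start_backup()/pg_stop_backup()), and from then on only 16 MB WAL segments move, instead of a full dump every half hour.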
On Mon, May 5, 2008 at 11:19 AM, Campbell, Lance <lance@uiuc.edu> wrote:
> When I do a df -h I see that the database takes up a total of 8Gig of disk space. This is not the size of the backup file of the database.

Ok. Just wanted to make sure. Looking at vmstat, with little or no blocks read in, it would appear your database fits in memory.

> I have attached the vmstats as a text file. Are there any properties I could adjust that would help with writing data during a backup? Are there any properties that might help with improving pg_dump performance?

With as much writing as you have going on, it might help to crank up your setting for checkpoint segments to something like 100 or more. With as much disk space as you've got you can afford it. Other than that, you might wanna look at a faster / better RAID controller in the future. One with lots of battery backed cache set to write back.

> Ideally I want to keep the backups as simple a process as possible before we have to go to the next level of backup. My next strategy is to put some of the tables in tablespaces on a couple different disks. I will also backup to a dedicated set of disks. If there are a few performance tweaks I can make I will. I was hoping to wait a few years before we have to go to a more involved backup process.

It may well be cheaper to go the route of a faster RAID controller with battery backed cache, since it's a one time investment, not an ongoing maintenance issue like running PITR or Slony can become. Plus it helps overall performance quite a bit.
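The checkpoint_segments change Scott suggests is a one-line postgresql.conf edit followed by a restart. Each segment is 16 MB, so a setting of 100 can let on the order of a few GB of WAL accumulate between checkpoints, spreading the write bursts out; the second line is shown only for context:

```
# postgresql.conf -- fewer, larger checkpoints smooth out heavy writes
checkpoint_segments = 100     # default is 3; each segment is 16 MB of WAL
checkpoint_timeout = 5min     # the 8.2 default, shown for context
```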
On Mon, 5 May 2008, Campbell, Lance wrote:
> We currently backup all of our database tables per schema using pg_dump every half hour. We have been noticing that the database performance has been very poor during the backup process. How can I improve the performance?

Uh, don't do that? pg_dump works OK doing periodic backups during relatively calm periods; it's not really appropriate to run all the time like that. If you need a backup to stay that current, you might instead consider a replication solution that does a periodic full snapshot and then just moves incrementals around from there. WAL shipping is the most obvious candidate, as it doesn't necessarily require an additional server and the main components are integrated into the core now. You could just save the files necessary to recover the database just about anywhere. Most other replication solutions would require having another server just to run that, which is probably not what you want.

> I have tried doing backups to a second set of disks but the performance only improved somewhat.

Then the real problem you're having is probably contention against the database you're dumping from rather than having enough write capacity on the output side. This isn't a surprise with pg_dump, as it's not exactly gentle on the server you're dumping from. To improve things here, you'd need to add the second set of disks as storage for some of the main database.

> Would adding two quad core processors improve performance?

Doubt it. pg_dump is basically taking up a processor and some percentage of disk resources when you're running it. If your server has all of the other 3 processors pegged at the same time, maybe adding more processors would help, but that seems pretty unlikely.

A message from Scott alluded to you showing some vmstat output, but I didn't see that show up on the list. That would give more insight here.
Also, if you still have checkpoint_segments at its default (3; you didn't mention adjusting it), that could be contributing to this general problem; that should be much higher on your hardware.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
On Mon, 5 May 2008, Scott Marlowe wrote:
> Other than that, you might wanna look at a faster / better RAID controller in the future. One with lots of battery backed cache set to write back.

Hopefully Lance's PERC 6/I SAS already has its battery installed. The 6/I with 256MB of cache is decent enough that I'm not sure getting a better controller would be a better investment than, say, adding more disks and splitting the database I/O usefully among them. Upgrading to PG 8.3 comes to mind as another option I'd consider before getting desperate enough to add/swap controller cards, which is always a scary thing on production servers.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
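Splitting the I/O across added disks, as Greg suggests (and as Lance planned earlier in the thread), is done with tablespaces. A sketch, with the mount point and table name invented for illustration; both statements work on 8.2:

```sql
-- Assumes a second disk array mounted at /mnt/disk2 (hypothetical path)
CREATE TABLESPACE fast2 LOCATION '/mnt/disk2/pgdata';

-- Move a write-heavy table (name invented) off the main array
ALTER TABLE busy_table SET TABLESPACE fast2;
```

Note that `ALTER TABLE ... SET TABLESPACE` rewrites the whole table and takes an exclusive lock, so it is best done during a quiet window.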
On Mon, May 5, 2008 at 2:03 PM, Greg Smith <gsmith@gregsmith.com> wrote:
> Uh, don't do that? pg_dump works OK doing periodic backups during relatively calm periods, it's not really appropriate to run all the time like that.

Wow, my reading comprehension must be dropping. I totally missed the every half hour bit and read it as every day. If you're backing up a live database every half hour, then yes, it's a very good idea to switch to a hot or cold standby method with PITR.