Discussion: Hard link backup strategy
We're considering a change to our backup strategy to reduce WAN bandwidth and archival storage space. I'm posting a description here both because it may be of value to someone else, and because someone might be able to poke a hole in it before we go ahead with implementation. All servers are Linux (SuSE Enterprise 10).

Besides the PITR backups, we have some additional safety nets, which I'll describe up front. These are not changing. We have 72 counties, each with a database server holding the official record for that county. Our application framework passes transaction images captured by our middle tier on each county database server to a process which replicates to four central databases, each of which holds a copy of all counties. This process detects idle time on the servers and uses it to run a sync process to confirm that the central copies are accurate replicas of the counties, reporting any differences and correcting the central databases to match the counties. It also saves the transaction images to two transaction repositories, where they are kept for a little over a year.

Our current PostgreSQL backup strategy is:

(1) Our archive script copies WAL files to a directory on the database server, using cp to one directory followed by mv to another (to prevent partial files from being processed).

(2) We have a crontab job on each county database server to push WAL files to a backup server in the county using rsync.

(3) We have a crontab job on our central backup server to pull the WAL files from the county database servers using rsync.

(4) We do a PITR base backup once per week on each county database server, using cpio piped to gzip and split, saving the backup on the database server.

(5) We have a crontab job on each county database server to push base backup files to a backup server in the county using rsync.

(6) We have a crontab job on our central backup server to pull the base backup files from the county database servers using rsync.
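[Editor's note: a minimal sketch of the cp-then-mv pattern in step (1), with throwaway stand-in paths rather than the real server layout; in production, archive_command in postgresql.conf would invoke a script like the `archive_wal` function below with %p and %f.]

```shell
# Sketch of step (1): copy the WAL segment to a staging directory,
# then mv it to the directory the rsync cron jobs watch.
set -e
WORK=$(mktemp -d)
STAGE_DIR=$WORK/wal_stage    # same mount point as FINAL_DIR
FINAL_DIR=$WORK/wal          # directory scanned by other software
mkdir -p "$STAGE_DIR" "$FINAL_DIR"

archive_wal() {  # $1 = full path to WAL segment, $2 = segment file name
    cp "$1" "$STAGE_DIR/$2"
    # mv within one filesystem is an atomic rename(2), so nothing
    # scanning FINAL_DIR ever sees a half-copied segment
    mv "$STAGE_DIR/$2" "$FINAL_DIR/$2"
}

# demo with a fake segment
printf 'fake wal data' > "$WORK/000000010000000000000001"
archive_wal "$WORK/000000010000000000000001" 000000010000000000000001
```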
(7) We have one warm standby per county on our central backup server which applies the WAL files as they arrive. These are monitored to ensure that they are running, in recovery mode, and reasonably current -- to detect problems with transfer of the WAL files.

(8) We have a crontab job which notices the arrival of a new base backup and uses it as the new basis for the related warm standby, starting that automatically in place of the prior one.

(9) We have crontab scripts which detect when we have received a new base backup from a county which has successfully come up in warm standby mode and is the first base backup for that county for the month. They capture that base backup, and the WAL files required to start it (based on the .backup file), to a mirrored SAN archive volume, for one year retention.

This runs well with little human intervention, but we're having problems with both the WAN network bandwidth and the space needed for the monthly archives. Our solution is to abandon compression, instead using hard links and rsync. The steps involving WAL files would not change, but most things involving PITR base backups would do a recursive hard link copy of the previous backup, followed by an rsync from the new backup image. We should get our WAN traffic for base backups down to a fraction of its current volume, and the hard links look to save considerably more space than the gzip compression.

For (4) we will use 'cp -rl' to copy the previous backup to a new directory, then rsync (with --delete --include='/pg_xlog/archive_status/' --exclude='/pg_xlog/*') from the production database to the hard link copy. For (5) we'll do a hard link copy on the backup server before the rsync of the directory tree for the base backup. For (6) we will do a recursive hard link copy from the previous backup, rsync from the warm standby, then rsync from the county backup.
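[Editor's note: a minimal demonstration of the 'cp -rl' seeding in step (4), using throwaway directories and a stand-in segment file instead of the real PGDATA and backup paths.]

```shell
# Sketch of step (4): seed the new backup directory as hard links to
# last week's copy, then let rsync rewrite only what changed.
set -e
ROOT=$(mktemp -d)
mkdir -p "$ROOT/backup.prev"
printf '1GB table segment stand-in' > "$ROOT/backup.prev/16384.1"

# recursive copy, but each file is a hard link to last week's copy:
# new directory entries only, no new data blocks
cp -rl "$ROOT/backup.prev" "$ROOT/backup.new"

stat -c '%h' "$ROOT/backup.new/16384.1"   # link count is now 2

# In production this would be followed by (flags as described above):
#   rsync -a --delete --include='/pg_xlog/archive_status/' \
#         --exclude='/pg_xlog/*' "$PGDATA/" "$ROOT/backup.new/"
# By default rsync writes a changed file to a temporary name and
# renames it into place, so the rewritten file gets a fresh inode
# and last week's backup keeps the old contents.
```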
We expect to be bringing back a fairly minimal delta change set based on the differences between the warm standby and the recently completed backup of the county database, especially since these will normally be running off-hours on a weekend.

For (8) we will do a full copy (no links) from the new backup image to get the starting point for the new warm standby. For (9) we will do a recursive hard link copy of the previous month's base backup and rsync the new month's backup onto it. We realize that if one of the linked files is damaged, it can affect some or all archival backups for that county. That has been deemed an acceptable risk.

Comments?

-Kevin
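[Editor's note: the accepted risk in step (9) is easy to demonstrate with throwaway directories: in-place damage to one hard-linked file shows up in every archive that shares the inode.]

```shell
# Demonstration of the shared-inode risk noted for step (9).
set -e
ARCHIVE=$(mktemp -d)
mkdir "$ARCHIVE/2009-02"
printf 'good segment data' > "$ARCHIVE/2009-02/16384.1"
cp -rl "$ARCHIVE/2009-02" "$ARCHIVE/2009-03"  # March links to February's files

# damage the file in place through the March name...
printf 'corrupted' > "$ARCHIVE/2009-03/16384.1"

# ...and February's archive now holds the damaged data too, because
# both directory entries point at the same inode
cat "$ARCHIVE/2009-02/16384.1"
```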
On Thursday 26 March 2009 Kevin Grittner wrote:
> (1) Our archive script copies WAL files to a directory on the
> database server, using cp to one directory followed by mv to
> another (to prevent partial files from being processed).

What "partial files" do you speak about? I'd like to know, as we will soon start doing WAL copies.

> rsync

Incredible. You mean you make the hard link both on client and server, and afterwards transfer with rsync? That only works for postgresql <8.3, right? As I understand, 8.3 will reuse the space from updated tuples, and therefore lots of changes in between will be made. Or do you use "cluster" to sort the data before copying? But then if one older record is deleted, all further data will have to be transferred anyway. I'm just wondering where the savings come from.

mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
Michael Monnerie <michael.monnerie@is.it-management.at> wrote:
> On Thursday 26 March 2009 Kevin Grittner wrote:
>> (1) Our archive script copies WAL files to a directory on the
>> database server, using cp to one directory followed by mv to
>> another (to prevent partial files from being processed).
>
> What "partial files" do you speak about? I'd like to know, as we
> will soon start doing WAL copies.

I don't want anything to attempt to process a WAL file while it is in the process of being copied. By copying to a separate directory on the same mount point and moving it, once complete, to the location where other software is looking for WAL files, we avoid that problem.

> You mean you make the hard link both on client and server, and
> afterwards transfer with rsync? That only works for postgresql
> <8.3, right? As I understand, 8.3 will reuse the space from updated
> tuples, and therefore lots of changes in between will be made.

The PostgreSQL version has nothing to do with it. PostgreSQL segments a table into 1GB files. Our largest tables are either "insert only" or rarely have updates or deletes, so many of these table segment files are unchanged from one base backup to the next. What we're trying to accomplish is to use Linux hard links to create directory entries from multiple base backups which point to the same file body. Do a web search for "linux hard links" if this doesn't make sense to you.

The benefit of rsync is that, when you use a daemon as we do, it won't send unchanged portions of a file over the wire. In preliminary tests, our largest county would only send about 10% of the full set of data if we used rsync on a snapshot of the original data directory, without using cpio or gzip first. I'm hoping that if we use the warm standby image as the base, rather than the previous week's backup, it will be a fraction of that. Basically, the hard links will be used to conserve disk space; rsync will be used to conserve network bandwidth.

-Kevin
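[Editor's note: a quick stand-alone illustration of the hard-link point, with throwaway files; `ln` creates one extra directory entry the same way `cp -rl` does in bulk.]

```shell
# Two directory entries, one file body: the basis of the space savings.
set -e
D=$(mktemp -d)
mkdir "$D/week1" "$D/week2"
printf 'unchanged table segment' > "$D/week1/segment"
ln "$D/week1/segment" "$D/week2/segment"   # hard link, not a copy

# both names report the same inode and a link count of 2,
# so the data blocks exist on disk only once
stat -c '%i %h' "$D/week1/segment"
stat -c '%i %h' "$D/week2/segment"
```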
On Monday 30 March 2009 Kevin Grittner wrote:
> I don't want anything to attempt to process a WAL file while it is in
> the process of being copied. By copying to a separate directory on
> the same mount point and moving it, once complete, to the location
> where other software is looking for WAL files, we avoid that problem.

Hm. So process A copies the WAL to dir1, then moves dir1/WAL to somewhere else. But if process B processes the WAL, it still does so after the cp. If B only reads, there is no problem: A can copy without trouble, even without the cp before the mv. And if B writes to the WAL, you have other problems. And if A deletes the WAL, B can still process it as long as it keeps the file open. So at least for Linux/Unix systems, I don't understand the benefit of the cp before the mv.

> PostgreSQL version has nothing to do with it. PostgreSQL segments a
> table into 1GB files. Our largest tables are either "insert
> only" or rarely have updates or deletes

OK, that's the reason. Our DBs have lots of updates/deletes everywhere, which is why I didn't think of it.

> Basically, the hard links will be used to conserve disk space; rsync
> will be used to conserve network bandwidth.

Yes, that can make sense if the 1GB parts don't change a lot. Good idea for your scenario :-)

mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
Michael Monnerie <michael.monnerie@is.it-management.at> wrote:
> It is no problem if B reads, A can copy without a problem, even
> without the cp before mv.

B could open the file while it is being copied, and read to the current end of file before A has finished copying. If we used rsync it would, by default, copy to a temporary file name in the target directory and mv to the right name when complete, but we prefer the cp to a temp directory and an explicit mv, because of the subsequent rsync steps copying out of the target directory.

-Kevin
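[Editor's note: Kevin's point can be sketched with throwaway files: a reader polling the watched directory can catch a partially copied file, while a same-filesystem rename only ever exposes a complete one.]

```shell
# Why copying directly into the watched directory is racy and mv is not.
set -e
D=$(mktemp -d)
mkdir "$D/stage" "$D/watched"

# racy pattern: write directly into the watched directory -- at this
# point a reader could open the file and hit end-of-file halfway through
printf 'first half of segment...' > "$D/watched/seg_racy"
# (imagine the second half still being written here)

# safe pattern: finish the copy in a staging directory on the same
# mount point, then rename -- readers see the whole file or nothing
printf 'first half of segment...second half of segment' > "$D/stage/seg"
mv "$D/stage/seg" "$D/watched/seg"
```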