Thread: Hard link backup strategy

Hard link backup strategy

From: "Kevin Grittner"
Date:
We're considering a change to our backup strategy to reduce WAN
bandwidth and archival storage space.  I'm posting a description here
both because it may be of value to someone else, and because someone
might be able to poke a hole in it before we go ahead with
implementation.  All servers are Linux (SuSE Enterprise 10).

Besides the PITR backups, we have some additional safety nets, which
I'll describe up front.  These are not changing.

We have 72 counties which each have a database server holding the
official record for that county.  Our application framework passes
transaction images captured by our middle tier on each county database
server to a process which replicates to four central databases, each of
which holds a copy of all counties.  This process detects idle time on
the servers and uses it to run a sync process to confirm that the
central copies contain accurate replicas of the counties, reporting any
differences and correcting the central databases to match the counties.
It also saves the transaction images to two transaction repositories
where they are kept for a little over a year.

Our current PostgreSQL backup strategy is as follows (a rough sketch of
the main commands appears after the list):

(1)  Our archive script copies WAL files to a directory on the database
server, using cp to one directory followed by mv to another (to
prevent partial files from being processed).

(2)  We have a crontab job on each county database server to push WAL
files to a backup server in the county using rsync.

(3)  We have a crontab job on our central backup server to pull the WAL
files from the county database servers using rsync.

(4)  We do a PITR base backup once per week on each county database
server, using cpio piped to gzip and split, saving the backup on the
database server.

(5)  We have a crontab job on each county database server to push base
backup files to a backup server in the county using rsync.

(6)  We have a crontab job on our central backup server to pull the
base backup files from the county database servers using rsync.

(7)  We have one warm standby per county on our central backup server
which applies the WAL files as they arrive.  These are monitored to
ensure that they are running, in recovery mode, and are reasonably
current -- to detect problems with transfer of the WAL files.

(8)  We have a crontab job which notices the arrival of a new base
backup, and uses it as the new basis for the related warm standby,
starting it automatically in place of the prior one.

(9)  We have a crontab script which detects when we have received a new
base backup from a county which has successfully come up in warm
standby mode and is the first base backup for that county for the
month.  It captures that base backup, and the WAL files required to
start it (based on the .backup file), to a mirrored SAN archive volume,
for one-year retention.
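
To make steps (1) through (4) concrete, the commands amount to roughly
the following (host names, rsync module names, paths, and cron
schedules here are illustrative, not our actual ones):

  # (1) archive script (called with the WAL path and file name):
  #     cp into a staging directory, then mv into the directory the
  #     rsync jobs watch; both are on the same mount point, so the mv
  #     is an atomic rename
  cp "$1" /pgarchive/staging/"$2" &&
  mv /pgarchive/staging/"$2" /pgarchive/wal/"$2"

  # (2)/(3) crontab entries: push from the county database server,
  #         pull from the central backup server
  15 * * * *  rsync -a /pgarchive/wal/ countybackup::walarchive/
  45 * * * *  rsync -a dbserver.county1::walarchive/ /backup/county1/wal/

  # (4) weekly base backup: cpio piped to gzip and split
  psql -c "SELECT pg_start_backup('weekly');"
  (cd "$PGDATA" && find . -depth -print | cpio -o | gzip) \
      | split -b 1024m - /pgbackup/base/base.cpio.gz.
  psql -c "SELECT pg_stop_backup();"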

This runs well with little human intervention, but we're having
problems with both the WAN network bandwidth and the space needed for
the monthly archives.  Our solution is to abandon compression, instead
using hard links and rsync.  The steps involving WAL files would not
change, but most things involving PITR base backups would do a recursive
hard link copy of the previous backup, followed by an rsync from the
new backup image.  We should get our WAN traffic for base backups down
to a fraction of its current volume, and the hard links look to save
considerably more space than the gzip compression.

For (4) we will use 'cp -rl' to copy the previous backup to a new
directory, then rsync (with --delete
--include='/pg_xlog/archive_status/' --exclude='/pg_xlog/*') from the
production database to the hard link copy.
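
Concretely, that would be something along these lines (paths
illustrative, not our real ones):

  cp -rl /pgbackup/base/previous /pgbackup/base/new   # hard-link copy of last week's image
  rsync -a --delete \
        --include='/pg_xlog/archive_status/' --exclude='/pg_xlog/*' \
        "$PGDATA"/ /pgbackup/base/new/

Since rsync by default writes a changed file under a temporary name and
renames it into place, a changed segment gets a new inode and the
previous backup's copy is left untouched; only unchanged files remain
shared between the two backups.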

For (5) we'll do a hard link copy on the backup server before the rsync
of the directory tree for the base backup.

For (6) we will do a hard link recursive copy from the previous backup,
rsync from the warm standby, then rsync from the county backup.  We
expect to be bringing back a pretty minimal delta change set based on
the differences between the warm standby and the recently completed
backup of the county database, especially since these will normally be
running off-hours on a weekend.
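
On the central backup server that works out to roughly this (host and
path names invented for illustration):

  cp -rl /central/county1/base.prev /central/county1/base.new
  # bulk of the data comes from the local warm standby...
  rsync -a /standby/county1/data/ /central/county1/base.new/
  # ...then only the remaining delta crosses the WAN
  rsync -a countybackup.county1::basebackup/ /central/county1/base.new/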

For (8) we will do a full copy (no links) from the new backup image to
get the starting point for the new warm standby.

For (9) we will do a recursive hard link copy of the previous month's
base backup and rsync the new month's backup onto it.  We realize that
if one of the linked files is damaged, it can affect some or all
archival backups for that county.  That has been deemed an acceptable
risk.
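
That is, something like (paths illustrative):

  # hard-link copy of last month's archived base backup...
  cp -rl /san/archive/county1/2009-02 /san/archive/county1/2009-03
  # ...then overlay this month's backup; unchanged files stay shared
  rsync -a --delete /central/county1/base.new/ /san/archive/county1/2009-03/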

Comments?

-Kevin



Re: Hard link backup strategy

From: Michael Monnerie
Date:
On Thursday 26 March 2009 Kevin Grittner wrote:
> (1)  Our archive script copies WAL files to a directory on the
> database server, using cp to one directory followed by mv to
> another (to prevent partial files from being processed).

What "partial files" do you speak about? I'd like to know, as we soon
start doing WAL copies.

> rsync

Incredible. You mean you make the hard link both on client and server,
and afterwards transfer with rsync? That only works for postgresql
<8.3, right? As I understand it, 8.3 will reuse the space from updated
tuples, and therefore lots of changes will be made in between.

Or do you use "cluster" to sort the data before copying? But then if one
older record is deleted, all further data will have to be transferred
anyway.

I'm just wondering where the savings come from.

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4



Re: Hard link backup strategy

From: "Kevin Grittner"
Date:
Michael Monnerie <michael.monnerie@is.it-management.at> wrote:
> On Thursday 26 March 2009 Kevin Grittner wrote:
>> (1)  Our archive script copies WAL files to a directory on the
>> database server, using cp to one directory followed by mv to
>> another (to prevent partial files from being processed).
>
> What "partial files" do you speak about? I'd like to know, as we
> soon start doing WAL copies.

I don't want anything to attempt to process a WAL file while it is in
the process of being copied.  By copying to a separate directory on
the same mount point and moving it, once complete, to the location
where other software is looking for WAL files, we avoid that problem.

> You mean you make the hard link both on client and server, and
> afterwards transfer with rsync? That only works for postgresql
> <8.3, right? As I understand, 8.3 will reuse the space from updated
> tuples and therefore lots of changes in-between will be done.

PostgreSQL version has nothing to do with it.  PostgreSQL segments a
table into 1GB files.  Our largest tables are either "insert only"
or rarely have updates or deletes, so many of these table segment
files are unchanged from one base backup to the next.  What we're
trying to accomplish is to use Linux hard links to create directory
entries from multiple base backups which point to the same file body.
Do a web search for "linux hard links" if this doesn't make sense to
you.
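
For example (the relation file name below is made up), after a
hard-link copy both directory entries point at the same inode:

  cp -rl base.previous base.new
  ls -li base.previous/base/16384/16385 base.new/base/16384/16385
  # both lines show the same inode number and a link count of 2;
  # the 1GB segment's blocks are stored on disk only once, no matter
  # how many backups link to it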

The benefit of rsync is that, when you use a daemon as we do, it won't
send unchanged portions of a file over the wire.  In preliminary tests
our largest county would only send about 10% of the full set of
data if we used rsync on a snapshot of the original data directory,
without using cpio or gzip first.  I'm hoping that if we use the warm
standby image as the base, rather than the previous week's backup, it
will be a fraction of that.
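
For what it's worth, the daemon side is just an ordinary rsyncd module;
a minimal sketch (module and path names invented):

  # /etc/rsyncd.conf on the county database server
  [basebackup]
      path = /pgbackup/base
      read only = yes

  # pulled from the central backup server; for any remote transfer
  # (daemon or ssh) rsync sends only the changed portions of each file
  rsync -a dbserver.county1::basebackup/new/ /central/county1/base.new/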

Basically, the hard links will be used to conserve disk space; rsync
will be used to conserve network bandwidth.

-Kevin

Re: Hard link backup strategy

From: Michael Monnerie
Date:
On Monday 30 March 2009 Kevin Grittner wrote:
> I don't want anything to attempt to process a WAL file while it is in
> the process of being copied.  By copying to a separate directory on
> the same mount point and moving it, once complete, to the location
> where other software is looking for WAL files, we avoid that problem.

Hm. So process A copies the WAL file to dir1, then moves dir1/WAL
somewhere else. But if process B processes the WAL file, it still does
so after the cp. It is no problem if B reads, A can copy without a
problem, even without the cp before mv. And if B writes to the WAL
file, you have other problems. And if A deletes the WAL file, B can
still process it as long as it keeps the file open. So at least for
Linux/Unix systems, I don't understand the benefit of cp before mv.

> PostgreSQL version has nothing to do with it.  PostgreSQL segments a
> table into 1GB files.  Our largest tables are either "insert
> only" or rarely have updates or deletes

OK, that's the reason. Our db's have lots of updates/deletes everywhere,
that's why I didn't think about it.

> Basically, the hard links will be used to conserve disk space; rsync
> will be used to conserve network bandwidth.

Yes, that can make sense if the 1GB parts don't change a lot. Good idea
for your scenario :-)

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4



Re: Hard link backup strategy

From: "Kevin Grittner"
Date:
Michael Monnerie <michael.monnerie@is.it-management.at> wrote:
> It is no problem if B reads, A can copy without a problem, even
> without the cp before mv.

B could open the file while it is being copied, and read to the
current end of file before A has finished copying.  If we used rsync
it would, by default, copy to a temporary file name in the target
directory and mv to the right name when complete, but we prefer the cp
to a temp directory and an explicit mv, because of the subsequent rsync
steps copying out of the target directory.

-Kevin