Discussion: Backing up large databases


Backing up large databases

From: Steve Burrows
Date:
I am struggling to find an acceptable way of backing up a PostgreSQL 7.4 database.

The database is quite large, currently it occupies about 180GB, divided into two elements, a set of active tables and a set of archive tables which are only used for insertions.

I ran pg_dump -Fc recently; it took 23.5 hours to run and output a single file of 126GB. Obviously, as the database continues to grow, it will soon be so large that it cannot be pg_dumped within a day. Running rsync to make a complete fresh copy of the pgsql file structure took 4 hours, but later that day another run of rsync (which should have copied only changed files) took 3 hours, and I cannot afford to have the db down that long.

Anybody with any ideas? The database is being used as the backend for a mail server, so it has transactions 24 hours a day but is quieter at night. I want to be able to back it up or replicate it on a daily basis with minimum downtime so that the mail backlog doesn't get too large. Ideally I want the first generation of backup/replica to go onto the same machine as the original, because the volume of data is such that any attempt at network or tape backup of the live files would require too much downtime; once I've got a backup I can copy it out to other NAS or tape at leisure.

If anyone has experience of safeguarding a similarly large PostgreSQL database with minimal downtime I'd be delighted to hear. The machine is running two Xeons, 4GB of RAM and a half-terabyte RAID10 array on a Dell PERC SCSI subsystem, with a load average of around 0.5 - 0.6, so it's not exactly overstretched.

Thanks,

Steve

Re: Backing up large databases

From: "Andy Shellam"
Date:

Hi Steve,

Very interested to hear about your setup, as I have a similar one (backend to a mail server/spam scanner), although under a much lighter load at the moment.

My database is only just touching a GB, so nowhere near the scale of yours!  I use a filesystem-level backup, and am currently testing continuous PITR recovery onto a hot-standby server.

Tarring the database directory currently takes about a minute (at 1GB), so you can estimate it would be roughly three hours for yours.

My future plan for when my database grows larger is to use WAL archiving: take a base backup on a Sunday morning (our quietest time), ship it to the hot standby once a week, and start the standby off in recovery mode (using the rolling-WAL script I'm testing now).  Then, throughout the week, send the WAL logs from the live box down to the standby as they become available, where they get processed on arrival – these files are 16MB each (I believe this can be changed).
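
Roughly, the moving parts look like this; the hostnames, paths and helper script are placeholders for illustration, not my exact setup.

postgresql.conf on the live box, shipping each completed segment to the standby:

archive_command = 'scp %p standby:/wal_archive/%f'

recovery.conf on the standby, handing each requested segment to the server once it has arrived:

restore_command = '/usr/local/bin/wait_for_wal.sh /wal_archive/%f %p'

and wait_for_wal.sh is little more than:

#!/bin/sh
# block until the requested segment shows up in the archive,
# then copy it to the path the server asked for
while [ ! -f "$1" ]; do sleep 5; done
cp "$1" "$2"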


The beauty of all this is that it doesn't require the database to be taken offline or slowed down.

This is coming from an 8.1 server; I believe it'd be okay for 7.4, but don't quote me on it.

Regards

Andy


From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Steve Burrows
Sent: 28 April 2006 4:58 pm
To: pgsql-admin@postgresql.org
Subject: [ADMIN] Backing up large databases

I am struggling to find an acceptable way of backing up a PostgreSQL 7.4 database.

The database is quite large, currently it occupies about 180GB, divided into two elements, a set of active tables and a set of archive tables which are only used for insertions.

I ran pg_dump -Fc recently, it took 23.5 hours to run, and output a single file of 126GB. Obviously as the database continues to grow it will soon be so large that it cannot be pg_dumped within a day. Running rsync to do a complete fresh copy of the pgsql file structure took 4 hours, but later that day running another iteration of rsync (which should have only copied changed files) took 3 hours, and I cannot afford to have the db down that long.

Anybody with any ideas? The database is being used as the backend for a mail server, so it has transactions 24 hours a day but is quieter at night. I want to be able to back it up or replicate it on a daily basis with minimum downtime so that the mail backlog doesn't get too large. Ideally I want the first generation of backup/replica going onto the same machine as the original because the volume of data is such that any attempt at network or tape backup of the live files will require too much downtime, once I've got a backup then I can copy that out to other NAS or tape at leisure.

If anyone has experience of safeguarding a similarly large PostgreSQL database with minimal downtime I'd be delighted to hear..  The machine is running 2 Xeons, 4GB ram and a half-terabyte RAID10 array on a DELL PERC scsi subsystem, with a load average of around 0.5 - 0.6, so it's not exactly overstretched.

Thanks,

Steve

Re: Backing up large databases

From: Robin Iddon
Date:
Hi Steve,

If you can afford to move forwards to 8.x then you can benefit from PITR
(Point In Time Recovery).

See http://www.postgresql.org/docs/8.0/interactive/backup-online.html

You can do this without making a PITR live replica (as I understand it, you're happy to trust your hardware, so you don't need a hot-standby machine ready to take over when you have a hardware failure).

This mechanism allows the base snapshot(s) to be taken while the
database is online.  In fact, apart from recovery itself, there is never
a need to take the database offline.

So you would then need to (a) copy to tape/remote disk/... your periodic
base backup and (b) copy to tape/remote disk/... each archived WAL file.
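
As a rough illustration, the two pieces could look like this (the directory and path names are only examples; the page above has the full detail):

# postgresql.conf: archive every completed WAL segment
archive_command = 'cp %p /backup/wal_archive/%f'

# periodic base backup, taken with the database online
psql -c "SELECT pg_start_backup('weekly');"
# tar may warn that files changed while it read them; that is expected here
tar -czf /backup/base/base-$(date +%Y%m%d).tar.gz /var/lib/pgsql/data
psql -c "SELECT pg_stop_backup();"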

Overall you will be writing much more data this way than running pg_dump
now and then, but provided you can afford the storage you can recover to
any point in time covered by your archived WAL files.

Sooner or later you are going to hit the magic 250GB point, at which your
0.5TB array is no longer big enough to store both the live data and a
base backup. At that point you will need to look at making the base
backup happen across the network (or add more disks). I don't think it
matters how long the base backup takes, provided you are always copying
the WAL files too.

Hope this helps,

Robin

Steve Burrows wrote:

> I am struggling to find an acceptable way of backing up a PostgreSQL
> 7.4 database.
>
> The database is quite large, currently it occupies about 180GB,
> divided into two elements, a set of active tables and a set of archive
> tables which are only used for insertions.
>
> I ran pg_dump -Fc recently, it took 23.5 hours to run, and output a
> single file of 126GB. Obviously as the database continues to grow it
> will soon be so large that it cannot be pg_dumped within a day.
> Running rsync to do a complete fresh copy of the pgsql file structure
> took 4 hours, but later that day running another iteration of rsync
> (which should have only copied changed files) took 3 hours, and I
> cannot afford to have the db down that long.
>
> Anybody with any ideas? The database is being used as the backend for
> a mail server, so it has transactions 24 hours a day but is quieter at
> night. I want to be able to back it up or replicate it on a daily
> basis with minimum downtime so that the mail backlog doesn't get too
> large. Ideally I want the first generation of backup/replica going
> onto the same machine as the original because the volume of data is
> such that any attempt at network or tape backup of the live files will
> require too much downtime, once I've got a backup then I can copy that
> out to other NAS or tape at leisure.
>
> If anyone has experience of safeguarding a similarly large PostgreSQL
> database with minimal downtime I'd be delighted to hear..  The machine
> is running 2 Xeons, 4GB ram and a half-terabyte RAID10 array on a DELL
> PERC scsi subsystem, with a load average of around 0.5 - 0.6, so it's
> not exactly overstretched.
>
> Thanks,
>
> Steve



Re: Backing up large databases

From: Rafael Martinez
Date:
On Fri, 2006-04-28 at 15:57 +0000, Steve Burrows wrote:

>
> Anybody with any ideas? The database is being used as the backend for
> a mail server, so it has transactions 24 hours a day but is quieter at
> night. I want to be able to back it up or replicate it on a daily
> basis with minimum downtime so that the mail backlog doesn't get too
> large. Ideally I want the first generation of backup/replica going
> onto the same machine as the original because the volume of data is
> such that any attempt at network or tape backup of the live files will
> require too much downtime, once I've got a backup then I can copy that
> out to other NAS or tape at leisure.
>

What about LVM2 snapshot functionality?

We use snapshots to get a replica of the data partition for PITR purposes
(I have not tested this outside the PITR system).

To be 100% sure that the snapshot of the database is consistent, you can
do the following (a rough command sketch follows the list):
1) stop the database
2) take the snapshot
3) start the database
4) mount the snapshot block device you created
5) move the data in the snapshot block device (a consistent copy of your
database) to another location outside the server, without any downtime for
your database
6) unmount the snapshot block device
7) remove the snapshot block device (so you can recycle the space used)
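
For example, something like this (the volume group, snapshot size and mount
points are only illustrative; adjust them to your layout):

pg_ctl -D /var/lib/pgsql/data stop
lvcreate --snapshot --size 10G --name pgsnap /dev/vg0/pgdata
pg_ctl -D /var/lib/pgsql/data start
mount /dev/vg0/pgsnap /mnt/pgsnap
rsync -a /mnt/pgsnap/ backuphost:/backups/pgdata/   # or tar to NAS/tape at leisure
umount /mnt/pgsnap
lvremove -f /dev/vg0/pgsnap

The downtime is only the stop/snapshot/start window at the top.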

A snapshot of a 25GB logical volume (with a 15GB database) takes 0.4
seconds on our system (RHEL AS4, amd64 Linux). I don't think it will take
much longer with a 180GB LV; we are talking about a few seconds of downtime.

Just an idea :)

More information about LVM:
http://www.tldp.org/HOWTO/LVM-HOWTO/
http://www.tldp.org/HOWTO/LVM-HOWTO/snapshotintro.html
http://www.tldp.org/HOWTO/LVM-HOWTO/snapshots_backup.html

with regards
--
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/


Re: Backing up large databases

From:
Date:

You could compress the data as you perform the dump:

pg_dump <your options> | bzip2 > filename.bz2

bzip2 offers excellent compression, and you could end up with an archive of less than 10GB.

Later you can reload that into another database on a different server.
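
For example (the database and file names are placeholders):

createdb newdb
bunzip2 -c filename.bz2 | psql newdb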


-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Steve Burrows
Sent: Friday, April 28, 2006 11:58 AM
To: pgsql-admin@postgresql.org
Subject: [ADMIN] Backing up large databases

I am struggling to find an acceptable way of backing up a PostgreSQL 7.4 database.

The database is quite large, currently it occupies about 180GB, divided into two elements, a set of active tables and a set of archive tables which are only used for insertions.

I ran pg_dump -Fc recently, it took 23.5 hours to run, and output a single file of 126GB. Obviously as the database continues to grow it will soon be so large that it cannot be pg_dumped within a day. Running rsync to do a complete fresh copy of the pgsql file structure took 4 hours, but later that day running another iteration of rsync (which should have only copied changed files) took 3 hours, and I cannot afford to have the db down that long.

Anybody with any ideas? The database is being used as the backend for a mail server, so it has transactions 24 hours a day but is quieter at night. I want to be able to back it up or replicate it on a daily basis with minimum downtime so that the mail backlog doesn't get too large. Ideally I want the first generation of backup/replica going onto the same machine as the original because the volume of data is such that any attempt at network or tape backup of the live files will require too much downtime, once I've got a backup then I can copy that out to other NAS or tape at leisure.

If anyone has experience of safeguarding a similarly large PostgreSQL database with minimal downtime I'd be delighted to hear..  The machine is running 2 Xeons, 4GB ram and a half-terabyte RAID10 array on a DELL PERC scsi subsystem, with a load average of around 0.5 - 0.6, so it's not exactly overstretched.

Thanks,

Steve

Re: Backing up large databases

From: "Uwe C. Schroeder"
Date:
Why don't you set up a second server with an identical database and use Slony to
replicate the primary one there? You can then do whatever you want on the
mirror - dump it to tape, etc.
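
The slonik setup is roughly along these lines (the cluster, connection and table
names are only placeholders, and the exact syntax differs a little between
Slony-I releases, so check the docs for your version):

slonik <<_EOF_
cluster name = mailcluster;
node 1 admin conninfo = 'dbname=mail host=primary user=slony';
node 2 admin conninfo = 'dbname=mail host=standby user=slony';

init cluster (id = 1, comment = 'live server');
store node (id = 2, comment = 'backup replica');
store path (server = 1, client = 2, conninfo = 'dbname=mail host=primary user=slony');
store path (server = 2, client = 1, conninfo = 'dbname=mail host=standby user=slony');

create set (id = 1, origin = 1, comment = 'mail tables');
set add table (set id = 1, origin = 1, id = 1, fully qualified name = 'public.messages');

subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
_EOF_

With a slon daemon running for each node, the mirror then trails the primary by a
few seconds and can be dumped to tape without touching the live box.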

On Friday 28 April 2006 08:57, Steve Burrows wrote:
> I am struggling to find an acceptable way of backing up a PostgreSQL 7.4
> database.
>
> The database is quite large, currently it occupies about 180GB, divided
> into two elements, a set of active tables and a set of archive tables
> which are only used for insertions.
>
> I ran pg_dump -Fc recently, it took 23.5 hours to run, and output a
> single file of 126GB. Obviously as the database continues to grow it
> will soon be so large that it cannot be pg_dumped within a day. Running
> rsync to do a complete fresh copy of the pgsql file structure took 4
> hours, but later that day running another iteration of rsync (which
> should have only copied changed files) took 3 hours, and I cannot afford
> to have the db down that long.
>
> Anybody with any ideas? The database is being used as the backend for a
> mail server, so it has transactions 24 hours a day but is quieter at
> night. I want to be able to back it up or replicate it on a daily basis
> with minimum downtime so that the mail backlog doesn't get too large.
> Ideally I want the first generation of backup/replica going onto the
> same machine as the original because the volume of data is such that any
> attempt at network or tape backup of the live files will require too
> much downtime, once I've got a backup then I can copy that out to other
> NAS or tape at leisure.
>
> If anyone has experience of safeguarding a similarly large PostgreSQL
> database with minimal downtime I'd be delighted to hear..  The machine
> is running 2 Xeons, 4GB ram and a half-terabyte RAID10 array on a DELL
> PERC scsi subsystem, with a load average of around 0.5 - 0.6, so it's
> not exactly overstretched.
>
> Thanks,
>
> Steve

--
    UC

--
Open Source Solutions 4U, LLC    1618 Kelly St
Phone:  +1 707 568 3056        Santa Rosa, CA 95401
Cell:   +1 650 302 2405        United States
Fax:    +1 707 568 6416

Re: Backing up large databases

From: Jim Nasby
Date:
On Apr 28, 2006, at 10:34 AM, Andy Shellam wrote:
> My future plan for when my database grows larger, is with the use
> of WAL logging – have a base backup taken on a Sunday morning (our
> quietest time), ship this to the hot-standby once a week, and start
> it off in a recovery mode (using my rolling-WAL script I’m testing
> now.)  Then throughout the week, send the WAL logs from the live
> box as they become available down to the standby, which then get
> processed on arrival – these files are 16MB in size (I believe this
> can be changed).
Theoretically the size of WAL files can be changed, but in practice
no one ever does this. In other words, it's probably not a great idea.
> This is coming from an 8.1 server, I believe it’d be okay for 7.4
> but don’t quote me on it.
PITR was introduced in 8.0. There are a number of other reasons to
upgrade as well. In the meantime, Slony might be a viable alternative.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461



Re: Backing up large databases

From: Simon Riggs
Date:
On Fri, 2006-04-28 at 15:57 +0000, Steve Burrows wrote:
> I am struggling to find an acceptable way of backing up a PostgreSQL
> 7.4 database.
>
> The database is quite large, currently it occupies about 180GB,
> divided into two elements, a set of active tables and a set of archive
> tables which are only used for insertions.
>
> I ran pg_dump -Fc recently, it took 23.5 hours to run, and output a
> single file of 126GB. Obviously as the database continues to grow it
> will soon be so large that it cannot be pg_dumped within a day.
> Running rsync to do a complete fresh copy of the pgsql file structure
> took 4 hours, but later that day running another iteration of rsync
> (which should have only copied changed files) took 3 hours, and I
> cannot afford to have the db down that long.
>
> Anybody with any ideas?

You need not back up the whole database in one go.

You can copy only the changed data out of the tables, using your knowledge of
the mail store. That way you won't need such a long-running backup, and it
won't be so large either. You can then reassemble the pieces into a new table
later in case of recovery.
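
Something along these lines, for instance; the table and column names are
invented since I don't know your schema, and it assumes the archive tables
carry an insertion timestamp:

# dump the active tables in full each night (7.4's pg_dump takes one -t per run)
for t in mail_active mail_queue; do
    pg_dump -Fc -t $t mail > /backup/incr/$t-$(date +%Y%m%d).dump
done

# pull out only the archive rows added since yesterday
psql -d mail -c "SELECT * INTO TABLE archive_delta FROM mail_archive WHERE inserted_at > current_date - 1"
psql -d mail -c "COPY archive_delta TO '/backup/incr/mail_archive-$(date +%Y%m%d).copy'"
psql -d mail -c "DROP TABLE archive_delta"

At recovery time you reload the active-table dumps with pg_restore and COPY the
archive pieces back in.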

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: Backing up large databases

From: Naomi Walker
Date:
Simon Riggs wrote:
> On Fri, 2006-04-28 at 15:57 +0000, Steve Burrows wrote:
>> I am struggling to find an acceptable way of backing up a PostgreSQL
>> 7.4 database.

Depending on your OS and hardware setup, you may have other options,
such as Veritas snapshotting.  It is expensive, but saves time.
>>
>> The database is quite large, currently it occupies about 180GB,
>> divided into two elements, a set of active tables and a set of archive
>> tables which are only used for insertions.
>>
>> I ran pg_dump -Fc recently, it took 23.5 hours to run, and output a
>> single file of 126GB. Obviously as the database continues to grow it
>> will soon be so large that it cannot be pg_dumped within a day.
>> Running rsync to do a complete fresh copy of the pgsql file structure
>> took 4 hours, but later that day running another iteration of rsync
>> (which should have only copied changed files) took 3 hours, and I
>> cannot afford to have the db down that long.
>>
>> Anybody with any ideas?
>
> You need not backup the whole database in one go.
>
> You can copy only changed data out of the tables using your knowledge of
> the mail store. That way you'll not need to have such a long running
> backup and it won't be so large either. You can then reassemble the
> pieces later into a new table in case of recovery.
>


--
----------------------------------------------------------------------------
Naomi Walker                          Chief Information Officer
Eldorado Computing, Inc.              nwalker@eldocomp.com 602-604-3100
----------------------------------------------------------------------------
What lies behind us and what lies before us are tiny matters compared to
what lies within us.     - William Morrow
----------------------------------------------------------------------------


Re: Backing up large databases

From: Steve Burrows
Date:
Just to say thanks to all for your constructive and helpful replies - I will build a new server with 8.1 and Slony.

Steve


Steve Burrows wrote:
I am struggling to find an acceptable way of backing up a PostgreSQL 7.4 database.

The database is quite large, currently it occupies about 180GB, divided into two elements, a set of active tables and a set of archive tables which are only used for insertions.

I ran pg_dump -Fc recently, it took 23.5 hours to run, and output a single file of 126GB. Obviously as the database continues to grow it will soon be so large that it cannot be pg_dumped within a day. Running rsync to do a complete fresh copy of the pgsql file structure took 4 hours, but later that day running another iteration of rsync (which should have only copied changed files) took 3 hours, and I cannot afford to have the db down that long.

Anybody with any ideas? The database is being used as the backend for a mail server, so it has transactions 24 hours a day but is quieter at night. I want to be able to back it up or replicate it on a daily basis with minimum downtime so that the mail backlog doesn't get too large. Ideally I want the first generation of backup/replica going onto the same machine as the original because the volume of data is such that any attempt at network or tape backup of the live files will require too much downtime, once I've got a backup then I can copy that out to other NAS or tape at leisure.

If anyone has experience of safeguarding a similarly large PostgreSQL database with minimal downtime I'd be delighted to hear..  The machine is running 2 Xeons, 4GB ram and a half-terabyte RAID10 array on a DELL PERC scsi subsystem, with a load average of around 0.5 - 0.6, so it's not exactly overstretched.

Thanks,

Steve