Thread: Backup is too slow

Backup is too slow

From:
"John Jensen"
Date:
Hi all,
I'm a bit unhappy with the time it takes to do backup of my PG7.4.6
base.
I have 13GB under the pg/data dir and it takes 30 minutes to do the
backup.

Using top and iostat I've figured out that the backup job is cpu bound
in the postmaster process. It eats up 95% cpu while the disk is at 10%
load. In fact I'm able to compress the backup file (using gzip) faster
(35 % cpu load) than the backend can deliver it.
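
One way to confirm where the time goes is to time the dump on its own, with the output discarded, and compare that with the full pipeline. A minimal sketch, assuming a Linux box where `date +%s` is available; `elapsed_seconds` is a hypothetical helper name and `mydb` a placeholder database:

```sh
#!/bin/sh
# elapsed_seconds <command> [args...]
# Runs the command with stdout discarded and prints the wall-clock
# time it took, in whole seconds.
elapsed_seconds() {
  start=$(date +%s)
  "$@" >/dev/null
  end=$(date +%s)
  echo $((end - start))
}

# Example (placeholders; not run here):
#   elapsed_seconds /home/postgres/postgresql/bin/pg_dump mydb
```

If the dump alone takes about as long as the full pipeline, gzip and split are probably not the bottleneck.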

The operating requirement is 24/7, so I can't just take the base
offline and do a file copy. I can do a backup that way in 5-6 minutes,
BTW.

Would it speed up the process if I did a binary backup instead ?
Are there any other fun tricks to speed up things ?
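
On the "binary backup" question: pg_dump has a custom archive format (`-Fc`) that is compressed internally and restored with pg_restore instead of psql. The following is a sketch under assumptions: the `dump_custom` name, the `PG_DUMP` and `LEVEL` override variables, and the file names are hypothetical. pg_dump still receives the table data from the server in text form, so this may well leave the postmaster's CPU load unchanged; treat it as something to measure, not a known fix.

```sh
#!/bin/sh
# dump_custom <dbname> <outfile>
# Writes a custom-format archive; -Z sets the compression level (0-9).
# PG_DUMP and LEVEL are optional overrides (hypothetical names).
dump_custom() {
  "${PG_DUMP:-pg_dump}" -Fc -Z "${LEVEL:-1}" -f "$2" "$1"
}

# Example (placeholders; not run here):
#   dump_custom mydb mydb.dump
#   pg_restore -d mydb mydb.dump
```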

I run on a four way Linux box and it's not in production yet so there
is no cpu shortage.


The backup script is:

#! /bin/sh
# Dump database $1, gzip it, and split it into 500 MB pieces: $2.aa, $2.ab, ...
if test $# -lt 2; then
  echo "Usage: dbbackup <basename> <filename>"
else
  /home/postgres/postgresql/bin/pg_dump -h <hostname> "$1" | gzip -f - |
    split --bytes 500m - "$2."
fi


And the restore script:

#! /bin/sh
# Reassemble the pieces $2.*, decompress them, and feed the SQL to psql.
if test $# -lt 2; then
  echo "Usage: dbrestore <basename> <filename>"
else
  cat "$2".* | gzip -d -f - |
    /home/postgres/postgresql/bin/psql -h <hostname> -f - "$1"
fi
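
Before relying on a set of split pieces, it may be worth checking that they reassemble into a valid gzip stream. A minimal sketch; `verify_pieces` is a hypothetical helper, and it assumes the pieces share the prefix that was passed to the backup script:

```sh
#!/bin/sh
# verify_pieces <prefix>
# Concatenates <prefix>.aa, <prefix>.ab, ... in glob order and asks
# gzip to test the stream without writing any output.
# The exit status is that of gzip -t (0 means the stream is intact).
verify_pieces() {
  cat "$1".* | gzip -t
}

# Example (placeholder prefix; not run here):
#   verify_pieces /backups/mydb && echo "pieces look intact"
```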


Cheers,

John

Re: Backup is too slow

From:
"Spiegelberg, Greg"
Date:
CPU may be throttled because it's performing the backup, gzip and split
all at once.  May I suggest this:

  /home/postgres/postgresql/bin/pg_dump -h <hostname> --compress=9 \
    -f dumpfile.gz $1
  split --bytes 500m dumpfile.gz dumpfile.gz.

If that takes too long or clobbers the system...

  /home/postgres/postgresql/bin/pg_dump -h <hostname> -f dumpfile $1
  gzip -9 dumpfile
  split --bytes 500m dumpfile.gz dumpfile.gz.

Another variation may be the same as above except scp/rcp/ftp the
uncompressed dump to another idle server that performs the compress
and split for you.
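
The offloading variation can also be done without an intermediate file by streaming the dump over ssh, so that the compress and split run on the other machine. A sketch under assumptions: `offload_dump`, the `PG_DUMP` and `SSH` override variables, the host name and the remote path are all hypothetical, and the remote prefix must not contain single quotes.

```sh
#!/bin/sh
# offload_dump <dbname> <remote_host> <remote_prefix>
# Streams an uncompressed dump to the remote host, which compresses it
# and splits it into 500 MB pieces named <remote_prefix>.aa, .ab, ...
offload_dump() {
  "${PG_DUMP:-pg_dump}" "$1" |
    "${SSH:-ssh}" "$2" "gzip -c | split --bytes 500m - '$3.'"
}

# Example (placeholders; not run here):
#   offload_dump mydb backuphost /backups/mydb
```

Note that ssh's encryption has its own CPU cost on the sending side, so this is worth timing against the local pipeline.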

One last way is to take a filesystem snapshot if your filesystem
permits it.  Since postgres stops/starts so nicely, we offline ours
when it's idle and just long enough to execute the filesystem snapshot
then bring it back online immediately after.  I suppose you could, in
theory, wait till idle and request a lock on all necessary tables,
perform a checkpoint, filesystem snapshot, then release the locks.
I'm sure Tom, Josh or someone more in the know would have input for
this option.

Greg



-----Original Message-----
From: John Jensen [mailto:JRJ@ft.fo]
Sent: Tuesday, December 07, 2004 6:48 AM
To: pgsql-admin@postgresql.org
Subject: [ADMIN] Backup is too slow

Re: Backup is too slow

From:
"John Jensen"
Date:
Hi Greg & others.
I run this on a 4-CPU SMP box (Dell PE6650+EMC AX100) so I already
offload pg_dump, gzip and split to other CPUs. Top confirms this:
postmaster = 95% CPU, i.e. it uses one CPU completely. Unless I can get
postmaster to do less work (that's what I'm looking for) or run multiple
threads (not likely), that's about the best I can get.

The job is clearly cpu bound in the postmaster process.

I'm a bit reluctant to go into the snapshot option you outline. It
looks a bit tricky, but if no other options are on hand then I'll have to
bite the bullet.

/John
>>> "Spiegelberg, Greg" <gspiegelberg@cranel.com> 07-12-2004 14:33:38 >>>
CPU may be throttled because it's performing the backup, gzip and split
all at once.

< stuff deleted>

One last way is to take a filesystem snapshot if your filesystem
permits it.  Since postgres stops/starts so nicely, we offline ours
when it's idle and just long enough to execute the filesystem snapshot
then bring it back online immediately after.  I suppose you could, in
theory, wait till idle and request a lock on all necessary tables,
perform a checkpoint, filesystem snapshot, then release the locks.
I'm sure Tom, Josh or someone more in the know would have input for
this option.

Greg

Re: Backup is too slow

From:
William Yu
Date:
John Jensen wrote:
> Hi Greg & others.
> I run this on a 4-CPU SMP box (Dell PE6650+EMC AX100) so I already
> offload pg_dump, gzip and split to other CPUs. Top confirms this:
> postmaster = 95% CPU, i.e. it uses one CPU completely. Unless I can get
> postmaster to do less work (that's what I'm looking for) or run multiple
> threads (not likely), that's about the best I can get.
>
> The job is clearly cpu bound in the postmaster process.

Hmmm, when I upgraded my Opteron box to 64-bit Linux, my dump->gzip ran
twice as fast, which told me the gzip was a big part of the CPU usage.
Dunno what else you can do to make it run faster. My backups -- even on
64-bit -- still take 20 minutes on a 30GB DB.

>
> I'm a bit reluctant to go into the snapshot option you outline. It
> looks a bit tricky, but if no other options are on hand then I'll have to
> bite the bullet.

Snapshot is much easier if you use LVM. No need to do any postgres
trickery. Just freeze the volume at the kernel level.
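
For illustration, an LVM snapshot backup might look roughly like the following. This is a sketch, not a tested procedure: the volume group `vg0`, logical volume `pgdata`, snapshot name `pgsnap`, the 1G snapshot size and the paths are all assumptions, and the copy is only usable if the whole data directory, including the WAL, lives on that one volume. The script only prints the commands unless `RUN=1` is set.

```sh
#!/bin/sh
# run <command> [args...]
# Dry-run by default: prints the command. With RUN=1 it executes it
# (the real commands need root).
run() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# snapshot_backup <vg> <lv> <mountpoint> <tarfile>
# Snapshot the volume, mount the snapshot read-only, archive it,
# then unmount and drop the snapshot. Stops at the first failure.
snapshot_backup() {
  run lvcreate --snapshot --size 1G --name pgsnap "/dev/$1/$2" &&
  run mount -o ro "/dev/$1/pgsnap" "$3" &&
  run tar -czf "$4" -C "$3" . &&
  run umount "$3" &&
  run lvremove -f "/dev/$1/pgsnap"
}

# Example (placeholders; prints the five commands):
#   snapshot_backup vg0 pgdata /mnt/pgsnap /backup/pgdata.tar.gz
```

A copy taken this way looks to the server like the state after a crash, so on restore postgres runs its normal recovery at startup. Mount options for the snapshot can depend on the filesystem.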

Re: Backup is too slow

From:
Tom Lane
Date:
"John Jensen" <JRJ@ft.fo> writes:
> The job is clearly cpu bound in the postmaster process.

Which part of the dump process is CPU bound --- dumping schema, or data?
(Try enabling log_statement for the pg_dump run and correlating the
appearance of queries in the postmaster log with the CPU usage.)
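
To line CPU usage up against the logged queries, one option is to record the backend's cumulative CPU time at intervals with a timestamp, then compare the jumps against the server log. A minimal sketch: `sample_cpu` is a hypothetical helper, the backend PID has to be looked up separately (for example with ps or top), and it assumes the server log lines carry timestamps.

```sh
#!/bin/sh
# sample_cpu <pid> <count> [interval_seconds]
# Prints <count> lines of "timestamp cumulative-cpu-time" for the
# process, pausing interval_seconds (default 1) between samples.
sample_cpu() {
  i=0
  while [ "$i" -lt "$2" ]; do
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(ps -o time= -p "$1")"
    i=$((i + 1))
    sleep "${3:-1}"
  done
}

# Example (placeholder PID; not run here):
#   sample_cpu 12345 600 > cpu-samples.txt
```

Stretches where the cumulative time climbs by about one second per sample are the CPU-bound phases; the queries logged in the same window show whether that is schema or data dumping.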

If it's schema-bound, maybe you need to vacuum/analyze your system
catalogs a bit more aggressively.
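
If the schema phase turns out to be the slow part, vacuuming the catalogs could be scripted along these lines. A sketch only: `vacuum_catalogs` and the `PSQL` override variable are hypothetical, the list of catalogs is just an illustrative selection, and it needs to run as a database superuser.

```sh
#!/bin/sh
# vacuum_catalogs <dbname>
# Runs VACUUM ANALYZE on a handful of system catalogs, one per psql call.
# The catalog list is an illustrative choice, not a recommendation.
vacuum_catalogs() {
  for t in pg_class pg_attribute pg_type pg_index pg_depend; do
    "${PSQL:-psql}" -d "$1" -c "VACUUM ANALYZE pg_catalog.$t;"
  done
}

# Example (placeholder database; not run here):
#   vacuum_catalogs mydb
```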

If it's data-bound, I'm not sure what you can do other than switch to
datatypes that are cheaper to convert to text form.  It would be
interesting to find out where the problem is, though, in case there's
something we can fix for future releases.

            regards, tom lane