Discussion: Dump size bigger than pgdata size?


Dump size bigger than pgdata size?

From:
"Nicola Mauri"
Date:

[sorry if this was previously asked: list searches seem to be down]

I'm using pg_dump to take a full backup of my database using a compressed format:
     $  pg_dump  -Fc  my_db > /backup/my_db.dmp

It produces a 6 GB file whereas the pgdata uses only 5 GB of disk space:
     $ ls -l /backup
     -rw-r--r--     6592715242   my_db.dmp
     $ du -b /data
     5372269196   /data

How could it be?
As far as I know, a dump should be smaller than the on-disk data files, since it does not store indexes, etc.

The database contains about one hundred thousand binary images, some of which may already be compressed. So I tried the --compress=0 option, but that produces a dump that does not fit on my disk (more than 11 GB).
I'm using PostgreSQL 8.1.2 on RHEL4.

So, what can I do to diagnose the problem?
Thanks in advance,

Nicola





Re: Dump size bigger than pgdata size?

From:
"Aaron Bono"
Date:
I would dare guess, and it seems you suspect as well, that the binary data is why you are not getting very good compression.

You may try dumping the tables individually with
--table=table
to see which tables are taking the most space in your dump.  Once you find which tables take the most space, you can check what is in them and provide more details on the problem.
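
As an alternative to dumping each table, a query along these lines should list the largest tables directly (a rough sketch; pg_total_relation_size and pg_size_pretty are available from PostgreSQL 8.1 on):

     $ psql my_db -c "SELECT relname,
                             pg_size_pretty(pg_total_relation_size(oid)) AS total_size
                      FROM pg_class
                      WHERE relkind = 'r'
                      ORDER BY pg_total_relation_size(oid) DESC
                      LIMIT 10;"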

Personally I don't use the built-in compression in pg_dump but pipe the output to gzip instead (not sure whether it makes any difference).  See http://manual.intl.indoglobal.com/ch06s07.html for details.
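
For example (the table name images is just a placeholder):

     $ pg_dump --table=images my_db | gzip > /backup/images.sql.gz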

-Aaron

On 6/21/06, Nicola Mauri <Nicola.Mauri@saga.it> wrote:

[sorry if this was previously asked: list searches seem to be down]

I'm using pg_dump to take a full backup of my database using a compressed format:
     $  pg_dump  -Fc  my_db > /backup/my_db.dmp

It produces a 6 GB file whereas the pgdata uses only 5 GB of disk space:
     $ ls -l /backup
     -rw-r--r--     6592715242   my_db.dmp
     $ du -b /data
     5372269196   /data

How could it be?
As far as I know, a dump should be smaller than the on-disk data files, since it does not store indexes, etc.

The database contains about one hundred thousand binary images, some of which may already be compressed. So I tried the --compress=0 option, but that produces a dump that does not fit on my disk (more than 11 GB).
I'm using PostgreSQL 8.1.2 on RHEL4.

So, what can I do to diagnose the problem?
Thanks in advance,

Nicola



==================================================================
Aaron Bono
President                            Aranya Software Technologies, Inc.
http://www.aranya.com         We take care of your technology needs.
Phone: (816) 695-6071
==================================================================

Re: Dump size bigger than pgdata size?

From:
Date:

It might happen because of the type of data you have (binary images). Compression of binary files is notoriously poor, since repeated byte patterns are rare.

In other words, the output can actually grow, since compression adds extra bytes for checksums and redundancy.

That said, this would normally only happen with small binary files.
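
A quick way to see this for yourself is to gzip any already-compressed file, such as a JPEG (file name is hypothetical):

     $ gzip -c photo.jpg > photo.jpg.gz
     $ ls -l photo.jpg photo.jpg.gz     # the .gz file is usually about the same size, sometimes slightly larger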

 

-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Nicola Mauri
Sent:
Wednesday, June 21, 2006 10:30 AM
To: pgsql-admin@postgresql.org
Subject: [ADMIN] Dump size bigger than pgdata size?

 


[sorry if this was previously asked: list searches seem to be down]

I'm using pg_dump to take a full backup of my database using a compressed format:
     $  pg_dump  -Fc  my_db > /backup/my_db.dmp

It produces a 6 GB file whereas the pgdata uses only 5 GB of disk space:
     $ ls -l /backup
     -rw-r--r--     6592715242   my_db.dmp
     $ du -b /data
     5372269196   /data

How could it be?
As far as I know, a dump should be smaller than the on-disk data files, since it does not store indexes, etc.

The database contains about one hundred thousand binary images, some of which may already be compressed. So I tried the --compress=0 option, but that produces a dump that does not fit on my disk (more than 11 GB).
I'm using PostgreSQL 8.1.2 on RHEL4.

So, what can I do to diagnose the problem?
Thanks in advance,

Nicola




Re: Dump size bigger than pgdata size?

From:
Tom Lane
Date:
"Nicola Mauri" <Nicola.Mauri@saga.it> writes:
> I'm using pg_dump to take a full backup of my database using a compressed
> format:
>      $  pg_dump  -Fc  my_db > /backup/my_db.dmp

> It produces a 6 GB file whereas the pgdata uses only 5 GB of disk space:
> ...
> Database contains about one-hundred-thousands binary images, some of which
> may be already compressed.

Are those stored as bytea fields, or large objects, or what?

I can easily imagine bytea inflating to be larger in a dump than it is
on disk, since the text format for non-ASCII byte values looks like "\\nnn"
i.e. 5 bytes for only one byte on disk.  Assuming that ASCII bytes make up
about half of the data, the average expansion factor would be about 2.5x.
Compression of the dump text would buy back some of this bloat but
probably not all.
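
If the images live in a bytea column, one way to gauge the expansion on a few rows is something like this (table and column names are hypothetical; encode(..., 'escape') shows the escaped text form, and COPY doubles each backslash on top of that):

     $ psql my_db -c "SELECT octet_length(img)             AS raw_bytes,
                             length(encode(img, 'escape')) AS escaped_chars
                      FROM images LIMIT 5;"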

This could be avoided by using COPY BINARY format, but I don't see any
very nice way to do that in the context of pg_dump --- it needs to
intermix COPY data with SQL commands ...
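
For reference, the server-side form being referred to looks roughly like this (table name hypothetical; the file is written on the server and requires superuser):

     my_db=# COPY images TO '/tmp/images.bin' WITH BINARY;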

            regards, tom lane

Re: Dump size bigger than pgdata size?

From:
Jim Nasby
Date:
On Jun 21, 2006, at 12:00 PM, Tom Lane wrote:
> This could be avoided by using COPY BINARY format, but I don't see any
> very nice way to do that in the context of pg_dump --- it needs to
> intermix COPY data with SQL commands ...

Would the tar or custom format allow for this? IIRC, at least tar
puts all the copied data into separate files...
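
For what it's worth, a tar-format dump can be inspected directly (a sketch; each table's data is stored as its own member alongside the table of contents):

     $ pg_dump -Ft my_db > /backup/my_db.tar
     $ tar tf /backup/my_db.tar      # lists toc.dat plus one .dat member per table
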
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461



Re: Dump size bigger than pgdata size?

From:
Tom Lane
Date:
Jim Nasby <jnasby@pervasive.com> writes:
> On Jun 21, 2006, at 12:00 PM, Tom Lane wrote:
>> This could be avoided by using COPY BINARY format, but I don't see any
>> very nice way to do that in the context of pg_dump --- it needs to
>> intermix COPY data with SQL commands ...

> Would the tar or custom format allow for this? IIRC, at least tar
> puts all the copied data into separate files...

Well, you could sorta do that, but the case that would stop working is
pg_restore output to a plain text SQL script (and related issues such as
the ability to use the feature in the context of pg_dumpall).  Having
just gotten done fixing similar inconsistencies in pg_dump/pg_restore
for BLOBs, I'm not eager to re-introduce 'em for COPY BINARY ...
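
That is, the case of turning an archive back into a script, e.g.:

     $ pg_restore /backup/my_db.dmp > my_db.sql    # no -d given, so pg_restore emits a plain SQL script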

            regards, tom lane

Re: Dump size bigger than pgdata size?

From:
Jim Nasby
Date:
On Jun 22, 2006, at 9:39 PM, Tom Lane wrote:
> Jim Nasby <jnasby@pervasive.com> writes:
>> On Jun 21, 2006, at 12:00 PM, Tom Lane wrote:
>>> This could be avoided by using COPY BINARY format, but I don't
>>> see any
>>> very nice way to do that in the context of pg_dump --- it needs to
>>> intermix COPY data with SQL commands ...
>
>> Would the tar or custom format allow for this? IIRC, at least tar
>> puts all the copied data into separate files...
>
> Well, you could sorta do that, but the case that would stop working is
> pg_restore output to a plain text SQL script (and related issues
> such as
> the ability to use the feature in the context of pg_dumpall).  Having
> just gotten done fixing similar inconsistencies in pg_dump/pg_restore
> for BLOBs, I'm not eager to re-introduce 'em for COPY BINARY ...

Yeah, but how many people actually do that anyway? I can't really
come up with a use-case for it, and I'm pretty sure there's other
gains to be had by turning custom or tar format into more of a
'binary dump'. For one thing, that should ease the need to run the
newer version of pg_dump when upgrading (if we put the requisite
brains into pg_restore).

I suppose we could put support in pg_restore to convert between
BINARY and escaped as needed; or just disallow pg_restore from
dumping SQL if there's binary data (maybe have it include copy
statements that reference the specific files).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461