Thread: Space for pg_dump


Space for pg_dump

From:
SHARMILA JOTHIRAJAH
Date:
Hi,
How much space does a pg_dump usually take?
One of my databases is 600GB.... How much space do I need to dump this?
Thanks




Re: Space for pg_dump

From:
Richard Huxton
Date:
SHARMILA JOTHIRAJAH wrote:
> Hi,
> How much space does a pg_dump usually take?
> One of my databases is 600GB.... How much  space do I need to dump this?

That will depend on how much of that 600GB is indexes etc.  Also on how
compressible your data is if you are using -Fc.  Certainly less than the
"live" database, but by how much is difficult to say.

--
  Richard Huxton
  Archonet Ltd

Re: Space for pg_dump

From:
Scott Marlowe
Date:
On Tue, Mar 31, 2009 at 7:57 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:
>
> Hi,
> How much space does a pg_dump usually take?
> One of my databases is 600GB.... How much  space do I need to dump this?

Note you can find out by doing:

pg_dump dbname | wc
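If you plan to store compressed custom-format dumps, the same trick sizes those too. Neither pipeline writes anything to disk; "dbname" here is just a placeholder for your own database name:

```shell
# "dbname" is a placeholder; substitute your own database.
# Neither pipeline writes the dump to disk.
pg_dump dbname     | wc -c   # plain-text dump size, in bytes
pg_dump -Fc dbname | wc -c   # custom-format (compressed) dump size
```

The -Fc number is usually the more useful one, since that is the format most people actually store.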

Re: Space for pg_dump

From:
SHARMILA JOTHIRAJAH
Date:


--- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> From: Scott Marlowe <scott.marlowe@gmail.com>
> Subject: Re: [GENERAL] Space for pg_dump
> To: "SHARMILA JOTHIRAJAH" <sharmi_jo@yahoo.com>
> Cc: "General postgres mailing list" <pgsql-general@postgresql.org>
> Date: Tuesday, March 31, 2009, 11:49 AM
> On Tue, Mar 31, 2009 at 7:57 AM,
> SHARMILA JOTHIRAJAH
> <sharmi_jo@yahoo.com>
> wrote:
> >
> > Hi,
> > How much space does a pg_dump usually take?
> > One of my databases is 600GB.... How much  space do I
> need to dump this?
>
> Note you can find out by doing:
>
> pg_dump dbname | wc
>
Yes... I could find the space used after creating the dump.

But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be dumped).
So I'm trying to find a space estimate...
Do you have a rough estimate for pg_dump in general... like 1/4th of the database size or something like that? I just
need a rough estimate for now.

Thanks
Sharmila




Re: Space for pg_dump

From:
Scott Marlowe
Date:
On Tue, Mar 31, 2009 at 9:57 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:
> But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be
dumped). So I'm trying to find a space estimate...
> Do you have a rough estimate of pg_dump in general... like 1/4th of the database size or something like that? I
just need a rough estimate for now

Sadly, there is no exact maths for such things.  If your database has
tons of indexes and such, it might be 20 or 100 times bigger on disk
than it will be during backup.  If it's all compressible text with few
indexes, it might be roughly 1:1.  You can't really tell without
running pg_dump.  The advantage of doing pg_dump|wc -l is that the dump
doesn't have to be stored anywhere.

Re: Space for pg_dump

From:
Scott Marlowe
Date:
On Tue, Mar 31, 2009 at 9:57 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:
> --- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>
>> From: Scott Marlowe <scott.marlowe@gmail.com>
>> Subject: Re: [GENERAL] Space for pg_dump
>> To: "SHARMILA JOTHIRAJAH" <sharmi_jo@yahoo.com>
>> Cc: "General postgres mailing list" <pgsql-general@postgresql.org>
>> Date: Tuesday, March 31, 2009, 11:49 AM
>> On Tue, Mar 31, 2009 at 7:57 AM,
>> SHARMILA JOTHIRAJAH
>> <sharmi_jo@yahoo.com>
>> wrote:
>> >
>> > Hi,
>> > How much space does a pg_dump usually take?
>> > One of my databases is 600GB.... How much  space do I
>> need to dump this?
>>
>> Note you can find out by doing:
>>
>> pg_dump dbname | wc
>>
> Yes...I could find the space used after creating the dump.
>
> But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be
dumped). So I'm trying to find a space estimate...
> Do you have a rough estimate of pg_dump in general... like 1/4th of the database size or something like that? I
just need a rough estimate for now

It's hard to say.  Why can't you RUN the example command on each db
and see for yourself?  It doesn't store a backup anywhere; it just
streams the dump to wc so you can see how big it is.

Re: Space for pg_dump

From:
SHARMILA JOTHIRAJAH
Date:


--- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> From: Scott Marlowe <scott.marlowe@gmail.com>
> Subject: Re: [GENERAL] Space for pg_dump
> To: "SHARMILA JOTHIRAJAH" <sharmi_jo@yahoo.com>
> Cc: "General postgres mailing list" <pgsql-general@postgresql.org>
> Date: Tuesday, March 31, 2009, 12:07 PM
> On Tue, Mar 31, 2009 at 9:57 AM,
> SHARMILA JOTHIRAJAH
> <sharmi_jo@yahoo.com>
> wrote:
> > But I need to pre-allocate some space for storing
> these dumps (there are other databases too that needs to be
> dumped). So Im trying to find a space estimate ....
> > Do you have a rough estimate of pg_dump in general...
> like 1/4 th of the database size or something like that...I
> just need a rough estimate for now
>
> Sadly, there is no exact maths for such things.  If
> your database has
> tons of indexes and such, it might be 20 or 100 times
> bigger on disk
> than it will be during backup.  If it's all
> compressible text with few
> indexes, it might be a 1:1 or so size.  You can't
> really tell without
> running pg_dump.  The advantage of doing pg_dump|wc -l
> is that the db
> doesn't have to be stored somewhere.
>
Thanks... I started pg_dump|wc -l and it's running now.
Another question: wc -l gives you the number of lines, right?
What is the size of each line... or how do you get the size from that?





Re: Space for pg_dump

From:
Scott Marlowe
Date:
On Tue, Mar 31, 2009 at 10:31 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:
>
>
>
> --- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>
>> From: Scott Marlowe <scott.marlowe@gmail.com>
>> wrote:
>> > But I need to pre-allocate some space for storing
>> these dumps (there are other databases too that needs to be
>> dumped). So Im trying to find a space estimate ....
>> > Do you have a rough estimate of pg_dump in general...
>> like 1/4 th of the database size or something like that...I
>> just need a rough estimate for now
>>
>> Sadly, there is no exact maths for such things.  If
>> your database has
>> tons of indexes and such, it might be 20 or 100 times
>> bigger on disk
>> than it will be during backup.  If it's all
>> compressible text with few
>> indexes, it might be a 1:1 or so size.  You can't
>> really tell without
>> running pg_dump.  The advantage of doing pg_dump|wc -l
>> is that the db
>> doesn't have to be stored somewhere.
>>
> Thanks... I started pg_dump|wc -l and it's running now.
> Another question: wc -l gives you the number of lines, right?
> What is the size of each line... or how do you get the size from that?

Whoops, pretty sure my first reply was just wc.  The -l is a habit
from using it to count lines.  Do it again without the -l... sorry.
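To be precise: with no options, wc prints three counts (lines, words, bytes), so the last column is already the dump size; wc -c prints the byte count alone. A quick illustration on a small sample stream, with the pg_dump version commented out since "dbname" is only a placeholder:

```shell
# wc with no options prints three counts: lines, words, bytes.
printf 'line one\nline two\n' | wc      # prints counts: 2 lines, 4 words, 18 bytes
printf 'line one\nline two\n' | wc -c   # prints just the byte count: 18

# Same idea against a database ("dbname" is a placeholder):
# pg_dump dbname | wc -c
```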

Re: Space for pg_dump

From:
hubert depesz lubaczewski
Date:
On Tue, Mar 31, 2009 at 08:57:28AM -0700, SHARMILA JOTHIRAJAH wrote:
> > Note you can find out by doing:
> > pg_dump dbname | wc
> Yes...I could find the space used after creating the dump.
> But I need to pre-allocate some space for storing these dumps

I'm not sure if you realize that you don't need any space for the
command that Scott showed (pg_dump dbname | wc).  It will not write
anything to disk.

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Re: Space for pg_dump

From:
Greg Smith
Date:
On Tue, 31 Mar 2009, Scott Marlowe wrote:

> Sadly, there is no exact maths for such things.  If your database has
> tons of indexes and such, it might be 20 or 100 times bigger on disk
> than it will be during backup.  If it's all compressible text with few
> indexes, it might be a 1:1 or so size.

Since running an entire pg_dump can take forever on a big database, what I
usually do here is start by running the disk usage query at
http://wiki.postgresql.org/wiki/Disk_Usage

That lets you better see index vs. table usage.  Then, for the bigger
tables, I do something like this:

psql -c "COPY (select * from bigtable limit 100000) to stdout" | gzip > bigtable.gz
gzip -l bigtable.gz

That lets you sample a decent-sized chunk of the table to figure out
what compression ratio you're likely to get on the data in there.  Given
all the table sizes and a compression ratio estimate, from there you can
make a fairly accurate guess of what the whole dump is going to take up,
presuming your data is fairly evenly distributed such that the first
records that come back are typical of the whole thing.
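To turn the gzip -l output into a number, something like the following works. It is shown on generated text so it runs anywhere, but the same awk line applies to the bigtable.gz produced by the psql|gzip step above. One caveat: gzip -l stores the uncompressed size mod 2^32, so keep samples under 4GB:

```shell
# Stand-in for a table sample: 100,000 lines of generated text.
seq 1 100000 > /tmp/bigtable_sample.txt
gzip -f /tmp/bigtable_sample.txt          # produces /tmp/bigtable_sample.txt.gz

# gzip -l prints: compressed  uncompressed  ratio  name (row 2 holds the data).
# Dividing uncompressed by compressed gives the ratio to apply to the
# full table sizes from the wiki query.
gzip -l /tmp/bigtable_sample.txt.gz | awk 'NR==2 { printf "%.1fx\n", $2 / $1 }'
```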

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Space for pg_dump

From:
Rainer Bauer
Date:
Greg Smith wrote:

>On Tue, 31 Mar 2009, Scott Marlowe wrote:
>
>> Sadly, there is no exact maths for such things.  If your database has
>> tons of indexes and such, it might be 20 or 100 times bigger on disk
>> than it will be during backup.  If it's all compressible text with few
>> indexes, it might be a 1:1 or so size.
>
>Since running an entire pgdump can take forever on a big database, what I
>usually do here is start by running the disk usage query at
>http://wiki.postgresql.org/wiki/Disk_Usage

Interesting. However, the query gives an error if the table name contains
upper-case characters, like in my case "tblConnections":

ERROR:  relation "public.tblconnections" does not exist.

Replacing all occurrences of <relname> with <'"' || relname || '"'> fixes the
error.

Rainer

Re: Space for pg_dump

From:
Tom Lane
Date:
Rainer Bauer <usenet@munnin.com> writes:
> Greg Smith wrote:
>> Since running an entire pgdump can take forever on a big database, what I
>> usually do here is start by running the disk usage query at
>> http://wiki.postgresql.org/wiki/Disk_Usage

> Interesting. However, the query gives an error if the table name contains
> upper case characters, like in my case "tblConnections":

> ERROR:  relation "public.tblconnections" does not exist.

> Replacing all occurrences of <relname> with <'"' || relname || '"'> fixes the
> error.

That still fails if the table name contains double quotes.  A proper
solution is to use the table OID --- I've corrected the example.

            regards, tom lane

Re: Space for pg_dump

From:
Jan Otto
Date:
Hi,

> Rainer Bauer <usenet@munnin.com> writes:
>> Greg Smith wrote:
>>> Since running an entire pgdump can take forever on a big database,
>>> what I
>>> usually do here is start by running the disk usage query at
>>> http://wiki.postgresql.org/wiki/Disk_Usage
>
>> Interesting. However, the query gives an error if the table name
>> contains
>> upper case characters, like in my case "tblConnections":
>
>> ERROR:  relation "public.tblconnections" does not exist.
>
>> Replacing all occurrences of <relname> with <'"' || relname || '"'>
>> fixes the
>> error.
>
> That still fails if the table name contains double quotes.  A proper
> solution is to use the table OID --- I've corrected the example.

If you have big toast tables you get wrong results with the query
suggested at http://wiki.postgresql.org/wiki/Disk_Usage because it does
not take the toasted values into account.

Simple example (take a look at the first row -> public.media):

SELECT nspname || '.' || relname AS "relation",
     pg_size_pretty(pg_relation_size(C.oid)) AS "size"
   FROM pg_class C
   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
   WHERE nspname NOT IN ('pg_catalog', 'information_schema')
     AND nspname !~ '^pg_toast'
   ORDER BY pg_relation_size(C.oid) DESC
   LIMIT 20;
          relation          |    size
---------------------------+------------
  public.media              | 727 MB
  public.identifier_idx     | 342 MB
  public.media_pk           | 190 MB
  public.mediateypes_pk     | 16 kB
  public.mediaformats_uk    | 16 kB
  public.contentsizes_pk    | 16 kB
  public.contenttype_pk     | 16 kB
  public.mediaformats_pk    | 16 kB
  public.contenttypes       | 8192 bytes
  public.media_media_id_seq | 8192 bytes
  public.contentsizes       | 8192 bytes
  public.mediaformats       | 8192 bytes
  public.mediatypes         | 8192 bytes
  public.vmedia2            | 0 bytes
  public.vmedia             | 0 bytes
(15 rows)

Now a fixed query which also counts the related pg_toast_<oid> table and
its pg_toast_<oid>_index:

SELECT nspname || '.' || relname AS "relation",
     pg_size_pretty(pg_relation_size(C.oid)
             + COALESCE((SELECT pg_relation_size(C2.oid) FROM pg_class C2
                         WHERE C2.relname = 'pg_toast_' || C.oid), 0::bigint)
             + COALESCE((SELECT pg_relation_size(C3.oid) FROM pg_class C3
                         WHERE C3.relname = 'pg_toast_' || C.oid || '_index'), 0::bigint)
     ) AS "size"
   FROM pg_class C
   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
   WHERE nspname NOT IN ('pg_catalog', 'information_schema')
     AND nspname !~ '^pg_toast'
   ORDER BY pg_relation_size(C.oid)
             + COALESCE((SELECT pg_relation_size(C2.oid) FROM pg_class C2
                         WHERE C2.relname = 'pg_toast_' || C.oid), 0::bigint)
             + COALESCE((SELECT pg_relation_size(C3.oid) FROM pg_class C3
                         WHERE C3.relname = 'pg_toast_' || C.oid || '_index'), 0::bigint)
           DESC
   LIMIT 20;
          relation          |    size
---------------------------+------------
  public.media              | 164 GB
  public.identifier_idx     | 342 MB
  public.media_pk           | 190 MB
  public.contenttype_pk     | 16 kB
  public.contenttypes       | 16 kB
  public.contentsizes       | 16 kB
  public.contentsizes_pk    | 16 kB
  public.mediateypes_pk     | 16 kB
  public.mediaformats       | 16 kB
  public.mediatypes         | 16 kB
  public.mediaformats_pk    | 16 kB
  public.mediaformats_uk    | 16 kB
  public.media_media_id_seq | 8192 bytes
  public.vmedia             | 0 bytes
  public.vmedia2            | 0 bytes
(15 rows)

There is a difference of about 163 GB (which comes from the TOAST table of
public.media):
             relation             |    size
---------------------------------+------------
  pg_toast.pg_toast_6366088       | 162 GB
  pg_toast.pg_toast_6366088_index | 1832 MB
  public.media                    | 727 MB

If you have only small or no toast tables, the query from the wiki will
work for you.

regards, jan


Re: Space for pg_dump

From:
Alvaro Herrera
Date:
Jan Otto wrote:

> If you have big toast tables you get wrong results with the query
> suggested at http://wiki.postgresql.org/wiki/Disk_Usage because it
> does not take the toasted values into account.


> Now a fixed query which gets the sizes of the related pg_toast_oid and
> pg_toast_oid_index too:

Note that there is also the pg_total_relation_size function which will
report the total table size, including toast table and indexes.
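Following the psql -c style Greg used earlier in the thread, a minimal sketch of a per-table report with that function; the database name "dbname" is a placeholder:

```shell
# "dbname" is a placeholder.  pg_total_relation_size(oid) includes the
# heap, all of its indexes, and any TOAST data in a single figure.
psql dbname -c "
  SELECT N.nspname || '.' || C.relname AS relation,
         pg_size_pretty(pg_total_relation_size(C.oid)) AS total_size
    FROM pg_class C
    JOIN pg_namespace N ON N.oid = C.relnamespace
   WHERE C.relkind = 'r'
     AND N.nspname NOT IN ('pg_catalog', 'information_schema')
   ORDER BY pg_total_relation_size(C.oid) DESC
   LIMIT 20;"
```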

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Space for pg_dump

From:
Jan Otto
Date:
On Apr 20, 2009, at 7:35 PM, Alvaro Herrera wrote:

> Jan Otto wrote:
>
>> If you have big toast tables you get wrong results with the query
>> suggested at http://wiki.postgresql.org/wiki/Disk_Usage because it
>> does not take the toasted values into account.
>
>
>> Now a fixed query which gets the sizes of the related pg_toast_oid
>> and
>> pg_toast_oid_index too:
>
> Note that there is also the pg_total_relation_size function which will
> report the total table size, including toast table and indexes.

Ahhh, it was a long day... sometimes I think too complicated ;-)

regards, jan


Re: Space for pg_dump

From:
Greg Smith
Date:
On Mon, 20 Apr 2009, Jan Otto wrote:

> If you have big toast tables you get wrong results with the query suggested
> at http://wiki.postgresql.org/wiki/Disk_Usage because it does not take the
> toasted values into account.

I can't recall why I wrote it to filter out things in the pg_toast
namespace in the first place.  I just took the easy way out here--the
query has been updated so it no longer excludes relations in that
namespace, and I link to the TOAST docs for more details.

I consider combining all the values together, as you did in your example
code and as pg_total_relation_size does, to be a different type of report.
Accordingly, I just updated the page with examples of both types, as well
as something that works against pre-8.1 databases.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD