Discussion: big database with very small dump !?
Hello all,
I'm finding it very strange that my pg takes 9Giga on disk but
pg_dumpall produces a 250Mega dump. 'VACUUM FULL' was executed
yesterday.
Is this normal? Should I be worried?
details below:
------------------------------------------
# pg_dumpall --oids --clean > pg_dumpall.sql
# ls -lh
total 232M
-rw-r--r-- 1 postgres postgres 231M Aug 11 15:46 pg_dumpall.sql
# du -sh /var/pgsql/data/
9.4G /var/pgsql/data
------------------------------------------
thx
joao
On Mon, 11 Aug 2008, Joao Ferreira gmail wrote:

> I'm finding it very strange that my pg takes 9Giga on disk but
> pg_dumpall produces a 250Mega dump. 'VACUUM FULL' was executed
> yesterday.

If you've been running VACUUM FULL, it's probably so-called "index bloat".
Try running the query at http://wiki.postgresql.org/wiki/Disk_Usage to
figure out where all your space has gone inside the database.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Mon, Aug 11, 2008 at 10:30 AM, Joao Ferreira gmail
<joao.miguel.c.ferreira@gmail.com> wrote:
> Hello all,
>
> I'm finding it very strange that my pg takes 9Giga on disk but
> pg_dumpall produces a 250Mega dump. 'VACUUM FULL' was executed
> yesterday.
>
> Is this normal ? Should I be worried ?

It really depends. If you've got a lot of indexes then maybe that's
normal. But most of the time you don't see more than about 4 times the
space used in the db as in the dump. It's likely you've got index
bloat. If you reload a pg_dump of the database in question into
another server, how much space does that take up?

Look into using CLUSTER or REINDEX to fix the space usage.
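[For reference, the two maintenance commands suggested above would look something like the following. The table and index names are taken from later messages in this thread; on the 8.3-era releases discussed here, CLUSTER ... USING is the newer spelling, and both commands take exclusive locks, so run them in a maintenance window.]

```sql
-- Rewrite the table in the physical order of the named index;
-- this also rebuilds all of the table's indexes from scratch:
CLUSTER timeslots USING timeslots_strs_var_ts_key;
-- (on releases before 8.3: CLUSTER timeslots_strs_var_ts_key ON timeslots;)

-- Alternatively, rebuild only the indexes, leaving the heap as-is:
REINDEX TABLE timeslots;

-- Refresh planner statistics after either operation:
ANALYZE timeslots;
```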
I'd do du -sh /var/pgsql/data/base rather than /var/pgsql/data.
Depending on how your pgsql server logging is set up, there are other
folders and/or files under ./data/ that might take considerable disk
space; you may want to exclude those. I find this query useful for
something like this as well:

select datname, pg_size_pretty(pg_database_size(oid)) from pg_database;

V.

Joao Ferreira gmail wrote:
> Hello all,
>
> I'm finding it very strange that my pg takes 9Giga on disk but
> pg_dumpall produces a 250Mega dump. 'VACUUM FULL' was executed
> yesterday.
>
> Is this normal ? Should I be worried ?
>
> details below:
> ------------------------------------------
> # pg_dumpall --oids --clean > pg_dumpall.sql
> # ls -lh
> total 232M
> -rw-r--r-- 1 postgres postgres 231M Aug 11 15:46 pg_dumpall.sql
> # du -sh /var/pgsql/data/
> 9.4G /var/pgsql/data
> ------------------------------------------
>
> thx
> joao
On Mon, 2008-08-11 at 12:43 -0700, Vlad Kosilov wrote:
> ./data/ you may want to exclude those. I find this query useful for
> something like this as well:
>
> select datname,pg_size_pretty(pg_database_size(oid)) from pg_database ;
>
Hello Vlad,
I ran your query and I got the 9 Gigas!
I guess it should be related to index bloat, then. Do you agree?
thx
Joao
postgres=# select datname, pg_size_pretty(pg_database_size(oid)) from pg_database;
    datname    | pg_size_pretty
---------------+----------------
 postgres      | 3617 kB
 egbert        | 9585 MB
 asterisk      | 3993 kB
 turba         | 3673 kB
 edgereporting | 3617 kB
 template1     | 3617 kB
 template0     | 3537 kB
(7 rows)
postgres=#
> V.
>
> Joao Ferreira gmail wrote:
> > Hello all,
> >
> > I'm finding it very strange that my pg takes 9Giga on disk but
> > pg_dumpall produces a 250Mega dump. 'VACUUM FULL' was executed
> > yesterday.
> >
> > Is this normal ? Should I be worried ?
> >
> >
> > details below:
> > ------------------------------------------
> > # pg_dumpall --oids --clean > pg_dumpall.sql
> > # ls -lh
> > total 232M
> > -rw-r--r-- 1 postgres postgres 231M Aug 11 15:46 pg_dumpall.sql
> > # du -sh /var/pgsql/data/
> > 9.4G /var/pgsql/data
> > ------------------------------------------
> >
> >
> > thx
> > joao
Hello Greg, Vlad, Scott and all,
thanks for the feedback.
I forgot to mention that I execute REINDEX on all tables and indexes
every week (right after executing VACUUM FULL).
Is this enough to eliminate the possibility of "index bloat"?
And, yes, my database has some crazy indexes. I use these indexes, and I
keep them REINDEXed to keep query execution time down. See below.
Could these indexes be the real reason for taking up all that space?
thanks
joao
egbert=# \d timeslots;
Table "public.timeslots"
Column | Type | Modifiers
-----------+---------+-----------
str1 | text |
str2 | text |
...
...
str20 | text |
val1 | real |
...
...
val6 | real |
var | text |
count | integer |
total | real |
timeslot | integer | not null
timestamp | integer | not null
tsws | integer | not null
tses | integer | not null
Indexes:
"timeslots_strs_var_ts_key" UNIQUE, btree (str1, str2, str3, str4,
str5, str6, str7, str8, str9, str10, str11, str12, str13, str14, str15,
str16, str17, str18, str19, str20, var, timeslot) CLUSTER
"timeslots_timeslot_index" btree (timeslot)
"timeslots_timestamp_index" btree ("timestamp")
"timeslots_var_index" btree (var)
egbert=#
------------------------------------------------------------
On Mon, 2008-08-11 at 12:45 -0400, Greg Smith wrote:
> On Mon, 11 Aug 2008, Joao Ferreira gmail wrote:
>
> > I'm finding it very strange that my pg takes 9Giga on disk but
> > pg_dumpall produces a 250Mega dump. 'VACUUM FULL' was executed
> > yesterday.
>
> If you've been running VACUUM FULL, it's probably so-called "index bloat".
> Try running the query at http://wiki.postgresql.org/wiki/Disk_Usage to
> figure out where all your space has gone inside the database.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Mon, 2008-08-11 at 12:45 -0400, Greg Smith wrote:
> On Mon, 11 Aug 2008, Joao Ferreira gmail wrote:
>
> > I'm finding it very strange that my pg takes 9Giga on disk but
> > pg_dumpall produces a 250Mega dump. 'VACUUM FULL' was executed
> > yesterday.
>
> If you've been running VACUUM FULL, it's probably so-called "index bloat".
> Try running the query at http://wiki.postgresql.org/wiki/Disk_Usage to
> figure out where all your space has gone inside the database.

egbert=# SELECT nspname || '.' || relname AS "relation",
egbert-#   pg_size_pretty(pg_relation_size(nspname || '.' || relname)) AS "size"
egbert-# FROM pg_class C
egbert-# LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
egbert-# WHERE nspname NOT IN ('pg_catalog', 'information_schema')
egbert-# AND nspname !~ '^pg_toast'
egbert-# AND pg_relation_size(nspname || '.' || relname) > 0
egbert-# ORDER BY pg_relation_size(nspname || '.' || relname) DESC
egbert-# LIMIT 20;

             relation             |  size
----------------------------------+---------
 public.timeslots_strs_var_ts_key | 5643 MB   #this is a UNIQUE index
 public.timeslots                 | 2660 MB   #this is the only table
 public.timeslots_timestamp_index | 583 MB    #this is an index
 public.timeslots_var_index       | 314 MB    #this is an index
 public.timeslots_timeslot_index  | 275 MB    #this is an index
(5 rows)

So it seems that the UNIQUE index is taking up more space than the data
itself...

Still, I have 2660 MB of data but the dump is about 10x smaller !!!

Any hints?

> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
In response to Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com>:
> On Mon, 2008-08-11 at 12:43 -0700, Vlad Kosilov wrote:
> > ./data/ you may want to exclude those. I find this query useful for
> > something like this as well:
> >
> > select datname,pg_size_pretty(pg_database_size(oid)) from pg_database ;
>
> I ran your query and I got the 9 Gigas!
>
> I guess it should be related to index bloat, then. Do you agree ?

No, the index size is included in pg_database_size(). Perhaps do a

du -hd1 /var/pgsql/data

to see which directories have all the space (or du -h --max-depth=1
/var/pgsql/data on Linux).

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
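[Bill's per-directory breakdown works on any tree; in a PostgreSQL data directory the big entries are normally base/ (the databases themselves) and the WAL directory. A self-contained sketch, using a throwaway directory as a stand-in for the real /var/pgsql/data:]

```shell
# Summarize disk usage one level deep, largest entries first.
# GNU coreutils spells the depth option --max-depth=1; BSD du uses -d1.
DATADIR=$(mktemp -d)                     # stand-in for /var/pgsql/data
mkdir -p "$DATADIR/base" "$DATADIR/pg_xlog"
dd if=/dev/zero of="$DATADIR/base/16384" bs=1024 count=512 2>/dev/null
du -k --max-depth=1 "$DATADIR" | sort -rn   # base/ should dominate here
rm -rf "$DATADIR"
```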
On Mon, 2008-08-11 at 10:58 -0600, Scott Marlowe wrote:
> It's likely you've got index bloat. If you reload a pg_dump of the
> database in question into another server how much space does that take
> up?
Right. I just loaded the dump into a clean database and everything came
down about 10 times...
----------------------------------------------
NOW (dump loaded into a fresh Pg):
relation | size
----------------------------------+--------
public.timeslots | 549 MB
public.timeslots_strs_var_ts_key | 482 MB
public.timeslots_var_index | 59 MB
public.timeslots_timeslot_index | 37 MB
public.timeslots_timestamp_index | 37 MB
(5 rows)
------------------------------------------------
BEFORE:
relation | size
----------------------------------+---------
public.timeslots_strs_var_ts_key | 5643 MB
public.timeslots | 2660 MB
public.timeslots_timestamp_index | 583 MB
public.timeslots_var_index | 314 MB
public.timeslots_timeslot_index | 275 MB
I'm confused here...
On the fresh database the whole set only takes 1.3G.
On the original db, even after VACUUM FULL and REINDEX, it takes 9G.
Can I really do anything about it?
If I try CLUSTER, I'm guessing I'll choose the big index and forget
about the smaller ones... is this right?
thanks
joao
> Look into using CLUSTER or REINDEX to fix the space usage.
>
Joao Ferreira gmail wrote:
> [query output trimmed]
>
> so it seems that the UNIQUE index is taking up more space than the data
> itself...
>
> still I have 2660 MB of data but the dump is about 10x smaller !!!
>
> any hints ?

I would try running a CLUSTER on the table. This will usually clean up
things and free disk space both in the table and the indexes. It does
require quite extensive locking though, so it might not be an option if
you can't afford having the database unavailable for a few (10-15)
minutes.

--
Tommy Gildseth
Joao Ferreira gmail wrote:
> On Mon, 2008-08-11 at 10:58 -0600, Scott Marlowe wrote:
>
> If I try CLUSTER, I'm guessing I'll choose the big index and forget
> about the smaller ones... is this right ?

CLUSTER will sort out all the indexes, even though you're just
clustering on one.

--
Tommy Gildseth
DBA, Gruppe for databasedrift
Universitetet i Oslo, USIT
m: +47 45 86 38 50
t: +47 22 85 29 39
Hi guys,

I found the reason for this whole problem.

Explanation: the vacuum/reindex cron scripts were not being executed. I
executed the operations by hand and the values became normal.

Thank you all for the fine discussion.

joao

On Tue, 2008-08-12 at 13:49 +0200, Tommy Gildseth wrote:
> Joao Ferreira gmail wrote:
> > On Mon, 2008-08-11 at 10:58 -0600, Scott Marlowe wrote:
> >
> > If I try cluster, I'm guessing I'll choose the big index and forget
> > about the smaller ones... is this right ?
>
> CLUSTER will sort out all the indexes, even though you're just
> clustering on one.
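[For the record, a weekly maintenance job of the kind described in this thread could be scheduled from the postgres user's crontab along these lines. The schedule and the use of the vacuumdb/reindexdb client programs are illustrative assumptions; only the egbert database name comes from the thread.]

```
# postgres crontab sketch: every Sunday at 03:00, VACUUM FULL then REINDEX
# m h dom mon dow  command
0 3 * * 0  vacuumdb --full --dbname=egbert && reindexdb --dbname=egbert
```

With routine autovacuum (or plain VACUUM) running often enough, the FULL pass becomes less necessary; the bloat here built up precisely because the scheduled job had silently stopped running.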