Обсуждение: When does VACUUM FULL not clean out all deleted data?

Поиск
Список
Период
Сортировка

When does VACUUM FULL not clean out all deleted data?

От
James Cloos
Дата:
Today I did a backup, dump, drop, create and restore on a db which
seemed to be taking up too much disk space after a vacuum full.

The db was using about six gigs (as backed up) and now uses about two
thirds of a gig after the restore (plus many inserts and a few deletes).

Right after the restore the db took up less than ten percent as much
space as the backup.  (For the backup I stopped pg and used rsync on
the /var/lib/postgresql/data directory.)

Why was the db using that extra five plus gigs?

Some more background on the table:

Some time back I did alter most of the tables to replace various int8
columns with int4.  When I did that the disk space used (after a vacuum
full) was reduced almost in half; it didn't appear that anything failed
to be reclaimed....

When I did that int8 to int4 change, the table ended up at about 12 or
13 gigs.  That was for about 250k to 300k entries, where each entry
generates one row in a few tables, two rows (including an auto-toasted
bytea column) in one table and usually around 50 rows in another table.

The other day, while doing some testing, I had stopped the incoming
application and allowed all of the volatile data to expire.  So the
only data in the db was in the non-volatile tables:  about 7k rows
in two tables and about 1k in another, plus less than 6 in a couple
more.  A vacuum full at that point got the table down to five and
a half gigs.  Which is why I thought to try out a dump & restore,
though I didn't actually do so until it had been back in production
for a day or two.

-JimC
--
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6

Re: When does VACUUM FULL not clean out all deleted data?

От
Tom Lane
Дата:
James Cloos <cloos@jhcloos.com> writes:
> Right after the restore the db took up less than ten percent as much
> space as the backup.  (For the backup I stopped pg and used rsync on
> the /var/lib/postgresql/data directory.)

> Why was the db using that extra five plus gigs?

Smells like a hadn't-been-vacuuming-often-enough problem.

Please note that vacuum-full-once-in-a-long-while is *not* a substitute
for frequent-plain-vacuum, because VACUUM FULL isn't good at reducing
index space usage --- in fact it usually makes it worse.  Your numbers
didn't separate table vs index bloat, but I'm suspicious that a lot of
your problem was the latter.

            regards, tom lane

Re: When does VACUUM FULL not clean out all deleted data?

От
James Cloos
Дата:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

>> Why was the db using that extra five plus gigs?

Tom> Smells like a hadn't-been-vacuuming-often-enough problem.

Sorry, I forgot to mention:  autovacuum was enabled and I also ran
this script every night via cron:

#!/bin/bash
FULL=''
test $(date +%w) -eq 0 && FULL='FULL'
exec psql -U dbm -c "VACUUM ${FULL} VERBOSE ANALYZE" dbm

The app at least felt faster with the daily vacuum, even though
autovacuum was running.  I don't have a good explanation for that,
either.

-JimC
--
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6

Re: When does VACUUM FULL not clean out all deleted data?

От
"Joshua D. Drake"
Дата:
James Cloos wrote:
>>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>
>>> Why was the db using that extra five plus gigs?
>
> Tom> Smells like a hadn't-been-vacuuming-often-enough problem.
>
> Sorry, I forgot to mention:  autovacuum was enabled and I also ran
> this script every night via cron:
>
> #!/bin/bash
> FULL=''
> test $(date +%w) -eq 0 && FULL='FULL'
> exec psql -U dbm -c "VACUUM ${FULL} VERBOSE ANALYZE" dbm
>
> The app at least felt faster with the daily vacuum, even though
> autovacuum was running.  I don't have a good explanation for that,
> either.

Autovacuum by default won't do enough work for many production
databases. You still have to configure it.

Joshua D. Drake


Re: When does VACUUM FULL not clean out all deleted data?

От
"Scott Marlowe"
Дата:
On Jan 4, 2008 9:35 AM, James Cloos <cloos@jhcloos.com> wrote:
> >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>
> >> Why was the db using that extra five plus gigs?
>
> Tom> Smells like a hadn't-been-vacuuming-often-enough problem.
>
> Sorry, I forgot to mention:  autovacuum was enabled and I also ran
> this script every night via cron:
>
> #!/bin/bash
> FULL=''
> test $(date +%w) -eq 0 && FULL='FULL'
> exec psql -U dbm -c "VACUUM ${FULL} VERBOSE ANALYZE" dbm

Note that routine vacuum FULL is not a good idea actually.  vacuum
full is something you run when things have gone wrong (i.e. not enough
regular vacuuming) and it is usually best followed by reindexing all
your indexes.

It's far better to keep track of bloat and run vacuum full, if at all,
by hand, and only when needed.  Running it regularly with no
reindexing often results in index bloat which makes the database
slower not faster.

Re: When does VACUUM FULL not clean out all deleted data?

От
James Cloos
Дата:
>>>>> "Joshua" == Joshua D Drake <jd@commandprompt.com> writes:

Joshua> Autovacuum by default won't do enough work for many production
Joshua> databases. You still have to configure it.

OK. Thanks.  Good to know.

-JimC
--
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6


Re: When does VACUUM FULL not clean out all deleted data?

От
James Cloos
Дата:
>>>>> "Scott" == Scott Marlowe <scott.marlowe@gmail.com> writes:

Scott> Note that routine vacuum FULL is not a good idea actually.
Scott> vacuum full is something you run when things have gone wrong
Scott> (i.e. not enough regular vacuuming) and it is usually best
Scott> followed by reindexing all your indexes.

I see.

Scott> It's far better to keep track of bloat and run vacuum full, if at
Scott> all, by hand, and only when needed.  Running it regularly with no
Scott> reindexing often results in index bloat which makes the database
Scott> slower not faster.

That, then, probably explains the extra five gigs.  There are quite a
few indices in the app's schema.

Thanks.

-JimC
--
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6

Re: When does VACUUM FULL not clean out all deleted data?

От
Geoffrey
Дата:
Scott Marlowe wrote:

> Note that routine vacuum FULL is not a good idea actually.  vacuum
> full is something you run when things have gone wrong (i.e. not enough
> regular vacuuming) and it is usually best followed by reindexing all
> your indexes.
>
> It's far better to keep track of bloat and run vacuum full, if at all,
> by hand, and only when needed.  Running it regularly with no
> reindexing often results in index bloat which makes the database
> slower not faster.

I've been trying to research this issue.  I can't really find
documentation that defines when to 'vacuum full.'  That is other then
this posting on the list and the following from the man page:

'The FULL option is not recommended for routine use, but may be useful
in special cases. An example is when you have deleted most of the rows
in a table and would like the table to physically shrink to occupy less
disk space. VACUUM FULL will usually shrink the table more than a plain
VACUUM would.'

So, I'm wondering if anyone can point me to more specific information as
to when, if ever, you should 'vacuum full?'

The more research I do, the more it appears that you should only 'vacuum
full' when you run into space issues.

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
  - Benjamin Franklin

Re: When does VACUUM FULL not clean out all deleted data?

От
"Joshua D. Drake"
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, 07 Jan 2008 10:53:11 -0500
Geoffrey <lists@serioustechnology.com> wrote:

> So, I'm wondering if anyone can point me to more specific information
> as to when, if ever, you should 'vacuum full?'

1. If you over run your fsm pages
  * Except that vacuum full won't reclaim the index usage so use
CLUSTER instead.
2. If you are only repairing bloat on a smaller table.
  * It may be faster to backup and restore the table. Since vacuum full
is an exclusive lock anyway it doesn't matter.

Sincerely,

Joshua D. Drake


- -- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHgl1rATb/zqfZUUQRAod6AKCUkRKy+gqOkb4xpgVXiP8BCIePKQCgo2yS
ZRmnIVwrYHFoJ8juI08KL6c=
=Jb1i
-----END PGP SIGNATURE-----

Re: When does VACUUM FULL not clean out all deleted data?

От
Geoffrey
Дата:
Joshua D. Drake wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Mon, 07 Jan 2008 10:53:11 -0500
> Geoffrey <lists@serioustechnology.com> wrote:
>
>> So, I'm wondering if anyone can point me to more specific information
>> as to when, if ever, you should 'vacuum full?'
>
> 1. If you over run your fsm pages
>   * Except that vacuum full won't reclaim the index usage so use
> CLUSTER instead.
> 2. If you are only repairing bloat on a smaller table.
>   * It may be faster to backup and restore the table. Since vacuum full
> is an exclusive lock anyway it doesn't matter.

Thank you Joshua.

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
  - Benjamin Franklin