Обсуждение: Impact of vacuum full...

Поиск
Список
Период
Сортировка

Impact of vacuum full...

От
Erik Jones
Дата:
Hello, I was wondering if someone could enlighten me as to the impact to
the entire database of running VACUUM FULL against a single table.  The
reason I ask is that at company we work for we have a very large number
of queue type tables that fill up and empty out on a regular basis, on
the average every two days per table would be a good estimate.  These
tables, when filled, can contain anywhere from less than 50 to upwards
of 500,000 entries.  We recently moved out db over to a new box as the
old one wasn't cutting it (we haven't been able to run a successful
VACUUM FULL on the whole db in over a year and after the transfer our db
was smaller by 50GB!!!).  Thus we are looking into a more aggressive
vacuuming strategy for these table in order to keep our database lean
and clean for the purpose of quick and fast dumps and general
performance issues otherwise.

Also, what kind of times am I  looking at to do VACUUM FULL's on single
tables of different sizes?

Thank  you for any advice/input you may have...

--
erik jones <erik@myemma.com>
software development
emma(r)


Re: Impact of vacuum full...

От
Csaba Nagy
Дата:
Erik,

On Fri, 2006-07-21 at 17:13, Erik Jones wrote:
> Hello, I was wondering if someone could enlighten me as to the impact to
> the entire database of running VACUUM FULL against a single table.  The
> reason I ask is that at company we work for we have a very large number
> of queue type tables that fill up and empty out on a regular basis, on
> the average every two days per table would be a good estimate.

We also do have here some queue-like tables, though they tend to be
small and our clean-empty rate is minutes not days. I solved this kind
of problem using the CLUSTER command, which completely rebuilds the
table, and as an added benefit it will be physically ordered using the
index you have chosen to cluster on.

I think clustering is faster than vacuum full, and cleans your indexes
too (which vacuum full won't do). One interesting detail is that CLUSTER
is not respecting MVCC, i.e. it will clean all dead tuples regardless if
there are older transactions running which could see them. This might be
a problem for you, but for my queue-like tables was a big help to stay
clean, as CLUSTER is able to shrink them even in the presence
long-running transactions which normally would prevent cleaning dead
tuples back to the oldest running transaction.

For small tables CLUSTER is a nice feature... but beware that it locks
the table exclusively, so if you have a big table you might have a long
down-time during the clustering operation where the table is not
accessible. That said, I use it for fairly big tables too occasionally
when I need to clean up stuff...

Cheers,
Csaba.



Re: Impact of vacuum full...

От
Erik Jones
Дата:
Csaba Nagy wrote:
> Erik,
>
> On Fri, 2006-07-21 at 17:13, Erik Jones wrote:
>
>> Hello, I was wondering if someone could enlighten me as to the impact to
>> the entire database of running VACUUM FULL against a single table.  The
>> reason I ask is that at company we work for we have a very large number
>> of queue type tables that fill up and empty out on a regular basis, on
>> the average every two days per table would be a good estimate.
>>
>
> We also do have here some queue-like tables, though they tend to be
> small and our clean-empty rate is minutes not days. I solved this kind
> of problem using the CLUSTER command, which completely rebuilds the
> table, and as an added benefit it will be physically ordered using the
> index you have chosen to cluster on.
>
> I think clustering is faster than vacuum full, and cleans your indexes
> too (which vacuum full won't do). One interesting detail is that CLUSTER
> is not respecting MVCC, i.e. it will clean all dead tuples regardless if
> there are older transactions running which could see them. This might be
> a problem for you, but for my queue-like tables was a big help to stay
> clean, as CLUSTER is able to shrink them even in the presence
> long-running transactions which normally would prevent cleaning dead
> tuples back to the oldest running transaction.
>
> For small tables CLUSTER is a nice feature... but beware that it locks
> the table exclusively, so if you have a big table you might have a long
> down-time during the clustering operation where the table is not
> accessible. That said, I use it for fairly big tables too occasionally
> when I need to clean up stuff...
>
That is an excellent idea, however, what are the effects of CLUSTER on
empty tables?  Considering that most of our queue tables sit empty until
their used, our main concern is keep the disk space that they use
available and our 'cleaning' activities will be done whenever the tables
empty out (unless they are scheduled for use within, say, an hour) I
have to wonder at whether or not CLUSTER would do anything to an empty
table with no actual data to cluster.

--
erik jones <erik@myemma.com>
software development
emma(r)


Re: Impact of vacuum full...

От
Scott Marlowe
Дата:
On Fri, 2006-07-21 at 10:13, Erik Jones wrote:
> Hello, I was wondering if someone could enlighten me as to the impact to
> the entire database of running VACUUM FULL against a single table.  The
> reason I ask is that at company we work for we have a very large number
> of queue type tables that fill up and empty out on a regular basis

HOLD ON! Do you empty them by doing something like

delete from table

with no where clause?

If so, then try truncating the table.  That will clean it completely and
reclaim all the dead space, plus it's faster than delete anyway.

If that doesn't help, look at scheduling more aggressive plain vacuums
(no just autovacuum, but cron job vacuum on specific tables that you
know have a high turnover).

Vacuum full is basically admitting your regular vacuum schedule isn't /
can't be aggressive enough.

Re: Impact of vacuum full...

От
Erik Jones
Дата:
Scott Marlowe wrote:
> On Fri, 2006-07-21 at 10:13, Erik Jones wrote:
>
>> Hello, I was wondering if someone could enlighten me as to the impact to
>> the entire database of running VACUUM FULL against a single table.  The
>> reason I ask is that at company we work for we have a very large number
>> of queue type tables that fill up and empty out on a regular basis
>>
>
> HOLD ON! Do you empty them by doing something like
>
> delete from table
>
> with no where clause?
>
> If so, then try truncating the table.  That will clean it completely and
> reclaim all the dead space, plus it's faster than delete anyway.
>
> If that doesn't help, look at scheduling more aggressive plain vacuums
> (no just autovacuum, but cron job vacuum on specific tables that you
> know have a high turnover).
>
> Vacuum full is basically admitting your regular vacuum schedule isn't /
> can't be aggressive enough.
>
No!!!  The table is filled and entries are deleted one at a time, or in
groups, but definitely not all at once.    So, then what is the
difference between scheduling regular vacuum on specific tables v.
scheduling vacuum full on specific tables?  Basically, what I want to do
is to ensure that when I clean out a table row or rows at a time, the
space is immediately freed up.

--
erik jones <erik@myemma.com>
software development
emma(r)


Re: Impact of vacuum full...

От
Bill Moran
Дата:
On Fri, 21 Jul 2006 10:35:17 -0500
Erik Jones <erik@myemma.com> wrote:

> Csaba Nagy wrote:
[snip]
> That is an excellent idea, however, what are the effects of CLUSTER on
> empty tables?  Considering that most of our queue tables sit empty until
> their used, our main concern is keep the disk space that they use
> available and our 'cleaning' activities will be done whenever the tables
> empty out (unless they are scheduled for use within, say, an hour) I
> have to wonder at whether or not CLUSTER would do anything to an empty
> table with no actual data to cluster.

Based on that description, have you considered using TRUNCATE to clear out
the tables when you're done using them?  Truncate is faster than DELETE
and I believe it's the equivalient of dropping and recreating the table,
which means it will free up the space.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com

Re: Impact of vacuum full...

От
Bill Moran
Дата:
On Fri, 21 Jul 2006 11:40:32 -0500
Erik Jones <erik@myemma.com> wrote:

> No!!!  The table is filled and entries are deleted one at a time, or in
> groups, but definitely not all at once.    So, then what is the
> difference between scheduling regular vacuum on specific tables v.
> scheduling vacuum full on specific tables?  Basically, what I want to do
> is to ensure that when I clean out a table row or rows at a time, the
> space is immediately freed up.

Why would you want to do that?  PostgreSQL is not designed to operate in
that fashon, and it's ineffecient.  You're forcing PostgreSQL to constantly
increase and decrease the amount of disk space allocated for the table,
which isn't the best way to do things.

Much better is to schedule frequent VACUUMs so that the table size reaches
an equilibrium.  This way there is always a little free space in the table
so that rows can be added/updated without increasing the table size.  This
is how PostgreSQL is designed to run.

If you don't have enough disk space to do this, then you didn't purchase
large enough drives to hold your DB efficiently.  You can do the VACUUM
FULL as you describe, but it's non-optimal.

To answer your original question directly: VACUUM FULL is expensive.  It
needs to lock out the table for the duration of its work, and other
transactions will block during the operation.  Depending on the table
size, it could be prohibitively time-consuming.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com

Re: Impact of vacuum full...

От
Scott Marlowe
Дата:
On Fri, 2006-07-21 at 11:40, Erik Jones wrote:
> Scott Marlowe wrote:
> > On Fri, 2006-07-21 at 10:13, Erik Jones wrote:
> >
> >> Hello, I was wondering if someone could enlighten me as to the impact to
> >> the entire database of running VACUUM FULL against a single table.  The
> >> reason I ask is that at company we work for we have a very large number
> >> of queue type tables that fill up and empty out on a regular basis
> >>
> >
> > HOLD ON! Do you empty them by doing something like
> >
> > delete from table
> >
> > with no where clause?
> >
> > If so, then try truncating the table.  That will clean it completely and
> > reclaim all the dead space, plus it's faster than delete anyway.
> >
> > If that doesn't help, look at scheduling more aggressive plain vacuums
> > (no just autovacuum, but cron job vacuum on specific tables that you
> > know have a high turnover).
> >
> > Vacuum full is basically admitting your regular vacuum schedule isn't /
> > can't be aggressive enough.
> >
> No!!!  The table is filled and entries are deleted one at a time, or in
> groups, but definitely not all at once.    So, then what is the
> difference between scheduling regular vacuum on specific tables v.
> scheduling vacuum full on specific tables?  Basically, what I want to do
> is to ensure that when I clean out a table row or rows at a time, the
> space is immediately freed up.

Oh, ok.  Misunderstood based on your description there.

With regular vacuum, postgresql marks the freed tuples as available, and
the next time someone writes to the table it uses the freed up space.
Eventually, the table should stop growing and reach a kind of "stable
state" where it averages some percentage free (20 to 50% is generally
considered optimal).

If the space used by your table continues to grow, this points to a
possible problem with not having a large enough free space map.

Since regular vacuums are MUCH cheaper in terms of locking and such, it
might be practical to schedule a plain vacuum at the end of any large
deletes that you currently run.

I'd use regular cronned vacuums on the tables that you know grown a lot
(or just hit the whole db and not worry about it) and run occasional
vacuum verbose / vacuum full verbose by hand to see if you have problems
with your Free Space Map being too small.

Re: Impact of vacuum full...

От
Erik Jones
Дата:
Scott Marlowe wrote:
> On Fri, 2006-07-21 at 11:40, Erik Jones wrote:
>
>> Scott Marlowe wrote:
>>
>>> On Fri, 2006-07-21 at 10:13, Erik Jones wrote:
>>>
>>>
>>>> Hello, I was wondering if someone could enlighten me as to the impact to
>>>> the entire database of running VACUUM FULL against a single table.  The
>>>> reason I ask is that at company we work for we have a very large number
>>>> of queue type tables that fill up and empty out on a regular basis
>>>>
>>>>
>>> HOLD ON! Do you empty them by doing something like
>>>
>>> delete from table
>>>
>>> with no where clause?
>>>
>>> If so, then try truncating the table.  That will clean it completely and
>>> reclaim all the dead space, plus it's faster than delete anyway.
>>>
>>> If that doesn't help, look at scheduling more aggressive plain vacuums
>>> (no just autovacuum, but cron job vacuum on specific tables that you
>>> know have a high turnover).
>>>
>>> Vacuum full is basically admitting your regular vacuum schedule isn't /
>>> can't be aggressive enough.
>>>
>>>
>> No!!!  The table is filled and entries are deleted one at a time, or in
>> groups, but definitely not all at once.    So, then what is the
>> difference between scheduling regular vacuum on specific tables v.
>> scheduling vacuum full on specific tables?  Basically, what I want to do
>> is to ensure that when I clean out a table row or rows at a time, the
>> space is immediately freed up.
>>
>
> Oh, ok.  Misunderstood based on your description there.
>
> With regular vacuum, postgresql marks the freed tuples as available, and
> the next time someone writes to the table it uses the freed up space.
> Eventually, the table should stop growing and reach a kind of "stable
> state" where it averages some percentage free (20 to 50% is generally
> considered optimal).
>
> If the space used by your table continues to grow, this points to a
> possible problem with not having a large enough free space map.
>
> Since regular vacuums are MUCH cheaper in terms of locking and such, it
> might be practical to schedule a plain vacuum at the end of any large
> deletes that you currently run.
>
> I'd use regular cronned vacuums on the tables that you know grown a lot
> (or just hit the whole db and not worry about it) and run occasional
> vacuum verbose / vacuum full verbose by hand to see if you have problems
> with your Free Space Map being too small.
>
Awesome!  Thanks, guys, for all of your input/advice.  That's pretty
much how I thought stuff worked after reading the docs but was
confused/misled by other inputs.  With regards to the Free Space Map and
max_fsm_relations: is using the value of "SELECT COUNT(*) FROM
pg_class;" plus some room for growth a good way to set that?

--
erik jones <erik@myemma.com>
software development
emma(r)


Re: Impact of vacuum full...

От
Erik Jones
Дата:
Scott Marlowe wrote:
> On Fri, 2006-07-21 at 15:49, Erik Jones wrote:
>
>> Scott Marlowe wrote:
>>
>>> I'd use regular cronned vacuums on the tables that you know grown a lot
>>> (or just hit the whole db and not worry about it) and run occasional
>>> vacuum verbose / vacuum full verbose by hand to see if you have problems
>>> with your Free Space Map being too small.
>>>
>>>
>> Awesome!  Thanks, guys, for all of your input/advice.  That's pretty
>> much how I thought stuff worked after reading the docs but was
>> confused/misled by other inputs.  With regards to the Free Space Map and
>> max_fsm_relations: is using the value of "SELECT COUNT(*) FROM
>> pg_class;" plus some room for growth a good way to set that?
>>
>
> I always use vacuum verbose to see that.  At the end, it'll have a part
> that looks like this:
>
> INFO:  free space map: 35 relations, 18903 pages stored; 17504 total
> pages needed
> DETAIL:  Allocated FSM size: 5000 relations + 100000 pages = 894 kB
> shared memory.
> VACUUM
>
>
> So, on this machine, we can handle 5000 relations of 100,000 total
> pages, and we're only uses the space of 35 relations and ~20,000 pages.
>
> If the pages needed exceeds the allocated size, you've got problems.
>
> You've got to run the database for a while to see what the state will be
> like over time.
>
Well, just to give you guys an idea of the size of db we're working with
(and, hopefully to make you postgres developers proud) a 'SELECT
COUNT(*) FROM pg_class;' returns a # over 300k for the # of relations in
the db.  I really can't go too much furthur into that or the reason we
have so many due to my nda and such, but I'm sure you can see why this
has been such a big issue for me.  Thanks again for all of your help,
and I'll be back soon with some questions regarding ANALYZE that I've
got cooking up in the back of my head...

--
erik jones <erik@myemma.com>
software development
emma(r)


Re: Impact of vacuum full...

От
Scott Marlowe
Дата:
On Fri, 2006-07-21 at 15:49, Erik Jones wrote:
> Scott Marlowe wrote:
> >
> > I'd use regular cronned vacuums on the tables that you know grown a lot
> > (or just hit the whole db and not worry about it) and run occasional
> > vacuum verbose / vacuum full verbose by hand to see if you have problems
> > with your Free Space Map being too small.
> >
> Awesome!  Thanks, guys, for all of your input/advice.  That's pretty
> much how I thought stuff worked after reading the docs but was
> confused/misled by other inputs.  With regards to the Free Space Map and
> max_fsm_relations: is using the value of "SELECT COUNT(*) FROM
> pg_class;" plus some room for growth a good way to set that?

I always use vacuum verbose to see that.  At the end, it'll have a part
that looks like this:

INFO:  free space map: 35 relations, 18903 pages stored; 17504 total
pages needed
DETAIL:  Allocated FSM size: 5000 relations + 100000 pages = 894 kB
shared memory.
VACUUM


So, on this machine, we can handle 5000 relations of 100,000 total
pages, and we're only uses the space of 35 relations and ~20,000 pages.

If the pages needed exceeds the allocated size, you've got problems.

You've got to run the database for a while to see what the state will be
like over time.