Thread: poor VACUUM performance on large tables


poor VACUUM performance on large tables

From: Jan Peterson
Date: Sep 4, 2005
Hello,

We have been experiencing poor performance of VACUUM in our production
database.  Relevant details of our implementation are as follows:

1.  We have a database that grows to about 100GB.
2.  The database is a mixture of large and small tables.
3.  Bulk data (stored primarily in pg_largeobject, but also in various
TOAST tables) comprises about 45% of our data.
4.  Some of our small tables are very active, with several hundred
updates per hour.
5.  We have a "rolling delete" function that purges older data on a
periodic basis to keep our maximum database size at or near 100GB
(sketched below).
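
For illustration, the rolling delete in (5) reduces to something like
the following (the table and column names here are made up, not our
actual schema):

-- hypothetical schema: "samples" rows reference their bulk data in
-- pg_largeobject by OID, so the large objects get unlinked first
SELECT lo_unlink(blob_oid)
  FROM samples
 WHERE created_at < now() - interval '30 days';
DELETE FROM samples
 WHERE created_at < now() - interval '30 days';
VACUUM ANALYZE samples;  -- reclaim the dead rows this just created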

Everything works great until our rolling delete kicks in.  Of course,
we are doing periodic VACUUMs on all tables, with more frequent VACUUMs on
the more active tables.  The problem arises when we start deleting the
bulk data and have to VACUUM pg_largeobject and our other larger
tables.  We have seen VACUUM run for several hours (even tens of
hours).  During this VACUUM process, our smaller tables accumulate
dead rows at a very rapid rate (we assume because the long-running
VACUUM transaction prevents concurrent VACUUMs from reclaiming rows it
can still see).  Statistics are also skewed during this
process and we have observed the planner choosing sequential scans on
tables where it is obvious that an index scan would be more efficient.

We're looking for ways to improve the performance of VACUUM.  We are
already experimenting with Hannu Krosing's patch for VACUUM, but it's
not really helping (we are still faced with doing a database wide
VACUUM about once every three weeks or so as we approach the
transaction id rollover point... this VACUUM has been measured at 28
hours in an active environment).
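
(The wraparound-driven pass is nothing special, just a database-wide
run issued from a connection to the database in question:)

VACUUM ANALYZE;  -- no table name given, so every table is vacuumed,
                 -- including pg_largeobject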

Other things we're trying are partitioning tables (rotating the table
that updates happen to and using a view to combine the sub-tables for
querying).  Unfortunately, we are unable to partition the
pg_largeobject table, and that table alone can take up 40+% of our
database storage.  We're also looking at somehow storing our large
objects externally (as files in the local file system) and
implementing a mechanism similar to Oracle's bfile functionality.  Of
course, we can't afford to give up the transactional security of being
able to roll back if a particular update doesn't succeed.
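
A rough sketch of the rotating-table scheme (all names illustrative):

CREATE TABLE log_template (ts timestamptz, msg text);  -- stand-in schema
CREATE TABLE log_current  (LIKE log_template);         -- writers target this
CREATE TABLE log_prev     (LIKE log_template);         -- previous rotation

CREATE VIEW log AS                                     -- readers query this
    SELECT * FROM log_current
    UNION ALL
    SELECT * FROM log_prev;

The payoff is that retiring old data becomes a DROP TABLE of the oldest
sub-table plus a view redefinition, instead of a huge DELETE followed
by a huge VACUUM.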

Does anyone have any suggestions to offer on good ways to proceed
given our constraints?  Thanks in advance for any help you can
provide.

        -jan-
--
Jan L. Peterson
<jan.l.peterson@gmail.com>

Re: poor VACUUM performance on large tables

From: "Thomas F. O'Connell"
Date:
On Sep 4, 2005, at 1:16 AM, Jan Peterson wrote:

> We have been experiencing poor performance of VACUUM in our production
> database.  Relevant details of our implementation are as follows:
>
> [snip]
>
> Does anyone have any suggestions to offer on good ways to proceed
> given our constraints?

Do you have your Free Space Map settings configured appropriately?
See section 16.4.3.2 of the docs:

http://www.postgresql.org/docs/8.0/static/runtime-config.html#RUNTIME-CONFIG-RESOURCE

You'll want to run a VACUUM VERBOSE and note the numbers at the end,
which describe how many pages are used and how many are needed.
max_fsm_pages should be set according to that, and you can set
max_fsm_relations based on it, too, although typically one knows
roughly how many relations are in a database.
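
That is (as superuser; only the tail of the output matters here):

VACUUM VERBOSE;
-- the final lines report something like:
--   INFO:  free space map: ... relations, ... pages stored; ... total pages needed
-- and max_fsm_pages should comfortably exceed "total pages needed".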

http://www.postgresql.org/docs/8.0/static/sql-vacuum.html

Finally, have you experimented with pg_autovacuum, which is located
in contrib in the source tarballs (and is integrated into the backend
in 8.1 beta and beyond)? You don't really say how often you're
running VACUUM, and it might be that you're not vacuuming often enough.

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)

Re: poor VACUUM performance on large tables

From: Tom Lane
Date:
Jan Peterson <jan.l.peterson@gmail.com> writes:
> We have been experiencing poor performance of VACUUM in our production
> database.

Which PG version, exactly?

> Everything works great until our rolling delete kicks in.  Of course,
> we are doing periodic VACUUMS on all tables, with frequent VACUUMs on
> the more active tables.  The problem arises when we start deleting the
> bulk data and have to VACUUM pg_largeobject and our other larger
> tables.  We have seen VACUUM run for several hours (even tens of
> hours).

Plain VACUUM (not FULL) certainly ought not take that long.  (If you're
using VACUUM FULL, the answer is going to be "don't do that".)  What
maintenance_work_mem (or vacuum_mem in older releases) are you running
it under?  Can you get VACUUM VERBOSE output from some of these cases
so we can see which phase(s) are eating the time?  It'd also be
interesting to watch the output of vmstat or local equivalent --- it
might just be that your I/O capability is nearly saturated and VACUUM is
pushing the system over the knee of the response curve.  If so, the
vacuum delay options of 8.0 would be worth experimenting with.
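
A minimal sketch of those knobs (the values below are just starting
points to tune against your iostat numbers):

SET vacuum_cost_delay = 10;     -- sleep 10ms each time the cost budget
                                -- is used up; the default of 0 disables this
SET vacuum_cost_limit = 200;    -- the default cost budget between sleeps
VACUUM ANALYZE pg_largeobject;  -- this run now cedes I/O to other queries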

> Statistics are also skewed during this
> process and we have observed the planner choosing sequential scans on
> tables where it is obvious that an index scan would be more efficient.

That's really pretty hard to believe; VACUUM doesn't affect the
statistics until the very end.  Can you give some specifics of how
the "statistics are skewed"?

            regards, tom lane

Re: poor VACUUM performance on large tables

From: Jan Peterson
Date:
Thomas F. O'Connell:
>Do you have your Free Space Map settings configured appropriately?

Our current FSM settings are:
max_fsm_pages = 500000          # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 1000        # min 100, ~50 bytes each

> You'll want to run a VACUUM VERBOSE and note the numbers at the end,
> which describe how many pages are used and how many are needed.
> max_fsm_pages should be set according to that, and you can set
> max_fsm_relations based on it, too, although typically one knows
> roughly how many relations are in a database.

Here are the last two lines from a VACUUM FULL VERBOSE we did when the
database was totally full:

INFO:  free space map: 82 relations, 532349 pages stored; 632352 total pages needed
DETAIL:  Allocated FSM size: 1000 relations + 500000 pages = 2995 kB shared memory.
VACUUM

Based on this, it looks like we could stand to bump up our FSM another
couple hundred thousand.  Does it buy us anything to reduce the number
of FSM relations from the default of 1000?
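
(Concretely, we're considering something like this in postgresql.conf;
the headroom is a guess, and both settings take effect only on a
postmaster restart:)

max_fsm_pages = 800000          # > the 632352 "total pages needed" above
max_fsm_relations = 1000        # only 82 in use, so the default has slack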

> have you experimented with pg_autovacuum

We're not using pg_autovacuum.  We have our own mechanism that works
basically the same as pg_autovacuum, but split into two separate
threads, one for large tables and one for small tables.  We consider
tables to be "large" if their size exceeds 100MB.  Tables are selected
for vacuuming if they've changed "enough" (I can get you actual
metrics for what is "enough", but I don't know off the top of my
head).  Our main reason for splitting out small vs. large tables was
that the large tables take a long time to VACUUM and we didn't want
our small tables to go a long time between VACUUMs.  Of course, while a
long VACUUM of a large table is in progress, the small-table VACUUMs
can't reclaim much anyway (this was one of the things we were hoping to
address
with Hannu's patch, but there are issues with his patch on 8.0.2 that
we haven't tracked down yet).
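
(For reference, the large/small split is driven by a catalog query
along these lines; this is a paraphrase, not our exact code:)

SELECT relname, relpages::bigint * 8192 AS approx_bytes  -- 8192-byte pages
  FROM pg_class
 WHERE relkind = 'r'                                     -- plain tables only
   AND relpages::bigint * 8192 > 100 * 1024 * 1024;      -- "large" = >100MB

Note that relpages is only as fresh as each table's last VACUUM or
ANALYZE.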

Tom Lane:
> Which PG version, exactly?

We're currently running 8.0.2.

> Plain VACUUM (not FULL) certainly ought not take that long.  (If you're
> using VACUUM FULL, the answer is going to be "don't do that".)

Heh, we're definitely not doing a VACUUM FULL.  We're doing VACUUM
ANALYZE {tablename} exclusively, except when we get close to the
transaction id wraparound threshold when we do a VACUUM ANALYZE of the
entire database.

> What maintenance_work_mem (or vacuum_mem in older releases) are
> you running it under?

It looks like we are using the defaults for work_mem (1024 kB) and
maintenance_work_mem (16384 kB, i.e. 16MB).  We could certainly bump
these up.  Is
there a good way to determine what settings would be reasonable?  I'll
note, however, that we had experimented with bumping these previously
and not noticed any change in performance.

> Can you get VACUUM VERBOSE output from some of these cases
> so we can see which phase(s) are eating the time?

I'll get some, but it will take a few more days as we have recently
reset our test environment.  I can get some sample runs of VACUUM
VERBOSE on pg_largeobject in a few hours (it takes a few hours to run)
and will post them when I have them.

> It'd also be interesting to watch the output of vmstat or local
> equivalent --- it might just be that your I/O capability is nearly
> saturated and VACUUM is pushing the system over the knee
> of the response curve.  If so, the vacuum delay options of 8.0
> would be worth experimenting with.

We've been monitoring I/O rates with iostat and we're generally
running around 90% I/O usage after we kick into the rolling delete
stage (before we reach that stage, we're running around 20%-50% I/O
usage).  We are definitely I/O bound, hence trying to find a way to
make VACUUM process less data.

Our system (the database runs on an appliance) is a dual-CPU box,
and we're burning about 25% of our CPU time in I/O waits (again, after
our rolling delete kicks in).  A higher performance I/O subsystem is
something we could try.

Our biggest concern with increasing the vacuum delay options is the
length of time it currently takes to VACUUM our large tables (and
pg_largeobject).  Holding a transaction open for these long periods
degrades performance in other places.

> > Statistics are also skewed during this
> > process and we have observed the planner choosing sequential scans on
> > tables where it is obvious that an index scan would be more efficient.
>
> That's really pretty hard to believe; VACUUM doesn't affect the
> statistics until the very end.  Can you give some specifics of how
> the "statistics are skewed"?

I don't have any hard evidence for this, but we have noticed that at
certain times a particular query which we run will run for an
extremely long time (several hours).  Re-running the query with
EXPLAIN always shows it using an index scan and it runs very quickly.
We haven't been able to catch it with an EXPLAIN in the state where it
will take a long time (it's not deterministic).  Our assumption is
that the planner is taking the wrong path because we can't figure out
any other reason why the query would take such a long time.  We'll run
some more experiments and try to reproduce this behavior.  Is there
anything specific that would help track this down (other than getting
EXPLAIN output showing the bogus execution plan)?

Thanks for your help.

        -jan-
--
Jan L. Peterson
<jan.l.peterson@gmail.com>

Re: poor VACUUM performance on large tables

From: Tom Lane
Date:
Jan Peterson <jan.l.peterson@gmail.com> writes:
> Based on this, it looks like we could stand to bump up our FSM another
> couple hundred thousand.  Does it buy us anything to reduce the number
> of FSM relations from the default of 1000?

Not a lot; as the comment says, those slots are only about 50 bytes
each.  (I think the true figure is closer to 70, according to some
measurements I did recently on CVS tip, but in any case it's less than
100 bytes apiece.)  Still, a byte saved is a byte earned ...

> It looks like we are using the defaults for work_mem (1024) and
> maintenance_work_mem (16384).  We could certainly bump these up.  Is
> there a good way to determine what settings would be reasonable?

I'd bump up maintenance_work_mem by a factor of 10 and see if it makes a
difference.  It should reduce the number of passes over the indexes when
vacuuming up lots of deleted rows.  If you have lots of RAM you might be
able to increase it more, but try that for starters.
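
In other words, for the session that runs the vacuum:

SET maintenance_work_mem = 163840;  -- in kB: ten times the 16384
                                    -- default, i.e. about 160MB
VACUUM ANALYZE pg_largeobject;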

            regards, tom lane