Thread: tuning autovacuum


tuning autovacuum

From:
Euler Taveira de Oliveira
Date:
Hi,

Autovacuum has been enabled by default for some releases now and, up to now,
there is no easy way to estimate the number of autovacuum workers. I tune it
by observing whether the worker slots are saturated for a period of time. I'm
having a hard time trying to do this. I want to add a LOG message such as

LOG: maximum number of autovacuum workers reached
HINT: Consider increasing autovacuum_max_workers (currently 5).

And also a view (say pg_stat_autovacuum) to expose some autovacuum information,
such as (i) the number of autovacuum workers, (ii) the number of tables that
need analyze/vacuum and are scheduled for it, (iii) the autovacuum count, and
(iv) the autoanalyze count. While I am on this topic, it would be nice to
expose the analyze/vacuum count and threshold per table. This information
could go into pg_stat_*_tables, but that view already has too many fields.
Maybe it is time to split autovacuum information into another statistics view?

Comments?


-- 
  Euler Taveira de Oliveira - Timbira       http://www.timbira.com.br/
  PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: tuning autovacuum

From:
Magnus Hagander
Date:
On Jun 9, 2011 12:01 AM, "Euler Taveira de Oliveira" <euler@timbira.com> wrote:
>
> Hi,
>
> There are some releases that autovacuum was enabled by default and, up to
> now there is an easy way to estimate the number of autovacuum workers. I
> tune it observing if the number of slots are saturated for a period of
> time. I'm having a hard time trying to do this. I want to add a LOG
> message such as
>
> LOG: maximum number of autovacuum workers reached
> HINT: Consider increasing autovacuum_max_workers (currently 5).

That would be very useful.

> And also a view (say pg_stat_autovacuum) to expose some autovacuum
> information such as (i) number of autovacuum workers (ii) number of
> tables that needs analyze/vacuum and are scheduled to (iii) number of

Part of that is on my personal todo already, so I'd be happy to review that :)

> autovacuum count (iv) number of autoanalyze count. While I am in this
> topic, it would be nice to expose the analyze/vacuum count and threshold
> per table. This information should go to pg_stat_*_tables but it already
> has too much fields. Maybe it is time to split autovacuum information
> into another statistic view?

That is configuration information and not statistics, so IMHO it does not
belong in pg_stat_*.

And if relation parameters are to be exposed more than they are now, it
should be done for all of them, not just autovacuum's.

/Magnus

Re: tuning autovacuum

From:
Robert Haas
Date:
On Wed, Jun 8, 2011 at 5:54 PM, Euler Taveira de Oliveira
<euler@timbira.com> wrote:
> LOG: maximum number of autovacuum workers reached
> HINT: Consider increasing autovacuum_max_workers (currently 5).
>
> Comments?

Is the hint correct?  I mean, what if there were 100 small tables that
needed vacuuming all at the same time.  We'd hit this limit no matter
how high you set autovacuum_max_workers, but it wouldn't be right to
set it to 101 just because every once in a blue moon you might trip
over the limit.

I think it'd be really useful to expose some more data in this area
though.  One random idea is - remember the time at which a table was
first observed to need vacuuming. Clear the timestamp when it gets
vacuumed.  Then you can do:

SELECT blahblah FROM wumpity WHERE
time_at_which_we_first_noticed_it_needed_vacuuming < now() - '1
hour'::interval;

...or something of the sort.  That way you can alert if autovacuum
starts to fall too far behind, but you get to pick the definition of
"too far behind".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: tuning autovacuum

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I think it'd be really useful to expose some more data in this area
> though.  One random idea is - remember the time at which a table was
> first observed to need vacuuming. Clear the timestamp when it gets
> vacuumed.  Then you can do:

As far as I recall that logic, there is no delay between when we know
that a table needs vacuumed and when we do it.  I don't see the point of
introducing any such delay, either.
        regards, tom lane


Re: tuning autovacuum

From:
Robert Haas
Date:
On Wed, Jun 8, 2011 at 9:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I think it'd be really useful to expose some more data in this area
>> though.  One random idea is - remember the time at which a table was
>> first observed to need vacuuming. Clear the timestamp when it gets
>> vacuumed.  Then you can do:
>
> As far as I recall that logic, there is no delay between when we know
> that a table needs vacuumed and when we do it.  I don't see the point of
> introducing any such delay, either.

Well, if there are more tables that need vacuuming than there are
workers available at any given time, there will be a delay.  We
probably don't keep track of that delay at present, but we could.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: tuning autovacuum

From:
Euler Taveira de Oliveira
Date:
On 08-06-2011 20:35, Robert Haas wrote:
> Is the hint correct?  I mean, what if there were 100 small tables that
> needed vacuuming all at the same time.  We'd hit this limit no matter
> how high you set autovacuum_max_workers, but it wouldn't be right to
> set it to 101 just because every once in a blue moon you might trip
> over the limit.
>
I think so. You are picturing a scenario with only one message. It is the same
case as the too-frequent-checkpoint messages; i.e., you should check whether
those messages recur with some periodicity.

> I think it'd be really useful to expose some more data in this area
> though.  One random idea is - remember the time at which a table was
> first observed to need vacuuming. Clear the timestamp when it gets
> vacuumed.  Then you can do:
>
Hmmm. But this fine-grained information alone doesn't help with tuning the
number of autovacuum workers. I consider counters easier to implement and
simpler to analyze. But the timestamp idea has its merit, because we already
have a similar statistic (the last timestamp a table was vacuumed or analyzed).


-- 
  Euler Taveira de Oliveira - Timbira       http://www.timbira.com.br/
  PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: tuning autovacuum

From:
Robert Haas
Date:
On Wed, Jun 8, 2011 at 10:55 PM, Euler Taveira de Oliveira
<euler@timbira.com> wrote:
> On 08-06-2011 20:35, Robert Haas wrote:
>> Is the hint correct?  I mean, what if there were 100 small tables that
>> needed vacuuming all at the same time.  We'd hit this limit no matter
>> how high you set autovacuum_max_workers, but it wouldn't be right to
>> set it to 101 just because every once in a blue moon you might trip
>> over the limit.
>>
> I think so. You are picturing a scene with only one message. It is the same
> case of the too-frequent-checkpoint messages; i.e., you should look if those
> messages have some periodicity.

Yeah, maybe.  I'm just not sure there would be an easy way for users
to judge when they should or should not make a change.

>> I think it'd be really useful to expose some more data in this area
>> though.  One random idea is - remember the time at which a table was
>> first observed to need vacuuming. Clear the timestamp when it gets
>> vacuumed.  Then you can do:
>>
> Hmmm. But this fine grained information alone doesn't help tuning the number
> of autovacuum workers. I consider counters easier to implement and simpler
> to analyze. But the timestamp idea has its merit because we already have a
> similar statistic (last timestamp table was vacuumed or analyzed).

Well, it won't directly tell you how many you need.  But certainly if
you see things getting further and further behind, you know you need
more.

Or, alternatively, you need to reduce vacuum_cost_delay.  IME, that's
actually the most common cause of this problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: tuning autovacuum

From:
Greg Smith
Date:
Robert Haas wrote:
> Well, if there are more tables that need vacuuming than there are
> workers available at any given time, there will be a delay.  We
> probably don't keep track of that delay at present, but we could.
>   

There are at least four interesting numbers to collect each time 
autovacuum runs:

1) This one, when was the threshold crossed.  I believe one of the AV 
workers would have to pause periodically to update these if they're all 
busy doing work.
2) What time did the last autovacuum start at
3) How many dead rows were there at the point when it started
4) When did the last autovacuum end (currently the only value stored)

There may be a 5th piece of state I haven't looked at yet worth 
exposing/saving, something related to how much work was skipped by the 
partial vacuum logic introduced in 8.4.  I haven't looked at that code 
enough to know which is the right metric to measure its effectiveness 
by, but I have this gut feeling it's eventually going to be critical for 
distinguishing between the various common types of vacuum-heavy 
workloads that show up.

All of these need to be stored in a system table/view, so that an admin 
can run a query to answer questions like:

-What is AV doing right now?
-How far behind is AV on tables it needs to clean but hasn't even 
started on?
-How long is the average AV taking on my big tables?
-As I change the AV parameters, what does it do to the runtimes against 
my big tables?

As someone who is found by a lot of people whose problems revolve around 
databases with heavy writes or update churn, limitations in the current 
state of tracking what autovacuum does have been moving way up my 
priority list the last year.  I now have someone who is always running 
autovacuum on the same table, 24x7.  It finishes every two days, and 
when it does the 20% threshold is already crossed for it to start 
again.  The "wait until a worker was available" problem isn't there, but 
I need a good way to track all of the other three things to have a hope 
of improving their situation.  Right now getting the data I could use 
takes parsing log file output and periodic dumps of pg_stat_user_tables, 
then stitching the whole mess together.

You can't run a heavily updated database in the TB+ range and make sense 
of what autovacuum is doing without a large effort matching output from 
log_autovacuum_min_duration and the stats that are visible in 
pg_stat_user_tables.  It must get easier than that to support the sort 
of bigger tables it's possible to build now.  And if this data starts 
getting tracked, we can start to move toward AV parameters that are 
actually aiming at real-world units, too.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: tuning autovacuum

From:
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of Wed Jun 08 21:50:22 -0400 2011:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I think it'd be really useful to expose some more data in this area
> > though.  One random idea is - remember the time at which a table was
> > first observed to need vacuuming. Clear the timestamp when it gets
> > vacuumed.  Then you can do:
> 
> As far as I recall that logic, there is no delay between when we know
> that a table needs vacuumed and when we do it.  I don't see the point of
> introducing any such delay, either.

Autovacuum checks each table twice.  When it first connects to a
database it grabs a complete list of relations needing vacuum.  Then it
starts vacuuming, and before processing each relation, it rechecks.

So there *is* a delay, which corresponds to how long it took to process
the tables that preceded it in the list.  Robert's suggestion would seem
to make sense.  I'm not sure how to implement it: do we want some more
(highly volatile) data points in pgstat?  Do we need some other
mechanism?  This seems like a use case for pg_class_nt (see
http://archives.postgresql.org/pgsql-patches/2006-06/msg00114.php).


In any case, given the "rebalancing" feature of vacuum_cost_delay (which
increases the delay the more workers there are), the only "solution" to
the problem of falling behind is reducing the delay parameter.  If you
just add more workers, they start working more slowly.
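[Editor's note: the rebalancing effect described above can be sketched with a toy model. This is an illustration only, not the actual autovac_balance_cost() code; the function name and the clamping detail are assumptions for the sketch.]

```python
# Toy model of autovacuum cost rebalancing: the configured cost limit acts
# as a shared budget, and each active worker effectively gets a slice of it.
# Illustration only -- not the real autovac_balance_cost() logic.

def effective_cost_limit(total_cost_limit: int, active_workers: int) -> int:
    """Approximate per-worker cost limit after rebalancing."""
    if active_workers < 1:
        raise ValueError("need at least one active worker")
    # Each worker's slice shrinks as workers are added; clamp to 1 so a
    # worker never ends up with a zero budget.
    return max(1, total_cost_limit // active_workers)

if __name__ == "__main__":
    # Adding workers divides the budget instead of increasing throughput.
    for n in (1, 2, 5):
        print(n, "workers ->", effective_cost_limit(200, n))
```

With a budget of 200, one worker runs at 200, two at 100 each, five at 40 each: the aggregate rate stays flat, which is why reducing the delay, not adding workers, is the fix for falling behind.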

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: tuning autovacuum

From:
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> In any case, given the "rebalancing" feature of vacuum_cost_delay (which
> increases the delay the more workers there are), the only "solution" to
> the problem of falling behind is reducing the delay parameter.  If you
> just add more workers, they start working more slowly.

Yeah.  Note also that if you're not running a pretty recent minor
release, you're exposed to this bug:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master [b58c25055] 2010-11-19 22:28:20 -0500
Branch: REL9_0_STABLE Release: REL9_0_2 [b5efc0940] 2010-11-19 22:28:25 -0500
Branch: REL8_4_STABLE Release: REL8_4_6 [fab2af30d] 2010-11-19 22:28:30 -0500
Branch: REL8_3_STABLE Release: REL8_3_13 [6cb9d5113] 2010-11-19 22:28:35 -0500
    Fix leakage of cost_limit when multiple autovacuum workers are active.

    When using default autovacuum_vac_cost_limit, autovac_balance_cost relied
    on VacuumCostLimit to contain the correct global value ... but after the
    first time through in a particular worker process, it didn't, because we'd
    trashed it in previous iterations.  Depending on the state of other autovac
    workers, this could result in a steady reduction of the effective
    cost_limit setting as a particular worker processed more and more tables,
    causing it to go slower and slower.  Spotted by Simon Poole (bug #5759).

    Fix by saving and restoring the GUC variables in the loop in do_autovacuum.
    In passing, improve a few comments.

    Back-patch to 8.3 ... the cost rebalancing code has been buggy since it was
    put in.

        regards, tom lane


Re: tuning autovacuum

From:
Robert Haas
Date:
On Thu, Jun 9, 2011 at 12:25 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Tom Lane's message of Wed Jun 08 21:50:22 -0400 2011:
>> Robert Haas <robertmhaas@gmail.com> writes:
>> > I think it'd be really useful to expose some more data in this area
>> > though.  One random idea is - remember the time at which a table was
>> > first observed to need vacuuming. Clear the timestamp when it gets
>> > vacuumed.  Then you can do:
>>
>> As far as I recall that logic, there is no delay between when we know
>> that a table needs vacuumed and when we do it.  I don't see the point of
>> introducing any such delay, either.
>
> Autovacuum checks each table twice.  When it first connects to a
> database it grabs a complete list of relations needing vacuum.  Then it
> starts vacuuming, and before processing each relation, it rechecks.
>
> So there *is* a delay, which corresponds to how long it took to process
> the tables that preceded it in the list.

There's that, plus as of 9.1 autovacuum will skip tables that it can't
immediately take ShareUpdateExclusiveLock on (except in case of
impending wraparound).  So in the worst case we might have tables left
over from a previous run of the autovacuum worker that never got
processed.  And, of course, in any version, it's also possible for
autovacuum to get booted off the table by a conflicting lock request
that arrives midway through; the system might get shut down in the
middle of the run; autovacuum might error out in medias res; etc.

> Robert's suggestion would seem
> to make sense.  I'm not sure how to implement it: do we want some more
> (highly volatile) data points in pgstat?  Do we need some other
> mechanism?  This seems like a use case for pg_class_nt (see
> http://archives.postgresql.org/pgsql-patches/2006-06/msg00114.php)

I haven't looked at the implementation, but I like the concept.  It's
tempting to just shove everything in pgstat, but we already have
scalability limits there.

> In any case, given the "rebalancing" feature of vacuum_cost_delay (which
> increases the delay the more workers there are), the only "solution" to
> the problem of falling behind is reducing the delay parameter.  If you
> just add more workers, they start working more slowly.

Unless, of course, you already have the delay set to zero, in which
case starting more workers might be all you can do.  The case where
this really matters is with big tables, I think.  If you have #(big
tables) > #(autovacuum workers), it seems like you could end up in a
situation where the big tables pin down all the workers and no small
tables can get vacuumed for a long time.  Perhaps we can tune the
algorithm to compensate for that in some way, but I think we need more
raw data to work with first.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: tuning autovacuum

From:
Bernd Helmle
Date:

--On 9 June 2011 11:53:22 -0400 Greg Smith <greg@2ndquadrant.com> wrote:

> There are at least four interesting numbers to collect each time autovacuum
> runs:
>
> 1) This one, when was the threshold crossed.  I believe one of the AV workers
> would have to pause periodically to update these if they're all busy doing
> work.
> 2) What time did the last autovacuum start at
> 3) How many dead rows were there at the point when it started
> 4) When did the last autovacuum end (currently the only value stored)

This is what I've found myself thinking about repeatedly in the past, too. I'd 
go further and expose the info or details issued by VACUUM VERBOSE in the view, 
too, at least the number of pages visited and cleaned (or dead but not yet 
cleaned). Customers are heavily interested in these numbers, and I've found 
pgfouine's reporting of them very useful. Having this information in a view 
would make building monitoring infrastructure for this much easier (depending 
on how easy or expensive tracking this information is; I didn't look into the 
code yet).

-- 
Thanks
Bernd


Re: tuning autovacuum

From:
Greg Smith
Date:
On 06/09/2011 04:43 PM, Bernd Helmle wrote:
> I'd go further and expose the info or details issued by VACUUM VERBOSE 
> into the view, too, at least the number of pages visited and cleaned 
> (or dead but not yet cleaned). Customers are heavily interested in 
> these numbers and i've found pgfouine to provide those numbers very 
> useful.

Agreed there.  The fact that VACUUM VERBOSE reports them suggests 
they're not too terribly difficult to track either.

What we'd probably need to do with those is handle them like the other 
stats in the system:  store a total number for visited/cleaned/dead for 
each relation, then increment the total as each vacuum finishes.  That 
way, you could point a standard monitoring system at it and see trends.  
Just saving the last snapshot of data there isn't as useful.

I'm seeing these as being like the counters in pg_stat_bgwriter; while 
it's easy to think of VACUUM "what work happened?" data as info you just 
want the last snapshot of, a continuous incrementing counter can do that 
and a lot of other things too.  Anyone who is extracting useful data 
from pg_stat_bgwriter can use the same logic to track this data, even if 
it only moves forward in big chunks as vacuum completes.  And it may be 
feasible to update it in the middle, too.
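[Editor's note: the monitoring pattern described above, cumulative counters sampled over time, reduces to difference-over-elapsed-time logic. A minimal sketch, purely illustrative:]

```python
# Minimal sketch of turning a monotonically increasing counter (like those
# in pg_stat_bgwriter) into a trend: sample it twice and compute the rate.

def counter_rate(prev_value, prev_ts, cur_value, cur_ts):
    """Per-second rate between two samples of a cumulative counter.

    Returns None when no time has elapsed (the rate is undefined).
    """
    elapsed = cur_ts - prev_ts
    if elapsed <= 0:
        return None
    return (cur_value - prev_value) / elapsed

if __name__ == "__main__":
    # 600 pages cleaned between samples taken 60 seconds apart.
    print(counter_rate(1000, 0, 1600, 60))  # -> 10.0 pages/second
```

This is why an ever-increasing counter subsumes the "last snapshot" view: any monitoring system that already trends pg_stat_bgwriter can trend vacuum work the same way, even if the counter only advances in big chunks as each vacuum completes.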

Stepping into implementation for a second, the stats that are showing up 
in pg_stat_user_tables are being driven by a PgStat_MsgVacuum message 
coming out of the stats collector when it finishes.  While that's the 
obvious place to put some more stuff, that's not necessarily the right 
way to build a better monitoring infrastructure.  Two things to consider:

-It's not really aimed at being called multiple times for one operation 
("needs vacuum", "started vacuum", "finished vacuum").
-There is a mix of things that make sense as long-term counters and 
things that update as snapshots--the timestamps are the main thing there.

I haven't thought about it enough to have a real opinion on whether you 
can squeeze everything into the existing message by adding more fields, 
or if another type of message is necessary.  Just pointing out that it's 
not trivially obvious which approach is better.

What is unambiguous is that all this new data is really going to need a 
new view for it, pg_stat_vacuum or something like that.  The fields that 
are already in pg_stat_user_tables can stay there as deprecated for a 
while, but this all wants to be in its own new view.

This would really be a nice medium sized feature that DBAs would love, 
and it would help adoption on big sites.  I have some ideas on how to 
get some funding to develop it because I keep running into this, but if 
someone wants to run with the idea I'd be happy to just help instead.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: tuning autovacuum

From:
Tom Lane
Date:
Greg Smith <greg@2ndQuadrant.com> writes:
> What we'd probably need to do with those is handle them like the other 
> stats in the system:  store a total number for visited/cleaned/dead for 
> each relation, then increment the total as each vacuum finishes.

As Robert said, we're already seeing scalability problems with the
pg_stats subsystem.  I'm not eager to add a bunch more per-table
counters, at least not without some prior work to damp down the ensuing
performance hit.
        regards, tom lane


Re: tuning autovacuum

From:
Greg Smith
Date:
On 06/09/2011 05:41 PM, Tom Lane wrote:
> As Robert said, we're already seeing scalability problems with the
> pg_stats subsystem.  I'm not eager to add a bunch more per-table
> counters, at least not without some prior work to damp down the ensuing
> performance hit.
>    

That's fair.  Anyone who is running into the sort of autovacuum issues 
prompting this discussion would happily pay the overhead to get better 
management of that; it's one of the easiest things to justify more 
per-table stats on IMHO.  Surely the per-tuple counters are vastly more 
of a problem than these messages could ever be.

But concerns about stats overload are why I was highlighting issues 
around sending multiple messages per vacuum, and why incremental updates 
as it runs are unlikely to work out.  Balancing that trade-off, getting 
enough data to help but not so much that the overhead is obnoxious, is 
the non-obvious, tricky part of the design here.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: tuning autovacuum

From:
Tom Lane
Date:
Greg Smith <greg@2ndQuadrant.com> writes:
> That's fair.  Anyone who is running into the sort of autovacuum issues 
> prompting this discussion would happily pay the overhead to get better 
> management of that; it's one of the easiest things to justify more 
> per-table stats on IMHO.  Surely the per-tuple counters are vastly more 
> of a problem than these messages could ever be.

No, it's the total number of counters that I'm concerned about, not
so much when they get updated.
        regards, tom lane


Re: tuning autovacuum

From:
Bruce Momjian
Date:
Robert Haas wrote:
> On Wed, Jun 8, 2011 at 10:55 PM, Euler Taveira de Oliveira
> <euler@timbira.com> wrote:
> > On 08-06-2011 20:35, Robert Haas wrote:
> >> Is the hint correct?  I mean, what if there were 100 small tables that
> >> needed vacuuming all at the same time.  We'd hit this limit no matter
> >> how high you set autovacuum_max_workers, but it wouldn't be right to
> >> set it to 101 just because every once in a blue moon you might trip
> >> over the limit.
> >>
> > I think so. You are picturing a scene with only one message. It is the same
> > case of the too-frequent-checkpoint messages; i.e., you should look if those
> > messages have some periodicity.
> 
> Yeah, maybe.  I'm just not sure there would be an easy way for users
> to judge when they should or should not make a change.
> 
> >> I think it'd be really useful to expose some more data in this area
> >> though.  One random idea is - remember the time at which a table was
> >> first observed to need vacuuming. Clear the timestamp when it gets
> >> vacuumed.  Then you can do:
> >>
> > Hmmm. But this fine grained information alone doesn't help tuning the number
> > of autovacuum workers. I consider counters easier to implement and simpler
> > to analyze. But the timestamp idea has its merit because we already have a
> > similar statistic (last timestamp table was vacuumed or analyzed).
> 
> Well, it won't directly tell you how many you need.  But certainly if
> you see things getting further and further behind, you know you need
> more.
> 
> Or, alternatively, you need to reduce vacuum_cost_delay.  IME, that's
> actually the most common cause of this problem.

This thread from June died because there was concern about the overhead
of additional autovacuum statistics, and I have to say I am not
super-excited about it either because most users will not use them.

Ideally we would have something like checkpoint_warning that warns users
in the log when there are too few autovacuum workers and cleanup is
being delayed.

The big trick is how to accurately measure this.  The amount of time
that a table waits to be vacuumed probably isn't relevant enough --- it
might have been days since it was last vacuumed, and waiting 10 minutes
isn't a big deal, so it is hard to say what _scale_ we would give users
for that warning that would make sense.  We could compare it to the time
since the last autovacuum, but if the table is suddenly heavily
modified, that doesn't help either.

I think it has to drive off of the 'n_dead_tuples' statistic value for the
table.  I was toying with the idea of comparing the n_dead_tuples value
at the time the table is first scanned for autovacuum consideration, and
the value at the time an autovacuum worker actually starts scanning the
table.
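[Editor's note: the comparison described above amounts to a simple delta. A hypothetical sketch; none of these names exist in PostgreSQL, and the statistic itself is only being proposed here:]

```python
# Hypothetical sketch of the proposed metric: record n_dead_tuples when a
# table is first considered for autovacuum, and again when a worker actually
# starts scanning it. The difference estimates how much cleanup work piled
# up while the table waited for a free worker.

def dead_tuple_backlog(n_dead_at_consideration: int, n_dead_at_start: int) -> int:
    """Dead tuples accumulated while the table waited for a worker."""
    # A large, persistent backlog suggests too few workers (or too much
    # cost delay); clamp at zero in case the count dropped in the interim.
    return max(0, n_dead_at_start - n_dead_at_consideration)
```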

The problem there is that if someone does a massive DELETE in that time
interval, or does an UPDATE on all the rows, it would think that
autovacuum should have been there to mark some dead rows, but it was
not.  In the case of DELETE, having autovacuum work earlier would not
have helped, but it would have helped in the UPDATE case.

We could look at table size growth during that period.  If the
autovacuum had run earlier, we would have used that dead space, but it
wasn't recorded by autovacuum yet; again, though, it seems vague.

Ideas?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: tuning autovacuum

From:
Josh Berkus
Date:
> Ideally we would have something like checkpoint_warning that warns users
> in the log when there are too few autovacuum workers and cleanup is
> being delayed.

I don't think that any table-stats based approach is going to work.  I
think you need to measure the queue of tables which need autovacuuming.
So you do something like:

If > 10% of tables and > 10 tables need autovac/autoanalyze for more
than one polling interval in a row, then emit a warning.
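[Editor's note: the rule above can be sketched as follows. Illustrative only; nothing here is an existing PostgreSQL API, and the parameter names are invented for the sketch:]

```python
# Sketch of the proposed warning rule: warn only when both the absolute
# number and the fraction of tables queued for autovacuum stay high for
# more than one polling interval in a row.

def should_warn(queue_lengths, total_tables,
                min_fraction=0.10, min_tables=10, min_intervals=2):
    """queue_lengths: autovac queue size observed at each recent poll."""
    if len(queue_lengths) < min_intervals:
        return False
    recent = queue_lengths[-min_intervals:]
    # Both conditions must hold at every one of the recent polls.
    return all(q > min_tables and q > min_fraction * total_tables
               for q in recent)
```

Requiring the condition to persist across intervals is what keeps a momentary burst of 100 small tables, as discussed earlier in the thread, from triggering a spurious suggestion to raise autovacuum_max_workers.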

Note that there are solutions other than adding workers; the user could
also lower the polling interval, decrease vacuum_delay, or do other
things to make autovac faster.

This would require tracking stats about the size of the autovac queue.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: tuning autovacuum

From:
Robert Haas
Date:
On Fri, Oct 14, 2011 at 12:59 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Ideally we would have something like checkpoint_warning that warns users
>> in the log when there are too few autovacuum workers and cleanup is
>> being delayed.
>
> I don't think that any table-stats based approach is going to work.  I
> think you need to measure the queue of tables which need autovacuuming.
>  So you do something like:
>
> If > 10% of tables and > 10 tables need autovac/autoanalyze for more
> than one polling interval in a row, then emit a warning.
>
> Note that there are solutions other than adding workers; the user could
> also lower the polling interval, decrease vacuum_delay, or do other
> things to make autovac faster.
>
> This would require tracking stats about the size of the autovac queue.

Right.  It's my feeling that that's exactly what we need to do.  It's
similar to what we already do for checkpoint spreading, except applied
to a different system maintenance activity.  What would be really neat
is if we could not just detect the problem, but actually adjust the
cost delay on the fly to try to fix it - again, like we do with
checkpoints.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company