Thread: autovacuum scheduling starvation and frenzy


autovacuum scheduling starvation and frenzy

From: Jeff Janes
In testing 9.4 with some long running tests, I noticed that autovacuum launcher/worker sometimes goes a bit nuts.  It vacuums the same database repeatedly without respect to the nap time.

As far as I can tell, the behavior is the same in older versions, but I haven't tested that.

This is my understanding of what is happening:

If you have a database with a large table in it that has just passed autovacuum_freeze_max_age, all future workers will be funnelled into that database until the wrap-around completes.  But only one of those workers can actually vacuum the one table which is holding back the frozenxid. Maybe the 2nd worker to come along will find other useful work to do, but eventually all the vacuuming that needs doing is already in progress, and so each worker starts up, gets directed to this database, finds it can't help, and exits.  So all other databases are entirely starved of autovacuuming for the entire duration of the wrap-around vacuuming of this one large table.

Also, the launcher decides when to launch the next worker by looking at the scheduled time of the least-recently-vacuumed database (with the implicit intention that that is the one that will get chosen to vacuum next).  But since the worker gets redirected to the wrap-around database instead of the least-recently-vacuumed database, the least-recently-vacuumed database never gets its schedule updated and always looks like it is chronologically overdue.  That means the launcher keeps launching new workers as fast as the previous ones exit, ignoring the nap time.  So there is one long-running worker actually making progress, plus a frenzy of workers all attacking the same database, finding that there is nothing they can do.

I think that a database past autovacuum_freeze_max_age should get first priority, but only if its next scheduled vacuum time is in the past.  If it can beneficially use more than one vacuum worker, they would usually accumulate there naturally within a few naptime iterations [1].  And if it can't usefully use more than one worker, don't prevent other databases from using them.

[1] You could argue that all other max_workers processes could become pinned down in long-running vacuums of other nonrisk databases between the time that the database crosses autovacuum_freeze_max_age (and has its first worker started), and the time its nap time expires and it becomes eligible for a second one.  But that seems like a weak argument, as it could just as easily have happened that all of them got pinned down in nonrisk databases a few transactions *before* the database crossed autovacuum_freeze_max_age in the first place.
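To make the proposed rule concrete, here is a minimal, self-contained sketch of the selection logic.  All types and names here are hypothetical; the real scheduling code lives in do_start_worker() in autovacuum.c and is structured quite differently.

#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/*
 * Sketch: a database past autovacuum_freeze_max_age gets first
 * priority, but only if its next scheduled vacuum time is already in
 * the past; otherwise we fall back to the least-recently-scheduled
 * database as usual.
 */
typedef struct avdb_sketch
{
    const char *name;
    bool        past_freeze_max_age; /* datfrozenxid beyond the limit? */
    time_t      next_scheduled;      /* roughly adl_next_worker */
} avdb_sketch;

static avdb_sketch *
choose_db(avdb_sketch *dbs, int ndbs, time_t now)
{
    avdb_sketch *danger = NULL;
    avdb_sketch *oldest = NULL;
    int         i;

    for (i = 0; i < ndbs; i++)
    {
        avdb_sketch *db = &dbs[i];

        /* wraparound danger wins, but only once the DB is actually due */
        if (db->past_freeze_max_age && db->next_scheduled <= now &&
            (danger == NULL || db->next_scheduled < danger->next_scheduled))
            danger = db;

        /* otherwise track the least-recently-scheduled database */
        if (oldest == NULL || db->next_scheduled < oldest->next_scheduled)
            oldest = db;
    }

    return danger != NULL ? danger : oldest;
}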

Does this analysis and proposal seem sound?

Cheers,

Jeff

Re: autovacuum scheduling starvation and frenzy

From: Alvaro Herrera
Jeff Janes wrote:

> If you have a database with a large table in it that has just passed
> autovacuum_freeze_max_age, all future workers will be funnelled into that
> database until the wrap-around completes.  But only one of those workers
> can actually vacuum the one table which is holding back the frozenxid.
> Maybe the 2nd worker to come along will find other useful work to do, but
> eventually all the vacuuming that needs doing is already in progress, and
> so each worker starts up, gets directed to this database, finds it can't
> help, and exits.  So all other databases are entirely starved of
> autovacuuming for the entire duration of the wrap-around vacuuming of this
> one large table.

Bah.  Of course :-(

Note that if you have two databases in danger of wraparound, the oldest
will always be chosen until it's no longer in danger.  Ignoring the
second one past freeze_max_age seems bad also.

This code is in autovacuum.c, do_start_worker().  Not sure what your
proposal looks like in terms of code.  I think that instead of
trying to get a single target database in that foreach loop, we could
try to build a prioritized list (in-wraparound-danger first, then
in-multixid-wraparound danger, then the one with the oldest autovac time
of all the ones that remain); then recheck the wrap-around condition by
seeing whether there are other workers in that database that started
after the wraparound condition appeared.  If there are, move down the
list.  The first in the list not skipped is chosen for vacuuming.

(Do we need to consider the situation that all databases were skipped by
the above logic, and if so then perhaps pick up the first DB in the
list?)
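As a rough illustration of that ordering (hypothetical structure and field names, not the actual autovacuum.c code), the comparator for such a prioritized list might look like the sketch below; the launcher would then walk the sorted array and pick the first entry not skipped by the recheck described above.

#include <stdbool.h>

typedef struct avdb_pri
{
    bool    xid_danger;         /* past autovacuum_freeze_max_age? */
    bool    mxid_danger;        /* past the multixact equivalent? */
    double  last_autovac_time;  /* when the DB was last processed */
} avdb_pri;

/* qsort() comparator: xid danger first, then multixact danger,
 * then the oldest autovac time among the remainder */
static int
avdb_pri_cmp(const void *a, const void *b)
{
    const avdb_pri *da = (const avdb_pri *) a;
    const avdb_pri *db = (const avdb_pri *) b;

    if (da->xid_danger != db->xid_danger)
        return da->xid_danger ? -1 : 1;
    if (da->mxid_danger != db->mxid_danger)
        return da->mxid_danger ? -1 : 1;
    if (da->last_autovac_time != db->last_autovac_time)
        return (da->last_autovac_time < db->last_autovac_time) ? -1 : 1;
    return 0;
}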

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: autovacuum scheduling starvation and frenzy

From: Jeff Janes
On Thu, May 15, 2014 at 12:55 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Jeff Janes wrote:

> > If you have a database with a large table in it that has just passed
> > autovacuum_freeze_max_age, all future workers will be funnelled into that
> > database until the wrap-around completes.  But only one of those workers
> > can actually vacuum the one table which is holding back the frozenxid.
> > Maybe the 2nd worker to come along will find other useful work to do, but
> > eventually all the vacuuming that needs doing is already in progress, and
> > so each worker starts up, gets directed to this database, finds it can't
> > help, and exits.  So all other databases are entirely starved of
> > autovacuuming for the entire duration of the wrap-around vacuuming of this
> > one large table.

> Bah.  Of course :-(
>
> Note that if you have two databases in danger of wraparound, the oldest
> will always be chosen until it's no longer in danger.  Ignoring the
> second one past freeze_max_age seems bad also.

I'm not sure how bad that is.  If you really do want to get the frozenxid advanced as soon as possible, it makes sense to focus on one at a time, rather than splitting the available IO throughput between two of them.  So I wouldn't go out of my way to enable two to run at the same time, nor go out of my way to prevent it.

If most wrap-around scans were done as part of a true emergency, it would make sense to forbid all other vacuums (but only if you also automatically disabled autovacuum_vacuum_cost_delay as part of the emergency) so as not to divide up the IO throughput.  But most are not emergencies, as 200,000,000 is a long way from 2,000,000,000.


> This code is in autovacuum.c, do_start_worker().  Not sure what your
> proposal looks like in terms of code.

I wasn't sure either; I was mostly trying to analyze the situation.  But I decided just moving the "skipit" chunk of code to above the wrap-around code might work for experimental purposes, as attached.  It has been running for a few hours that way and I no longer see the frenzies occurring whenever pgbench_history gets vacuumed.

But I can't figure out why we sometimes use adl_next_worker and sometimes use last_autovac_time, which makes me question how much I really understand this code.
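In outline, the experiment amounts to swapping the order of two tests.  The toy model below (hypothetical names; the attached patch is against the real do_start_worker() and differs in detail) shows the reordered logic: the recency test now filters every database, wraparound danger included, before any priority decision is made.

#include <stdbool.h>
#include <stddef.h>
#include <time.h>

typedef struct avdb_exp
{
    const char *name;
    bool        past_freeze_max_age;
    time_t      last_visited;
} avdb_exp;

static const char *
pick_database(avdb_exp *dbs, int ndbs, time_t now, int naptime)
{
    const char *fallback = NULL;
    time_t      oldest = 0;
    int         i;

    for (i = 0; i < ndbs; i++)
    {
        /* the "skipit" test, moved above the wraparound test: a
         * database visited within the last naptime is never chosen,
         * wraparound danger or not */
        if (now - dbs[i].last_visited < naptime)
            continue;

        /* wraparound danger still wins among the eligible databases */
        if (dbs[i].past_freeze_max_age)
            return dbs[i].name;

        /* otherwise remember the least-recently-visited database */
        if (fallback == NULL || dbs[i].last_visited < oldest)
        {
            fallback = dbs[i].name;
            oldest = dbs[i].last_visited;
        }
    }
    return fallback;            /* NULL: nothing eligible, just nap */
}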


> I think that instead of
> trying to get a single target database in that foreach loop, we could
> try to build a prioritized list (in-wraparound-danger first, then
> in-multixid-wraparound danger, then the one with the oldest autovac time
> of all the ones that remain); then recheck the wrap-around condition by
> seeing whether there are other workers in that database that started
> after the wraparound condition appeared.

I think we would want to check for one worker that is still running, and at least one other worker that started and completed since the wraparound threshold was exceeded.  If there are multiple tables in the database that need full scanning, it would make sense to have multiple workers.  But if a worker already started and finished without increasing the frozenxid, another attempt probably won't accomplish much either.  But I have no idea how to do that bookkeeping, or how much of an improvement it would be over something simpler.
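Expressed as a predicate, the check described here might look like the sketch below.  This is purely hypothetical; as noted above, how the launcher would actually track these counts is the open question.

#include <stdbool.h>

/* Give up directing more workers at a wraparound database once one
 * worker is still grinding away and at least one other has started
 * and completed since the threshold was crossed without fixing it. */
static bool
give_up_on_wraparound_db(int nworkers_running,
                         int ncompleted_since_threshold)
{
    return nworkers_running >= 1 && ncompleted_since_threshold >= 1;
}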

Cheers,

Jeff
Attachments

Re: autovacuum scheduling starvation and frenzy

From: Jeff Janes
On Thu, May 15, 2014 at 4:06 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, May 15, 2014 at 12:55 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>>
>> Jeff Janes wrote:
>>
>> > If you have a database with a large table in it that has just passed
>> > autovacuum_freeze_max_age, all future workers will be funnelled into
>> > that
>> > database until the wrap-around completes.  But only one of those workers
>> > can actually vacuum the one table which is holding back the frozenxid.
>> > Maybe the 2nd worker to come along will find other useful work to do,
>> > but
>> > eventually all the vacuuming that needs doing is already in progress,
>> > and
>> > so each worker starts up, gets directed to this database, finds it can't
>> > help, and exits.  So all other databases are entirely starved of
>> > autovacuuming for the entire duration of the wrap-around vacuuming of
>> > this
>> > one large table.
>>
>> Bah.  Of course :-(
>>
>> Note that if you have two databases in danger of wraparound, the oldest
>> will always be chosen until it's no longer in danger.  Ignoring the
>> second one past freeze_max_age seems bad also.
>
>
> I'm not sure how bad that is.  If you really do want to get the frozenxid
> advanced as soon as possible, it makes sense to focus on one at a time,
> rather than splitting the available IO throughput between two of them.  So I
> wouldn't go out of my way to enable two to run at the same time, nor go out
> of my way to prevent it.
>
> If most wrap-around scans were done as part of a true emergency, it would
> make sense to forbid all other vacuums (but only if you also automatically
> disabled autovacuum_vacuum_cost_delay as part of the emergency) so as not to
> divide up the IO throughput.  But most are not emergencies, as 200,000,000
> is a long way from 2,000,000,000.
>
>
>>
>>
>> This code is in autovacuum.c, do_start_worker().  Not sure what your
>> proposal looks like in terms of code.
>
>
> I wasn't sure either; I was mostly trying to analyze the situation.  But I
> decided just moving the "skipit" chunk of code to above the wrap-around code
> might work for experimental purposes, as attached.  It has been running for
> a few hours that way and I no longer see the frenzies occurring whenever
> pgbench_history gets vacuumed.

I didn't add this patch to the commitfest, because it was just a point
for discussion and not actually proposed for application.  But it
doesn't seem to have provoked much discussion either.

Should I go add this to the next commitfest?

I do see it listed as a resolved item in
https://wiki.postgresql.org/wiki/PostgreSQL_9.4_Open_Items

But I can't find a commit that would resolve it, so does that mean the
resolution was that the behavior was not new in 9.4 and so didn't need
to be fixed for it?

Cheers,

Jeff



Re: autovacuum scheduling starvation and frenzy

From: Tom Lane
Jeff Janes <jeff.janes@gmail.com> writes:
> I didn't add this patch to the commitfest, because it was just a point
> for discussion and not actually proposed for application.  But it
> doesn't seem to have provoked much discussion either.

> Should I go add this to the next commitfest?

> I do see it listed as a resolved item in
> https://wiki.postgresql.org/wiki/PostgreSQL_9.4_Open_Items

> But I can't find a commit that would resolve it, so does that mean the
> resolution was that the behavior was not new in 9.4 and so didn't need
> to be fixed for it?

It looks to me like Robert added that item to the "open items" page,
but he put it at the bottom --- ie in the "already resolved items"
list:

https://wiki.postgresql.org/index.php?title=PostgreSQL_9.4_Open_Items&diff=22417&oldid=22380

Probably this was a mistake and it should have gone into the still-to-do
list.
        regards, tom lane



Re: autovacuum scheduling starvation and frenzy

From: Robert Haas
On Mon, Jun 23, 2014 at 7:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> I didn't add this patch to the commitfest, because it was just a point
>> for discussion and not actually proposed for application.  But it
>> doesn't seem to have provoked much discussion either.
>
>> Should I go add this to the next commitfest?
>
>> I do see it listed as a resolved item in
>> https://wiki.postgresql.org/wiki/PostgreSQL_9.4_Open_Items
>
>> But I can't find a commit that would resolve it, so does that mean the
>> resolution was that the behavior was not new in 9.4 and so didn't need
>> to be fixed for it?
>
> It looks to me like Robert added that item to the "open items" page,
> but he put it at the bottom --- ie in the "already resolved items"
> list:
>
> https://wiki.postgresql.org/index.php?title=PostgreSQL_9.4_Open_Items&diff=22417&oldid=22380
>
> Probably this was a mistake and it should have gone into the still-to-do
> list.

Yeah.  Oops.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: autovacuum scheduling starvation and frenzy

From: Alvaro Herrera
Jeff Janes wrote:

> > I think that instead of
> > trying to get a single target database in that foreach loop, we could
> > try to build a prioritized list (in-wraparound-danger first, then
> > in-multixid-wraparound danger, then the one with the oldest autovac time
> > of all the ones that remain); then recheck the wrap-around condition by
> > seeing whether there are other workers in that database that started
> > after the wraparound condition appeared.
>
> I think we would want to check for one worker that is still running, and at
> least one other worker that started and completed since the wraparound
> threshold was exceeded.  If there are multiple tables in the database that
> need full scanning, it would make sense to have multiple workers.  But if a
> worker already started and finished without increasing the frozenxid,
> another attempt probably won't accomplish much either.  But I have no idea
> how to do that bookkeeping, or how much of an improvement it would be over
> something simpler.

How about something like this:

* if autovacuum is disabled, then don't check these conditions; the only
reason we're in do_start_worker() in that case is that somebody
signalled postmaster that some database needs a for-wraparound emergency
vacuum.

* if autovacuum is on, and the database was processed less than
autovac_naptime/2 ago, and there are no workers running in that database
now, then ignore the database.

Otherwise, consider it for xid-wraparound vacuuming.  So if we launched
a worker recently, but it already finished, we would start another one.
(If the worker finished, the database should not be in need of a
for-wraparound vacuum again, so this seems sensible).  Also, a database
in danger gets priority again sooner than the full autovac_naptime
period, though not immediately after the previous worker started, which
should give other databases room to be processed.

The attached patch implements that.  I only tested it on HEAD, but
AFAICS it applies cleanly to 9.4 and 9.3; fairly sure it won't apply to
9.2.  Given the lack of complaints, I'm unsure about backpatching
further back than 9.3 anyway.
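Transcribed literally into a single predicate, the test described above might look like this (hypothetical names; the actual attached patch modifies do_start_worker() and differs in detail):

#include <stdbool.h>

static bool
skip_for_wraparound(bool autovacuum_enabled,
                    double secs_since_processed,
                    int nworkers_in_db,
                    double naptime_secs)
{
    /* autovacuum off: we are only here because postmaster was
     * signalled about a wraparound emergency, so never skip */
    if (!autovacuum_enabled)
        return false;

    /* skip if the database was processed less than naptime/2 ago
     * and no workers are running in it right now */
    return secs_since_processed < naptime_secs / 2.0 &&
           nworkers_in_db == 0;
}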

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: autovacuum scheduling starvation and frenzy

From: Alvaro Herrera
Alvaro Herrera wrote:

> The attached patch implements that.  I only tested it on HEAD, but
> AFAICS it applies cleanly to 9.4 and 9.3; fairly sure it won't apply to
> 9.2.  Given the lack of complaints, I'm unsure about backpatching
> further back than 9.3 anyway.

FWIW my intention is to make sure this patch is in 9.4beta3.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: autovacuum scheduling starvation and frenzy

From: Robert Haas
On Tue, Sep 30, 2014 at 5:59 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Jeff Janes wrote:
>> > I think that instead of
>> > trying to get a single target database in that foreach loop, we could
>> > try to build a prioritized list (in-wraparound-danger first, then
>> > in-multixid-wraparound danger, then the one with the oldest autovac time
>> > of all the ones that remain); then recheck the wrap-around condition by
>> > seeing whether there are other workers in that database that started
>> > after the wraparound condition appeared.
>>
>> I think we would want to check for one worker that is still running, and at
>> least one other worker that started and completed since the wraparound
>> threshold was exceeded.  If there are multiple tables in the database that
>> need full scanning, it would make sense to have multiple workers.  But if a
>> worker already started and finished without increasing the frozenxid,
>> another attempt probably won't accomplish much either.  But I have no idea
>> how to do that bookkeeping, or how much of an improvement it would be over
>> something simpler.
>
> How about something like this:
>
> * if autovacuum is disabled, then don't check these conditions; the only
> reason we're in do_start_worker() in that case is that somebody
> signalled postmaster that some database needs a for-wraparound emergency
> vacuum.
>
> * if autovacuum is on, and the database was processed less than
> autovac_naptime/2 ago, and there are no workers running in that database
> now, then ignore the database.
>
> Otherwise, consider it for xid-wraparound vacuuming.  So if we launched
> a worker recently, but it already finished, we would start another one.
> (If the worker finished, the database should not be in need of a
> for-wraparound vacuum again, so this seems sensible).  Also, we give
> priority to a database in danger sooner than the full autovac_naptime
> period; not immediately after the previous worker started, which should
> give room for other databases to be processed.
>
> The attached patch implements that.  I only tested it on HEAD, but
> AFAICS it applies cleanly to 9.4 and 9.3; fairly sure it won't apply to
> 9.2.  Given the lack of complaints, I'm unsure about backpatching
> further back than 9.3 anyway.

This kind of seems like throwing darts at the wall.  It could be
better if we are right to skip the database already being vacuumed for
wraparound, or worse if we're not.

I'm not sure that we should do this at all, or at least not without
testing it extensively first.  We could easily shoot ourselves in the
foot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: autovacuum scheduling starvation and frenzy

From: Alvaro Herrera
Robert Haas wrote:

> This kind of seems like throwing darts at the wall.  It could be
> better if we are right to skip the database already being vacuumed for
> wraparound, or worse if we're not.

Well, it only skips the DB for half the naptime interval, so that other
databases have a chance to be chosen before that.  If you set up a
nonsensical interval such as one day, this might be problematic.

(I'm not sure I understand the darts analogy.)

Maybe instead of some interval we could have a flag that alternates
between on and off: let one other database be chosen, then the one in
danger, then some other database again.  But if you have large numbers
of databases, this isn't much of a solution; you only waste half the
workers rather than all of them .. meh.

Here's another idea: have a counter of the number of tables that are in
danger of xid/multixact wraparound; only let that many workers process
the database in a row.  Of course, the problem is how to determine how
many tables are in danger when we haven't even connected to the database
in the first place.  We could try to store a counter in pgstats, ugh.
Or have the first for-wraparound worker store a number in shared memory
which the launcher can read.  Double ugh.


> I'm not sure that we should do this at all, or at least not without
> testing it extensively first.  We could easily shoot ourselves in the
> foot.

Well, we need to do *something*, because having workers directed towards
a database on which they can't do any good causes problems too -- other
databases accumulate bloat in the meantime.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: autovacuum scheduling starvation and frenzy

From: Robert Haas
On Wed, Oct 1, 2014 at 11:44 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> This kind of seems like throwing darts at the wall.  It could be
>> better if we are right to skip the database already being vacuumed for
>> wraparound, or worse if we're not.
>
> Well, it only skips the DB for half the naptime interval, so that other
> databases have a chance to be chosen before that.  If you set up a
> nonsensical interval such as one day, this might be problematic.
>
> (I'm not sure I understand the darts analogy.)

I guess I meant: this seems pretty hit-or-miss.  I don't see why we
should expect it to be better than what we have now.  Sure, maybe
there's a table in some other database that needs to be vacuumed for
bloat more urgently than a table in the wraparound database needs to
be vacuumed to prevent XID wraparound.  But the reverse could be true
also - in which case your patch could cause a cluster that would
merely have bloated to instead shut down.

The way to really know would be for the AV launcher to have knowledge
of how many tables there are in each database that are beyond the
wraparound threshold and have not already been vacuumed.  Then we could skip
wraparound databases where that number is 0, and give priority to
those where it isn't.  I guess this is more or less what you said in
the portion of your email I'm not quoting here, but like you I'm not
quite sure how to implement that.  Still, I'm reluctant to just go
change the behavior; I think it's optimistic to think that any
algorithm for making decisions without real knowledge will be better
than any other.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company