Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
От | Kyotaro HORIGUCHI |
---|---|
Тема | Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru |
Дата | |
Msg-id | 20181009.181536.142257785.horiguchi.kyotaro@lab.ntt.co.jp обсуждение исходный текст |
Ответ на | Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru (Michael Paquier <michael@paquier.xyz>) |
Ответы |
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
(Andrew Dunstan <andrew.dunstan@2ndquadrant.com>)
|
Список | pgsql-hackers |
At Fri, 5 Oct 2018 15:35:04 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20181005063504.GB14664@paquier.xyz> > On Fri, Oct 05, 2018 at 12:16:03PM +0900, Michael Paquier wrote: > > So, I have come back to this stuff, and finished with the attached > > instead, so as the assertion is in a single place. I find that > > clearer. The comments have also been improved. Thoughts? > > And so... I have been looking at committing this thing, and while > testing in-depth I have been able to trigger a case where an autovacuum > has been able to be not aggressive but anti-wraparound, which is exactly > what should not be possible, no? I have simply created an instance with > autovacuum_freeze_max_age = 200000, then ran pgbench with > autovacuum_freeze_table_age=200000 set for each table, and also ran > installcheck-world in parallel. This has been able to trigger the > assertion pretty quickly. I investigated it and in short, it can happen. It is a kind of race consdition between two autovacuum processes. do_autovacuum() looks into pg_class (using a snapshot) and vacuum_set_xid_limits() looks into relcache. If concurrent vacuum happens and one has finished the relation, another gets relcache invalidation and relfrozenxid is updated. If this happens between do_autovacuum() and vacuum_set_xid_limits(), the latter sees newer relfrozenxid than the former. The problem happens when it moves by more than 5% of autovacuum_freeze_max_age. If lazy_vacuum_rel() sees the situation, the relation is already aggressively vacuumed by a cocurrent worker. We can just ingore the state safely but also we know that the vacuum is useless. 1. Just allow the case there (and add comment). Causes redundant anti-wraparound vacuum. 2. Skip the relation by the condition. I think that we can safely skip the relation in the case. (attached) 3. Ensure that do_autovacuum always sees the same relfrozenxid with vacuum_set_xid_limits(). 4. Prevent concurrent acuuming of the same relation rigorously, somehow. Thoughts? regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c index 8996d366e9..436d573454 100644 --- a/src/backend/commands/vacuumlazy.c +++ b/src/backend/commands/vacuumlazy.c @@ -249,6 +249,21 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params, if (options & VACOPT_DISABLE_PAGE_SKIPPING) aggressive = true; + /* + * When is_wraparound, relfrozenxid is old enough and aggressive must be + * true here. However, if another concurrent vacuum process has processed + * this relation, relfrozenxid in relcache can be moved forward enough so + * that aggressive is calculated as false here. This relation no longer + * needs to be vacuumed. Bail out. + */ + if (params->is_wraparound && !aggressive) + { + elog(DEBUG1, "relation %d has been vacuumd ocncurrently, skip", + onerel->rd_id); + return; + } + + vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats)); vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
В списке pgsql-hackers по дате отправления: