Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From:        Alvaro Herrera
Subject:     Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
Date:        
Msg-id:      20160315204026.GA29216@alvherre.pgsql
In reply to: Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower. (David Gould <daveg@sonic.net>)
Responses:   Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower. (David Gould <daveg@sonic.net>)
List:        pgsql-bugs
David Gould wrote:
> On Mon, 29 Feb 2016 18:33:50 -0300
> Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> > Hi David, did you ever post an updated version of this patch?
>
> No. Let me fix that now. I've attached my current revision of the patch
> based on master. This version is significantly better than the original
> version and resolves multiple issues:

Thanks for the patch.  I spent some time studying it and testing it.

I think there's a minor bug in this latest version: the "continue" taken when the "for" loop exits because all relids are less than highest_oid_claimed should be a "break" instead.  Otherwise the worker will uselessly loop over the whole list of OIDs only to find, each time, that it has nothing to do.  (See the PS below for the loop shape I mean.)

After seeing it work under pgbench inserting and deleting large numbers of tuples into a large number of tables, I figured out that the algorithm you described doesn't actually work in certain corner cases: consider what happens if a table with OID INT_MAX is huge and very bloated.  At some point one worker is going to be stuck working on it for a long time, and from that point until the big vacuum is finished, all workers will see that one worker is processing a table with the maximal OID and decide they can skip all the tables in their work lists.

I tested this by creating one table after all the other test tables were created (so it'd have a higher OID than the others -- but keep an eye on OID wraparound in your test install), setting a nonzero vacuum_cost_limit, then filling it with dead tuples in pgbench via "begin; insert into deadtups select * from generate_series(1, 100000); rollback".  I can see that one autovacuum process is stuck with it for over a minute, and while that one runs there is no other autovacuum activity in the database.

We need to cater for this scenario, because it is going to bite somebody.  I think it might work to still use the OID-sorted list of tables to process, but instead of pruning based just on the highest OID across all workers, we would need to consider only the OID ranges that we've seen processed while the current worker has been running.  Seems fiddly :-(

Another thing that makes me uneasy about this approach is that we've had complaints that tables which generate just enough dead tuples between autovacuum runs are vacuumed by all workers all the time, causing vacuum-starvation of other tables that accumulate more dead tuples.  It seems to me we should prioritise tables for vacuuming depending on how badly they actually need it.  If we depend on an algorithm that sorts by OID, we will never be able to prioritise anything.

One seemingly trivial way to attack this problem is to forget the sorting idea and just keep track in shared memory of the last (say) 100 tables that have been vacuumed by each worker, along with a timestamp for each.  Using that, each worker can determine whether a table on its own list has been vacuumed after it obtained its list, and skip it if so.  Since no specific sort order is required, we could in the future change the order in which tables are processed for other reasons.  (A rough sketch is in the PPS below.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
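PS: To be concrete about the loop I mean, here is a rough sketch -- only
highest_oid_claimed is a name taken from the discussion; table_oids and
everything else I invented for illustration, and I'm passing the watermark
as a parameter rather than re-reading it from shared memory just to keep
the sketch short:

    /*
     * Illustrative only, not the patch's actual code.  table_oids is
     * the worker's to-do list, sorted by ascending OID;
     * highest_oid_claimed is the largest OID any worker has claimed
     * so far (in reality refreshed from shared memory as workers
     * make progress).
     */
    static void
    process_tables_sketch(List *table_oids, Oid highest_oid_claimed)
    {
        ListCell   *cell;

        foreach(cell, table_oids)
        {
            Oid         relid = lfirst_oid(cell);

            if (relid <= highest_oid_claimed)
            {
                /*
                 * Some worker already covers this OID.  If even the
                 * last (largest) OID in the sorted list is covered,
                 * every remaining iteration would land here too, so
                 * stop now ("break") instead of scanning the rest of
                 * the list ("continue") only to find nothing to do.
                 */
                if (llast_oid(table_oids) <= highest_oid_claimed)
                    break;
                continue;
            }

            /* ... claim relid and vacuum it ... */
        }
    }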
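PPS: And a minimal sketch of the "recently vacuumed" idea.  Everything
here is invented (the names, the sizes, and a single shared ring rather
than one per worker, to keep it short); it shows the shape, it is not a
patch:

    #include "postgres.h"
    #include "utils/timestamp.h"

    #define AV_RECENT_SIZE 100          /* "the last (say) 100 tables" */

    typedef struct AVRecentEntry
    {
        Oid         relid;              /* InvalidOid while the slot is unused */
        TimestampTz vacuumed_at;
    } AVRecentEntry;

    /* lives in shared memory; accessed under some autovacuum lock */
    typedef struct AVRecentRing
    {
        int             next;           /* next slot to overwrite */
        AVRecentEntry   entries[AV_RECENT_SIZE];
    } AVRecentRing;

    /* Record relid right after a worker finishes vacuuming it. */
    static void
    av_record_vacuumed(AVRecentRing *ring, Oid relid)
    {
        ring->entries[ring->next].relid = relid;
        ring->entries[ring->next].vacuumed_at = GetCurrentTimestamp();
        ring->next = (ring->next + 1) % AV_RECENT_SIZE;
    }

    /*
     * Before processing relid, a worker asks: has some worker vacuumed
     * this table since I built my work list?  If so, skip it.
     */
    static bool
    av_vacuumed_since(AVRecentRing *ring, Oid relid,
                      TimestampTz list_built_at)
    {
        int         i;

        for (i = 0; i < AV_RECENT_SIZE; i++)
        {
            if (ring->entries[i].relid == relid &&
                ring->entries[i].vacuumed_at >= list_built_at)
                return true;
        }
        return false;
    }

A linear scan of 100 entries per table is obviously cheap next to the
cost of vacuuming anything, and nothing in this depends on the order of
the work list -- which is the point.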