Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From | David Gould |
---|---|
Subject | Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower. |
Date | |
Msg-id | 20151031000108.20fa7635@engels |
In reply to | Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower. (Jeff Janes <jeff.janes@gmail.com>) |
List | pgsql-bugs |
On Fri, 30 Oct 2015 22:16:04 -0700 Jeff Janes <jeff.janes@gmail.com> wrote:

> On Fri, Oct 30, 2015 at 9:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> >> Tom Lane wrote:
> >>> Good point ... shouldn't we have already checked the stats before
> >>> ever deciding to try to claim the table?
> >
> >> The second check is there to allow for some other worker (or manual
> >> vacuum) having vacuumed it after we first checked, but which had
> >> finished before we check the array of current jobs.
> >
> > I wonder whether that check costs more than it saves.
>
> A single autovacuum worker can run for hours or days. I don't think
> we should start vacuuming a TB-size table because it needed vacuuming
> days ago, when the initial to-do list was built up, but no longer
> does. So some kind of recheck is needed.
>
> I thought of making the recheck first use whichever snapshot we
> currently have hanging around, and only if the table still needs
> vacuuming force a fresh snapshot and re-re-check. The problem with
> that is that any previous snapshot is aggressively destroyed at the
> end of each vacuum or analyze by the EOXact code. So we don't
> actually have a snapshot hanging around to use until we go to the
> work of re-parsing the database stats file. You would have to take
> special steps to exempt AutoVacuumWorker from the EOXact code that
> clears out the stats, and I don't know what the repercussions of that
> might be.
>
> We could also relax the freshness requirements of even the final
> re-check, perhaps dynamically. There is no point in re-parsing a 40MB
> stats file to avoid unnecessarily vacuuming a 16KB table, but parsing
> an 8KB stats file to avoid unnecessarily vacuuming a 1TB table is
> well worth it. That runs into the same problem as above, though: once
> you have destroyed your previous stats snapshot, you no longer have
> the ability to accept stale stats.

I've been thinking about this, and frankly the whole scheme is a bit
suspect. There are a couple of problems:

1. In an 8TB database with 200,000 tables it is not reasonable to
build a list of all the tables to vacuum in advance. Inevitably some
workers will be working off very stale lists.

2. Autovacuum workers do not need a statistics snapshot. They only
need the current statistics for the single table they are considering
vacuuming.

I think the following might be too big to back-patch, or maybe even
too big for 9.5, but a better overall scheme would look something
like this:

- The stats collector accepts a new request message, "give me the
stats for this (dboid, table oid)", and handles it by replying to the
sender (the autovacuum worker) with the current statistics entry for
that table. (See the first sketch below.)

- Instead of building a big list of work up front, the worker just
keeps track of a shared highwater mark of tables worked on and does
an index scan for the next higher oid in pg_class. (See the second
sketch below.) This is similar to my patch in that vacuuming is
ordered so that workers never contend, but instead of working from a
list they look up the next table in pg_class each time, using the oid
index. For each table that might need vacuuming, the worker sends the
stats collector a request for that table's statistics and uses the
reply to determine what action to take.

This avoids both the stale work list and the need for rechecking. It
also eliminates all the flogging of the stats files (I suspect SSD
owners and laptop users will rejoice!).

-dg

--
David Gould              510 282 0869           daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
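To make the first bullet concrete, here is a minimal sketch of what the single-table request and reply could look like as collector messages. PgStat_MsgHdr and PgStat_StatTabEntry are the real types from src/include/pgstat.h; the two message types, their names, and the reply path itself are assumptions for illustration only (the existing collector protocol is one-way UDP, so the reply channel is the hard part):

```c
/* Hypothetical additions to src/include/pgstat.h -- not actual
 * PostgreSQL source.  PgStat_MsgHdr and PgStat_StatTabEntry already
 * exist; the two message shapes below are assumed. */

/* Worker -> collector: ask for one table's current stats entry. */
typedef struct PgStat_MsgTabstatReq
{
    PgStat_MsgHdr m_hdr;        /* standard collector message header */
    Oid           m_databaseid; /* database containing the table */
    Oid           m_tableoid;   /* table whose stats are wanted */
} PgStat_MsgTabstatReq;

/* Collector -> worker: just that one entry, so the worker never has
 * to write out or re-parse a whole stats file. */
typedef struct PgStat_MsgTabstatReply
{
    PgStat_MsgHdr       m_hdr;
    PgStat_StatTabEntry m_entry; /* current counters for the table */
} PgStat_MsgTabstatReply;
```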
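And here is a sketch of the "next table" lookup from the second bullet, written against the ordinary catalog-scan machinery (systable_beginscan and friends, which do exist). The function name, the NULL-snapshot choice, and the shared highwater-mark bookkeeping around it are assumptions; a real worker would still filter on relkind and apply the usual vacuum-threshold checks to whatever this returns:

```c
/* Hypothetical autovacuum worker helper -- not actual PostgreSQL
 * source.  Finds the first pg_class row whose OID exceeds the shared
 * highwater mark by scanning pg_class_oid_index, so workers advance
 * through the catalog in OID order and never contend for a table. */
#include "postgres.h"

#include "access/genam.h"
#include "access/htup_details.h"
#include "access/sysattr.h"
#include "catalog/indexing.h"
#include "utils/fmgroids.h"
#include "utils/rel.h"

static Oid
next_autovacuum_candidate(Relation pg_class_rel, Oid highwater)
{
    ScanKeyData key;
    SysScanDesc scan;
    HeapTuple   tup;
    Oid         result = InvalidOid;

    /* oid > highwater, returned in OID order by the btree index */
    ScanKeyInit(&key,
                ObjectIdAttributeNumber,
                BTGreaterStrategyNumber, F_OIDGT,
                ObjectIdGetDatum(highwater));

    scan = systable_beginscan(pg_class_rel, ClassOidIndexId,
                              true /* use the index */, NULL, 1, &key);
    if ((tup = systable_getnext(scan)) != NULL)
        result = HeapTupleGetOid(tup);  /* first table past the mark */
    systable_endscan(scan);

    return result;      /* InvalidOid: past the end, wrap around to 0 */
}
```

Each worker would advance the shared highwater mark to the OID it claims before starting on that table, which is what keeps two workers from ever picking the same table.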