Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples?
От | Justin Pryzby |
---|---|
Тема | Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples? |
Дата | |
Msg-id | 20220130170050.GS23027@telsasoft.com обсуждение исходный текст |
Ответ на | Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples? (Peter Geoghegan <pg@bowt.ie>) |
Список | pgsql-hackers |
On Thu, Jan 27, 2022 at 01:59:38PM -0800, Peter Geoghegan wrote: > On Thu, Jan 27, 2022 at 12:20 PM Peter Geoghegan <pg@bowt.ie> wrote: > > Both VACUUM and ANALYZE update pg_class.reltuples. But this code seems > > to assume that it's only something that VACUUM can ever do. Why > > wouldn't we expect a plain ANALYZE to have actually been the last > > thing to update pg_class.reltuples for an append-only table? Wouldn't > > that lead to less frequent (perhaps infinitely less frequent) > > vacuuming for an append-only table, relative to the documented > > behavior of autovacuum_vacuum_insert_scale_factor? > > PgStat_StatTabEntry.inserts_since_vacuum will continue to grow and > grow as more tuples are inserted, until VACUUM actually runs, no > matter what. That largely explains why this bug was missed before now: > it's inevitable that inserts_since_vacuum will become large at some > point -- even large relative to a bogus scaled > pg_class.reltuples-at-ANALYZE threshold (unless ANALYZE hasn't been > run since the last VACUUM, in which case pg_class.reltuples will be at > the expected value anyway). And so we'll eventually get to the point > where so many unvacuumed inserted tuples have accumulated that an > insert-driven autovacuum still takes place. Maybe I'm missed your point, but I think this may not rise to the level of being a "bug". If we just vacuumed an insert-only table, and then insert some more, and autoanalyze runs and updates reltuples, what's wrong with vac_ins_scale_factor * reltuples + vac_ins_base_thresh ? You're saying reltuples should be the number of tuples at the last vacuum instead of the most recent value from either vacuum or analyze ? It's true that the vacuum threshold may be hit later than if reltuples hadn't been updated by ANALYZE. If that's what you're referring to, that's the behavior of scale factor in general. If a table is growing in size at a constant rate, analyze will run at decreasing frequency. With the default 20% scale factor, it'll first run at 1.2x the table's initial size, then at 1.44 (not 1.4), then at 1.728 (not 1.6), then at 2.0736 (not 1.8). That's not necessarily desirable, but it's not necessarily wrong, either. If your table doubles in size, you might have to adjust these things. Maybe there should be another knob allowing perfect, "geometric" (or other) frequency, but the behavior is not new in this patch. We talked about that here. https://www.postgresql.org/message-id/flat/20200305172749.GK684%40telsasoft.com#edac59123843f9f8e1abbc2b570c76f1 With the default values, analyze happens after 10% growth, and vacuum happens after 20% (which ends up being 22% of the initial table size). The goal of this patch was to have inserts trigger autovacuum *at all*. This behavior may be subtle or non-ideal, but not a problem? The opposite thing could also happen - *vacuum* could update reltuples, causing the autoanalyze threshold to be hit a bit later. -- Justin
В списке pgsql-hackers по дате отправления: