Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
From | Peter Geoghegan |
---|---|
Subject | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation |
Date | |
Msg-id | CAH2-Wz=pmxLZE+CjU0GgeogQ_gnQqoYo8ov87q6LweXKCUAtgw@mail.gmail.com |
In response to | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Fri, Jan 20, 2023 at 5:47 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah, this is a major reason why I'm very leery about changes in this
> area. A lot of autovacuum behavior is emergent, in the sense that it
> wasn't directly intended by whoever wrote the code. It's just a
> consequence of other decisions that probably seemed very reasonable at
> the time they were made but turned out to have surprising and
> unpleasant consequences.

I certainly agree with your general description of the way things are. To a large degree we're living in a world where DBAs have already compensated for some of the autovacuum shortcomings discussed on this thread. For example, by setting autovacuum_vacuum_scale_factor (and even autovacuum_vacuum_insert_scale_factor) to very low values, to compensate for the issues with random sampling of dead tuples by ANALYZE, and to compensate for the way that VACUUM doesn't reason correctly about how the number of dead tuples changes as VACUUM runs. They might not have thought of it that way -- it could have happened as a byproduct of tuning a production system through trial and error -- but it still counts as compensating for a defect in autovacuum scheduling IMV.

It's actually quite likely that even a strict improvement to (say) autovacuum scheduling will cause some number of regressions, since now what were effectively mitigations become unnecessary. This is somewhat similar to the dynamic with optimizer improvements, where (say) a selectivity estimate function that's better by every available metric can still easily cause regressions that really cannot be blamed on the improvement itself.

I personally believe that it's a price worth paying when it comes to the issues with autovacuum statistics, particularly the dead tuple count issues, since much of the behavior that we sometimes see is just absurdly bad. We have both watertight theoretical arguments and practical experiences pointing in that direction.
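
To make the scale-factor compensation described above concrete, here is a minimal sketch of the documented dead-tuple trigger for autovacuum, which fires once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. The function name and the 100-million-row table are illustrative assumptions; the defaults of 50 and 0.2 are the stock settings. This models only the arithmetic, not the statistics problems discussed in this thread.

```python
def dead_tuple_trigger(reltuples, base_threshold=50, scale_factor=0.2):
    """Dead tuples needed before autovacuum considers a table.

    Mirrors the documented formula:
        autovacuum_vacuum_threshold
          + autovacuum_vacuum_scale_factor * reltuples
    The function name is hypothetical, chosen for this sketch.
    """
    return round(base_threshold + scale_factor * reltuples)


# Hypothetical table with 100 million rows.
stock = dead_tuple_trigger(100_000_000)                      # 20_000_050
tuned = dead_tuple_trigger(100_000_000, scale_factor=0.01)   # 1_000_050
```

With the stock scale factor the table accumulates about 20 million dead tuples before a VACUUM is even considered, which suggests why DBAs end up lowering the setting on large tables.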

> In this particular case, I think that there is a large risk that
> postponing auto-cancellation will make things significantly worse,
> possibly drastically worse, for a certain class of users -
> specifically, those whose vacuums often get auto-cancelled.

I agree that that's a real concern for the autocancellation side of things. That seems quite different to the dead tuples accounting issues, in that nobody would claim that the existing behavior is flagrantly wrong (just that it sometimes causes problems instead of preventing them).

> That's why I really liked your idea of decoupling auto-cancellation
> from XID age. Such an approach can still avoid disabling
> auto-cancellation just because autovacuum_freeze_max_age has been hit,
> but it can also disable it much earlier when it detects that doing so
> is necessary to make progress.

To be clear, I didn't think that that's what Andres was proposing, and my recent v5 doesn't actually do that. Even in v5, it's still fundamentally impossible for autovacuums that are triggered by the tuples inserted or dead tuples thresholds to not be autocancellable.

ISTM that it doesn't make sense to condition the autocancellation behavior on table XID age in the context of dead tuple VACUUMs. It could either be way too early or way too late at that point. I was rather hoping to not have to build the infrastructure required for fully decoupling the autocancellation behavior from the triggering condition (dead tuples vs table XID age) in the scope of this thread/patch, though I can see the appeal of that.

The only reason why I'm using table age at all is because that's how it works already, rightly or wrongly. If nothing else, it's pretty clear that there is no theoretical or practical reason why it has to be exactly the same table age as the one for launching autovacuums to advance relfrozenxid/relminmxid.

In v5 of the patch, the default is to use 1.8x of the threshold that initially makes autovacuum.c want to launch autovacuums to deal with table age. That's based on a suggestion from Andres, but I'd be almost as happy with a default as low as 1.1x or even 1.05x. That's going to make very little difference to those users that really rely on the no-auto-cancellation behavior, while at the same time making things a lot safer for scenarios like the Joyent/Manta "DROP TRIGGER" outage (not perfectly safe, by any means, but meaningfully safer).

--
Peter Geoghegan
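
As a rough illustration of the multipliers discussed in the message above, the sketch below computes the table age at which autovacuums would stop being auto-cancellable, assuming the multiplier is applied to the stock autovacuum_freeze_max_age of 200 million XIDs. The function and constant names are hypothetical and are not taken from the v5 patch; per-table settings and the multixact equivalent are ignored.

```python
# Stock default for autovacuum_freeze_max_age, in transaction IDs.
DEFAULT_FREEZE_MAX_AGE = 200_000_000


def no_autocancel_age(multiplier, freeze_max_age=DEFAULT_FREEZE_MAX_AGE):
    """Table age past which auto-cancellation would be disabled.

    Hypothetical helper: it only models "multiplier x the age that
    first triggers an autovacuum for table age", as described in
    the message, not the patch's real code.
    """
    return round(freeze_max_age * multiplier)


for m in (1.8, 1.1, 1.05):
    print(f"{m}x -> {no_autocancel_age(m):,} XIDs")
# 1.8x -> 360,000,000 XIDs
# 1.1x -> 220,000,000 XIDs
# 1.05x -> 210,000,000 XIDs
```

Under these assumptions, even the 1.05x setting leaves a window of 10 million XIDs in which an autovacuum launched for table age can still be cancelled by a conflicting lock request.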