Обсуждение: What specific circumstances trigger Autovacuum wraparound

Поиск
Список
Период
Сортировка

What specific circumstances trigger Autovacuum wraparound

От
Daniel Lagerman
Дата:
Hello!


And I think i understand how the settings are supposed work, however I do have a question in regards to Autovaccum wrap-around and how it is triggered. I have a pg 9.4.3 server that has one table with calculate age of about 320 million, this is the oldest table in this database and therefore the age of the DB is the same as the table. This is calculated with the SQL provided on the page above.

Before I get into the settings I have I will explain the behavior. Right now there is 6 Autovaccum wraparound running, I have 6 workers configured so nothing strange about that, but what puzzles me is that only 3 out of 6 tables has not recently completed an Autovaccum and only one table was over the old limit when it started.

The settings that are set are as follows:
vacuum_freeze_min_age = 10000000
autovacuum_freeze_max_age = 600000000
vacuum_freeze_table_age = 400000000
vacuum_cost_delay = 10
vacuum_cost_limit = 2000
autovacuum_vacuum_cost_limit = 200
autovacuum_vacuum_cost_delay = 20

All other settings for vacuum/autovacuum are PG 9.4.3 default. And server has been restarted after settings were changed.

If I read the article above and I quote "autovacuum will be invoked on it approximately once every autovacuum_freeze_max_age minus vacuum_freeze_min_age transactions" to me it indicates that this would be around every 590 million transactions after I did my change. But maybe it is to late once it has triggered? With the default values it would be around 150 million which seems to add up more that what I see.

I also read and I quote " so it can be useful to try to maximize the interval between forced autovacuums on very large static tables. Obviously one can do this either by increasing autovacuum_freeze_max_age or decreasing vacuum_freeze_min_age "

In my scenario the table is not static, but I interpret that you could delay vacuum by changing these parameters.

What also puzzles is that even once PG completes an Autovaccum (all table except 3 right now) the Autovacuum wraparound just loops over every table once more. Some tables have been Autovaccumed wraparound 4-5 times since the last restart of PG server 2 days ago. It seems to me that due to that the database itself is over the default limit Autovacuum is running on a loop for all tables to try and get it below that limit.

There are no normal vacuums as I can see, which means I can't run anything like flexible freeze during off hours (middle of the night) without doing a manual cancel query on the Autovaccum which might not be a good idea.

I did attempt to run flexible freeze for 45 hours during the weekend (with very aggressive settings compared to Autovaccum),  and it did not complete the table which is around 1 TB large in that time. I'm worried that these two vacuums are fighting/resetting each other and will never complete if I keep doing that, so I have stopped that for now.

I have another server, same specs, where the same table is actually larger in disk size, than the server i'm currently working on, where I do not see this behavior and vacuum actually completed in a couple of days of running.

Hopefully I have not missed anything very obvious. I'm most interested in and explanation of why PG is behaving as it is, as it's always good to understand why something is happening even if you can't change it and also in regards to cancel Autovacuum to run flexible freeze during off hours in order to try and speed up the process, is it setting me back or not.

Best regards

Daniel


Re: What specific circumstances trigger Autovacuum wraparound

От
"David G. Johnston"
Дата:
On Monday, June 4, 2018, Daniel Lagerman <spjheruur@gmail.com> wrote:
 I have a pg 9.4.3 server that has one table with calculate age of about 320 million, this is the oldest table in this database and therefore the age of the DB is the same as the table.

It would be advisable to simply upgrade to 9.4.18 and see if any of the various vacuum related bug fixes solves your problem.  That's about the extent of my useful advice for this...

David J.

Re: What specific circumstances trigger Autovacuum wraparound

От
Daniel Lagerman
Дата:
Hello David!

Thanks for taking the time to respond. Upgrading is not in scope right now for this server even though that is certainly something I will do down the line.
I talked to a colleague of mine and he dug up something very interesting and probably answers the behavior I'm seeing, from PG archives. So I thought I would post it here so that if this question comes up again its more easy to find it in the future for PG 9.4 specifically.

There is another setting that controls Autovacuum called autovacuum_multixact_freeze_max_age, in PG 9.5 it is easy to get the correct value for the table to see if you hit the default limit of 400 million, however in PG 9.4 and earlier the function mxid_age() does not exist, so you can't query this value directly, hence this reference from this PG Archive post is crucial:

And in that thread we can see Alvaro Herrera write the following paragraph:

"A better way not involving mxid_age() would be to use pg_controldata to
extract the current value of the mxid counter, then subtract the current
relminmxid from that value."

Given that statement we took a look at this server and the table in question. We could see that NextMultiXactId:  2640838929 and the relminmxid for that specific table was 1791982183. Now if you take 2640838929 - 1791982183 = 848856746 you can see that this value is the double of the default value of 400 million and would explain why vacuum is running:

"As a safety device, a whole-table vacuum scan will occur for any table whose multixact-age is greater than autovacuum_multixact_freeze_max_age. Whole-table vacuum scans will also occur progressively for all tables, starting with those that have the oldest multixact-age, if the amount of used member storage space exceeds the amount 50% of the addressible storage space. Both of these kinds of whole-table scans will occur even if autovacuum is nominally disabled."

This calculation is also confirmed by looking at the git commit of mxid_age() this is exactly what this function returns and even though the 5 other tables are not over this value we suspect that due to that we are using more than 50% of the addressable storage space Autovacuum takes to "opportunity" to run for these tables also.

Now you could of course increase this value also, not sure what the max is, but I rather want to complete vacuum. So the plan for me now is that during the weekend, where there is no activity at all except for a backup, is to increase the work memory from relatively low 1024 MB to use almost all memory I have on this server, set the server to use one worker and then see if Autovacuum by itself can solve the cleanup by itself or at least close the gap so that are confident that we will catch up.

Best regards

Daniel


On Mon, Jun 4, 2018 at 11:47 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Monday, June 4, 2018, Daniel Lagerman <spjheruur@gmail.com> wrote:
 I have a pg 9.4.3 server that has one table with calculate age of about 320 million, this is the oldest table in this database and therefore the age of the DB is the same as the table.

It would be advisable to simply upgrade to 9.4.18 and see if any of the various vacuum related bug fixes solves your problem.  That's about the extent of my useful advice for this...

David J.