I have been debugging a problem on a 9.3.10 Postgres database cluster with over 1200 databases. 10 workers, increased maintenance_work_mem, auto vacuum settings to run more frequently than default. What I will notice is that autovacuum will run for a week or so and traverse databases as expected. I will be able to see that age(datfrozenxid) for all 1200 databases will stay close to autovacuum_freeze_max_age as desired.
Then, suddenly I will see it get “stuck”. Autovacuum launcher will not launch worker processes even though databases start to age past autovacuum_freeze_max_age. If I create a list of databases and sort by age(datfrozenxid), connect to the database with the oldest and execute a simple: "vacuum freeze pg_database;”, autovacuum springs back into action.
It’s never the same database where autovacuum seems to get “stuck”. I’m attempting to gather more debugging information, but, also can’t understand why simply doing a “vacuum freeze pg_database” breaks up the jam.
Any thoughts?
Shawn