Обсуждение: BUG #16620: Autovacuum does not process certain databases after migration from postgresql 10

Поиск
Список
Период
Сортировка

BUG #16620: Autovacuum does not process certain databases after migration from postgresql 10

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      16620
Logged by:          Max Vikharev
Email address:      bm.kinder@gmail.com
PostgreSQL version: 12.3
Operating system:   Linux
Description:

Hi,

There is a cluster with 50 databases. The sizes of tables are very
different, from megabytes to terabytes.

We have been successfully used the aggressive autovacuum for a long time.
After upgrade from 10 to 12.3 we've noticed that autovacuum always process
relations from one database but does not reach others.

I've used
https://www.depesz.com/2020/01/29/which-tables-should-be-auto-vacuumed-or-auto-analyzed/comment-page-1/
to detect existance of relations that should be autovacuumed. There are a
lot of them. At the same time i see that there are free autovacuum workers
that should do this work. But tables are not processed for a long time.

I have logs (log_autovacuum_min_duration = 0) that confirms that only single
database is constantly being autovacuumed. Within pg_stat_user_tables i see
that last time autovacuum / autoanalyze for  other databases happened in the
same day when "vacuum to prevent wraparound" worked.

Please help to investigate the problem. Why my relations does not processed
and how can i fix it.

Example table
-[ RECORD 1 ]-------+------------------------------
n_tup_ins           | 25599733
n_tup_upd           | 2206685374
n_tup_del           | 19938580
n_tup_hot_upd       | 1863307770
n_live_tup          | 186473676
n_dead_tup          | 18177988
n_mod_since_analyze | 3250337
last_autovacuum     | 2020-08-26 16:43:02.321971+03
last_autoanalyze    | 2020-08-26 14:04:19.132566+03
vacuum_count        | 1
autovacuum_count    | 4078
analyze_count       | 8
autoanalyze_count   | 357

Autovacuum settings
"autovacuum_max_workers" => 5,
"autovacuum_naptime" => "30s",
"autovacuum_vacuum_threshold" => 5000
"autovacuum_vacuum_scale_factor" => 0.00001
"autovacuum_vacuum_cost_delay" => "10ms"
"autovacuum_vacuum_cost_limit" => 6000,
"autovacuum_analyze_threshold" => 5000,
"autovacuum_analyze_scale_factor" => "0.01"


Re: BUG #16620: Autovacuum does not process certain databases after migration from postgresql 10

От
Max Vikharev
Дата:
Problem gone after restarting the cluster. 
Autovacuum started to process relations in other databases.

I dont know how to reproduce the issue, we will monitor it. 
If there any way to debug it when it occurs again - let me know.

ср, 16 сент. 2020 г. в 16:40, PG Bug reporting form <noreply@postgresql.org>:
The following bug has been logged on the website:

Bug reference:      16620
Logged by:          Max Vikharev
Email address:      bm.kinder@gmail.com
PostgreSQL version: 12.3
Operating system:   Linux
Description:       

Hi,

There is a cluster with 50 databases. The sizes of tables are very
different, from megabytes to terabytes.

We have been successfully used the aggressive autovacuum for a long time.
After upgrade from 10 to 12.3 we've noticed that autovacuum always process
relations from one database but does not reach others.

I've used
https://www.depesz.com/2020/01/29/which-tables-should-be-auto-vacuumed-or-auto-analyzed/comment-page-1/
to detect existance of relations that should be autovacuumed. There are a
lot of them. At the same time i see that there are free autovacuum workers
that should do this work. But tables are not processed for a long time.

I have logs (log_autovacuum_min_duration = 0) that confirms that only single
database is constantly being autovacuumed. Within pg_stat_user_tables i see
that last time autovacuum / autoanalyze for  other databases happened in the
same day when "vacuum to prevent wraparound" worked.

Please help to investigate the problem. Why my relations does not processed
and how can i fix it.

Example table
-[ RECORD 1 ]-------+------------------------------
n_tup_ins           | 25599733
n_tup_upd           | 2206685374
n_tup_del           | 19938580
n_tup_hot_upd       | 1863307770
n_live_tup          | 186473676
n_dead_tup          | 18177988
n_mod_since_analyze | 3250337
last_autovacuum     | 2020-08-26 16:43:02.321971+03
last_autoanalyze    | 2020-08-26 14:04:19.132566+03
vacuum_count        | 1
autovacuum_count    | 4078
analyze_count       | 8
autoanalyze_count   | 357

Autovacuum settings
"autovacuum_max_workers" => 5,
"autovacuum_naptime" => "30s",
"autovacuum_vacuum_threshold" => 5000
"autovacuum_vacuum_scale_factor" => 0.00001
"autovacuum_vacuum_cost_delay" => "10ms"
"autovacuum_vacuum_cost_limit" => 6000,
"autovacuum_analyze_threshold" => 5000,
"autovacuum_analyze_scale_factor" => "0.01"

Re: BUG #16620: Autovacuum does not process certain databases after migration from postgresql 10

От
Tom Lane
Дата:
Max Vikharev <bm.kinder@gmail.com> writes:
> Problem gone after restarting the cluster.
> Autovacuum started to process relations in other databases.

Hmm, interesting.

> I dont know how to reproduce the issue, we will monitor it.
> If there any way to debug it when it occurs again - let me know.

Did you by any chance capture the contents of pg_database.datfrozenxid
and datminmxid and compare them to the pg_class.relfrozenxid and
relminmxid fields in the problematic databases?

It's not hard to imagine that if the pg_database fields somehow
didn't get updated correctly during pg_upgrade, that would prevent
autovacuum from processing some databases to prevent wraparound.
However, that doesn't explain failure to examine those databases
at all, so I'm a bit at a loss.

Another thing to check is whether the stats collector is working.
Specifically look at whether counts in pg_stat_all_tables are
incrementing in the problem databases.

My guess is that somehow pg_upgrade left something in a slightly
hosed state, and that restarting de-hosed it, so that you aren't
going to see this again ... at least not till your next upgrade.
But I don't know exactly what the something could be.

            regards, tom lane



Re: BUG #16620: Autovacuum does not process certain databases after migration from postgresql 10

От
Max Vikharev
Дата:
Hi,

The problem occurred again. 

I see that only one database is processed by autovacuum daemon while there are tables in other databases that should be processed and there are free autovacuum workers.

Here is information about forzenxid you requested.
renamed_db=# select datname, datfrozenxid, datminmxid from pg_database where datname = 'renamed_db';
datname | datfrozenxid | datminmxid
-----------+--------------+------------
renamed_db | 2665923884 | 504434938
(1 row)
renamed_db=# select relfrozenxid, relminmxid from pg_class where relname = 'renamed_table_in_renamed_db';
 relfrozenxid | relminmxid
--------------+------------
2742239890 | 504663548
(1 row)
I don't think this is because of pg_upradate because autovacuum worked correctly since the last restart. 
It looks like autovacuum hangs on the one database.

Im sure that the problem will be solved by restarting. 
I would like to find reasons before.
Please help investigate the problem.

чт, 17 сент. 2020 г. в 04:19, Tom Lane <tgl@sss.pgh.pa.us>:
Max Vikharev <bm.kinder@gmail.com> writes:
> Problem gone after restarting the cluster.
> Autovacuum started to process relations in other databases.

Hmm, interesting.

> I dont know how to reproduce the issue, we will monitor it.
> If there any way to debug it when it occurs again - let me know.

Did you by any chance capture the contents of pg_database.datfrozenxid
and datminmxid and compare them to the pg_class.relfrozenxid and
relminmxid fields in the problematic databases?

It's not hard to imagine that if the pg_database fields somehow
didn't get updated correctly during pg_upgrade, that would prevent
autovacuum from processing some databases to prevent wraparound.
However, that doesn't explain failure to examine those databases
at all, so I'm a bit at a loss.

Another thing to check is whether the stats collector is working.
Specifically look at whether counts in pg_stat_all_tables are
incrementing in the problem databases.

My guess is that somehow pg_upgrade left something in a slightly
hosed state, and that restarting de-hosed it, so that you aren't
going to see this again ... at least not till your next upgrade.
But I don't know exactly what the something could be.

                        regards, tom lane

Re: BUG #16620: Autovacuum does not process certain databases after migration from postgresql 10

От
Max Vikharev
Дата:
Hi there,

Last time the problem was solved again by restarting the cluster.
Last saturday (after two weeks) the problem occurred again. Currently only one database is processed by autovacuum daemon while there are tables in other databases that should be processed and there are free autovacuum workers.
We noticed that just before the last other database was processed were few deadlocks in the log on the hanged database. Also there was wal-e backup started in one hour after it.

Any suggestions on how to find the sources of the problem?

пт, 25 сент. 2020 г. в 16:53, Max Vikharev <bm.kinder@gmail.com>:
Hi,

The problem occurred again. 

I see that only one database is processed by autovacuum daemon while there are tables in other databases that should be processed and there are free autovacuum workers.

Here is information about forzenxid you requested.
renamed_db=# select datname, datfrozenxid, datminmxid from pg_database where datname = 'renamed_db';
datname | datfrozenxid | datminmxid
-----------+--------------+------------
renamed_db | 2665923884 | 504434938
(1 row)
renamed_db=# select relfrozenxid, relminmxid from pg_class where relname = 'renamed_table_in_renamed_db';
 relfrozenxid | relminmxid
--------------+------------
2742239890 | 504663548
(1 row)
I don't think this is because of pg_upradate because autovacuum worked correctly since the last restart. 
It looks like autovacuum hangs on the one database.

Im sure that the problem will be solved by restarting. 
I would like to find reasons before.
Please help investigate the problem.

чт, 17 сент. 2020 г. в 04:19, Tom Lane <tgl@sss.pgh.pa.us>:
Max Vikharev <bm.kinder@gmail.com> writes:
> Problem gone after restarting the cluster.
> Autovacuum started to process relations in other databases.

Hmm, interesting.

> I dont know how to reproduce the issue, we will monitor it.
> If there any way to debug it when it occurs again - let me know.

Did you by any chance capture the contents of pg_database.datfrozenxid
and datminmxid and compare them to the pg_class.relfrozenxid and
relminmxid fields in the problematic databases?

It's not hard to imagine that if the pg_database fields somehow
didn't get updated correctly during pg_upgrade, that would prevent
autovacuum from processing some databases to prevent wraparound.
However, that doesn't explain failure to examine those databases
at all, so I'm a bit at a loss.

Another thing to check is whether the stats collector is working.
Specifically look at whether counts in pg_stat_all_tables are
incrementing in the problem databases.

My guess is that somehow pg_upgrade left something in a slightly
hosed state, and that restarting de-hosed it, so that you aren't
going to see this again ... at least not till your next upgrade.
But I don't know exactly what the something could be.

                        regards, tom lane