Обсуждение: relfrozenxid not getting reset even after manual VACUUM

Поиск
Список
Период
Сортировка

relfrozenxid not getting reset even after manual VACUUM

От
Armand du Plessis
Дата:
Hi there, 

We're running into a scenario where despite doing a manual vacuum as a superuser the relfrozenxid for one relation now dangerously close to wraparound is not getting reset. 

It's a Postgres 9.2.3 cluster. We shutdown other access to the machine while running the VACUUM to ensure it could complete quick enough with an aggressive vacuum (vacuum_cost_limit 10000 and no delay). The previous autovacuum was running for days without completing. 

There's also no old transactions in either pg_prepared_xacts or pg_stat_activity. 

production=# SELECT relname, age(relfrozenxid) as xid_age FROM pg_class WHERE relkind = 'r' ORDER BY age(relfrozenxid) DESC LIMIT 10;
      relname      |  xid_age   
-------------------+------------
 messages          | 2050996318

(This is a similar scenario to one I've posted about a while back (http://www.postgresql.org/message-id/3238.1369055503@sss.pgh.pa.us) - In that case I advanced the autovacuum_freeze_max_age with the intention to replace the problematic table. That hasn't happened in time and now sitting with a real problem)

Below is the vacuum settings at the time of the manual vacuum:

              name               |  setting   
---------------------------------+------------
 autovacuum                      | on
 autovacuum_analyze_scale_factor | 0.01
 autovacuum_analyze_threshold    | 50
 autovacuum_freeze_max_age       | 1750000000
 autovacuum_max_workers          | 2
 autovacuum_naptime              | 60
 autovacuum_vacuum_cost_delay    | 50
 autovacuum_vacuum_cost_limit    | 200
 autovacuum_vacuum_scale_factor  | 2
 autovacuum_vacuum_threshold     | 50
 log_autovacuum_min_duration     | 0
 vacuum_cost_delay               | 0
 vacuum_cost_limit               | 10000
 vacuum_cost_page_dirty          | 20
 vacuum_cost_page_hit            | 1
 vacuum_cost_page_miss           | 10
 vacuum_defer_cleanup_age        | 0
 vacuum_freeze_min_age           | 25000000
 vacuum_freeze_table_age         | 150000000

The tail-end of the vacuum log:

INFO:  index "message_sender_client_ref_confirmation_index" now contains 529204260 row versions in 10752961 pages
DETAIL:  261675128 index row versions were removed.
722622 index pages have been deleted, 367793 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.24 sec.
INFO:  index "messages_v2_recipient_sender" now contains 529194622 row versions in 2632966 pages
DETAIL:  109563299 index row versions were removed.
216413 index pages have been deleted, 132791 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "messages_v6_recipient_created_at_type" now contains 91840693 row versions in 2312775 pages
DETAIL:  177088781 index row versions were removed.
209178 index pages have been deleted, 3045 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.03 sec.
INFO:  index "messages_v6_sender_created_at_type" now contains 91840716 row versions in 2272370 pages
DETAIL:  175741424 index row versions were removed.
201936 index pages have been deleted, 2958 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.04 sec.
INFO:  index "messages_v6_unread_messages_count" now contains 4797806 row versions in 29930 pages
DETAIL:  1808779 index row versions were removed.
77 index pages have been deleted, 5 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "messages": found 51093697 removable, 529671512 nonremovable row versions in 22666431 out of 22666435 pages
DETAIL:  701995 dead row versions cannot be removed yet.
There were 716893565 unused item pointers.
0 pages are entirely empty.
CPU 1229.92s/4618.87u sec elapsed 13951.33 sec.
INFO:  vacuuming "pg_toast.pg_toast_18369"
INFO:  index "pg_toast_18369_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_18369": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

If I look at the pg_stat_user_tables log I can see it's cleaned up the dead tuples successfully:

production=# select relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum from pg_stat_user_tables;
            relname            | n_live_tup | n_dead_tup |          last_vacuum          |        last_autovacuum        
-------------------------------+------------+------------+-------------------------------+-------------------------------
 messages                      |  529790266 |    2046126 | 2013-08-05 13:56:07.794664+00 | 


I've got a ton of entries like the below in the log since we're approaching the end of the world:

013-08-05 15:15:07.588 UTC,,,30352,,51ffc17b.7690,2,,2013-08-05 15:15:07 UTC,9/358152441,3787657047,DEBUG,00000,"transaction ID wrap limit is 3884757767, limited by database with OID 17671",,,,,,,,,""

Doing a VACUUM on any other relation resets the XID correctly. 

Any suggestions on how we can resolve this before we get shutdown? I suspect even in shutdown mode we'll have this issue unless we can track down the cause. 

Kind regards,

Armand

Re: relfrozenxid not getting reset even after manual VACUUM

От
Armand du Plessis
Дата:
After a second manual run it properly reset the relfrozenid on the affected relation.

Only difference this one was a vacuum analyze but I suspect that was not the cause.

On Monday, August 5, 2013, Armand du Plessis wrote:
Hi there, 

We're running into a scenario where despite doing a manual vacuum as a superuser the relfrozenxid for one relation now dangerously close to wraparound is not getting reset. 

It's a Postgres 9.2.3 cluster. We shutdown other access to the machine while running the VACUUM to ensure it could complete quick enough with an aggressive vacuum (vacuum_cost_limit 10000 and no delay). The previous autovacuum was running for days without completing. 

There's also no old transactions in either pg_prepared_xacts or pg_stat_activity. 

production=# SELECT relname, age(relfrozenxid) as xid_age FROM pg_class WHERE relkind = 'r' ORDER BY age(relfrozenxid) DESC LIMIT 10;
      relname      |  xid_age   
-------------------+------------
 messages          | 2050996318

(This is a similar scenario to one I've posted about a while back (http://www.postgresql.org/message-id/3238.1369055503@sss.pgh.pa.us) - In that case I advanced the autovacuum_freeze_max_age with the intention to replace the problematic table. That hasn't happened in time and now sitting with a real problem)

Below is the vacuum settings at the time of the manual vacuum:

              name               |  setting   
---------------------------------+------------
 autovacuum                      | on
 autovacuum_analyze_scale_factor | 0.01
 autovacuum_analyze_threshold    | 50
 autovacuum_freeze_max_age       | 1750000000
 autovacuum_max_workers          | 2
 autovacuum_naptime              | 60
 autovacuum_vacuum_cost_delay    | 50
 autovacuum_vacuum_cost_limit    | 200
 autovacuum_vacuum_scale_factor  | 2
 autovacuum_vacuum_threshold     | 50
 log_autovacuum_min_duration     | 0
 vacuum_cost_delay               | 0
 vacuum_cost_limit               | 10000
 vacuum_cost_page_dirty          | 20
 vacuum_cost_page_hit            | 1
 vacuum_cost_page_miss           | 10
 vacuum_defer_cleanup_age        | 0
 vacuum_freeze_min_age           | 25000000
 vacuum_freeze_table_age         | 150000000

The tail-end of the vacuum log:

INFO:  index "message_sender_client_ref_confirmation_index" now contains 529204260 row versions in 10752961 pages
DETAIL:  261675128 index row versions were removed.
722622 index pages have been deleted, 367793 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.24 sec.
INFO:  index "messages_v2_recipient_sender" now contains 529194622 row versions in 2632966 pages
DETAIL:  109563299 index row versions were removed.
216413 index pages have been deleted, 132791 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "messages_v6_recipient_created_at_type" now contains 91840693 row versions in 2312775 pages
DETAIL:  177088781 index row versions were removed.
209178 index pages have been deleted, 3045 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.03 sec.
INFO:  index "messages_v6_sender_created_at_type" now contains 91840716 row versions in 2272370 pages
DETAIL:  175741424 index row versions were removed.
201936 index pages have been deleted, 2958 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.04 sec.
INFO:  index "messages_v6_unread_messages_count" now contains 4797806 row versions in 29930 pages
DETAIL:  1808779 index row versions were removed.
77 index pages have been deleted, 5 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "messages": found 51093697 removable, 529671512 nonremovable row versions in 22666431 out of 22666435 pages
DETAIL:  701995 dead row versions cannot be removed yet.
There were 716893565 unused item pointers.
0 pages are entirely empty.
CPU 1229.92s/4618.87u sec elapsed 13951.33 sec.
INFO:  vacuuming "pg_toast.pg_toast_18369"
INFO:  index "pg_toast_18369_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_18369": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

If I look at the pg_stat_user_tables log I can see it's cleaned up the dead tuples successfully:

production=# select relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum from pg_stat_user_tables;
            relname            | n_live_tup | n_dead_tup |          last_vacuum          |        last_autovacuum        
-------------------------------+------------+------------+-------------------------------+-------------------------------
 messages                      |  529790266 |    2046126 | 2013-08-05 13:56:07.794664+00 | 


I've got a ton of entries like the below in the log since we're approaching the end of the world:

013-08-05 15:15:07.588 UTC,,,30352,,51ffc17b.7690,2,,2013-08-05 15:15:07 UTC,9/358152441,3787657047,DEBUG,00000,"transaction ID wrap limit is 3884757767, limited by database with OID 17671",,,,,,,,,""

Doing a VACUUM on any other relation resets the XID correctly. 

Any suggestions on how we can resolve this before we get shutdown? I suspect even in shutdown mode we'll have this issue unless we can track down the cause. 

Kind regards,

Armand


--
Sent from Gmail Mobile

Re: relfrozenxid not getting reset even after manual VACUUM

От
Gezeala M. Bacuño II
Дата:

On Mon, Aug 5, 2013 at 9:00 AM, Armand du Plessis <adp@bank.io> wrote:
production=# SELECT relname, age(relfrozenxid) as xid_age FROM pg_class WHERE relkind = 'r' ORDER BY age(relfrozenxid) DESC LIMIT 10;


Wow this is still on 9.2? Most likely some toast tables:

SELECT relname, age(relfrozenxid) as xid_age FROM pg_class WHERE relkind in ('r','t') ORDER BY age(relfrozenxid) DESC LIMIT 10;

--

regards


Re: relfrozenxid not getting reset even after manual VACUUM

От
Albe Laurenz
Дата:
Armand du Plessis wrote:
> We're running into a scenario where despite doing a manual vacuum as a superuser the relfrozenxid for
> one relation now dangerously close to wraparound is not getting reset.
> 
> It's a Postgres 9.2.3 cluster. We shutdown other access to the machine while running the VACUUM to
> ensure it could complete quick enough with an aggressive vacuum (vacuum_cost_limit 10000 and no
> delay). The previous autovacuum was running for days without completing.
> 
> There's also no old transactions in either pg_prepared_xacts or pg_stat_activity.

> The tail-end of the vacuum log:
[...]
> DETAIL:  701995 dead row versions cannot be removed yet.

That's your problem.  Somebody must have been using these rows, unless
there is a bug.  That's an awfully high number too.

Yours,
Laurenz Albe

Re: relfrozenxid not getting reset even after manual VACUUM

От
Armand du Plessis
Дата:
Thanks a lot Albe, Gezeala. 

I missed the 701995 dead rows cannot be removed yet in the logs (was late) and will now know to check the toast tables as well if this happens again. 

Thank you!

Armand


On Tue, Aug 6, 2013 at 9:04 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Armand du Plessis wrote:
> We're running into a scenario where despite doing a manual vacuum as a superuser the relfrozenxid for
> one relation now dangerously close to wraparound is not getting reset.
>
> It's a Postgres 9.2.3 cluster. We shutdown other access to the machine while running the VACUUM to
> ensure it could complete quick enough with an aggressive vacuum (vacuum_cost_limit 10000 and no
> delay). The previous autovacuum was running for days without completing.
>
> There's also no old transactions in either pg_prepared_xacts or pg_stat_activity.

> The tail-end of the vacuum log:
[...]
> DETAIL:  701995 dead row versions cannot be removed yet.

That's your problem.  Somebody must have been using these rows, unless
there is a bug.  That's an awfully high number too.

Yours,
Laurenz Albe