Discussion: datfrozenxid not dropping after vacuum
Hi,

Starting this morning at 0830 local time I noticed that my `datfrozenxid` started moving past the `autovacuum_freeze_max_age` value of 200000000. When we've encountered this in the past, the solution has been one of the following:

1. It was related to an error similar to:
```
found xmin 2675436435 from before relfrozenxid 321165377
```
where the fix was to move the `pg_internal.init` file out of the way and let PostgreSQL recreate it. Or;

2. A long-running transaction. Typically I'll find the `idle in transaction` sessions whose `query_start` is around the time my alarm fired (the alarm that notifies me when `datfrozenxid` breaches `autovacuum_freeze_max_age`), using a query similar to:
```
SELECT pid, query_start, datname, usename, state, backend_xmin, age(backend_xmin)
FROM pg_stat_activity
WHERE state = 'idle in transaction';
```

3. The autovacuum process seemed to be "stuck" on a particular table, and we would kill the pid of that autovacuum worker.

The problem is that none of these solutions has brought `datfrozenxid` back down, and there is one specific database in this cluster that's holding onto it.
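[Editor's note: as a hedged variant of the query above, not from the original post, the xmin horizon can also be held back by sessions that are *not* `idle in transaction`, so it can help to rank all backends by `backend_xmin` age. `pg_stat_activity.backend_xmin` and `pg_terminate_backend()` both exist in PostgreSQL 10.]

```sql
-- Sketch: rank all sessions by how old an xmin they are holding.
-- backend_xmin is NULL for backends with no active snapshot.
SELECT pid, state, query_start, age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC
LIMIT 5;

-- If the top entry is a stuck client session (not a vacuum you need),
-- it can be ended with:
--   SELECT pg_terminate_backend(<pid>);   -- <pid> from the query above
```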
Using these queries from CrunchyData:

```
# Show oldest current xid
WITH max_age AS (
    SELECT 2000000000 AS max_old_xid
         , setting AS autovacuum_freeze_max_age
    FROM pg_catalog.pg_settings
    WHERE name = 'autovacuum_freeze_max_age'
)
, per_database_stats AS (
    SELECT datname
         , m.max_old_xid::int
         , m.autovacuum_freeze_max_age::int
         , age(d.datfrozenxid) AS oldest_current_xid
    FROM pg_catalog.pg_database d
    JOIN max_age m ON (true)
    WHERE d.datallowconn
)
SELECT max(oldest_current_xid) AS oldest_current_xid
     , max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound
     , max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovac
FROM per_database_stats;

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          230935699 |                         12 |                               115
```

```
# Top 8 individual databases
SELECT datname
     , age(datfrozenxid)
     , current_setting('autovacuum_freeze_max_age')
FROM pg_database
ORDER BY 2 DESC
LIMIT 8;

       datname       |    age    | current_setting
---------------------+-----------+-----------------
 siteservice         | 230935699 | 200000000
 coupon-ws           | 217613246 | 200000000
 contextchangestore  | 211343280 | 200000000
 template0           | 210351639 | 200000000
 productmanager      | 207876167 | 200000000
 rhistory            | 207876167 | 200000000
 smsservice          | 207876167 | 200000000
 techservice         | 205432524 | 200000000
```

That top database, `siteservice`, is the "problematic" one where a manual vacuum won't coerce it to free up the `datfrozenxid`.
Looking at the tables in that database:

```
# SELECT c.oid::regclass
       , age(c.relfrozenxid)
       , pg_size_pretty(pg_total_relation_size(c.oid))
  FROM pg_class c
  JOIN pg_namespace n ON c.relnamespace = n.oid
  WHERE relkind IN ('r', 't', 'm')
    AND n.nspname NOT IN ('pg_toast')
  ORDER BY 2 DESC
  LIMIT 10;

        oid         |    age    | pg_size_pretty
--------------------+-----------+----------------
 pg_database        | 230935699 | 4264 kB
 pg_proc            |  93543215 | 976 kB
 pg_collation       |  93543215 | 560 kB
 pg_attribute       |  93543215 | 600 kB
 pg_shdepend        |  59515320 | 15 MB
 pg_statistic       |  53828900 | 464 kB
 pg_subscription    |  53172718 | 16 kB
 pg_pltemplate      |  53172718 | 56 kB
 pg_authid          |  53172718 | 8616 kB
 pg_db_role_setting |  53172718 | 64 kB
```

I thought maybe it had to do with my replication slots somehow:

```
# select slot_name, slot_type, database, active, catalog_xmin, restart_lsn, confirmed_flush_lsn
  from pg_replication_slots;

               slot_name                | slot_type |           database            | active | catalog_xmin |  restart_lsn  | confirmed_flush_lsn
----------------------------------------+-----------+-------------------------------+--------+--------------+---------------+---------------------
 dbs1db02                               | physical  |                               | f      |              |               |
 dbs1db01                               | physical  |                               | t      |              | 4D25/ACE6EE08 |
 dbs1db03                               | physical  |                               | t      |              | 4D25/ACE6EE08 |
 dbs2db01                               | physical  |                               | t      |              | 4D25/ACE6EE08 |
 debezium_cmanager                      | logical   | campaign-manager              | t      |   2152258063 | 4D25/A421A6C8 | 4D25/ABC18C88
 debezium_rservice                      | logical   | retail-content-service        | t      |   2152238060 | 4D25/8EC403B0 | 4D25/A6105DF8
 debezium_partnerservice                | logical   | partnerservice                | t      |   2152238060 | 4D25/8EC403B0 | 4D25/A5446630
 dbs1_dr                                | physical  |                               | f      |              |               |
```

So I either restarted the physical standbys or restarted the logical connections (Debezium -> Kafka).

I'm also working on a:

```
# vacuumdb --all --jobs=10 --echo --analyze
```

But this is still running (and honestly I'm not really hopeful it's going to do the trick).
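[Editor's note: a hedged companion query, not from the original post, that expresses the slot horizons directly as ages so they can be compared against `age(datfrozenxid)`. The `xmin` and `catalog_xmin` columns of `pg_replication_slots` exist in PostgreSQL 9.6+; `GREATEST` ignores NULL arguments in PostgreSQL.]

```sql
-- Sketch: how far back each replication slot pins the xmin horizons.
-- A slot with a very old xmin/catalog_xmin can hold back freezing
-- cluster-wide until it is advanced or dropped.
SELECT slot_name
     , slot_type
     , active
     , age(xmin)         AS xmin_age
     , age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots
ORDER BY GREATEST(age(xmin), age(catalog_xmin)) DESC NULLS LAST;
```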
My primary, read/write database is PostgreSQL 10.4 (CentOS 7) while my
standby databases have been patched to 10.17.

Any thoughts on this?

Thanks,
Matt
On 2021-Sep-01, Matthew Tice wrote:

[ problem table is pg_database ]

> My primary, read/write database is Postgresql 10.4 (CentOS 7) while my
> standby databases have been patched to 10.17.

Hmm, I think there was a bug in the early 10.x versions where advancing
the xid age of shared tables would not work correctly for some reason ...
Ah yes, this was fixed in 10.5, a mere three years ago:

Author: Andres Freund <andres@anarazel.de>
Branch: master Release: REL_11_BR [a54e1f158] 2018-06-12 11:13:21 -0700
Branch: REL_10_STABLE Release: REL_10_5 [2ce64caaf] 2018-06-12 11:13:21 -0700
Branch: REL9_6_STABLE Release: REL9_6_10 [6a46aba1c] 2018-06-12 11:13:21 -0700
Branch: REL9_5_STABLE Release: REL9_5_14 [14b3ec6f3] 2018-06-12 11:13:21 -0700
Branch: REL9_4_STABLE Release: REL9_4_19 [817f9f9a8] 2018-06-12 11:13:22 -0700
Branch: REL9_3_STABLE Release: REL9_3_24 [9b9b622b2] 2018-06-12 11:13:22 -0700

    Fix bugs in vacuum of shared rels, by keeping their relcache entries current.

    When vacuum processes a relation it uses the corresponding relcache
    entry's relfrozenxid / relminmxid as a cutoff for when to remove
    tuples etc. Unfortunately for nailed relations (i.e. critical system
    catalogs) bugs could frequently lead to the corresponding relcache
    entry being stale.

    This set of bugs could cause actual data corruption as vacuum would
    potentially not remove the correct row versions, potentially reviving
    them at a later point. After 699bf7d05c some corruptions in this vein
    were prevented, but the additional error checks could also trigger
    spuriously. Examples of such errors are:
        ERROR: found xmin ... from before relfrozenxid ...
    and
        ERROR: found multixact ... from before relminmxid ...
    To be caused by this bug the errors have to occur on system catalog
    tables.

    The two bugs are:

    1) Invalidations for nailed relations were ignored, based on the
       theory that the relcache entry for such tables doesn't change.
       Which is largely true, except for fields like relfrozenxid etc.
       This means that changes to relations vacuumed in other sessions
       weren't picked up by already existing sessions. Luckily autovacuum
       doesn't have particularly long-running sessions.

    2) For shared *and* nailed relations, the shared relcache init file
       was never invalidated while running. That means that for such
       tables (e.g. pg_authid, pg_database) it's not just already existing
       sessions that are affected, but even new connections are as well.
       That explains why the reports usually were about pg_authid et al.

    To fix 1), revalidate the rd_rel portion of a relcache entry when
    invalid. This implies a bit of extra complexity to deal with
    bootstrapping, but it's not too bad. The fix for 2) is simpler,
    simply always remove both the shared and local init files.

    Author: Andres Freund
    Reviewed-By: Alvaro Herrera
    Discussion:
        https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de
        https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@mail.gmail.com
        https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@mail.gmail.com
        https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@mail.gmail.com
    Backpatch: 9.3-

-- 
Álvaro Herrera               39°49'30"S 73°17'W
"The number of UNIX installations has risen to 10, and this number
is expected to increase" (UPM, 1972)
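[Editor's note: to see whether the symptom matches bug 2) above, one hedged check (not from the thread) is to look at the shared catalogs' `relfrozenxid` ages; with the bug, a vacuum may have advanced these in `pg_class` while existing sessions, and even new connections, still see stale values cached in `pg_internal.init`.]

```sql
-- Sketch: ages of the shared (cluster-wide) catalogs' relfrozenxid.
-- pg_class.relisshared marks catalogs shared across all databases,
-- e.g. pg_authid and pg_database.
SELECT c.oid::regclass      AS shared_catalog
     , age(c.relfrozenxid)  AS frozenxid_age
FROM pg_class c
WHERE c.relisshared
  AND c.relkind = 'r'
ORDER BY 2 DESC;
```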
Hi Alvaro, thanks for the quick reply.

I'm scheduled to do my patching maintenance at the end of this month -
but at this point I don't think I'm going to make it.

Other than patching, is there a workaround? For example, in #2 above:

> The fix for 2) is simpler,
> simply always remove both the shared and local init files.

I'm not familiar with the differences between the 'shared' and 'local'
init files (I'd imagine I referenced a 'local' file in my original
post)?

Thanks!

Matt

On Wed, Sep 1, 2021 at 3:00 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Sep-01, Matthew Tice wrote:
>
> [ problem table is pg_database ]
>
> > My primary, read/write database is Postgresql 10.4 (CentOS 7) while my
> > standby databases have been patched to 10.17.
>
> Hmm, I think there was a bug in the early 10.x versions where advancing
> the xid age of shared tables would not work correctly for some reason ...
> Ah yes, this was fixed in 10.5, a mere three years ago:
>
> [ commit a54e1f158, quoted in full above, trimmed ]
Interestingly enough, I hopped on the database system this morning and
found that `datfrozenxid` had dropped back down below
`autovacuum_freeze_max_age` around 0200 local time (roughly 18 hours
after the fact). Looking through the PostgreSQL logs I don't see
anything standing out at that time.

I still plan on patching to 10.17 tonight.

Matt

On Wed, Sep 1, 2021 at 4:01 PM Matthew Tice <mjtice@gmail.com> wrote:
>
> Hi Alvaro, thanks for the quick reply.
>
> I'm scheduled to do my patching maintenance at the end of this month -
> but at this point I don't think I'm going to make it.
>
> Other than patching, is there a work around? For example, in #2 above:
> > The fix for 2) is simpler,
> > simply always remove both the shared and local init files.
>
> I'm not familiar with the differences between 'shared' and 'local'
> init files (I'd imagine I referenced a 'local' file in my original
> post)?
>
> Thanks!
>
> Matt
>
> On Wed, Sep 1, 2021 at 3:00 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > Hmm, I think there was a bug in the early 10.x versions where advancing
> > the xid age of shared tables would not work correctly for some reason ...
> >
> > [ commit message trimmed ]
On 2021-Sep-01, Matthew Tice wrote:

> Hi Alvaro, thanks for the quick reply.

Hi. Glad to hear that your problem is now behind you.

> I'm scheduled to do my patching maintenance at the end of this month -
> but at this point I don't think I'm going to make it.
>
> Other than patching, is there a work around?

Hm, in my previous reply I had written a suggestion to vacuum
pg_database in the offending database after deleting the
pg_internal.init file, but evidently I edited it out before sending.
(Probably because I wasn't sure whether you need to delete the file,
connect, vacuum; or rather connect, delete the file, vacuum.)

> For example, in #2 above:
> > The fix for 2) is simpler,
> > simply always remove both the shared and local init files.
>
> I'm not familiar with the differences between 'shared' and 'local'
> init files (I'd imagine I referenced a 'local' file in my original
> post)?

The global file is in the global/ subdirectory of the data directory,
and the "local" ones are each in the corresponding database directory:

```
$ cd $PGDATA
$ find . -name pg_internal.init
./base/12758/pg_internal.init
./base/46212/pg_internal.init
./global/pg_internal.init

$ psql -c "select oid, datname from pg_database"
  oid  |  datname
-------+------------
 12757 | postgres
 12758 | alvherre
     1 | template1
 12756 | template0
 46212 | regression
(5 rows)
```

So in the above there are cache files for databases regression and
alvherre, plus the global one.

-- 
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
"There are two moments in a man's life when he should not speculate:
when he can afford to and when he cannot" (Mark Twain)
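[Editor's note: putting the thread's pieces together, a hedged sketch of the workaround discussed above. The ordering is explicitly uncertain in the thread, the database name `siteservice` is taken from the earlier posts, and `VACUUM FREEZE` on `pg_database` is an assumption about which relation needs vacuuming; rehearse on a non-production copy first.]

```shell
# Sketch only, not verbatim from the thread: remove both relcache init
# files (shared/global and per-database) so sessions rebuild them with
# current relfrozenxid values, then vacuum pg_database from the affected
# database. Run on the primary; the moves below keep backups in /tmp.
cd "$PGDATA"

# Global (shared catalogs) init file:
mv global/pg_internal.init /tmp/pg_internal.init.global.bak

# Per-database ("local") init file; look up the database's OID first:
dboid=$(psql -Atc "SELECT oid FROM pg_database WHERE datname = 'siteservice'")
mv "base/${dboid}/pg_internal.init" "/tmp/pg_internal.init.${dboid}.bak"

# New connections now rebuild the relcache from the catalogs; vacuum the
# shared catalog so datfrozenxid can advance:
psql -d siteservice -c 'VACUUM FREEZE pg_catalog.pg_database;'
```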