Обсуждение: could not access status of transaction pg_multixact issue
I have this issue for 1 table. Version is 9.3.5 upgraded from 9.2 with pg_upgrade a few months ago. This issues just started in the last couple of days. acustream=# SELECT count(*) from phyorg_charges_to_invoice; ERROR: could not access status of transaction 267035 DETAIL: Could not open file "pg_multixact/members/10AD6": No such file or directory. This error happens when I try and select or vacuum the table. Inserts still work. I have a hot standby database and I can recover the data from there. Is there any work around for this? From pg_class table_name | relfrozenxid ---------------------------+----------phyorg_charges_to_invoice | 84645615 (1 row) acustream=# SELECT datname, age(datfrozenxid) FROM pg_database; datname | age --------------+----------acustream | 84652806 pg_controldata: pg_control version number: 937 Catalog version number: 201306121 Database system identifier: 6006392596691943358 Database cluster state: in production pg_control last modified: Wed 08 Oct 2014 11:48:43 AM MDT Latest checkpoint location: 342/B225BBF8 Prior checkpoint location: 342/59915308 Latest checkpoint's REDO location: 342/8B0AF830 Latest checkpoint's REDO WAL file: 00000001000003420000008B Latest checkpoint's TimeLineID: 1 Latest checkpoint's PrevTimeLineID: 1 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0/84922090 Latest checkpoint's NextOID: 274840 Latest checkpoint's NextMultiXactId: 302394 Latest checkpoint's NextMultiOffset: 2391258 Latest checkpoint's oldestXID: 1799 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 84654342 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Time of latest checkpoint: Wed 08 Oct 2014 11:47:51 AM MDT Fake LSN counter for unlogged rels: 0/1 Minimum recovery ending location: 0/0 Min recovery ending loc's timeline: 0 Backup start location: 0/0 Backup end location: 0/0 End-of-backup record required: no Current wal_level setting: hot_standby Current max_connections setting: 200 Current max_prepared_xacts setting: 0 Current max_locks_per_xact setting: 64 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Data page checksum version: 0 ls -l pg_multixact/offsets total 1204 -rw-------. 1 postgres postgres 262144 May 29 23:41 0000 -rw-------. 1 postgres postgres 262144 May 31 01:28 0001 -rw-------. 1 postgres postgres 262144 Aug 8 09:58 0002 -rw-------. 1 postgres postgres 262144 Sep 30 16:35 0003 -rw-------. 1 postgres postgres 163840 Oct 8 11:31 0004 -- View this message in context: http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
On 10/08/2014 11:00 AM, jim_yates wrote: > I have this issue for 1 table. Version is 9.3.5 upgraded from 9.2 with > pg_upgrade a few months ago. > > This issues just started in the last couple of days. > > acustream=# SELECT count(*) from phyorg_charges_to_invoice; > ERROR: could not access status of transaction 267035 > DETAIL: Could not open file "pg_multixact/members/10AD6": No such file or > directory. > > This error happens when I try and select or vacuum the table. Inserts still > work. I have a hot standby database and I can recover the data from there. > Is there any work around for this? > http://www.postgresql.org/docs/9.3/interactive/release-9-3-5.html E.1.2. Changes In pg_upgrade, remove pg_multixact files left behind by initdb (Bruce Momjian) If you used a pre-9.3.5 version of pg_upgrade to upgrade a database cluster to 9.3, it might have left behind a file $PGDATA/pg_multixact/offsets/0000 that should not be there and will eventually cause problems in VACUUM. However, in common cases this file is actually valid and must not be removed. To determine whether your installation has this problem, run this query as superuser, in any database of the cluster: WITH list(file) AS (SELECT * FROM pg_ls_dir('pg_multixact/offsets')) SELECT EXISTS (SELECT * FROM list WHERE file= '0000') AND NOT EXISTS (SELECT * FROM list WHERE file = '0001') AND NOT EXISTS (SELECT * FROM listWHERE file = 'FFFF') AND EXISTS (SELECT * FROM list WHERE file != '0000') AS file_0000_removal_required; If this query returns t, manually remove the file $PGDATA/pg_multixact/offsets/0000. Do nothing if the query returns f. -- Adrian Klaver adrian.klaver@aklaver.com
jim_yates wrote: > I have this issue for 1 table. Version is 9.3.5 upgraded from 9.2 with > pg_upgrade a few months ago. > > This issues just started in the last couple of days. > > acustream=# SELECT count(*) from phyorg_charges_to_invoice; > ERROR: could not access status of transaction 267035 > DETAIL: Could not open file "pg_multixact/members/10AD6": No such file or > directory. > > This error happens when I try and select or vacuum the table. Inserts still > work. I have a hot standby database and I can recover the data from there. > Is there any work around for this? Please see this thread: http://www.postgresql.org/message-id/loom.20140930T022539-927@post.gmane.org My bet is you upgraded to a 9.3 version older than 9.3.5; this means your relminmxid/datminmxid are set to 1 and need to be updated. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Adrian, the query for the 0000 removal returns f. Alvaro, Yes the pg_upgrade was done with 9.3.4. Following the bug #11264 you referenced: select datname, datfrozenxid, datminmxid from pg_database; datname | datfrozenxid | datminmxid --------------+--------------+------------ acustream | 1799 | 1 > A better way not involving mxid_age() would be to use pg_controldata to > extract the current value of the mxid counter, then subtract the current > relminmxid from that value. It's not clear which lines from pg_controldata to use for updating pg_database.datminmxid. I also assume I would do the pg_database update on a idle database. -- View this message in context: http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822268.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
jim_yates wrote: > > A better way not involving mxid_age() would be to use pg_controldata to > > extract the current value of the mxid counter, then subtract the current > > relminmxid from that value. > > > It's not clear which lines from pg_controldata to use for updating > pg_database.datminmxid. The one labelled NextMultiXactId. > I also assume I would do the pg_database update on a idle database. It doesn't matter, actually. pg_database is a shared catalog, so an update would affect all the databases. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera-9 wrote > The one labelled NextMultiXactId. > >> I also assume I would do the pg_database update on a idle database. > > It doesn't matter, actually. pg_database is a shared catalog, so an > update would affect all the databases. > > -- > Álvaro Herrera http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services Right now my Latest checkpoint's NextMultiXactId is 302870 and the relminmxid from pg_class for the table in question is 1. So I would set the pg_database.datxminmxid to 302869. -- View this message in context: http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822295.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Alvaro Herrera-9 wrote > jim_yates wrote: > >> > A better way not involving mxid_age() would be to use pg_controldata to >> > extract the current value of the mxid counter, then subtract the >> current >> > relminmxid from that value. >> >> >> It's not clear which lines from pg_controldata to use for updating >> pg_database.datminmxid. > > The one labelled NextMultiXactId. > >> I also assume I would do the pg_database update on a idle database. > > It doesn't matter, actually. pg_database is a shared catalog, so an > update would affect all the databases. > > -- > Álvaro Herrera http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services I tried doing the update to pg_database on my Dev server and I can't get it to work. How do I calculate the new datminmxid value? NextMultiXactId: 30349 relminmxid from pg_class for the table: 8376 If I subtract the relminmxid from the nextmulixact I get 21793 which won't work. production-copy=# update pg_database set datminmxid=21973 where datname='production-copy'; ERROR: column "datminmxid" is of type xid but expression is of type integer -- View this message in context: http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822376.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
On 10/09/2014 07:07 AM, jim_yates wrote: > Alvaro Herrera-9 wrote >> jim_yates wrote: >> >>>> A better way not involving mxid_age() would be to use pg_controldata to >>>> extract the current value of the mxid counter, then subtract the >>> current >>>> relminmxid from that value. >>> >>> >>> It's not clear which lines from pg_controldata to use for updating >>> pg_database.datminmxid. >> >> The one labelled NextMultiXactId. >> >>> I also assume I would do the pg_database update on a idle database. >> >> It doesn't matter, actually. pg_database is a shared catalog, so an >> update would affect all the databases. >> >> -- >> Álvaro Herrera http://www.2ndQuadrant.com/ >> PostgreSQL Development, 24x7 Support, Training & Services > > I tried doing the update to pg_database on my Dev server and I can't get it > to work. How do I calculate the new datminmxid value? > > NextMultiXactId: 30349 relminmxid from pg_class for the table: 8376 > > If I subtract the relminmxid from the nextmulixact I get 21793 which won't > work. > > production-copy=# update pg_database set datminmxid=21973 where > datname='production-copy'; > ERROR: column "datminmxid" is of type xid but expression is of type integer > > Casting issue, try: update pg_database set datminmxid='21973' where datname='production-copy'; > -- Adrian Klaver adrian.klaver@aklaver.com
Adrian Klaver wrote: > On 10/09/2014 07:07 AM, jim_yates wrote: > >Alvaro Herrera-9 wrote > >>jim_yates wrote: > >> > >>>>A better way not involving mxid_age() would be to use pg_controldata to > >>>>extract the current value of the mxid counter, then subtract the > >>>current > >>>>relminmxid from that value. > >>> > >>> > >>>It's not clear which lines from pg_controldata to use for updating > >>>pg_database.datminmxid. > >> > >>The one labelled NextMultiXactId. > >> > >>>I also assume I would do the pg_database update on a idle database. > >> > >>It doesn't matter, actually. pg_database is a shared catalog, so an > >>update would affect all the databases. > >> > >>-- > >>Álvaro Herrera http://www.2ndQuadrant.com/ > >>PostgreSQL Development, 24x7 Support, Training & Services > > > >I tried doing the update to pg_database on my Dev server and I can't get it > >to work. How do I calculate the new datminmxid value? > > > >NextMultiXactId: 30349 relminmxid from pg_class for the table: 8376 > > > >If I subtract the relminmxid from the nextmulixact I get 21793 which won't > >work. > > > >production-copy=# update pg_database set datminmxid=21973 where > >datname='production-copy'; > >ERROR: column "datminmxid" is of type xid but expression is of type integer > > Casting issue, try: > > update pg_database set datminmxid='21973' where > datname='production-copy'; There must have been some confusion somewhere; certainly you shouldn't be subtracting anything. The subtraction was just suggested as a way to determine the age. The value to update pg_database.datminmxid to is the oldest one of all the relminmxid in pg_class; so if you have 8376 as the minimum value there, that's what you set pg_database.datminmxid to. Not the 21973 value. update pg_database set datminmxid='8376' where datname='production-copy'; -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera-9 wrote > > There must have been some confusion somewhere; certainly you shouldn't > be subtracting anything. The subtraction was just suggested as a way to > determine the age. The value to update pg_database.datminmxid to is the > oldest one of all the relminmxid in pg_class; so if you have 8376 as the > minimum value there, that's what you set pg_database.datminmxid to. Not > the 21973 value. > > update pg_database set datminmxid='8376' where datname='production-copy'; Ok, I'm still confused. Is the value to use the Max value of relminmxid from pg_class or the value of NextMultiXactId from pg_controldump? Or are these the same thing? -- View this message in context: http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822399.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
jim_yates wrote: > Alvaro Herrera-9 wrote > > > > There must have been some confusion somewhere; certainly you shouldn't > > be subtracting anything. The subtraction was just suggested as a way to > > determine the age. The value to update pg_database.datminmxid to is the > > oldest one of all the relminmxid in pg_class; so if you have 8376 as the > > minimum value there, that's what you set pg_database.datminmxid to. Not > > the 21973 value. > > > > update pg_database set datminmxid='8376' where datname='production-copy'; > > Ok, I'm still confused. Is the value to use the Max value of relminmxid > from pg_class or the value of NextMultiXactId from pg_controldump? Or are > these the same thing? You need to *minimum* of all the values in relminmxid (excluding those that are zero). The pg_controldata value is a mechanism to determine which value is minimum -- when there's been wraparound, the minimum might be "smaller than zero", i.e. something like 4 thousand millions. I doubt this is your case though. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera-9 wrote > You need to *minimum* of all the values in relminmxid (excluding those > that are zero). > > The pg_controldata value is a mechanism to determine which value is > minimum -- when there's been wraparound, the minimum might be "smaller > than zero", i.e. something like 4 thousand millions. I doubt this is > your case though. > > -- > Álvaro Herrera http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services Then I'm really confused. The minimum relminmxid for all the rows in pg_class that have relminmxid greater then zero is 1. That's the current value of datminmxid in pg_database. And the NextMultiXactId from pg_controldump is 303464. So if I use the min value from pg_class then I have some other issue. Where should I get the new pg_database value from? -- View this message in context: http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822449.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
jim_yates wrote: > Then I'm really confused. > The minimum relminmxid for all the rows in pg_class that have relminmxid > greater then zero is 1. > That's the current value of datminmxid in pg_database. > > And the NextMultiXactId from pg_controldump is 303464. > > So if I use the min value from pg_class then I have some other issue. > > Where should I get the new pg_database value from? I'm deep in another issue which I don't want to page out right now, but try vacuuming the tables that have relminmxid=1 with low values set for vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age, say 100000. (I think 65536 ought to get you beyond segment pg_multixact/offset/0000, and then that file would be removed.) Since any multixact values below the point at which pg_upgrade ran should be marked "no longer running" through hint bits, there would be no pg_multixact lookups anyway and thus the vacuuming should complete with no errors. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera-9 wrote > I'm deep in another issue which I don't want to page out right now, but > try vacuuming the tables that have relminmxid=1 with low values set for > vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age, > say 100000. (I think 65536 ought to get you beyond segment > pg_multixact/offset/0000, and then that file would be removed.) Since > any multixact values below the point at which pg_upgrade ran should be > marked "no longer running" through hint bits, there would be no > pg_multixact lookups anyway and thus the vacuuming should complete with > no errors. > > -- > Álvaro Herrera http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services No joy: acustream=# set vacuum_multixact_freeze_table_age=100000; SET acustream=# set vacuum_multixact_freeze_min_age=100000; SET acustream=# vacuum phyorg_charges_to_invoice; ERROR: could not access status of transaction 182579 DETAIL: Could not open file "pg_multixact/members/EEAC": No such file or directory. -- View this message in context: http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822482.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Alvaro Herrera-9 wrote > jim_yates wrote: > >> Then I'm really confused. >> The minimum relminmxid for all the rows in pg_class that have relminmxid >> greater then zero is 1. >> That's the current value of datminmxid in pg_database. >> >> And the NextMultiXactId from pg_controldump is 303464. >> >> So if I use the min value from pg_class then I have some other issue. >> >> Where should I get the new pg_database value from? > > I'm deep in another issue which I don't want to page out right now, but > try vacuuming the tables that have relminmxid=1 with low values set for > vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age, > say 100000. (I think 65536 ought to get you beyond segment > pg_multixact/offset/0000, and then that file would be removed.) Since > any multixact values below the point at which pg_upgrade ran should be > marked "no longer running" through hint bits, there would be no > pg_multixact lookups anyway and thus the vacuuming should complete with > no errors. > > -- > Álvaro Herrera http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services I set vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age to 100,000 and vacuumed all the tables with a relminmxid='1' and relkind='r' using pg_class as the source. I still couldn't vacuum or select the original table with the issue. I did solve the problem by dropping the table and restoring from my standby server. Is there else anything I need to do to prevent being bitten by this bug again? I still have a value of 1 for datminmxid in pg_database, and the 0000 file is still in pg_multixact/members and offsets. -- View this message in context: http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822573.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
jim_yates wrote: > Alvaro Herrera-9 wrote > > jim_yates wrote: > > > >> Then I'm really confused. > >> The minimum relminmxid for all the rows in pg_class that have relminmxid > >> greater then zero is 1. > >> That's the current value of datminmxid in pg_database. > >> > >> And the NextMultiXactId from pg_controldump is 303464. > >> > >> So if I use the min value from pg_class then I have some other issue. > >> > >> Where should I get the new pg_database value from? > > > > I'm deep in another issue which I don't want to page out right now, but > > try vacuuming the tables that have relminmxid=1 with low values set for > > vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age, > > say 100000. (I think 65536 ought to get you beyond segment > > pg_multixact/offset/0000, and then that file would be removed.) Since > > any multixact values below the point at which pg_upgrade ran should be > > marked "no longer running" through hint bits, there would be no > > pg_multixact lookups anyway and thus the vacuuming should complete with > > no errors. > > > > -- > > Álvaro Herrera http://www.2ndQuadrant.com/ > > PostgreSQL Development, 24x7 Support, Training & Services > > I set vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age > to 100,000 and vacuumed all the tables with a relminmxid='1' and relkind='r' > using pg_class as the source. > I still couldn't vacuum or select the original table with the issue. > I did solve the problem by dropping the table and restoring from my standby > server. It might have proven interesting to look into the actual values related to the multixact that caused you grief. It's not clear to me whether the 187k value you got in the error message came from before the upgrade or after. If it's prior to the upgrade, there should have been no lookup of it; if it was after, the pg_multixact files should have been there. I wonder if this is somehow related to this problem: http://www.postgresql.org/message-id/20140330040029.GY4582@tamriel.snowman.net > Is there else anything I need to do to prevent being bitten by this bug > again? Supposedly it's a one-time thing after the upgrade. > I still have a value of 1 for datminmxid in pg_database, and the 0000 file > is still in pg_multixact/members and offsets. Eventually the datminmxid should advance. Make sure the minimum relminmxid is no longer 1. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services