Обсуждение: could not access status of transaction pg_multixact issue

Поиск
Список
Период
Сортировка

could not access status of transaction pg_multixact issue

От
jim_yates
Дата:
I have this issue for 1 table.   Version is 9.3.5  upgraded from 9.2 with
pg_upgrade a few months ago.

This issues just started in the last couple of days.

acustream=# SELECT count(*) from phyorg_charges_to_invoice;
ERROR:  could not access status of transaction 267035
DETAIL:  Could not open file "pg_multixact/members/10AD6": No such file or
directory.

This error happens when I try and select or vacuum the table.  Inserts still
work.  I have a hot standby database and I can recover the data from there.  
Is there any work around for this?  


From pg_class       table_name         |   relfrozenxid    
---------------------------+----------phyorg_charges_to_invoice | 84645615
(1 row)

acustream=# SELECT datname, age(datfrozenxid) FROM pg_database;  datname    |   age    
--------------+----------acustream    | 84652806

pg_controldata:

pg_control version number:            937
Catalog version number:               201306121
Database system identifier:           6006392596691943358
Database cluster state:               in production
pg_control last modified:             Wed 08 Oct 2014 11:48:43 AM MDT
Latest checkpoint location:           342/B225BBF8
Prior checkpoint location:            342/59915308
Latest checkpoint's REDO location:    342/8B0AF830
Latest checkpoint's REDO WAL file:    00000001000003420000008B
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/84922090
Latest checkpoint's NextOID:          274840
Latest checkpoint's NextMultiXactId:  302394
Latest checkpoint's NextMultiOffset:  2391258
Latest checkpoint's oldestXID:        1799
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  84654342
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:            Wed 08 Oct 2014 11:47:51 AM MDT
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current max_connections setting:      200
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0
ls -l pg_multixact/offsets
total 1204
-rw-------. 1 postgres postgres 262144 May 29 23:41 0000
-rw-------. 1 postgres postgres 262144 May 31 01:28 0001
-rw-------. 1 postgres postgres 262144 Aug  8 09:58 0002
-rw-------. 1 postgres postgres 262144 Sep 30 16:35 0003
-rw-------. 1 postgres postgres 163840 Oct  8 11:31 0004



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: could not access status of transaction pg_multixact issue

От
Adrian Klaver
Дата:
On 10/08/2014 11:00 AM, jim_yates wrote:
> I have this issue for 1 table.   Version is 9.3.5  upgraded from 9.2 with
> pg_upgrade a few months ago.
>
> This issues just started in the last couple of days.
>
> acustream=# SELECT count(*) from phyorg_charges_to_invoice;
> ERROR:  could not access status of transaction 267035
> DETAIL:  Could not open file "pg_multixact/members/10AD6": No such file or
> directory.
>
> This error happens when I try and select or vacuum the table.  Inserts still
> work.  I have a hot standby database and I can recover the data from there.
> Is there any work around for this?
>

http://www.postgresql.org/docs/9.3/interactive/release-9-3-5.html

E.1.2. Changes
    In pg_upgrade, remove pg_multixact files left behind by initdb 
(Bruce Momjian)
    If you used a pre-9.3.5 version of pg_upgrade to upgrade a database 
cluster to 9.3, it might have left behind a file 
$PGDATA/pg_multixact/offsets/0000 that should not be there and will 
eventually cause problems in VACUUM. However, in common cases this file 
is actually valid and must not be removed. To determine whether your 
installation has this problem, run this query as superuser, in any 
database of the cluster:
    WITH list(file) AS (SELECT * FROM pg_ls_dir('pg_multixact/offsets'))    SELECT EXISTS (SELECT * FROM list WHERE
file= '0000') AND           NOT EXISTS (SELECT * FROM list WHERE file = '0001') AND           NOT EXISTS (SELECT * FROM
listWHERE file = 'FFFF') AND           EXISTS (SELECT * FROM list WHERE file != '0000')           AS
file_0000_removal_required;
    If this query returns t, manually remove the file 
$PGDATA/pg_multixact/offsets/0000. Do nothing if the query returns f.


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: could not access status of transaction pg_multixact issue

От
Alvaro Herrera
Дата:
jim_yates wrote:
> I have this issue for 1 table.   Version is 9.3.5  upgraded from 9.2 with
> pg_upgrade a few months ago.
> 
> This issues just started in the last couple of days.
> 
> acustream=# SELECT count(*) from phyorg_charges_to_invoice;
> ERROR:  could not access status of transaction 267035
> DETAIL:  Could not open file "pg_multixact/members/10AD6": No such file or
> directory.
> 
> This error happens when I try and select or vacuum the table.  Inserts still
> work.  I have a hot standby database and I can recover the data from there.  
> Is there any work around for this?  

Please see this thread:
http://www.postgresql.org/message-id/loom.20140930T022539-927@post.gmane.org

My bet is you upgraded to a 9.3 version older than 9.3.5; this means
your relminmxid/datminmxid are set to 1 and need to be updated.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: could not access status of transaction pg_multixact issue

От
jim_yates
Дата:
Adrian, the query for the 0000 removal returns f.

Alvaro, Yes the pg_upgrade was done with 9.3.4.  Following the bug #11264
you referenced:

select datname, datfrozenxid, datminmxid from pg_database;
  datname    | datfrozenxid | datminmxid 
--------------+--------------+------------
acustream    |         1799 |          1


> A better way not involving mxid_age() would be to use pg_controldata to
> extract the current value of the mxid counter, then subtract the current
> relminmxid from that value.


It's not clear which lines from pg_controldata to use for updating
pg_database.datminmxid.

I also assume I would do the pg_database update on a idle database.






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822268.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: could not access status of transaction pg_multixact issue

От
Alvaro Herrera
Дата:
jim_yates wrote:

> > A better way not involving mxid_age() would be to use pg_controldata to
> > extract the current value of the mxid counter, then subtract the current
> > relminmxid from that value.
> 
> 
> It's not clear which lines from pg_controldata to use for updating
> pg_database.datminmxid.

The one labelled NextMultiXactId.

> I also assume I would do the pg_database update on a idle database.

It doesn't matter, actually.  pg_database is a shared catalog, so an
update would affect all the databases.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: could not access status of transaction pg_multixact issue

От
jim_yates
Дата:
Alvaro Herrera-9 wrote
> The one labelled NextMultiXactId.
>
>> I also assume I would do the pg_database update on a idle database.
>
> It doesn't matter, actually.  pg_database is a shared catalog, so an
> update would affect all the databases.
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

Right now my Latest checkpoint's NextMultiXactId is  302870
and the relminmxid from pg_class for the table in question is 1.

So I would set the pg_database.datxminmxid to 302869.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822295.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: could not access status of transaction pg_multixact issue

От
jim_yates
Дата:
Alvaro Herrera-9 wrote
> jim_yates wrote:
>
>> > A better way not involving mxid_age() would be to use pg_controldata to
>> > extract the current value of the mxid counter, then subtract the
>> current
>> > relminmxid from that value.
>>
>>
>> It's not clear which lines from pg_controldata to use for updating
>> pg_database.datminmxid.
>
> The one labelled NextMultiXactId.
>
>> I also assume I would do the pg_database update on a idle database.
>
> It doesn't matter, actually.  pg_database is a shared catalog, so an
> update would affect all the databases.
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

I tried doing the update to pg_database on my Dev server and I can't get it
to work.  How do I calculate the new datminmxid value?

NextMultiXactId:  30349  relminmxid from pg_class for the table: 8376

If I subtract the relminmxid from the nextmulixact I get 21793 which won't
work.

production-copy=# update pg_database set datminmxid=21973 where
datname='production-copy';
ERROR:  column "datminmxid" is of type xid but expression is of type integer






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822376.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: could not access status of transaction pg_multixact issue

От
Adrian Klaver
Дата:
On 10/09/2014 07:07 AM, jim_yates wrote:
> Alvaro Herrera-9 wrote
>> jim_yates wrote:
>>
>>>> A better way not involving mxid_age() would be to use pg_controldata to
>>>> extract the current value of the mxid counter, then subtract the
>>> current
>>>> relminmxid from that value.
>>>
>>>
>>> It's not clear which lines from pg_controldata to use for updating
>>> pg_database.datminmxid.
>>
>> The one labelled NextMultiXactId.
>>
>>> I also assume I would do the pg_database update on a idle database.
>>
>> It doesn't matter, actually.  pg_database is a shared catalog, so an
>> update would affect all the databases.
>>
>> --
>> Álvaro Herrera                http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Training & Services
>
> I tried doing the update to pg_database on my Dev server and I can't get it
> to work.  How do I calculate the new datminmxid value?
>
> NextMultiXactId:  30349  relminmxid from pg_class for the table: 8376
>
> If I subtract the relminmxid from the nextmulixact I get 21793 which won't
> work.
>
> production-copy=# update pg_database set datminmxid=21973 where
> datname='production-copy';
> ERROR:  column "datminmxid" is of type xid but expression is of type integer
>
>

Casting issue, try:

update pg_database set datminmxid='21973' where
datname='production-copy';

>


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: could not access status of transaction pg_multixact issue

От
Alvaro Herrera
Дата:
Adrian Klaver wrote:
> On 10/09/2014 07:07 AM, jim_yates wrote:
> >Alvaro Herrera-9 wrote
> >>jim_yates wrote:
> >>
> >>>>A better way not involving mxid_age() would be to use pg_controldata to
> >>>>extract the current value of the mxid counter, then subtract the
> >>>current
> >>>>relminmxid from that value.
> >>>
> >>>
> >>>It's not clear which lines from pg_controldata to use for updating
> >>>pg_database.datminmxid.
> >>
> >>The one labelled NextMultiXactId.
> >>
> >>>I also assume I would do the pg_database update on a idle database.
> >>
> >>It doesn't matter, actually.  pg_database is a shared catalog, so an
> >>update would affect all the databases.
> >>
> >>--
> >>Álvaro Herrera                http://www.2ndQuadrant.com/
> >>PostgreSQL Development, 24x7 Support, Training & Services
> >
> >I tried doing the update to pg_database on my Dev server and I can't get it
> >to work.  How do I calculate the new datminmxid value?
> >
> >NextMultiXactId:  30349  relminmxid from pg_class for the table: 8376
> >
> >If I subtract the relminmxid from the nextmulixact I get 21793 which won't
> >work.
> >
> >production-copy=# update pg_database set datminmxid=21973 where
> >datname='production-copy';
> >ERROR:  column "datminmxid" is of type xid but expression is of type integer
> 
> Casting issue, try:
> 
> update pg_database set datminmxid='21973' where
> datname='production-copy';

There must have been some confusion somewhere; certainly you shouldn't
be subtracting anything.  The subtraction was just suggested as a way to
determine the age.  The value to update pg_database.datminmxid to is the
oldest one of all the relminmxid in pg_class; so if you have 8376 as the
minimum value there, that's what you set pg_database.datminmxid to.  Not
the 21973 value.
update pg_database set datminmxid='8376' where datname='production-copy';

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: could not access status of transaction pg_multixact issue

От
jim_yates
Дата:
Alvaro Herrera-9 wrote
> 
> There must have been some confusion somewhere; certainly you shouldn't
> be subtracting anything.  The subtraction was just suggested as a way to
> determine the age.  The value to update pg_database.datminmxid to is the
> oldest one of all the relminmxid in pg_class; so if you have 8376 as the
> minimum value there, that's what you set pg_database.datminmxid to.  Not
> the 21973 value.
> 
>  update pg_database set datminmxid='8376' where datname='production-copy';

Ok, I'm still confused.  Is the value to use the Max value of relminmxid
from pg_class or the value of NextMultiXactId from pg_controldump?   Or are
these the same thing?





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822399.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: could not access status of transaction pg_multixact issue

От
Alvaro Herrera
Дата:
jim_yates wrote:
> Alvaro Herrera-9 wrote
> > 
> > There must have been some confusion somewhere; certainly you shouldn't
> > be subtracting anything.  The subtraction was just suggested as a way to
> > determine the age.  The value to update pg_database.datminmxid to is the
> > oldest one of all the relminmxid in pg_class; so if you have 8376 as the
> > minimum value there, that's what you set pg_database.datminmxid to.  Not
> > the 21973 value.
> > 
> >  update pg_database set datminmxid='8376' where datname='production-copy';
> 
> Ok, I'm still confused.  Is the value to use the Max value of relminmxid
> from pg_class or the value of NextMultiXactId from pg_controldump?   Or are
> these the same thing?

You need to *minimum* of all the values in relminmxid (excluding those
that are zero).

The pg_controldata value is a mechanism to determine which value is
minimum -- when there's been wraparound, the minimum might be "smaller
than zero", i.e. something like 4 thousand millions.  I doubt this is
your case though.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: could not access status of transaction pg_multixact issue

От
jim_yates
Дата:
Alvaro Herrera-9 wrote
> You need to *minimum* of all the values in relminmxid (excluding those
> that are zero).
>
> The pg_controldata value is a mechanism to determine which value is
> minimum -- when there's been wraparound, the minimum might be "smaller
> than zero", i.e. something like 4 thousand millions.  I doubt this is
> your case though.
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

Then I'm really confused.
The minimum relminmxid for all the rows in pg_class that have relminmxid
greater then zero is 1.
That's the current value of datminmxid in pg_database.

And the NextMultiXactId from pg_controldump is  303464.

So if I use the min value from pg_class then I have some other issue.

Where should I get the new pg_database value from?



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822449.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: could not access status of transaction pg_multixact issue

От
Alvaro Herrera
Дата:
jim_yates wrote:

> Then I'm really confused.  
> The minimum relminmxid for all the rows in pg_class that have relminmxid
> greater then zero is 1.
> That's the current value of datminmxid in pg_database.  
> 
> And the NextMultiXactId from pg_controldump is  303464.
> 
> So if I use the min value from pg_class then I have some other issue.  
> 
> Where should I get the new pg_database value from?

I'm deep in another issue which I don't want to page out right now, but
try vacuuming the tables that have relminmxid=1 with low values set for
vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age,
say 100000.  (I think 65536 ought to get you beyond segment
pg_multixact/offset/0000, and then that file would be removed.) Since
any multixact values below the point at which pg_upgrade ran should be
marked "no longer running" through hint bits, there would be no
pg_multixact lookups anyway and thus the vacuuming should complete with
no errors.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: could not access status of transaction pg_multixact issue

От
jim_yates
Дата:
Alvaro Herrera-9 wrote
> I'm deep in another issue which I don't want to page out right now, but
> try vacuuming the tables that have relminmxid=1 with low values set for
> vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age,
> say 100000.  (I think 65536 ought to get you beyond segment
> pg_multixact/offset/0000, and then that file would be removed.) Since
> any multixact values below the point at which pg_upgrade ran should be
> marked "no longer running" through hint bits, there would be no
> pg_multixact lookups anyway and thus the vacuuming should complete with
> no errors.
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

No joy:
acustream=# set vacuum_multixact_freeze_table_age=100000;
SET
acustream=# set vacuum_multixact_freeze_min_age=100000;
SET
acustream=# vacuum phyorg_charges_to_invoice;
ERROR:  could not access status of transaction 182579
DETAIL:  Could not open file "pg_multixact/members/EEAC": No such file or
directory.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822482.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: could not access status of transaction pg_multixact issue

От
jim_yates
Дата:
Alvaro Herrera-9 wrote
> jim_yates wrote:
>
>> Then I'm really confused.
>> The minimum relminmxid for all the rows in pg_class that have relminmxid
>> greater then zero is 1.
>> That's the current value of datminmxid in pg_database.
>>
>> And the NextMultiXactId from pg_controldump is  303464.
>>
>> So if I use the min value from pg_class then I have some other issue.
>>
>> Where should I get the new pg_database value from?
>
> I'm deep in another issue which I don't want to page out right now, but
> try vacuuming the tables that have relminmxid=1 with low values set for
> vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age,
> say 100000.  (I think 65536 ought to get you beyond segment
> pg_multixact/offset/0000, and then that file would be removed.) Since
> any multixact values below the point at which pg_upgrade ran should be
> marked "no longer running" through hint bits, there would be no
> pg_multixact lookups anyway and thus the vacuuming should complete with
> no errors.
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

I set vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age
to 100,000 and vacuumed all the tables with a relminmxid='1' and relkind='r'
using pg_class as the source.
I still couldn't vacuum or select the original table with the issue.
I did solve the problem by dropping the table and restoring from my standby
server.

Is there else anything I need to do to prevent being bitten by this bug
again?

I still have a value of 1 for datminmxid in pg_database, and the 0000 file
is still in pg_multixact/members and offsets.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/could-not-access-status-of-transaction-pg-multixact-issue-tp5822248p5822573.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: could not access status of transaction pg_multixact issue

От
Alvaro Herrera
Дата:
jim_yates wrote:
> Alvaro Herrera-9 wrote
> > jim_yates wrote:
> > 
> >> Then I'm really confused.  
> >> The minimum relminmxid for all the rows in pg_class that have relminmxid
> >> greater then zero is 1.
> >> That's the current value of datminmxid in pg_database.  
> >> 
> >> And the NextMultiXactId from pg_controldump is  303464.
> >> 
> >> So if I use the min value from pg_class then I have some other issue.  
> >> 
> >> Where should I get the new pg_database value from?
> > 
> > I'm deep in another issue which I don't want to page out right now, but
> > try vacuuming the tables that have relminmxid=1 with low values set for
> > vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age,
> > say 100000.  (I think 65536 ought to get you beyond segment
> > pg_multixact/offset/0000, and then that file would be removed.) Since
> > any multixact values below the point at which pg_upgrade ran should be
> > marked "no longer running" through hint bits, there would be no
> > pg_multixact lookups anyway and thus the vacuuming should complete with
> > no errors.
> > 
> > -- 
> > Álvaro Herrera                http://www.2ndQuadrant.com/
> > PostgreSQL Development, 24x7 Support, Training & Services
> 
> I set vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age
> to 100,000 and vacuumed all the tables with a relminmxid='1' and relkind='r'
> using pg_class as the source.
> I still couldn't vacuum or select the original table with the issue. 
> I did solve the problem by dropping the table and restoring from my standby
> server.

It might have proven interesting to look into the actual values related
to the multixact that caused you grief.  It's not clear to me whether
the 187k value you got in the error message came from before the upgrade
or after.  If it's prior to the upgrade, there should have been no
lookup of it; if it was after, the pg_multixact files should have been
there.

I wonder if this is somehow related to this problem:
http://www.postgresql.org/message-id/20140330040029.GY4582@tamriel.snowman.net

> Is there else anything I need to do to prevent being bitten by this bug
> again?

Supposedly it's a one-time thing after the upgrade.

> I still have a value of 1 for datminmxid in pg_database, and the 0000 file
> is still in pg_multixact/members and offsets.

Eventually the datminmxid should advance.   Make sure the minimum
relminmxid is no longer 1.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services