Обсуждение: Re: MultiXactId Error in Autovacuum

Поиск

Список

Период

Сортировка

Re: MultiXactId Error in Autovacuum

От

Karl Hafner

Дата:

18 ноября 2013 г., 18:30:47

Thank you for your suggestion!

Since the autovacuum process is failing each time because of: "ERROR: MultiXactId 2683601542 does no longer exist -- apparent wraparound", I have expected the manual vacuum also to fail because of the same error.

I've tried it anyway. Well ... manual vacuum also fails unfortunately.

I did not run vacuum with --full, because I want to avoid a locked table.

BR,

Karl

On Sat, Nov 16, 2013 at 9:25 PM, Strahinja Kustudić <strahinjak@nordeus.com> wrote:

If autovacuum doesn't want to finish the vacuuming of that table, why not just kill the autovacuum process on that table (you could use pg_cancel_backend())and run vacuum manually on that table?

Strahinja Kustudić | Lead System Engineer | Nordeus

On Fri, Nov 15, 2013 at 10:33 AM, Karl Hafner <karl@scoreloop.com> wrote:
Sorry, dear mailing list, somehow my email got sent before I finished it :-(

I will go on and repeat. Please bear with me!

I am currently worried about an error that shows up in our log files since a few weeks:

2013-11-15 07:13:13 UTC [22668]: [2-1] ERROR: MultiXactId 2683601542 does no longer exist -- apparent wraparound
2013-11-15 07:13:13 UTC [22668]: [3-1] CONTEXT: automatic vacuum of table "scoreloop.public.gamer_device_sightings"

It is always the same MultiXactId.
It seems that "autovacuum" is active on that table, but it seems to have restarted today in the morning:

select query_start, query, state, pid from pg_stat_activity where query LIKE 'autovacuum%' ;

# select query_start, query, state, pid from pg_stat_activity where query LIKE 'autovacuum%' ;

query_start | query | state | pid
-------------------------------+----------------------------------------------------------------------------------+--------+------
2013-11-15 07:13:13.281202+00 | autovacuum: VACUUM ANALYZE public.gamer_device_sightings (to prevent wraparound) | active | 4360

There are a few locks on tat table:

# select c.oid,c.relname,l.pid,l.mode,l.granted from pg_class c join pg_locks l on c.oid=l.relation where l.pid='4360' order by l.pid;
oid | relname | pid | mode | granted
-------+------------------------------------------+------+--------------------------+---------
19862 | index_gamer_device_sightings_on_gamer_id | 4360 | RowExclusiveLock | t
19719 | gamer_device_sightings_pkey | 4360 | RowExclusiveLock | t
19309 | gamer_device_sightings | 4360 | ShareUpdateExclusiveLock | t

The error I described seems to terminate the auto vacuum process: It has never finished on that table.
It is the only table without a finished vacuum. So I guess the above error is the reason for this.

Is there anything that I can do to get that fixed? Should I worry about the error? (I would not if the vacuum would run through).

Any tips, any help would be much appreciated!

Thank you very much! And sorry once more for my posting mistake
Karl

On Fri, Nov 15, 2013 at 10:16 AM, Karl Hafner <karl@scoreloop.com> wrote:
Dear mailing-list,

I am currently worried about an error that shows up in our log files since a few weeks:

2013-11-15 07:13:13 UTC [22668]: [2-1] ERROR: MultiXactId 2683601542 does no longer exist -- apparent wraparound
2013-11-15 07:13:13 UTC [22668]: [3-1] CONTEXT: automatic vacuum of table "scoreloop.public.gamer_device_sightings"

It seems that "autovacuum" is active on that table, but it seems to have restarted today in the morning:

select query_start, query, state, pid from pg_stat_activity where query LIKE 'autovacuum%' ;

XXX=# select query_start, query, state, pid from pg_stat_activity where query LIKE 'autovacuum%' ;

query_start | query | state | pid
-------------------------------+----------------------------------------------------------------------------------+--------+------
2013-11-15 07:13:13.281202+00 | autovacuum: VACUUM ANALYZE public.gamer_device_sightings (to prevent wraparound) | active | 4360

Re: MultiXactId Error in Autovacuum

От

Albe Laurenz

Дата:

19 ноября 2013 г., 13:00:27

Karl Hafner wrote:
> Since the autovacuum process is failing each time because of: "ERROR:  MultiXactId 2683601542 does no
> longer exist -- apparent wraparound", I have expected the manual vacuum also to fail because of the
> same error.
> I've tried it anyway. Well ... manual vacuum also fails unfortunately.

Do you know if you experienced transaction ID wraparound as the
message suggests?

Are there any other questionable messages in the log?

What do you get for the following queries:

SELECT txid_current();

SELECT datname, datfrozenxid FROM pg_database;

SELECT current_setting('autovacuum_freeze_max_age'),
       current_setting('vacuum_freeze_min_age'),
       current_setting('vacuum_freeze_table_age');

Yours,
Laurenz Albe

Re: MultiXactId Error in Autovacuum

От

Karl Hafner

Дата:

19 ноября 2013 г., 14:14:49

Hi Laurenz,

thank you for looking into my issue.

Do you know if you experienced transaction ID wraparound as the
message suggests?

I am not aware of a wraparound in the past. I am quite sure no.

Are there any other questionable messages in the log?

The log shows nothing special except this error that is always associated with the same MultiXactId 2683601542.

What do you get for the following queries:

SELECT txid_current();

txid_current

--------------

1673482898

SELECT datname, datfrozenxid FROM pg_database;

SELECT datname, datfrozenxid FROM pg_database;

datname | datfrozenxid

-----------+--------------

postgres | 1497699916

template0 | 1497699916

MYDB | 1335927866

template1 | 1497699916

SELECT current_setting('autovacuum_freeze_max_age'),
current_setting('vacuum_freeze_min_age'),
current_setting('vacuum_freeze_table_age');

current_setting | current_setting | current_setting

-----------------+-----------------+-----------------

200000000 | 50000000 | 150000000

Thank you!

Best regards,

Karl

Re: MultiXactId Error in Autovacuum

От

Albe Laurenz

Дата:

19 ноября 2013 г., 17:55:08

Karl Hafner wrote:
>> Do you know if you experienced transaction ID wraparound as the
>> message suggests?
> 
> I am not aware of a wraparound in the past. I am quite sure no.
> 
>> Are there any other questionable messages in the log?
> 
> The log shows nothing special except this error that is always associated with the same MultiXactId
> 2683601542.

Well, probably you are right, because PostgreSQL would shutdown
before a wraparound happens, and you would have noticed that.

I don't know how it happened, but your database has suffered
corruption.  Any chance you can restore from a backup?
Can you still "SELECT *" from the table that has the problem?

Yours,
Laurenz Albe

Re: MultiXactId Error in Autovacuum

От

Karl Hafner

Дата:

20 ноября 2013 г., 17:20:58

Well, probably you are right, because PostgreSQL would shutdown
before a wraparound happens, and you would have noticed that.

I don't know how it happened, but your database has suffered
corruption. Any chance you can restore from a backup?
Can you still "SELECT *" from the table that has the problem?

I cannot run a full SELECT * on that table. It stops after a few minutes with the same error.

I am currently testing different things on a DB created from a backup like exporting per month/per day …

but I am running into this error again and again.

Running a VACUUM --FULL gives me: WARNING: concurrent delete in progress within table "the_bad_table" … then it also breaks.

BR,

Karl

Re: MultiXactId Error in Autovacuum

От

Karl Hafner

Дата:

03 декабря 2013 г., 11:56:05

Hi all,

sorry for not getting back earlier. I wanted to let you know that we found the reason for our error (and a possible solution for our problem) in the meantime:

The starting point of our worries was a log file error message: "ERROR: MultiXactId 2683601542 does no longer exist -- apparent wraparound".

Autovacuum never finished on one of our tables because of this.

So we started to investigate and found out:

We had a few rows in our table that suffered from some inconsistencies in its rowlock status, probably caused by a disc crash a few month ago.

Statements like "SELECT * FROM mytable" or autovacuum run into an error and exit when reading one of these inconstistent rows.

We digged deeper: Those rows were marked already DEAD but still pointed to MultiXactIds that were not listed in

pg_multixact anymore (what probably messed up the lock information)

The interesting point is: Different statements read those information in a different order.

(a) Statements that select a single row like "SELECT * FROM mytable WHERE id = 'xxxx'" would recognize the status DEAD

BEFORE checking the LOCK status. Those statements just return 0 rows.

(b) Statements that read every row like "SELECT id FROM mytable" process the LOCK information BEFORE checking if a row is already marked DEAD.

If there is something wrong (like in our case due to inconsistent lock information) the statement quits with an ERROR.

In our case autovacuum always died because of this.

The trick was patching postgres for case (b) in a way that it would just print a WARNING and the row ID for these defect rows and continue.

Finally we knew the IDs of the defect rows (we had to install the contrib module pgrowlocks to find the locked rows) using the query

"SELECT id FROM mytable AS m, pgrowlocks('mytable') AS p WHERE p.locked_row = m.ctid".

Just deleting these rows using "DELETE FROM mytable WHERE id = 'xxxx'" did not work because of case (a). It returned 0 rows.

So we copied all but the defect rows into a new table, dropped the old table, and renamed the new one - quite an effort for finally deleting two rows :-)

We are wondering it there is a deeper reason behind treating case (a) and (b) in a different way or if it would be better to get rid of these inconsistencies in future postgres versions.

Thanks for all your input and help.

BR,

Karl

On Wed, Nov 20, 2013 at 3:20 PM, Karl Hafner <karl@scoreloop.com> wrote:

Well, probably you are right, because PostgreSQL would shutdown
before a wraparound happens, and you would have noticed that.

I don't know how it happened, but your database has suffered
corruption. Any chance you can restore from a backup?
Can you still "SELECT *" from the table that has the problem?

I cannot run a full SELECT * on that table. It stops after a few minutes with the same error.
I am currently testing different things on a DB created from a backup like exporting per month/per day …
but I am running into this error again and again.
Running a VACUUM --FULL gives me: WARNING: concurrent delete in progress within table "the_bad_table" … then it also breaks.

BR,
Karl

Re: MultiXactId Error in Autovacuum

От

Simon Riggs

Дата:

03 декабря 2013 г., 12:12:50

On 3 December 2013 08:55, Karl Hafner <karl@scoreloop.com> wrote:

> We are wondering it there is a deeper reason behind treating case (a) and
> (b) in a different way or if it would be better to get rid of these
> inconsistencies in future postgres versions.

I believe your problems have been fixed in the latest point release, 9.3.2

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Re: MultiXactId Error in Autovacuum

Re: MultiXactId Error in Autovacuum

Re: MultiXactId Error in Autovacuum

Re: MultiXactId Error in Autovacuum

Re: MultiXactId Error in Autovacuum

Re: MultiXactId Error in Autovacuum

Re: MultiXactId Error in Autovacuum

Re: MultiXactId Error in Autovacuum