Thread: ERROR: found multixact XX from before relminmxid YY

ERROR: found multixact XX from before relminmxid YY

From
Mark Fletcher
Date:
Hi,

Starting yesterday morning, autovacuum of one of our PostgreSQL 9.6.10 (CentOS 7) tables started failing:

ERROR:  found multixact 370350365 from before relminmxid 765860874
CONTEXT:  automatic vacuum of table "userdb.public.subs"

This is about as plain and simple a table as there is. No triggers or foreign keys, I'm not using any extensions. It has about 2.8M rows. I have not done any consistency checks, but nothing strange has manifested in production.

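Reading the various discussions about this error, the only solution I found was here:

https://www.postgresql.org/message-id/CAGewt-ukbL6WL8cc-G%2BiN9AVvmMQkhA9i2TKP4-6wJr6YOQkzA%40mail.gmail.com
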
But no other reports of this solving the problem. Can someone verify that if I do the mentioned fix (and I assume upgrade to 9.6.11) that will fix the problem? And that it doesn't indicate table corruption?

Thanks,
Mark
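
A note on the wording of the error: as far as I understand it, the check behind "found multixact XX from before relminmxid YY" does not use a plain less-than. It compares the two 32-bit multixact IDs with wraparound-aware arithmetic, in the style of PostgreSQL's MultiXactIdPrecedes. The Python sketch below only illustrates that comparison; the function name `multixact_precedes` is made up for this example, and the values are the two from the log above.

```python
def multixact_precedes(multi1: int, multi2: int) -> bool:
    """Wraparound-aware "multi1 is older than multi2" for 32-bit IDs.

    Illustrative re-statement of the signed 32-bit difference test;
    not code taken from the PostgreSQL sources.
    """
    # Take the difference modulo 2**32, as unsigned 32-bit arithmetic would.
    diff = (multi1 - multi2) & 0xFFFFFFFF
    # Reinterpret the result as a signed 32-bit integer.
    if diff >= 0x80000000:
        diff -= 0x100000000
    return diff < 0


if __name__ == "__main__":
    # Values from the reported error message.
    print(multixact_precedes(370350365, 765860874))
    # A value just below 2**32 counts as older than a small value
    # that was assigned after the counter wrapped around.
    print(multixact_precedes(4294967290, 5))
```

Under this comparison a tuple whose xmax still holds multixact 370350365 looks older than the table's relminmxid of 765860874, which is the condition vacuum reports.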

Re: ERROR: found multixact XX from before relminmxid YY

From
Tom Lane
Date:
Mark Fletcher <markf@corp.groups.io> writes:
> Starting yesterday morning, auto vacuuming of one of our postgresql 9.6.10
> (CentOS 7) table's started failing:
> ERROR:  found multixact 370350365 from before relminmxid 765860874
> CONTEXT:  automatic vacuum of table "userdb.public.subs"

Ugh.

> Reading the various discussions about this error, the only solution I found
> was here:
> https://www.postgresql.org/message-id/CAGewt-ukbL6WL8cc-G%2BiN9AVvmMQkhA9i2TKP4-6wJr6YOQkzA%40mail.gmail.com
> But no other reports of this solving the problem. Can someone verify that
> if I do the mentioned fix (and I assume upgrade to 9.6.11) that will fix
> the problem? And that it doesn't indicate table corruption?

Yeah, SELECT FOR UPDATE should overwrite the broken xmax value and thereby
fix it, I expect.  However, I don't see anything in the release notes
suggesting that we've fixed any related bugs since 9.6.10, so if this
just appeared then we've still got a problem :-(.  Did anything
interesting happen since your last successful autovacuum on that table?
Database crashes, WAL-related parameter changes, that sort of thing?

            regards, tom lane
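
For reference, the workaround discussed here amounts to locking every row of the affected table so that each row's xmax is rewritten, then vacuuming again. The Python sketch below only builds the SQL text for that sequence so it can be reviewed and pasted into psql; it does not connect to a database. The helper name `repair_statements`, the `SELECT 1` select list, and the trailing `VACUUM` are assumptions made for illustration, not steps confirmed in this thread, and the table name is the one from the error context.

```python
import re
from typing import List

# Plain or schema-qualified identifier, e.g. "subs" or "public.subs".
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def repair_statements(table: str) -> List[str]:
    """Return the SQL statements for the row-locking workaround, in order."""
    # Refuse anything that is not a simple identifier, since the name is
    # interpolated directly into the SQL text.
    if not _IDENT.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return [
        "BEGIN;",
        # Locks every row in the table; the lock is held until COMMIT.
        f"SELECT 1 FROM {table} FOR UPDATE;",
        "COMMIT;",
        # Re-run vacuum manually to see whether the error is gone.
        f"VACUUM {table};",
    ]


if __name__ == "__main__":
    print("\n".join(repair_statements("public.subs")))
```

Locking all rows of a 2.8M-row table blocks concurrent writers to those rows until the transaction commits, so this is something to try in a quiet period and ideally on a copy first.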


Re: ERROR: found multixact XX from before relminmxid YY

From
Mark Fletcher
Date:
On Fri, Dec 28, 2018 at 4:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Yeah, SELECT FOR UPDATE should overwrite the broken xmax value and thereby
> fix it, I expect.  However, I don't see anything in the release notes
> suggesting that we've fixed any related bugs since 9.6.10, so if this
> just appeared then we've still got a problem :-(.  Did anything
> interesting happen since your last successful autovacuum on that table?
> Database crashes, WAL-related parameter changes, that sort of thing?

The last autovacuum of that table was on Dec 8th, and the last autoanalyze was on Dec 26th. Since then there have been no schema changes on that particular table, no database crashes, and no WAL-related parameter changes. We've made schema changes to other tables during that time, but otherwise the database has been stable.

Thanks,
Mark

Re: ERROR: found multixact XX from before relminmxid YY

From
Andres Freund
Date:
Hi,

On 2018-12-28 19:49:36 -0500, Tom Lane wrote:
> Mark Fletcher <markf@corp.groups.io> writes:
> > Starting yesterday morning, auto vacuuming of one of our postgresql 9.6.10
> > (CentOS 7) table's started failing:
> > ERROR:  found multixact 370350365 from before relminmxid 765860874
> > CONTEXT:  automatic vacuum of table "userdb.public.subs"
>
> Ugh.
>
> > Reading the various discussions about this error, the only solution I found
> > was here:
> > https://www.postgresql.org/message-id/CAGewt-ukbL6WL8cc-G%2BiN9AVvmMQkhA9i2TKP4-6wJr6YOQkzA%40mail.gmail.com
> > But no other reports of this solving the problem. Can someone verify that
> > if I do the mentioned fix (and I assume upgrade to 9.6.11) that will fix
> > the problem? And that it doesn't indicate table corruption?
>
> Yeah, SELECT FOR UPDATE should overwrite the broken xmax value and thereby
> fix it, I expect.

Right.

> However, I don't see anything in the release notes
> suggesting that we've fixed any related bugs since 9.6.10, so if this
> just appeared then we've still got a problem :-(.  Did anything
> interesting happen since your last successful autovacuum on that table?
> Database crashes, WAL-related parameter changes, that sort of thing?

I think it's entirely conceivable that the damage happened with earlier versions,
and just became visible now as the global horizon increased.

Greetings,

Andres Freund