Re: ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound

From Alvaro Herrera
Subject Re: ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound
Msg-id 20150526184736.GX5885@postgresql.org
In response to Re: ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound  (Paul Smith <paul@pscs.co.uk>)
Responses Re: ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound  (Paul Smith <paul@pscs.co.uk>)
List pgsql-hackers
Paul Smith wrote:

> No, nothing like that. It was just running fine, and then suddenly (at 2am
> on 23 May) it started throwing up loads of these errors. The DB server
> wasn't even restarted at that point. It was just working fine, then suddenly
> wasn't. (The first error was at 02:00:32 BST, then every few minutes after
> that there's another one).

Another crazy thought is that the host system messed up and overwrote
part of the table with pages from some other guest system.  I have never
seen a report of such a thing happening.

> It's running in a Hyper-V guest. We had taken a backup of the VM at 00:34 on
> 23 May and that looks to be absolutely fine. What I have done now is restore
> that backup and import the new data which arrived since that backup was
> made, and it seems OK now. I still have the 'broken' installation in case
> more information is needed from it. I'd try to get a raw dump of the damaged
> tuple data if I knew how to find where it is in the relation file...

Well, you could try a binary search to locate the (or a) borked page
within the table.  Or you could write a plpgsql function with an
EXCEPTION block that reads each tuple and, in case of error, reports the
CTID of the tuple being read (select * from tab where
ctid=generated_ctid; EXCEPTION WHEN OTHERS THEN raise notice ' .. ',
generated_ctid).  Once you have located the problem tuple, a pageinspect
dump would be the starting point.
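
A rough sketch of the binary search idea, and of mapping the block number
from a CTID to a position in the relation file, might look like the
following.  It is written in Python only for illustration.  It assumes the
default build settings (8 kB blocks, 1 GB segment files), and prefix_fails is
a hypothetical callable you would supply yourself, e.g. one that runs a query
reading the first n rows or pages and returns True if the multixact error is
raised.  The search only works if failures are monotone: once a prefix fails,
every longer prefix fails too.

```python
from typing import Callable, Tuple

BLOCK_SIZE = 8192            # assumption: default BLCKSZ
BLOCKS_PER_SEGMENT = 131072  # assumption: default RELSEG_SIZE (1 GB / 8 kB)


def first_failing_prefix(prefix_fails: Callable[[int], bool], total: int) -> int:
    """Return the smallest n in 1..total for which prefix_fails(n) is True.

    prefix_fails is a hypothetical, caller-supplied probe.  Returns -1 if
    even the full prefix reads cleanly.
    """
    if total < 1 or not prefix_fails(total):
        return -1
    lo, hi = 1, total
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix_fails(mid):
            hi = mid        # the first failure is at mid or earlier
        else:
            lo = mid + 1    # the first failure is after mid
    return lo


def locate_block(block_number: int) -> Tuple[int, int]:
    """Map a block number (the first half of a CTID) to
    (segment index, byte offset within that segment file)."""
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    return segment, block_in_segment * BLOCK_SIZE
```

With those defaults, segment 0 is the file that pg_relation_filepath()
reports for the table, and segment N is the same path with ".N" appended.
If the server was built with a different block or segment size, the two
constants have to be adjusted.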

Another idea is to attach a debugger to the backend process and set a
breakpoint on function errfinish.  Once the error is hit, the breakpoint
will stop execution before the error is reported; at that point you
can request a backtrace from the debugger.  (Make sure to examine
errordata to see whether the error being hit is the multixact one.)

> I suppose it's possible that it was disk or memory corruption, but I've seen
> that before, and it hasn't looked like this.

Yeah, same here.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


