Re: assertion failure 9.3.4

From: Andres Freund
Subject: Re: assertion failure 9.3.4
Msg-id: 20140421185422.GA13906@alap3.anarazel.de
In reply to: assertion failure 9.3.4 (Andrew Dunstan <andrew.dunstan@pgexperts.com>)
Responses: Re: assertion failure 9.3.4 (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: assertion failure 9.3.4 (Andrew Dunstan <andrew@dunslane.net>)
List: pgsql-hackers

Hi,

I spent the last two hours poking around in the environment Andrew
provided and I was able to reproduce the issue, find an assert to
reproduce it much faster, and find a possible root cause.

Since the symptom of the problem seems to be multixacts with more than
one updating xid, I added a check to MultiXactIdCreateFromMembers()
preventing that. That requires moving ISUPDATE_from_mxstatus() to a
header, but I think we should definitely add such an assert.
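
A minimal sketch of what such a check could look like, using simplified
stand-in types instead of the real definitions from multixact.h. The enum
ordering and the ISUPDATE_from_mxstatus() definition are written from my
reading of the 9.3 sources and are assumptions to verify against the tree;
count_update_members() and check_single_updater() are hypothetical helpers,
not existing functions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for definitions in multixact.h (assumed layout). */
typedef uint32_t TransactionId;

typedef enum
{
    MultiXactStatusForKeyShare = 0x00,
    MultiXactStatusForShare = 0x01,
    MultiXactStatusForNoKeyUpdate = 0x02,
    MultiXactStatusForUpdate = 0x03,
    /* the statuses below actually modify the tuple */
    MultiXactStatusNoKeyUpdate = 0x04,
    MultiXactStatusUpdate = 0x05
} MultiXactStatus;

typedef struct MultiXactMember
{
    TransactionId   xid;
    MultiXactStatus status;
} MultiXactMember;

/* Does this status denote an actual update, as opposed to a lock? */
#define ISUPDATE_from_mxstatus(status) \
    ((status) > MultiXactStatusForUpdate)

/* Hypothetical helper: number of members that are real updaters. */
int
count_update_members(int nmembers, const MultiXactMember *members)
{
    int     nupdates = 0;

    for (int i = 0; i < nmembers; i++)
    {
        if (ISUPDATE_from_mxstatus(members[i].status))
            nupdates++;
    }
    return nupdates;
}

/* Hypothetical check: a multixact may carry at most one updating xid. */
void
check_single_updater(int nmembers, const MultiXactMember *members)
{
    assert(count_update_members(nmembers, members) <= 1);
}
```

With a check of this shape in place, a member list containing both a
NoKeyUpdate and an Update entry would trip the assert at creation time
instead of surfacing much later.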

As it turns out, the problem is in the
    else if (result == HeapTupleBeingUpdated && wait)
branch in (at least) heap_update(). When the problem is hit, the
to-be-updated tuple originally has HEAP_XMIN_COMMITTED |
HEAP_XMAX_LOCK_ONLY | HEAP_XMAX_KEYSHR_LOCK set. So we release the
buffer lock, acquire the tuple lock, and reacquire the buffer lock. But
in between, the locking backend has actually updated the tuple.
The code tries to protect against that with:

                /*
                 * recheck the locker; if someone else changed the tuple while
                 * we weren't looking, start over.
                 */
                if ((oldtup.t_data->t_infomask & HEAP_XMAX_IS_MULTI) ||
                    !TransactionIdEquals(
                                    HeapTupleHeaderGetRawXmax(oldtup.t_data),
                                         xwait))
                    goto l2;

                can_continue = true;
                locker_remains = true;

and similar. The problem is that in Andrew's case the infomask changes
from 0x2192 to 0x2102 (i.e. it's a normal update afterwards), while xmax
stays the same. Ooops.
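
To make the two infomask values easier to read, here is a small sketch
that decodes them. The bit values are the t_infomask flags as I remember
them from htup_details.h in 9.3 (only the subset relevant here), so treat
them as an assumption to check against the header; infomask_diff() and
print_infomask() are hypothetical helpers.

```c
#include <stdint.h>
#include <stdio.h>

/* Subset of the t_infomask bits from htup_details.h (9.3, assumed). */
#define HEAP_HASVARWIDTH        0x0002
#define HEAP_XMAX_KEYSHR_LOCK   0x0010
#define HEAP_XMAX_LOCK_ONLY     0x0080
#define HEAP_XMIN_COMMITTED     0x0100
#define HEAP_XMAX_IS_MULTI      0x1000
#define HEAP_UPDATED            0x2000

/* Hypothetical helper: bits that differ between two infomask values. */
uint16_t
infomask_diff(uint16_t before, uint16_t after)
{
    return (uint16_t) (before ^ after);
}

/* Hypothetical helper: print the flags from the subset above that are set. */
void
print_infomask(uint16_t infomask)
{
    printf("0x%04x:", (unsigned int) infomask);
    if (infomask & HEAP_HASVARWIDTH)
        printf(" HEAP_HASVARWIDTH");
    if (infomask & HEAP_XMAX_KEYSHR_LOCK)
        printf(" HEAP_XMAX_KEYSHR_LOCK");
    if (infomask & HEAP_XMAX_LOCK_ONLY)
        printf(" HEAP_XMAX_LOCK_ONLY");
    if (infomask & HEAP_XMIN_COMMITTED)
        printf(" HEAP_XMIN_COMMITTED");
    if (infomask & HEAP_XMAX_IS_MULTI)
        printf(" HEAP_XMAX_IS_MULTI");
    if (infomask & HEAP_UPDATED)
        printf(" HEAP_UPDATED");
    printf("\n");
}
```

The two values differ only in HEAP_XMAX_LOCK_ONLY and
HEAP_XMAX_KEYSHR_LOCK, and HEAP_XMAX_IS_MULTI is clear in both, so a
recheck that looks only at HEAP_XMAX_IS_MULTI and the raw xmax sees
nothing suspicious.
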
A bit later there's:

    result = can_continue ? HeapTupleMayBeUpdated : HeapTupleUpdated;

So, from there on we happily continue to update the tuple, thinking
there's no previous updater. Which obviously causes problems.

I've hacked^Wfixed this by changing the infomask test above into
infomask != oldtup.t_data->t_infomask in a couple of places. That seems
to be sufficient to survive the testcase a couple of times.
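
For illustration, a standalone sketch contrasting the two rechecks. This
is not the actual patch: TupleState, must_restart_old() and
must_restart_new() are hypothetical names, the xmax comparison is assumed
to stay as it was, and the xid used in the usage note is made up.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define HEAP_XMAX_IS_MULTI  0x1000

/* Hypothetical snapshot of the header fields the recheck looks at. */
typedef struct TupleState
{
    uint16_t        infomask;
    TransactionId   raw_xmax;
} TupleState;

/* Original recheck: restart only if xmax became a multixact or changed. */
bool
must_restart_old(TupleState now, TransactionId xwait)
{
    return (now.infomask & HEAP_XMAX_IS_MULTI) != 0 ||
           now.raw_xmax != xwait;
}

/*
 * Workaround recheck: restart whenever the infomask differs from the one
 * saved before the buffer lock was released, or xmax changed.
 */
bool
must_restart_new(TupleState now, uint16_t saved_infomask,
                 TransactionId xwait)
{
    return now.infomask != saved_infomask ||
           now.raw_xmax != xwait;
}
```

With the infomask going from 0x2192 to 0x2102 and xmax unchanged (say,
a made-up xid of 1234), must_restart_old() returns false and the update
proceeds, while must_restart_new() returns true and the code would jump
back to l2.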

I am too hungry right now to think about a proper fix for this and
whether there are further problematic areas.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


