Re: out-of-order XID insertion in KnownAssignedXids

From: Konstantin Knizhnik
Subject: Re: out-of-order XID insertion in KnownAssignedXids
Message-ID: 26dce67d-88c6-8abc-828c-6023828729e2@postgrespro.ru
In reply to: Re: out-of-order XID insertion in KnownAssignedXids  (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers

On 05.10.2018 11:04, Michael Paquier wrote:
> On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
>> As you can notice, XID 2004495308 is encountered twice, which causes an
>> error in KnownAssignedXidsAdd:
>>
>>      if (head > tail &&
>>          TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid))
>>      {
>>          KnownAssignedXidsDisplay(LOG);
>>          elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
>>      }
>>
>> The probability of this error is very small, but it can quite easily be
>> reproduced: just set a breakpoint in a debugger after the call to
>> MarkAsPrepared in twophase.c and then try to prepare any transaction.
>> MarkAsPrepared will add the GXACT to the proc array, and at this moment
>> there will be two entries in the procarray with the same XID:
>>
>> [snip]
>>
>> Now the generated RUNNING_XACTS record contains duplicated XIDs.
> So, I have been doing exactly that, and if you trigger a manual
> checkpoint then things happen quite correctly if you let the first
> session finish:
> rmgr: Standby     len (rec/tot):     58/    58, tx:          0, lsn:
> 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
> latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
>
> If you still maintain the debugger after calling MarkAsPrepared, then
> the manual checkpoint would block.  Now if you actually keep the
> debugger, and wait for a checkpoint timeout to happen, then I can see
> the incorrect record.  It is impressive that your customer has been able
> to see that first, and then that you have been able to get into that
> state with simple steps.


There are about 1000 active clients performing 2PC transactions, so if
you perform a backup (which does a checkpoint),
the probability seems to be large enough.

I have reproduced this problem without gdb, just by running many 2PC
transactions and checkpoints in parallel:

for ((i=1;i<10;i++))
do
     pgbench -n -T 300000 -M prepared -f t$i.sql postgres > t$i.log &
done

pgbench -n -T 300000 -f checkpoint.sql postgres > checkpoint.log &
wait
------------------------------

tN.sql:

begin;
update t set val=val+1 where pk=N;
prepare transaction 'tN';
commit prepared 'tN';

------------------------------

checkpoint.sql:

checkpoint;
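
The tN.sql listing above is a template, with N standing for the script number. A minimal sketch of how the nine files and checkpoint.sql might be generated follows. It is not part of the original message: gen_scripts is a hypothetical helper name, and the sketch assumes a table t with columns pk and val that already holds rows for pk = 1..9 (the schema is not given in this message). PREPARE TRANSACTION also requires max_prepared_transactions to be set above zero on the server.

```shell
#!/bin/sh
# Hypothetical helper (not from the original message): writes t1.sql .. tN.sql
# and checkpoint.sql into the given directory, following the templates above.
# Assumes table t(pk, val) exists; the schema is an assumption.
gen_scripts() {
    gs_dir=$1
    gs_n=$2
    gs_i=1
    while [ "$gs_i" -le "$gs_n" ]
    do
        # Unquoted heredoc delimiter, so $gs_i is expanded into the SQL text.
        cat > "$gs_dir/t$gs_i.sql" <<EOF
begin;
update t set val=val+1 where pk=$gs_i;
prepare transaction 't$gs_i';
commit prepared 't$gs_i';
EOF
        gs_i=$((gs_i + 1))
    done
    echo "checkpoint;" > "$gs_dir/checkpoint.sql"
}
```

Calling `gen_scripts . 9` in the working directory before starting the pgbench loop should produce the files that loop expects. Each script uses its own row and its own GID, so the nine sessions do not block each other on row locks or collide on prepared-transaction names.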



>
>> I want to ask the opinion of the community about the best way to fix
>> this problem.  Should we avoid storing duplicated XIDs in the procarray
>> (by invalidating the XID in the original pgxact) or eliminate/change the
>> check for duplicates in KnownAssignedXidsAdd (for example, just ignore
>> duplicates)?
> Hmmmmm...  Please let me think through that first.  It seems to me that
> the record should not be generated to begin with.  At least I am able to
> confirm what you see.
> --
> Michael

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


