Discussion: out-of-order XID insertion in KnownAssignedXids
Hi hackers,

Looks like there is a bug with logging of running transaction XIDs and prepared transactions. One of our customers gets the error "FATAL: out-of-order XID insertion in KnownAssignedXids" while trying to apply a backup. The WAL contains the following record:

rmgr: Standby     len (rec/tot):     98/    98, tx:          0, lsn: 1418/A9A76C90, prev 1418/A9A76C48, desc: RUNNING_XACTS nextXid 2004495309 latestCompletedXid 2004495307 oldestRunningXid 2004495290; 3 xacts: 2004495290 2004495308 2004495308

As you can notice, XID 2004495308 is encountered twice, which causes an error in KnownAssignedXidsAdd:

    if (head > tail &&
        TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid))
    {
        KnownAssignedXidsDisplay(LOG);
        elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
    }

The probability of this error is very small, but it can be quite easily reproduced: just set a breakpoint in the debugger after the call to MarkAsPrepared in twophase.c and then try to prepare any transaction. MarkAsPrepared will add the GXACT to the proc array, and at this moment there will be two entries in the procarray with the same XID:

(gdb) p procArray->numProcs
$2 = 4
(gdb) p allPgXact[procArray->pgprocnos[0]]
$4 = {xid = 513976717, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000', delayChkpt = 1 '\001', nxids = 0 '\000', used = 0 '\000', parent = 0x0}
(gdb) p allPgXact[procArray->pgprocnos[1]]
$5 = {xid = 0, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000', delayChkpt = 0 '\000', nxids = 0 '\000', used = 0 '\000', parent = 0x0}
(gdb) p allPgXact[procArray->pgprocnos[2]]
$6 = {xid = 0, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000', delayChkpt = 0 '\000', nxids = 0 '\000', used = 0 '\000', parent = 0x0}
(gdb) p allPgXact[procArray->pgprocnos[3]]
$7 = {xid = 513976717, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000', delayChkpt = 0 '\000', nxids = 0 '\000', used = 0 '\000', parent = 0x0}

Then just wait until the checkpoint timeout is triggered and the snapshot is logged:

(gdb) bt
#0  0x00000000007f3dab in GetRunningTransactionData () at procarray.c:2240
#1  0x00000000007fab22 in LogStandbySnapshot () at standby.c:943
#2  0x000000000077cde8 in BackgroundWriterMain () at bgwriter.c:331
#3  0x00000000005377f3 in AuxiliaryProcessMain (argc=2, argv=0x7ffe00aa00e0) at bootstrap.c:446
#4  0x000000000078e07e in StartChildProcess (type=BgWriterProcess) at postmaster.c:5323
#5  0x000000000078b6f0 in reaper (postgres_signal_arg=17) at postmaster.c:2948
#6  <signal handler called>
#7  0x00007f1356d665b3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#8  0x0000000000789931 in ServerLoop () at postmaster.c:1765
#9  0x000000000078906c in PostmasterMain (argc=3, argv=0x1902640) at postmaster.c:1406
#10 0x00000000006d0e4f in main (argc=3, argv=0x1902640) at main.c:228

Now the generated RUNNING_XACTS record contains duplicated XIDs.

I want to ask the opinion of the community about the best way of fixing this problem. Should we avoid storing duplicated XIDs in the procarray (by invalidating the XID in the original pgxact), or eliminate/change the check for duplicates in KnownAssignedXidsAdd (for example, just ignore duplicates)?
On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
> As you can notice, XID 2004495308 is encountered twice, which causes an
> error in KnownAssignedXidsAdd:
>
>     if (head > tail &&
>         TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid))
>     {
>         KnownAssignedXidsDisplay(LOG);
>         elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
>     }
>
> The probability of this error is very small, but it can be quite easily
> reproduced: just set a breakpoint in the debugger after the call to
> MarkAsPrepared in twophase.c and then try to prepare any transaction.
> MarkAsPrepared will add the GXACT to the proc array, and at this moment
> there will be two entries in the procarray with the same XID:
>
> [snip]
>
> Now the generated RUNNING_XACTS record contains duplicated XIDs.

So, I have been doing exactly that, and if you trigger a manual checkpoint then things happen quite correctly if you let the first session finish:

rmgr: Standby     len (rec/tot):     58/    58, tx:          0, lsn: 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608 latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606

If you still maintain the debugger after calling MarkAsPrepared, then the manual checkpoint would block. Now if you actually keep the debugger, and wait for a checkpoint timeout to happen, then I can see the incorrect record. It is impressive that your customer has been able to see that first, and then that you have been able to get into that state with simple steps.

> I want to ask the opinion of the community about the best way of fixing
> this problem. Should we avoid storing duplicated XIDs in the procarray
> (by invalidating the XID in the original pgxact), or eliminate/change the
> check for duplicates in KnownAssignedXidsAdd (for example, just ignore
> duplicates)?

Hmmmmm... Please let me think through that first. It seems to me that the record should not be generated to begin with. At least I am able to confirm what you see.
--
Michael
On 05.10.2018 11:04, Michael Paquier wrote:
> On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
>> [snip]
> So, I have been doing exactly that, and if you trigger a manual
> checkpoint then things happen quite correctly if you let the first
> session finish:
>
> rmgr: Standby     len (rec/tot):     58/    58, tx:          0, lsn:
> 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
> latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
>
> If you still maintain the debugger after calling MarkAsPrepared, then
> the manual checkpoint would block. Now if you actually keep the
> debugger, and wait for a checkpoint timeout to happen, then I can see
> the incorrect record. It is impressive that your customer has been able
> to see that first, and then that you have been able to get into that
> state with simple steps.

There are about 1000 active clients performing 2PC transactions, so if you perform a backup (which does a checkpoint), the probability of hitting this seems to be large enough.

I have reproduced this problem without using gdb by just running many 2PC transactions and checkpoints in parallel:

for ((i=1;i<10;i++))
do
    pgbench -n -T 300000 -M prepared -f t$i.sql postgres > t$i.log &
done
pgbench -n -T 300000 -f checkpoint.sql postgres > checkpoint.log &
wait

------------------------------
tN.sql:

begin;
update t set val=val+1 where pk=N;
prepare transaction 'tN';
commit prepared 'tN';

------------------------------
checkpoint.sql:

checkpoint;

>> I want to ask the opinion of the community about the best way of fixing
>> this problem. Should we avoid storing duplicated XIDs in the procarray
>> (by invalidating the XID in the original pgxact), or eliminate/change the
>> check for duplicates in KnownAssignedXidsAdd (for example, just ignore
>> duplicates)?
> Hmmmmm... Please let me think through that first. It seems to me that
> the record should not be generated to begin with. At least I am able to
> confirm what you see.
> --
> Michael

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 05.10.2018 11:04, Michael Paquier wrote:
> On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
>> [snip]
>> I want to ask the opinion of the community about the best way of fixing
>> this problem. Should we avoid storing duplicated XIDs in the procarray
>> (by invalidating the XID in the original pgxact), or eliminate/change the
>> check for duplicates in KnownAssignedXidsAdd (for example, just ignore
>> duplicates)?
> Hmmmmm... Please let me think through that first. It seems to me that
> the record should not be generated to begin with. At least I am able to
> confirm what you see.

The simplest way to fix the problem is to ignore duplicates before adding them to KnownAssignedXids. We perform a sort in this place anyway...

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
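For illustration, the sort-and-dedup filtering proposed above can be sketched as standalone code. This is only a model under stated assumptions, not the actual PostgreSQL implementation: `sort_and_dedup_xids` and `xid_cmp` are hypothetical names, TransactionId is reduced to a plain uint32_t, and the comparator ignores XID wraparound, which the real code would have to handle (e.g. via TransactionIdPrecedes):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Plain ascending comparator; the real code would need the
 * wraparound-aware TransactionIdPrecedes() instead. */
static int
xid_cmp(const void *a, const void *b)
{
    TransactionId xa = *(const TransactionId *) a;
    TransactionId xb = *(const TransactionId *) b;

    if (xa < xb)
        return -1;
    if (xa > xb)
        return 1;
    return 0;
}

/* Sort the XID array and squeeze out adjacent duplicates in place;
 * returns the number of distinct XIDs kept. */
static int
sort_and_dedup_xids(TransactionId *xids, int nxids)
{
    int n = 0;

    if (nxids == 0)
        return 0;
    qsort(xids, nxids, sizeof(TransactionId), xid_cmp);
    for (int i = 0; i < nxids; i++)
    {
        if (n == 0 || xids[n - 1] != xids[i])
            xids[n++] = xids[i];
    }
    return n;
}
```

After such a pass the array is sorted and duplicate-free, so an ordering check of the KnownAssignedXidsAdd kind would no longer trip over a repeated XID.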
On Mon, Oct 08, 2018 at 12:04:28PM +0300, Konstantin Knizhnik wrote:
> The simplest way to fix the problem is to ignore duplicates before
> adding them to KnownAssignedXids. We perform a sort in this place
> anyway...

I may of course be missing something, but shouldn't we not have duplicates in the first place?
--
Michael
On 08.10.2018 12:14, Michael Paquier wrote:
> On Mon, Oct 08, 2018 at 12:04:28PM +0300, Konstantin Knizhnik wrote:
>> The simplest way to fix the problem is to ignore duplicates before
>> adding them to KnownAssignedXids. We perform a sort in this place
>> anyway...
>
> I may of course be missing something, but shouldn't we not have
> duplicates in the first place?

The reason for the appearance of duplicate XIDs in the 2PC case seems to be clear. It may be possible to eliminate it by clearing the XID of MyPgXact for the prepared transaction. But there are two problems with that:
1. I am not sure that it will not break something.
2. There is an obvious race condition between adding the GXACT via ProcArrayAdd and invalidating the XID of the current transaction. If the XID is cleared before calling ProcArrayAdd, then there will be some moment when it is not present in the procarray at all. If it is cleared after calling ProcArrayAdd, then it is still possible to see a duplicated XID in the procarray.

From my point of view it is easier and less invasive to exclude duplicates while replaying the RUNNING_XACTS record.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On October 8, 2018 2:04:28 AM PDT, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> On 05.10.2018 11:04, Michael Paquier wrote:
>> On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
>>> [snip]
>> Hmmmmm... Please let me think through that first. It seems to me that
>> the record should not be generated to begin with. At least I am able to
>> confirm what you see.
>
> The simplest way to fix the problem is to ignore duplicates before
> adding them to KnownAssignedXids. We perform a sort in this place
> anyway...

I vehemently object to that as the proper course.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 08.10.2018 18:24, Andres Freund wrote:
> On October 8, 2018 2:04:28 AM PDT, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
>> [snip]
>> The simplest way to fix the problem is to ignore duplicates before
>> adding them to KnownAssignedXids. We perform a sort in this place
>> anyway...
> I vehemently object to that as the proper course.
>
> Andres

And what about adding a qsort to GetRunningTransactionData or LogCurrentRunningXacts and excluding duplicates there?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2018-10-08 18:28:52 +0300, Konstantin Knizhnik wrote:
> On 08.10.2018 18:24, Andres Freund wrote:
>> [snip]
>> I vehemently object to that as the proper course.
>
> And what about adding a qsort to GetRunningTransactionData or
> LogCurrentRunningXacts and excluding duplicates there?

Sounds less terrible, but still pretty bad. I think we should fix the underlying data inconsistency, not paper over it a couple hundred meters away.

Greetings,

Andres Freund
On Mon, Oct 08, 2018 at 09:30:49AM -0700, Andres Freund wrote:
> Sounds less terrible, but still pretty bad. I think we should fix the
> underlying data inconsistency, not paper over it a couple hundred meters
> away.

+1. I am looking at how to make that possible.
--
Michael
On Tue, Oct 09, 2018 at 02:59:00PM +0900, Michael Paquier wrote:
> +1. I am looking at how to make that possible.

And so... Going through a bit of investigation, the problem is that preparing a 2PC transaction finishes by putting the procarray in a state where there are two entries referring to the same running transaction, as MarkAsPrepared adds an extra entry on top of the one already existing. This is expected though, per the comments in ProcArrayClearTransaction(), which assumes that MyProc can be safely cleaned and that there is a duplicated entry. Then GetRunningTransactionData() ignores the fact that those duplicate entries can exist, which makes the snapshot data taken incorrect, generating those incorrect XLOG_RUNNING_XACTS records.

The simplest fix I can think of first is to tweak the origin of the problem in GetRunningTransactionData() so that those duplicate XIDs are ignored if found, as there has to be a window between the time MarkAsPrepared() inserts the 2PC entry in the procarray and the time ProcArrayClearTransaction() gets called.
--
Michael
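The skip-duplicates idea described here can be modeled in isolation. This is a hedged sketch, not the real procarray code: `PGXACT` is reduced to its xid field, `collect_running_xids` is a hypothetical helper, and the duplicate check is a linear scan (acceptable under the assumption that a preparing backend contributes at most one duplicate):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Model of a procarray slot: only the field we care about here. */
typedef struct PGXACT
{
    TransactionId xid;
} PGXACT;

/* Collect running XIDs into out[], skipping invalid entries and any XID
 * already seen: the dummy 2PC entry added by MarkAsPrepared repeats the
 * preparing backend's XID until ProcArrayClearTransaction() runs.
 * Quadratic in the worst case, but duplicates are rare and short-lived. */
static int
collect_running_xids(const PGXACT *pgxact, int nprocs, TransactionId *out)
{
    int count = 0;

    for (int i = 0; i < nprocs; i++)
    {
        TransactionId xid = pgxact[i].xid;
        bool seen = false;

        if (xid == InvalidTransactionId)
            continue;
        for (int j = 0; j < count; j++)
        {
            if (out[j] == xid)
            {
                seen = true;
                break;
            }
        }
        if (!seen)
            out[count++] = xid;
    }
    return count;
}
```

With the procarray state from the gdb dump earlier in the thread (xid 513976717 present in slots 0 and 3), such a collector would emit that XID only once, so the generated RUNNING_XACTS record would be duplicate-free.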
On 09.10.2018 10:52, Michael Paquier wrote:
> On Tue, Oct 09, 2018 at 02:59:00PM +0900, Michael Paquier wrote:
>> +1. I am looking at how to make that possible.
>
> And so... Going through a bit of investigation, the problem is that
> preparing a 2PC transaction finishes by putting the procarray in a
> state where there are two entries referring to the same running
> transaction, as MarkAsPrepared adds an extra entry on top of the one
> already existing. This is expected though, per the comments in
> ProcArrayClearTransaction(), which assumes that MyProc can be safely
> cleaned and that there is a duplicated entry. Then
> GetRunningTransactionData() ignores the fact that those duplicate
> entries can exist, which makes the snapshot data taken incorrect,
> generating those incorrect XLOG_RUNNING_XACTS records. The simplest fix
> I can think of first is to tweak the origin of the problem in
> GetRunningTransactionData() so that those duplicate XIDs are ignored if
> found, as there has to be a window between the time MarkAsPrepared()
> inserts the 2PC entry in the procarray and the time
> ProcArrayClearTransaction() gets called.
> --
> Michael
Right now GetRunningTransactionData is used in only one place in the Postgres code: in LogStandbySnapshot, to log a RUNNING_XACTS record.
So as long as this does not change, it makes no difference whether the sort and duplicate elimination are performed in LogStandbySnapshot
or in GetRunningTransactionData.
But GetRunningTransactionData may be used in other cases... For example, I have used it in my snapfs Postgres extension (fast file-level snapshots)
to check that there are no more active transactions in the system. In my case the presence of duplicates does not matter and ordering of XIDs is not needed.
The performance of GetRunningTransactionData is not critical for me either, but I suspect there can be cases where it is critical and an unneeded qsort is undesirable:
there may be thousands of active transactions, and sorting their XIDs is not instantaneous.
Also, please take into account that in most cases the XIDs in a RUNNING_XACTS record are not processed at all (only when we perform recovery from a backup and have not yet reached the consistency point). This is why this bug was not detected before.
So I completely understand the argument that it is better to eliminate the source of the problem (the presence of duplicates in the RUNNING_XACTS record),
but I am not sure that in this particular case it is really the best solution. If the presence of duplicates is considered acceptable
for the procarray, why can we not accept it for the RUNNING_XACTS record?
I am not insisting that ProcArrayApplyRecoveryInfo is the right place to skip duplicates (though at least it seems to be the most efficient alternative). But I also do not feel that moving the sort to GetRunningTransactionData and eliminating duplicates there (which requires one more scan and a copy of the whole array) is fundamentally better...

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Oct 09, 2018 at 05:26:49PM +0300, Konstantin Knizhnik wrote:
> But GetRunningTransactionData may be used in some other cases... For
> example I have used it in my snapfs Postgres extensions (fast
> file-level snapshots) to check if there are no more active
> transactions in the system. In my case presence of duplicates is not
> important and ordering of XIDs is not needed. But performance of
> GetRunningTransactionData is also not critical. But I can suspect that
> there can be cases when it is critical and doing unneeded qsort is not
> desired. There may be thousands of active transactions and sorting
> their XIDs is not instantaneous.

I am not sure the performance argument is all that compelling; it could be as you say. But another point we could argue is that the presence of duplicate entries in GetRunningTransactionData() can be used as a way to detect that 2PC transactions are running, and that of the two entries, one is a dummy entry while the other is pending to be cleared.

> So I completely understand the argument that it is better to eliminate
> source of the problem (presence of duplicates in RUNNING_XACTS record)
> but not sure that in this particular case it is really the best
> solution. If presence of duplicates is considered to be acceptable for
> procarray, why we can not accept it for RUNNING_XACTS record?

This won't solve the case where records have already been generated and won't be processed correctly, but given how unlikely that is, we can rather safely say that fixing only the source would be fine.

> I am not insisting that skipping duplicates in ProcArrayApplyRecoveryInfo is
> the right place (but at least is seems to be the most efficient
> alternative). But I also do not fill that moving sort to
> GetRunningTransactionData and elimination of duplicates here (requires one
> more scan and copying of the whole array) is principally better...
Making recovery a couple of instructions shorter is always a good thing in my opinion. Users don't care much if backups take a long time, and they sweat less if restores take a shorter time. Hence, what about doing roughly the following:

1) Document that GetRunningTransactionData() fetches information also about 2PC entries, like that:

   * GetRunningTransactionData -- returns information about running transactions.
   *
   * Similar to GetSnapshotData but returns more information. We include
  - * all PGXACTs with an assigned TransactionId, even VACUUM processes.
  + * all PGXACTs with an assigned TransactionId, even VACUUM processes and
  + * dummy two-phase related entries created when preparing the transaction.

2) Update LogStandbySnapshot so that the list of XIDs fetched is sorted. This can be used as a sanity check for recovery via ProcArrayApplyRecoveryInfo().
--
Michael
Attachments
On Wed, Oct 10, 2018 at 11:22:45AM +0900, Michael Paquier wrote:
> I am not sure if the performance argument is actually this much
> sensible, it could be as you say, but another thing that we could argue
> about is that the presence of duplicate entries in
> GetRunningTransactionData() can be used as a point to understand that
> 2PC transactions are running, and that among the two, one of them is a
> dummy entry while the other is pending for being cleared.

Actually there would be a performance impact for any deployment if we were to do so, as ProcArrayLock is held and we would need to scan the procArray 4 times instead of twice. Many people already complain (justly) that ProcArray is a performance bottleneck; it would be a bad idea to make things worse...

> 1) Document that GetRunningTransactionData() fetches information also
> about 2PC entries, like that:
>    * GetRunningTransactionData -- returns information about running transactions.
>    *
>    * Similar to GetSnapshotData but returns more information. We include
>   - * all PGXACTs with an assigned TransactionId, even VACUUM processes.
>   + * all PGXACTs with an assigned TransactionId, even VACUUM processes and
>   + * dummy two-phase related entries created when preparing the transaction.
>
> 2) Update LogStandbySnapshot so as the list of XIDs fetched is sorted
> and ordered. This can be used as a sanity check for recovery via
> ProcArrayApplyRecoveryInfo().

This also would increase the pressure on ProcArrayLock with wal_level = logical, as the WAL record needs to be built while holding the lock. So let's do as Konstantin is suggesting and skip duplicated XIDs after applying the qsort(). The current code also does a bad documentation job, so we should mention that GetRunningTransactionData may return duplicated XIDs because of the dummy 2PC entries which overlap with the active ones, and also add a proper comment in ProcArrayApplyRecoveryInfo(). Konstantin, do you want to give it a try with a patch? Or should I?
--
Michael
Attachments
On 11.10.2018 12:06, Michael Paquier wrote:
> [...]
> So let's do as Konstantin is suggesting by skipping duplicated XIDs
> after applying the qsort(). The current code is also doing a bad
> documentation job, so we should mention that GetRunningTransactionData
> may return duplicated XIDs because of the dummy 2PC entries which
> overlap with the active ones, and also add a proper comment in
> ProcArrayApplyRecoveryInfo(). Konstantin, do you want to give it a try
> with a patch? Or should I?
> --
> Michael

Proposed patch is attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments
On Thu, Oct 11, 2018 at 08:04:11PM +0300, Konstantin Knizhnik wrote:
> Proposed patch is attached.

The problem I have with this patch doing the duplicate removal and qsort work in LogCurrentRunningXacts is that it would still hold ProcArrayLock until the WAL record has been written with wal_level = logical. With many sessions and 2PC transactions running, this would be a performance impact for any deployment every time a checkpoint happens or every time the bgwriter decides to log a standby snapshot. Hence I would go instead with the attached, which does the legwork at recovery, a one-time code path as you mentioned. Okay, this makes recovery a bit longer, but that's way better than impacting all deployments of Postgres, even those not using 2PC in normal running. And as the sorting phase already happens, we just need to do something like the attached. One thing that we could also do for HEAD is to record in RunningTransactionsData whether a transaction comes from a 2PC entry, and allow recovery to do more sanity checks. This would require a WAL format change. The proposed patch needs to be back-patched down to 9.3.
--
Michael
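The recovery-side approach being converged on here (ProcArrayApplyRecoveryInfo() already qsorts the XID array, so a duplicate left by a dummy 2PC entry ends up adjacent to its twin and can be squeezed out in one extra pass) can be sketched as follows. This is a simplified standalone sketch, not the attached patch; the comparator mirrors PostgreSQL's plain-uint32 xidComparator, and the function name is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Plain uint32 comparison, mirroring PostgreSQL's xidComparator. */
static int
xid_compare(const void *a, const void *b)
{
    TransactionId xa = *(const TransactionId *) a;
    TransactionId xb = *(const TransactionId *) b;

    if (xa == xb)
        return 0;
    return (xa > xb) ? 1 : -1;
}

/*
 * Sort xids[0..nxids-1] and compact away adjacent duplicates in place,
 * returning the new element count.  After qsort, any XID duplicated by
 * a dummy 2PC procarray entry sits next to its twin, so a single linear
 * pass is enough to drop it.
 */
static int
sort_and_dedup_xids(TransactionId *xids, int nxids)
{
    int last = 0;

    if (nxids <= 1)
        return nxids;

    qsort(xids, nxids, sizeof(TransactionId), xid_compare);

    for (int i = 1; i < nxids; i++)
    {
        if (xids[i] != xids[last])
            xids[++last] = xids[i];
    }
    return last + 1;
}
```

Applied to the XIDs from the problematic WAL record (2004495290, 2004495308, 2004495308), this leaves a strictly increasing array of two XIDs, which is exactly what the KnownAssignedXidsAdd ordering check requires.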