PG 13.2 Parallel scan with SubTransGetTopmostTransaction assert coredump

Поиск
Список
Период
Сортировка
От Pengchengliu
Тема PG 13.2 Parallel scan with SubTransGetTopmostTransaction assert coredump
Дата
Msg-id 124c01d7425a$f6ac8430$e4058c90$@tju.edu.cn
обсуждение исходный текст
Список pgsql-hackers
Hi hackers,
   With PG 13.2(3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested
subtransaction with parallel scan, I got a subtransaction coredump as below:

(gdb) bt
#0  0x00001517ce61f7ff in raise () from /lib64/libc.so.6
#1  0x00001517ce609c35 in abort () from /lib64/libc.so.6
#2  0x0000000000aaf93f in ExceptionalCondition (conditionName=0xb4c920
"TransactionIdFollowsOrEquals(xid, TransactionXmin)", errorType=0xb4c796
"FailedAssertion", 
    fileName=0xb4c738
"/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/
transam/subtrans.c", lineNumber=156)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/er
ror/assert.c:67
#3  0x0000000000563111 in SubTransGetTopmostTransaction (xid=196963)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/t
ransam/subtrans.c:156
#4  0x0000000000b05206 in XidInMVCCSnapshot (xid=196963, snapshot=0x2f8ec58)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/ti
me/snapmgr.c:2293
#5  0x00000000004ff2bc in HeapTupleSatisfiesMVCC (htup=0x7ffc21807120,
snapshot=0x2f8ec58, buffer=1946)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/h
eap/heapam_visibility.c:1073
#6  0x0000000000500363 in HeapTupleSatisfiesVisibility (tup=0x7ffc21807120,
snapshot=0x2f8ec58, buffer=1946)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/h
eap/heapam_visibility.c:1695
#7  0x00000000004e423b in heapgetpage (sscan=0x2f8e840, page=1685) at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/h
eap/heapam.c:447
#8  0x00000000004e68bf in heapgettup_pagemode (scan=0x2f8e840,
dir=ForwardScanDirection, nkeys=0, key=0x0)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/h
eap/heapam.c:1077
#9  0x00000000004e6eb6 in heap_getnextslot (sscan=0x2f8e840,
direction=ForwardScanDirection, slot=0x2fd8a38)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/h
eap/heapam.c:1333
#10 0x000000000075350e in table_scan_getnextslot (sscan=0x2f8e840,
direction=ForwardScanDirection, slot=0x2fd8a38)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/include/access/t
ableam.h:906
#11 0x00000000007535d6 in SeqNext (node=0x2fd86a8) at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor
/nodeSeqscan.c:80
#12 0x000000000071af3c in ExecScanFetch (node=0x2fd86a8, accessMtd=0x753542
<SeqNext>, recheckMtd=0x7535e7 <SeqRecheck>)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor
/execScan.c:133
#13 0x000000000071afdd in ExecScan (node=0x2fd86a8, accessMtd=0x753542
<SeqNext>, recheckMtd=0x7535e7 <SeqRecheck>)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor
/execScan.c:199
#14 0x0000000000753631 in ExecSeqScan (pstate=0x2fd86a8) at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor
/nodeSeqscan.c:112
#15 0x00000000007173af in ExecProcNodeFirst (node=0x2fd86a8) at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor
/execProcnode.c:450
#16 0x000000000070b7f7 in ExecProcNode (node=0x2fd86a8) at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/include/executor
/executor.h:248
#17 0x000000000070e302 in ExecutePlan (estate=0x2fd7ca0,
planstate=0x2fd86a8, use_parallel_mode=false, operation=CMD_SELECT,
sendTuples=true, numberTuples=0, 
    direction=ForwardScanDirection, dest=0x2f8ddd8, execute_once=true) at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor
/execMain.c:1632
#18 0x000000000070be22 in standard_ExecutorRun (queryDesc=0x2f8e7a8,
direction=ForwardScanDirection, count=0, execute_once=true)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor
/execMain.c:350
#19 0x000000000070bc50 in ExecutorRun (queryDesc=0x2f8e7a8,
direction=ForwardScanDirection, count=0, execute_once=true)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor
/execMain.c:294
#20 0x00000000007131d5 in ParallelQueryMain (seg=0x2ef30b8,
toc=0x1517cf85c000)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor
/execParallel.c:1448
#21 0x000000000055f70c in ParallelWorkerMain (main_arg=897455922) at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/t
ransam/parallel.c:1470
#22 0x000000000086e255 in StartBackgroundWorker () at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmast
er/bgworker.c:879
#23 0x000000000088192c in do_start_bgworker (rw=0x2f221c0) at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmast
er/postmaster.c:5870
#24 0x0000000000881cd8 in maybe_start_bgworkers () at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmast
er/postmaster.c:6095
#25 0x0000000000880d14 in sigusr1_handler (postgres_signal_arg=10) at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmast
er/postmaster.c:5255
#26 <signal handler called>
#27 0x00001517ce6dc4bb in select () from /lib64/libc.so.6
#28 0x000000000087c867 in ServerLoop () at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmast
er/postmaster.c:1703
#29 0x000000000087c232 in PostmasterMain (argc=3, argv=0x2ef1070) at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmast
er/postmaster.c:1412
#30 0x0000000000783418 in main (argc=3, argv=0x2ef1070) at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/main/mai
n.c:210
(gdb) f 3
#3  0x0000000000563111 in SubTransGetTopmostTransaction (xid=196963)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/t
ransam/subtrans.c:325
325             Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
(gdb) p xid
$1 = 196963
(gdb) f 4
#4  0x0000000000b05206 in XidInMVCCSnapshot (xid=196963, snapshot=0x2f8ec58)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/ti
me/snapmgr.c:2293
2293                            xid = SubTransGetTopmostTransaction(xid);
(gdb) p *snapshot
$2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 196962, xmax = 210314, xip =
0x2f8ecc0, xcnt = 82, subxip = 0x0, subxcnt = 0, suboverflowed = true, 
  takenDuringRecovery = false, copied = true, curcid = 14, speculativeToken
= 2139062143, active_count = 0, regd_count = 1, ph_node = {first_child =
0x0, 
    next_sibling = 0xf65ca0 <CatalogSnapshotData+64>, prev_or_parent =
0x2f8dbc8}, whenTaken = 0, lsn = 0}
(gdb) p TransactionXmin
$3 = 196992
(gdb) f 3
#3  0x0000000000563111 in SubTransGetTopmostTransaction (xid=196963)
    at
/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/t
ransam/subtrans.c:325
325             Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
(gdb) p xid
$4 = 196963
(gdb) p TransactionXmin
$5 = 196992

After simple analyze, I think it should be a bug.

1, Main process first get active snapshot(xmin 196962, xmax 210314), and
push active snapshot. 
2, Main process call InitializeParallelDSM, get transaction snapshot(xmin
196992, xmax 210320), and send this snapshot with parameters
PARALLEL_KEY_TRANSACTION_SNAPSHOT.
3, Main process call
ExecParallelInitializeDSM->ExecSeqScanInitializeDSM->table_parallelscan_init
ialize  SerializeSnapshot active snapshot(xmin 196962, xmax 210314)
4, Main process Lanch Parallel Workers process.

Parallel Workers process work flow:
1,
ParallelWorkerMain->BackgroundWorkerInitializeConnectionByOid->GetTransactio
nSnapshot(xmin 196992, xmax 210320) and set TransactionXmin to 196992.
2,
ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->Proc
ArrayInstallRestoredXmin and set TransactionXmin to 196992.
3,
ParallelWorkerMain->ExecParallelInitializeWorker->ExecSeqScanInitializeWorke
r->table_beginscan_parallel restore active snapshot(xmin 196962, xmax
210314) from main process.
4, Got coredump
ParallelWorkerMain->ParallelQueryMain->ExecutorRun...ExecScan->heapgetpage->
HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction 

So the root cause is the Parallel Workers process set the TransactionXmin
with later transcation snapshot. When parallel scan, Parallel Workers
process use the older active snapshot.
It leads to subtrans assert coredump. I cannot get a better solution, Is
there any ideas?
    



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Smith
Дата:
Сообщение: Re: AlterSubscription_refresh "wrconn" wrong variable?
Следующее
От: Japin Li
Дата:
Сообщение: Re: AlterSubscription_refresh "wrconn" wrong variable?