[HACKERS] Multiple table synchronizations are processed serially

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема [HACKERS] Multiple table synchronizations are processed serially
Дата
Msg-id CAD21AoC2KJdavS7MFffmSsRc1dn3Vg_0xmuc=UpBrZ-_MUxh-Q@mail.gmail.com
обсуждение исходный текст
Ответы Re: [HACKERS] Multiple table synchronizations are processed serially  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Список pgsql-hackers
Hi,

While testing table synchronization in logical replication, I found
that multiple table synchronizations for a subscription are processed
serially due to lock wait.

I setup pgbench tables on publisher (scalefactor = 1000) and on
subscriber, and truncated these tables on subscriber, and then created
publication for all pgbench tables. When I created a subscription I
got the following log messages.

2017-05-19 13:12:56.311 JST [54970] LOG:  logical replication apply
for subscription hoge_sub started
2017-05-19 13:12:56.314 JST [54970] LOG:  starting logical replication
worker for subscription "hoge_sub"
2017-05-19 13:12:56.317 JST [54972] LOG:  logical replication sync for
subscription hoge_sub, table pgbench_accounts started
2017-05-19 13:12:57.315 JST [54970] LOG:  starting logical replication
worker for subscription "hoge_sub"
2017-05-19 13:12:57.318 JST [54974] LOG:  logical replication sync for
subscription hoge_sub, table pgbench_branches started
2017-05-19 13:12:58.317 JST [54970] LOG:  starting logical replication
worker for subscription "hoge_sub"
2017-05-19 13:12:58.320 JST [54976] LOG:  logical replication sync for
subscription hoge_sub, table pgbench_history started
2017-05-19 13:12:59.319 JST [54970] LOG:  starting logical replication
worker for subscription "hoge_sub"
2017-05-19 13:12:59.322 JST [54978] LOG:  logical replication sync for
subscription hoge_sub, table pgbench_tellers started

Seems all four table sync workers are launched at the almost same
time, but three workers of them get stuck in idle transaction state
when creating replication slot. That is these waiting workers cannot
proceed its work until first connected table sync worker finishes. ps
command shows the followings.

54898 ?        Ss     0:00 postgres: bgworker: logical replication launcher
54970 ?        Ss     0:00 postgres: bgworker: logical replication
worker for subscription 16432
54971 ?        Ss     0:00 postgres: wal sender process masahiko [local] idle
54972 ?        Rs    10:38 postgres: bgworker: logical replication
worker for subscription 16432 sync 16395
54973 ?        Ss     1:52 postgres: wal sender process masahiko [local] COPY
54974 ?        Ss     0:00 postgres: bgworker: logical replication
worker for subscription 16432 sync 16398
54975 ?        Ss     0:00 postgres: wal sender process masahiko
[local] idle in transaction waiting
54976 ?        Ss     0:00 postgres: bgworker: logical replication
worker for subscription 16432 sync 16389
54977 ?        Ss     0:00 postgres: wal sender process masahiko
[local] idle in transaction waiting
54978 ?        Ss     0:00 postgres: bgworker: logical replication
worker for subscription 16432 sync 16392
54979 ?        Ss     0:00 postgres: wal sender process masahiko
[local] idle in transaction waiting

Also, here is backtrace of the waiting wal sender process.

#0  0x00007f24b4030783 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x000000000084c58d in WaitEventSetWaitBlock (set=0x25ed8e0,
cur_timeout=-1, occurred_events=0x7fff1ea322f0, nevents=1) at
latch.c:1045
#2  0x000000000084c454 in WaitEventSetWait (set=0x25ed8e0, timeout=-1,
occurred_events=0x7fff1ea322f0, nevents=1, wait_event_info=50331652)
at latch. c:997
#3  0x000000000084bbe5 in WaitLatchOrSocket (latch=0x7f24ad385024,
wakeEvents=1, sock=-1, timeout=-1, wait_event_info=50331652) at
latch.c:385
#4  0x000000000084bac6 in WaitLatch (latch=0x7f24ad385024,
wakeEvents=1, timeout=0, wait_event_info=50331652) at latch.c:339
#5  0x0000000000863632 in ProcSleep (locallock=0x25f2fb0,
lockMethodTable=0xbdb020) at proc.c:1255
#6  0x000000000085c83c in WaitOnLock (locallock=0x25f2fb0,
owner=0x2604dd8) at lock.c:1702
#7  0x000000000085b541 in LockAcquireExtended (locktag=0x7fff1ea32800,
lockmode=5, sessionLock=0 '\000', dontWait=0 '\000',
reportMemoryError=1 '\001\ ') at lock.c:998
#8  0x000000000085ab46 in LockAcquire (locktag=0x7fff1ea32800,
lockmode=5, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:688
#9  0x0000000000859a16 in XactLockTableWait (xid=598, rel=0x0,
ctid=0x0, oper=XLTW_None) at lmgr.c:587
#10 0x000000000080d87a in SnapBuildWaitSnapshot (running=0x25ed8a8,
cutoff=599) at snapbuild.c:1400
#11 0x000000000080d64c in SnapBuildFindSnapshot (builder=0x26ab530,
lsn=14202458264, running=0x25ed8a8) at snapbuild.c:1311
#12 0x000000000080d1bb in SnapBuildProcessRunningXacts
(builder=0x26ab530, lsn=14202458264, running=0x25ed8a8) at
snapbuild.c:1106
#13 0x00000000007fb779 in DecodeStandbyOp (ctx=0x25f7090,
buf=0x7fff1ea32930) at decode.c:314
#14 0x00000000007fb28f in LogicalDecodingProcessRecord (ctx=0x25f7090,
record=0x25f7350) at decode.c:117
#15 0x00000000007ff0d5 in DecodingContextFindStartpoint
(ctx=0x25f7090) at logical.c:458
#16 0x00000000008141c8 in CreateReplicationSlot (cmd=0x25ecba8) at
walsender.c:941
#17 0x0000000000814ec5 in exec_replication_command
(cmd_string=0x26511e0 "CREATE_REPLICATION_SLOT
\"hoge_sub_16432_sync_16398\" TEMPORARY LOGICAL pgoutput
USE_SNAPSHOT") at walsender.c:1508
#18 0x000000000087d2d5 in PostgresMain (argc=1, argv=0x25fb2b8,
dbname=0x25fb100 "postgres", username=0x25fb0d8 "masahiko") at
postgres.c:4071
#19 0x00000000007e0ffd in BackendRun (port=0x25f37f0) at postmaster.c:4334
#20 0x00000000007e077d in BackendStartup (port=0x25f37f0) at postmaster.c:4006
#21 0x00000000007dcd86 in ServerLoop () at postmaster.c:1743
#22 0x00000000007dc44a in PostmasterMain (argc=5, argv=0x25cceb0) at
postmaster.c:1351
#23 0x00000000007184a2 in main (argc=5, argv=0x25cceb0) at main.c:228

It also blocks subscriptions that will be created later. I'd say it's
not an expected behavior. Any thoughts?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: [HACKERS] [POC] hash partitioning
Следующее
От: Amit Langote
Дата:
Сообщение: Re: [HACKERS] transition table behavior with inheritance appearsbroken (was: Declarative partitioning - another take)