BUG #17580: use pg_terminate_backend to terminate a wal sender process may wait a long time

From: PG Bug reporting form
Subject: BUG #17580: use pg_terminate_backend to terminate a wal sender process may wait a long time
Date:
Msg-id: 17580-849c1d5b6d7eb422@postgresql.org
Responses: Re: BUG #17580: use pg_terminate_backend to terminate a wal sender process may wait a long time  (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17580
Logged by:          whale song
Email address:      13207961@qq.com
PostgreSQL version: 10.5
Operating system:   centos
Description:

1) Create a test table
create table test666(id int primary key, number int);

2) Use pg_recvlogical to subscribe
2.1) Create a replication slot
pg_recvlogical -h 9.223.28.176 -p 11002 -d test -U tbase --slot
wal2json_pg_test02 --create-slot -P wal2json
2.2) Start subscribing, using filter-tables to filter on the test table
test666
pg_recvlogical -h 9.223.28.176 -p 11002 -d test -U tbase --slot
wal2json_pg_test02 --start -o pretty-print=1 -o format-version=2 -o
filter-tables="public.test666"  -f -

3) Insert a lot of data into the table in one transaction
insert into test666 values (generate_series(1000000001,2000000000),
generate_series(1000000001,2000000000));
It may take a long time, maybe 50 minutes.

4) Query pg_replication_slots to get the wal sender process pid (active_pid)
select *,pg_current_wal_lsn() from pg_replication_slots ;

5) When step 3 finishes, use pstack to watch the wal sender process stack;
while it is looping in ReorderBufferCommit, kill the pg_recvlogical started
in step 2.2
 pstack 80743
#0 0x0000000000c400d9 in ResourceArrayAdd (resarr=0x1576970,
value=139941599240896) at resowner.c:359
#1 0x0000000000c3fef4 in ResourceArrayEnlarge (resarr=0x1576970) at
resowner.c:252
#2 0x0000000000c411f4 in ResourceOwnerEnlargeRelationRefs (owner=0x15768f0)
at resowner.c:1129
#3 0x0000000000be0899 in RelationIncrementReferenceCount
(rel=0x7f46b14fe2c0) at relcache.c:2589
#4 0x0000000000be05da in RelationIdGetRelation (relationId=1177785) at
relcache.c:2451
#5 0x00007f4e0d649b9e in pg_decode_write_tuple (ctx=0x13fc5a8,
relation=0x7f46b14fcf38, tuple=0x315b910, kind=PGOUTPUTJSON_IDENTITY) at
wal2json.c:1503
#6 0x00007f4e0d64a514 in pg_decode_write_change (ctx=0x13fc5a8,
txn=0x15957a8, relation=0x7f46b14fcf38, change=0x33bc9f8) at
wal2json.c:1747
#7 0x00007f4e0d64a9db in pg_decode_change_v2 (ctx=0x13fc5a8, txn=0x15957a8,
relation=0x7f46b14fcf38, change=0x33bc9f8) at wal2json.c:1865
#8 0x00007f4e0d648b2b in pg_decode_change (ctx=0x13fc5a8, txn=0x15957a8,
relation=0x7f46b14fcf38, change=0x33bc9f8) at wal2json.c:1098
#9 0x00000000009cf3d1 in change_cb_wrapper (cache=0x1410fb8, txn=0x15957a8,
relation=0x7f46b14fcf38, change=0x33bc9f8) at logical.c:730
#10 0x00000000009d75df in ReorderBufferCommit (rb=0x1410fb8, xid=174028801,
commit_lsn=44579135148872, end_lsn=44579135148936,
commit_time=712222669188947, origin_id=0, origin_lsn=0) at
reorderbuffer.c:1445
#11 0x00000000009cb15b in DecodeCommit (ctx=0x13fc5a8, buf=0x7fff2574d600,
parsed=0x7fff2574d530, xid=174028801) at decode.c:713
#12 0x00000000009ca76b in DecodeXactOp (ctx=0x13fc5a8, buf=0x7fff2574d600)
at decode.c:335
#13 0x00000000009ca401 in LogicalDecodingProcessRecord (ctx=0x13fc5a8,
record=0x13fc880) at decode.c:175
#14 0x00000000009ead45 in XLogSendLogical () at walsender.c:3152
#15 0x00000000009ea09f in WalSndLoop (send_data=0x9eaca9 <XLogSendLogical>)
at walsender.c:2520
#16 0x00000000009e8aed in StartLogicalReplication (cmd=0x151bb48) at
walsender.c:1445
#17 0x00000000009e944e in exec_replication_command (cmd_string=0x1520408
"START_REPLICATION SLOT subs_f524p217b7_dn0001 LOGICAL 285C/F8FFEA50 (
\"format-version\" '2' )") at walsender.c:1887
#18 0x0000000000a71ed8 in PostgresMain (argc=1, argv=0x1419f58,
dbname=0x1419f28 "prd_jxcard_lingwang", username=0x1419f08 "mingchen_read")
at postgres.c:6118
#19 0x00000000009b0faa in BackendRun (port=0x14135b0) at postmaster.c:5071
#20 0x00000000009b0753 in BackendStartup (port=0x14135b0) at
postmaster.c:4743
#21 0x00000000009ac3ae in ServerLoop () at postmaster.c:1995
#22 0x00000000009aba35 in PostmasterMain (argc=5, argv=0x13dc770) at
postmaster.c:1603
#23 0x0000000000878c44 in main (argc=5, argv=0x13dc770) at main.c:233

6) Query pg_replication_slots again for the replication slot information: the
active_pid is still active. Use pg_terminate_backend to terminate
active_pid; it returns true, but the active_pid is still active.
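
For illustration, the statements for this step look roughly like the
following (80743 is the walsender pid shown in the pstack above; substitute
the active_pid reported for your slot):

select slot_name, active, active_pid from pg_replication_slots;
select pg_terminate_backend(80743);  -- returns true, but the walsender keeps running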

7) The active_pid wal sender process exits only after a long wait.

I think maybe we should add a CHECK_FOR_INTERRUPTS call in the loop of the
ReorderBufferCommit function.
Like this:

void
ReorderBufferCommit(…………)
{
    …………


    PG_TRY();
    {
        …………

        while ((change = ReorderBufferIterTXNNext(rb, iterstate)) != NULL)
        {
            Relation    relation = NULL;
            Oid            reloid;

            /* Add a CHECK_FOR_INTERRUPTS() here? */
            CHECK_FOR_INTERRUPTS();

            …………
        }

    …………
}

