Discussion: BUG #17054: Memory corruption in logical replication worker when replicating into partitioned table
From: PG Bug reporting form
The following bug has been logged on the website:
Bug reference: 17054
Logged by: Sergey Bernikov
Email address: sbernikov@gmail.com
PostgreSQL version: 13.3
Operating system: Ubuntu 18.04.4
Description:
When the logical replication target is a partitioned table, executing any
DDL on the source table crashes the target (subscriber) server.
Steps to reproduce:
1. in source DB: create table and add to publication
create table test_replication (
id int not null,
value varchar(100),
primary key (id)
);
create publication test_publication for table test_replication;
2. in target DB: create partitioned table and start replication
create table test_replication (
id int not null,
value varchar(100),
primary key (id)
) partition by range (id);
create table test_replication_p_1 partition of test_replication
for values from (0) to (10);
create table test_replication_p_2 partition of test_replication
for values from (10) to (20);
create subscription test_subscription CONNECTION '...' publication test_publication;
4. in source DB: insert and update data
insert into test_replication(id, value) values (1, 'a1');
insert into test_replication(id, value) values (2, 'a1');
insert into test_replication(id, value) values (3, 'a1');
update test_replication set value = 'a2';
5. in source DB: execute any DDL on the table
vacuum test_replication;
6. in source DB: update data
update test_replication set value = 'a3';
Result: the logical replication worker on the target server crashes with
this error message:
LOG: background worker "logical replication worker" (PID 28356) was terminated by signal 11: Segmentation fault
LOG: terminating any other active server processes
Backtrace from core dump:
Core was generated by `postgres: 13/main: logical replication worker for subscription 781420 '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000557026391fef in slot_modify_cstrings
(slot=slot@entry=0x557026fa8298, srcslot=<optimized out>,
rel=rel@entry=0x557026ff7370, values=values@entry=0x7ffff4135550,
replaces=replaces@entry=0x7ffff4138950) at
./build/../src/backend/replication/logical/worker.c:434
434 ./build/../src/backend/replication/logical/worker.c: No such file or directory.
(gdb) bt
#0 0x0000557026391fef in slot_modify_cstrings
(slot=slot@entry=0x557026fa8298, srcslot=<optimized out>,
rel=rel@entry=0x557026ff7370, values=values@entry=0x7ffff4135550,
replaces=replaces@entry=0x7ffff4138950) at
./build/../src/backend/replication/logical/worker.c:434
#1 0x0000557026392b9f in apply_handle_tuple_routing
(relinfo=0x557026f80928, estate=estate@entry=0x557026fae108,
remoteslot=remoteslot@entry=0x557026f813d8,
newtup=newtup@entry=0x7ffff4135550,
relmapentry=relmapentry@entry=0x557026f96d90,
operation=operation@entry=CMD_UPDATE) at
./build/../src/backend/replication/logical/worker.c:1105
#2 0x00005570263934df in apply_handle_update (s=s@entry=0x7ffff41390a0) at
./build/../src/backend/replication/logical/worker.c:791
#3 0x00005570263941c1 in apply_dispatch (s=0x7ffff41390a0) at
./build/../src/backend/replication/logical/worker.c:1368
#4 LogicalRepApplyLoop (last_received=936525246824) at
./build/../src/backend/replication/logical/worker.c:1577
#5 ApplyWorkerMain (main_arg=<optimized out>) at
./build/../src/backend/replication/logical/worker.c:2123
#6 0x00005570263613ae in StartBackgroundWorker () at
./build/../src/backend/postmaster/bgworker.c:879
#7 0x000055702636d5a3 in do_start_bgworker (rw=0x557026ec9110) at
./build/../src/backend/postmaster/postmaster.c:5870
#8 maybe_start_bgworkers () at
./build/../src/backend/postmaster/postmaster.c:6095
#9 0x000055702636e035 in sigusr1_handler
(postgres_signal_arg=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:5255
#10 <signal handler called>
#11 0x00007f4bb7bbcdd7 in __GI___select (nfds=nfds@entry=10,
readfds=readfds@entry=0x7ffff4139870, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffff41397d0)
at ../sysdeps/unix/sysv/linux/select.c:41
#12 0x000055702636e5f9 in ServerLoop () at
./build/../src/backend/postmaster/postmaster.c:1703
#13 0x0000557026370423 in PostmasterMain (argc=5, argv=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:1412
#14 0x00005570260c19f8 in main (argc=5, argv=0x557026e73fd0) at
./build/../src/backend/main/main.c:210
PG Bug reporting form <noreply@postgresql.org> writes:
> When the logical replication target is a partitioned table, executing any
> DDL on the source table crashes the target (subscriber) server.
Thanks for the report! I duplicated the crash on v13 branch tip,
although it's hitting an assertion failure before reaching any segfault:
#2 0x00000000008f466a in ExceptionalCondition (
conditionName=conditionName@entry=0xa5eae0 "natts == rel->attrmap->maplen",
errorType=errorType@entry=0x948cc9 "FailedAssertion",
fileName=fileName@entry=0xa52956 "worker.c",
lineNumber=lineNumber@entry=490) at assert.c:67
#3 0x0000000000777741 in slot_modify_cstrings (slot=slot@entry=0x2ec6e40,
srcslot=<optimized out>, rel=rel@entry=0x2eca918,
values=values@entry=0x7fffb3506480, replaces=replaces@entry=0x7fffb3509880)
at worker.c:490
#4 0x00000000007785e7 in apply_handle_tuple_routing (
edata=edata@entry=0x2ea45a0, remoteslot=remoteslot@entry=0x2ea48a0,
newtup=newtup@entry=0x7fffb3506480, operation=operation@entry=CMD_UPDATE)
at worker.c:1153
#5 0x0000000000778e74 in apply_handle_update (s=s@entry=0x7fffb3509fa0)
at worker.c:846
#6 0x000000000077963c in apply_dispatch (s=0x7fffb3509fa0) at worker.c:1415
#7 LogicalRepApplyLoop (last_received=254887792) at worker.c:1624
#8 ApplyWorkerMain (main_arg=<optimized out>) at worker.c:2171
#9 0x0000000000743ec9 in StartBackgroundWorker () at bgworker.c:890
Interestingly, the same test case does NOT crash for me on master.
So apparently we fixed something that should have been back-patched.
regards, tom lane
I wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> When the logical replication target is a partitioned table, executing any
>> DDL on the source table crashes the target (subscriber) server.
> Thanks for the report! I duplicated the crash on v13 branch tip,
I can't reproduce this anymore after commit b270713fd. I think it's
probably the same thing I found while making a test for your other
report:
    logicalrep_partition_open() failed to ensure that the
    LogicalRepPartMapEntry it built for a partition was fully
    independent of that for the partition root, leading to
    trouble if the root entry was later freed or rebuilt.
My failure to see a crash on HEAD was probably an accidental
issue of memory reuse patterns.
regards, tom lane
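
The commit message quoted above describes a partition entry that was not fully independent of the root entry. The following is only a minimal sketch of that general hazard, not the PostgreSQL code: every name in it (DemoAttrMap, DemoRelEntry, the demo_* functions) is hypothetical, and the real structures and the real fix in commit b270713fd may well differ. It assumes the dependency took the form of a shared pointer, and shows a partition entry that takes a deep copy of the root's map, so that rebuilding the root later cannot leave the partition with a dangling pointer or a stale length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an attribute map (not PostgreSQL's AttrMap). */
typedef struct DemoAttrMap
{
    int   maplen;   /* number of entries in attnums */
    int  *attnums;  /* one entry per local column */
} DemoAttrMap;

/* Hypothetical cache entry for a partition root or a leaf partition. */
typedef struct DemoRelEntry
{
    DemoAttrMap *attrmap;
} DemoRelEntry;

/* Allocate a map of the given length, filled with an identity mapping. */
static DemoAttrMap *
demo_make_map(int maplen)
{
    DemoAttrMap *map = malloc(sizeof(DemoAttrMap));
    int          i;

    if (map == NULL)
        return NULL;
    map->maplen = maplen;
    map->attnums = malloc((size_t) maplen * sizeof(int));
    if (map->attnums == NULL)
    {
        free(map);
        return NULL;
    }
    for (i = 0; i < maplen; i++)
        map->attnums[i] = i;
    return map;
}

static void
demo_free_map(DemoAttrMap *map)
{
    if (map != NULL)
    {
        free(map->attnums);
        free(map);
    }
}

/* Deep copy: the result owns its own storage. */
static DemoAttrMap *
demo_copy_map(const DemoAttrMap *src)
{
    DemoAttrMap *dst = demo_make_map(src->maplen);

    if (dst != NULL)
        memcpy(dst->attnums, src->attnums,
               (size_t) src->maplen * sizeof(int));
    return dst;
}

/* Build a partition entry that does not point into the root entry. */
static int
demo_partition_open(DemoRelEntry *part, const DemoRelEntry *root)
{
    /*
     * The hazardous alternative would be "part->attrmap = root->attrmap;":
     * that pointer dangles as soon as the root entry is rebuilt.
     */
    part->attrmap = demo_copy_map(root->attrmap);
    return part->attrmap != NULL;
}

/* Simulate an invalidation that frees and rebuilds the root entry. */
static int
demo_rebuild_root(DemoRelEntry *root, int new_maplen)
{
    DemoAttrMap *fresh = demo_make_map(new_maplen);

    if (fresh == NULL)
        return 0;
    demo_free_map(root->attrmap);
    root->attrmap = fresh;
    return 1;
}

/* Returns the partition's maplen after the root was rebuilt, or -1. */
int
demo_run(void)
{
    DemoRelEntry root = { demo_make_map(2) };
    DemoRelEntry part = { NULL };
    int          result = -1;

    if (root.attrmap != NULL &&
        demo_partition_open(&part, &root) &&
        demo_rebuild_root(&root, 3))
    {
        /* The partition entry must not share storage with the root. */
        assert(part.attrmap != root.attrmap);
        result = part.attrmap->maplen;
    }
    demo_free_map(part.attrmap);
    demo_free_map(root.attrmap);
    return result;
}
```

Calling demo_run() returns 2: the partition keeps the two-column map it copied even though the root was rebuilt with three columns. With the shared-pointer variant noted in the comment, the same read would touch freed memory, and whether that crashes depends on how the allocator reuses the block, which fits the observation above that the absence of a crash on HEAD was probably accidental.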