Re: Segfault when a replica trigger tries to refresh subscription
От | Andres Freund |
---|---|
Тема | Re: Segfault when a replica trigger tries to refresh subscription |
Дата | |
Msg-id | 20201111215820.qihhrz7fayu6myfi@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Segfault when a replica trigger tries to refresh subscription (Jeremy Finzel <finzelj@gmail.com>) |
Список | pgsql-bugs |
Hi, On 2020-11-11 12:23:06 -0600, Jeremy Finzel wrote: > I have only attempted to reproduce this on pg 11.9. It's reproducible on HEAD. > Logs: > 2020-11-11 18:14:09.535 GMT,,,21636,,5fa58af6.5484,8583,,2020-11-06 > 17:42:14 GMT,,0,LOG,00000,"background worker ""logical replication worker"" > (PID 24492) wa > s terminated by signal 11: Segmentation fault",,,,,,,,,"" The backtrace is #0 __GI___libc_free (mem=0x7f7f7f7f7f7f7f7f) at malloc.c:3102 #1 0x00007ffbef93cea8 in PQfreemem (ptr=0x7f7f7f7f7f7f7f7f) at /home/andres/src/postgresql/src/interfaces/libpq/fe-exec.c:3298 #2 0x00007ffbf462e4f7 in libpqrcv_receive (conn=0x5637f8d91740, buffer=0x7ffd7d7fd770, wait_fd=0x7ffd7d7fd778) at /home/andres/src/postgresql/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c:735 #3 0x00005637f773ec76 in LogicalRepApplyLoop (last_received=23489672) at /home/andres/src/postgresql/src/backend/replication/logical/worker.c:2173 #4 0x00005637f7740733 in ApplyWorkerMain (main_arg=0) at /home/andres/src/postgresql/src/backend/replication/logical/worker.c:3091 #5 0x00005637f76f11dd in StartBackgroundWorker () at /home/andres/src/postgresql/src/backend/postmaster/bgworker.c:820 #6 0x00005637f770672d in do_start_bgworker (rw=0x5637f8caf500) at /home/andres/src/postgresql/src/backend/postmaster/postmaster.c:5837 #7 0x00005637f7706b01 in maybe_start_bgworkers () at /home/andres/src/postgresql/src/backend/postmaster/postmaster.c:6062 #8 0x00005637f7705a68 in sigusr1_handler (postgres_signal_arg=10) at /home/andres/src/postgresql/src/backend/postmaster/postmaster.c:5223 #9 <signal handler called> #10 0x00007ffbf395d7c6 in __GI___select (nfds=8, readfds=0x7ffd7d7fe770, writefds=0x0, exceptfds=0x0, timeout=0x7ffd7d7fe7f0) at ../sysdeps/unix/sysv/linux/select.c:41 #11 0x00005637f77012ca in ServerLoop () at /home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1691 #12 0x00005637f7700c67 in PostmasterMain (argc=42, argv=0x5637f8c81e20) at /home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1400 #13 0x00005637f7601934 in main (argc=42, argv=0x5637f8c81e20) at /home/andres/src/postgresql/src/backend/main/main.c:209 i.e. the connection is freed at this point. The problem seems to be that nobody thought about refresh being called in an apply worker. Adding a check for refresh overwriting the worker's connection yields: #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f2242be5537 in __GI_abort () at abort.c:79 #2 0x000055e28c1a7f80 in errfinish (filename=0x7f2243987065 "libpqwalreceiver.c", lineno=706, funcname=0x7f2243987870 <__func__.5>"libpqrcv_disconnect") at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592 #3 0x00007f224398647e in libpqrcv_disconnect (conn=0x55e28d2c4750) at /home/andres/src/postgresql/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c:706 #4 0x000055e28bd7aa4f in AlterSubscription_refresh (sub=0x55e28d2c3f40, copy_data=true) at /home/andres/src/postgresql/src/backend/commands/subscriptioncmds.c:583 #5 0x000055e28bd7b4fe in AlterSubscription (stmt=0x55e28d2e0188) at /home/andres/src/postgresql/src/backend/commands/subscriptioncmds.c:880 #6 0x000055e28c021e5e in ProcessUtilitySlow (pstate=0x55e28d2c3e28, pstmt=0x55e28d2e00f0, queryString=0x55e28d2bc448 "ALTER SUBSCRIPTION foo REFRESH PUBLICATION", context=PROCESS_UTILITY_QUERY, params=0x0, queryEnv=0x0, dest=0x55e28c4c5980 <spi_printtupDR>, qc=0x7ffe5de60840) at /home/andres/src/postgresql/src/backend/tcop/utility.c:1828 #7 0x000055e28c0204db in standard_ProcessUtility (pstmt=0x55e28d2e00f0, queryString=0x55e28d2bc448 "ALTER SUBSCRIPTION fooREFRESH PUBLICATION", context=PROCESS_UTILITY_QUERY, params=0x0, queryEnv=0x0, dest=0x55e28c4c5980 <spi_printtupDR>, qc=0x7ffe5de60840) at /home/andres/src/postgresql/src/backend/tcop/utility.c:1071 #8 0x000055e28c01f595 in ProcessUtility (pstmt=0x55e28d2e00f0, queryString=0x55e28d2bc448 "ALTER SUBSCRIPTION foo REFRESHPUBLICATION", context=PROCESS_UTILITY_QUERY, params=0x0, queryEnv=0x0, dest=0x55e28c4c5980 <spi_printtupDR>, qc=0x7ffe5de60840) at /home/andres/src/postgresql/src/backend/tcop/utility.c:524 #9 0x000055e28be3833e in _SPI_execute_plan (plan=0x55e28d2dfc50, paramLI=0x0, snapshot=0x0, crosscheck_snapshot=0x0, read_only=false,fire_triggers=true, tcount=0, caller_dest=0x0) at /home/andres/src/postgresql/src/backend/executor/spi.c:2443 #10 0x000055e28be34d29 in SPI_execute_plan_with_paramlist (plan=0x55e28d2dfc50, params=0x0, read_only=false, tcount=0) at /home/andres/src/postgresql/src/backend/executor/spi.c:578 #11 0x00007f223eb91611 in exec_stmt_execsql (estate=0x7ffe5de60ca0, stmt=0x55e28d2c7fa0) at /home/andres/src/postgresql/src/pl/plpgsql/src/pl_exec.c:4288 #12 0x00007f223eb8ccb6 in exec_stmts (estate=0x7ffe5de60ca0, stmts=0x55e28d2c7ff8) at /home/andres/src/postgresql/src/pl/plpgsql/src/pl_exec.c:2055 #13 0x00007f223eb8c88c in exec_stmt_block (estate=0x7ffe5de60ca0, block=0x55e28d2c80d0) at /home/andres/src/postgresql/src/pl/plpgsql/src/pl_exec.c:1906 #14 0x00007f223eb8c09c in exec_toplevel_block (estate=0x7ffe5de60ca0, block=0x55e28d2c80d0) at /home/andres/src/postgresql/src/pl/plpgsql/src/pl_exec.c:1604 #15 0x00007f223eb8abe9 in plpgsql_exec_trigger (func=0x55e28d1e60d8, trigdata=0x7ffe5de610c0) at /home/andres/src/postgresql/src/pl/plpgsql/src/pl_exec.c:1019 #16 0x00007f223eba4c46 in plpgsql_call_handler (fcinfo=0x7ffe5de61040) at /home/andres/src/postgresql/src/pl/plpgsql/src/pl_handler.c:256 #17 0x000055e28bda9d6b in ExecCallTriggerFunc (trigdata=0x7ffe5de610c0, tgindx=0, finfo=0x55e28d29cd30, instr=0x0, per_tuple_context=0x55e28d2b4120) at /home/andres/src/postgresql/src/backend/commands/trigger.c:2077 #18 0x000055e28bdaa2ed in ExecBRInsertTriggers (estate=0x55e28d2b2230, relinfo=0x55e28d29bc60, slot=0x55e28d29b858) at /home/andres/src/postgresql/src/backend/commands/trigger.c:2206 #19 0x000055e28bde890c in ExecSimpleRelationInsert (resultRelInfo=0x55e28d29bc60, estate=0x55e28d2b2230, slot=0x55e28d29b858) at /home/andres/src/postgresql/src/backend/executor/execReplication.c:422 #20 0x000055e28bf94f29 in apply_handle_insert_internal (relinfo=0x55e28d29bc60, estate=0x55e28d2b2230, remoteslot=0x55e28d29b858) at /home/andres/src/postgresql/src/backend/replication/logical/worker.c:1214 #21 0x000055e28bf94e9d in apply_handle_insert (s=0x7ffe5de61290) at /home/andres/src/postgresql/src/backend/replication/logical/worker.c:1190 #22 0x000055e28bf96659 in apply_dispatch (s=0x7ffe5de61290) at /home/andres/src/postgresql/src/backend/replication/logical/worker.c:1912 #23 0x000055e28bf96bab in LogicalRepApplyLoop (last_received=23489672) at /home/andres/src/postgresql/src/backend/replication/logical/worker.c:2150 It doesn't at all seem necessary for AlterSubscription_refresh() to use the global 'wrconn' variable. This code has basically survived unmodified since commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920 Author: Peter Eisentraut <peter_e@gmx.net> Date: 2017-03-23 08:36:36 -0400 Logical replication support for initial data copy The crash is trivial to fix - simply use a local variable instead of wrconn. There's a few more places than what you hit here that need to be dealt with though. Obviously the code also ought to reset wrconn to NULL after freeing. But uh, isn't this code also just plainly in danger of leaking libpq connections in case something goes wrong? /* Try to connect to the publisher. */ conn = walrcv_connect(sub->conninfo, true, sub->name, &err); if (!conn) ereport(ERROR, (errmsg("could not connect to the publisher: %s", err))); /* Get the table list from publisher. */ pubrel_names = fetch_table_list(conn, sub->publications); /* We are done with the remote side, close connection. */ walrcv_disconnect(conn); What if fetch_table_list() errors out? Oddly other cases seem to pay some attention to this, see e.g. DropSubscription(). Greetings, Andres Freund
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Stephen FrostДата:
Сообщение: Re: BUG #16708: Please add some more restrictions to the default systemd unit files
Следующее
От: Bruce MomjianДата:
Сообщение: Re: BUG #16708: Please add some more restrictions to the default systemd unit files