Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
От | Ondřej Jirman |
---|---|
Тема | Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker |
Дата | |
Msg-id | 20191121155707.pqkgjphh65o5fapz@core.my.home обсуждение исходный текст |
Ответ на | Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Ответы |
Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
(Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker (Ondřej Jirman <ienieghapheoghaiwida@xff.cz>) |
Список | pgsql-bugs |
On Thu, Nov 21, 2019 at 04:27:32PM +0100, Tomas Vondra wrote: > On Thu, Nov 21, 2019 at 04:07:16PM +0100, Ondřej Jirman wrote: > > I've also managed to get valgrind output on the segfaulting process: > > > > https://megous.com/dl/tmp/valgrind-105524-109020.log > > > > Though it probably doesn't show much new. > > > > Is postgres actually compiled with valgrind support? What does pg_config > say? I don't see any valgrind related options in configure --help, I repackaged it just with CFLAGS set to -ggdb -Og. CONFIGURE = '--prefix=/usr' '--mandir=/usr/share/man' '--datadir=/usr/share/postgresql' '--sysconfdir=/etc' '--with-gssapi''--with-libxml' '--with-openssl' '--with-perl' '--with-python' '--with-tcl' '--with-pam' '--with-system-tzdata=/usr/share/zoneinfo''--with-uuid=e2fs' '--with-icu' '--with-systemd' '--with-ldap' '--enable-nls' '--enable-thread-safety''--disable-rpath' 'PYTHON=/usr/bin/python2' 'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong-fno-plt -Og -ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug' 'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now''CPPFLAGS=-D_FORTIFY_SOURCE=2' 'CXXFLAGS=-march=x86-64 -mtune=generic-O2 -pipe -fstack-protector-strong -fno-plt -Og -ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug' CC = gcc CPPFLAGS = -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation-Wno-stringop-truncation -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt -Og-ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug CFLAGS_SL = -fPIC LDFLAGS = -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -Wl,--as-needed LDFLAGS_EX = LDFLAGS_SL = LIBS = -lpgcommon -lpgport -lpthread -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm VERSION = PostgreSQL 12.1 > If you're building the server locally (I don't think there are packages > with valgrind enabled), maybe try adding --enable-cassert option and > > CFLAGS"-DRANDOMIZE_ALLOCATED_MEMORY" > > which will do more extensive checks at runtime. Done. CONFIGURE = '--prefix=/usr' '--mandir=/usr/share/man' '--datadir=/usr/share/postgresql' '--sysconfdir=/etc' '--with-gssapi''--with-libxml' '--with-openssl' '--with-perl' '--with-python' '--with-tcl' '--with-pam' '--with-system-tzdata=/usr/share/zoneinfo''--with-uuid=e2fs' '--with-icu' '--with-systemd' '--with-ldap' '--enable-nls' '--enable-thread-safety''--disable-rpath' '--enable-cassert' 'PYTHON=/usr/bin/python2' 'CFLAGS=-march=x86-64 -mtune=generic-O2 -pipe -fstack-protector-strong -fno-plt -Og -ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug-DRANDOMIZE_ALLOCATED_MEMORY' 'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now''CPPFLAGS=-D_FORTIFY_SOURCE=2' 'CXXFLAGS=-march=x86-64 -mtune=generic-O2 -pipe -fstack-protector-strong -fno-plt -Og -ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug' CC = gcc CPPFLAGS = -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation-Wno-stringop-truncation -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt -Og-ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug -DRANDOMIZE_ALLOCATED_MEMORY CFLAGS_SL = -fPIC LDFLAGS = -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -Wl,--as-needed LDFLAGS_EX = LDFLAGS_SL = LIBS = -lpgcommon -lpgport -lpthread -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm VERSION = PostgreSQL 12.1 The difference now is that _len argument value is not stable accross crashes. one crash... #0 0x00007fb90c336917 in __memmove_sse2_unaligned_erms () from /usr/lib/libc.so.6 #1 0x000055ff2a7d0e64 in memcpy (__len=972662513, __src=<optimized out>, __dest=0x55ff2cc7dce4) at /usr/include/bits/string_fortified.h:34 #2 tts_virtual_materialize (slot=0x55ff2cc7d868) at execTuples.c:235 #3 0x000055ff2a7d2a63 in ExecFetchSlotHeapTuple (slot=slot@entry=0x55ff2cc7d868, materialize=materialize@entry=true, shouldFree=shouldFree@entry=0x7ffc80cfc607)at execTuples.c:1624 #4 0x000055ff2a634809 in heapam_tuple_update (relation=0x7fb8f8338100, otid=0x55ff2cc7e138, slot=0x55ff2cc7d868, cid=0,snapshot=<optimized out>, crosscheck=0x0, wait=true, tmfd=0x7ffc80cfc680, lockmode=0x7ffc80cfc67c, update_indexes=0x7ffc80cfc6d7) at heapam_handler.c:325 #5 0x000055ff2a675d55 in table_tuple_update (update_indexes=0x7ffc80cfc6d7, lockmode=0x7ffc80cfc67c, tmfd=0x7ffc80cfc680,wait=true, crosscheck=0x0, snapshot=0x0, cid=<optimized out>, slot=0x55ff2cc7d868, otid=0x55ff2cc7e138, rel=0x7fb8f8338100) at ../../../../src/include/access/tableam.h:1261 #6 simple_table_tuple_update (rel=rel@entry=0x7fb8f8338100, otid=otid@entry=0x55ff2cc7e138, slot=slot@entry=0x55ff2cc7d868,snapshot=0x0, update_indexes=update_indexes@entry=0x7ffc80cfc6d7) at tableam.c:325 #7 0x000055ff2a7cdf08 in ExecSimpleRelationUpdate (estate=estate@entry=0x55ff2ccd02b8, epqstate=epqstate@entry=0x7ffc80cfc730,searchslot=searchslot@entry=0x55ff2cc7e108, slot=slot@entry=0x55ff2cc7d868) at execReplication.c:493 #8 0x000055ff2a8eb8b2 in apply_handle_update (s=s@entry=0x7ffc80d03d00) at worker.c:750 #9 0x000055ff2a8ebe62 in apply_dispatch (s=s@entry=0x7ffc80d03d00) at worker.c:968 #10 0x000055ff2a8ec5ba in LogicalRepApplyLoop (last_received=11741462160) at worker.c:1176 #11 0x000055ff2a8ecd1a in ApplyWorkerMain (main_arg=<optimized out>) at worker.c:1734 #12 0x000055ff2a8b3927 in StartBackgroundWorker () at bgworker.c:834 #13 0x000055ff2a8c2b6b in do_start_bgworker (rw=rw@entry=0x55ff2cbbf430) at postmaster.c:5770 #14 0x000055ff2a8c2cf7 in maybe_start_bgworkers () at postmaster.c:5996 #15 0x000055ff2a8c3678 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5167 #16 <signal handler called> #17 0x00007fb90c384e2a in select () from /usr/lib/libc.so.6 #18 0x000055ff2a8c40f8 in ServerLoop () at postmaster.c:1668 #19 0x000055ff2a8c54fc in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1377 #20 0x000055ff2a81f88a in main (argc=3, argv=0x55ff2cb94260) at main.c:228 and another ... #0 0x00007f7a1befc917 in __memmove_sse2_unaligned_erms () from /usr/lib/libc.so.6 #1 0x0000559a5439ae64 in memcpy (__len=766338801, __src=<optimized out>, __dest=0x559a55626934) at /usr/include/bits/string_fortified.h:34 #2 tts_virtual_materialize (slot=0x559a556264b8) at execTuples.c:235 #3 0x0000559a5439ca63 in ExecFetchSlotHeapTuple (slot=slot@entry=0x559a556264b8, materialize=materialize@entry=true, shouldFree=shouldFree@entry=0x7fff7f2d5787)at execTuples.c:1624 #4 0x0000559a541fe809 in heapam_tuple_update (relation=0x7f7a07efe100, otid=0x559a55626d88, slot=0x559a556264b8, cid=0,snapshot=<optimized out>, crosscheck=0x0, wait=true, tmfd=0x7fff7f2d5800, lockmode=0x7fff7f2d57fc, update_indexes=0x7fff7f2d5857) at heapam_handler.c:325 #5 0x0000559a5423fd55 in table_tuple_update (update_indexes=0x7fff7f2d5857, lockmode=0x7fff7f2d57fc, tmfd=0x7fff7f2d5800,wait=true, crosscheck=0x0, snapshot=0x0, cid=<optimized out>, slot=0x559a556264b8, otid=0x559a55626d88, rel=0x7f7a07efe100) at ../../../../src/include/access/tableam.h:1261 #6 simple_table_tuple_update (rel=rel@entry=0x7f7a07efe100, otid=otid@entry=0x559a55626d88, slot=slot@entry=0x559a556264b8,snapshot=0x0, update_indexes=update_indexes@entry=0x7fff7f2d5857) at tableam.c:325 #7 0x0000559a54397f08 in ExecSimpleRelationUpdate (estate=estate@entry=0x559a55678dc8, epqstate=epqstate@entry=0x7fff7f2d58b0,searchslot=searchslot@entry=0x559a55626d58, slot=slot@entry=0x559a556264b8) at execReplication.c:493 #8 0x0000559a544b58b2 in apply_handle_update (s=s@entry=0x7fff7f2dce80) at worker.c:750 #9 0x0000559a544b5e62 in apply_dispatch (s=s@entry=0x7fff7f2dce80) at worker.c:968 #10 0x0000559a544b65ba in LogicalRepApplyLoop (last_received=11741462160) at worker.c:1176 #11 0x0000559a544b6d1a in ApplyWorkerMain (main_arg=<optimized out>) at worker.c:1734 #12 0x0000559a5447d927 in StartBackgroundWorker () at bgworker.c:834 #13 0x0000559a5448cb6b in do_start_bgworker (rw=rw@entry=0x559a5558c010) at postmaster.c:5770 #14 0x0000559a5448ccf7 in maybe_start_bgworkers () at postmaster.c:5996 #15 0x0000559a5448d678 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5167 #16 <signal handler called> #17 0x00007f7a1bf4ae2a in select () from /usr/lib/libc.so.6 #18 0x0000559a5448e0f8 in ServerLoop () at postmaster.c:1668 #19 0x0000559a5448f4fc in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1377 #20 0x0000559a543e988a in main (argc=3, argv=0x559a55534260) at main.c:228 So it perhaps looks like some memory management issue in the subscribing postmaster, rather than wal corruption on the publisher. > > > Can you show us \d+ on the videos table? What's the replica identity? > > > > Table "public.videos" > > Column | Type | Collation | Nullable | Default | Storage | Statstarget | Description > > -------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+------------- > > id | integer | | not null | nextval('videos_id_seq'::regclass) | plain | | > > title | text | | not null | | extended | | > > cover_image | bytea | | not null | | extended | | > > metadata | jsonb | | not null | | extended | | > > category | integer | | not null | | plain | | > > published | date | | not null | | plain | | > > added | timestamp without time zone | | not null | now() | plain | | > > played | boolean | | not null | false | plain | | > > Indexes: > > "videos_pkey" PRIMARY KEY, btree (id) > > Foreign-key constraints: > > "videos_category_fkey" FOREIGN KEY (category) REFERENCES categories(id) > > Publications: > > "pub" > > > > Table "public.videos" > > Column | Type | Collation | Nullable | Default | Storage | Statstarget | Description > > -------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+------------- > > id | integer | | not null | nextval('videos_id_seq'::regclass) | plain | | > > title | text | | not null | | extended | | > > cover_image | bytea | | not null | | extended | | > > metadata | jsonb | | not null | | extended | | > > category | integer | | not null | | plain | | > > published | date | | not null | | plain | | > > added | timestamp without time zone | | not null | now() | plain | | > > played | boolean | | not null | false | plain | | > > Indexes: > > "videos_pkey" PRIMARY KEY, btree (id) > > Foreign-key constraints: > > "videos_category_fkey" FOREIGN KEY (category) REFERENCES categories(id) > > Access method: heap > > > > > > Googling for relica identity, it looks like it should be shown in the \d+ > > output, but it isn't, for me. Maybe because it's has default value? > > > > Yes, I think it should be listed either for the index, or there should > be an extra section with replica identity. But that depends on psql > version. Aren't you accidentally using an older psql version? That > should be mentioned right after connecting to the server, I think. Maybe it has something to do with my upgrade method. I dumped/restored the replica with pg_dumpall, and then just proceded to enable subscription and refresh publication with (copy_data=false) for all my subscriptions. Now, I've tried to start fresh and replicate from the primary "hometv" database into a new l5_hometv2 on my crashing 12.1 server and it replicates fine, no crashes. Anyway even in this fresh replicated database, there's no REPLICA IDENTITY in the \d+ output. I don't have multiple postgresql installations on the host. There's only a difference between primary version 11.5 (ARMv7) and replica 12.1 (x86_64). regards, o. > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Tomas VondraДата:
Сообщение: Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
Следующее
От: Tomas VondraДата:
Сообщение: Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker