Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker

Поиск
Список
Период
Сортировка
От Ondřej Jirman
Тема Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
Дата
Msg-id 20191121155707.pqkgjphh65o5fapz@core.my.home
обсуждение исходный текст
Ответ на Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker  (Ondřej Jirman <ienieghapheoghaiwida@xff.cz>)
Список pgsql-bugs
On Thu, Nov 21, 2019 at 04:27:32PM +0100, Tomas Vondra wrote:
> On Thu, Nov 21, 2019 at 04:07:16PM +0100, Ondřej Jirman wrote:
> > I've also managed to get valgrind output on the segfaulting process:
> > 
> > https://megous.com/dl/tmp/valgrind-105524-109020.log
> > 
> > Though it probably doesn't show much new.
> > 
> 
> Is postgres actually compiled with valgrind support? What does pg_config
> say?

I don't see any valgrind related options in configure --help, I repackaged it
just with CFLAGS set to -ggdb -Og.

CONFIGURE = '--prefix=/usr' '--mandir=/usr/share/man' '--datadir=/usr/share/postgresql' '--sysconfdir=/etc'
'--with-gssapi''--with-libxml' '--with-openssl' '--with-perl' '--with-python' '--with-tcl' '--with-pam'
'--with-system-tzdata=/usr/share/zoneinfo''--with-uuid=e2fs' '--with-icu' '--with-systemd' '--with-ldap' '--enable-nls'
'--enable-thread-safety''--disable-rpath' 'PYTHON=/usr/bin/python2' 'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe
-fstack-protector-strong-fno-plt -Og -ggdb
-fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug'
'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now''CPPFLAGS=-D_FORTIFY_SOURCE=2' 'CXXFLAGS=-march=x86-64
-mtune=generic-O2 -pipe -fstack-protector-strong -fno-plt -Og -ggdb
-fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug'
CC = gcc
CPPFLAGS = -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-Wno-format-truncation-Wno-stringop-truncation -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt
-Og-ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug
 
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -Wl,--as-needed
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lpthread -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 12.1

> If you're building the server locally (I don't think there are packages
> with valgrind enabled), maybe try adding --enable-cassert option and
> 
>    CFLAGS"-DRANDOMIZE_ALLOCATED_MEMORY"
> 
> which will do more extensive checks at runtime.

Done. 

CONFIGURE = '--prefix=/usr' '--mandir=/usr/share/man' '--datadir=/usr/share/postgresql' '--sysconfdir=/etc'
'--with-gssapi''--with-libxml' '--with-openssl' '--with-perl' '--with-python' '--with-tcl' '--with-pam'
'--with-system-tzdata=/usr/share/zoneinfo''--with-uuid=e2fs' '--with-icu' '--with-systemd' '--with-ldap' '--enable-nls'
'--enable-thread-safety''--disable-rpath' '--enable-cassert' 'PYTHON=/usr/bin/python2' 'CFLAGS=-march=x86-64
-mtune=generic-O2 -pipe -fstack-protector-strong -fno-plt -Og -ggdb
-fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug-DRANDOMIZE_ALLOCATED_MEMORY'
'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now''CPPFLAGS=-D_FORTIFY_SOURCE=2' 'CXXFLAGS=-march=x86-64
-mtune=generic-O2 -pipe -fstack-protector-strong -fno-plt -Og -ggdb
-fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug'
CC = gcc
CPPFLAGS = -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-Wno-format-truncation-Wno-stringop-truncation -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt
-Og-ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug -DRANDOMIZE_ALLOCATED_MEMORY
 
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -Wl,--as-needed
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lpthread -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 12.1

The difference now is that _len argument value is not stable accross crashes.

one crash...

#0  0x00007fb90c336917 in __memmove_sse2_unaligned_erms () from /usr/lib/libc.so.6
#1  0x000055ff2a7d0e64 in memcpy (__len=972662513, __src=<optimized out>, __dest=0x55ff2cc7dce4) at
/usr/include/bits/string_fortified.h:34
#2  tts_virtual_materialize (slot=0x55ff2cc7d868) at execTuples.c:235
#3  0x000055ff2a7d2a63 in ExecFetchSlotHeapTuple (slot=slot@entry=0x55ff2cc7d868, materialize=materialize@entry=true,
shouldFree=shouldFree@entry=0x7ffc80cfc607)at execTuples.c:1624
 
#4  0x000055ff2a634809 in heapam_tuple_update (relation=0x7fb8f8338100, otid=0x55ff2cc7e138, slot=0x55ff2cc7d868,
cid=0,snapshot=<optimized out>, crosscheck=0x0, wait=true, tmfd=0x7ffc80cfc680,
 
    lockmode=0x7ffc80cfc67c, update_indexes=0x7ffc80cfc6d7) at heapam_handler.c:325
#5  0x000055ff2a675d55 in table_tuple_update (update_indexes=0x7ffc80cfc6d7, lockmode=0x7ffc80cfc67c,
tmfd=0x7ffc80cfc680,wait=true, crosscheck=0x0, snapshot=0x0, cid=<optimized out>,
 
    slot=0x55ff2cc7d868, otid=0x55ff2cc7e138, rel=0x7fb8f8338100) at ../../../../src/include/access/tableam.h:1261
#6  simple_table_tuple_update (rel=rel@entry=0x7fb8f8338100, otid=otid@entry=0x55ff2cc7e138,
slot=slot@entry=0x55ff2cc7d868,snapshot=0x0, update_indexes=update_indexes@entry=0x7ffc80cfc6d7)
 
    at tableam.c:325
#7  0x000055ff2a7cdf08 in ExecSimpleRelationUpdate (estate=estate@entry=0x55ff2ccd02b8,
epqstate=epqstate@entry=0x7ffc80cfc730,searchslot=searchslot@entry=0x55ff2cc7e108,
 
    slot=slot@entry=0x55ff2cc7d868) at execReplication.c:493
#8  0x000055ff2a8eb8b2 in apply_handle_update (s=s@entry=0x7ffc80d03d00) at worker.c:750
#9  0x000055ff2a8ebe62 in apply_dispatch (s=s@entry=0x7ffc80d03d00) at worker.c:968
#10 0x000055ff2a8ec5ba in LogicalRepApplyLoop (last_received=11741462160) at worker.c:1176
#11 0x000055ff2a8ecd1a in ApplyWorkerMain (main_arg=<optimized out>) at worker.c:1734
#12 0x000055ff2a8b3927 in StartBackgroundWorker () at bgworker.c:834
#13 0x000055ff2a8c2b6b in do_start_bgworker (rw=rw@entry=0x55ff2cbbf430) at postmaster.c:5770
#14 0x000055ff2a8c2cf7 in maybe_start_bgworkers () at postmaster.c:5996
#15 0x000055ff2a8c3678 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5167
#16 <signal handler called>
#17 0x00007fb90c384e2a in select () from /usr/lib/libc.so.6
#18 0x000055ff2a8c40f8 in ServerLoop () at postmaster.c:1668
#19 0x000055ff2a8c54fc in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1377
#20 0x000055ff2a81f88a in main (argc=3, argv=0x55ff2cb94260) at main.c:228

and another ...

#0  0x00007f7a1befc917 in __memmove_sse2_unaligned_erms () from /usr/lib/libc.so.6
#1  0x0000559a5439ae64 in memcpy (__len=766338801, __src=<optimized out>, __dest=0x559a55626934) at
/usr/include/bits/string_fortified.h:34
#2  tts_virtual_materialize (slot=0x559a556264b8) at execTuples.c:235
#3  0x0000559a5439ca63 in ExecFetchSlotHeapTuple (slot=slot@entry=0x559a556264b8, materialize=materialize@entry=true,
shouldFree=shouldFree@entry=0x7fff7f2d5787)at execTuples.c:1624
 
#4  0x0000559a541fe809 in heapam_tuple_update (relation=0x7f7a07efe100, otid=0x559a55626d88, slot=0x559a556264b8,
cid=0,snapshot=<optimized out>, crosscheck=0x0, wait=true, tmfd=0x7fff7f2d5800,
 
    lockmode=0x7fff7f2d57fc, update_indexes=0x7fff7f2d5857) at heapam_handler.c:325
#5  0x0000559a5423fd55 in table_tuple_update (update_indexes=0x7fff7f2d5857, lockmode=0x7fff7f2d57fc,
tmfd=0x7fff7f2d5800,wait=true, crosscheck=0x0, snapshot=0x0, cid=<optimized out>,
 
    slot=0x559a556264b8, otid=0x559a55626d88, rel=0x7f7a07efe100) at ../../../../src/include/access/tableam.h:1261
#6  simple_table_tuple_update (rel=rel@entry=0x7f7a07efe100, otid=otid@entry=0x559a55626d88,
slot=slot@entry=0x559a556264b8,snapshot=0x0, update_indexes=update_indexes@entry=0x7fff7f2d5857)
 
    at tableam.c:325
#7  0x0000559a54397f08 in ExecSimpleRelationUpdate (estate=estate@entry=0x559a55678dc8,
epqstate=epqstate@entry=0x7fff7f2d58b0,searchslot=searchslot@entry=0x559a55626d58,
 
    slot=slot@entry=0x559a556264b8) at execReplication.c:493
#8  0x0000559a544b58b2 in apply_handle_update (s=s@entry=0x7fff7f2dce80) at worker.c:750
#9  0x0000559a544b5e62 in apply_dispatch (s=s@entry=0x7fff7f2dce80) at worker.c:968
#10 0x0000559a544b65ba in LogicalRepApplyLoop (last_received=11741462160) at worker.c:1176
#11 0x0000559a544b6d1a in ApplyWorkerMain (main_arg=<optimized out>) at worker.c:1734
#12 0x0000559a5447d927 in StartBackgroundWorker () at bgworker.c:834
#13 0x0000559a5448cb6b in do_start_bgworker (rw=rw@entry=0x559a5558c010) at postmaster.c:5770
#14 0x0000559a5448ccf7 in maybe_start_bgworkers () at postmaster.c:5996
#15 0x0000559a5448d678 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5167
#16 <signal handler called>
#17 0x00007f7a1bf4ae2a in select () from /usr/lib/libc.so.6
#18 0x0000559a5448e0f8 in ServerLoop () at postmaster.c:1668
#19 0x0000559a5448f4fc in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1377
#20 0x0000559a543e988a in main (argc=3, argv=0x559a55534260) at main.c:228

So it perhaps looks like some memory management issue in the subscribing
postmaster, rather than wal corruption on the publisher.

> > > Can you show us \d+ on the videos table? What's the replica identity?
> > 
> >                                                             Table "public.videos"
> >   Column    |            Type             | Collation | Nullable |              Default               | Storage  |
Statstarget | Description
 
> >
-------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+-------------
> > id          | integer                     |           | not null | nextval('videos_id_seq'::regclass) | plain    |
           |
 
> > title       | text                        |           | not null |                                    | extended |
           |
 
> > cover_image | bytea                       |           | not null |                                    | extended |
           |
 
> > metadata    | jsonb                       |           | not null |                                    | extended |
           |
 
> > category    | integer                     |           | not null |                                    | plain    |
           |
 
> > published   | date                        |           | not null |                                    | plain    |
           |
 
> > added       | timestamp without time zone |           | not null | now()                              | plain    |
           |
 
> > played      | boolean                     |           | not null | false                              | plain    |
           |
 
> > Indexes:
> >    "videos_pkey" PRIMARY KEY, btree (id)
> > Foreign-key constraints:
> >    "videos_category_fkey" FOREIGN KEY (category) REFERENCES categories(id)
> > Publications:
> >    "pub"
> > 
> >                                                             Table "public.videos"
> >   Column    |            Type             | Collation | Nullable |              Default               | Storage  |
Statstarget | Description
 
> >
-------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+-------------
> > id          | integer                     |           | not null | nextval('videos_id_seq'::regclass) | plain    |
           |
 
> > title       | text                        |           | not null |                                    | extended |
           |
 
> > cover_image | bytea                       |           | not null |                                    | extended |
           |
 
> > metadata    | jsonb                       |           | not null |                                    | extended |
           |
 
> > category    | integer                     |           | not null |                                    | plain    |
           |
 
> > published   | date                        |           | not null |                                    | plain    |
           |
 
> > added       | timestamp without time zone |           | not null | now()                              | plain    |
           |
 
> > played      | boolean                     |           | not null | false                              | plain    |
           |
 
> > Indexes:
> >    "videos_pkey" PRIMARY KEY, btree (id)
> > Foreign-key constraints:
> >    "videos_category_fkey" FOREIGN KEY (category) REFERENCES categories(id)
> > Access method: heap
> > 
> > 
> > Googling for relica identity, it looks like it should be shown in the \d+
> > output, but it isn't, for me. Maybe because it's has default value?
> > 
> 
> Yes, I think it should be listed either for the index, or there should
> be an extra section with replica identity. But that depends on psql
> version. Aren't you accidentally using an older psql version? That
> should be mentioned right after connecting to the server, I think.

Maybe it has something to do with my upgrade method. I dumped/restored
the replica with pg_dumpall, and then just proceded to enable subscription
and refresh publication with (copy_data=false) for all my subscriptions.

Now, I've tried to start fresh and replicate from the primary "hometv" database
into a new l5_hometv2 on my crashing 12.1 server and it replicates fine, no
crashes. Anyway even in this fresh replicated database, there's no REPLICA
IDENTITY in the \d+ output.

I don't have multiple postgresql installations on the host. There's only
a difference between primary version 11.5 (ARMv7) and replica 12.1 (x86_64).

regards,
    o.

> regards
> 
> -- 
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker