Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker

From: Ondřej Jirman
Subject: Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Date:
Msg-id: 20191121150716.szxqukxrtpjpckm3@core.my.home
In reply to: Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-bugs
On Thu, Nov 21, 2019 at 03:35:03PM +0100, Tomas Vondra wrote:
> On Thu, Nov 21, 2019 at 02:55:23PM +0100, Ondřej Jirman wrote:
> > On Thu, Nov 21, 2019 at 02:32:37PM +0100, Tomas Vondra wrote:
> > > On Thu, Nov 21, 2019 at 01:59:13PM +0100, Ondřej Jirman wrote:
> > > > On Thu, Nov 21, 2019 at 12:53:26PM +0100, postgresql wrote:
> > > > > Hello,
> > > > >
> > > > > On Thu, Nov 21, 2019 at 11:39:40AM +0100, Tomas Vondra wrote:
> > > > > > On Thu, Nov 21, 2019 at 01:14:18AM +0000, PG Bug reporting form wrote:
> > > > > > > Replication of one of my databases (running on an ARMv7 machine) started
> > > > > > > segfaulting on the subscriber side (x86_64) like this:
> > > > > > >
> > > > > > > #0  0x00007fc259739917 in __memmove_sse2_unaligned_erms () from
> > > > > > > /usr/lib/libc.so.6
> > > > > > > #1  0x000055d033e93d44 in memcpy (__len=620701425, __src=<optimized out>,
> > > > > > > __dest=0x55d0356da804) at /usr/include/bits/string_fortified.h:34
> > > > > > > #2  tts_virtual_materialize (slot=0x55d0356da3b8) at execTuples.c:235
> > > > > > > #3  0x000055d033e94d32 in ExecFetchSlotHeapTuple
> > > > > > > (slot=slot@entry=0x55d0356da3b8, materialize=materialize@entry=true,
> > > > > > > shouldFree=shouldFree@entry=0x7fff0e7cf387) at execTuples.c:1624
> > > > >
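The ~620 MB length in that memcpy looks implausibly large for a row whose biggest columns are a cover image and JSON metadata, which already hints at a garbled length word rather than genuinely large data. A hypothetical sanity check on the publisher (column names taken from the \d+ output further down in this mail) could confirm that no stored value is anywhere near that size:

SELECT id,
       octet_length(cover_image) AS cover_image_bytes,
       pg_column_size(metadata)  AS metadata_bytes
FROM videos
ORDER BY octet_length(cover_image) DESC
LIMIT 10;
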
> > > > > I forgot to add that the publisher is still PostgreSQL 11.5.
> > > > >
> > > >
> > > > I can also add that I have data checksumming enabled on both ends, and
> > > > it did not detect any corruption:
> > > >
> > > > # pg_verify_checksums -D /var/lib/postgres/data
> > > > Checksum scan completed
> > > > Data checksum version: 1
> > > > Files scanned:  1751
> > > > Blocks scanned: 86592
> > > > Bad checksums:  0
> > > >
> > > > # pg_checksums /var/lib/postgres/data
> > > > Checksum operation completed
> > > > Files scanned:  22777
> > > > Blocks scanned: 3601527
> > > > Bad checksums:  0
> > > > Data checksum version: 1
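(Checksums can also be confirmed as enabled from a live session; a trivial check, assuming nothing beyond a standard psql connection:)

SHOW data_checksums;
-- reports "on" when the cluster was initialized with data checksums
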
> > > >
> > > > The WAL on the publisher is also dumpable up to a point hours after the issues started:
> > > >
> > > > I've put the dump here, if it's of any use: https://megous.com/dl/tmp/wal_dump.txt
> > > >
> > > > Dump ends with:
> > > >
> > > > pg_waldump: FATAL:  error in WAL record at 2/BBE0E538: invalid record length at 2/BBE0E5A8: wanted 24, got 0
> > > >
> > > > But that seems normal. I get that error on my other database clusters, too.
> > > >
> > > > I managed to extract the failing logical decoding data from the publisher, if
> > > > that helps:
> > > >
> > > >
> > > > SELECT * FROM pg_logical_slot_peek_binary_changes('l5_hometv', NULL, NULL, 'proto_version', '1', 'publication_names', 'pub');
> > > >
> > > > 2/BBD86EA0 | 56395 | \x4200000002bbd880b800023acd790ce5510000dc4b
> > > > 2/BBD87E90 | 56395 | \x5200004a687075626c696300766964656f73006400080169640000000017ffffffff007469746c650000000019ffffffff00636f7665725f696d6167650000000011ffffffff006d657461646174610000000edaffffffff0063617465676f72790000000017ffffffff007075626c6973686564000000043affffffff006164646564000000045affffffff00706c617965640000000010ffffffff
> > > > 2/BBD87E90 | 56395 | \x5500004a684e0008740000000438333933740000005650617a6465726b613a204f206dc3a964696120736520706f7665646520626f6a2e204b64796279206ec3a17320706f6c6974696369206d696c6f76616c692c2062796c6f206279206ec49b636f20c5a17061746ec49b7574000001397b226964223a20226430313064343430303965323131656162323539616331663662323230656538222c202264617465223a2022323031392d31312d3138222c20226e616d65223a202250617a6465726b613a204f206dc3a964696120736520706f7665646520626f6a2e204b64796279206ec3a17320706f6c6974696369206d696c6f76616c692c2062796c6f206279206ec49b636f20c5a17061746ec49b222c2022696d616765223a20226874747470733a2f2f63646e2e7873642e637a2f726573697a652f63353535656239633131353333313632386164666539396237343534353731655f657874726163743d302c302c313931392c313038305f726573697a653d3732302c3430355f2e6a70673f686173683d6362316362623836336230353361613561333761346666616439303865303431227d7400000003323432740000000a323031392d31312d3138740000001a323031392d31312d31382031323a35303a30312e383136333736740000000174
> > > > 2/BBD880E8 | 56395 | \x430000000002bbd880b800000002bbd880e800023acd790ce551
> > > >
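For a quick sanity check without decoding the payloads by hand, the first byte of each change identifies the pgoutput message type ('B' begin, 'R' relation, 'U' update, 'C' commit, matching the \x42/\x52/\x55/\x43 records above). A minimal sketch, assuming the same slot and publication and protocol version 1:

SELECT lsn, xid, chr(get_byte(data, 0)) AS msg_type
FROM pg_logical_slot_peek_binary_changes('l5_hometv', NULL, NULL,
     'proto_version', '1', 'publication_names', 'pub');
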
> > > 
> > > Can you show the whole transaction? From the WAL dump it seems it just
> > > did a single UPDATE:
> > 
> > I'm not sure what you mean. The above is the whole output I got from that
> > SELECT.
> > 
> > > rmgr: Heap        len (rec/tot):     59/  4075, tx:      56395, lsn: 2/BBD86EA0, prev 2/BBD86E68, desc: LOCK off 2: xid 56395: flags 0 LOCK_ONLY EXCL_LOCK
> > >     blkref #0: rel 1663/19030/19048 fork main blk 415 (FPW); hole: offset: 56, length: 4176
> > > rmgr: Heap        len (rec/tot):    523/   523, tx:      56395, lsn: 2/BBD87E90, prev 2/BBD86EA0, desc: HOT_UPDATE off 2 xmax 56395 ; new off 9 xmax 0
> > >     blkref #0: rel 1663/19030/19048 fork main blk 415
> > > rmgr: Transaction len (rec/tot):     46/    46, tx:      56395, lsn: 2/BBD880B8, prev 2/BBD87E90, desc: COMMIT 2019-11-20 23:28:47.016273 CET
 
> > > 
> > > Which fields does it update?
> > 
> > The only update that my system does in this database is:
> > 
> > UPDATE videos SET played = TRUE WHERE id = ?
> > 
> > The rest is just inserts:
> > 
> > INSERT INTO categories (name, metadata, cover_image, provider) VALUES (?, ?, ?, (SELECT id FROM providers WHERE system_name = ?))
> > INSERT INTO videos (title, metadata, cover_image, category, published) VALUES (?, ?, ?, ?, ?)
> > 
> > There are no other data modification statements executed.
> > 
> 
> Well, this is definitely the update - both judging by the WAL dump and
> also because the backtrace contains apply_handle_update.

I've also managed to get valgrind output on the segfaulting process:

https://megous.com/dl/tmp/valgrind-105524-109020.log

Though it probably doesn't show much new.

> Can you show us \d+ on the videos table? What's the replica identity?

                                                             Table "public.videos"
   Column    |            Type             | Collation | Nullable |              Default               | Storage  |
Statstarget | Description
 

-------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+-------------
 id          | integer                     |           | not null | nextval('videos_id_seq'::regclass) | plain    |
        |
 
 title       | text                        |           | not null |                                    | extended |
        |
 
 cover_image | bytea                       |           | not null |                                    | extended |
        |
 
 metadata    | jsonb                       |           | not null |                                    | extended |
        |
 
 category    | integer                     |           | not null |                                    | plain    |
        |
 
 published   | date                        |           | not null |                                    | plain    |
        |
 
 added       | timestamp without time zone |           | not null | now()                              | plain    |
        |
 
 played      | boolean                     |           | not null | false                              | plain    |
        |
 
Indexes:
    "videos_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "videos_category_fkey" FOREIGN KEY (category) REFERENCES categories(id)
Publications:
    "pub"

                                                             Table "public.videos"
   Column    |            Type             | Collation | Nullable |              Default               | Storage  |
Statstarget | Description
 

-------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+-------------
 id          | integer                     |           | not null | nextval('videos_id_seq'::regclass) | plain    |
        |
 
 title       | text                        |           | not null |                                    | extended |
        | 
 cover_image | bytea                       |           | not null |                                    | extended |
        |
 
 metadata    | jsonb                       |           | not null |                                    | extended |
        |
 
 category    | integer                     |           | not null |                                    | plain    |
        |
 
 published   | date                        |           | not null |                                    | plain    |
        |
 
 added       | timestamp without time zone |           | not null | now()                              | plain    |
        |
 
 played      | boolean                     |           | not null | false                              | plain    |
        |
 
Indexes:
    "videos_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "videos_category_fkey" FOREIGN KEY (category) REFERENCES categories(id)
Access method: heap


Googling for replica identity, it looks like it should be shown in the \d+
output, but it isn't for me. Maybe because it has the default value?
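
If I read the psql behaviour right, \d+ only prints a Replica Identity line when it is set to something other than the default (the primary key), so a table using the default shows nothing. A more direct check, assuming the same table as above, is to look at pg_class.relreplident:

SELECT relreplident FROM pg_class WHERE relname = 'videos';
-- 'd' = default (primary key), 'f' = full, 'i' = index, 'n' = nothing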

regards,
    o.


> regards
> 
> -- 
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


