Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
From | Tomas Vondra |
---|---|
Subject | Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker |
Date | |
Msg-id | 20191121165752.dffge6bh756xlfdg@development |
In reply to | Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker (Ondřej Jirman <ienieghapheoghaiwida@xff.cz>) |
Responses |
Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
|
List | pgsql-bugs |
On Thu, Nov 21, 2019 at 05:15:02PM +0100, Ondřej Jirman wrote:
>On Thu, Nov 21, 2019 at 04:57:07PM +0100, Ondřej Jirman wrote:
>>
>> Maybe it has something to do with my upgrade method. I
>> dumped/restored the replica with pg_dumpall, and then just proceeded
>> to enable subscription and refresh publication with (copy_data=false)
>> for all my subscriptions.
>
>OTOH, it may not. There are 2 more databases replicated the same way
>from the same database cluster, and they don't crash the replica
>server, and continue replicating. One of the other databases also
>has bytea columns in some of the tables.
>
>It really just seems related to the machine restart (a regular one)
>that I did on the primary: minutes later the replica crashed, and kept
>crashing ever since whenever connecting to the primary for the hometv
>database.
>

Hmmm. A restart of the primary certainly should not cause any such damage; that'd be a bug too. And it'd be a bit strange for the primary to send the data correctly and still crash the replica.

How exactly did you restart the primary? Which mode: smart, fast, or immediate?

>So maybe something's wrong with the replica database (maybe because the
>connection got killed by the walsender at an unfortunate time), rather
>than the original database, because I can replicate the original DB
>afresh into a new copy just fine and other databases continue
>replicating just fine if I disable the crashing subscription.
>

Possibly, but what would be the damaged bit? The only thing I can think of is the replication slot info (i.e. the snapshot), and I know there were some timing issues in the serialization. How far is the change from the restart point of the slot (visible in pg_replication_slots)? If there are many changes since then, a corrupted snapshot is unlikely.

There are a lot of moving parts in this: you're replicating between major versions, and from ARM to x86. All of that should work, of course, but maybe there's a bug somewhere.
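One rough way to put a number on "how far" is to compare the slot's restart_lsn from pg_replication_slots with the LSN of the change that crashes the worker. Below is a minimal Python sketch, assuming both LSNs are already at hand in PostgreSQL's textual X/Y form (two hexadecimal 32-bit halves); the helper names and sample LSNs are hypothetical. On the server itself, pg_wal_lsn_diff() performs the same subtraction. Note that this measures bytes of WAL, which is only a proxy for the number of decoded changes.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into a 64-bit integer."""
    hi, sep, lo = lsn.partition("/")
    if not sep:
        raise ValueError(f"not an LSN: {lsn!r}")
    hi_val, lo_val = int(hi, 16), int(lo, 16)
    # Each half of the textual form is a 32-bit value.
    if hi_val >= 1 << 32 or lo_val >= 1 << 32:
        raise ValueError(f"LSN half out of range: {lsn!r}")
    return (hi_val << 32) | lo_val


def wal_distance(restart_lsn: str, change_lsn: str) -> int:
    """Bytes of WAL between the slot's restart_lsn and the failing change.

    Hypothetical helper; mirrors what pg_wal_lsn_diff(change, restart) returns.
    """
    return parse_lsn(change_lsn) - parse_lsn(restart_lsn)


if __name__ == "__main__":
    # Hypothetical values, not taken from the bug report.
    print(wal_distance("0/16B3748", "0/16B3D50"))
```

A small distance would point back at the slot's serialized snapshot; a large one makes that explanation less likely, as argued above.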
So it might take time to investigate and fix. Thanks for your patience ;-)

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services