Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
От | Tomas Vondra |
---|---|
Тема | Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker |
Дата | |
Msg-id | 20191121234701.2wtyc77zd5ux3myf@development обсуждение исходный текст |
Ответ на | Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
|
Список | pgsql-bugs |
On Thu, Nov 21, 2019 at 05:37:25PM -0500, Tom Lane wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> FWIW my hunch is the bug is somewhere in this chunk of code from >> apply_heap_update: > >> oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); >> ExecCopySlot(remoteslot, localslot); >> slot_modify_cstrings(remoteslot, rel, newtup.values, newtup.changed); >> MemoryContextSwitchTo(oldctx); > >My first reaction to looking at this was that ExecCopySlot() could >be dropped entirely, because the first thing that slot_modify_cstrings >does is > > slot_getallattrs(slot); > ExecClearTuple(slot); > >That slot_getallattrs call seems 100% useless as well, because >surely ExecClearTuple is just going to drop all the data. > I don't think that's quite true. After the ExecCopySlot call, the pass-by-ref Datums in remoteslot will point to a tuple attached to localslot. But it does not pass the tuple 'ownership' to the remoteslot, i.e. the flag TTS_FLAG_SHOULDFREE won't be set, i.e. the tuple won't be freed. >On closer inspection, though, it looks like the author of this >code is relying on ExecClearTuple to *not* destroy the data, >because the subsequent loop might overwrite only some of the >slot's columns, before it does > > ExecStoreVirtualTuple(slot); > >which supposes that all the columns are valid. > >But of course this is 100% broken, because whatever ExecClearTuple >may have done or not done to the slot's datum/isnull arrays, it >certainly pfree'd the slot's physical tuple. So any pass-by-ref >datums are now dangling pointers. > >I imagine the only reason this code has gotten past the valgrind >animals is that we're not testing cases where non-replaced columns >in the subscriber table are of pass-by-ref types. > I haven't checked, but I'd imagine we actually do such tests. I've however tried to reproduce this, unsuccessfully. >I draw a direct line between the existence of this bug and the lack >of commentary in slot_modify_cstrings about what it's doing. If the >author had troubled to write a comment explaining these machinations, >he might've noticed the bug, or at least reviewers might have. > Not sure. More comments would not hurt, but I don't see any bug yet. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-bugs по дате отправления: