Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Дата
Msg-id 26015.1574441403@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker  (Ondřej Jirman <ienieghapheoghaiwida@xff.cz>)
Список pgsql-bugs
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Fri, Nov 22, 2019 at 01:54:01AM +0100, Ondřej Jirman wrote:
>> On Thu, Nov 21, 2019 at 07:45:34PM -0500, Tom Lane wrote:
>>> =?utf-8?Q?Ond=C5=99ej?= Jirman <ienieghapheoghaiwida@xff.cz> writes:
>>>> newtup.changed
>>>> {true, true, false, true, true, true, true, true, false <repeats 1656 times>}

>>> So column 3 is not getting replaced.

> Can you show us the attribute list as defined in the system, including
> e.g. dropped columns? That is, something like
>    SELECT attnum, attname, atttypid FROM pg_attribute
>     WHERE attrelid = 'public.videos'::regclass;
> both from the published and subscriber.

After digging around a bit more in the logrep code, it seems like we'd
only have a situation with newtup.changed[i] = false if the publisher
had hit this bit in logicalrep_write_tuple:

        else if (att->attlen == -1 && VARATT_IS_EXTERNAL_ONDISK(values[i]))
        {
            pq_sendbyte(out, 'u');    /* unchanged toast column */
            continue;
        }

So this seems to indicate that Ondřej's case involves an update on a row
that contained a toasted-out-of-line column that did not get updated.
That's different from the trigger conditions that I postulated, but the
end result is the same: slot_modify_cstrings would have a case where
it's supposed to retain the old data for a pass-by-ref column, and it'd
fail to do that correctly.

I also discovered that before v12, the calling code looked like this:

        ExecStoreTuple(localslot->tts_tuple, remoteslot, InvalidBuffer, false);
        slot_modify_cstrings(remoteslot, rel, newtup.values, newtup.changed);

so that at that time, "remoteslot" indeed did not have its own local
copy of the tuple, and thus the code failed to fail.  Perhaps this was
intentional on the part of whoever wrote this, but it's still an
unacceptable abuse of the API if you ask me.

I added some tests involving dropped columns to what I had last night
and pushed it.   However, looking at this now, I see that we really
still don't have enough coverage --- the code path I showed above
is not hit by the regression tests, according to the code coverage
bot.  I don't think it's really acceptable for such a corner case
bit of the logrep protocol to not get tested :-(

            regards, tom lane



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
Следующее
От: Ondřej Jirman
Дата:
Сообщение: Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker