Re: BUG #16226: background worker "logical replication worker" (PID) was terminated by signal 11: Segmentation

From Vadim Yatsenko
Subject Re: BUG #16226: background worker "logical replication worker" (PID) was terminated by signal 11: Segmentation
Msg-id CAJTwZ8w-o6QwL-4v=-jjCWDtB2UDA1KY05GCRYQ+6fbvR2ErZA@mail.gmail.com
In reply to Re: BUG #16226: background worker "logical replication worker" (PID ) was terminated by signal 11: Segmentation  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Tom,

Thank you!  We'll wait for the patch before updating our servers.

Best Regards,
Vadim Yatsenko

On Wed, 22 Jan 2020 at 18:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We have 2 PostgreSQL servers with logical replication between Postgres 11.6
> (Primary) and 12.1 (Logical). Some time ago, we changed a column type in 2
> big tables from integer to text:
> ...
> , this of course led to a full rewrite of both tables. We repeated this
> operation on both servers. After that we started to get errors like
> "background worker "logical replication worker" (PID <pid>) was terminated
> by signal 11: Segmentation fault" and the server goes into recovery mode.

Not sure, but this seems like it might be explained by this recent
bug fix:


Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master [4d9ceb001] 2019-11-22 11:31:19 -0500
Branch: REL_12_STABLE [a2aa224e0] 2019-11-22 11:31:19 -0500
Branch: REL_11_STABLE [b72a44c51] 2019-11-22 11:31:19 -0500
Branch: REL_10_STABLE [5d3fcb53a] 2019-11-22 11:31:19 -0500

    Fix bogus tuple-slot management in logical replication UPDATE handling.

    slot_modify_cstrings seriously abused the TupleTableSlot API by relying
    on a slot's underlying data to stay valid across ExecClearTuple.  Since
    this abuse was also quite undocumented, it's little surprise that the
    case got broken during the v12 slot rewrites.  As reported in bug #16129
    from Ondřej Jirman, this could lead to crashes or data corruption when
    a logical replication subscriber processes a row update.  Problems would
    only arise if the subscriber's table contained columns of pass-by-ref
    types that were not being copied from the publisher.

    Fix by explicitly copying the datum/isnull arrays from the source slot
    that the old row was in already.  This ends up being about the same
    thing that happened pre-v12, but hopefully in a less opaque and
    fragile way.

    We might've caught the problem sooner if there were any test cases
    dealing with updates involving non-replicated or dropped columns.
    Now there are.

    Back-patch to v10 where this code came in.  Even though the failure
    does not manifest before v12, IMO this code is too fragile to leave
    as-is.  In any case we certainly want the additional test coverage.

    Patch by me; thanks to Tomas Vondra for initial investigation.

    Discussion: https://postgr.es/m/16129-a0c0f48e71741e5f@postgresql.org

                        regards, tom lane
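
For readers who want a picture of the fix described in the commit message, here is a minimal sketch in C. It is not the actual patch and does not use PostgreSQL's TupleTableSlot API: `ToySlot`, `toy_clear`, and `toy_modify` are hypothetical names invented for illustration. It only shows the general idea of building the updated row in a separate destination slot by copying the value/isnull arrays from the slot that still holds the old row, rather than trusting a slot's own data after it has been cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NATTS 3

/* Toy stand-in for a tuple slot; NOT PostgreSQL's TupleTableSlot. */
typedef struct ToySlot
{
    const char *values[NATTS];   /* pass-by-reference column values */
    bool        isnull[NATTS];
    bool        empty;
} ToySlot;

/* Toy analogue of clearing a slot: afterwards its contents must not be relied on. */
static void
toy_clear(ToySlot *slot)
{
    for (int i = 0; i < NATTS; i++)
    {
        slot->values[i] = NULL;
        slot->isnull[i] = true;
    }
    slot->empty = true;
}

/*
 * Build the updated row in dst from the old row in src plus the new values
 * received for replicated columns.  Columns that are not replicated keep
 * the old value, which stays valid because src is left untouched.
 */
static void
toy_modify(ToySlot *dst, const ToySlot *src,
           const char *newvals[NATTS], const bool replicated[NATTS])
{
    assert(dst != src);          /* the old row must live in a different slot */

    toy_clear(dst);

    /* Explicitly copy the old row's arrays from the source slot. */
    memcpy(dst->values, src->values, sizeof(dst->values));
    memcpy(dst->isnull, src->isnull, sizeof(dst->isnull));

    /* Overwrite only the columns that the publisher actually sent. */
    for (int i = 0; i < NATTS; i++)
    {
        if (!replicated[i])
            continue;
        dst->values[i] = newvals[i];
        dst->isnull[i] = (newvals[i] == NULL);
    }
    dst->empty = false;
}
```

In this toy model the third column plays the role of a subscriber-only column of a pass-by-reference type: its pointer is taken from the still-valid source slot, so nothing is read from storage that was released when the destination slot was cleared.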
