Re: BUG #16226: background worker "logical replication worker" (PID) was terminated by signal 11: Segmentation

From	Vadim Yatsenko
Subject	Re: BUG #16226: background worker "logical replication worker" (PID) was terminated by signal 11: Segmentation
Date	23 January 2020, 12:03:02
Msg-id	CAJTwZ8w-o6QwL-4v=-jjCWDtB2UDA1KY05GCRYQ+6fbvR2ErZA@mail.gmail.com
In reply to	Re: BUG #16226: background worker "logical replication worker" (PID ) was terminated by signal 11: Segmentation (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-bugs

Tom,

Thank you! We'll wait for the patch before updating our servers.

Best Regards,
Vadim Yatsenko

On Wed, 22 Jan 2020 at 18:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> We have 2 PostgreSQL servers with logical replication between Postgres 11.6
> (Primary) and 12.1 (Logical). Some time ago, we changed a column type in 2
> big tables from integer to text:
> ...
> , this of course led to a full rewrite of both tables. We repeated this
> operation on both servers. After that we started to get errors like
> "background worker "logical replication worker" (PID <pid>) was terminated
> by signal 11: Segmentation fault" and the server goes into recovery mode.

Not sure, but this seems like it might be explained by this recent
bug fix:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master [4d9ceb001] 2019-11-22 11:31:19 -0500
Branch: REL_12_STABLE [a2aa224e0] 2019-11-22 11:31:19 -0500
Branch: REL_11_STABLE [b72a44c51] 2019-11-22 11:31:19 -0500
Branch: REL_10_STABLE [5d3fcb53a] 2019-11-22 11:31:19 -0500

Fix bogus tuple-slot management in logical replication UPDATE handling.

slot_modify_cstrings seriously abused the TupleTableSlot API by relying
on a slot's underlying data to stay valid across ExecClearTuple. Since
this abuse was also quite undocumented, it's little surprise that the
case got broken during the v12 slot rewrites. As reported in bug #16129
from Ondřej Jirman, this could lead to crashes or data corruption when
a logical replication subscriber processes a row update. Problems would
only arise if the subscriber's table contained columns of pass-by-ref
types that were not being copied from the publisher.

Fix by explicitly copying the datum/isnull arrays from the source slot
that the old row was in already. This ends up being about the same
thing that happened pre-v12, but hopefully in a less opaque and
fragile way.

We might've caught the problem sooner if there were any test cases
dealing with updates involving non-replicated or dropped columns.
Now there are.

Back-patch to v10 where this code came in. Even though the failure
does not manifest before v12, IMO this code is too fragile to leave
as-is. In any case we certainly want the additional test coverage.

Patch by me; thanks to Tomas Vondra for initial investigation.

Discussion: https://postgr.es/m/16129-a0c0f48e71741e5f@postgresql.org

regards, tom lane
