Re: Synchronous commit behavior during network outage

From Andrey Borodin
Subject Re: Synchronous commit behavior during network outage
Date
Msg-id 4B0CD464-74FA-4030-B8CC-30881D97A799@yandex-team.ru
In reply to Re: Synchronous commit behavior during network outage  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Synchronous commit behavior during network outage
List pgsql-hackers

> On 2 July 2021, at 10:59, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Wed, 2021-06-30 at 17:28 +0500, Andrey Borodin wrote:
>>> My patch also covers the backend termination case. Is there a
>>> reason
>>> you left that case out?
>>
>> Yes, backend termination is used by HA tool before rewinding the
>> node.
>
> Can't you just disable sync rep first (using ALTER SYSTEM SET
> synchronous_standby_names=''), which will unstick the backend, and then
> terminate it?
If the failover happens because the node is unresponsive, we cannot just turn off sync rep. We need some spare
connections for that (the number of stuck backends will skyrocket during a network partition). We need available
descriptors and some memory to fork a new backend. We need to re-read the config. And we need time to try, after all.
In some failures we may lack some of these.
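
To make the list of requirements concrete, here is a rough sketch of the sequence suggested above. It assumes we still manage to open a superuser session on the node, which is exactly the thing that may fail. Picking backends by backend_type is only one possible choice and is my assumption, not something from the patch.

```sql
-- Step 1: stop waiting for synchronous standbys.
-- Needs a free connection slot and enough memory to fork a backend.
ALTER SYSTEM SET synchronous_standby_names = '';

-- Step 2: ask the server to re-read its configuration,
-- which should release the backends stuck waiting for sync rep.
SELECT pg_reload_conf();

-- Step 3: terminate the remaining client backends before rewinding the node.
-- Assumption: the HA tool wants every client session gone except its own.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND pid <> pg_backend_pid();
```

Each step is a separate round trip to a node that is already misbehaving, so any of them can hang or fail.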

Partial degradation is already a hard task. Without the ability to easily terminate a running Postgres, the HA tool
will often resort to SIGKILL.

>
> If you don't handle the termination case, then there's still a chance
> for the transaction to become visible to other clients before its
> replicated.
Termination is an admin command; admins know what they are doing.
Cancellation is part of the user protocol.

BTW, can we have two GUCs, so that HA tool developers can decide on their own which guarantees they provide?

>
>> There is one more caveat we need to fix: we should prevent instant
>> recovery from happening.
>
> That can already be done with the restart_after_crash GUC.

Oh, I didn't know about it; we will use it. Thanks!
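
A minimal sketch of how we might set it, assuming a superuser session and that a configuration reload is enough for this parameter to take effect:

```sql
-- Keep the server from reinitializing automatically after a backend crash,
-- so the HA tool decides when (and whether) the node comes back.
ALTER SYSTEM SET restart_after_crash = off;
SELECT pg_reload_conf();

-- Check the effective value.
SHOW restart_after_crash;
```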


Best regards, Andrey Borodin.

