Re: Synchronous commit behavior during network outage

From: Ondřej Žižka
Subject: Re: Synchronous commit behavior during network outage
Date:
Msg-id: 9adfee70-7b0a-6ff4-2c76-482e44cd5080@stratox.cz
In response to: Re: Synchronous commit behavior during network outage  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses: Re: Synchronous commit behavior during network outage
List: pgsql-hackers
Hello Andrey,

I went through the thread for your patch, and it seems to me an
acceptable solution...

 > The only case the patch does not handle is a sudden backend crash -
Postgres will recover without a restart.

We also use an HA tool (Patroni). If the whole machine fails, it will
find a new master and things should be OK. We use a 4-node setup (2 sync
replicas, plus 1 async replica attached to each of them). If there is an
issue only with the sync replica (the async one operating normally) and
the master fails completely in this situation, Patroni will handle it
(the async replica becomes the new sync), but if it is just the backend
process, the master will not fail over and the changes will still be
visible...

If the sync replica outage is temporary, it will resolve itself once the
node re-establishes its replication slot... If the outage is "long",
Patroni will remove the "old" sync replica from the cluster, and the
async replica reading from the master becomes the new sync. So yes... in
a 2-node setup this can be an issue, but in a 4-node setup this seems to
me like a solution.
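
As a sketch only (node names here are hypothetical examples, not taken
from our actual cluster), a quorum-based variant of such a setup could be
expressed on the master like this:

```
# postgresql.conf on the master -- illustrative sketch, names are made up
# wait for an acknowledgement from ANY 1 of the two candidate sync standbys
synchronous_standby_names = 'ANY 1 (replica_a, replica_b)'
# commits wait until WAL is flushed on the chosen standby
synchronous_commit = on
```

(With Patroni, synchronous_standby_names is normally managed by the HA
tool itself when synchronous mode is enabled, so this is only to show the
semantics.)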
The only situation I can imagine is when the client connections use a
different network than the replication network, and the replication
network goes down completely while the client network stays up. In that
case, the master can become an "isolated island", and if it then fails,
we can lose the changed data.
Is this situation also covered by your model: "transaction effects
should not be observable on primary until requirements of
synchronous_commit are satisfied"?
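
As an aside, whether the primary currently counts a standby as
synchronous (and hence whether commits actually wait for it) can be
checked via the standard pg_stat_replication view; shown here only as an
illustration:

```
-- Run on the primary: list connected standbys and their sync status.
-- sync_state is one of 'sync', 'quorum', 'potential', or 'async'.
SELECT application_name, client_addr, state, sync_state
FROM pg_stat_replication;
```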

Do you agree with my thoughts?

Maybe it would be possible to implement this in PostgreSQL with a note
in the documentation that a multi-node (>=3 nodes) cluster is necessary.

Regards
Ondrej

On 22/04/2021 05:55, Andrey Borodin wrote:

> Hi Ondrej!
>
>> On 19 Apr 2021, at 22:19, Ondřej Žižka <ondrej.zizka@stratox.cz> wrote:
>>
>> Do you think, that would be possible to implement a process that would solve this use case?
>> Thank you
>> Ondrej
>>
> Feel free to review patch fixing this at [0]. It's classified as "Server Features", but I'm sure it's a bug fix.
>
> Yandex.Cloud PG runs with this patch for more than half a year, because we cannot afford losing data in HA clusters.
>
> It's a somewhat incomplete solution, because a PG restart or crash recovery will make waiting transactions visible. But we protect against this on the HA tool's side.
>
> Best regards, Andrey Borodin.
>
> [0] https://commitfest.postgresql.org/33/2402/


