Re: Handle infinite recursion in logical replication setup

Поиск
Список
Период
Сортировка
От Jonathan S. Katz
Тема Re: Handle infinite recursion in logical replication setup
Дата
Msg-id 546b47d2-8dc4-08ab-da3b-d98e8928a2b3@postgresql.org
обсуждение исходный текст
Ответ на Re: Handle infinite recursion in logical replication setup  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Handle infinite recursion in logical replication setup
Список pgsql-hackers
On 1/10/23 10:17 AM, Amit Kapila wrote:
> On Tue, Jan 10, 2023 at 8:13 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:

>> This consistently created the deadlock in my testing.
>>
>> Discussing with Masahiko off-list, this is due to a deadlock from 4
>> processes: the walsenders on A and B, and the apply workers on A and B.
>> The walsenders are waiting for feedback from the apply workers, and the
>> apply workers are waiting for the walsenders to synchronize (I may be
>> oversimplifying).
>>
>> He suggested I try the above example instead with `synchronous_commit`
>> set to `local`. In this case, I verified that there is no more deadlock,
>> but he informed me that we would not be able to use cascading
>> synchronous replication when "origin=none".
>>
> 
> This has nothing to do with the origin feature. I mean this should
> happen with origin=any or even in PG15 without using origin at all.
> Am, I missing something? One related point to note is that in physical
> replication cascading replication is asynchronous. See docs [1]
> (Cascading replication is currently asynchronous....)

This is not directly related to the origin feature, but the origin 
feature makes it apparent. There is a common use-case where people want 
to replicate data synchronously between two tables, and this is enabled 
by the "origin" feature.

To be clear, my bigger concern is that it's fairly easy for users to 
create a deadlock situation based on how they set "synchronous_commit" 
with the origin feature -- this is the main reason why I brought it up. 
I'm less concerned about the "cascading" case, though I want to try out 
sync rep between 3 instances to see what happens.

>> If we decide that this is a documentation issue, I'd suggest we improve
>> the guidance around using `synchronous_commit`[1] on the CREATE
>> SUBSCRIPTION page, as the GUC page[2] warns against using `local`:
>>
> 
> Yeah, but on Create Subscription page, we have mentioned that it is
> safe to use off for logical replication.

While I think you can infer that it's "safe" for synchronous 
replication, I don't think it's clear.

We say it's "safe to use `off` for logical replication", but provide a 
lengthy explanation around synchronous logical replication that 
concludes that it's "advantageous to set synchronous_commit to local or 
higher" but does not address safety. The first line in the explanation 
of the parameter links to the `synchronous_commit` GUC which 
specifically advises against using "local" for synchronous replication.

Additionally, because we say "local" or higher, we increase the risk of 
the aforementioned in HEAD with the origin feature.

I know I'm hammering on this point a bit, but it feels like this is 
relatively easy to misconfigure with the upcoming "origin" change (I did 
so myself from reading the devel docs) and we should ensure we guide our 
users appropriately.

> One can use local or higher
> for reducing the latency for COMMIT when synchronous replication is
> used in the publisher. Won't using 'local' while creating subscription
> would suffice the need to consistently replicate the data? I mean it
> is equivalent to somebody using levels greater than local in case of
> physical replication. I think in the case of physical replication, we
> won't wait for standby to replicate to another node before sending a
> response, so why to wait in the case of logical replication? If this
> understanding is correct, then probably it is sufficient to support
> 'local' for a subscription.

I think this is a good explanation. We should incorporate some version 
of this into the docs, and add some clarity around the use of 
`synchronous_commit` option in `CREATE SUBSCRIPTION` in particular with 
the origin feature. I can make an attempt at this.

Perhaps another question: based on this, should we disallow setting 
values of `synchronous_commit` greater than `local` when using 
"origin=none"? That might be too strict, but maybe we should warn around 
the risk of deadlock either in the logs or in the docs.

Thanks,

Jonathan


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: fixing CREATEROLE
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: Avoiding "wrong tuple length" errors at the end of VACUUM on pg_database update (Backpatch of 947789f to v12 and v13)