Re: Handle infinite recursion in logical replication setup

Поиск
Список
Период
Сортировка
От Jonathan S. Katz
Тема Re: Handle infinite recursion in logical replication setup
Дата
Msg-id afb653ba-e2b1-33a3-a54c-849f4466e1b4@postgresql.org
обсуждение исходный текст
Ответ на Re: Handle infinite recursion in logical replication setup  (vignesh C <vignesh21@gmail.com>)
Ответы Re: Handle infinite recursion in logical replication setup  (Peter Smith <smithpb2250@gmail.com>)
Re: Handle infinite recursion in logical replication setup  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On 7/25/22 4:54 AM, vignesh C wrote:
> On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>>
>> On 7/22/22 12:47 AM, Amit Kapila wrote:
>>> On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:

>>> BTW, do you have any opinion on the idea of the first remaining patch
>>> where we accomplish two things: a) Checks and throws an error if
>>> 'copy_data = on' and 'origin = none' but the publication tables were
>>> also replicated from other publishers. b) Adds 'force' value for
>>> copy_data parameter to allow copying in such a case. The primary
>>> reason for this patch is to avoid loops or duplicate data in the
>>> initial phase. We can't skip copying based on origin as we can do
>>> while replicating changes from WAL. So, we detect that the publisher
>>> already has data from some other node and doesn't allow replication
>>> unless the user uses the 'force' option for copy_data.
>>
>> In general, I agree with the patch; but I'm not sure why we are calling
>> "copy_data = force" in this case and how it varies from "on". I
>> understand the goal is to prevent the infinite loop, but is there some
>> technical restriction why we can't set "origin = none, copy_data = on"
>> and have this work (and apologies if I missed that upthread)?
> 
> Let's take a simple case to understand why copy_data = force is
> required to replicate between two primaries for table t1 which has
> data as given below:

> step4 - Node-2:
> Create Subscription sub2 Connection '<node-1 details>' Publication
> pub1_2 with (origin = none, copy_data=on);
> If we had allowed the create subscription to be successful with
> copy_data = on. After this the data will be something like this:
> Node-1:
> 1, 2, 3, 4, 5, 6, 7, 8
> 
> Node-2:
> 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8
> 
> So, you can see that data on Node-2 (5, 6, 7, 8) is duplicated. In
> case, table t1 has a unique key, it will lead to a unique key
> violation and replication won't proceed.
> 
> To avoid this we will throw an error:
> ERROR:  could not replicate table "public.t1"
> DETAIL:  CREATE/ALTER SUBSCRIPTION with origin = none and copy_data =
> on is not allowed when the publisher has subscribed same table.
> HINT:  Use CREATE/ALTER SUBSCRIPTION with copy_data = off/force.

Thanks for the example. I agree that it is fairly simple to reproduce.

I understand that "copy_data = force" is meant to protect a user from 
hurting themself. I'm not convinced that this is the best way to do so.

For example today I can subscribe to multiple publications that write to 
the same table. If I have a primary key on that table, and two of the 
subscriptions try to write an identical ID, we conflict. We don't have 
any special flags or modes to guard against that from happening, though 
we do have documentation on conflicts and managing them.

AFAICT the same issue with "copy_data" also exists in the above scenario 
too, even without the "origin" attribute. However, I think this case is 
more noticeable for "origin=none" because we currently default 
"copy_data" to "true" and in this case data can be copied in two 
directions.

That said, this introduces a new restriction for this particular 
scenario that doesn't exist on other scenarios. Instead, I would 
advocate we document how to correctly set up the two-way replication 
scenario (which we have a draft!), document the warnings around the 
conflicts, perhaps include Vignesh's instructions on how to remediate a 
conflict on initial sync, and consider throwing a WARNING as you suggested.

Thoughts?

Thanks,

Jonathan

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Introduce wait_for_subscription_sync for TAP tests
Следующее
От: Peter Smith
Дата:
Сообщение: Re: Handle infinite recursion in logical replication setup