RE: Data is copied twice when specifying both child and parent table in publication

Поиск
Список
Период
Сортировка
От houzj.fnst@fujitsu.com
Тема RE: Data is copied twice when specifying both child and parent table in publication
Дата
Msg-id OS0PR01MB5716B73A9B3C87466CD08A1D94949@OS0PR01MB5716.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Data is copied twice when specifying both child and parent table in publication  (Greg Nancarrow <gregn4422@gmail.com>)
Ответы Re: Data is copied twice when specifying both child and parent table in publication  (Greg Nancarrow <gregn4422@gmail.com>)
Re: Data is copied twice when specifying both child and parent table in publication  (Amit Kapila <amit.kapila16@gmail.com>)
RE: Data is copied twice when specifying both child and parent table in publication  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Список pgsql-hackers
On Friday, November 5, 2021 11:20 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>On Thu, Nov 4, 2021 at 7:10 PM Amit Kapila <mailto:amit.kapila16@gmail.com> wrote:
>>
>> On Thu, Nov 4, 2021 at 12:23 PM Greg Nancarrow <mailto:gregn4422@gmail.com> wrote:
>> >
>> > On Thu, Nov 4, 2021 at 3:13 PM Amit Kapila <mailto:amit.kapila16@gmail.com> wrote:
>> > >
>> > > On further thinking about this, I think we should define the behavior
>> > > of replication among partitioned (on the publisher) and
>> > > non-partitioned (on the subscriber) tables a bit more clearly.
>> > >
>> > > - If the "publish_via_partition_root" is set for a publication then we
>> > > can always replicate to the table with the same name as the root table
>> > > in publisher.
>> > > - If the "publish_via_partition_root" is *not* set for a publication
>> > > then we can always replicate to the tables with the same name as the
>> > > non-root tables in publisher.
>> > >
>> > > Thoughts?
>> > >
>> >
>> > I'd adjust that wording slightly, because "we can always replicate to
>> > ..." sounds a bit vague, and saying that an option is set or not set
>> > could be misinterpreted, as the option could be "set" to false.
>> >
>> > How about:
>> >
>> > - If "publish_via_partition_root" is true for a publication, then data
>> > is replicated to the table with the same name as the root (i.e.
>> > partitioned) table in the publisher.
>> > - If "publish_via_partition_root" is false (the default) for a
>> > publication, then data is replicated to tables with the same name as
>> > the non-root (i.e. partition) tables in the publisher.
>> >
>>
>> Sounds good to me. If we follow this then I think the patch by Hou-San
>> is good to solve the first problem as described in his last email [1]?
>>
>> [1] -
https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com
>>
>
>Almost.
>The patch does seem to solve that first problem (double publish on tablesync).
>I used the following test (taken from [2]), and variations of it:  
>
>However, there did still seem to be a problem, if publish_via_partition_root is then set to false; it seems that can
resultin 
 
>duplicate partition entries in the pg_publication_tables view, see below (this follows on from the test scenario given
above):
>
>postgres=# select * from pg_publication_tables;
> pubname | schemaname | tablename 
>---------+------------+-----------
> pub1    | sch1       | tbl1
> pub1    | sch3       | t1
>(2 rows)
>
>postgres=#  alter publication pub1 set (publish_via_partition_root=false);
>ALTER PUBLICATION
>postgres=# select * from pg_publication_tables;
> pubname | schemaname | tablename  
>---------+------------+------------
> pub1    | sch2       | tbl1_part1
> pub1    | sch2       | tbl1_part2
> pub1    | sch2       | tbl1_part1
> pub1    | sch3       | t1
>(4 rows)
>
>So I think the patch would need to be updated to prevent that.

Thanks for testing the patch.

The reason of the duplicate output is that:
The existing function GetPublicationRelations doesn't de-duplicate the output
oid list. So, when adding both child and parent table to the
publication(pubviaroot = false), the pg_publication_tables view will output
duplicate partition.

Attach the fix patch.
0001 fix data double publish(first issue in this thread)
0002 fix duplicate partition in view pg_publication_tables(reported by greg when testing the 0001 patch)

About the fix for second issue in this thread.
> "I think one possible idea to investigate is that on the 
> subscriber-side, after fetching tables, we check the already 
> subscribed tables and if the child tables already exist then we ignore 
> the parent table and vice versa."

When looking into how to fix the second issue, I have a question:

After changing publish_via_partition_root from false to true, the
subcriber will fetch the partitioned table from publisher when refreshing.

In subsriber side, If all the child tables of the partitioned table already
subscribed, then we can just skip the table sync for the partitioned table. But
if only some of the child tables(not all child tables) were already subscribed,
should we skip the partitioned table's table sync ? I am not sure about the
appropriate behavior here.

What do you think ?

Best regards,
Hou zj





Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: vignesh C
Дата:
Сообщение: Re: Printing backtrace of postgres processes
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: Parallel vacuum workers prevent the oldest xmin from advancing