Data is copied twice when specifying both child and parent table in publication

Поиск
Список
Период
Сортировка
От houzj.fnst@fujitsu.com
Тема Data is copied twice when specifying both child and parent table in publication
Дата
Msg-id OS0PR01MB57167F45D481F78CDC5986F794B99@OS0PR01MB5716.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответы RE: Data is copied twice when specifying both child and parent table in publication  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Список pgsql-hackers
Hi,

In another logical replication related thread[1], my colleague Greg found that
if publish_via_partition_root is true, then the child table's data will be
copied twice when adding both child and parent table to the publication.

Example:

-----
Pub:
create table tbl1 (a int) partition by range (a);
create table tbl1_part1 partition of tbl1 for values from (1) to (10);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table tbl1, tbl1_part1 with (publish_via_partition_root=on);

insert into tbl1_part1 values(1);

Sub:
create table tbl1 (a int) partition by range (a);
create table tbl1_part1 partition of tbl1 for values from (1) to (10);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create subscription sub CONNECTION 'dbname=postgres port=10000' publication pub;

-- data is copied twice
select * from tbl1_part1;
 a
---
 1
 1
-----

The reason is that the subscriber will fetch the table list from publisher
using the following sql[2] and the subscriber will execute table
synchronization for each table in the query results in this case. But
tbl1_part1 is a partition of tbl1, so the data of tbl1_part1 was copied twice.

[2]
select * from pg_publication_tables;
 pubname | schemaname | tablename
---------+------------+------------
 pub     | public     | tbl1
 pub     | public     | tbl1_part1

IMO, it looks like a bug and it's more natural to only execute the table
synchronization for the parent table in the above case. Because as the document
said: if publish_via_partition_root is true, "changes in a partitioned table
(or on its partitions) contained in the publication will be published using the
identity and schema of the partitioned table rather than that of the individual
partitions that are actually changed;"

To fix it, I think we should fix function GetPublicationRelations which
generate data for the view pg_publication_tables and make it only show the
parent table if publish_via_partition_root is true. And for other future
feature like schema level publication, we can also follow this to exclude
partitions if their parent is specified by FOR TABLE in the same publication.

Attach a patch to fix it.
Thoughts ?

[1] https://www.postgresql.org/message-id/CAJcOf-eBhDUT2J5zs8Z0qEMiZUdhinX%2BbuGX3GN4V83fPnZV3Q%40mail.gmail.com

Best regards,
Hou zhijie


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: torikoshia
Дата:
Сообщение: Re: RFC: Logging plan of the running query
Следующее
От: Daniel Gustafsson
Дата:
Сообщение: Re: Unbounded %s in sscanf