RE: Data is copied twice when specifying both child and parent table in publication

From houzj.fnst@fujitsu.com
Subject RE: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id OS0PR01MB57165AB0FE2E96B642238AFB94BD9@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to Re: Data is copied twice when specifying both child and parent table in publication  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: Data is copied twice when specifying both child and parent table in publication  (Dilip Kumar <dilipbalaut@gmail.com>)
RE: Data is copied twice when specifying both child and parent table in publication  ("shiy.fnst@fujitsu.com" <shiy.fnst@fujitsu.com>)
List pgsql-hackers
On Monday, October 18, 2021 5:03 PM Amit Langote <amitlangote09@gmail.com> wrote:
> I can imagine that the behavior seen here may look surprising, but not
> sure if I would call it a bug as such.  I do remember thinking about
> this case and the current behavior is how I may have coded it to be.
> 
> Looking at this command in Hou-san's email:
> 
>   create publication pub for table tbl1, tbl1_part1 with
> (publish_via_partition_root=on);
> 
> It's adding both the root partitioned table and the leaf partition
> *explicitly*, and it's not clear to me if the latter's inclusion in
> the publication should be assumed because the former is found to have
> been added to the publication, that is, as far as the latter's
> visibility to the subscriber is concerned.  It's not a stretch to
> imagine that a user may write the command this way to account for a
> subscriber node on which tbl1 and tbl1_part1 are unrelated tables.
> 
> I don't think we assume anything on the publisher side regarding the
> state/configuration of tables on the subscriber side, at least with
> publication commands where tables are added to a publication
> explicitly, so it is up to the user to make sure that the tables are
> not added duplicatively.  One may however argue that the way we've
> decided to handle FOR ALL TABLES does assume something about
> partitions where it skips advertising them to subscribers when
> publish_via_partition_root flag is set to true, but that is exactly to
> avoid the duplication of data that goes to a subscriber.

Hi,

Thanks for the explanation.

One reason I consider this behavior a bug is the following: if we add
both the root partitioned table and the leaf partition explicitly to the
publication (and set publish_via_partition_root = on), the behavior of the
apply worker is inconsistent with the behavior of the table sync worker.

In this case, all changes in the leaf partition will be applied using the
identity and schema of the partitioned (root) table. But the initial table
sync will be executed for both the leaf and the root table, which causes
duplication of data.
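
A minimal sketch of the scenario, for illustration only. The column
definition, the partition bounds, the inserted rows, the subscription name
and the connection string are assumptions; only the CREATE PUBLICATION
command is taken from this thread.

```sql
-- On the publisher.
-- Assumed schema: a single int column, range-partitioned (hypothetical).
CREATE TABLE tbl1 (a int) PARTITION BY RANGE (a);
CREATE TABLE tbl1_part1 PARTITION OF tbl1 FOR VALUES FROM (1) TO (10);
INSERT INTO tbl1 VALUES (1), (2);

-- The command under discussion: root and leaf are both listed explicitly.
CREATE PUBLICATION pub FOR TABLE tbl1, tbl1_part1
    WITH (publish_via_partition_root = on);

-- On the subscriber, with the same table definitions (an assumption).
CREATE TABLE tbl1 (a int) PARTITION BY RANGE (a);
CREATE TABLE tbl1_part1 PARTITION OF tbl1 FOR VALUES FROM (1) TO (10);

-- The connection string is a placeholder.
CREATE SUBSCRIPTION sub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub;

-- Once the initial sync has finished, count the copies of each row.
-- If table sync runs for both tbl1 and tbl1_part1 as described above,
-- each value would be expected to show up more than once.
SELECT a, count(*) FROM tbl1 GROUP BY a ORDER BY a;
```

Rows inserted on the publisher after the initial sync would, by contrast, be
applied only once through the root table, which is the inconsistency in
question.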

Wouldn't it be better to make the behavior consistent here?

Best regards,
Hou zj


