Обсуждение: tablesync copy ignores publication actions

Поиск
Список
Период
Сортировка

tablesync copy ignores publication actions

От
Peter Smith
Дата:
The logical replication tablesync ignores the publication 'publish'
operations during the initial data copy.

This is current/known PG behaviour (e.g. as recently mentioned [1])
but it was not documented anywhere.

This patch just documents the existing behaviour and gives some examples.

------
[1] https://www.postgresql.org/message-id/CAA4eK1L_98LF7Db4yFY1PhKKRzoT83xtN41jTS5X%2B8OeGrAkLw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

Re: tablesync copy ignores publication actions

От
"Euler Taveira"
Дата:
On Tue, Jun 7, 2022, at 1:10 AM, Peter Smith wrote:
The logical replication tablesync ignores the publication 'publish'
operations during the initial data copy.

This is current/known PG behaviour (e.g. as recently mentioned [1])
but it was not documented anywhere.
initial data synchronization != replication. publish parameter is a replication
property; it is not a initial data synchronization property. Maybe we should
make it clear like you are suggesting.

This patch just documents the existing behaviour and gives some examples.
Why did you add this information to that specific paragraph? IMO it belongs to
a separate paragraph; I would add it as the first paragraph in that subsection.

I suggest the following paragraph:

<para>
The initial data synchronization does not take into account the 
<literal>publish</literal> parameter to copy the existing data.
</para>

There is no point to link the Initial Snapshot subsection. That subsection is
explaining the initial copy steps and you want to inform about the effect of a
publication parameter on the initial copy. Although both are talking about the
same topic (initial copy), that link to Initial Snapshot subsection won't add
additional information about the publish parameter. You could expand the
suggested sentence to make it clear what publish parameter is or even add a
link to the CREATE PUBLICATION synopsis (that explains about publish
parameter).

You add an empty paragraph. Remove it.

I'm not sure it deserves an example. It is an easy-to-understand concept and a
good description is better than ~ 80 new lines.


--
Euler Taveira

Re: tablesync copy ignores publication actions

От
Amit Kapila
Дата:
On Tue, Jun 7, 2022 at 7:08 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Jun 7, 2022, at 1:10 AM, Peter Smith wrote:
>
> The logical replication tablesync ignores the publication 'publish'
> operations during the initial data copy.
>
> This is current/known PG behaviour (e.g. as recently mentioned [1])
> but it was not documented anywhere.
>
> initial data synchronization != replication. publish parameter is a replication
> property; it is not a initial data synchronization property. Maybe we should
> make it clear like you are suggesting.
>

+1 to document it. We respect some other properties of publication
like the publish_via_partition_root parameter, column lists, and row
filters. So it is better to explain about 'publish' parameter which we
ignore during the initial sync.

> This patch just documents the existing behaviour and gives some examples.
>
> Why did you add this information to that specific paragraph? IMO it belongs to
> a separate paragraph; I would add it as the first paragraph in that subsection.
>
> I suggest the following paragraph:
>
> <para>
> The initial data synchronization does not take into account the
> <literal>publish</literal> parameter to copy the existing data.
> </para>
>
> There is no point to link the Initial Snapshot subsection. That subsection is
> explaining the initial copy steps and you want to inform about the effect of a
> publication parameter on the initial copy. Although both are talking about the
> same topic (initial copy), that link to Initial Snapshot subsection won't add
> additional information about the publish parameter.
>

Here, we are explaining the behavior of row filters during initial
sync so adding a link to the Initial Snapshot section makes sense to
me.

> You could expand the
> suggested sentence to make it clear what publish parameter is or even add a
> link to the CREATE PUBLICATION synopsis (that explains about publish
> parameter).
>

+1. I suggest that we should add some text about the behavior of
initial sync in CREATE PUBLICATION doc (along with the 'publish'
parameter) or otherwise, we can explain it where we are talking about
publications [1].

> You add an empty paragraph. Remove it.
>
> I'm not sure it deserves an example. It is an easy-to-understand concept and a
> good description is better than ~ 80 new lines.
>

I don't think it is very clear that "initial data synchronization !=
replication" as mentioned by you nor does our docs does a good job in
explaining it otherwise the confusion wouldn't have arisen in the
email link shared by Peter. Personally, I think such things can be
better explained by example and in that regards the example shared by
Peter does half the job because it doesn't explain the replication
part. I don't think "Initial Snapshot" is the right place for these
examples considering we want to show the replication based on the
publish actions. We can extend it to show one example with row filters
as well. How about showing these examples in the Subscription section
[2]?

[1]: https://www.postgresql.org/docs/devel/logical-replication-publication.html
[2]: https://www.postgresql.org/docs/devel/logical-replication-subscription.html


-- 
With Regards,
Amit Kapila.



RE: tablesync copy ignores publication actions

От
"shiy.fnst@fujitsu.com"
Дата:
On Wed, Jun 8, 2022 12:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Tue, Jun 7, 2022 at 7:08 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Tue, Jun 7, 2022, at 1:10 AM, Peter Smith wrote:
> >
> > The logical replication tablesync ignores the publication 'publish'
> > operations during the initial data copy.
> >
> > This is current/known PG behaviour (e.g. as recently mentioned [1])
> > but it was not documented anywhere.
> >
> > initial data synchronization != replication. publish parameter is a replication
> > property; it is not a initial data synchronization property. Maybe we should
> > make it clear like you are suggesting.
> >
> 
> +1 to document it. We respect some other properties of publication
> like the publish_via_partition_root parameter, column lists, and row
> filters. So it is better to explain about 'publish' parameter which we
> ignore during the initial sync.
> 

I also agree to add it to the document.

> > You could expand the
> > suggested sentence to make it clear what publish parameter is or even add
> a
> > link to the CREATE PUBLICATION synopsis (that explains about publish
> > parameter).
> >
> 
> +1. I suggest that we should add some text about the behavior of
> initial sync in CREATE PUBLICATION doc (along with the 'publish'
> parameter) or otherwise, we can explain it where we are talking about
> publications [1].
> 

I noticed that CREATE SUBSCRIPTION doc mentions that row filter will affect
initial sync along with "copy_data" parameter.[1] Maybe we can add some text for
"publish" parameter there.

[1] https://www.postgresql.org/docs/devel/sql-createsubscription.html

Regards,
Shi yu

Re: tablesync copy ignores publication actions

От
Peter Smith
Дата:
PSA v2 of the patch, based on all feedback received.

~~~

Main differences from v1:

* Rewording and more explanatory text.

* The examples were moved to the "Subscription" [1] page and also
extended to show some normal replication and row filter examples, from
[Amit].

* Added some text to CREATE PUBLICATION 'publish' param [2], from [Euler][Amit].

* Added some text to CREATE SUBSCRIPTION Notes [3], from [Shi-san].

* Added some text to the "Publication page" [4] to say the 'publish'
is only for DML operations.

* I changed the note in "Row Filter - Initial Data Synchronization"
[5] to be a warning because I felt users could be surprised to see
data exposed by the initial copy, which a DML operation would not
expose.

------
[1] https://www.postgresql.org/docs/devel/logical-replication-subscription.html
[2] https://www.postgresql.org/docs/devel/sql-createpublication.html
[3] https://www.postgresql.org/docs/devel/sql-createsubscription.html
[4] https://www.postgresql.org/docs/devel/logical-replication-publication.html
[5]
https://www.postgresql.org/docs/devel/logical-replication-row-filter.html#LOGICAL-REPLICATION-ROW-FILTER-INITIAL-DATA-SYNC

[Euler] https://www.postgresql.org/message-id/bd49c14d-7a01-4ae3-b424-8c49630fec57%40www.fastmail.com
[Amit] https://www.postgresql.org/message-id/CAA4eK1Lb5QpWCQU8qkELnX6t8z7JeVtGantmKptxkkpxnYnpHA%40mail.gmail.com
[Shi-san]
https://www.postgresql.org/message-id/OSZPR01MB631026B8428422EAC1BFB8A4FDAA9%40OSZPR01MB6310.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

RE: tablesync copy ignores publication actions

От
"shiy.fnst@fujitsu.com"
Дата:
On Tue, Jun 14, 2022 3:36 PM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> PSA v2 of the patch, based on all feedback received.
> 
> ~~~
> 
> Main differences from v1:
> 
> * Rewording and more explanatory text.
> 
> * The examples were moved to the "Subscription" [1] page and also
> extended to show some normal replication and row filter examples, from
> [Amit].
> 
> * Added some text to CREATE PUBLICATION 'publish' param [2], from
> [Euler][Amit].
> 
> * Added some text to CREATE SUBSCRIPTION Notes [3], from [Shi-san].
> 
> * Added some text to the "Publication page" [4] to say the 'publish'
> is only for DML operations.
> 
> * I changed the note in "Row Filter - Initial Data Synchronization"
> [5] to be a warning because I felt users could be surprised to see
> data exposed by the initial copy, which a DML operation would not
> expose.
> 

Thanks for updating the patch. Two comments:

1.
+     it means the copied table <literal>t3</literal> contains all rows even when
+     they do not patch the row filter of publication <literal>pub3b</literal>.

Typo. I think "they do not patch the row filter" should be "they do not match
the row filter", right?

2.
@@ -500,7 +704,6 @@
       </para>
      </listitem>
     </itemizedlist></para>
-
   </sect2>
 
   <sect2 id="logical-replication-row-filter-examples">

It seems we should remove this change.

Regards,
Shi yu

Re: tablesync copy ignores publication actions

От
Peter Smith
Дата:
On Wed, Jun 15, 2022 at 5:05 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
...
> Thanks for updating the patch. Two comments:
>
> 1.
> +     it means the copied table <literal>t3</literal> contains all rows even when
> +     they do not patch the row filter of publication <literal>pub3b</literal>.
>
> Typo. I think "they do not patch the row filter" should be "they do not match
> the row filter", right?
>
> 2.
> @@ -500,7 +704,6 @@
>        </para>
>       </listitem>
>      </itemizedlist></para>
> -
>    </sect2>
>
>    <sect2 id="logical-replication-row-filter-examples">
>
> It seems we should remove this change.
>

Thank you for your review comments. Those reported mistakes are fixed
in the attached patch v3.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

Re: tablesync copy ignores publication actions

От
Amit Kapila
Дата:
On Thu, Jun 16, 2022 at 6:07 AM Peter Smith <smithpb2250@gmail.com> wrote:

>
> Thank you for your review comments. Those reported mistakes are fixed
> in the attached patch v3.
>

This patch looks mostly good to me except for a few minor comments
which are mentioned below. It is not very clear in which branch(es) we
should commit this patch? As per my understanding, this is a
pre-existing behavior but we want to document it because (a) It was
not already documented, and (b) we followed it for row filters in
PG-15 it seems that should be explained. So, we have the following
options (a) commit it only for PG-15, (b) commit for PG-15 and
backpatch the relevant sections, or (c) commit it when branch opens
for PG-16. What do you or others think?

Few comments:
==============
1.
>
-   particular event types.  By default, all operation types are replicated.
-   (Row filters have no effect for <command>TRUNCATE</command>. See
-   <xref linkend="logical-replication-row-filter"/>).
+   particular event types. By default, all operation types are replicated.
+   These are DML operation limitations only; they do not affect the initial
+   data synchronization copy.
>

Using limitations in the above sentence can be misleading. Can we
change it to something like: "These publication specifications apply
only for DML operations; they do ... ".

2.
+     operations. The publication <literal>pub3b</literal> has a row filter.

In the Examples section, you have used row filter whereas that section
is later in the docs. So, it is better if you give reference to that
section in the above sentence (see Section ...).

3.
+         <para>
+          This parameter only affects DML operations. In particular, the
+          subscription initial data synchronization does not take
this parameter
+          into account when copying existing table data.
+         </para>

In the second sentence: "... subscription initial data synchronization
..." doesn't sound appropriate. Can we change it to something like:
"In particular, the initial data synchronization (see Section ..) in
logical replication does not take this parameter into account when
copying existing table data."?


-- 
With Regards,
Amit Kapila.



Re: tablesync copy ignores publication actions

От
Peter Smith
Дата:
On Wed, Jun 22, 2022 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jun 16, 2022 at 6:07 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> >
> > Thank you for your review comments. Those reported mistakes are fixed
> > in the attached patch v3.
> >
>
> This patch looks mostly good to me except for a few minor comments
> which are mentioned below. It is not very clear in which branch(es) we
> should commit this patch? As per my understanding, this is a
> pre-existing behavior but we want to document it because (a) It was
> not already documented, and (b) we followed it for row filters in
> PG-15 it seems that should be explained. So, we have the following
> options (a) commit it only for PG-15, (b) commit for PG-15 and
> backpatch the relevant sections, or (c) commit it when branch opens
> for PG-16. What do you or others think?

Even though this is a very old docs omission, AFAIK nobody ever raised
it as a problem before. It only became more important because of the
PG15 row-filters. So I think option (a) is ok.

>
> Few comments:
> ==============
> 1.
> >
> -   particular event types.  By default, all operation types are replicated.
> -   (Row filters have no effect for <command>TRUNCATE</command>. See
> -   <xref linkend="logical-replication-row-filter"/>).
> +   particular event types. By default, all operation types are replicated.
> +   These are DML operation limitations only; they do not affect the initial
> +   data synchronization copy.
> >
>
> Using limitations in the above sentence can be misleading. Can we
> change it to something like: "These publication specifications apply
> only for DML operations; they do ... ".
>

OK - modified as suggested.

> 2.
> +     operations. The publication <literal>pub3b</literal> has a row filter.
>
> In the Examples section, you have used row filter whereas that section
> is later in the docs. So, it is better if you give reference to that
> section in the above sentence (see Section ...).
>

OK - added xref as suggested.

> 3.
> +         <para>
> +          This parameter only affects DML operations. In particular, the
> +          subscription initial data synchronization does not take
> this parameter
> +          into account when copying existing table data.
> +         </para>
>
> In the second sentence: "... subscription initial data synchronization
> ..." doesn't sound appropriate. Can we change it to something like:
> "In particular, the initial data synchronization (see Section ..) in
> logical replication does not take this parameter into account when
> copying existing table data."?
>

OK - modified and added xref as suggested.

~~

PSA patch v4 to address all the above review comments.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

RE: tablesync copy ignores publication actions

От
"shiy.fnst@fujitsu.com"
Дата:
On Wed, Jun 22, 2022 4:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> On Wed, Jun 22, 2022 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > On Thu, Jun 16, 2022 at 6:07 AM Peter Smith <smithpb2250@gmail.com>
> wrote:
> >
> > >
> > > Thank you for your review comments. Those reported mistakes are fixed
> > > in the attached patch v3.
> > >
> >
> > This patch looks mostly good to me except for a few minor comments
> > which are mentioned below. It is not very clear in which branch(es) we
> > should commit this patch? As per my understanding, this is a
> > pre-existing behavior but we want to document it because (a) It was
> > not already documented, and (b) we followed it for row filters in
> > PG-15 it seems that should be explained. So, we have the following
> > options (a) commit it only for PG-15, (b) commit for PG-15 and
> > backpatch the relevant sections, or (c) commit it when branch opens
> > for PG-16. What do you or others think?
> 
> Even though this is a very old docs omission, AFAIK nobody ever raised
> it as a problem before. It only became more important because of the
> PG15 row-filters. So I think option (a) is ok.
> 

I also think option (a) is ok.

> 
> PSA patch v4 to address all the above review comments.
> 

Thanks for updating the patch. It looks good to me.

Besides, I tested the examples in the patch, and there's no problem.

Regards,
Shi yu

Re: tablesync copy ignores publication actions

От
Amit Kapila
Дата:
On Thu, Jun 23, 2022 at 8:43 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Wed, Jun 22, 2022 4:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > >
> > > This patch looks mostly good to me except for a few minor comments
> > > which are mentioned below. It is not very clear in which branch(es) we
> > > should commit this patch? As per my understanding, this is a
> > > pre-existing behavior but we want to document it because (a) It was
> > > not already documented, and (b) we followed it for row filters in
> > > PG-15 it seems that should be explained. So, we have the following
> > > options (a) commit it only for PG-15, (b) commit for PG-15 and
> > > backpatch the relevant sections, or (c) commit it when branch opens
> > > for PG-16. What do you or others think?
> >
> > Even though this is a very old docs omission, AFAIK nobody ever raised
> > it as a problem before. It only became more important because of the
> > PG15 row-filters. So I think option (a) is ok.
> >
>
> I also think option (a) is ok.
>
> >
> > PSA patch v4 to address all the above review comments.
> >
>
> Thanks for updating the patch. It looks good to me.
>

The patch looks good to me as well. I will push this patch in HEAD (as
per option (a)) tomorrow unless I see any more suggestions/comments.

-- 
With Regards,
Amit Kapila.



Re: tablesync copy ignores publication actions

От
Robert Haas
Дата:
On Thu, Jun 23, 2022 at 2:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> The patch looks good to me as well. I will push this patch in HEAD (as
> per option (a)) tomorrow unless I see any more suggestions/comments.

The example seems to demonstrate the point quite well but one thing
that I notice is that it is quite long. I don't really see an obvious
way of making it shorter without making it less clear, so perhaps
that's fine.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: tablesync copy ignores publication actions

От
Amit Kapila
Дата:
On Fri, Jun 24, 2022 at 2:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jun 23, 2022 at 2:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > The patch looks good to me as well. I will push this patch in HEAD (as
> > per option (a)) tomorrow unless I see any more suggestions/comments.
>
> The example seems to demonstrate the point quite well but one thing
> that I notice is that it is quite long. I don't really see an obvious
> way of making it shorter without making it less clear, so perhaps
> that's fine.
>

Thanks for looking into it. Pushed!

-- 
With Regards,
Amit Kapila.