Thread: Replica Identity check of partition table on subscriber

Replica Identity check of partition table on subscriber

From:
"shiy.fnst@fujitsu.com"
Date:
Hi hackers,

I saw a problem in logical replication: when the target table on the
subscriber is a partitioned table, it only checks whether the Replica
Identity of the partitioned table is consistent with the publisher, and
doesn't check the Replica Identity of the partition.

For example:
-- publisher --
create table tbl (a int not null, b int);
create unique INDEX ON tbl (a);
alter table tbl replica identity using INDEX tbl_a_idx;
create publication pub for table tbl;

-- subscriber --
-- table tbl (parent table) has RI index, while table child has no RI index.
create table tbl (a int not null, b int) partition by range (a);
create table child partition of tbl default;
create unique INDEX ON tbl (a);
alter table tbl replica identity using INDEX tbl_a_idx;
create subscription sub connection 'port=5432 dbname=postgres' publication pub;

-- publisher --
insert into tbl values (11,11);
update tbl set a=a+1;

It caused an assertion failure on the subscriber:
TRAP: FailedAssertion("OidIsValid(idxoid) || (remoterel->replident == REPLICA_IDENTITY_FULL)", File: "worker.c", Line: 2088, PID: 1616523)

The backtrace is attached.

We got the assertion failure because idxoid is invalid, as table child has no
Replica Identity or Primary Key. We have a check in check_relation_updatable(),
but what it checks is table tbl (the parent table), and it passes the check.
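
For reference, the assertion fires in FindReplTupleInLocalRel() in worker.c,
which runs after that check and assumes a usable identity was already
verified (simplified sketch, not the exact code):

    /* FindReplTupleInLocalRel() -- simplified; localrel is the leaf partition */
    idxoid = GetRelationIdentityOrPK(localrel);
    Assert(OidIsValid(idxoid) ||
           (remoterel->replident == REPLICA_IDENTITY_FULL));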

I think one approach to fix it is to check the target partition in this case,
instead of the partitioned table.

When trying to fix it, I saw some other problems with updating the partition
map cache.

a) In logicalrep_partmap_invalidate_cb(), the type of the entry in
LogicalRepPartMap should be LogicalRepPartMapEntry instead of
LogicalRepRelMapEntry (see the sketch after this list).

b) In logicalrep_partition_open(), it didn't check if the entry is valid.

c) When the publisher sends a new relation mapping, only the relation map
cache is updated, not the partition map cache. I think the latter should also
be updated because it contains remote relation information, too.
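
To illustrate (a), a sketch of the corrected callback loop (abridged; the
actual patch may differ): the hash entries embed the relmap entry behind the
partition-OID key, so the cast must match the entry type:

    LogicalRepPartMapEntry *part_entry;   /* was: LogicalRepRelMapEntry *entry */
    HASH_SEQ_STATUS status;

    hash_seq_init(&status, LogicalRepPartMap);
    while ((part_entry = (LogicalRepPartMapEntry *)
            hash_seq_search(&status)) != NULL)
    {
        /* the relmap entry is embedded behind the partition-OID hash key */
        if (part_entry->relmapentry.localreloid == reloid)
        {
            part_entry->relmapentry.localrelvalid = false;
            hash_seq_term(&status);
            break;
        }
    }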

Attached are two patches which try to fix these problems.
0001 patch: fixes the above three problems with the partition map cache.
0002 patch: fixes the assertion failure by checking the Replica Identity of the
partition if the target table is a partitioned table.

Thanks to Hou Zhijie for helping me in the first patch.

I will add a test for it later if no one objects to this fix.

Regards,
Shi yu

Attachments

Re: Replica Identity check of partition table on subscriber

From:
Amit Kapila
Date:
On Wed, Jun 8, 2022 at 2:17 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> I saw a problem in logical replication: when the target table on the
> subscriber is a partitioned table, it only checks whether the Replica
> Identity of the partitioned table is consistent with the publisher, and
> doesn't check the Replica Identity of the partition.
>
...
>
> It caused an assertion failure on the subscriber:
> TRAP: FailedAssertion("OidIsValid(idxoid) || (remoterel->replident == REPLICA_IDENTITY_FULL)", File: "worker.c", Line: 2088, PID: 1616523)
>
> The backtrace is attached.
>
> We got the assertion failure because idxoid is invalid, as table child has no
> Replica Identity or Primary Key. We have a check in check_relation_updatable(),
> but what it checks is table tbl (the parent table), and it passes the check.
>

I can reproduce the problem. This problem seems to exist since commit
f1ac27bf (Add logical replication support to replicate into
partitioned tables), so adding Amit L. and Peter E.

> I think one approach to fix it is to check the target partition in this case,
> instead of the partitioned table.
>

This approach sounds reasonable to me. One minor point:
+/*
+ * Check that replica identity matches.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.
+ */
+static void
+logicalrep_check_updatable(LogicalRepRelMapEntry *entry)

Can we name this function logicalrep_rel_mark_updatable, as that is what it
is doing? If so, change the comments as well.

> When trying to fix it, I saw some other problems with updating the partition
> map cache.
>
> a) In logicalrep_partmap_invalidate_cb(), the type of the entry in
> LogicalRepPartMap should be LogicalRepPartMapEntry, instead of
> LogicalRepRelMapEntry.
>
> b) In logicalrep_partition_open(), it didn't check if the entry is valid.
>
> c) When the publisher sends a new relation mapping, only the relation map
> cache is updated, not the partition map cache. I think the latter should also
> be updated because it contains remote relation information, too.
>

Is there any test case that can show the problem due to your above observations?

-- 
With Regards,
Amit Kapila.



Re: Replica Identity check of partition table on subscriber

From:
Amit Langote
Date:
Hi Amit,

On Thu, Jun 9, 2022 at 8:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jun 8, 2022 at 2:17 PM shiy.fnst@fujitsu.com
> <shiy.fnst@fujitsu.com> wrote:
> > I saw a problem in logical replication: when the target table on the
> > subscriber is a partitioned table, it only checks whether the Replica
> > Identity of the partitioned table is consistent with the publisher, and
> > doesn't check the Replica Identity of the partition.
> ...
> >
> > It caused an assertion failure on the subscriber:
> > TRAP: FailedAssertion("OidIsValid(idxoid) || (remoterel->replident == REPLICA_IDENTITY_FULL)", File: "worker.c", Line: 2088, PID: 1616523)
> >
> > The backtrace is attached.
> >
> > We got the assertion failure because idxoid is invalid, as table child has no
> > Replica Identity or Primary Key. We have a check in check_relation_updatable(),
> > but what it checks is table tbl (the parent table), and it passes the check.
> >
>
> I can reproduce the problem. This seems to be the problem since commit
> f1ac27bf (Add logical replication support to replicate into
> partitioned tables), so adding Amit L. and Peter E.

Thanks, I can see the problem.

I have looked at the other problems you mentioned with the code too and agree
they look like bugs.

Both patches look to be on the right track to fix the issues, but I will
look more closely to see if I have anything to add.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



RE: Replica Identity check of partition table on subscriber

From:
"shiy.fnst@fujitsu.com"
Date:
On Thu, June 9, 2022 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> > I think one approach to fix it is to check the target partition in this case,
> > instead of the partitioned table.
> >
> 
> This approach sounds reasonable to me. One minor point:
> +/*
> + * Check that replica identity matches.
> + *
> + * We allow for stricter replica identity (fewer columns) on subscriber as
> + * that will not stop us from finding unique tuple. IE, if publisher has
> + * identity (id,timestamp) and subscriber just (id) this will not be a
> + * problem, but in the opposite scenario it will.
> + *
> + * Don't throw any error here just mark the relation entry as not updatable,
> + * as replica identity is only for updates and deletes but inserts can be
> + * replicated even without it.
> + */
> +static void
> +logicalrep_check_updatable(LogicalRepRelMapEntry *entry)
> 
> Can we name this function logicalrep_rel_mark_updatable, as that is what it
> is doing? If so, change the comments as well.
> 

OK. Modified.

> > When trying to fix it, I saw some other problems with updating the
> > partition map cache.
> >
> > a) In logicalrep_partmap_invalidate_cb(), the type of the entry in
> > LogicalRepPartMap should be LogicalRepPartMapEntry, instead of
> > LogicalRepRelMapEntry.
> >
> > b) In logicalrep_partition_open(), it didn't check if the entry is valid.
> >
> > c) When the publisher sends a new relation mapping, only the relation map
> > cache is updated, not the partition map cache. I think the latter should
> > also be updated because it contains remote relation information, too.
> >
> 
> Is there any test case that can show the problem due to your above
> observations?
> 

Please see the following case.

-- publisher
create table tbl (a int primary key, b int);
create publication pub for table tbl;

-- subscriber
create table tbl (a int primary key, b int, c int) partition by range (a);
create table child partition of tbl default;

-- publisher, make cache
insert into tbl values (1,1);
update tbl set a=a+1;
alter table tbl add column c int;
update tbl set c=1 where a=2;

-- subscriber
postgres=# select * from tbl;
 a | b | c
---+---+---
 2 | 1 |
(1 row)

The update of column c failed on the subscriber, because the partition map
entry still cached the old remote relation information (without column c).
After applying the first patch, it works fine.

I have added this case to the first patch, and also added a test case for the
second patch.

Attached are the new patches.

Regards,
Shi yu

Attachments

Re: Replica Identity check of partition table on subscriber

From:
Amit Langote
Date:
Hello,

On Wed, Jun 8, 2022 at 5:47 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
> Hi hackers,
>
> I saw a problem in logical replication: when the target table on the
> subscriber is a partitioned table, it only checks whether the Replica
> Identity of the partitioned table is consistent with the publisher, and
> doesn't check the Replica Identity of the partition.
>
> For example:
> -- publisher --
> create table tbl (a int not null, b int);
> create unique INDEX ON tbl (a);
> alter table tbl replica identity using INDEX tbl_a_idx;
> create publication pub for table tbl;
>
> -- subscriber --
> -- table tbl (parent table) has RI index, while table child has no RI index.
> create table tbl (a int not null, b int) partition by range (a);
> create table child partition of tbl default;
> create unique INDEX ON tbl (a);
> alter table tbl replica identity using INDEX tbl_a_idx;
> create subscription sub connection 'port=5432 dbname=postgres' publication pub;
>
> -- publisher --
> insert into tbl values (11,11);
> update tbl set a=a+1;
>
> It caused an assertion failure on the subscriber:
> TRAP: FailedAssertion("OidIsValid(idxoid) || (remoterel->replident == REPLICA_IDENTITY_FULL)", File: "worker.c", Line: 2088, PID: 1616523)
>
> The backtrace is attached.
>
> We got the assertion failure because idxoid is invalid, as table child has no
> Replica Identity or Primary Key. We have a check in check_relation_updatable(),
> but what it checks is table tbl (the parent table), and it passes the check.
>
> I think one approach to fix it is to check the target partition in this case,
> instead of the partitioned table.

That makes sense.  A normal user update of a partitioned table will
only perform CheckCmdReplicaIdentity() for leaf partitions and the
logical replication updates should do the same.  I may have been
confused at the time to think that ALTER TABLE REPLICA IDENTITY makes
sure that the replica identities of all relations in a partition tree
are forced to be the same at all times, though it seems that the patch
to do so [1] didn't actually get in.  I guess adding a test case would
have helped.

> When trying to fix it, I saw some other problems with updating the partition
> map cache.
>
> a) In logicalrep_partmap_invalidate_cb(), the type of the entry in
> LogicalRepPartMap should be LogicalRepPartMapEntry, instead of
> LogicalRepRelMapEntry.

Indeed.

> b) In logicalrep_partition_open(), it didn't check if the entry is valid.

Yeah, that's bad.  Actually, it seems that localrelvalid stuff for
LogicalRepRelMapEntry came in 3d65b0593c5, but it's likely we missed
in that commit that LogicalRepPartMapEntrys contain copies of, not
pointers to, the relevant parent's entry.  This patch fixes that
oversight.

> c) When the publisher sends a new relation mapping, only the relation map
> cache is updated, not the partition map cache. I think the latter should also
> be updated because it contains remote relation information, too.

Yes, again a result of forgetting that the partition map entries have
copies of relation map entries.

+logicalrep_partmap_invalidate

I wonder why not call this logicalrep_partmap_update() to go with
logicalrep_relmap_update()?  It seems confusing to have
logicalrep_partmap_invalidate() right next to
logicalrep_partmap_invalidate_cb().

+/*
+ * Invalidate the existing entry in the partition map.

I think logicalrep_partmap_invalidate() may update *multiple* entries,
because the hash table scan may find multiple PartMapEntrys containing
a copy of the RelMapEntry with given remoteid, that is, for multiple
partitions of a given local parent table mapped to that remote
relation.  So, please fix the comment as:

Invalidate/Update the entries in the partition map that refer to 'remoterel'

Likewise:

+   /* Invalidate the corresponding partition map as well */

Maybe, this should say:

Also update all entries in the partition map that refer to 'remoterel'.

In 0002:

+logicalrep_check_updatable

+1 to Amit's suggestion to use "mark" instead of "check".

@@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
 static void
 check_relation_updatable(LogicalRepRelMapEntry *rel)
 {
+   /*
+    * If it is a partitioned table, we don't check it, we will check its
+    * partition later.
+    */
+   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+       return;

Why do this?  I mean, logicalrep_check_updatable() doesn't care
whether the relation is partitioned or not -- it does all the work
regardless.

I suggest we don't add this check in check_relation_updatable().

+   /* Check if we can do the update or delete. */

Maybe mention "leaf partition", as:

Check if we can do the update or delete on the leaf partition.

BTW, the following hunk in patch 0002 should really be a part of 0001.

@@ -584,7 +594,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
    Oid         partOid = RelationGetRelid(partrel);
    AttrMap    *attrmap = root->attrmap;
    bool        found;
-   int         i;
    MemoryContext oldctx;

    if (LogicalRepPartMap == NULL)

> Thanks to Hou Zhijie for helping me in the first patch.

Thank you both for the fixes.

> I will add a test for it later

That would be very welcome.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/flat/201902041630.gpadougzab7v%40alvherre.pgsql



Re: Replica Identity check of partition table on subscriber

From:
Amit Kapila
Date:
On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> +logicalrep_partmap_invalidate
>
> I wonder why not call this logicalrep_partmap_update() to go with
> logicalrep_relmap_update()?  It seems confusing to have
> logicalrep_partmap_invalidate() right next to
> logicalrep_partmap_invalidate_cb().
>

I am wondering why we need to update the relmap in this new
function logicalrep_partmap_invalidate(). I think it may be better to
do it in logicalrep_partition_open() when actually required,
otherwise, we end up doing a lot of work that may not be of use unless
the corresponding partition is accessed. Also, it seems awkward to me
that we do the same thing in this new function
logicalrep_partmap_invalidate() and then also in
logicalrep_partition_open() under different conditions.

One more point about 0001: it doesn't seem to have a test that
validates the logicalrep_partmap_invalidate_cb() functionality. I think
for that we need to ALTER the local table (the table on the subscriber
side). Can we try to write a test for it?

-- 
With Regards,
Amit Kapila.



RE: Replica Identity check of partition table on subscriber

From:
"houzj.fnst@fujitsu.com"
Date:
On Saturday, June 11, 2022 9:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com>
> wrote:
> >
> > +logicalrep_partmap_invalidate
> >
> > I wonder why not call this logicalrep_partmap_update() to go with
> > logicalrep_relmap_update()?  It seems confusing to have
> > logicalrep_partmap_invalidate() right next to
> > logicalrep_partmap_invalidate_cb().
> >
> 
> I am thinking about why we need to update the relmap in this new function
> logicalrep_partmap_invalidate()? I think it may be better to do it in
> logicalrep_partition_open() when actually required, otherwise, we end up doing
> a lot of work that may not be of use unless the corresponding partition is
> accessed. Also, it seems awkward to me that we do the same thing in this new
> function
> logicalrep_partmap_invalidate() and then also in
> logicalrep_partition_open() under different conditions.
> 
> One more point about 0001: it doesn't seem to have a test that validates
> the logicalrep_partmap_invalidate_cb() functionality. I think for that we
> need to ALTER the local table (the table on the subscriber side). Can we try
> to write a test for it?


Thanks to Amit L. and Amit K. for your comments! I agree with these points.
Here is a new version of the patch set which tries to address all these
comments.

In addition, while reviewing the code, I found some other related
problems.

1)
        entry->attrmap = make_attrmap(map->maplen);
        for (attno = 0; attno < entry->attrmap->maplen; attno++)
        {
            AttrNumber    root_attno = map->attnums[attno];

            entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
        }

In this case, it's possible that 'attno' points to a dropped column, in which
case root_attno would be '0'. I think in this case we should just set
entry->attrmap->attnums[attno] to '-1' instead of accessing
attrmap->attnums[]. I included this change in 0001 because the test cases
which reproduce these problems are related (we need to ALTER the partition on
the subscriber to reproduce it).
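
A sketch of the adjusted loop (assuming, per the AttrMap conventions, that
0 in the map denotes a dropped column; the actual patch may differ):

    entry->attrmap = make_attrmap(map->maplen);
    for (attno = 0; attno < entry->attrmap->maplen; attno++)
    {
        AttrNumber  root_attno = map->attnums[attno];

        /* 0 means the column is dropped/missing in the root; map it to -1 */
        if (root_attno == 0)
            entry->attrmap->attnums[attno] = -1;
        else
            entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
    }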

2)
    if (entry->attrmap)
        pfree(entry->attrmap);

I think we should use free_attrmap instead of pfree here to avoid a memory
leak. We also need to check the attrmap in logicalrep_rel_open() and
logicalrep_partition_open() and free it if needed. I am not sure whether we
should put this in the 0001 patch, so I have attached a separate patch for it.
We can merge it later if needed.
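
To spell out the leak (sketch): make_attrmap() allocates the attnums array
separately from the AttrMap struct itself, so pfree() on the struct alone
leaves the array behind:

    /* before: frees only the AttrMap struct, leaking entry->attrmap->attnums */
    if (entry->attrmap)
        pfree(entry->attrmap);

    /* after: frees both the struct and its attnums array */
    if (entry->attrmap)
        free_attrmap(entry->attrmap);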

Best regards,
Hou zj

Attachments

Re: Replica Identity check of partition table on subscriber

From:
Amit Kapila
Date:
On Sat, Jun 11, 2022 at 2:36 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Saturday, June 11, 2022 9:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com>
> > wrote:
> > >
> > > +logicalrep_partmap_invalidate
> > >
> > > I wonder why not call this logicalrep_partmap_update() to go with
> > > logicalrep_relmap_update()?  It seems confusing to have
> > > logicalrep_partmap_invalidate() right next to
> > > logicalrep_partmap_invalidate_cb().
> > >
> >
> > I am thinking about why we need to update the relmap in this new function
> > logicalrep_partmap_invalidate()? I think it may be better to do it in
> > logicalrep_partition_open() when actually required, otherwise, we end up doing
> > a lot of work that may not be of use unless the corresponding partition is
> > accessed. Also, it seems awkward to me that we do the same thing in this new
> > function
> > logicalrep_partmap_invalidate() and then also in
> > logicalrep_partition_open() under different conditions.
> >
> > One more point about the 0001, it doesn't seem to have a test that validates
> > logicalrep_partmap_invalidate_cb() functionality. I think for that we need to Alter
> > the local table (table on the subscriber side). Can we try to write a test for it?
>
>
> Thanks to Amit L. and Amit K. for your comments! I agree with these points.
> Here is a new version of the patch set which tries to address all these
> comments.
>
> In addition, while reviewing the code, I found some other related
> problems.
>
> 1)
>                 entry->attrmap = make_attrmap(map->maplen);
>                 for (attno = 0; attno < entry->attrmap->maplen; attno++)
>                 {
>                         AttrNumber      root_attno = map->attnums[attno];
>
>                         entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
>                 }
>
> In this case, it's possible that 'attno' points to a dropped column, in which
> case root_attno would be '0'. I think in this case we should just set
> entry->attrmap->attnums[attno] to '-1' instead of accessing
> attrmap->attnums[]. I included this change in 0001 because the test cases
> which reproduce these problems are related (we need to ALTER the partition
> on the subscriber to reproduce it).
>

Hmm, this appears to be a different issue. Can we separate out the
bug-fix code for the subscriber-side issue caused by the DDL on the
subscriber?

Few other comments:
+ * Note that we don't update the remoterel information in the entry here,
+ * we will update the information in logicalrep_partition_open to save
+ * unnecessary work.
+ */
+void
+logicalrep_partmap_invalidate(LogicalRepRelation *remoterel)

/to save/to avoid

Also, I agree with Amit L. that it is confusing to have
logicalrep_partmap_invalidate() right next to
logicalrep_partmap_invalidate_cb() and both have somewhat different
kinds of logic. So, we can either name it as
logicalrep_partmap_reset_relmap() or logicalrep_partmap_update()
unless you have any other better suggestions? Accordingly, change the
comment atop this function.

--
With Regards,
Amit Kapila.



RE: Replica Identity check of partition table on subscriber

From:
"houzj.fnst@fujitsu.com"
Date:
On Monday, June 13, 2022 1:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Sat, Jun 11, 2022 at 2:36 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Saturday, June 11, 2022 9:36 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > >
> > > On Fri, Jun 10, 2022 at 2:26 PM Amit Langote
> > > <amitlangote09@gmail.com>
> > > wrote:
> > > >
> > > > +logicalrep_partmap_invalidate
> > > >
> > > > I wonder why not call this logicalrep_partmap_update() to go with
> > > > logicalrep_relmap_update()?  It seems confusing to have
> > > > logicalrep_partmap_invalidate() right next to
> > > > logicalrep_partmap_invalidate_cb().
> > > >
> > >
> > > I am thinking about why we need to update the relmap in this new
> > > function logicalrep_partmap_invalidate()? I think it may be better
> > > to do it in
> > > logicalrep_partition_open() when actually required, otherwise, we
> > > end up doing a lot of work that may not be of use unless the
> > > corresponding partition is accessed. Also, it seems awkward to me
> > > that we do the same thing in this new function
> > > logicalrep_partmap_invalidate() and then also in
> > > logicalrep_partition_open() under different conditions.
> > >
> > > One more point about 0001: it doesn't seem to have a test that
> > > validates the logicalrep_partmap_invalidate_cb() functionality. I think
> > > for that we need to ALTER the local table (the table on the subscriber
> > > side). Can we try to write a test for it?
> >
> >
> > Thanks to Amit L. and Amit K. for your comments! I agree with these points.
> > Here is a new version of the patch set which tries to address all these
> > comments.
> >
> > In addition, while reviewing the code, I found some other related
> > problems.
> >
> > 1)
> >                 entry->attrmap = make_attrmap(map->maplen);
> >                 for (attno = 0; attno < entry->attrmap->maplen; attno++)
> >                 {
> >                         AttrNumber      root_attno = map->attnums[attno];
> >
> >                         entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
> >                 }
> >
> > In this case, it's possible that 'attno' points to a dropped column, in
> > which case root_attno would be '0'. I think in this case we should just
> > set entry->attrmap->attnums[attno] to '-1' instead of accessing
> > attrmap->attnums[]. I included this change in 0001 because the test cases
> > which reproduce these problems are related (we need to ALTER the
> > partition on the subscriber to reproduce it).
> >
> 
> Hmm, this appears to be a different issue. Can we separate out the bug-fix
> code for the subscriber-side issue caused by the DDL on the subscriber?
> 
> Few other comments:
> + * Note that we don't update the remoterel information in the entry here,
> + * we will update the information in logicalrep_partition_open to save
> + * unnecessary work.
> + */
> +void
> +logicalrep_partmap_invalidate(LogicalRepRelation *remoterel)
> 
> /to save/to avoid
> 
> Also, I agree with Amit L. that it is confusing to have
> logicalrep_partmap_invalidate() right next to
> logicalrep_partmap_invalidate_cb() and both have somewhat different kinds of
> logic. So, we can either name it as
> logicalrep_partmap_reset_relmap() or logicalrep_partmap_update() unless you
> have any other better suggestions? Accordingly, change the comment atop
> this function.

Thanks for the comments.

I have separated out the bug-fix for the subscriber side, and fixed the typo
and function name.
Attached is the new version of the patch set.

Best regards,
Hou zj

Attachments

Re: Replica Identity check of partition table on subscriber

From:
Amit Langote
Date:
On Sat, Jun 11, 2022 at 10:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > +logicalrep_partmap_invalidate
> >
> > I wonder why not call this logicalrep_partmap_update() to go with
> > logicalrep_relmap_update()?  It seems confusing to have
> > logicalrep_partmap_invalidate() right next to
> > logicalrep_partmap_invalidate_cb().
> >
>
> I am thinking about why we need to update the relmap in this new
> function logicalrep_partmap_invalidate()? I think it may be better to
> do it in logicalrep_partition_open() when actually required,
> otherwise, we end up doing a lot of work that may not be of use unless
> the corresponding partition is accessed. Also, it seems awkward to me
> that we do the same thing in this new function
> logicalrep_partmap_invalidate() and then also in
> logicalrep_partition_open() under different conditions.

Both logicalrep_rel_open() and logicalrep_partition_open() only ever
touch the local Relation, never the LogicalRepRelation.  Updating the
latter is the responsibility of logicalrep_relmap_update(), which is
there to support handling of the RELATION message by
apply_handle_relation().  Given that we make a separate copy of the
parent's LogicalRepRelMapEntry for each partition to put into the
corresponding LogicalRepPartMapEntry, those copies must be updated as
well when a RELATION message targeting the parent's entry arrives.  So
it seems fine that the patch is making it the
logicalrep_relmap_update()'s responsibility to update the partition
copies using the new logicalrep_partmap_invalidate/update()
subroutine.
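
For illustration, a rough sketch of the wiring under discussion (hedged --
whether the call lives in apply_handle_relation() or inside
logicalrep_relmap_update() is exactly the question here, and details like
streamed-transaction handling are omitted):

    /* worker.c -- sketch, not an actual hunk */
    static void
    apply_handle_relation(StringInfo s)
    {
        LogicalRepRelation *rel;

        rel = logicalrep_read_rel(s);
        logicalrep_relmap_update(rel);

        /* also reset/update partition map entries copied from this remote rel */
        logicalrep_partmap_invalidate(rel);
    }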

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Replica Identity check of partition table on subscriber

From:
Amit Kapila
Date:
On Mon, Jun 13, 2022 at 2:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Sat, Jun 11, 2022 at 10:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > >
> > > +logicalrep_partmap_invalidate
> > >
> > > I wonder why not call this logicalrep_partmap_update() to go with
> > > logicalrep_relmap_update()?  It seems confusing to have
> > > logicalrep_partmap_invalidate() right next to
> > > logicalrep_partmap_invalidate_cb().
> > >
> >
> > I am thinking about why we need to update the relmap in this new
> > function logicalrep_partmap_invalidate()? I think it may be better to
> > do it in logicalrep_partition_open() when actually required,
> > otherwise, we end up doing a lot of work that may not be of use unless
> > the corresponding partition is accessed. Also, it seems awkward to me
> > that we do the same thing in this new function
> > logicalrep_partmap_invalidate() and then also in
> > logicalrep_partition_open() under different conditions.
>
> > Both logicalrep_rel_open() and logicalrep_partition_open() only ever
> > touch the local Relation, never the LogicalRepRelation.
>

We do make a copy of the remote rel in logicalrep_partition_open() when
the entry is not found. I feel the same should happen when the remote
relation is reset/invalidated by the RELATION message.

>  Updating the
> latter is the responsibility of logicalrep_relmap_update(), which is
> there to support handling of the RELATION message by
> apply_handle_relation().  Given that we make a separate copy of the
> parent's LogicalRepRelMapEntry for each partition to put into the
> corresponding LogicalRepPartMapEntry, those copies must be updated as
> well when a RELATION message targeting the parent's entry arrives.  So
> it seems fine that the patch is making it the
> logicalrep_relmap_update()'s responsibility to update the partition
> copies using the new logicalrep_partmap_invalidate/update()
> subroutine.
>

I think we can do it that way as well, but do you see any benefit in it?
The way I am suggesting will avoid the effort of updating the remote
rel copy till we try to access that particular partition.

-- 
With Regards,
Amit Kapila.



Re: Replica Identity check of partition table on subscriber

From:
Amit Kapila
Date:
On Mon, Jun 13, 2022 at 1:03 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Monday, June 13, 2022 1:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I have separated out the bug-fix for the subscriber-side.
> And fix the typo and function name.
> Attach the new version patch set.
>

The first patch looks good to me. I have slightly modified one of the
comments and the commit message. I think we need to backpatch this
through 13 where we introduced support to replicate into partitioned
tables (commit f1ac27bf). If you guys are fine, I'll push this once
the work for PG14.4 is done.

-- 
With Regards,
Amit Kapila.

Attachments

Re: Replica Identity check of partition table on subscriber

From:
Amit Langote
Date:
On Mon, Jun 13, 2022 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Jun 13, 2022 at 1:03 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > On Monday, June 13, 2022 1:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I have separated out the bug-fix for the subscriber-side.
> > And fix the typo and function name.
> > Attach the new version patch set.
> >
>
> The first patch looks good to me. I have slightly modified one of the
> comments and the commit message. I think we need to backpatch this
> through 13 where we introduced support to replicate into partitioned
> tables (commit f1ac27bf). If you guys are fine, I'll push this once
> the work for PG14.4 is done.

Both the code changes and test cases look good to me.  Just a couple
of minor nitpicks with test changes:

+   CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+   ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+   ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});

Not sure if we follow it elsewhere, but should we maybe avoid using
the internally generated index name as in the partition's case above?

+# Test the case that target table on subscriber is a partitioned table and
+# check that the changes are replicated correctly after changing the schema of
+# table on subcriber.

The first sentence makes it sound like the tests that follow are the
first ones in the file where the target table is partitioned, which is
not true, so I think we should drop that part.  Also how about being
more specific about the test intent, say:

Test that replication continues to work correctly after altering the
partition of a partitioned target table.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Replica Identity check of partition table on subscriber

From:
Amit Langote
Date:
On Mon, Jun 13, 2022 at 6:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Jun 13, 2022 at 2:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Sat, Jun 11, 2022 at 10:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > >
> > > > +logicalrep_partmap_invalidate
> > > >
> > > > I wonder why not call this logicalrep_partmap_update() to go with
> > > > logicalrep_relmap_update()?  It seems confusing to have
> > > > logicalrep_partmap_invalidate() right next to
> > > > logicalrep_partmap_invalidate_cb().
> > > >
> > >
> > > I am thinking about why we need to update the relmap in this new
> > > function logicalrep_partmap_invalidate()? I think it may be better to
> > > do it in logicalrep_partition_open() when actually required,
> > > otherwise, we end up doing a lot of work that may not be of use unless
> > > the corresponding partition is accessed. Also, it seems awkward to me
> > > that we do the same thing in this new function
> > > logicalrep_partmap_invalidate() and then also in
> > > logicalrep_partition_open() under different conditions.
> >
> > Both logicalrep_rel_open() and logicalrep_partition_open() only ever
> > touch the local Relation, never the LogicalRepRelation.
>
> We do make a copy of the remote rel in logicalrep_partition_open() when
> the entry is not found. I feel the same should happen when the remote
> relation is reset/invalidated by the RELATION message.

Hmm, the problem is that a RELATION message will only invalidate the
LogicalRepRelation portion of the target parent's entry, while any
copies that have been made for partitions that were opened till that
point will continue to have the old LogicalRepRelation information.
As things stand, logicalrep_partition_open() won't know that the
parent entry's LogicalRepRelation may have been modified due to a
RELATION message.  It will reconstruct the entry only if the partition
itself was modified locally, that is, if
logicalrep_partmap_invalidate_cb() was called on the partition.

> >  Updating the
> > latter is the responsibility of logicalrep_relmap_update(), which is
> > there to support handling of the RELATION message by
> > apply_handle_relation().  Given that we make a separate copy of the
> > parent's LogicalRepRelMapEntry for each partition to put into the
> > corresponding LogicalRepPartMapEntry, those copies must be updated as
> > well when a RELATION message targeting the parent's entry arrives.  So
> > it seems fine that the patch is making it the
> > logicalrep_relmap_update()'s responsibility to update the partition
> > copies using the new logicalrep_partmap_invalidate/update()
> > subroutine.
>
> I think we can do it that way as well, but do you see any benefit in it?
> The way I am suggesting will avoid the effort of updating the remote
> rel copy till we try to access that particular partition.

I don't see any benefit as such to doing it the way the patch does;
it's just that that seems to be the only way to go given the way
things are.

This would have been unnecessary, for example, if the relation map
entry had contained a LogicalRepRelation pointer instead of the
struct.  The partition entries would point to the same entry as the
parent's if that were the case and there would be no need to modify
the partitions' copies explicitly.

Am I missing something?
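
For reference, the entry type in question, as defined in relation.c
(abridged) -- the parent's relmap entry is embedded by value, not by
pointer:

    typedef struct LogicalRepPartMapEntry
    {
        Oid         partoid;        /* LogicalRepPartMap's key */
        LogicalRepRelMapEntry relmapentry;  /* copy, not a pointer */
    } LogicalRepPartMapEntry;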

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Replica Identity check of partition table on subscriber

From:
Amit Langote
Date:
On Tue, Jun 14, 2022 at 3:31 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Mon, Jun 13, 2022 at 6:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I think we can do that way as well but do you see any benefit in it?
> > The way I am suggesting will avoid the effort of updating the remote
> > rel copy till we try to access that particular partition.
>
> I don't see any benefit as such to doing it the way the patch does,
> it's just that that seems to be the only way to go given the way
> things are.

Oh, I see that v4-0002 has this:

+/*
+ * Reset the entries in the partition map that refer to remoterel
+ *
+ * Called when new relation mapping is sent by the publisher to update our
+ * expected view of incoming data from said publisher.
+ *
+ * Note that we don't update the remoterel information in the entry here,
+ * we will update the information in logicalrep_partition_open to avoid
+ * unnecessary work.
+ */
+void
+logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel)
+{
+   HASH_SEQ_STATUS status;
+   LogicalRepPartMapEntry *part_entry;
+   LogicalRepRelMapEntry *entry;
+
+   if (LogicalRepPartMap == NULL)
+       return;
+
+   hash_seq_init(&status, LogicalRepPartMap);
+   while ((part_entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+   {
+       entry = &part_entry->relmapentry;
+
+       if (entry->remoterel.remoteid != remoterel->remoteid)
+           continue;
+
+       logicalrep_relmap_free_entry(entry);
+
+       memset(entry, 0, sizeof(LogicalRepRelMapEntry));
+   }
+}

The previous versions would also call logicalrep_relmap_update() on
the entry after the memset, which is no longer done, so that is indeed
saving useless work.  I also see that both logicalrep_relmap_update()
and the above function basically invalidate the whole
LogicalRepRelMapEntry before setting the new remote relation info so
that the next logicalrep_rel_open() or logicalrep_partition_open()
have to refill the other members too.

Though, I thought maybe you were saying that we shouldn't need this
function for resetting partitions in the first place, which I guess
you weren't.

v4-0002 looks good btw, except the nitpick about test comment similar
to my earlier comment regarding v5-0001:

+# Change the column order of table on publisher

I think it might be better to say something specific to describe the
test intent, like:

Test that replication into a partitioned target table continues to work
correctly when the published table is altered

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



RE: Replica Identity check of partition table on subscriber

From:
"shiy.fnst@fujitsu.com"
Date:
On Tue, Jun 14, 2022 2:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
> 
> On Mon, Jun 13, 2022 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > On Mon, Jun 13, 2022 at 1:03 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > > On Monday, June 13, 2022 1:53 PM Amit Kapila
> <amit.kapila16@gmail.com> wrote:
> > > I have separated out the bug-fix for the subscriber-side.
> > > And fix the typo and function name.
> > > Attach the new version patch set.
> > >
> >
> > The first patch looks good to me. I have slightly modified one of the
> > comments and the commit message. I think we need to backpatch this
> > through 13 where we introduced support to replicate into partitioned
> > tables (commit f1ac27bf). If you guys are fine, I'll push this once
> > the work for PG14.4 is done.
> 
> Both the code changes and test cases look good to me.  Just a couple
> of minor nitpicks with test changes:

Thanks for your comments.

> 
> +   CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
> +   ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
> +   ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});
> 
> Not sure if we follow it elsewhere, but should we maybe avoid using
> the internally generated index name as in the partition's case above?
> 

I saw that some existing tests also use internally generated index names (e.g.
replica_identity.sql, ddl.sql and 031_column_list.pl), so maybe it's better to
fix them all in a separate patch. I didn't change this.

> +# Test the case that target table on subscriber is a partitioned table and
> +# check that the changes are replicated correctly after changing the schema
> of
> +# table on subcriber.
> 
> The first sentence makes it sound like the tests that follow are the
> first ones in the file where the target table is partitioned, which is
> not true, so I think we should drop that part.  Also how about being
> more specific about the test intent, say:
> 
> Test that replication continues to work correctly after altering the
> partition of a partitioned target table.
> 

OK, modified.

Attached is the new version of the patch set, along with the patches for pg14 and pg13.

Regards,
Shi yu

Attachments

Re: Replica Identity check of partition table on subscriber

From:
Amit Kapila
Date:
On Tue, Jun 14, 2022 at 1:02 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Tue, Jun 14, 2022 at 3:31 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Mon, Jun 13, 2022 at 6:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > I think we can do that way as well but do you see any benefit in it?
> > > The way I am suggesting will avoid the effort of updating the remote
> > > rel copy till we try to access that particular partition.
> >
> > I don't see any benefit as such to doing it the way the patch does,
> > it's just that that seems to be the only way to go given the way
> > things are.
>
> Oh, I see that v4-0002 has this:
>
> +/*
> + * Reset the entries in the partition map that refer to remoterel
> + *
> + * Called when new relation mapping is sent by the publisher to update our
> + * expected view of incoming data from said publisher.
> + *
> + * Note that we don't update the remoterel information in the entry here,
> + * we will update the information in logicalrep_partition_open to avoid
> + * unnecessary work.
> + */
> +void
> +logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel)
> +{
> +   HASH_SEQ_STATUS status;
> +   LogicalRepPartMapEntry *part_entry;
> +   LogicalRepRelMapEntry *entry;
> +
> +   if (LogicalRepPartMap == NULL)
> +       return;
> +
> +   hash_seq_init(&status, LogicalRepPartMap);
> +   while ((part_entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
> +   {
> +       entry = &part_entry->relmapentry;
> +
> +       if (entry->remoterel.remoteid != remoterel->remoteid)
> +           continue;
> +
> +       logicalrep_relmap_free_entry(entry);
> +
> +       memset(entry, 0, sizeof(LogicalRepRelMapEntry));
> +   }
> +}
>
> The previous versions would also call logicalrep_relmap_update() on
> the entry after the memset, which is no longer done, so that is indeed
> saving useless work.  I also see that both logicalrep_relmap_update()
> and the above function basically invalidate the whole
> LogicalRepRelMapEntry before setting the new remote relation info so
> that the next logicalrep_rel_open() or logicalrep_partition_open()
> have to refill the other members too.
>
> Though, I thought maybe you were saying that we shouldn't need this
> function for resetting partitions in the first place, which I guess
> you weren't.
>

Right.

> v4-0002 looks good btw, except the nitpick about test comment similar
> to my earlier comment regarding v5-0001:
>
> +# Change the column order of table on publisher
>
> I think it might be better to say something specific to describe the
> test intent, like:
>
> Test that replication into a partitioned target table continues to work
> correctly when the published table is altered
>

Okay, changed this and slightly modified the comments and commit message.
I am just attaching the HEAD patches for the first two issues.

-- 
With Regards,
Amit Kapila.

Attachments

Re: Replica Identity check of partition table on subscriber

From:
Amit Langote
Date:
On Tue, Jun 14, 2022 at 9:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Jun 14, 2022 at 1:02 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > +# Change the column order of table on publisher
> > I think it might be better to say something specific to describe the
> > test intent, like:
> >
> > Test that replication into a partitioned target table continues to work
> > correctly when the published table is altered
>
> Okay, changed this and slightly modified the comments and commit message.
> I am just attaching the HEAD patches for the first two issues.

LGTM, thanks.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



RE: Replica Identity check of partition table on subscriber

From:
"shiy.fnst@fujitsu.com"
Date:
On Tue, Jun 14, 2022 8:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> > v4-0002 looks good btw, except the nitpick about test comment similar
> > to my earlier comment regarding v5-0001:
> >
> > +# Change the column order of table on publisher
> >
> > I think it might be better to say something specific to describe the
> > test intent, like:
> >
> > Test that replication into a partitioned target table continues to work
> > correctly when the published table is altered
> >
> 
> Okay, changed this and slightly modified the comments and commit message.
> I am just attaching the HEAD patches for the first two issues.
> 

Thanks for updating the patch.

Attached is the new patch set, run through pgindent, along with the patches
for pg14 and pg13. (Only the first two patches of the patch set.)

Regards,
Shi yu

Attachments

Re: Replica Identity check of partition table on subscriber

From:
Amit Kapila
Date:
On Wed, Jun 15, 2022 at 8:52 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Tue, Jun 14, 2022 8:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > v4-0002 looks good btw, except the nitpick about test comment similar
> > > to my earlier comment regarding v5-0001:
> > >
> > > +# Change the column order of table on publisher
> > >
> > > I think it might be better to say something specific to describe the
> > > test intent, like:
> > >
> > > Test that replication into a partitioned target table continues to work
> > > correctly when the published table is altered
> > >
> >
> > Okay, changed this and slightly modified the comments and commit message.
> > I am just attaching the HEAD patches for the first two issues.
> >
>
> Thanks for updating the patch.
>
> Attached is the new patch set, run through pgindent, along with the patches
> for pg14 and pg13. (Only the first two patches of the patch set.)
>

I have pushed the first bug-fix patch today.


-- 
With Regards,
Amit Kapila.



RE: Replica Identity check of partition table on subscriber

From:
"shiy.fnst@fujitsu.com"
Date:
On Wed, Jun 15, 2022 8:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> I have pushed the first bug-fix patch today.
> 

Thanks.

Attached are the remaining patches, rebased.

Regards,
Shi yu


Attachments

Re: Replica Identity check of partition table on subscriber

From:
Amit Langote
Date:
Hi,

On Thu, Jun 16, 2022 at 2:07 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
> On Wed, Jun 15, 2022 8:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I have pushed the first bug-fix patch today.
>
> Attached the remaining patches which are rebased.

Thanks.

Comments on v9-0001:

+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.

I know you're simply copying the old comment, but I think we can
rewrite it to be slightly more useful:

We just mark the relation entry as not updatable here if the local
replica identity is found to be insufficient and leave it to
check_relation_updatable() to throw the actual error if needed.

+   /* Check that replica identity matches. */
+   logicalrep_rel_mark_updatable(entry);

Maybe the comment (there are 2 instances) should say:

Set if the table's replica identity is enough to apply update/delete.

Finally,

+# Alter REPLICA IDENTITY on subscriber.
+# No REPLICA IDENTITY in the partitioned table on subscriber, but what we check
+# is the partition, so it works fine.

For consistency with other recently added comments, I'd suggest the
following wording:

Test that replication works correctly as long as the leaf partition
has the necessary REPLICA IDENTITY, even though the actual target
partitioned table does not.

On v9-0002:

+   /* cleanup the invalid attrmap */

It seems that "invalid" here really means no-longer-useful, so we
should use that phrase as a nearby comment does:

Release the no-longer-useful attrmap, if any.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Replica Identity check of partition table on subscriber

From:
Amit Kapila
Date:
On Thu, Jun 16, 2022 at 11:43 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Jun 16, 2022 at 2:07 PM shiy.fnst@fujitsu.com
> <shiy.fnst@fujitsu.com> wrote:
> > On Wed, Jun 15, 2022 8:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > I have pushed the first bug-fix patch today.
> >
> > Attached the remaining patches which are rebased.
>
> Thanks.
>
> Comments on v9-0001:
>
> + * Don't throw any error here just mark the relation entry as not updatable,
> + * as replica identity is only for updates and deletes but inserts can be
> + * replicated even without it.
>
> I know you're simply copying the old comment, but I think we can
> rewrite it to be slightly more useful:
>
> We just mark the relation entry as not updatable here if the local
> replica identity is found to be insufficient and leave it to
> check_relation_updatable() to throw the actual error if needed.
>

I am fine with improving this comment but it would be better if in
some way we keep the following part of the comment: "as replica
identity is only for updates and deletes but inserts can be replicated
even without it." as that makes it more clear why it is okay to just
mark the entry as not updatable. One idea could be: "We just mark the
relation entry as not updatable here if the local replica identity is
found to be insufficient and leave it to check_relation_updatable() to
throw the actual error if needed. This is because replica identity is
only for updates and deletes but inserts can be replicated even
without it.". Feel free to suggest if you have any better ideas.

-- 
With Regards,
Amit Kapila.



Re: Replica Identity check of partition table on subscriber

From:
Amit Langote
Date:
On Thu, Jun 16, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Jun 16, 2022 at 11:43 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > + * Don't throw any error here just mark the relation entry as not updatable,
> > + * as replica identity is only for updates and deletes but inserts can be
> > + * replicated even without it.
> >
> > I know you're simply copying the old comment, but I think we can
> > rewrite it to be slightly more useful:
> >
> > We just mark the relation entry as not updatable here if the local
> > replica identity is found to be insufficient and leave it to
> > check_relation_updatable() to throw the actual error if needed.
>
> I am fine with improving this comment but it would be better if in
> some way we keep the following part of the comment: "as replica
> identity is only for updates and deletes but inserts can be replicated
> even without it." as that makes it more clear why it is okay to just
> mark the entry as not updatable. One idea could be: "We just mark the
> relation entry as not updatable here if the local replica identity is
> found to be insufficient and leave it to check_relation_updatable() to
> throw the actual error if needed. This is because replica identity is
> only for updates and deletes but inserts can be replicated even
> without it.". Feel free to suggest if you have any better ideas?

I thought mentioning check_relation_updatable() would make it clear
that only updates (and deletes) care about a valid local replica
identity, because only apply_handle_{update|delete}() call that
function.  Anyway, how about this:

We just mark the relation entry as not updatable here if the local
replica identity is found to be insufficient for applying
updates/deletes (inserts don't care!) and leave it to
check_relation_updatable() to throw the actual error if needed.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Replica Identity check of partition table on subscriber

From:
Amit Kapila
Date:
On Thu, Jun 16, 2022 at 12:30 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Jun 16, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Jun 16, 2022 at 11:43 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > + * Don't throw any error here just mark the relation entry as not updatable,
> > > + * as replica identity is only for updates and deletes but inserts can be
> > > + * replicated even without it.
> > >
> > > I know you're simply copying the old comment, but I think we can
> > > rewrite it to be slightly more useful:
> > >
> > > We just mark the relation entry as not updatable here if the local
> > > replica identity is found to be insufficient and leave it to
> > > check_relation_updatable() to throw the actual error if needed.
> >
> > I am fine with improving this comment but it would be better if in
> > some way we keep the following part of the comment: "as replica
> > identity is only for updates and deletes but inserts can be replicated
> > even without it." as that makes it more clear why it is okay to just
> > mark the entry as not updatable. One idea could be: "We just mark the
> > relation entry as not updatable here if the local replica identity is
> > found to be insufficient and leave it to check_relation_updatable() to
> > throw the actual error if needed. This is because replica identity is
> > only for updates and deletes but inserts can be replicated even
> > without it.". Feel free to suggest if you have any better ideas?
>
> I thought mentioning check_relation_updatable() would make it clear
> that only updates (and deletes) care about a valid local replica
> identity, because only apply_handle_{update|delete}() call that
> function.  Anyway, how about this:
>
> We just mark the relation entry as not updatable here if the local
> replica identity is found to be insufficient for applying
> updates/deletes (inserts don't care!) and leave it to
> check_relation_updatable() to throw the actual error if needed.
>

This sounds better to me than the previous text.

-- 
With Regards,
Amit Kapila.



Re: Replica Identity check of partition table on subscriber

From:
Amit Kapila
Date:
On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> @@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
>  static void
>  check_relation_updatable(LogicalRepRelMapEntry *rel)
>  {
> +   /*
> +    * If it is a partitioned table, we don't check it, we will check its
> +    * partition later.
> +    */
> +   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> +       return;
>
> Why do this?  I mean, logicalrep_check_updatable() doesn't care
> whether the relation is partitioned or not -- it does all the work
> regardless.
>
> I suggest we don't add this check in check_relation_updatable().
>

I think, based on this suggestion, the patch has moved this check to
logicalrep_rel_mark_updatable(). For a partitioned table, it won't
even validate whether it can mark updatable as false, which seems odd
to me even though there might not be any bug due to that. Was your
suggestion actually intended to move it to
logicalrep_rel_mark_updatable()? If so, why do you think that is a
better place?

I think it is important to have this check, to avoid raising an error via
check_relation_updatable() when partitioned tables don't have a RI, but it is
not clear which is the right place. I think check_relation_updatable() is a
better place than logicalrep_rel_mark_updatable(), but maybe there is a
reason why that is not a good idea.

-- 
With Regards,
Amit Kapila.



Re: Replica Identity check of partition table on subscriber

From:
Amit Langote
Date:
On Thu, Jun 16, 2022 at 6:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > @@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
> >  static void
> >  check_relation_updatable(LogicalRepRelMapEntry *rel)
> >  {
> > +   /*
> > +    * If it is a partitioned table, we don't check it, we will check its
> > +    * partition later.
> > +    */
> > +   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> > +       return;
> >
> > Why do this?  I mean, logicalrep_check_updatable() doesn't care
> > whether the relation is partitioned or not -- it does all the work
> > regardless.
> >
> > I suggest we don't add this check in check_relation_updatable().
>
> I think, based on this suggestion, the patch has moved this check to
> logicalrep_rel_mark_updatable(). For a partitioned table, it won't
> even validate whether it can mark updatable as false, which seems odd
> to me even though there might not be any bug due to that. Was your
> suggestion actually intended to move it to
> logicalrep_rel_mark_updatable()?

No, I didn't intend to suggest that we move this check to
logicalrep_rel_mark_updatable(); didn't notice that that's what the
latest patch did.

What I said is that we shouldn't ignore the updatable flag for a
partitioned table in check_relation_updatable(), because
logicalrep_rel_mark_updatable() would have set the updatable flag
correctly even for partitioned tables.  IOW, we should not
special-case partitioned tables anywhere.

I guess the point of adding the check is to allow the case where a
leaf partition's replica identity can be used to apply an update
originally targeting its ancestor that doesn't itself have one.
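
To illustrate, here is a minimal sketch of that case on the subscriber
side (the table and index names are mine, not taken from the patch):

create table tbl (a int not null, b int) partition by range (a);
create table tbl_part partition of tbl default;
create unique index tbl_part_a_idx on tbl_part (a);
alter table tbl_part replica identity using index tbl_part_a_idx;

tbl itself has no replica identity, but a replicated update on tbl
should still be applicable because the routing target tbl_part has one.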

I wonder if it wouldn't be better to move the
check_relation_updatable() call to
apply_handle_{update|delete}_internal()?  We know for sure that we
only ever get there for leaf tables.  If we do that, we won't need the
relkind check.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Replica Identity check of partition table on subscriber

From
Amit Kapila
Date:
On Thu, Jun 16, 2022 at 5:24 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Jun 16, 2022 at 6:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > @@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
> > >  static void
> > >  check_relation_updatable(LogicalRepRelMapEntry *rel)
> > >  {
> > > +   /*
> > > +    * If it is a partitioned table, we don't check it, we will check its
> > > +    * partition later.
> > > +    */
> > > +   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> > > +       return;
> > >
> > > Why do this?  I mean why if logicalrep_check_updatable() doesn't care
> > > if the relation is partitioned or not -- it does all the work
> > > regardless.
> > >
> > > I suggest we don't add this check in check_relation_updatable().
> >
> > I think based on this suggestion patch has moved this check to
> > logicalrep_rel_mark_updatable(). For a partitioned table, it won't
> > even validate whether it can mark updatable as false which seems odd
> > to me even though there might not be any bug due to that. Was your
> > suggestion actually intended to move it to
> > logicalrep_rel_mark_updatable?
>
> No, I didn't intend to suggest that we move this check to
> logicalrep_rel_mark_updatable(); didn't notice that that's what the
> latest patch did.
>
> What I said is that we shouldn't ignore the updatable flag for a
> partitioned table in check_relation_updatable(), because
> logicalrep_rel_mark_updatable() would have set the updatable flag
> correctly even for partitioned tables.  IOW, we should not
> special-case partitioned tables anywhere.
>
> I guess the point of adding the check is to allow the case where a
> leaf partition's replica identity can be used to apply an update
> originally targeting its ancestor that doesn't itself have one.
>
> I wonder if it wouldn't be better to move the
> check_relation_updatable() call to
> apply_handle_{update|delete}_internal()?  We know for sure that we
> only ever get there for leaf tables.  If we do that, we won't need the
> relkind check.
>

I think this won't work for updates via apply_handle_tuple_routing()
unless we call it from some other place(s) as well. That function does
FindReplTupleInLocalRel() before doing the update/delete in the
CMD_UPDATE case, which will lead to the assertion failure.

-- 
With Regards,
Amit Kapila.



Re: Replica Identity check of partition table on subscriber

From
Amit Langote
Date:
On Thu, Jun 16, 2022 at 9:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Jun 16, 2022 at 5:24 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Thu, Jun 16, 2022 at 6:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > @@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
> > > >  static void
> > > >  check_relation_updatable(LogicalRepRelMapEntry *rel)
> > > >  {
> > > > +   /*
> > > > +    * If it is a partitioned table, we don't check it, we will check its
> > > > +    * partition later.
> > > > +    */
> > > > +   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> > > > +       return;
> > > >
> > > > Why do this?  I mean why if logicalrep_check_updatable() doesn't care
> > > > if the relation is partitioned or not -- it does all the work
> > > > regardless.
> > > >
> > > > I suggest we don't add this check in check_relation_updatable().
> > >
> > > I think based on this suggestion patch has moved this check to
> > > logicalrep_rel_mark_updatable(). For a partitioned table, it won't
> > > even validate whether it can mark updatable as false which seems odd
> > > to me even though there might not be any bug due to that. Was your
> > > suggestion actually intended to move it to
> > > logicalrep_rel_mark_updatable?
> >
> > No, I didn't intend to suggest that we move this check to
> > logicalrep_rel_mark_updatable(); didn't notice that that's what the
> > latest patch did.
> >
> > What I said is that we shouldn't ignore the updatable flag for a
> > partitioned table in check_relation_updatable(), because
> > logicalrep_rel_mark_updatable() would have set the updatable flag
> > correctly even for partitioned tables.  IOW, we should not
> > special-case partitioned tables anywhere.
> >
> > I guess the point of adding the check is to allow the case where a
> > leaf partition's replica identity can be used to apply an update
> > originally targeting its ancestor that doesn't itself have one.
> >
> > I wonder if it wouldn't be better to move the
> > check_relation_updatable() call to
> > apply_handle_{update|delete}_internal()?  We know for sure that we
> > only ever get there for leaf tables.  If we do that, we won't need the
> > relkind check.
>
> I think this won't work for updates via apply_handle_tuple_routing()
> unless we call it from some other place(s) as well. It will do
> FindReplTupleInLocalRel() before doing update/delete for CMD_UPDATE
> case and will lead to assertion failure.

You're right.  I guess it's fine then to check the relkind in
check_relation_updatable() the way the original patch did, even though
it would've been nice if it didn't need to.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



RE: Replica Identity check of partition table on subscriber

From
"shiy.fnst@fujitsu.com"
Date:
On Thu, Jun 16, 2022 2:13 PM Amit Langote <amitlangote09@gmail.com> wrote:
> 
> Hi,
> 
> On Thu, Jun 16, 2022 at 2:07 PM shiy.fnst@fujitsu.com
> <shiy.fnst@fujitsu.com> wrote:
> > On Wed, Jun 15, 2022 8:30 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > > I have pushed the first bug-fix patch today.
> >
> > Attached the remaining patches which are rebased.
> 
> Thanks.
> 
> Comments on v9-0001:

Thanks for your comments.

> 
> + * Don't throw any error here just mark the relation entry as not updatable,
> + * as replica identity is only for updates and deletes but inserts can be
> + * replicated even without it.
> 
> I know you're simply copying the old comment, but I think we can
> rewrite it to be slightly more useful:
> 
> We just mark the relation entry as not updatable here if the local
> replica identity is found to be insufficient and leave it to
> check_relation_updatable() to throw the actual error if needed.
> 

Modified as you suggested in another mail [1].

> +   /* Check that replica identity matches. */
> +   logicalrep_rel_mark_updatable(entry);
> 
> Maybe the comment (there are 2 instances) should say:
> 
> Set if the table's replica identity is enough to apply update/delete.
> 

Modified as suggested.

> Finally,
> 
> +# Alter REPLICA IDENTITY on subscriber.
> +# No REPLICA IDENTITY in the partitioned table on subscriber, but what we
> check
> +# is the partition, so it works fine.
> 
> For consistency with other recently added comments, I'd suggest the
> following wording:
> 
> Test that replication works correctly as long as the leaf partition
> has the necessary REPLICA IDENTITY, even though the actual target
> partitioned table does not.
> 

Modified as suggested.

> On v9-0002:
> 
> +   /* cleanup the invalid attrmap */
> 
> It seems that "invalid" here really means no-longer-useful, so we
> should use that phrase as a nearby comment does:
> 
> Release the no-longer-useful attrmap, if any.
> 

Modified as suggested.

Attached is the new version of the patch set. I also moved the
partitioned-table check from logicalrep_rel_mark_updatable() to
check_relation_updatable(), as discussed in [2].

[1] https://www.postgresql.org/message-id/CA%2BHiwqG3Xi%3DwH4rBHm61ku-j0gm%2B-rc5VmDHxf%3DTeFkUsHtooA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA%2BHiwqHfN789ekiYVE%2B0xsLswMosMrWBwv4cPvYgWREWejw7HA%40mail.gmail.com

Regards,
Shi yu


Attachments

RE: Replica Identity check of partition table on subscriber

From
"shiy.fnst@fujitsu.com"
Date:
On Fri Jun 17, 2022 11:06 AM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote:
> 
> Attached the new version of patch set. I also moved the partitioned table
> check
> in logicalrep_rel_mark_updatable() to check_relation_updatable() as
> discussed
> [2].
> 

Attached are the back-branch versions of the first patch.

Regards,
Shi yu

Attachments

Re: Replica Identity check of partition table on subscriber

From
Amit Kapila
Date:
On Fri, Jun 17, 2022 at 11:22 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Fri Jun 17, 2022 11:06 AM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote:
> >
> > Attached the new version of patch set. I also moved the partitioned table
> > check
> > in logicalrep_rel_mark_updatable() to check_relation_updatable() as
> > discussed
> > [2].
> >
>
> Attached back-branch patches of the first patch.
>

One minor comment:
+ /*
+ * If it is a partitioned table, we don't check it, we will check its
+ * partition later.
+ */

Can we change the above comment to: "For partitioned tables, we only
need to care if the target partition is updatable (aka has PK or RI
defined for it)."?

Apart from this, the patch looks good to me. I'll push it tomorrow
unless there are any more suggestions/comments.

-- 
With Regards,
Amit Kapila.



RE: Replica Identity check of partition table on subscriber

From
"shiy.fnst@fujitsu.com"
Date:
On Mon, Jun 20, 2022 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Fri, Jun 17, 2022 at 11:22 AM shiy.fnst@fujitsu.com
> <shiy.fnst@fujitsu.com> wrote:
> >
> > On Fri Jun 17, 2022 11:06 AM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com>
> wrote:
> > >
> > > Attached the new version of patch set. I also moved the partitioned table
> > > check
> > > in logicalrep_rel_mark_updatable() to check_relation_updatable() as
> > > discussed
> > > [2].
> > >
> >
> > Attached back-branch patches of the first patch.
> >
> 
> One minor comment:
> + /*
> + * If it is a partitioned table, we don't check it, we will check its
> + * partition later.
> + */
> 
> Can we change the above comment to: "For partitioned tables, we only
> need to care if the target partition is updatable (aka has PK or RI
> defined for it)."?
> 

Thanks for your comment. Modified in the attached patches. 

Regards,
Shi yu

Attachments

Re: Replica Identity check of partition table on subscriber

From
Amit Langote
Date:
On Mon, Jun 20, 2022 at 3:46 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
> On Mon, Jun 20, 2022 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > One minor comment:
> > + /*
> > + * If it is a partitioned table, we don't check it, we will check its
> > + * partition later.
> > + */
> >
> > Can we change the above comment to: "For partitioned tables, we only
> > need to care if the target partition is updatable (aka has PK or RI
> > defined for it)."?
> >
> Thanks for your comment. Modified in the attached patches.

How about: ...target "leaf" partition is updatable

Regarding the commit message's top line, which is this:

    Fix partition table's RI checking on the subscriber.

I think it should spell out REPLICA IDENTITY explicitly, to avoid the
commit being read as having to do with "Referential Integrity"
checking.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Replica Identity check of partition table on subscriber

From
Amit Kapila
Date:
On Tue, Jun 21, 2022 at 7:49 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Mon, Jun 20, 2022 at 3:46 PM shiy.fnst@fujitsu.com
> <shiy.fnst@fujitsu.com> wrote:
> > On Mon, Jun 20, 2022 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > One minor comment:
> > > + /*
> > > + * If it is a partitioned table, we don't check it, we will check its
> > > + * partition later.
> > > + */
> > >
> > > Can we change the above comment to: "For partitioned tables, we only
> > > need to care if the target partition is updatable (aka has PK or RI
> > > defined for it)."?
> > >
> > Thanks for your comment. Modified in the attached patches.
>
> How about: ...target "leaf" partition is updatable
>

I am not sure that this is an improvement over the current wording.

> Regarding the commit message's top line, which is this:
>
>     Fix partition table's RI checking on the subscriber.
>
> I think it should spell out REPLICA IDENTITY explicitly to avoid the
> commit being confused to have to do with "Referential Integrity
> checking".
>

This makes sense. I'll take care of this.

-- 
With Regards,
Amit Kapila.



Re: Replica Identity check of partition table on subscriber

From
Amit Kapila
Date:
On Tue, Jun 21, 2022 at 8:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jun 21, 2022 at 7:49 AM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> >
> > I think it should spell out REPLICA IDENTITY explicitly to avoid the
> > commit being confused to have to do with "Referential Integrity
> > checking".
> >
>
> This makes sense. I'll take care of this.
>

After pushing this patch, buildfarm member prion has failed.
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prion&br=HEAD

It seems to me that the problem could be that the entry returned by
logicalrep_partition_open() may not have the correct value for
localrel when we find an existing entry whose localrelvalid is true.
The point is that before this commit we never used the localrel value
from the rel entry returned by logicalrep_partition_open(). I think we
need to always update the localrel value in
logicalrep_partition_open().

-- 
With Regards,
Amit Kapila.



RE: Replica Identity check of partition table on subscriber

From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, June 21, 2022 1:29 PM Amit Kapila <amit.kapila16@gmail.com>:
> 
> On Tue, Jun 21, 2022 at 8:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jun 21, 2022 at 7:49 AM Amit Langote <amitlangote09@gmail.com>
> wrote:
> > >
> > >
> > > I think it should spell out REPLICA IDENTITY explicitly to avoid the
> > > commit being confused to have to do with "Referential Integrity
> > > checking".
> > >
> >
> > This makes sense. I'll take care of this.
> >
> 
> After pushing this patch, buildfarm member prion has failed.
> https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prion&br=HEAD
> 
> It seems to me that the problem could be due to the reason that the
> entry returned by logicalrep_partition_open() may not have the correct
> value for localrel when we found the entry and localrelvalid is also
> true. The point is that before this commit we never use localrel value
> from the rel entry returned by logicalrep_partition_open. I think we
> need to always update the localrel value in
> logicalrep_partition_open().

Agreed.

I have confirmed that the failure is due to a segmentation violation
when accessing the cached relation. I reproduced it by building with the
-DRELCACHE_FORCE_RELEASE and -DCATCACHE_FORCE_RELEASE options, as hinted
by Tom.
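
For reference, I built with something like this (any equivalent way of
passing those CPPFLAGS should work):

./configure --enable-cassert CPPFLAGS='-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE'
make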

Stack:
#0 check_relation_updatable (rel=0x1cf4548) at worker.c:1745
#1 0x0000000000909cbb in apply_handle_tuple_routing (edata=0x1cbf4e8, remoteslot=0x1cbf908, newtup=0x0, operation=CMD_DELETE) at worker.c:2181
#2 0x00000000009097a5 in apply_handle_delete (s=0x7ffcef7fd730) at worker.c:2005
#3 0x000000000090a794 in apply_dispatch (s=0x7ffcef7fd730) at worker.c:2503
#4 0x000000000090ad43 in LogicalRepApplyLoop (last_received=22299920) at worker.c:2775
#5 0x000000000090c2ab in start_apply (origin_startpos=0) at worker.c:3549
#6 0x000000000090ca8d in ApplyWorkerMain (main_arg=0) at worker.c:3805
#7 0x00000000008c4c64 in StartBackgroundWorker () at bgworker.c:858
#8 0x00000000008ceaeb in do_start_bgworker (rw=0x1c3c6b0) at postmaster.c:5815
#9 0x00000000008cee97 in maybe_start_bgworkers () at postmaster.c:6039
#10 0x00000000008cdf4e in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5204
#11 <signal handler called>
#12 0x00007fd8fbe0d4ab in select () from /lib64/libc.so.6
#13 0x00000000008c9cfb in ServerLoop () at postmaster.c:1770
#14 0x00000000008c96e4 in PostmasterMain (argc=4, argv=0x1c110a0) at postmaster.c:1478
#15 0x00000000007c665b in main (argc=4, argv=0x1c110a0) at main.c:202
(gdb) p rel->localrel->rd_rel
$5 = (Form_pg_class) 0x7f7f7f7f7f7f7f7f

We didn't hit this problem before because we only accessed that relation
when we were about to report an error [1], after which the worker
restarts and the cache is rebuilt, so everything seemed OK.

The problem already existed; we hit it now because we started to access
the cached relation in more places.

I think we should update the relation every time, since the relation is
opened and closed by the caller, and here is a patch to do that.

[1]
    /*
     * We are in error mode so it's fine this is somewhat slow. It's better to
     * give user correct error.
     */
    if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))

Best regards,
Hou zj

Attachments

Re: Replica Identity check of partition table on subscriber

From
Amit Langote
Date:
On Tue, Jun 21, 2022 at 3:35 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> On Tuesday, June 21, 2022 1:29 PM Amit Kapila <amit.kapila16@gmail.com>:
> > After pushing this patch, buildfarm member prion has failed.
> > https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prion&br=HEAD
> >
> > It seems to me that the problem could be due to the reason that the
> > entry returned by logicalrep_partition_open() may not have the correct
> > value for localrel when we found the entry and localrelvalid is also
> > true. The point is that before this commit we never use localrel value
> > from the rel entry returned by logicalrep_partition_open. I think we
> > need to always update the localrel value in
> > logicalrep_partition_open().
>
> Agreed.
>
> And I have confirmed that the failure is due to the segmentation violation when
> access the cached relation. I reproduced this by using -DRELCACHE_FORCE_RELEASE
> -DCATCACHE_FORCE_RELEASE option which was hinted by Tom.
>
> Stack:
> #0 check_relation_updatable (rel=0x1cf4548) at worker.c:1745
> #1 0x0000000000909cbb in apply_handle_tuple_routing (edata=0x1cbf4e8, remoteslot=0x1cbf908, newtup=0x0, operation=CMD_DELETE) at worker.c:2181
> #2 0x00000000009097a5 in apply_handle_delete (s=0x7ffcef7fd730) at worker.c:2005
> #3 0x000000000090a794 in apply_dispatch (s=0x7ffcef7fd730) at worker.c:2503
> #4 0x000000000090ad43 in LogicalRepApplyLoop (last_received=22299920) at worker.c:2775
> #5 0x000000000090c2ab in start_apply (origin_startpos=0) at worker.c:3549
> #6 0x000000000090ca8d in ApplyWorkerMain (main_arg=0) at worker.c:3805
> #7 0x00000000008c4c64 in StartBackgroundWorker () at bgworker.c:858
> #8 0x00000000008ceaeb in do_start_bgworker (rw=0x1c3c6b0) at postmaster.c:5815
> #9 0x00000000008cee97 in maybe_start_bgworkers () at postmaster.c:6039
> #10 0x00000000008cdf4e in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5204
> #11 <signal handler called>
> #12 0x00007fd8fbe0d4ab in select () from /lib64/libc.so.6
> #13 0x00000000008c9cfb in ServerLoop () at postmaster.c:1770
> #14 0x00000000008c96e4 in PostmasterMain (argc=4, argv=0x1c110a0) at postmaster.c:1478
> #15 0x00000000007c665b in main (argc=4, argv=0x1c110a0) at main.c:202
> (gdb) p rel->localrel->rd_rel
> $5 = (Form_pg_class) 0x7f7f7f7f7f7f7f7f
>
> We didn't hit this problem because we only access that relation when we plan to
> report an error[1] and then the worker will restart and cache will be built, so
> everything seems OK.
>
> The problem seems already existed and we hit this because we started to access
> the cached relation in more places.
>
> I think we should try to update the relation every time as the relation is
> opened and closed by caller and here is the patch to do that.

Thanks for the patch.

I agree it's an old bug.  A partition map entry's localrel may hold a
stale Relation pointer, because once the caller has closed the
relation, the relcache subsystem is free to "clear" it, as in the case
of a RELCACHE_FORCE_RELEASE build.

Fixing it the way the patch does seems fine, though it feels like
localrelvalid will lose some of its meaning for the partition map
entries -- we will now overwrite localrel even if localrelvalid is
true.

+   /*
+    * Relation is opened and closed by caller, so we need to always update the
+    * partrel in case the cached relation was closed.
+    */
+   entry->localrel = partrel;
+
+   if (entry->localrelvalid)
        return entry;

Maybe we should add a comment here about why it's okay to overwrite
localrel even if localrelvalid is true.  How about the following hunk:

@@ -596,8 +596,20 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,

    entry = &part_entry->relmapentry;

+   /*
+    * We must always overwrite entry->localrel with the latest partition
+    * Relation pointer, because the Relation pointed to by the old value may
+    * have been cleared after the caller would have closed the partition
+    * relation after the last use of this entry.  Note that localrelvalid is
+    * only updated by the relcache invalidation callback, so it may still be
+    * true irrespective of whether the Relation pointed to by localrel has
+    * been cleared or not.
+    */
    if (found && entry->localrelvalid)
+   {
+       entry->localrel = partrel;
        return entry;
+   }

Attached is a patch containing the above, to consider as an alternative.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachments

RE: Replica Identity check of partition table on subscriber

From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, June 21, 2022 3:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
> 
> On Tue, Jun 21, 2022 at 3:35 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > On Tuesday, June 21, 2022 1:29 PM Amit Kapila <amit.kapila16@gmail.com>:
> > > After pushing this patch, buildfarm member prion has failed.
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prion&br=HEAD
> > >
> > > It seems to me that the problem could be due to the reason that the
> > > entry returned by logicalrep_partition_open() may not have the correct
> > > value for localrel when we found the entry and localrelvalid is also
> > > true. The point is that before this commit we never use localrel value
> > > from the rel entry returned by logicalrep_partition_open. I think we
> > > need to always update the localrel value in
> > > logicalrep_partition_open().
> >
> > Agreed.
> >
> > And I have confirmed that the failure is due to the segmentation violation when
> > access the cached relation. I reproduced this by using -DRELCACHE_FORCE_RELEASE
> > -DCATCACHE_FORCE_RELEASE option which was hinted by Tom.
> >
> > Stack:
> > #0 check_relation_updatable (rel=0x1cf4548) at worker.c:1745
> > #1 0x0000000000909cbb in apply_handle_tuple_routing (edata=0x1cbf4e8, remoteslot=0x1cbf908, newtup=0x0, operation=CMD_DELETE) at worker.c:2181
> > #2 0x00000000009097a5 in apply_handle_delete (s=0x7ffcef7fd730) at worker.c:2005
> > #3 0x000000000090a794 in apply_dispatch (s=0x7ffcef7fd730) at worker.c:2503
> > #4 0x000000000090ad43 in LogicalRepApplyLoop (last_received=22299920) at worker.c:2775
> > #5 0x000000000090c2ab in start_apply (origin_startpos=0) at worker.c:3549
> > #6 0x000000000090ca8d in ApplyWorkerMain (main_arg=0) at worker.c:3805
> > #7 0x00000000008c4c64 in StartBackgroundWorker () at bgworker.c:858
> > #8 0x00000000008ceaeb in do_start_bgworker (rw=0x1c3c6b0) at postmaster.c:5815
> > #9 0x00000000008cee97 in maybe_start_bgworkers () at postmaster.c:6039
> > #10 0x00000000008cdf4e in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5204
> > #11 <signal handler called>
> > #12 0x00007fd8fbe0d4ab in select () from /lib64/libc.so.6
> > #13 0x00000000008c9cfb in ServerLoop () at postmaster.c:1770
> > #14 0x00000000008c96e4 in PostmasterMain (argc=4, argv=0x1c110a0) at postmaster.c:1478
> > #15 0x00000000007c665b in main (argc=4, argv=0x1c110a0) at main.c:202
> > (gdb) p rel->localrel->rd_rel
> > $5 = (Form_pg_class) 0x7f7f7f7f7f7f7f7f
> >
> > We didn't hit this problem because we only access that relation when we plan to
> > report an error[1] and then the worker will restart and cache will be built, so
> > everything seems OK.
> >
> > The problem seems already existed and we hit this because we started to
> > access the cached relation in more places.
> >
> > I think we should try to update the relation every time as the relation is
> > opened and closed by caller and here is the patch to do that.
> Thanks for the patch.
> 
> I agree it's an old bug.  A partition map entry's localrel may point
> to a stale Relation pointer, because once the caller had closed the
> relation, the relcache subsystem is free to "clear" it, like in the
> case of a RELCACHE_FORCE_RELEASE build.

Hi,

Thanks for replying.

> Fixing it the way patch does seems fine, though it feels like
> localrelvalid will lose some of its meaning for the partition map
> entries -- we will now overwrite localrel even if localrelvalid is
> true.

To me, it seems localrelvalid doesn't mean that the cached relation
pointer is valid. In logicalrep_rel_open(), we also reopen and update the
relation even if localrelvalid is true.

> +   /*
> +    * Relation is opened and closed by caller, so we need to always update the
> +    * partrel in case the cached relation was closed.
> +    */
> +   entry->localrel = partrel;
> +
> +   if (entry->localrelvalid)
>         return entry;
> 
> Maybe we should add a comment here about why it's okay to overwrite
> localrel even if localrelvalid is true.  How about the following hunk:
> 
> @@ -596,8 +596,20 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
> 
>     entry = &part_entry->relmapentry;
> 
> +   /*
> +    * We must always overwrite entry->localrel with the latest partition
> +    * Relation pointer, because the Relation pointed to by the old value may
> +    * have been cleared after the caller would have closed the partition
> +    * relation after the last use of this entry.  Note that localrelvalid is
> +    * only updated by the relcache invalidation callback, so it may still be
> +    * true irrespective of whether the Relation pointed to by localrel has
> +    * been cleared or not.
> +    */
>     if (found && entry->localrelvalid)
> +   {
> +       entry->localrel = partrel;
>         return entry;
> +   }
> 
> Attached a patch containing the above to consider as an alternative.

This looks fine to me as well.

Best regards,
Hou zj

Re: Replica Identity check of partition table on subscriber

From
Amit Kapila
Date:
On Tue, Jun 21, 2022 at 12:50 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Tue, Jun 21, 2022 at 3:35 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
>
> Attached a patch containing the above to consider as an alternative.
>

Thanks, the patch looks good to me. I'll push this after doing some testing.

-- 
With Regards,
Amit Kapila.



RE: Replica Identity check of partition table on subscriber

From
"shiy.fnst@fujitsu.com"
Date:
On Tuesday, June 21, 2022 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Tue, Jun 21, 2022 at 12:50 PM Amit Langote <amitlangote09@gmail.com>
> wrote:
> >
> > On Tue, Jun 21, 2022 at 3:35 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> >
> > Attached a patch containing the above to consider as an alternative.
> >
> 
> Thanks, the patch looks good to me. I'll push this after doing some testing.

The patch looks good to me as well.

I also verified that the patch applies cleanly on the back-branches, and
I confirmed that the bug exists on the back-branches before this patch
and is fixed after applying it. The regression tests also passed both
with and without the RELCACHE_FORCE_RELEASE option on my machine.

Regards,
Shi yu

Re: Replica Identity check of partition table on subscriber

From
Amit Langote
Date:
On Tue, Jun 21, 2022 at 5:08 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> On Tuesday, June 21, 2022 3:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Thanks for the patch.
> >
> > I agree it's an old bug.  A partition map entry's localrel may point
> > to a stale Relation pointer, because once the caller had closed the
> > relation, the relcache subsystem is free to "clear" it, like in the
> > case of a RELCACHE_FORCE_RELEASE build.
>
> Hi,
>
> Thanks for replying.
>
> > Fixing it the way patch does seems fine, though it feels like
> > localrelvalid will lose some of its meaning for the partition map
> > entries -- we will now overwrite localrel even if localrelvalid is
> > true.
>
> To me, it seems localrelvalid doesn't have the meaning that the cached relation
> pointer is valid. In logicalrep_rel_open(), we also reopen and update the
> relation even if the localrelvalid is true.

Ah, right.  I guess only the localrelvalid=false case is really
interesting then.  Only in that case do we need to (re-)build the other
fields that are computed using localrel.  In the localrelvalid=true
case, we don't need to worry about the other fields, but we still need
to make sure that localrel points to an up-to-date relcache entry for
the relation.

> > +   /*
> > +    * Relation is opened and closed by caller, so we need to always update the
> > +    * partrel in case the cached relation was closed.
> > +    */
> > +   entry->localrel = partrel;
> > +
> > +   if (entry->localrelvalid)
> >         return entry;
> >
> > Maybe we should add a comment here about why it's okay to overwrite
> > localrel even if localrelvalid is true.  How about the following hunk:
> >
> > @@ -596,8 +596,20 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
> >
> >     entry = &part_entry->relmapentry;
> >
> > +   /*
> > +    * We must always overwrite entry->localrel with the latest partition
> > +    * Relation pointer, because the Relation pointed to by the old value may
> > +    * have been cleared after the caller would have closed the partition
> > +    * relation after the last use of this entry.  Note that localrelvalid is
> > +    * only updated by the relcache invalidation callback, so it may still be
> > +    * true irrespective of whether the Relation pointed to by localrel has
> > +    * been cleared or not.
> > +    */
> >     if (found && entry->localrelvalid)
> > +   {
> > +       entry->localrel = partrel;
> >         return entry;
> > +   }
> >
> > Attached a patch containing the above to consider as an alternative.
>
> This looks fine to me as well.

Thank you.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



RE: Replica Identity check of partition table on subscriber

From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, June 21, 2022 4:49 PM Amit Kapila <amit.kapila16@gmail.com>
> 
> On Tue, Jun 21, 2022 at 12:50 PM Amit Langote <amitlangote09@gmail.com>
> wrote:
> >
> > On Tue, Jun 21, 2022 at 3:35 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> >
> > Attached a patch containing the above to consider as an alternative.
> >
> 
> Thanks, the patch looks good to me. I'll push this after doing some testing.

Since that patch has been committed, attached is the last patch, which
fixes the memory leak.

The bug exists in PG10 through PG15 (HEAD).

For HEAD, PG14, and PG13, to fix the memory leak I think we should use
free_attrmap() instead of pfree(), and release the no-longer-useful
attrmap when rebuilding the map info.

For PG12, PG11, and PG10, we only need to add code to release the
no-longer-useful attrmap when rebuilding the map info. We can still use
pfree() there because the attrmap in those branches is a single array,
like:

entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));
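
The shape of that release is roughly as follows (a hedged sketch of the
idea, not the committed hunk; "entry" stands for the map entry being
rebuilt):

    /* Release the no-longer-useful attrmap, if any, before rebuilding. */
    if (entry->attrmap)
    {
        free_attrmap(entry->attrmap);   /* plain pfree() on PG12 and older */
        entry->attrmap = NULL;
    }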

Best regards,
Hou zj
 

Attachments

Re: Replica Identity check of partition table on subscriber

From
Amit Langote
Date:
Hi,

On Wed, Jun 22, 2022 at 12:02 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> Since the patch has been committed. Attach the last patch to fix the memory leak.
>
> The bug exists on PG10 ~ PG15(HEAD).
>
> For HEAD,PG14,PG13, to fix the memory leak, I think we should use
> free_attrmap instead of pfree and release the no-longer-useful attrmap
> When rebuilding the map info.
>
> For PG12,PG11,PG10, we only need to add the code to release the
> no-longer-useful attrmap when rebuilding the map info. We can still use
> pfree() because the attrmap in back-branch is a single array like:
>
> entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));

LGTM, thank you.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: Replica Identity check of partition table on subscriber

From
Amit Kapila
Date:
On Wed, Jun 22, 2022 at 10:09 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Wed, Jun 22, 2022 at 12:02 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > Since the patch has been committed. Attach the last patch to fix the memory leak.
> >
> > The bug exists on PG10 ~ PG15(HEAD).
> >
> > For HEAD,PG14,PG13, to fix the memory leak, I think we should use
> > free_attrmap instead of pfree and release the no-longer-useful attrmap
> > When rebuilding the map info.
> >
> > For PG12,PG11,PG10, we only need to add the code to release the
> > no-longer-useful attrmap when rebuilding the map info. We can still use
> > pfree() because the attrmap in back-branch is a single array like:
> >
> > entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));
>
> LGTM, thank you.
>

LGTM as well. One thing I am not completely sure about is whether to
make this change in PG10 for which the final release is in Nov?
AFAICS, the leak can only occur after the relcache invalidation on the
subscriber which may or may not be a very frequent case. What do you
guys think?

Personally, I feel it is good to fix it in all branches including PG10.

-- 
With Regards,
Amit Kapila.



Re: Replica Identity check of partition table on subscriber

From
Amit Langote
Date:
On Wed, Jun 22, 2022 at 8:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jun 22, 2022 at 10:09 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Wed, Jun 22, 2022 at 12:02 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > > Since the patch has been committed. Attach the last patch to fix the memory leak.
> > >
> > > The bug exists on PG10 ~ PG15(HEAD).
> > >
> > > For HEAD,PG14,PG13, to fix the memory leak, I think we should use
> > > free_attrmap instead of pfree and release the no-longer-useful attrmap
> > > When rebuilding the map info.
> > >
> > > For PG12,PG11,PG10, we only need to add the code to release the
> > > no-longer-useful attrmap when rebuilding the map info. We can still use
> > > pfree() because the attrmap in back-branch is a single array like:
> > >
> > > entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));
> >
> > LGTM, thank you.
>
> LGTM as well. One thing I am not completely sure about is whether to
> make this change in PG10 for which the final release is in Nov?
> AFAICS, the leak can only occur after the relcache invalidation on the
> subscriber which may or may not be a very frequent case. What do you
> guys think?

Agree that the leak does not seem very significant, though...

> Personally, I feel it is good to fix it in all branches including PG10.

...yes, why not.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



RE: Replica Identity check of partition table on subscriber

From
"houzj.fnst@fujitsu.com"
Date:
On Wednesday, June 22, 2022 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Wed, Jun 22, 2022 at 10:09 AM Amit Langote <amitlangote09@gmail.com>
> wrote:
> >
> > On Wed, Jun 22, 2022 at 12:02 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > > Since the patch has been committed. Attach the last patch to fix the memory leak.
> > >
> > > The bug exists on PG10 ~ PG15(HEAD).
> > >
> > > For HEAD,PG14,PG13, to fix the memory leak, I think we should use
> > > free_attrmap instead of pfree and release the no-longer-useful
> > > attrmap When rebuilding the map info.
> > >
> > > For PG12,PG11,PG10, we only need to add the code to release the
> > > no-longer-useful attrmap when rebuilding the map info. We can still
> > > use
> > > pfree() because the attrmap in back-branch is a single array like:
> > >
> > > entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));
> >
> > LGTM, thank you.
> >
> 
> LGTM as well. One thing I am not completely sure about is whether to make this
> change in PG10 for which the final release is in Nov?
> AFAICS, the leak can only occur after the relcache invalidation on the subscriber
> which may or may not be a very frequent case. What do you guys think?
> 
> Personally, I feel it is good to fix it in all branches including PG10.
+1

Best regards,
Hou zj

Re: Replica Identity check of partition table on subscriber

From
Amit Kapila
Date:
On Wed, Jun 22, 2022 at 5:05 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Wednesday, June 22, 2022 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jun 22, 2022 at 10:09 AM Amit Langote <amitlangote09@gmail.com>
> > wrote:
> > >
> > > On Wed, Jun 22, 2022 at 12:02 PM houzj.fnst@fujitsu.com
> > > <houzj.fnst@fujitsu.com> wrote:
> > > > Since the patch has been committed. Attach the last patch to fix the memory leak.
> > > >
> > > > The bug exists on PG10 ~ PG15(HEAD).
> > > >
> > > > For HEAD,PG14,PG13, to fix the memory leak, I think we should use
> > > > free_attrmap instead of pfree and release the no-longer-useful
> > > > attrmap When rebuilding the map info.
> > > >
> > > > For PG12,PG11,PG10, we only need to add the code to release the
> > > > no-longer-useful attrmap when rebuilding the map info. We can still
> > > > use
> > > > pfree() because the attrmap in back-branch is a single array like:
> > > >
> > > > entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));
> > >
> > > LGTM, thank you.
> > >
> >
> > LGTM as well. One thing I am not completely sure about is whether to make this
> > change in PG10 for which the final release is in Nov?
> > AFAICS, the leak can only occur after the relcache invalidation on the subscriber
> > which may or may not be a very frequent case. What do you guys think?
> >
> > Personally, I feel it is good to fix it in all branches including PG10.
> +1
>

Pushed!

-- 
With Regards,
Amit Kapila.