Re: row filtering for logical replication

Поиск
Список
Период
Сортировка
От Greg Nancarrow
Тема Re: row filtering for logical replication
Дата
Msg-id CAJcOf-dz0srExG0NPPgXh5X8eL2uxk7C=cZoGTbf8cNqoRUY6w@mail.gmail.com
обсуждение исходный текст
Ответ на RE: row filtering for logical replication  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Ответы Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Fri, Nov 26, 2021 at 12:40 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> When researching and writing a top-up patch about this.
> I found a possible issue which I'd like to confirm first.
>
> It's possible the table is published in two publications A and B, publication A
> only publish "insert" , publication B publish "update". When UPDATE, both row
> filter in A and B will be executed. Is this behavior expected?
>
> For example:
> ---- Publication
> create table tbl1 (a int primary key, b int);
> create publication A for table tbl1 where (b<2) with(publish='insert');
> create publication B for table tbl1 where (a>1) with(publish='update');
>
> ---- Subscription
> create table tbl1 (a int primary key);
> CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres host=localhost
> port=10000' PUBLICATION A,B;
>
> ---- Publication
> update tbl1 set a = 2;
>
> The publication can be created, and when UPDATE, the rowfilter in A (b<2) will
> also been executed but the column in it is not part of replica identity.
> (I am not against this behavior just confirm)
>

There seems to be problems related to allowing the row filter to
include columns that are not part of the replica identity (in the case
of publish=insert).
In your example scenario, the tbl1 WHERE clause "(b < 2)" for
publication A, that publishes inserts only, causes a problem, because
column "b" is not part of the replica identity.
To see this, follow the simple example below:
(and note, for the Subscription, the provided tbl1 definition has an
error, it should also include the 2nd column "b int", same as in the
publisher)

---- Publisher:
INSERT INTO tbl1 VALUES (1,1);
UPDATE tbl1 SET a = 2;

Prior to the UPDATE above:
On pub side, tbl1 contains (1,1).
On sub side, tbl1 contains (1,1)

After the above UPDATE:
On pub side, tbl1 contains (2,1).
On sub side, tbl1 contains (1,1), (2,1)

So the UPDATE on the pub side has resulted in an INSERT of (2,1) on
the sub side.

This is because when (1,1) is UPDATEd to (2,1), it attempts to use the
"insert" filter "(b<2)" to determine whether the old value had been
inserted (published to subscriber), but finds there is no "b" value
(because it only uses RI cols for UPDATE) and so has to assume the old
tuple doesn't exist on the subscriber, hence the UPDATE ends up doing
an INSERT.
INow if the use of RI cols were enforced for the insert filter case,
we'd properly know the answer as to whether the old row value had been
published and it would have correctly performed an UPDATE instead of
an INSERT in this case.
Thoughts?


Regards,
Greg Nancarrow
Fujitsu Australia



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Rationalizing declarations of src/common/ variables
Следующее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Synchronizing slots from primary to standby