Discussion: row filtering for logical replication


row filtering for logical replication

From:
Euler Taveira
Date:
Hi,

The attached patches add support for filtering rows in the publisher.
The output plugin will do the work if a filter was defined in the
CREATE PUBLICATION command. An optional WHERE clause can be added after
the table name in CREATE PUBLICATION, such as:

CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= 3000);

Rows that don't match the WHERE clause will not be sent to the subscribers.
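
For example, with the publication above (a hypothetical session, assuming
a departments table whose first column is id):

INSERT INTO departments (id) VALUES (1500), (2500);
-- id = 1500 does not satisfy the filter and is not replicated;
-- id = 2500 matches and is sent to the subscribers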

Patches 0001 and 0002 are only refactors and can be applied
independently. 0003 doesn't include row filtering on initial
synchronization.

Comments?


-- 
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training

Attachments

Re: row filtering for logical replication

From:
David Fetter
Date:
On Wed, Feb 28, 2018 at 08:03:02PM -0300, Euler Taveira wrote:
> Hi,
> 
> The attached patches add support for filtering rows in the publisher.
> The output plugin will do the work if a filter was defined in the
> CREATE PUBLICATION command. An optional WHERE clause can be added after
> the table name in CREATE PUBLICATION, such as:
> 
> CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= 3000);
> 
> Rows that don't match the WHERE clause will not be sent to the subscribers.
> 
> Patches 0001 and 0002 are only refactors and can be applied
> independently. 0003 doesn't include row filtering on initial
> synchronization.
> 
> Comments?

Great feature!  I think a lot of people will like to have the option
of trading a little extra CPU on the pub side for a bunch of network
traffic and some work on the sub side.

I noticed that the WHERE clause applies to all tables in the
publication.  Is that actually the right thing?  I'm thinking of a
case where we have foo(id, ...) and bar(foo_id, ....).  To slice that
correctly, we'd want to do the ids in the foo table and the foo_ids in
the bar table.  In the system as written, that would entail, at least
potentially, writing a lot of publications by hand.

Something like
    WHERE (
        (table_1,..., table_N) HAS (/* WHERE clause here */) AND
        (table_N+1,..., table_M) HAS (/* WHERE clause here */) AND
        ...
    )

could be one way to specify.

I also noticed that in psql, \dRp+ doesn't show the WHERE clause,
which it probably should.

Does it need regression tests?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: row filtering for logical replication

From:
Craig Ringer
Date:
On 1 March 2018 at 07:03, Euler Taveira <euler@timbira.com.br> wrote:
Hi,

The attached patches add support for filtering rows in the publisher.
The output plugin will do the work if a filter was defined in the
CREATE PUBLICATION command. An optional WHERE clause can be added after
the table name in CREATE PUBLICATION, such as:

CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= 3000);

Rows that don't match the WHERE clause will not be sent to the subscribers.

Patches 0001 and 0002 are only refactors and can be applied
independently. 0003 doesn't include row filtering on initial
synchronization.


Good idea. I haven't read this yet, but one thing to make sure you've handled is limiting the clause to referencing only the current tuple and the catalogs. User-catalog tables are OK too; anything that is RelationIsAccessibleInLogicalDecoding() is fine.

This means only immutable functions may be invoked, since a stable or volatile function might attempt to access a table. And views must be prohibited or recursively checked. (We have tree walkers that would help with this).

It might be worth looking at the current logic for CHECK expressions, since the requirements are similar. In my opinion you could safely not bother with allowing access to user catalog tables in the filter expressions and limit them strictly to immutable functions and the tuple itself.
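
To illustrate the distinction (a hypothetical sketch; the functions and
the regions table are mine, not from the patch):

-- safe: IMMUTABLE, depends only on its argument, so it can be evaluated
-- during decoding
CREATE FUNCTION above_threshold(p_id integer) RETURNS boolean
    IMMUTABLE LANGUAGE sql AS 'SELECT p_id > 2000';

-- unsafe: STABLE because it reads a user table, which is not accessible
-- from the historical snapshot used by the output plugin
CREATE FUNCTION in_active_region(p_id integer) RETURNS boolean
    STABLE LANGUAGE sql
    AS 'SELECT EXISTS (SELECT 1 FROM regions WHERE region_id = p_id)';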

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: row filtering for logical replication

From:
Erik Rijkers
Date:
On 2018-03-01 00:03, Euler Taveira wrote:
> The attached patches add support for filtering rows in the publisher.

> 0001-Refactor-function-create_estate_for_relation.patch
> 0002-Rename-a-WHERE-node.patch
> 0003-Row-filtering-for-logical-replication.patch

> Comments?

Very, very useful.  I really do hope this patch survives the 
late-arrival-cull.

I built this functionality into a test program I have been using and in 
simple cascading replication tests it works well.

I did find what I think is a bug (a bug easy to avoid but also easy to 
run into):
The test I used was to cascade 3 instances (all on one machine) from 
A->B->C
I ran a pgbench session in instance A, and used:
   in A: alter publication pub0_6515 add table pgbench_accounts where 
(aid between 40000 and 60000-1);
   in B: alter publication pub1_6516 add table pgbench_accounts;

The above worked well, but when I did the same with the filter in
both publications:
   in A: alter publication pub0_6515 add table pgbench_accounts where 
(aid between 40000 and 60000-1);
   in B: alter publication pub1_6516 add table pgbench_accounts where 
(aid between 40000 and 60000-1);

then the replication only worked for (pgbench-)scale 1 (hence: very 
little data); with larger scales it became slow (taking many minutes 
where the above had taken less than 1 minute), and ended up using far 
too much memory (or blowing up/crashing altogether).  Something not 
quite right there.

Nevertheless, I am much in favour of acquiring this functionality as 
soon as possible.


Thanks,


Erik Rijkers


Re: row filtering for logical replication

From:
Euler Taveira
Date:
2018-02-28 21:47 GMT-03:00 David Fetter <david@fetter.org>:
> I noticed that the WHERE clause applies to all tables in the
> publication.  Is that actually the right thing?  I'm thinking of a
> case where we have foo(id, ...) and bar(foo_id, ....).  To slice that
> correctly, we'd want to do the ids in the foo table and the foo_ids in
> the bar table.  In the system as written, that would entail, at least
> potentially, writing a lot of publications by hand.
>
I didn't make it clear in my previous email and I think you misread
the attached docs. Each table can have an optional WHERE clause. I'll
make it clear when I rewrite the tests. Something like:

CREATE PUBLICATION tap_pub FOR TABLE tab_rowfilter_1 WHERE (a > 1000
AND b <> 'filtered'), tab_rowfilter_2 WHERE (c % 2 = 0),
tab_rowfilter_3;

Such syntax will not block another future feature that will publish
only a few columns of the table.

> I also noticed that in psql, \dRp+ doesn't show the WHERE clause,
> which it probably should.
>
Yeah, it could be added, but I'm afraid of such long WHERE clauses.

> Does it need regression tests?
>
I included some tests just to demonstrate the feature but I'm planning
to add a separate test file for it.


-- 
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: row filtering for logical replication

From:
David Fetter
Date:
On Thu, Mar 01, 2018 at 12:41:04PM -0300, Euler Taveira wrote:
> 2018-02-28 21:47 GMT-03:00 David Fetter <david@fetter.org>:
> > I noticed that the WHERE clause applies to all tables in the
> > publication.  Is that actually the right thing?  I'm thinking of a
> > case where we have foo(id, ...) and bar(foo_id, ....).  To slice that
> > correctly, we'd want to do the ids in the foo table and the foo_ids in
> > the bar table.  In the system as written, that would entail, at least
> > potentially, writing a lot of publications by hand.
> >
> I didn't make it clear in my previous email and I think you misread
> the attached docs. Each table can have an optional WHERE clause. I'll
> make it clear when I rewrite the tests. Something like:

Sorry I misunderstood.

> CREATE PUBLICATION tap_pub FOR TABLE tab_rowfilter_1 WHERE (a > 1000
> AND b <> 'filtered'), tab_rowfilter_2 WHERE (c % 2 = 0),
> tab_rowfilter_3;

That's great!

> Such syntax will not block another future feature that will publish
> only a few columns of the table.
> 
> > I also noticed that in psql, \dRp+ doesn't show the WHERE clause,
> > which it probably should.
> >
> Yeah, it could be added, but I'm afraid of such long WHERE clauses.

I think of + as signifying, "I am ready to get a LOT of output in
order to see more detail."  Perhaps that's just me.

> > Does it need regression tests?
> >
> I included some tests just to demonstrate the feature but I'm
> planning to add a separate test file for it.

Excellent. This feature looks like a nice big chunk of the user-space
infrastructure needed for sharding, among other things.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: row filtering for logical replication

From:
Erik Rijkers
Date:
On 2018-03-01 16:27, Erik Rijkers wrote:
> On 2018-03-01 00:03, Euler Taveira wrote:
>> The attached patches add support for filtering rows in the publisher.
> 
>> 0001-Refactor-function-create_estate_for_relation.patch
>> 0002-Rename-a-WHERE-node.patch
>> 0003-Row-filtering-for-logical-replication.patch
> 
>> Comments?
> 
> Very, very useful.  I really do hope this patch survives the 
> late-arrival-cull.
> 
> I built this functionality into a test program I have been using and
> in simple cascading replication tests it works well.
> 
> I did find what I think is a bug (a bug easy to avoid but also easy to
> run into):
> The test I used was to cascade 3 instances (all on one machine) from 
> A->B->C
> I ran a pgbench session in instance A, and used:
>   in A: alter publication pub0_6515 add table pgbench_accounts where
> (aid between 40000 and 60000-1);
>   in B: alter publication pub1_6516 add table pgbench_accounts;
> 
> The above worked well, but when I did the same with the filter in
> both publications:
>   in A: alter publication pub0_6515 add table pgbench_accounts where
> (aid between 40000 and 60000-1);
>   in B: alter publication pub1_6516 add table pgbench_accounts where
> (aid between 40000 and 60000-1);
> 
> then the replication only worked for (pgbench-)scale 1 (hence: very
> little data); with larger scales it became slow (taking many minutes
> where the above had taken less than 1 minute), and ended up using far
> too much memory (or blowing up/crashing altogether).  Something not
> quite right there.
> 
> Nevertheless, I am much in favour of acquiring this functionality as
> soon as possible.


Attached is 'logrep_rowfilter.sh', a demonstration of the
above-described bug.

The program runs initdb for 3 instances in /tmp (using ports 6515, 6516, 
and 6517) and sets up logical replication from 1->2->3.

It can be made to work by removing the WHERE clause from the second
'create publication' (i.e., comment out the $where2 variable).


> Thanks,
> 
> 
> Erik Rijkers

Attachments

Re: row filtering for logical replication

From:
Andres Freund
Date:
Hi,

On 2018-03-01 16:27:11 +0100, Erik Rijkers wrote:
> Very, very useful.  I really do hope this patch survives the
> late-arrival-cull.

FWIW, I don't think it'd be fair or prudent. There are definitely some
issues (see e.g. Craig's reply), and I don't see why this patch'd
deserve an exemption from the "nontrivial patches shouldn't be submitted
to the last CF" policy?

- Andres


Re: row filtering for logical replication

From:
David Steele
Date:
Hi,

On 3/1/18 4:27 PM, Andres Freund wrote:
> On 2018-03-01 16:27:11 +0100, Erik Rijkers wrote:
>> Very, very useful.  I really do hope this patch survives the
>> late-arrival-cull.
> 
> FWIW, I don't think it'd be fair or prudent. There are definitely some
> issues (see e.g. Craig's reply), and I don't see why this patch'd
> deserve an exemption from the "nontrivial patches shouldn't be submitted
> to the last CF" policy?

I'm unable to find this in the CF under the title or author name.  If it
didn't get entered then it is definitely out.

If it does have an entry, then I agree with Andres that it should be
pushed to the next CF.

-- 
-David
david@pgmasters.net


Re: row filtering for logical replication

From:
Euler Taveira
Date:
2018-03-01 18:27 GMT-03:00 Andres Freund <andres@anarazel.de>:
> FWIW, I don't think it'd be fair or prudent. There are definitely some
> issues (see e.g. Craig's reply), and I don't see why this patch'd
> deserve an exemption from the "nontrivial patches shouldn't be submitted
> to the last CF" policy?
>
I forgot to mention that this feature is for v12. I know the rules, and
that is why I didn't add it to the in-progress CF.


-- 
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: row filtering for logical replication

From:
Euler Taveira
Date:
2018-03-01 18:25 GMT-03:00 Erik Rijkers <er@xs4all.nl>:
> Attached is 'logrep_rowfilter.sh', a demonstration of above-described bug.
>
Thanks for testing. I will figure out what is happening. There are
some leaks around. I'll post another version when I fix some of those
bugs.


-- 
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: row filtering for logical replication

From:
Euler Taveira
Date:
2018-02-28 21:54 GMT-03:00 Craig Ringer <craig@2ndquadrant.com>:
> Good idea. I haven't read this yet, but one thing to make sure you've
> handled is limiting the clause to referencing only the current tuple and the
> catalogs. User-catalog tables are OK too; anything that is
> RelationIsAccessibleInLogicalDecoding() is fine.
>
> This means only immutable functions may be invoked, since a stable or
> volatile function might attempt to access a table. And views must be
> prohibited or recursively checked. (We have tree walkers that would help
> with this).
>
> It might be worth looking at the current logic for CHECK expressions, since
> the requirements are similar. In my opinion you could safely not bother with
> allowing access to user catalog tables in the filter expressions and limit
> them strictly to immutable functions and the tuple itself.
>
IIRC the implementation is similar to RLS expressions. I'll check all of
these rules.


-- 
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: row filtering for logical replication

From:
David Steele
Date:
On 3/1/18 6:00 PM, Euler Taveira wrote:
> 2018-03-01 18:27 GMT-03:00 Andres Freund <andres@anarazel.de>:
>> FWIW, I don't think it'd be fair or prudent. There are definitely some
>> issues (see e.g. Craig's reply), and I don't see why this patch'd
>> deserve an exemption from the "nontrivial patches shouldn't be submitted
>> to the last CF" policy?
>>
> I forgot to mention that this feature is for v12. I know the rules, and
> that is why I didn't add it to the in-progress CF.

That was the right thing to do, thank you!

-- 
-David
david@pgmasters.net


Re: row filtering for logical replication

From:
Michael Paquier
Date:
On Thu, Mar 01, 2018 at 06:16:17PM -0500, David Steele wrote:
> That was the right thing to do, thank you!

This patch has been waiting on author for a couple of months and does
not apply anymore, so I am marking it as returned with feedback.  If you
can rebase, please feel free to resubmit.
--
Michael

Attachments

Re: row filtering for logical replication

From:
Euler Taveira
Date:
On Wed, Feb 28, 2018 at 20:03, Euler Taveira
<euler@timbira.com.br> wrote:
> The attached patches add support for filtering rows in the publisher.
>
I rebased the patch. I added row filtering for initial
synchronization, pg_dump support and psql support. 0001 removes unused
code. 0002 reduces memory use. 0003 passes only the structure member that
is used in create_estate_for_relation. 0004 reuses a parser node for
row filtering. 0005 is the feature. 0006 prints WHERE expression in
psql. 0007 adds pg_dump support. 0008 is only for debug purposes (I'm
not sure some of these messages will be part of the final patch).
0001, 0002, 0003 and 0008 are not mandatory for this feature.

Comments?


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training

Attachments

Re: row filtering for logical replication

From:
Erik Rijkers
Date:
On 2018-11-01 01:29, Euler Taveira wrote:
> On Wed, Feb 28, 2018 at 20:03, Euler Taveira
> <euler@timbira.com.br> wrote:
>> The attached patches add support for filtering rows in the publisher.
>> 

I ran pgbench over logical replication with a WHERE clause and could not
get it to replicate correctly.  Below is the output of the attached test
program.


$ ./logrep_rowfilter.sh
-- 
/home/aardvark/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin.fast/initdb 
--pgdata=/tmp/cascade/instance1/data --encoding=UTF8 --pwfile=/tmp/bugs
-- 
/home/aardvark/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin.fast/initdb 
--pgdata=/tmp/cascade/instance2/data --encoding=UTF8 --pwfile=/tmp/bugs
-- 
/home/aardvark/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin.fast/initdb 
--pgdata=/tmp/cascade/instance3/data --encoding=UTF8 --pwfile=/tmp/bugs
sleep 3s
dropping old tables...
creating tables...
generating data...
100000 of 100000 tuples (100%) done (elapsed 0.09 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.
create publication pub_6515_to_6516;
alter publication pub_6515_to_6516 add table pgbench_accounts where (aid 
between 40000 and 60000-1) ; --> where 1
alter publication pub_6515_to_6516 add table pgbench_branches;
alter publication pub_6515_to_6516 add table pgbench_tellers;
alter publication pub_6515_to_6516 add table pgbench_history;
create publication pub_6516_to_6517;
alter publication pub_6516_to_6517 add table pgbench_accounts ; -- where 
(aid between 40000 and 60000-1) ; --> where 2
alter publication pub_6516_to_6517 add table pgbench_branches;
alter publication pub_6516_to_6517 add table pgbench_tellers;
alter publication pub_6516_to_6517 add table pgbench_history;

create subscription pub_6516_from_6515 connection 'port=6515 
application_name=rowfilter'
        publication pub_6515_to_6516 with(enabled=false);
alter subscription pub_6516_from_6515 enable;
create subscription pub_6517_from_6516 connection 'port=6516 
application_name=rowfilter'
        publication pub_6516_to_6517 with(enabled=false);
alter subscription pub_6517_from_6516 enable;
-- pgbench -p 6515 -c 16 -j 8 -T 5 -n postgres    #  scale 1
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 16
number of threads: 8
duration: 5 s
number of transactions actually processed: 80
latency average = 1178.106 ms
tps = 13.581120 (including connections establishing)
tps = 13.597443 (excluding connections establishing)

        accounts  branches   tellers   history
        --------- --------- --------- ---------
6515   6546b1f0f 2d328ed28 7406473b0 7c1351523    e8c07347b
6516   6546b1f0f 2d328ed28 d41d8cd98 d41d8cd98    e7235f541
6517   f7c0791c8 d9c63e471 d41d8cd98 d41d8cd98    30892eea1   NOK

6515   6546b1f0f 2d328ed28 7406473b0 7c1351523    e8c07347b
6516   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5    191ae1af3
6517   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5    191ae1af3   NOK

6515   6546b1f0f 2d328ed28 7406473b0 7c1351523    e8c07347b
6516   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5    191ae1af3
6517   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5    191ae1af3   NOK

[...]

I let that run for 10 minutes or so, but the pgbench_history md5 values
(of ports 6516 and 6517) no longer change, which shows that the table is
and remains different from the original pgbench_history table in 6515.


When there is a WHERE clause this *always* goes wrong.

Without a WHERE clause all logical replication tests were OK.  Perhaps
the error is not in the patch but in logical replication itself.

Attached is the test program (it will need some tweaking of PATHs,
PG variables (PGPASSFILE), etc.).  This is the same program I used in
March when you first posted a version of this patch, although the error
is different.


thanks,


Erik Rijkers

Attachments

Re: row filtering for logical replication

From:
Erik Rijkers
Date:
On 2018-11-01 08:56, Erik Rijkers wrote:
> On 2018-11-01 01:29, Euler Taveira wrote:
>> On Wed, Feb 28, 2018 at 20:03, Euler Taveira
>> <euler@timbira.com.br> wrote:
>>> The attached patches add support for filtering rows in the publisher.
>>> 
> 
> I ran pgbench over logical replication with a WHERE clause and could
> not get it to replicate correctly.  Below is the output of the
> attached test program.
> 
> 
> $ ./logrep_rowfilter.sh

I have noticed that the failure to replicate correctly can be avoided by 
putting a wait state of (on my machine) at least 3 seconds between the 
setting up of the subscription and the start of pgbench.  See the bash 
program I attached in my previous mail.  The bug can be avoided by a 
'sleep 5' just before the start of the actual pgbench run.

So it seems this bug is due to some timing error in your patch (or 
possibly in logical replication itself).


Erik Rijkers




Re: row filtering for logical replication

From:
Euler Taveira
Date:
On Thu, Nov 1, 2018 at 05:30, Erik Rijkers <er@xs4all.nl> wrote:
> > I ran pgbench over logical replication with a WHERE clause and could
> > not get it to replicate correctly.  Below is the output of the
> > attached test program.
> >
> >
> > $ ./logrep_rowfilter.sh
>
Erik, thanks for testing.

> So it seems this bug is due to some timing error in your patch (or
> possibly in logical replication itself).
>
It is a bug in the new synchronization code. I'm doing some code
cleanup/review and will post a new patchset after I finish it. If you
want to give it a try again, apply the following patch.

diff --git a/src/backend/replication/logical/tablesync.c
b/src/backend/replication/logical/tablesync.c
index e0eb73c..4797e0b 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -757,7 +757,7 @@ fetch_remote_table_info(char *nspname, char *relname,

        /* Fetch row filtering info */
        resetStringInfo(&cmd);
-       appendStringInfo(&cmd, "SELECT pg_get_expr(prrowfilter, prrelid) FROM pg_publication p INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid) WHERE pr.prrelid = %u AND p.pubname IN (", MyLogicalRepWorker->relid);
+       appendStringInfo(&cmd, "SELECT pg_get_expr(prrowfilter, prrelid) FROM pg_publication p INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid) WHERE pr.prrelid = %u AND p.pubname IN (", lrel->remoteid);


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: row filtering for logical replication

From:
Erik Rijkers
Date:
On 2018-11-02 02:59, Euler Taveira wrote:
> On Thu, Nov 1, 2018 at 05:30, Erik Rijkers <er@xs4all.nl>
> wrote:
>> > I ran pgbench over logical replication with a WHERE clause and could
>> > not get it to replicate correctly.  Below is the output of the
>> > attached test program.
>> >
>> >
>> > $ ./logrep_rowfilter.sh
>> 
> Erik, thanks for testing.
> 
>> So it seems this bug is due to some timing error in your patch (or
>> possibly in logical replication itself).
>> 
> It is a bug in the new synchronization code. I'm doing some code
> cleanup/review and will post a new patchset after I finish it. If you
> want to give it a try again, apply the following patch.
> 
> diff --git a/src/backend/replication/logical/tablesync.c
> b/src/backend/replication/logical/tablesync.c
> index e0eb73c..4797e0b 100644
> --- a/src/backend/replication/logical/tablesync.c
> +++ b/src/backend/replication/logical/tablesync.c
> [...]


That does indeed fix it.

Thank you,

Erik Rijkers



Re: row filtering for logical replication

From:
Hironobu SUZUKI
Date:
On 2018/11/01 0:29, Euler Taveira wrote:
> On Wed, Feb 28, 2018 at 20:03, Euler Taveira
> <euler@timbira.com.br> wrote:
>> The attached patches add support for filtering rows in the publisher.
>>
> I rebased the patch. I added row filtering for initial
> synchronization, pg_dump support and psql support. 0001 removes unused
> code. 0002 reduces memory use. 0003 passes only the structure member that
> is used in create_estate_for_relation. 0004 reuses a parser node for
> row filtering. 0005 is the feature. 0006 prints WHERE expression in
> psql. 0007 adds pg_dump support. 0008 is only for debug purposes (I'm
> not sure some of these messages will be part of the final patch).
> 0001, 0002, 0003 and 0008 are not mandatory for this feature.
> 
> Comments?
> 
> 

Hi,

I reviewed your patches and found a bug when I tested the ALTER
PUBLICATION statement.

In short, ALTER PUBLICATION SET with a WHERE clause does not apply the
new WHERE clause.

Below is an outline of the test I did and my conclusion.

[TEST]
The test case I tried is shown below.

(1)Publisher and Subscriber

I executed each statement on the publisher and the subscriber.

```
testdb=# CREATE PUBLICATION pub_testdb_t FOR TABLE t WHERE (id > 10);
CREATE PUBLICATION
```

```
testdb=# CREATE SUBSCRIPTION sub_testdb_t CONNECTION 'dbname=testdb 
port=5432 user=postgres' PUBLICATION pub_testdb_t;
NOTICE:  created replication slot "sub_testdb_t" on publisher
CREATE SUBSCRIPTION
```

(2)Publisher

I executed these statements shown below.

testdb=# INSERT INTO t VALUES (1,1);
INSERT 0 1
testdb=# INSERT INTO t VALUES (11,11);
INSERT 0 1

(3)Subscriber

I confirmed that the CREATE PUBLICATION statement worked well.

```
testdb=# SELECT * FROM t;
  id | data
----+------
  11 |   11
(1 row)
```

(4)Publisher
After that, I executed ALTER PUBLICATION with a WHERE clause and 
inserted a new row.

```
testdb=# ALTER  PUBLICATION pub_testdb_t SET TABLE t WHERE (id > 5);
ALTER PUBLICATION

testdb=# INSERT INTO t VALUES (7,7);
INSERT 0 1

testdb=# SELECT * FROM t;
  id | data
----+------
   1 |    1
  11 |   11
   7 |    7
(3 rows)
```

(5)Subscriber
I confirmed that the WHERE clause set by the ALTER PUBLICATION
statement was ignored.

```
testdb=# SELECT * FROM t;
  id | data
----+------
  11 |   11
(1 row)
```

[Conclusion]
I think AlterPublicationTables()@publicationcmds.c has a bug.

In the foreach(oldlc, oldrelids) loop, oldrel must be appended to
delrels if oldrel or newrel has a WHERE clause. However, the current
implementation does not do this; therefore, the old WHERE clause is not
deleted and the new WHERE clause is ignored.

This is my speculation. It may not be correct, but, at least, it is a
fact that ALTER PUBLICATION with a WHERE clause does not work in my
environment with the steps described above.
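
(A possible workaround until this is fixed, untested: drop and re-add
the table so that the new clause goes through the ADD path.)

ALTER PUBLICATION pub_testdb_t DROP TABLE t;
ALTER PUBLICATION pub_testdb_t ADD TABLE t WHERE (id > 5);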


Best regards,


Re: row filtering for logical replication

From:
Petr Jelinek
Date:
On 01/11/2018 01:29, Euler Taveira wrote:
> On Wed, Feb 28, 2018 at 20:03, Euler Taveira
> <euler@timbira.com.br> wrote:
>> The attached patches add support for filtering rows in the publisher.
>>
> I rebased the patch. I added row filtering for initial
> synchronization, pg_dump support and psql support. 0001 removes unused
> code. 0002 reduces memory use. 0003 passes only the structure member that
> is used in create_estate_for_relation. 0004 reuses a parser node for
> row filtering. 0005 is the feature. 0006 prints WHERE expression in
> psql. 0007 adds pg_dump support. 0008 is only for debug purposes (I'm
> not sure some of these messages will be part of the final patch).
> 0001, 0002, 0003 and 0008 are not mandatory for this feature.
> 
> Comments?
> 

Hi,

I think there are two main topics that still need to be discussed about
this patch.

Firstly, I am not sure if it's wise to allow UDFs in the filter clause
for the table. The reason for that is that we can't record all necessary
dependencies there, because the functions are a black box for the
parser. That means if somebody drops an object that a UDF used in a
replication filter depends on, that function will start failing. But
unlike for user sessions it will start failing during decoding (well,
processing in the output plugin). And that's not recoverable by
recreating the missing object; the only way to get out of that is either
to move the slot forward, which means losing part of the replication
stream and needing a manual resync, or a full rebuild of replication.
Neither of which is good IMHO.

Secondly, do we want to at least notify the user about filters (or maybe
even disallow them) that combine an action with a column whose value
will not be logged? I mean, for example, we do this when processing the
filter against a row:

> +        ExecStoreHeapTuple(new_tuple ? new_tuple : old_tuple, ecxt->ecxt_scantuple, false);

But if the user has an expression on a column which is not part of the
replica identity, that expression will always return NULL for DELETEs,
because only the replica identity is logged with actual values and
everything else is NULL in old_tuple. So if the publication replicates
deletes we should check for this somehow.
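
A sketch of the problematic case (a hypothetical table; with the default
replica identity only the primary key is logged for DELETE):

CREATE TABLE mytab (id integer PRIMARY KEY, data text);
-- suppose the publication uses: ADD TABLE mytab WHERE (data <> 'skip');
-- on DELETE only id is present in old_tuple and data is NULL, so the
-- expression cannot evaluate to true and the DELETE would be filtered out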

Btw, about the code (you already fixed the wrong reloid in sync so I'm
skipping that).

0002:
> +    for (tupn = 0; tupn < walres->ntuples; tupn++)
>      {
> -        char       *cstrs[MaxTupleAttributeNumber];
> +        char    **cstrs;
>  
>          CHECK_FOR_INTERRUPTS();
>  
>          /* Do the allocations in temporary context. */
>          oldcontext = MemoryContextSwitchTo(rowcontext);
>  
> +        cstrs = palloc(nfields * sizeof(char *));

Not really sure that this is actually worth it given that we have to
allocate and free this in a loop now while before it was just sitting on
a stack.

0005:
> @@ -654,5 +740,10 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
>       */
>      hash_seq_init(&status, RelationSyncCache);
>      while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
> +    {
>          entry->replicate_valid = false;
> +        if (list_length(entry->row_filter) > 0)
> +            list_free(entry->row_filter);
> +        entry->row_filter = NIL;
> +    }

Won't this leak memory? The list_free only frees the list cells, but not
the nodes you stored there before.

Also, I think we should document here that the expression is run with
the session environment of the replication connection (so that it's more
obvious that things like CURRENT_USER will not return the user which
changed the tuple but the replication user).

It would be nice if 0006 had a regression test and 0007 a TAP test.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: row filtering for logical replication

From:
Stephen Frost
Date:
Greetings,

* Euler Taveira (euler@timbira.com.br) wrote:
> 2018-02-28 21:54 GMT-03:00 Craig Ringer <craig@2ndquadrant.com>:
> > Good idea. I haven't read this yet, but one thing to make sure you've
> > handled is limiting the clause to referencing only the current tuple and the
> > catalogs. User-catalog tables are OK too; anything that is
> > RelationIsAccessibleInLogicalDecoding() is fine.
> >
> > This means only immutable functions may be invoked, since a stable or
> > volatile function might attempt to access a table. And views must be
> > prohibited or recursively checked. (We have tree walkers that would help
> > with this).
> >
> > It might be worth looking at the current logic for CHECK expressions, since
> > the requirements are similar. In my opinion you could safely not bother with
> > allowing access to user catalog tables in the filter expressions and limit
> > them strictly to immutable functions and the tuple itself.
>
> IIRC the implementation is similar to RLS expressions. I'll check all of
> these rules.

Given the similarity to RLS and the nearby discussion about allowing
non-superusers to create subscriptions, and probably publications later,
I wonder if we shouldn't be somehow associating this with RLS policies
instead of having the publication filtering be entirely independent...
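
For illustration, the filter from the first message would roughly
correspond to a policy like this (a hypothetical sketch, not something
the patch creates):

CREATE POLICY replicate_departments ON departments
    FOR SELECT USING (id > 2000 AND id <= 3000);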

Thanks!

Stephen

Attachments

Re: row filtering for logical replication

From:
Petr Jelinek
Date:
On 23/11/2018 03:02, Stephen Frost wrote:
> Greetings,
> 
> * Euler Taveira (euler@timbira.com.br) wrote:
>> 2018-02-28 21:54 GMT-03:00 Craig Ringer <craig@2ndquadrant.com>:
>>> Good idea. I haven't read this yet, but one thing to make sure you've
>>> handled is limiting the clause to referencing only the current tuple and the
>>> catalogs. User-catalog tables are OK too; anything that is
>>> RelationIsAccessibleInLogicalDecoding() is fine.
>>>
>>> This means only immutable functions may be invoked, since a stable or
>>> volatile function might attempt to access a table. And views must be
>>> prohibited or recursively checked. (We have tree walkers that would help
>>> with this).
>>>
>>> It might be worth looking at the current logic for CHECK expressions, since
>>> the requirements are similar. In my opinion you could safely not bother with
>>> allowing access to user catalog tables in the filter expressions and limit
>>> them strictly to immutable functions and the tuple itself.
>>
>> IIRC the implementation is similar to RLS expressions. I'll check all of
>> these rules.
> 
> Given the similarity to RLS and the nearby discussion about allowing
> non-superusers to create subscriptions, and probably publications later,
> I wonder if we shouldn't be somehow associating this with RLS policies
> instead of having the publication filtering be entirely independent...
> 
I do see the appeal here; if you consider logical replication to be a
streaming SELECT, it probably applies well.

But given that this is happening inside the output plugin, which does
not have a full executor setup and has a catalog-only snapshot, I am not
sure how feasible it is to try to merge these two things. As per my
previous email, it's possible that we'll have to be stricter about what
we allow in expressions here.

The other issue with merging this is that the use case for filtering out
the data in logical replication is not necessarily about security, but
often about sending only relevant data. So it makes sense to have a
filter on a publication without RLS enabled on the table, and if we
forced that, we'd limit the usefulness of this feature.

We definitely want to eventually create subscriptions as non-superuser,
but that has zero effect on this, as everything here is happening on a
different server than where the subscription lives (we already allow
creation of publications with just CREATE privilege on the database and
ownership of the table).

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: row filtering for logical replication

From:
Euler Taveira
Date:
On Fri, Nov 23, 2018 at 11:40, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> But given that this is happening inside the output plugin, which does
> not have a full executor setup and has a catalog-only snapshot, I am not
> sure how feasible it is to try to merge these two things. As per my
> previous email, it's possible that we'll have to be stricter about what
> we allow in expressions here.
>
This feature should be as simple as possible. I don't want to
introduce a huge overhead just for filtering some data. Data sharding
generally uses simple expressions.

> The other issue with merging this is that the use case for filtering out
> the data in logical replication is not necessarily about security, but
> often about sending only relevant data. So it makes sense to have a
> filter on a publication without RLS enabled on the table, and if we
> forced that, we'd limit the usefulness of this feature.
>
Using the same infrastructure as RLS could be a good idea, but using RLS
for row filtering is not. RLS is complex.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: row filtering for logical replication

From:
Euler Taveira
Date:
On Thu, Nov 22, 2018 at 20:03, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> Firstly, I am not sure if it's wise to allow UDFs in the filter clause
> for the table. The reason for that is that we can't record all necessary
> dependencies there, because the functions are a black box for the
> parser. That means if somebody drops an object that a UDF used in a
> replication filter depends on, that function will start failing. But
> unlike for user sessions it will start failing during decoding (well,
> processing in the output plugin). And that's not recoverable by
> recreating the missing object; the only way to get out of that is either
> to move the slot forward, which means losing part of the replication
> stream and needing a manual resync, or a full rebuild of replication.
> Neither of which is good IMHO.
>
It is a foot gun, but there are several ways to do bad things in
Postgres. CREATE PUBLICATION is restricted to superusers and roles with
CREATE privilege in the current database. AFAICS a role with CREATE
privilege cannot drop objects it does not own. I wouldn't like to
disallow UDFs in row filtering expressions just because someone doesn't
set permissions correctly. Do you have any other case in mind?

> Secondly, do we want to at least notify the user about filters (or maybe
> even disallow them) that combine an action with a column whose value
> will not be logged? I mean, for example, we do this when processing the
> filter against a row:
>
> > +             ExecStoreHeapTuple(new_tuple ? new_tuple : old_tuple, ecxt->ecxt_scantuple, false);
>
We could emit a LOG message. That could possibly be an option but it
could be too complex for the first version.

> But if the user has an expression on a column which is not part of the
> replica identity, that expression will always return NULL for DELETEs,
> because only the replica identity is logged with actual values and
> everything else is NULL in old_tuple. So if the publication replicates
> deletes we should check for this somehow.
>
In this case, we should document this behavior. That is a recurring
question in wal2json issues. Besides that, we should explain that
UPDATE/DELETE does not log all columns (people think the behavior is
equivalent to triggers; it is not, unless you set REPLICA IDENTITY
FULL).
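
For example (assuming table t from the report upthread):

ALTER TABLE t REPLICA IDENTITY FULL;
-- now UPDATE/DELETE log the whole old row, so a row filter can see all
-- column values instead of just the primary key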

> Not really sure that this is actually worth it given that we have to
> allocate and free this in a loop now while before it was just sitting on
> a stack.
>
That is experimental code that should be in a separate patch. Don't you
think low memory use is a good goal? I also think that
MaxTupleAttributeNumber is an extreme value. I did some preliminary
tests and didn't notice overhead. I'll leave these modifications in a
separate patch.

> Won't this leak memory? The list_free only frees the list cells, but not
> the nodes you stored there before.
>
Good catch. It should be list_free_deep.

> Also, I think we should document here that the expression is run with
> the session environment of the replication connection (so that it's more
> obvious that things like CURRENT_USER will not return the user which
> changed the tuple but the replication user).
>
Sure.

> It would be nice if 0006 had a regression test and 0007 a TAP test.
>
Sure.

Besides the problem reported by Hironobu-san, I'm doing some cleanup and
improving the docs. I also forgot to declare the pg_publication_rel
TOAST table.

Thanks for your review.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: row filtering for logical replication

From:
Alvaro Herrera
Date:
On 2018-Nov-23, Euler Taveira wrote:

> On Thu, Nov 22, 2018 at 20:03, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:

> > Won't this leak memory? The list_free only frees the list cells, but not
> > the nodes you stored there before.
>
> Good catch. It should be list_free_deep.

Actually, if the nodes have more structure (say you palloc one list
item, but that list item also contains pointers to a Node) then a
list_free_deep won't be enough either.  I'd suggest creating a bespoke
memory context, which you can delete afterwards.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: row filtering for logical replication

From:
David Fetter
Date:
On Fri, Nov 23, 2018 at 12:03:27AM +0100, Petr Jelinek wrote:
> On 01/11/2018 01:29, Euler Taveira wrote:
> > On Wed, Feb 28, 2018 at 20:03, Euler Taveira
> > <euler@timbira.com.br> wrote:
> >> The attached patches add support for filtering rows in the publisher.
> >>
> > I rebased the patch. I added row filtering for initial
> > synchronization, pg_dump support and psql support. 0001 removes unused
> > code. 0002 reduces memory use. 0003 passes only the structure member that
> > is used in create_estate_for_relation. 0004 reuses a parser node for
> > row filtering. 0005 is the feature. 0006 prints WHERE expression in
> > psql. 0007 adds pg_dump support. 0008 is only for debug purposes (I'm
> > not sure some of these messages will be part of the final patch).
> > 0001, 0002, 0003 and 0008 are not mandatory for this feature.
> 
> Hi,
> 
> I think there are two main topics that still need to be discussed about
> this patch.
> 
> Firstly, I am not sure if it's wise to allow UDFs in the filter clause
> for the table. The reason for that is that we can't record all necessary
> dependencies there, because the functions are a black box for the parser.

Some UDFs are not a black box for the parser, namely ones written in
SQL. Would it make sense at least not to foreclose the non-(black box)
option?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: row filtering for logical replication

From:
Petr Jelinek
Date:
On 23/11/2018 17:39, David Fetter wrote:
> On Fri, Nov 23, 2018 at 12:03:27AM +0100, Petr Jelinek wrote:
>> On 01/11/2018 01:29, Euler Taveira wrote:
>>> On Wed, Feb 28, 2018 at 20:03, Euler Taveira
>>> <euler@timbira.com.br> wrote:
>>>> The attached patches add support for filtering rows in the publisher.
>>>>
>>> I rebased the patch. I added row filtering for initial
>>> synchronization, pg_dump support and psql support. 0001 removes unused
>>> code. 0002 reduces memory use. 0003 passes only the structure member that
>>> is used in create_estate_for_relation. 0004 reuses a parser node for
>>> row filtering. 0005 is the feature. 0006 prints WHERE expression in
>>> psql. 0007 adds pg_dump support. 0008 is only for debug purposes (I'm
>>> not sure some of these messages will be part of the final patch).
>>> 0001, 0002, 0003 and 0008 are not mandatory for this feature.
>>
>> Hi,
>>
>> I think there are two main topics that still need to be discussed about
>> this patch.
>>
>> Firstly, I am not sure if it's wise to allow UDFs in the filter clause
>> for the table. The reason for that is that we can't record all necessary
>> dependencies there, because the functions are a black box for the parser.
> 
> Some UDFs are not a black box for the parser, namely ones written in
> SQL. Would it make sense at least not to foreclose the non-(black box)
> option?
> 

Yeah, inlinable SQL functions should be fine; we just need the ability to
extract dependencies.
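
A sketch of such an inlinable function (hypothetical, reusing Erik's
pgbench filter from upthread):

-- plain SQL body: the planner can inline it into the filter expression,
-- so its dependencies can be extracted, unlike a plpgsql black box
CREATE FUNCTION aid_in_range(aid integer) RETURNS boolean
    IMMUTABLE LANGUAGE sql AS 'SELECT aid BETWEEN 40000 AND 59999';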

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: row filtering for logical replication

From:
Petr Jelinek
Date:
On 23/11/2018 17:15, Euler Taveira wrote:
> On Thu, Nov 22, 2018 at 20:03, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>> Firstly, I am not sure if it's wise to allow UDFs in the filter clause
>> for the table. The reason for that is that we can't record all necessary
>> dependencies there, because the functions are a black box for the
>> parser. That means if somebody drops an object that a UDF used in a
>> replication filter depends on, that function will start failing. But
>> unlike for user sessions it will start failing during decoding (well,
>> processing in the output plugin). And that's not recoverable by
>> recreating the missing object; the only way to get out of that is either
>> to move the slot forward, which means losing part of the replication
>> stream and needing a manual resync, or a full rebuild of replication.
>> Neither of which is good IMHO.
>>
> It is a foot gun, but there are several ways to do bad things in
> Postgres. CREATE PUBLICATION is restricted to superusers and roles with
> CREATE privilege in the current database. AFAICS a role with CREATE
> privilege cannot drop objects it does not own. I wouldn't like to
> disallow UDFs in row filtering expressions just because someone doesn't
> set permissions correctly. Do you have any other case in mind?

I don't think this has anything to do with security. Stupid example:

user1: CREATE EXTENSION citext;

user2: CREATE FUNCTION myfilter(col1 text, col2 text) returns boolean
language plpgsql as
$$BEGIN
RETURN col1::citext = col2::citext;
END;$$;

user2: ALTER PUBLICATION mypub ADD TABLE mytab WHERE (myfilter(a,b));

[... replication happening ...]

user1: DROP EXTENSION citext;

And now replication is broken and unrecoverable without data loss.
Recreating the extension will not help because the changes that happened
in the meantime will not see it in the historical snapshot.

I don't think it's okay to do nothing at all about this.

> 
>> Secondly, do we want to at least notify the user about filters (or maybe
>> even disallow them) that combine an action with a column whose value
>> will not be logged? I mean, for example, we do this when processing the
>> filter against a row:
>>
>>> +             ExecStoreHeapTuple(new_tuple ? new_tuple : old_tuple, ecxt->ecxt_scantuple, false);
>>
> We could emit a LOG message. That could possibly be an option but it
> could be too complex for the first version.
>

Well, it needs a walker which extracts Vars from the expression and
checks them against the replica identity columns. We already have a way
to fetch the replica identity columns, and the walker could be something
like a simplified version of the find_expr_references_walker used by
recordDependencyOnSingleRelExpr (I don't think there is anything
ready-made already).

>> But if the user has an expression on a column which is not part of the
>> replica identity, that expression will always return NULL for DELETEs,
>> because only the replica identity is logged with actual values and
>> everything else is NULL in old_tuple. So if the publication replicates
>> deletes we should check for this somehow.
>>
> In this case, we should document this behavior. That is a recurring
> question in wal2json issues. Besides that, we should explain that
> UPDATE/DELETE does not log all columns (people think the behavior is
> equivalent to triggers; it is not, unless you set REPLICA IDENTITY
> FULL).
> 
>> Not really sure that this is actually worth it given that we have to
>> allocate and free this in a loop now while before it was just sitting on
>> a stack.
>>
> That is experimental code that should be in a separate patch. Don't you
> think low memory use is a good goal? I also think that
> MaxTupleAttributeNumber is an extreme value. I did some preliminary
> tests and didn't notice overhead. I'll leave these modifications in a
> separate patch.
> 

It's static memory, and it's a few KB of it (it's just a single array of
pointers, not an array of data, and it does not depend on the number of
rows). Palloc will definitely need more CPU cycles.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: row filtering for logical replication

From:
Fabrízio de Royes Mello
Date:
On Fri, Nov 23, 2018 at 3:55 PM Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>
> On 23/11/2018 17:15, Euler Taveira wrote:
> > On Thu, Nov 22, 2018 at 20:03, Petr Jelinek
> > <petr.jelinek@2ndquadrant.com> wrote:
> >> Firstly, I am not sure if it's wise to allow UDFs in the filter clause
> >> for the table. The reason for that is that we can't record all necessary
> >> dependencies there, because the functions are a black box for the
> >> parser. That means if somebody drops an object that a UDF used in a
> >> replication filter depends on, that function will start failing. But
> >> unlike for user sessions it will start failing during decoding (well,
> >> processing in the output plugin). And that's not recoverable by
> >> recreating the missing object; the only way to get out of that is either
> >> to move the slot forward, which means losing part of the replication
> >> stream and needing a manual resync, or a full rebuild of replication.
> >> Neither of which is good IMHO.
> >>
> > It is a foot gun, but there are several ways to do bad things in
> > Postgres. CREATE PUBLICATION is restricted to superusers and roles with
> > CREATE privilege in the current database. AFAICS a role with CREATE
> > privilege cannot drop objects it does not own. I wouldn't like to
> > disallow UDFs in row filtering expressions just because someone doesn't
> > set permissions correctly. Do you have any other case in mind?
>
> I don't think this has anything to do with security. Stupid example:
>
> user1: CREATE EXTENSION citext;
>
> user2: CREATE FUNCTION myfilter(col1 text, col2 text) returns boolean
> language plpgsql as
> $$BEGIN
> RETURN col1::citext = col2::citext;
> END;$$;
>
> user2: ALTER PUBLICATION mypub ADD TABLE mytab WHERE (myfilter(a,b));
>
> [... replication happening ...]
>
> user1: DROP EXTENSION citext;
>
> And now replication is broken and unrecoverable without data loss.
> Recreating the extension will not help because the changes that happened
> in the meantime will not see it in the historical snapshot.
>
> I don't think it's okay to do nothing at all about this.
>

If carefully documented I see no problem with it... we already have an analogous problem with functional indexes.

Regards,

--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training

Re: row filtering for logical replication

From:
Petr Jelinek
Date:
On 23/11/2018 19:05, Fabrízio de Royes Mello wrote:
> On Fri, Nov 23, 2018 at 3:55 PM Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>>
>> On 23/11/2018 17:15, Euler Taveira wrote:
>> > On Thu, Nov 22, 2018 at 20:03, Petr Jelinek
>> > <petr.jelinek@2ndquadrant.com> wrote:
>> >> Firstly, I am not sure if it's wise to allow UDFs in the filter clause
>> >> for the table. The reason for that is that we can't record all necessary
>> >> dependencies there, because the functions are a black box for the
>> >> parser. That means if somebody drops an object that a UDF used in a
>> >> replication filter depends on, that function will start failing. But
>> >> unlike for user sessions it will start failing during decoding (well,
>> >> processing in the output plugin). And that's not recoverable by
>> >> recreating the missing object; the only way to get out of that is either
>> >> to move the slot forward, which means losing part of the replication
>> >> stream and needing a manual resync, or a full rebuild of replication.
>> >> Neither of which is good IMHO.
>> >>
>> > It is a foot gun, but there are several ways to do bad things in
>> > Postgres. CREATE PUBLICATION is restricted to superusers and roles with
>> > CREATE privilege in the current database. AFAICS a role with CREATE
>> > privilege cannot drop objects it does not own. I wouldn't like to
>> > disallow UDFs in row filtering expressions just because someone doesn't
>> > set permissions correctly. Do you have any other case in mind?
>>
>> I don't think this has anything to do with security. Stupid example:
>>
>> user1: CREATE EXTENSION citext;
>>
>> user2: CREATE FUNCTION myfilter(col1 text, col2 text) returns boolean
>> language plpgsql as
>> $$BEGIN
>> RETURN col1::citext = col2::citext;
>> END;$$;
>>
>> user2: ALTER PUBLICATION mypub ADD TABLE mytab WHERE (myfilter(a,b));
>>
>> [... replication happening ...]
>>
>> user1: DROP EXTENSION citext;
>>
>> And now replication is broken and unrecoverable without data loss.
>> Recreating the extension will not help because the changes that happened
>> in the meantime will not see it in the historical snapshot.
>>
>> I don't think it's okay to do nothing at all about this.
>>
> 
> If carefully documented I see no problem with it... we already have an
> analogous problem with functional indexes.

The difference is that with functional indexes you can recreate the
missing object and everything is okay again. With logical replication
recreating the object will not help.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: row filtering for logical replication

From:
Fabrízio de Royes Mello
Date:

On Fri, Nov 23, 2018 at 4:13 PM Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>
> >
> > If carefully documented I see no problem with it... we already have an
> > analogous problem with functional indexes.
>
> The difference is that with functional indexes you can recreate the
> missing object and everything is okay again. With logical replication
> recreating the object will not help.
>

In this case with logical replication you should resync the object. That is the price of misunderstanding / bad use of the new feature.

As usual, there is no free beer ;-)

Regards,

--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training

Re: row filtering for logical replication

From:
Stephen Frost
Date:
Greetings,

* Fabrízio de Royes Mello (fabriziomello@gmail.com) wrote:
> On Fri, Nov 23, 2018 at 4:13 PM Petr Jelinek <petr.jelinek@2ndquadrant.com>
> wrote:
> > > If carefully documented I see no problem with it... we already have an
> > > analogous problem with functional indexes.
> >
> > The difference is that with functional indexes you can recreate the
> > missing object and everything is okay again. With logical replication
> > recreating the object will not help.
> >
>
> In this case with logical replication you should resync the object. That is
> the price of misunderstanding / bad use of the new feature.
>
> As usual, there is no free beer ;-)

There's also certainly no shortage of other ways to break logical
replication, including ways that would also be hard to recover from
today other than doing a full resync.

What that seems to indicate, to me at least, is that it'd be awfully
nice to have a way to resync the data which doesn't necessarily involve
transferring all of it over again.

Of course, it'd be nice if we could track those dependencies too,
but that's yet another thing.

In short, I'm not sure that I agree with the idea that we shouldn't
allow this and instead I'd rather we realize it and put the logical
replication into some kind of an error state that requires a resync.

Thanks!

Stephen

Attachments

Re: row filtering for logical replication

From:
Petr Jelinek
Date:
On 23/11/2018 19:29, Fabrízio de Royes Mello wrote:
> 
> On Fri, Nov 23, 2018 at 4:13 PM Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>>
>> >
>> > If carefully documented I see no problem with it... we already have an
>> > analogous problem with functional indexes.
>>
>> The difference is that with functional indexes you can recreate the
>> missing object and everything is okay again. With logical replication
>> recreating the object will not help.
>>
> 
> In this case with logical replication you should resync the object. That
> is the price of misunderstanding / bad use of the new feature.
> 
> As usual, there is no free beer ;-)
> 

Yeah, but you have to resync the whole subscription, not just a single
table (removing the table from the publication will also not help);
that's a pretty severe punishment. What if you have triggers downstream
that do calculations or logging which you can't recover by simply
rebuilding the replica? I think it's better to err on the side of no
data loss.

We could also try to figure out a way to recover from this that does not
require a resync, i.e. perhaps we could somehow temporarily force
evaluation of the expression with a current snapshot.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: row filtering for logical replication

From
Tomas Vondra
Date:


On 11/23/18 8:03 PM, Stephen Frost wrote:
> Greetings,
> 
> * Fabrízio de Royes Mello (fabriziomello@gmail.com) wrote:
>> On Fri, Nov 23, 2018 at 4:13 PM Petr Jelinek <petr.jelinek@2ndquadrant.com>
>> wrote:
>>>> If carefully documented I see no problem with it... we already have an
>>>> analogous problem with functional indexes.
>>>
>>> The difference is that with functional indexes you can recreate the
>>> missing object and everything is okay again. With logical replication
>>> recreating the object will not help.
>>>
>>
>> In this case with logical replication you should rsync the object. That is
>> the price of misunderstanding / bad use of the new feature.
>>
>> As usual, there are no free beer ;-)
> 
> There's also certainly no shortage of other ways to break logical
> replication, including ways that would also be hard to recover from
> today other than doing a full resync.
> 

Sure, but that seems more like an argument against creating additional
ones (and for preventing those that already exist). I'm not sure this
particular feature is where we should draw the line, though.

> What that seems to indicate, to me at least, is that it'd be awful
> nice to have a way to resync the data which doesn't necessairly
> involve transferring all of it over again.
> 
> Of course, it'd be nice if we could track those dependencies too,
> but that's yet another thing.

Yep, that seems like a good idea in general, both here and for
functional indexes (although I suppose there is a technical reason why
it wasn't implemented right away for them).

> 
> In short, I'm not sure that I agree with the idea that we shouldn't
> allow this and instead I'd rather we realize it and put the logical
> replication into some kind of an error state that requires a resync.
> 

That would still mean a need to resync the data to recover, so I'm not
sure it's really an improvement. And I suppose it'd require tracking the
dependencies, because how else would you mark the subscription as
requiring a resync? At which point we could decline the DROP without a
CASCADE, just like we do elsewhere, no?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: row filtering for logical replication

From
Tomas Vondra
Date:

On 11/23/18 8:14 PM, Petr Jelinek wrote:
> On 23/11/2018 19:29, Fabrízio de Royes Mello wrote:
>>
>> On Fri, Nov 23, 2018 at 4:13 PM Petr Jelinek
>> <petr.jelinek@2ndquadrant.com <mailto:petr.jelinek@2ndquadrant.com>> wrote:
>>>
>>>>
>>>> If carefully documented I see no problem with it... we already have an
>>>> analogous problem with functional indexes.
>>>
>>> The difference is that with functional indexes you can recreate the
>>> missing object and everything is okay again. With logical replication
>>> recreating the object will not help.
>>>
>>
>> In this case with logical replication you should rsync the object. That
>> is the price of misunderstanding / bad use of the new feature.
>>
>> As usual, there are no free beer ;-)
>>
> 
> Yeah but you have to resync whole subscription, not just single table
> (removing table from the publication will also not help), that's pretty
> severe punishment. What if you have triggers downstream that do
> calculations or logging which you can't recover by simply rebuilding
> replica? I think it's better to err on the side of no data loss.
> 

Yeah, having to resync everything because you accidentally dropped a
function is quite annoying. Of course, you should notice that while
testing the upgrade in a testing environment, but still ...

> We could also try to figure out a way to recover from this that does not
> require resync, ie perhaps we could somehow temporarily force evaluation
> of the expression to have current snapshot.
> 

That seems like a huge can of worms ...


cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: row filtering for logical replication

From
Stephen Frost
Date:
Greetings,

* Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
> On 23/11/2018 03:02, Stephen Frost wrote:
> > * Euler Taveira (euler@timbira.com.br) wrote:
> >> 2018-02-28 21:54 GMT-03:00 Craig Ringer <craig@2ndquadrant.com>:
> >>> Good idea. I haven't read this yet, but one thing to make sure you've
> >>> handled is limiting the clause to referencing only the current tuple and the
> >>> catalogs. user-catalog tables are OK, too, anything that is
> >>> RelationIsAccessibleInLogicalDecoding().
> >>>
> >>> This means only immutable functions may be invoked, since a stable or
> >>> volatile function might attempt to access a table. And views must be
> >>> prohibited or recursively checked. (We have tree walkers that would help
> >>> with this).
> >>>
> >>> It might be worth looking at the current logic for CHECK expressions, since
> >>> the requirements are similar. In my opinion you could safely not bother with
> >>> allowing access to user catalog tables in the filter expressions and limit
> >>> them strictly to immutable functions and the tuple its self.
> >>
> >> IIRC implementation is similar to RLS expressions. I'll check all of
> >> these rules.
> >
> > Given the similarity to RLS and the nearby discussion about allowing
> > non-superusers to create subscriptions, and probably publications later,
> > I wonder if we shouldn't be somehow associating this with RLS policies
> > instead of having the publication filtering be entirely independent..
>
> I do see the appeal here, if you consider logical replication to be a
> streaming select it probably applies well.
>
> But given that this is happening inside output plugin which does not
> have full executor setup and has catalog-only snapshot I am not sure how
> feasible it is to try to merge these two things. As per my previous
> email it's possible that we'll have to be stricter about what we allow
> in expressions here.

I can certainly understand the concern about trying to combine the
implementation of this with that of RLS; perhaps that isn't a good fit
due to the additional constraints put on logical decoding.

That said, I still think it might make sense to consider these filters
for logical decoding to be policies and, ideally, to allow users to use
the same policy for both.

In the end, the idea of having to build a single large and complex
'create publication' command which has a bunch of tables, each with
their own filter clauses, just strikes me as pretty painful.

> The other issue with merging this is that the use-case for filtering out
> the data in logical replication is not necessarily about security, but
> often about sending only relevant data. So it makes sense to have filter
> on publication without RLS enabled on table and if we'd force that, we'd
> limit usefulness of this feature.

I definitely have a serious problem if we are going to say that you
can't use this filtering for security-sensitive cases.

> We definitely want to eventually create subscriptions as non-superuser
> but that has zero effect on this as everything here is happening on
> different server than where subscription lives (we already allow
> creation of publications with just CREATE privilege on database and
> ownership of the table).

What I wasn't clear about above was the idea that we might allow a user
other than the table owner to publish a given table, but that such a
publication should certainly only be allowed to include the rows which
that user has access to, as regulated by RLS.  If the RLS policy is too
complex to allow that, then I would think we'd simply throw an error at
CREATE PUBLICATION time and the would-be publisher would need to figure
that out with the table owner.

I'll admit that this might seem like a stretch, but what happens today?
Today, people write cronjobs to sync tables using FDWs, and you don't
need to own a table to use it as the target of a foreign table.
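For illustration only, such a cron-driven sync might look like the
following (the foreign table, local table, and column names here are
made up):

    -- assumes a postgres_fdw foreign table remote_orders and a local
    -- table local_orders with a matching schema
    INSERT INTO local_orders
    SELECT *
      FROM remote_orders r
     WHERE r.updated_at >
           (SELECT coalesce(max(updated_at), '-infinity')
              FROM local_orders);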

I do think that we'll need some additional privileges around who is
allowed to create publications.  I'm not entirely thrilled with that
being combined with the ability to create schemas; the two seem quite
different to me.

* Euler Taveira (euler@timbira.com.br) wrote:
> On Fri, Nov 23, 2018 at 11:40, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
> > But given that this is happening inside output plugin which does not
> > have full executor setup and has catalog-only snapshot I am not sure how
> > feasible it is to try to merge these two things. As per my previous
> > email it's possible that we'll have to be stricter about what we allow
> > in expressions here.
>
> This feature should be as simple as possible. I don't want to
> introduce a huge overhead just for filtering some data. Data sharding
> generally uses simple expressions.

RLS often uses simple filters too.

> > The other issue with merging this is that the use-case for filtering out
> > the data in logical replication is not necessarily about security, but
> > often about sending only relevant data. So it makes sense to have filter
> > on publication without RLS enabled on table and if we'd force that, we'd
> > limit usefulness of this feature.
>
> Using the same infrastructure as RLS could be a good idea, but using
> RLS for row filtering is not. RLS is complex.

Right, this was along the lines I was thinking of: using the
infrastructure and the policy system, in particular.

Thanks!

Stephen

Attachments

Re: row filtering for logical replication

From
Stephen Frost
Date:
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On 11/23/18 8:03 PM, Stephen Frost wrote:
> > * Fabrízio de Royes Mello (fabriziomello@gmail.com) wrote:
> >> On Fri, Nov 23, 2018 at 4:13 PM Petr Jelinek <petr.jelinek@2ndquadrant.com>
> >> wrote:
> >>>> If carefully documented I see no problem with it... we already have an
> >>>> analogous problem with functional indexes.
> >>>
> >>> The difference is that with functional indexes you can recreate the
> >>> missing object and everything is okay again. With logical replication
> >>> recreating the object will not help.
> >>>
> >>
> >> In this case with logical replication you should rsync the object. That is
> >> the price of misunderstanding / bad use of the new feature.
> >>
> >> As usual, there are no free beer ;-)
> >
> > There's also certainly no shortage of other ways to break logical
> > replication, including ways that would also be hard to recover from
> > today other than doing a full resync.
>
> Sure, but that seems more like an argument against creating additional
> ones (and for preventing those that already exist). I'm not sure this
> particular feature is where we should draw the line, though.

I was actually going in the other direction: we should allow it because
advanced users may know what they're doing better than we do, and we
shouldn't prevent things just because they might be misused or
misunderstood by a user.

> > What that seems to indicate, to me at least, is that it'd be awful
> > nice to have a way to resync the data which doesn't necessairly
> > involve transferring all of it over again.
> >
> > Of course, it'd be nice if we could track those dependencies too,
> > but that's yet another thing.
>
> Yep, that seems like a good idea in general. Both here and for
> functional indexes (although I suppose sure is a technical reason why it
> wasn't implemented right away for them).

We don't track function dependencies in general and I could certainly
see cases where you really wouldn't want to do so, at least not in the
same way that we track FKs or similar.  I do wonder if maybe we didn't
track function dependencies because we didn't (yet) have create or
replace function and that now we should.  We don't track dependencies
inside a function either though.

> > In short, I'm not sure that I agree with the idea that we shouldn't
> > allow this and instead I'd rather we realize it and put the logical
> > replication into some kind of an error state that requires a resync.
>
> That would still mean a need to resync the data to recover, so I'm not
> sure it's really an improvement. And I suppose it'd require tracking the
> dependencies, because how else would you mark the subscription as
> requiring a resync? At which point we could decline the DROP without a
> CASCADE, just like we do elsewhere, no?

I was actually thinking more along the lines of simply marking the
publication/subscription as being in a 'failed' state when a failure
actually happens, and maybe even at that point basically throwing away
everything except the shell of the publication/subscription (so the user
can see that it failed and come in and properly drop it); I'm thinking
about this as perhaps similar to a transaction being aborted.

Thanks!

Stephen

Attachments

Re: row filtering for logical replication

From
Petr Jelinek
Date:
On 14/12/2018 16:38, Stephen Frost wrote:
> Greetings,
> 
> * Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
>> On 23/11/2018 03:02, Stephen Frost wrote:
>>> * Euler Taveira (euler@timbira.com.br) wrote:
>>>> 2018-02-28 21:54 GMT-03:00 Craig Ringer <craig@2ndquadrant.com>:
>>>>> Good idea. I haven't read this yet, but one thing to make sure you've
>>>>> handled is limiting the clause to referencing only the current tuple and the
>>>>> catalogs. user-catalog tables are OK, too, anything that is
>>>>> RelationIsAccessibleInLogicalDecoding().
>>>>>
>>>>> This means only immutable functions may be invoked, since a stable or
>>>>> volatile function might attempt to access a table. And views must be
>>>>> prohibited or recursively checked. (We have tree walkers that would help
>>>>> with this).
>>>>>
>>>>> It might be worth looking at the current logic for CHECK expressions, since
>>>>> the requirements are similar. In my opinion you could safely not bother with
>>>>> allowing access to user catalog tables in the filter expressions and limit
>>>>> them strictly to immutable functions and the tuple its self.
>>>>
>>>> IIRC implementation is similar to RLS expressions. I'll check all of
>>>> these rules.
>>>
>>> Given the similarity to RLS and the nearby discussion about allowing
>>> non-superusers to create subscriptions, and probably publications later,
>>> I wonder if we shouldn't be somehow associating this with RLS policies
>>> instead of having the publication filtering be entirely independent..
>>
>> I do see the appeal here, if you consider logical replication to be a
>> streaming select it probably applies well.
>>
>> But given that this is happening inside output plugin which does not
>> have full executor setup and has catalog-only snapshot I am not sure how
>> feasible it is to try to merge these two things. As per my previous
>> email it's possible that we'll have to be stricter about what we allow
>> in expressions here.
> 
> I can certainly understand the concern about trying to combine the
> implementation of this with that of RLS; perhaps that isn't a good fit
> due to the additional constraints put on logical decoding.
> 
> That said, I still think it might make sense to consider these filters
> for logical decoding to be policies and, ideally, to allow users to use
> the same policy for both.
> 

I am not against that as long as it's possible to have a policy for
logical replication without having it for RLS and vice versa.

I also wonder if policies are flexible enough to allow specifying OLD
and NEW: the replication filtering deals with DML, not with what's
visible, and it might very well depend on differences between these
(that's something the current patch is missing as well, BTW).

> In the end, the idea of having to build a single large and complex
> 'create publication' command which has a bunch of tables, each with
> their own filter clauses, just strikes me as pretty painful.
> 
>> The other issue with merging this is that the use-case for filtering out
>> the data in logical replication is not necessarily about security, but
>> often about sending only relevant data. So it makes sense to have filter
>> on publication without RLS enabled on table and if we'd force that, we'd
>> limit usefulness of this feature.
> 
> I definitely have a serious problem if we are going to say that you
> can't use this filtering for security-sensitive cases.

I am saying it should not be tied only to security-sensitive cases,
because it has use cases that have nothing to do with security (i.e., I
don't want this to depend on RLS being enabled for a table).

> 
>> We definitely want to eventually create subscriptions as non-superuser
>> but that has zero effect on this as everything here is happening on
>> different server than where subscription lives (we already allow
>> creation of publications with just CREATE privilege on database and
>> ownership of the table).
> 
> What I wasn't clear about above was the idea that we might allow a user
> other than the table owner to publish a given table, but that such a
> publication should certanily only be allowed to include the rows which
> that user has access to- as regulated by RLS.  If the RLS policy is too
> complex to allow that then I would think we'd simply throw an error at
> the create publication time and the would-be publisher would need to
> figure that out with the table owner.

My opinion is that this is useful, but not necessarily something the v1
patch needs to solve. Having too many publications and subscriptions to
various places is not currently practical anyway, due to decoding
duplicating all the work for every connection.

> 
> * Euler Taveira (euler@timbira.com.br) wrote:
>> On Fri, Nov 23, 2018 at 11:40, Petr Jelinek
>> <petr.jelinek@2ndquadrant.com> wrote:
> 
>>> The other issue with merging this is that the use-case for filtering out
>>> the data in logical replication is not necessarily about security, but
>>> often about sending only relevant data. So it makes sense to have filter
>>> on publication without RLS enabled on table and if we'd force that, we'd
>>> limit usefulness of this feature.
>>
>> Using the same infrastructure as RLS could be a good idea, but using
>> RLS for row filtering is not. RLS is complex.
> 
> Right, this was along the lines I was thinking of- using the
> infrastructure and the policy system, in particular.
> 

Yeah that part is definitely worth investigating.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: row filtering for logical replication

From
Petr Jelinek
Date:
On 14/12/2018 16:56, Stephen Frost wrote:
> Greetings,
> 
> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On 11/23/18 8:03 PM, Stephen Frost wrote:
>>> * Fabrízio de Royes Mello (fabriziomello@gmail.com) wrote:
>>>> On Fri, Nov 23, 2018 at 4:13 PM Petr Jelinek <petr.jelinek@2ndquadrant.com>
>>>> wrote:
>>>>>> If carefully documented I see no problem with it... we already have an
>>>>>> analogous problem with functional indexes.
>>>>>
>>>>> The difference is that with functional indexes you can recreate the
>>>>> missing object and everything is okay again. With logical replication
>>>>> recreating the object will not help.
>>>>>
>>>>
>>>> In this case with logical replication you should rsync the object. That is
>>>> the price of misunderstanding / bad use of the new feature.
>>>>
>>>> As usual, there are no free beer ;-)
>>>
>>> There's also certainly no shortage of other ways to break logical
>>> replication, including ways that would also be hard to recover from
>>> today other than doing a full resync.
>>
>> Sure, but that seems more like an argument against creating additional
>> ones (and for preventing those that already exist). I'm not sure this
>> particular feature is where we should draw the line, though.
> 
> I was actually going in the other direction- we should allow it because
> advanced users may know what they're doing better than we do and we
> shouldn't prevent things just because they might be misused or
> misunderstood by a user.
> 

That's all good, but we need a good escape hatch for when things go
south, and we don't have one, and IMHO it's not as easy to build one as
you might think.

That's why I would do it the simple and safe way first before allowing
more; otherwise we'll be discussing this for the next couple of PG
versions.

>>> What that seems to indicate, to me at least, is that it'd be awful
>>> nice to have a way to resync the data which doesn't necessairly
>>> involve transferring all of it over again.
>>>
>>> Of course, it'd be nice if we could track those dependencies too,
>>> but that's yet another thing.
>>
>> Yep, that seems like a good idea in general. Both here and for
>> functional indexes (although I suppose sure is a technical reason why it
>> wasn't implemented right away for them).
> 
> We don't track function dependencies in general and I could certainly
> see cases where you really wouldn't want to do so, at least not in the
> same way that we track FKs or similar.  I do wonder if maybe we didn't
> track function dependencies because we didn't (yet) have create or
> replace function and that now we should.  We don't track dependencies
> inside a function either though.

Yeah, we can't always have dependencies; it would break some perfectly
valid usage scenarios. Also, it's not exactly clear to me how we'd track
dependencies of, say, a plpython function...

> 
>>> In short, I'm not sure that I agree with the idea that we shouldn't
>>> allow this and instead I'd rather we realize it and put the logical
>>> replication into some kind of an error state that requires a resync.
>>
>> That would still mean a need to resync the data to recover, so I'm not
>> sure it's really an improvement. And I suppose it'd require tracking the
>> dependencies, because how else would you mark the subscription as
>> requiring a resync? At which point we could decline the DROP without a
>> CASCADE, just like we do elsewhere, no?
> 
> I was actually thinking more along the lines of just simply marking the
> publication/subscription as being in a 'failed' state when a failure
> actually happens, and maybe even at that point basically throwing away
> everything except the shell of the publication/subscription (so the user
> can see that it failed and come in and properly drop it); I'm thinking
> about this as perhaps similar to a transaction being aborted.

There are several problems with that. First, this happens in a historic
snapshot, which can't write, and on top of that we are in the middle of
error processing, so we have our hands tied a bit; it's definitely going
to need a bit of creative thinking to do this.

Second, and that's more of a soft issue (which is probably harder to
solve): what do we do with the slot and subscription? There is one
failed publication, but the subscription may be subscribed to 20 of
them; do we kill the whole subscription because of a single failed
publication? If we don't, do we continue replicating like nothing has
happened, but with the data in the failed publication missing (which
can be considered data loss/corruption from the point of view of the
user)? If we stop replication, do we clean the slot so that we don't
keep back WAL/catalog xmin forever (which could lead to the server
stopping), or do we keep the slot so that the user can somehow fix the
issue (reconfigure the subscription to not care about that publication,
for example) and continue replication without further loss?

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: row filtering for logical replication

From
Stephen Frost
Date:
Greetings,

* Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
> On 14/12/2018 16:38, Stephen Frost wrote:
> > * Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
> >> I do see the appeal here, if you consider logical replication to be a
> >> streaming select it probably applies well.
> >>
> >> But given that this is happening inside output plugin which does not
> >> have full executor setup and has catalog-only snapshot I am not sure how
> >> feasible it is to try to merge these two things. As per my previous
> >> email it's possible that we'll have to be stricter about what we allow
> >> in expressions here.
> >
> > I can certainly understand the concern about trying to combine the
> > implementation of this with that of RLS; perhaps that isn't a good fit
> > due to the additional constraints put on logical decoding.
> >
> > That said, I still think it might make sense to consider these filters
> > for logical decoding to be policies and, ideally, to allow users to use
> > the same policy for both.
>
> I am not against that as long as it's possible to have policy for
> logical replication without having it for RLS and vice versa.

RLS can already be enabled/disabled on a per-table basis.  I could see
how we might want to extend the existing policy system to have a way to
enable/disable individual policies for RLS, but that should be
reasonably straightforward to do, I would think.
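For reference, the existing per-table switch is just (the table name is
an example):

    ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
    ALTER TABLE orders DISABLE ROW LEVEL SECURITY;

so a per-policy switch would be a comparatively small extension of that.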

> I also wonder if policies are flexible enough to allow for specifying
> OLD and NEW - the replication filtering deals with DML, not with what's
> visible, it might very well depend on differences between these (that's
> something the current patch is missing as well BTW).

The policy system already has the notion of a 'visible' check and a
'does the new row match this' check (USING vs. WITH CHECK policies).
Perhaps if you could outline the specific use-cases that you're thinking
about, we could discuss them and make sure that they fit within those
mechanisms, or, if not, discuss whether such a use-case would make sense for
RLS as well and, if so, figure out a way to support that for both.
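As a minimal sketch of those two checks (all names here are made up), a
policy constraining both what is visible and what may be written looks
like:

    CREATE POLICY repl_filter ON measurements
        FOR ALL
        TO replication_role
        USING (region = 'eu')          -- which existing rows pass
        WITH CHECK (region = 'eu');    -- which new/updated rows pass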

> > In the end, the idea of having to build a single large and complex
> > 'create publication' command which has a bunch of tables, each with
> > their own filter clauses, just strikes me as pretty painful.
> >
> >> The other issue with merging this is that the use-case for filtering out
> >> the data in logical replication is not necessarily about security, but
> >> often about sending only relevant data. So it makes sense to have filter
> >> on publication without RLS enabled on table and if we'd force that, we'd
> >> limit usefulness of this feature.
> >
> > I definitely have a serious problem if we are going to say that you
> > can't use this filtering for security-sensitive cases.
>
> I am saying it should not be tied to only security sensitive cases,
> because it has use cases that have nothing to do with security (ie, I
> don't want this to depend on RLS being enabled for a table).

I'm fine with this being able to be independently enabled/disabled,
apart from RLS.

> >> We definitely want to eventually create subscriptions as non-superuser
> >> but that has zero effect on this as everything here is happening on
> >> different server than where subscription lives (we already allow
> >> creation of publications with just CREATE privilege on database and
> >> ownership of the table).
> >
> > What I wasn't clear about above was the idea that we might allow a user
> > other than the table owner to publish a given table, but that such a
> > publication should certanily only be allowed to include the rows which
> > that user has access to- as regulated by RLS.  If the RLS policy is too
> > complex to allow that then I would think we'd simply throw an error at
> > the create publication time and the would-be publisher would need to
> > figure that out with the table owner.
>
> My opinion is that this is useful, but not necessarily something v1
> patch needs to solve. Having too many publications and subscriptions to
> various places is not currently practical anyway due to decoding
> duplicating all the work for every connection.

I agree that supporting this could be done in a later patch; however, I
do feel that when we go to add support for non-owners to create
publications then RLS needs to be supported at that point (and by more
than just 'throw an error').  I can agree with incremental improvements
but I don't want to get to a point where we've got a bunch of
independent things only half of which work with other parts of the
system.

> > * Euler Taveira (euler@timbira.com.br) wrote:
> >> On Fri, Nov 23, 2018 at 11:40, Petr Jelinek
> >> <petr.jelinek@2ndquadrant.com> wrote:
> >
> >>> The other issue with merging this is that the use-case for filtering out
> >>> the data in logical replication is not necessarily about security, but
> >>> often about sending only relevant data. So it makes sense to have filter
> >>> on publication without RLS enabled on table and if we'd force that, we'd
> >>> limit usefulness of this feature.
> >>
> >> Using the same infrastructure as RLS could be a good idea, but
> >> using RLS for row filtering is not. RLS is complex.
> >
> > Right, this was along the lines I was thinking of- using the
> > infrastructure and the policy system, in particular.
>
> Yeah that part is definitely worth investigating.

Glad to hear that.

Thanks!

Stephen

Attachments

Re: row filtering for logical replication

From
Stephen Frost
Date:
Greetings,

* Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
> On 14/12/2018 16:56, Stephen Frost wrote:
> > * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >> On 11/23/18 8:03 PM, Stephen Frost wrote:
> >>> * Fabrízio de Royes Mello (fabriziomello@gmail.com) wrote:
> >>>> On Fri, Nov 23, 2018 at 4:13 PM Petr Jelinek <petr.jelinek@2ndquadrant.com>
> >>>> wrote:
> >>>>>> If carefully documented I see no problem with it... we already have an
> >>>>>> analogous problem with functional indexes.
> >>>>>
> >>>>> The difference is that with functional indexes you can recreate the
> >>>>> missing object and everything is okay again. With logical replication
> >>>>> recreating the object will not help.
> >>>>>
> >>>>
> >>>> In this case with logical replication you should rsync the object. That is
> >>>> the price of misunderstanding / bad use of the new feature.
> >>>>
> >>>> As usual, there are no free beer ;-)
> >>>
> >>> There's also certainly no shortage of other ways to break logical
> >>> replication, including ways that would also be hard to recover from
> >>> today other than doing a full resync.
> >>
> >> Sure, but that seems more like an argument against creating additional
> >> ones (and for preventing those that already exist). I'm not sure this
> >> particular feature is where we should draw the line, though.
> >
> > I was actually going in the other direction- we should allow it because
> > advanced users may know what they're doing better than we do and we
> > shouldn't prevent things just because they might be misused or
> > misunderstood by a user.
>
> That's all good, but we need good escape hatch for when things go south
> and we don't have it and IMHO it's not as easy to have one as you might
> think.

We don't have a great solution but we should be able to at least drop
and recreate the publication or subscription, even today, can't we?
Sure, that means having to recopy everything, but that's what you get if
you break your publication/subscription.  If we allow the user to get to
a point where the system can't be fixed then I agree that's a serious
issue, but hopefully that isn't the case.
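That is, the blunt recovery path available today is simply (the names
and connection string are placeholders):

    DROP SUBSCRIPTION broken_sub;
    CREATE SUBSCRIPTION broken_sub
        CONNECTION 'host=publisher dbname=appdb'
        PUBLICATION some_pub;
    -- copy_data defaults to true, so every table is re-copied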

> >>> What that seems to indicate, to me at least, is that it'd be awful
> >>> nice to have a way to resync the data which doesn't necessairly
> >>> involve transferring all of it over again.
> >>>
> >>> Of course, it'd be nice if we could track those dependencies too,
> >>> but that's yet another thing.
> >>
> >> Yep, that seems like a good idea in general. Both here and for
> >> functional indexes (although I suppose sure is a technical reason why it
> >> wasn't implemented right away for them).
> >
> > We don't track function dependencies in general and I could certainly
> > see cases where you really wouldn't want to do so, at least not in the
> > same way that we track FKs or similar.  I do wonder if maybe we didn't
> > track function dependencies because we didn't (yet) have create or
> > replace function and that now we should.  We don't track dependencies
> > inside a function either though.
>
> Yeah we can't always have dependencies, it would break some perfectly
> valid usage scenarios. Also it's not exactly clear to me how we'd track
> dependencies of say plpython function...

Well, we could at least depend on the functions explicitly listed at
the top level and I don't believe we even do that today.  I can't think
of any downside off-hand to that, given that we have CREATE OR REPLACE
FUNCTION.

> >>> In short, I'm not sure that I agree with the idea that we shouldn't
> >>> allow this and instead I'd rather we realize it and put the logical
> >>> replication into some kind of an error state that requires a resync.
> >>
> >> That would still mean a need to resync the data to recover, so I'm not
> >> sure it's really an improvement. And I suppose it'd require tracking the
> >> dependencies, because how else would you mark the subscription as
> >> requiring a resync? At which point we could decline the DROP without a
> >> CASCADE, just like we do elsewhere, no?
> >
> > I was actually thinking more along the lines of just simply marking the
> > publication/subscription as being in a 'failed' state when a failure
> > actually happens, and maybe even at that point basically throwing away
> > everything except the shell of the publication/subscription (so the user
> > can see that it failed and come in and properly drop it); I'm thinking
> > about this as perhaps similar to a transaction being aborted.
>
> There are several problems with that. First this happens in historic
> snapshot which can't write and on top of that we are in the middle of
> error processing so we have our hands tied a bit, it's definitely going
> to need bit of creative thinking to do this.

We can't write to things inside the database in a historic snapshot and
we do have to deal with the fact that we're in error processing.  What
about writing somewhere that's outside of the regular database system?
Maybe a pg_logical/failed directory?  There are all the usual
complications from that around dealing with durable writes (if we need
to worry about that and I'm not sure that we do...  if we fail to
persist a write saying "X failed" and we restart.. well, it's gonna fail
again and we write it then), and cleaning things up as needed (but maybe
this is handled as part of the DROP, and we WAL that, so we can re-do
the removal of the failed marker file...), and if we need to think about
what should happen on replicas (is there anything?).

> Second, and that's more soft issue (which is probably harder to solve)
> what do we do with the slot and subscription. There is one failed
> publication, but the subscription may be subscribed to 20 of them, do we
> kill the whole subscription because of single failed publication? If we
> don't do we continue replicating like nothing has happened but with data
> in the failed publication missing (which can be considered data
> loss/corruption from the view of user). If we stop replication, do we
> clean the slot so that we don't keep back wal/catalog xmin forever
> (which could lead to server stopping) or do we keep the slot so that
> user can somehow fix the issue (reconfigure subscription to not care
> about that publication for example) and continue replication without
> further loss?

I would think we'd have to fail the whole publication if there's a
failure for any part of it.  Replicating a partial set definitely sounds
wrong to me.  Once we stop replication, yes, we should clean the slot
and mark it failed so that we don't keep back WAL and so that we allow
the catalog xmin to move forward so that the failed publication doesn't
run the server out of disk space.

If we really think there's a use-case for keeping the replication slot
and allowing it to cause WAL to spool on the server and keep the catalog
xmin back, then I'd suggest we make this behavior configurable, so that
users can choose on a publication if they want a failure to be
considered a 'soft' fail or a 'hard' fail.  A 'soft' fail would keep the
slot and keep the WAL and keep the catalog xmin, with the expectation
that the user will either drop the slot themselves or somehow fix it,
while a 'hard' fail would clean everything up except the skeleton of the
slot itself which the user would need to drop.

Thanks!

Stephen

Attachments

Re: row filtering for logical replication

From
Petr Jelinek
Date:
Hi,

On 27/12/2018 20:05, Stephen Frost wrote:
> Greetings,
> 
> * Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
>> On 14/12/2018 16:38, Stephen Frost wrote:
>>> * Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
>>>> I do see the appeal here, if you consider logical replication to be a
>>>> streaming select it probably applies well.
>>>>
>>>> But given that this is happening inside output plugin which does not
>>>> have full executor setup and has catalog-only snapshot I am not sure how
>>>> feasible it is to try to merge these two things. As per my previous
>>>> email it's possible that we'll have to be stricter about what we allow
>>>> in expressions here.
>>>
>>> I can certainly understand the concern about trying to combine the
>>> implementation of this with that of RLS; perhaps that isn't a good fit
>>> due to the additional constraints put on logical decoding.
>>>
>>> That said, I still think it might make sense to consider these filters
>>> for logical decoding to be policies and, ideally, to allow users to use
>>> the same policy for both.
>>
>> I am not against that as long as it's possible to have policy for
>> logical replication without having it for RLS and vice versa.
> 
> RLS already is able to be enabled/disabled on a per-table basis.  I
> could see how we might want to extend the existing policy system to have
> a way to enable/disable individual policies for RLS but that should be
> reasonably straight-forward to do, I would think.

Sure, I was mostly referring to having the ability to enable/disable
this independently of enabling/disabling RLS, which, based on the below,
you are okay with, so no issue there from my side.

> 
>> I also wonder if policies are flexible enough to allow for specifying
>> OLD and NEW - the replication filtering deals with DML, not with what's
>> visible, it might very well depend on differences between these (that's
>> something the current patch is missing as well BTW).
> 
> The policy system already has the notion of a 'visible' check and a
> 'does the new row match this' check (USING vs. WITH CHECK policies).
> Perhaps if you could outline the specific use-cases that you're thinking
> about, we could discuss them and make sure that they fit within those
> mechanisms- or, if not, discuss if such a use-case would make sense for
> RLS as well and, if so, figure out a way to support that for both.

So we'd use USING for old row images (UPDATE/DELETE) and WITH CHECK for
new ones (UPDATE/INSERT)? I think OLD/NEW is a somewhat more natural
naming for this, as there is no "SELECT" part of the operation here, but
as long as the functionality is there I don't mind the syntax that much.

> 
>>> In the end, the idea of having to build a single large and complex
>>> 'create publication' command which has a bunch of tables, each with
>>> their own filter clauses, just strikes me as pretty painful.
>>>
>>>> The other issue with merging this is that the use-case for filtering out
>>>> the data in logical replication is not necessarily about security, but
>>>> often about sending only relevant data. So it makes sense to have filter
>>>> on publication without RLS enabled on table and if we'd force that, we'd
>>>> limit usefulness of this feature.
>>>
>>> I definitely have a serious problem if we are going to say that you
>>> can't use this filtering for security-sensitive cases.
>>
>> I am saying it should not be tied to only security sensitive cases,
>> because it has use cases that have nothing to do with security (ie, I
>> don't want this to depend on RLS being enabled for a table).
> 
> I'm fine with this being able to be independently enabled/disabled,
> apart from RLS.
> 

Cool.

>>>> We definitely want to eventually create subscriptions as non-superuser
>>>> but that has zero effect on this as everything here is happening on
>>>> different server than where subscription lives (we already allow
>>>> creation of publications with just CREATE privilege on database and
>>>> ownership of the table).
>>>
>>> What I wasn't clear about above was the idea that we might allow a user
>>> other than the table owner to publish a given table, but that such a
>>> publication should certanily only be allowed to include the rows which
>>> that user has access to- as regulated by RLS.  If the RLS policy is too
>>> complex to allow that then I would think we'd simply throw an error at
>>> the create publication time and the would-be publisher would need to
>>> figure that out with the table owner.
>>
>> My opinion is that this is useful, but not necessarily something v1
>> patch needs to solve. Having too many publications and subscriptions to
>> various places is not currently practical anyway due to decoding
>> duplicating all the work for every connection.
> 
> I agree that supporting this could be done in a later patch, however, I
> do feel that when we go to add support for non-owners to create
> publications then RLS needs to be supported at that point (and by more
> than just 'throw an error').  I can agree with incremental improvements
> but I don't want to get to a point where we've got a bunch of
> independent things only half of which work with other parts of the
> system.

Yes, using the RLS infrastructure now will make it easier to add
support for publishing without being the owner at some later point; just
please let's not make publishing without being the owner part of the
requirements for this.


-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: row filtering for logical replication

From
Petr Jelinek
Date:
On 27/12/2018 20:19, Stephen Frost wrote:
> Greetings,
> 
> * Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
>> On 14/12/2018 16:56, Stephen Frost wrote:
>>> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>>>> On 11/23/18 8:03 PM, Stephen Frost wrote:
>>>>> * Fabrízio de Royes Mello (fabriziomello@gmail.com) wrote:
>>>>>> On Fri, Nov 23, 2018 at 4:13 PM Petr Jelinek <petr.jelinek@2ndquadrant.com>
>>>>>> wrote:
>>>>>>>> If carefully documented I see no problem with it... we already have an
>>>>>>>> analogous problem with functional indexes.
>>>>>>>
>>>>>>> The difference is that with functional indexes you can recreate the
>>>>>>> missing object and everything is okay again. With logical replication
>>>>>>> recreating the object will not help.
>>>>>>>
>>>>>>
>>>>>> In this case with logical replication you should rsync the object. That is
>>>>>> the price of misunderstanding / bad use of the new feature.
>>>>>>
>>>>>> As usual, there are no free beer ;-)
>>>>>
>>>>> There's also certainly no shortage of other ways to break logical
>>>>> replication, including ways that would also be hard to recover from
>>>>> today other than doing a full resync.
>>>>
>>>> Sure, but that seems more like an argument against creating additional
>>>> ones (and for preventing those that already exist). I'm not sure this
>>>> particular feature is where we should draw the line, though.
>>>
>>> I was actually going in the other direction- we should allow it because
>>> advanced users may know what they're doing better than we do and we
>>> shouldn't prevent things just because they might be misused or
>>> misunderstood by a user.
>>
>> That's all good, but we need good escape hatch for when things go south
>> and we don't have it and IMHO it's not as easy to have one as you might
>> think.
> 
> We don't have a great solution but we should be able to at least drop
> and recreate the publication or subscription, even today, can't we?

Well, we can always drop things, yes; not having the ability to drop
things when they break would be bad design. I am debating the ability to
recover without rebuilding everything, as there are cases where you
simply can't rebuild everything (e.g., we allow filtering out deletes).
I don't like disabling UDFs either, as that means that user-created
types are unusable in filters; I just wonder if saying "sorry, your
replica is gone" is any better.

> Sure, that means having to recopy everything, but that's what you get if
> you break your publication/subscription.

This is a bit off-topic here, but I really wonder how you are currently
breaking your publications/subscriptions.

>>>>> What that seems to indicate, to me at least, is that it'd be awful
>>>>> nice to have a way to resync the data which doesn't necessairly
>>>>> involve transferring all of it over again.
>>>>>
>>>>> Of course, it'd be nice if we could track those dependencies too,
>>>>> but that's yet another thing.
>>>>
>>>> Yep, that seems like a good idea in general. Both here and for
>>>> functional indexes (although I suppose sure is a technical reason why it
>>>> wasn't implemented right away for them).
>>>
>>> We don't track function dependencies in general and I could certainly
>>> see cases where you really wouldn't want to do so, at least not in the
>>> same way that we track FKs or similar.  I do wonder if maybe we didn't
>>> track function dependencies because we didn't (yet) have create or
>>> replace function and that now we should.  We don't track dependencies
>>> inside a function either though.
>>
>> Yeah we can't always have dependencies, it would break some perfectly
>> valid usage scenarios. Also it's not exactly clear to me how we'd track
>> dependencies of say plpython function...
> 
> Well, we could at leasts depend on the functions explicitly listed at
> the top level and I don't believe we even do that today.  I can't think
> of any downside off-hand to that, given that we have create-or-replace
> function.
> 

I dunno how much that is worth, TBH; the situations where I've seen
this issue (pglogical has had this feature for a long time and suffers
from the same lack of dependency tracking) are ones where somebody drops
a table/type used in a function that is used as a filter.

>>>>> In short, I'm not sure that I agree with the idea that we shouldn't
>>>>> allow this and instead I'd rather we realize it and put the logical
>>>>> replication into some kind of an error state that requires a resync.
>>>>
>>>> That would still mean a need to resync the data to recover, so I'm not
>>>> sure it's really an improvement. And I suppose it'd require tracking the
>>>> dependencies, because how else would you mark the subscription as
>>>> requiring a resync? At which point we could decline the DROP without a
>>>> CASCADE, just like we do elsewhere, no?
>>>
>>> I was actually thinking more along the lines of just simply marking the
>>> publication/subscription as being in a 'failed' state when a failure
>>> actually happens, and maybe even at that point basically throwing away
>>> everything except the shell of the publication/subscription (so the user
>>> can see that it failed and come in and properly drop it); I'm thinking
>>> about this as perhaps similar to a transaction being aborted.
>>
>> There are several problems with that. First this happens in historic
>> snapshot which can't write and on top of that we are in the middle of
>> error processing so we have our hands tied a bit, it's definitely going
>> to need bit of creative thinking to do this.
> 
> We can't write to things inside the database in a historic snapshot and
> we do have to deal with the fact that we're in error processing.  What
> about writing somewhere that's outside of the regular database system?
> Maybe a pg_logical/failed directory?  There's all the usual
> complications from that around dealing with durable writes (if we need
> to worry about that and I'm not sure that we do...  if we fail to
> persist a write saying "X failed" and we restart.. well, it's gonna fail
> again and we write it then), and cleaning things up as needed (but maybe
> this is handled as part of the DROP, and we WAL that, so we can re-do
> the removal of the failed marker file...), and if we need to think about
> what should happen on replicas (is there anything?).

That sounds pretty reasonable. Given that this is a corner-case user
error, we could perhaps do extra work to ensure things are fsynced even
if it's all not too fast...

> 
>> Second, and that's more soft issue (which is probably harder to solve)
>> what do we do with the slot and subscription. There is one failed
>> publication, but the subscription may be subscribed to 20 of them, do we
>> kill the whole subscription because of single failed publication? If we
>> don't do we continue replicating like nothing has happened but with data
>> in the failed publication missing (which can be considered data
>> loss/corruption from the view of user). If we stop replication, do we
>> clean the slot so that we don't keep back wal/catalog xmin forever
>> (which could lead to server stopping) or do we keep the slot so that
>> user can somehow fix the issue (reconfigure subscription to not care
>> about that publication for example) and continue replication without
>> further loss?
> 
> I would think we'd have to fail the whole publication if there's a
> failure for any part of it.  Replicating a partial set definitely sounds
> wrong to me.  Once we stop replication, yes, we should clean the slot
> and mark it failed so that we don't keep back WAL and so that we allow
> the catalog xmin to move forward so that the failed publication doesn't
> run the server out of disk space.
> 

I agree that continuing replication when some part of the publication
is broken seems wrong, and that we should stop replication at that
point.

> If we really think there's a use-case for keeping the replication slot

It's not so much about the use-case as it is about a complete change of
behavior: there is no current error where we remove an existing slot.
The use case for keeping the slot is a) investigation of the issue, and
b) just skipping the broken part of the stream by advancing the origin
on the subscription and continuing replication; with some luck that can
mean only a single table needs resyncing, which is better than
rebuilding everything.
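For the record, option b) is already expressible today, roughly like
this (the subscription name, origin name, and LSN are illustrative;
origin names for subscriptions have the form 'pg_' plus the
subscription's OID):

    ALTER SUBSCRIPTION mysub DISABLE;
    SELECT pg_replication_origin_advance('pg_16389', '0/1634F70');
    ALTER SUBSCRIPTION mysub ENABLE;

with the obvious caveat that everything between the old and new origin
positions is silently skipped.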

I think some kind of automated slot cleanup is desirable, but that's
likely a separate feature that should be designed based on the amount of
outstanding WAL or something.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: row filtering for logical replication

From
Andres Freund
Date:
Hi,

On 2018-11-23 13:15:08 -0300, Euler Taveira wrote:
> Besides the problem presented by Hironobu-san, I'm doing some cleanup
> and improving docs. I also forgot to declare the pg_publication_rel
> TOAST table.
> 
> Thanks for your review.

As far as I can tell, the patch has not been refreshed since. So I'm
marking this as returned with feedback for now. Please resubmit once
ready.

Greetings,

Andres Freund


Re: row filtering for logical replication

From
a.kondratov@postgrespro.ru
Date:
Hi Euler,

On 2019-02-03 13:14, Andres Freund wrote:
> 
> On 2018-11-23 13:15:08 -0300, Euler Taveira wrote:
>> Besides the problem presented by Hironobu-san, I'm doing some cleanup
>> and improving docs. I also forgot to declare the pg_publication_rel
>> TOAST table.
>> 
>> Thanks for your review.
> 
> As far as I can tell, the patch has not been refreshed since. So I'm
> marking this as returned with feedback for now. Please resubmit once
> ready.
> 

Do you have any plans to continue working on this patch and submit it
again to the upcoming September commitfest? There are only a few days
left. Anyway, I will be glad to review the patch if you do submit it,
though I haven't yet dug deeply into the code.

I've recently rebased the entire patch set (attached) and it works
fine. Your TAP test passes. Also, I've added a new test case (see 0009
attached) with a real-life example of bidirectional replication (BDR)
utilising this new WHERE clause. This naive BDR is implemented using an
is_cloud flag, which is set to TRUE/FALSE on the cloud/remote nodes
respectively.
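Roughly, the publication side of that test case looks like this
(simplified, with assumed table and publication names, using the patch's
proposed WHERE syntax):

    -- on the cloud node
    CREATE PUBLICATION cloud_pub FOR TABLE t WHERE (is_cloud IS TRUE);
    -- on the remote node
    CREATE PUBLICATION remote_pub FOR TABLE t WHERE (is_cloud IS FALSE);

Each node then subscribes to the other's publication, so a row only
travels in the direction matching its is_cloud flag and never loops
back.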

Although almost all the new tests pass, there is a problem with DELETE
replication, so 1 out of 10 tests fails. A DELETE isn't replicated if
the record was created with is_cloud=TRUE on the cloud node and
replicated to the remote; then updated with is_cloud=FALSE on the remote
and replicated to the cloud; then deleted on the remote.


Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company
Attachments

Re: row filtering for logical replication

From
Euler Taveira
Date:
On Sun, Feb 3, 2019 at 07:14, Andres Freund <andres@anarazel.de> wrote:
>
> As far as I can tell, the patch has not been refreshed since. So I'm
> marking this as returned with feedback for now. Please resubmit once
> ready.
>
I fixed all of the bugs pointed out in this thread. I decided to
disallow UDFs in filters (it is safer for a first version). We can add
this functionality later. However, I'll check whether allowing "safe"
functions (aka builtin functions) is ok. I added more docs explaining
that expressions are executed with the role used for the replication
connection, and also that columns used in expressions must be part of
the PK or REPLICA IDENTITY. I added regression tests.
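To illustrate what this version accepts and rejects (table and function
names are examples only):

    -- allowed: built-in operators and functions in the filter
    CREATE PUBLICATION pub_active FOR TABLE orders
        WHERE (status = 'active');

    -- rejected for now: a user-defined function in the filter
    -- CREATE PUBLICATION pub_udf FOR TABLE orders
    --     WHERE (my_filter(status));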

Comments?



--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Attachments

Re: row filtering for logical replication

From
Euler Taveira
Date:
On Tue, Aug 27, 2019 at 18:10, <a.kondratov@postgrespro.ru> wrote:
>
> Do you have any plans for continuing working on this patch and
> submitting it again on the closest September commitfest? There are only
> a few days left. Anyway, I will be glad to review the patch if you do
> submit it, though I didn't yet dig deeply into the code.
>
Sure. See my last email to this thread. I'd appreciate it if you could review it.

> Although almost all new tests are passed, there is a problem with DELETE
> replication, so 1 out of 10 tests is failed. It isn't replicated if the
> record was created with is_cloud=TRUE on cloud, replicated to remote;
> then updated with is_cloud=FALSE on remote, replicated to cloud; then
> deleted on remote.
>
That's because you didn't include is_cloud in the PK or REPLICA
IDENTITY. I added a small note about this in the docs.
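With assumed names, either of these would make that DELETE replicate,
since the old-row image then carries is_cloud:

    -- include the filter column in the replica identity via a unique
    -- index (the indexed columns must be NOT NULL)
    CREATE UNIQUE INDEX t_id_cloud_idx ON t (id, is_cloud);
    ALTER TABLE t REPLICA IDENTITY USING INDEX t_id_cloud_idx;

    -- or, at the cost of larger WAL records, log the whole old row
    ALTER TABLE t REPLICA IDENTITY FULL;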


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: row filtering for logical replication

From
Alexey Zagarin
Date:
I think that I have also found one shortcoming when using the setup described by Alexey Kondratov. The problem I face is that if both (cloud and remote) tables already have data the moment I add the subscription, then the whole table is initially copied in both directions, which leads to duplicated data and broken replication, because COPY doesn't take the filtering condition into account. If there are filters in a publication, the COPY command that is executed when adding a subscription (or altering one to refresh a publication) should also filter the data based on the same condition, e.g. COPY (SELECT * FROM ... WHERE ...) TO ...

The current workaround is to always use WITH (copy_data = false) when subscribing or refreshing, and then manually copy the data with the above statement.
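Concretely, that workaround looks something like this (the subscription
name, publication name, connection string, and filter predicate are all
placeholders):

    -- on the subscriber: skip the unfiltered initial copy
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=publisher dbname=appdb'
        PUBLICATION filtered_pub
        WITH (copy_data = false);

    -- then seed the table with the publication's own predicate, e.g.
    -- in psql on the publisher:
    --   \copy (SELECT * FROM t WHERE is_cloud IS TRUE) TO 'seed.csv' CSV
    -- and on the subscriber:
    --   \copy t FROM 'seed.csv' CSV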

Alexey Zagarin
On 1 Sep 2019 12:11 +0700, Euler Taveira <euler@timbira.com.br>, wrote:
On Tue, Aug 27, 2019 at 18:10, <a.kondratov@postgrespro.ru> wrote:

Do you have any plans for continuing working on this patch and
submitting it again on the closest September commitfest? There are only
a few days left. Anyway, I will be glad to review the patch if you do
submit it, though I didn't yet dig deeply into the code.

Sure. See my last email to this thread. I appreciate if you can review it.

Although almost all new tests are passed, there is a problem with DELETE
replication, so 1 out of 10 tests is failed. It isn't replicated if the
record was created with is_cloud=TRUE on cloud, replicated to remote;
then updated with is_cloud=FALSE on remote, replicated to cloud; then
deleted on remote.

That's because you don't include is_cloud in PK or REPLICA IDENTITY. I
add a small note in docs.


--
Euler Taveira Timbira -
http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: row filtering for logical replication

From
Erik Rijkers
Date:
On 2019-09-01 02:28, Euler Taveira wrote:
> On Sun, Feb 3, 2019 at 07:14, Andres Freund <andres@anarazel.de>
> wrote:
>> 
>> As far as I can tell, the patch has not been refreshed since. So I'm
>> marking this as returned with feedback for now. Please resubmit once
>> ready.
>> 
> I fixed all of the bugs pointed out in this thread. I decided to disallow

> 0001-Remove-unused-atttypmod-column-from-initial-table-sy.patch
> 0002-Store-number-of-tuples-in-WalRcvExecResult.patch
> 0003-Refactor-function-create_estate_for_relation.patch
> 0004-Rename-a-WHERE-node.patch
> 0005-Row-filtering-for-logical-replication.patch
> 0006-Print-publication-WHERE-condition-in-psql.patch
> 0007-Publication-where-condition-support-for-pg_dump.patch
> 0008-Debug-for-row-filtering.patch

Hi,

The first 4 of these apply without error, but I can't get 0005 to apply. 
This is what I use:

patch --dry-run -b -l -F 5 -p 1 < 
/home/aardvark/download/pgpatches/0130/logrep_rowfilter/20190901/0005-Row-filtering-for-logical-replication.patch


checking file doc/src/sgml/catalogs.sgml
Hunk #1 succeeded at 5595 (offset 8 lines).
checking file doc/src/sgml/ref/alter_publication.sgml
checking file doc/src/sgml/ref/create_publication.sgml
checking file src/backend/catalog/pg_publication.c
checking file src/backend/commands/publicationcmds.c
Hunk #1 succeeded at 352 (offset 8 lines).
Hunk #2 succeeded at 381 (offset 8 lines).
Hunk #3 succeeded at 539 (offset 8 lines).
Hunk #4 succeeded at 570 (offset 8 lines).
Hunk #5 succeeded at 601 (offset 8 lines).
Hunk #6 succeeded at 626 (offset 8 lines).
Hunk #7 succeeded at 647 (offset 8 lines).
Hunk #8 succeeded at 679 (offset 8 lines).
Hunk #9 succeeded at 693 (offset 8 lines).
checking file src/backend/parser/gram.y
checking file src/backend/parser/parse_agg.c
checking file src/backend/parser/parse_expr.c
Hunk #4 succeeded at 3571 (offset -2 lines).
checking file src/backend/parser/parse_func.c
Hunk #1 succeeded at 2516 (offset -13 lines).
checking file src/backend/replication/logical/tablesync.c
checking file src/backend/replication/logical/worker.c
checking file src/backend/replication/pgoutput/pgoutput.c
Hunk #1 FAILED at 12.
Hunk #2 succeeded at 60 (offset 2 lines).
Hunk #3 succeeded at 336 (offset 2 lines).
Hunk #4 succeeded at 630 (offset 2 lines).
Hunk #5 succeeded at 647 (offset 2 lines).
Hunk #6 succeeded at 738 (offset 2 lines).
1 out of 6 hunks FAILED
checking file src/include/catalog/pg_publication.h
checking file src/include/catalog/pg_publication_rel.h
checking file src/include/catalog/toasting.h
checking file src/include/nodes/nodes.h
checking file src/include/nodes/parsenodes.h
Hunk #1 succeeded at 3461 (offset -1 lines).
Hunk #2 succeeded at 3486 (offset -1 lines).
checking file src/include/parser/parse_node.h
checking file src/include/replication/logicalrelation.h
checking file src/test/regress/expected/publication.out
Hunk #1 succeeded at 116 (offset 9 lines).
checking file src/test/regress/sql/publication.sql
Hunk #1 succeeded at 69 with fuzz 1 (offset 9 lines).
checking file src/test/subscription/t/013_row_filter.pl


perhaps that can be fixed?

thanks,

Erik Rijkers



Re: row filtering for logical replication

From
Euler Taveira
Date:
On Sun, Sep 1, 2019 at 06:09, Erik Rijkers <er@xs4all.nl> wrote:
>
> The first 4 of these apply without error, but I can't get 0005 to apply.
> This is what I use:
>
Erik, I generated a new patch set with the patience diff algorithm. It
seems to apply cleanly.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Attachments

Re: row filtering for logical replication

From
Erik Rijkers
Date:
On 2019-09-02 01:43, Euler Taveira wrote:
> On Sun, Sep 1, 2019 at 06:09, Erik Rijkers <er@xs4all.nl>
> wrote:
>> 
>> The first 4 of these apply without error, but I can't get 0005 to 
>> apply.
>> This is what I use:
>> 
> Erik, I generated a new patch set with the patience diff algorithm. It
> seems to apply cleanly.
> 

It did apply cleanly, thanks.

But I can't get it to correctly do the partial replication in the 
attached pgbench script (similar versions of which I also used
for earlier versions of the patch, last year).

There are complaints in the log (both pub and sub) like:
ERROR:  trying to store a heap tuple into wrong type of slot

I have no idea what causes that.

I attach a zip:

$ unzip -l logrep_rowfilter.zip
Archive:  logrep_rowfilter.zip
   Length      Date    Time    Name
---------  ---------- -----   ----
     17942  2019-09-03 00:47   logfile.6525
     10412  2019-09-03 00:47   logfile.6526
      6913  2019-09-03 00:47   logrep_rowfilter_2_nodes.sh
      3371  2019-09-03 00:47   output.txt
---------                     -------
     38638                     4 files

That bash script runs 2 instances (as compiled on my local setup so it 
will not run as-is) and tries for one minute to get a slice of the 
pgbench_accounts table replicated.  One minute is short but I wanted 
short logfiles; I have tried the same up to 20 minutes without the 
replication completing.  I'll try even longer but in the meantime I hope 
you can figure out why these errors occur.


thanks,


Erik Rijkers



Attachments

Re: row filtering for logical replication

From
Alexey Zagarin
Date:
There are complaints in the log (both pub and sub) like:
ERROR: trying to store a heap tuple into wrong type of slot

I have no idea what causes that.

Yeah, I've seen that too. It was fixed by Alexey Kondratov: in line 955 of 0005-Row-filtering-for-logical-replication.patch, it should be &TTSOpsHeapTuple instead of &TTSOpsVirtual.


Re: row filtering for logical replication

From
Euler Taveira
Date:
On Tue, Sep 3, 2019 at 00:16, Alexey Zagarin <zagarin@gmail.com> wrote:
>
> There are complaints in the log (both pub and sub) like:
> ERROR: trying to store a heap tuple into wrong type of slot
>
> I have no idea what causes that.
>
>
> Yeah, I've seen that too. It was fixed by Alexey Kondratov: in line 955 of
> 0005-Row-filtering-for-logical-replication.patch, it should be &TTSOpsHeapTuple instead of &TTSOpsVirtual.
>
Oops... exactly. That was an oversight while poking at different types of slots.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: row filtering for logical replication

From
Erik Rijkers
Date:
On 2019-09-03 05:32, Euler Taveira wrote:
> On Tue, Sep 3, 2019 at 00:16, Alexey Zagarin <zagarin@gmail.com>
> wrote:
>> 
>> There are complaints in the log (both pub and sub) like:
>> ERROR: trying to store a heap tuple into wrong type of slot
>> 
>> I have no idea what causes that.
>> 
>> Yeah, I've seen that too. It was fixed by Alexey Kondratov: in line
>> 955 of 0005-Row-filtering-for-logical-replication.patch, it should be
>> &TTSOpsHeapTuple instead of &TTSOpsVirtual.
>> 
> Oops... exactly. That was an oversight while poking at different types
> of slots.

OK, I'll consider Alexey Kondratov's set of patches as the current 
state-of-the-art then.  (They still apply.)

I found a problem, though I'm not sure it's a bug:

The attached bash script does a test by setting up pgbench tables on 
both master and replica, and then sets up logical replication for a 
slice of pgbench_accounts. Then it does a short pgbench run, and loops 
until the results become identical (ok), or breaks out after a certain
time (NOK = not ok).

It turns out this did not work until I added a wait state after the 
CREATE SUBSCRIPTION.  It always fails without the wait state, and always 
works with the wait state.

Do you agree this is a bug?


thanks (also to both Alexeys :))


Erik Rijkers


PS
by the way, this script won't run as-is on other machines; it has stuff 
particular to my local setup.



Attachments

Re: row filtering for logical replication

From
Alexey Zagarin
Date:
OK, I'll consider Alexey Kondratov's set of patches as the current
state-of-the-art then. (They still apply.)

Alexey's patch is the rebased version of Euler's previous patch set, with the slot type mistake fixed, and adapted to current changes in the master branch. It also has testing improvements. On the other hand, the new patches from Euler include more fixes and the implementation of filtering in COPY (as far as I can tell from the code), which addresses my particular pain point with BDR. I hope they'll be merged soon. :)

It turns out this did not work until I added a wait state after the
CREATE SUBSCRIPTION. It always fails without the wait state, and always
works with the wait state.

Do you agree this is a bug?

I'm not sure this is a bug: after the subscription is added (or a new table is added to the publication and the subscription is refreshed), the whole table is synchronized using a COPY statement. Depending on the size of the table, this can take some time. For a more reliable implementation, you may want to check srsubstate in pg_subscription_rel instead of just sleeping.
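
For example, a polling sketch (assuming every table in the subscription
should reach the synchronized or ready state):

-- Run this on the subscriber and loop until it returns true:
SELECT count(*) = 0 AS done
FROM pg_subscription_rel
WHERE srsubstate NOT IN ('s', 'r');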

Alexey

Re: row filtering for logical replication

From
Euler Taveira
Date:
On Tue, Sep 3, 2019 at 00:32, Euler Taveira
<euler@timbira.com.br> wrote:
>
> Oops... exactly. That was an oversight while poking at different types of slots.
>
Here is a rebased version including this small fix.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Attachments

Re: row filtering for logical replication

From
movead li
Date:
Hello

I found several problems when testing the patches:

1. There is a regression failure after applying 0001.patch~0005.patch.
   The failure is resolved by 0006.patch.
2. Data comes out wrong after CREATE SUBSCRIPTION if the relation has an
   inheritance child table, for example:
   ##########################
   The Tables:
   CREATE TABLE cities (
       name            text,
       population      float,
       altitude        int
   );
   CREATE TABLE capitals (
       state           char(2)
   ) INHERITS (cities);
   
   Do on publication:
   insert into cities values('aaa',123, 134);
   insert into capitals values('bbb',123, 134);
   create publication pub_tc for table cities where (altitude > 100 and altitude < 200);
   postgres=# select * from cities ;
    name | population | altitude 
   ------+------------+----------
    aaa  |        123 |      134
    bbb  |        123 |      134
   (2 rows)
   
   Do on subscription:
   create subscription sub_tc connection 'host=localhost port=5432 dbname=postgres' publication pub_tc;
   postgres=# select * from cities ;
    name | population | altitude 
   ------+------------+----------
    aaa  |        123 |      134
    bbb  |        123 |      134
    bbb  |        123 |      134
   (3 rows)
   ##########################
   An unexpected row appears.
   
3. I am puzzled by the UPDATE behavior.
      Using the tables from problem 2, test as below:
      #########################
      On publication:
      postgres=# insert into cities values('t1',123, 34);
      INSERT 0 1
      postgres=# update cities SET altitude = 134 where altitude = 34;
      UPDATE 1
      postgres=# select * from cities ;
       name | population | altitude 
      ------+------------+----------
       t1   |        123 |      134
      (1 row)
      On subscription:
      postgres=# select * from cities ;
       name | population | altitude 
      ------+------------+----------
      (0 rows)
      
      On publication:
      insert into cities values('t1',1,'135');
      update cities set altitude=300 where altitude=135;
      postgres=# table cities ;
       name | population | altitude 
      ------+------------+----------
       t1   |        123 |      134
       t1   |          1 |      300
      (2 rows)
      
      On subscription:
      postgres=# table cities ;
       name | population | altitude 
      ------+------------+----------
       t1   |          1 |      135
      (1 row)
      #########################
      Result 1: update a row that does not satisfy the publication condition
      so that it does; the subscription changes nothing.
      Result 2: update a row that satisfies the publication condition so that
      it no longer does; the subscription changes nothing.
      Is this a bug? Or should there be an explanation of it?

4. The SQL-splicing code in the fetch_remote_table_info() function is too long.

---
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead.li@highgo.ca

The new status of this patch is: Waiting on Author

Re: row filtering for logical replication

From
Euler Taveira
Date:
On Mon, Sep 23, 2019 at 01:59, movead li <movead.li@highgo.ca> wrote:
>
> I found several problems when testing the patches:
>
First of all, thanks for your review.

> 1. There is a regression failure after applying 0001.patch~0005.patch.
>    The failure is resolved by 0006.patch.
>
Which regression?

> 2. Data comes out wrong after CREATE SUBSCRIPTION if the relation has an
>    inheritance child table, for example:
>
Ouch, good catch! I forgot about the ONLY in COPY with a query. I will add
a test for it.

> 3. I am puzzled by the UPDATE behavior.
>    Using the tables from problem 2, test as below:
>       #########################
>       On publication:
>       postgres=# insert into cities values('t1',123, 34);
>       INSERT 0 1
>
The INSERT isn't replicated (altitude = 34 does not match the filter).

>       postgres=# update cities SET altitude = 134 where altitude = 34;
>       UPDATE 1
>
There should be an error because you don't have a PK or REPLICA IDENTITY.

postgres=# update cities SET altitude = 134 where altitude = 34;
ERROR:  cannot update table "cities" because it does not have a
replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.

Even if you create a PK or REPLICA IDENTITY, it won't turn this UPDATE
into an INSERT and send it to the other node (the UPDATE will indeed be
sent, but there is no tuple to update on the subscriber). Also, filter
columns must be in the PK or REPLICA IDENTITY. I explain this in the
documentation.

> 4. The SQL-splicing code in the fetch_remote_table_info() function is too long.
>
I split it into smaller pieces. I also ran pgindent to improve the code style.

I'll send a patchset later today.


Regards,


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: row filtering for logical replication

From
Euler Taveira
Date:

Re: Re: row filtering for logical replication

From
"movead.li@highgo.ca"
Date:

>Which regression?
Apply 0001.patch~0005.patch and then run 'make check'; there is a
failed test. When you apply 0006.patch, the failure disappears.

>There should be an error because you don't have a PK or REPLICA IDENTITY.
No. I had already done 'ALTER TABLE cities REPLICA IDENTITY FULL'.

>Even if you create a PK or REPLICA IDENTITY, it won't turn this UPDATE
>into an INSERT and send it to the other node (the UPDATE will indeed be
>sent, but there is no tuple to update on the subscriber). Also, filter
>columns must be in the PK or REPLICA IDENTITY. I explain this in the
>documentation.
You should consider Result 2:
     On publication:
      insert into cities values('t1',1,135);
      update cities set altitude=300 where altitude=135;
      postgres=# table cities ;
       name | population | altitude 
      ------+------------+----------
       t1   |        123 |      134
       t1   |          1 |      300
      (2 rows)
      
      On subscription:
      postgres=# table cities ;
       name | population | altitude 
      ------+------------+----------
       t1   |          1 |      135

The tuple ('t1',1,135) appeared on both the publisher and the subscriber,
but after an update on the publisher that tuple is gone on the publisher,
while nothing changes on the subscriber.

The same goes for Result 1. They puzzled me today and I think they will
puzzle users in the future. There should be a better design; at the very
least, a log message to notify users that there was a problem during
replication.

---
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
 
 
 
 

Re: row filtering for logical replication

From
Amit Langote
Date:
Hi Euler,

Thanks for working on this.  I have reviewed the patches, as I too am
working on a patch related to logical replication [1].

On Thu, Sep 26, 2019 at 8:20 AM Euler Taveira <euler@timbira.com.br> wrote:
>
> On Wed, Sep 25, 2019 at 08:08, Euler Taveira
> <euler@timbira.com.br> wrote:
> >
> > I'll send a patchset later today.
> >
> ... and it is attached.

Needed to be rebased, which I did, to be able to test them; patches attached.

Some comments:

* 0001: seems a no-brainer

* 0002: seems, um, unnecessary?  The only place ntuples will be used is here:

@@ -702,9 +702,8 @@ fetch_remote_table_info(char *nspname, char *relname,
                 (errmsg("could not fetch table info for table \"%s.%s\": %s",
                         nspname, relname, res->err)));

-    /* We don't know the number of rows coming, so allocate enough space. */
-    lrel->attnames = palloc0(MaxTupleAttributeNumber * sizeof(char *));
-    lrel->atttyps = palloc0(MaxTupleAttributeNumber * sizeof(Oid));
+    lrel->attnames = palloc0(res->ntuples * sizeof(char *));
+    lrel->atttyps = palloc0(res->ntuples * sizeof(Oid));

but you might as well use tuplestore_tuple_count(res->tuplestore).  My
point is that if the ntuples field this patch adds were widely useful
(as would be shown by the number of places that could be refactored to
use it), it would be worthwhile to add it.

* 0003: seems fine to me.

* 0004: seems fine too, although maybe preproc.y should be updated too?

* 0005: naturally many comments here :)

+      <entry>Expression tree (in the form of a
+      <function>nodeToString()</function> representation) for the relation's

Minor nitpicking: "in the form of a" seems unnecessary.  Other places
that mention nodeToString() just say "in
<function>nodeToString()</function> representation"

+  Columns used in the <literal>WHERE</literal> clause must be part of the
+  primary key or be covered by <literal>REPLICA IDENTITY</literal> otherwise
+  <command>UPDATE</command> and <command>DELETE</command> operations will not
+  be replicated.
+  </para>

Can you please explain the reasoning behind this restriction?  Sorry
if this is already covered in the upthread discussion.

 /*
+ * Gets list of PublicationRelationQuals for a publication.
+ */
+List *
+GetPublicationRelationQuals(Oid pubid)
+{
...
+        relqual->relation = table_open(pubrel->prrelid,
ShareUpdateExclusiveLock);

I think it's a bad idea to open the table in one file and rely on
something else in the other file closing it.  I know you're having it
to do it because you're using PublicationRelationQual to return
individual tables, but why not just store the table's OID in it and
only open and close the relation where it's needed.  Keeping the
opening and closing of relation close to each other is better as long
as it doesn't need to be done many times over in many different
functions.  In this case, pg_publication.c: publication_add_relation()
is the only place that needs to look at the open relation, so opening
and closing should both be done there.  Nothing else needs to look at
the open relation.

Actually, OpenTableList() should also not open the relation.  Then we
don't need CloseTableList().  I think it would be better to refactor
things around this and include the patch in this series.

+    /* Find all publications associated with the relation. */
+    pubrelsrel = table_open(PublicationRelRelationId, AccessShareLock);

I guess you meant:

/* Get all relations associated with this publication. */

+        relqual->whereClause = copyObject(qual_expr);

Is copying really necessary?

+    /*
+     * ALTER PUBLICATION ... DROP TABLE cannot contain a WHERE clause.  Use
+     * publication_table_list node (that accepts a WHERE clause) but forbid
+     * the WHERE clause in it.  The use of relation_expr_list node just for
+     * the DROP TABLE part does not worth the trouble.
+     */

This comment is not very helpful, as it's not clear what the various
names are referring to.  I'd just just write:

    /*
     * Although ALTER PUBLICATION's grammar allows WHERE clause to be
     * specified for the DROP TABLE action, it doesn't make sense to allow it.
     * We implement that rule here, instead of complicating grammar to enforce
     * it.
     */

+                         errmsg("cannot use a WHERE clause for
removing table from publication \"%s\"",

I think: s/for/when/g

+            /*
+             * Remove publication / relation mapping iif (i) table is not
+             * found in the new list or (ii) table is found in the new list,
+             * however, its qual does not match the old one (in this case, a
+             * simple tuple update is not enough because of the dependencies).
+             */

Aside from the typo on the 1st line (iif), I suggest writing this as:

            /*-----------
             * Remove the publication-table mapping if:
             *
             * 1) Table is not found in the new list of tables
             *
             * 2) Table is being re-added with a different qual expression
             *
             * For (2), simply updating the existing tuple is not enough,
             * because of the qual expression's dependencies.
             */

+                 errmsg("functions are not allowed in WHERE"),

Maybe:

functions are not allowed in publication WHERE expressions

+            err = _("cannot use subquery in publication WHERE expression");

s/expression/expressions/g

+        case EXPR_KIND_PUBLICATION_WHERE:
+            return "publication expression";

Maybe:

publication WHERE expression
or
publication qual

-    int         natt;
+    int         n;

Are this and other related changes really needed?

+        appendStringInfoString(&cmd, "COPY (SELECT ");
+        /* list of attribute names */
+        first = true;
+        foreach(lc, attnamelist)
+        {
+            char       *col = strVal(lfirst(lc));
+
+            if (first)
+                first = false;
+            else
+                appendStringInfoString(&cmd, ", ");
+            appendStringInfo(&cmd, "%s", quote_identifier(col));
+        }

Hmm, why wouldn't SELECT * suffice?

+        estate = create_estate_for_relation(relation);
+
+        /* prepare context per tuple */
+        ecxt = GetPerTupleExprContext(estate);
+        oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+        ecxt->ecxt_scantuple = ExecInitExtraTupleSlot(estate,
tupdesc, &TTSOpsHeapTuple);
...
+        ExecDropSingleTupleTableSlot(ecxt->ecxt_scantuple);
+        FreeExecutorState(estate);

Creating and destroying the EState (that too with the ResultRelInfo
that is never used) for every tuple seems wasteful.  You could store
the standalone ExprContext in RelationSyncEntry and use it for every
tuple.

+            /* evaluates row filter */
+            expr_type = exprType(qual);
+            expr = (Expr *) coerce_to_target_type(NULL, qual,
expr_type, BOOLOID, -1, COERCION_ASSIGNMENT, COERCE_IMPLICIT_CAST,
-1);
+            expr = expression_planner(expr);
+            expr_state = ExecInitExpr(expr, NULL);

Also, there appears to be no need to repeat this for every tuple?  I
think this should be done only once, that is, RelationSyncEntry.qual
should cache ExprState nodes, not bare Expr nodes.

Given the above comments, the following seems unnecessary:

+extern EState *create_estate_for_relation(Relation rel);

By the way, make check doesn't pass.  I see the following failure:

-    "public.testpub_rf_tbl3"  WHERE ((e > 300) AND (e < 500))
+    "public.testpub_rf_tbl3"

but I guess applying subsequent patches takes care of that.

* 0006 and 0007: small enough that I think it might be better to merge
them into 0005.

* 0008: no comments as it's not intended to be committed. :)

Thanks,
Amit

[1] https://commitfest.postgresql.org/25/2301/



Re: row filtering for logical replication

From
Amit Langote
Date:

Re: row filtering for logical replication

From
Michael Paquier
Date:
On Mon, Nov 25, 2019 at 11:48:29AM +0900, Amit Langote wrote:
> On Mon, Nov 25, 2019 at 11:38 AM Amit Langote <amitlangote09@gmail.com> wrote:
>> Needed to be rebased, which I did, to be able to test them; patches attached.
>
> Oops, really attached this time.

Euler, this thread is waiting for input from you regarding the latest
comments from Amit.
--
Michael

Attachments

Re: row filtering for logical replication

From
Tomas Vondra
Date:
On Thu, Nov 28, 2019 at 11:32:01AM +0900, Michael Paquier wrote:
>On Mon, Nov 25, 2019 at 11:48:29AM +0900, Amit Langote wrote:
>> On Mon, Nov 25, 2019 at 11:38 AM Amit Langote <amitlangote09@gmail.com> wrote:
>>> Needed to be rebased, which I did, to be able to test them; patches attached.
>>
>> Oops, really attached this time.
>
>Euler, this thread is waiting for input from you regarding the latest
>comments from Amit.

Euler, this patch is still in "waiting on author" since 11/25. Do you
plan to review changes made by Amit in the patches he submitted, or what
are your plans with this patch?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: row filtering for logical replication

From
Euler Taveira
Date:
On Thu, Jan 16, 2020 at 18:57, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> Euler, this patch is still in "waiting on author" since 11/25. Do you
> plan to review changes made by Amit in the patches he submitted, or what
> are your plans with this patch?
>
Yes, I'm working on Amit's suggestions. I'll post a new patch as soon as possible.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: row filtering for logical replication

From
Craig Ringer
Date:
On Fri, 17 Jan 2020 at 07:58, Euler Taveira <euler@timbira.com.br> wrote:
>
> On Thu, Jan 16, 2020 at 18:57, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > Euler, this patch is still in "waiting on author" since 11/25. Do you
> > plan to review changes made by Amit in the patches he submitted, or what
> > are your plans with this patch?
> >
> Yes, I'm working on Amit's suggestions. I'll post a new patch as soon as possible.

Great. I think this'd be nice to see.

Were you able to fully address the following points that came up in
the discussion?

* Make sure row filters cannot access non-catalog, non-user-catalog
relations i.e. can only use RelationIsAccessibleInLogicalDecoding rels

* Prevent filters from attempting to access attributes that may not be
WAL-logged in a given change record, or give them a way to test for
this. Unchanged TOASTed atts are not logged. There's also REPLICA
IDENTITY FULL to consider if exposing access to the old tuple in the
filter.

Also, while I'm not sure if it was raised earlier, experience with row
filtering in pglogical has shown that error handling is challenging.
Because row filters are read from a historic snapshot of the catalogs
you cannot change them or any SQL or plpgsql functions they use if a
problem causes an ERROR when executing the filter expression. You can
fix the current snapshot's definition but the decoding session won't
see it and will continue to ERROR. We don't really have a good answer
for that yet in pglogical; right now you have to either intervene with
low-level tools or drop the subscription and re-create it, neither of
which is ideal.

You can't just read the row filter from the current snapshot as the
relation definition (atts etc) may not match. Plus that creates a
variety of issues with which txns get which version of a row filter
applied during decoding, consistency between multiple subscribers,
etc.

One option I've thought about was a GUC that allows users to specify
what should be done for errors in row filter expressions: drop the row
as if the filter rejected it; pass the row as if the filter matched;
propagate the ERROR and end the decoding session (default).
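
To sketch the idea (the GUC name and values below are purely hypothetical;
nothing like this exists today):

-- Hypothetical GUC, illustration only:
--   'error' - propagate the ERROR and end the decoding session (default)
--   'drop'  - treat the row as if the filter rejected it
--   'pass'  - treat the row as if the filter matched
ALTER SYSTEM SET row_filter_error_action = 'drop';
SELECT pg_reload_conf();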

I'd welcome ideas about this one. I don't think it's a showstopper for
accepting the feature either; we just have to document that great care
is required with any operator or function that could raise an error in
a row filter. But there are just so many often non-obvious ways you
can land up with an ERROR being thrown that I think it's a bit of a
user foot-gun.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise



Re: row filtering for logical replication

From
David Steele
Date:
Hi Euler,

On 1/21/20 2:32 AM, Craig Ringer wrote:
> On Fri, 17 Jan 2020 at 07:58, Euler Taveira <euler@timbira.com.br> wrote:
>>
>> On Thu, Jan 16, 2020 at 18:57, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>>
>>> Euler, this patch is still in "waiting on author" since 11/25. Do you
>>> plan to review changes made by Amit in the patches he submitted, or what
>>> are your plans with this patch?
>>>
>> Yes, I'm working on Amit's suggestions. I'll post a new patch as soon as possible.
> 
> Great. I think this'd be nice to see.

The last CF for PG13 has started. Do you have a new patch ready?

Regards,
-- 
-David
david@pgmasters.net



Re: row filtering for logical replication

From
David Steele
Date:
On 3/3/20 12:39 PM, David Steele wrote:
> Hi Euler,
> 
> On 1/21/20 2:32 AM, Craig Ringer wrote:
>> On Fri, 17 Jan 2020 at 07:58, Euler Taveira <euler@timbira.com.br> wrote:
>>>
>>> On Thu, Jan 16, 2020 at 18:57, Tomas Vondra
>>> <tomas.vondra@2ndquadrant.com> wrote:
>>>>
>>>> Euler, this patch is still in "waiting on author" since 11/25. Do you
>>>> plan to review changes made by Amit in the patches he submitted, or 
>>>> what
>>>> are your plans with this patch?
>>>>
>>> Yes, I'm working on Amit's suggestions. I'll post a new patch as soon
>>> as possible.
>>
>> Great. I think this'd be nice to see.
> 
> The last CF for PG13 has started. Do you have a new patch ready?

I have marked this patch Returned with Feedback since no new patch has 
been posted.

Please submit to a future CF when a new patch is available.

Regards,
-- 
-David
david@pgmasters.net



Re: row filtering for logical replication

From
Önder Kalacı
Date:
Hi all,

I'm also interested in this patch. I rebased the changes onto the current master branch and attached them. The rebase had two issues. First, patch 8 was conflicting, and it seems only helpful for debugging purposes during development, so I dropped it for simplicity. Second, the changes conflict with the `publish_via_partition_root` changes. I tried to fix the issues but ended up with a limitation for now: a publication with a WHERE clause cannot be created on a partitioned table unless publish_via_partition_root is set to true. This restriction can be lifted, though I left that out for the sake of focusing on some issues that I observed with this patch.

Please see my review:

+       if (list_length(relentry->qual) > 0)
+       {
+               HeapTuple       old_tuple;
+               HeapTuple       new_tuple;
+               TupleDesc       tupdesc;
+               EState     *estate;
+               ExprContext *ecxt;
+               MemoryContext oldcxt;
+               ListCell   *lc;
+               bool            matched = true;
+
+               old_tuple = change->data.tp.oldtuple ? &change->data.tp.oldtuple->tuple : NULL;
+               new_tuple = change->data.tp.newtuple ? &change->data.tp.newtuple->tuple : NULL;
+               tupdesc = RelationGetDescr(relation);
+               estate = create_estate_for_relation(relation);
+
+               /* prepare context per tuple */
+               ecxt = GetPerTupleExprContext(estate);
+               oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+               ecxt->ecxt_scantuple = ExecInitExtraTupleSlot(estate, tupdesc, &TTSOpsHeapTuple);
+
+               ExecStoreHeapTuple(new_tuple ? new_tuple : old_tuple, ecxt->ecxt_scantuple, false);
+
+               foreach(lc, relentry->qual)
+               {
+                       Node       *qual;
+                       ExprState  *expr_state;
+                       Expr       *expr;
+                       Oid                     expr_type;
+                       Datum           res;
+                       bool            isnull;
+
+                       qual = (Node *) lfirst(lc);
+
+                       /* evaluates row filter */
+                       expr_type = exprType(qual);
+                       expr = (Expr *) coerce_to_target_type(NULL, qual, expr_type, BOOLOID, -1, COERCION_ASSIGNMENT, COERCE_IMPLICIT_CAST, -1);
+                       expr = expression_planner(expr);
+                       expr_state = ExecInitExpr(expr, NULL);
+                       res = ExecEvalExpr(expr_state, ecxt, &isnull);
+
+                       /* if tuple does not match row filter, bail out */
+                       if (!DatumGetBool(res) || isnull)
+                       {
+                               matched = false;
+                               break;
+                       }
+               }
+
+               MemoryContextSwitchTo(oldcxt);
+


The above part can be considered the core of the logic, executed per tuple. As far as I can see, it has two downsides.

First, calling `expression_planner()` for every tuple can be quite expensive. I created a sample table, loaded data and ran a quick benchmark to see its effect. I attached the very simple script that I used to reproduce the issue on my laptop. I'm pretty sure you can find nicer ways of doing similar perf tests, just sharing as a reference.

The idea of the test is to add a WHERE clause to a table where none of the tuples are filtered out; they just go through this code path and are sent to the remote node.

#rows   Patched          | Master
1M      00:00:25.067536  | 00:00:16.633988
10M     00:04:50.770791  | 00:02:40.945358


So, it seems a significant overhead to me. What do you think?

Second, and probably more importantly, allowing any operator is as dangerous as allowing any function, since users can create/overload operators. For example, assume that users create an operator which modifies the table that is being filtered:

```
CREATE OR REPLACE FUNCTION function_that_modifies_table(left_art INTEGER, right_arg INTEGER)
RETURNS BOOL AS
$$
BEGIN
 
  INSERT INTO test SELECT * FROM test;
  
  return left_art > right_arg;
 END;
$$ LANGUAGE PLPGSQL VOLATILE;

CREATE OPERATOR >>= (
  PROCEDURE = function_that_modifies_table,
  LEFTARG = INTEGER,
  RIGHTARG = INTEGER
);

CREATE PUBLICATION pub FOR TABLE test WHERE (key >>= 0);
```

With the above, we seem to be in trouble. Although it is an extreme example, it felt useful to share it to show the extent of the problem. We probably cannot allow arbitrary free-form SQL in the filters.

To overcome these issues, one approach could be to rely on known safe operators and functions. I believe the btree and hash operators should provide pretty strong coverage across many use cases. As far as I can see, the procs that the following query returns can be our baseline:

```
select   DISTINCT amproc.amproc::regproc AS opfamily_procedure
from     pg_am am,
         pg_opfamily opf,
         pg_amproc amproc
where    opf.opfmethod = am.oid
and      amproc.amprocfamily = opf.oid
order by 
         opfamily_procedure;
```

With that, we aim to prevent users from easily shooting themselves in the foot.

The other problematic area was performance, as calling `expression_planner()` for every tuple can be very expensive. To avoid that, we might consider asking users to provide a function instead of a free-form WHERE clause, such that if the function returns true, the tuple is sent. The allowed functions would need to be immutable SQL functions with a bool return type. As we can parse SQL functions, we should be able to allow only functions that rely on the above-mentioned procs. We can apply as many restrictions (such as no data-modifying statements) as possible. For example, see below:
```

CREATE OR REPLACE function filter_tuples_for_test(int) returns bool as
$body$
    select $1 > 100;
$body$
language sql immutable;

CREATE PUBLICATION pub FOR TABLE test FILTER = filter_tuples_for_test(key);
```

In terms of performance, calling the function should avoid the `expression_planner()` call and yield better results, though this needs to be verified.

If such an approach makes sense, I'd be happy to work on the patch. Please provide feedback.

Thanks,
Onder KALACI
Software Engineer at Microsoft &
Developing the Citus database extension for PostgreSQL

David Steele <david@pgmasters.net>, 16 Ara 2020 Çar, 21:43 tarihinde şunu yazdı:
On 3/3/20 12:39 PM, David Steele wrote:
> Hi Euler,
>
> On 1/21/20 2:32 AM, Craig Ringer wrote:
>> On Fri, 17 Jan 2020 at 07:58, Euler Taveira <euler@timbira.com.br> wrote:
>>>
>>> On Thu, Jan 16, 2020 at 18:57, Tomas Vondra
>>> <tomas.vondra@2ndquadrant.com> wrote:
>>>>
>>>> Euler, this patch is still in "waiting on author" since 11/25. Do you
>>>> plan to review changes made by Amit in the patches he submitted, or
>>>> what
>>>> are your plans with this patch?
>>>>
>>> Yes, I'm working on Amit's suggestions. I'll post a new patch as soon
>>> as possible.
>>
>> Great. I think this'd be nice to see.
>
> The last CF for PG13 has started. Do you have a new patch ready?

I have marked this patch Returned with Feedback since no new patch has
been posted.

Please submit to a future CF when a new patch is available.

Regards,
--
-David
david@pgmasters.net




Attachments

Re: row filtering for logical replication

From
Masahiko Sawada
Date:
Hi Önder,

On Thu, Dec 17, 2020 at 3:43 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> [...]
>
> If such an approach makes sense, I'd be happy to work on the patch. Please provide feedback.
>

You sent in your patch to pgsql-hackers on Dec 17, but you did not
post it to the next CommitFest[1] (I found the old entry of this
patch[2] but it's marked as "Returned with feedback"). If this was
intentional, then you need to take no action.  However, if you want
your patch to be reviewed as part of the upcoming CommitFest, then you
need to add it yourself before 2021-01-01 AoE[3]. Thanks for your
contributions.

Regards,

[1] https://commitfest.postgresql.org/31/
[2] https://commitfest.postgresql.org/20/1862/
[3] https://en.wikipedia.org/wiki/Anywhere_on_Earth

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



Re: row filtering for logical replication

From
Önder Kalacı
Date:
Hi Masahiko,



You sent in your patch to pgsql-hackers on Dec 17, but you did not
post it to the next CommitFest[1] (I found the old entry of this
patch[2] but it's marked as "Returned with feedback"). If this was
intentional, then you need to take no action.  However, if you want
your patch to be reviewed as part of the upcoming CommitFest, then you
need to add it yourself before 2021-01-01 AoE[3]. Thanks for your
contributions.


Thanks for letting me know about this; I added the patch to the next commitfest before 2021-01-01 AoE[3].

I'm also attaching the updated commits so that the tests pass on the CI.

Thanks,
Onder KALACI
Software Engineer at Microsoft &
Developing the Citus database extension for PostgreSQL

 
Attachments

Re: row filtering for logical replication

From
Andres Freund
Date:
Hi,

On 2020-12-17 09:43:30 +0300, Önder Kalacı wrote:
> The above part can be considered the core of the logic, executed per tuple.
> As far as I can see, it has two downsides.
> 
> First, calling `expression_planner()` for every tuple can be quite
> expensive. I created a sample table, loaded data and ran a quick benchmark
> to see its effect. I attached the very simple script that I used to
> reproduce the issue on my laptop. I'm pretty sure you can find nicer ways
> of doing similar perf tests, just sharing as a reference.
> 
> The idea of the test is to add a WHERE clause to a table, but none of the
> tuples are filtered out. They just go through this code-path and send it to
> the remote node.
> 
> #rows   Patched          | Master
> 1M      00:00:25.067536  | 00:00:16.633988
> 10M     00:04:50.770791  | 00:02:40.945358
> 
> 
> So, it seems a significant overhead to me. What do you think?

That seems almost prohibitively expensive. I think at the very least
some of this work would need to be done in a cached manner, e.g. via
get_rel_sync_entry().


> Second, and probably more importantly, allowing any operator is as dangerous
> as allowing any function, since users can create/overload operators.

That's not safe, indeed. It's not even just creating/overloading
operators; as far as I can tell the expression can contain plain
function calls.

The issue also isn't primarily that the user can overload functions;
it's that logical decoding is a limited environment, and not everything
is safe to do within it. For example, only catalog tables can be
accessed. Therefore I don't think we can allow arbitrary expressions.


> The other problematic area was performance, as calling
> `expression_planner()` for every tuple can be very expensive. To avoid
> that, we might consider asking users to provide a function instead of a
> free-form WHERE clause, such that if the function returns true, the tuple
> is sent. The allowed functions would need to be immutable SQL functions
> with a bool return type. As we can parse SQL functions, we should be able
> to allow only functions that rely on the above-mentioned procs. We can
> apply as many restrictions (such as no data-modifying statements) as
> possible. For example, see below:
> below:
> ```

I don't think that would get us very far.

From a safety aspect: A function's body can be changed by the user at
any time, therefore we cannot rely on analyses of the function's body.

From a performance POV: SQL functions are planned at every invocation,
so that'd not buy us much either.


I think what you would have to do instead is to ensure that the
expression is "simple enough", and then process it into a cheaply
executable format in get_rel_sync_entry(). I'd suggest that in the first
version you just allow a simple ANDed list of 'foo.bar op constant'
expressions.
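
For instance, a hedged illustration of that restriction (not the patch's
actual grammar; the table and column names are made up):

-- "Simple enough": an ANDed list of column-operator-constant comparisons
-- using built-in operators only.
CREATE PUBLICATION pub_simple FOR TABLE orders
    WHERE (region_id = 42 AND amount > 100);

-- Out of scope under this restriction: function calls, subqueries,
-- OR trees, and user-defined operators, e.g.
-- CREATE PUBLICATION pub_bad FOR TABLE orders WHERE (is_hot_region(region_id));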

Does that make sense?

Greetings,

Andres Freund



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Mon, Mar 16, 2020, at 10:58 AM, David Steele wrote:
Please submit to a future CF when a new patch is available.
Hi,

This is another version of the row filter patch. Patch summary:

0001: refactor to remove dead code
0002: grammar refactor for row filter
0003: core code, documentation, and tests
0004: psql code
0005: pg_dump support
0006: debug messages (only for test purposes)
0007: measure row filter overhead (only for test purposes)

From the previous version I incorporated Amit's suggestions [1] and improved the documentation and tests. I refactored the code to make it simpler to read (breaking the row filter code into functions). This new version covers the new parameter publish_via_partition_root that was introduced (cf 83fd4532a7).

Regarding function prohibition, I wouldn't like to open a can of worms (see previous discussions in this thread). Simple expressions cover most of the use cases that I have worked with until now. This prohibition can be removed in another patch after some careful analysis.

I did some limited tests and didn't observe excessive CPU usage while testing this patch, though I agree with Andres that retaining some expression context in a cache would certainly speed up this piece of code. I measured the row filter overhead on my i7 (see 0007) and got:

mean:           92.49 us
stddev:         32.63 us
median:         83.45 us
min-max:        [11.13 .. 2731.55] us
percentile(95): 117.76 us



--
Euler Taveira

Attachments

Re: row filtering for logical replication

From
japin
Date:
On Mon, 01 Feb 2021 at 08:23, Euler Taveira <euler@eulerto.com> wrote:
> On Mon, Mar 16, 2020, at 10:58 AM, David Steele wrote:
>> Please submit to a future CF when a new patch is available.
> Hi,
>
> This is another version of the row filter patch. Patch summary:
>
> 0001: refactor to remove dead code
> 0002: grammar refactor for row filter
> 0003: core code, documentation, and tests
> 0004: psql code
> 0005: pg_dump support
> 0006: debug messages (only for test purposes)
> 0007: measure row filter overhead (only for test purposes)
>

Thanks for updating the patch.  Here are some comments:

(1)
+         <para>
+          If this parameter is <literal>false</literal>, it uses the
+          <literal>WHERE</literal> clause from the partition; otherwise,the
+          <literal>WHERE</literal> clause from the partitioned table is used.
          </para>

otherwise,the -> otherwise, the

(2)
+  <para>
+  Columns used in the <literal>WHERE</literal> clause must be part of the
+  primary key or be covered by <literal>REPLICA IDENTITY</literal> otherwise
+  <command>UPDATE</command> and <command>DELETE</command> operations will not
+  be replicated.
+  </para>
+

IMO we should indent one space here.

(3)
+
+  <para>
+  The <literal>WHERE</literal> clause expression is executed with the role used
+  for the replication connection.
+  </para>

Same as (2).

The documentation says:

>  Columns used in the <literal>WHERE</literal> clause must be part of the
>  primary key or be covered by <literal>REPLICA IDENTITY</literal> otherwise
>  <command>UPDATE</command> and <command>DELETE</command> operations will not
>  be replicated.

Why do we need this limitation? Am I missing something?

When I tested, I found that the UPDATE is replicated, while the DELETE
is not.  Here is my test case:

    -- 1. Create tables and publications on publisher
    CREATE TABLE t1 (a int primary key, b int);
    CREATE TABLE t2 (a int primary key, b int);
    INSERT INTO t1 VALUES (1, 11);
    INSERT INTO t2 VALUES (1, 11);
    CREATE PUBLICATION mypub1 FOR TABLE t1;
    CREATE PUBLICATION mypub2 FOR TABLE t2 WHERE (b > 10);

    -- 2. Create tables and subscriptions on subscriber
    CREATE TABLE t1 (a int primary key, b int);
    CREATE TABLE t2 (a int primary key, b int);
    CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=8765 dbname=postgres' PUBLICATION mypub1;
    CREATE SUBSCRIPTION mysub2 CONNECTION 'host=localhost port=8765 dbname=postgres' PUBLICATION mypub2;

    -- 3. Check publications on publisher
        postgres=# \dRp+
                               Publication mypub1
     Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
    -------+------------+---------+---------+---------+-----------+----------
     japin | f          | t       | t       | t       | t         | f
    Tables:
        "public.t1"
    
                               Publication mypub2
     Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
    -------+------------+---------+---------+---------+-----------+----------
     japin | f          | t       | t       | t       | t         | f
    Tables:
        "public.t2"  WHERE (b > 10)

    -- 4. Check initialization data on subscriber
    postgres=# table t1;
     a | b
    ---+----
     1 | 11
    (1 row)
    
    postgres=# table t2;
     a | b
    ---+----
     1 | 11
    (1 row)

    -- 5. The update on publisher
    postgres=# update t1 set b = 111 where b = 11;
    UPDATE 1
    postgres=# table t1;
     a |  b
    ---+-----
     1 | 111
    (1 row)

    postgres=# update t2 set b = 111 where b = 11;
    UPDATE 1
    postgres=# table t2;
     a |  b
    ---+-----
     1 | 111
    (1 row)

    -- 6. check the updated records on subscriber
    postgres=# table  t1;
     a |  b
    ---+-----
     1 | 111
    (1 row)
    
    postgres=# table  t2;
     a |  b
    ---+-----
     1 | 111
    (1 row)

    -- 7. Delete records on publisher
    postgres=# delete from t1 where b = 111;
    DELETE 1
    postgres=# table t1;
     a | b
    ---+---
    (0 rows)
    
    postgres=# delete from t2 where b = 111;
    DELETE 1
    postgres=# table t2;
     a | b
    ---+---
    (0 rows)

    -- 8. Check the deleted records on subscriber
    postgres=# table t1;
     a | b
    ---+---
    (0 rows)
    
    postgres=# table t2;
     a |  b
    ---+-----
     1 | 111
    (1 row)

I did some simple debugging and found that pgoutput_row_filter() returns false
when I execute "delete from t2 where b = 111;".

Does the publication only load the REPLICA IDENTITY columns into the oldtuple
when we execute a DELETE? If so, pgoutput_row_filter() cannot find the non
REPLICA IDENTITY columns, which causes it to return false, right?  If that's
right, the UPDATE might not be limited by REPLICA IDENTITY, because all
columns are in the newtuple.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Mon, Feb 1, 2021, at 6:11 AM, japin wrote:
Thanks for updating the patch.  Here are some comments:
Thanks for your review. I updated the documentation accordingly.

The documentation says:

>  Columns used in the <literal>WHERE</literal> clause must be part of the
>  primary key or be covered by <literal>REPLICA IDENTITY</literal> otherwise
>  <command>UPDATE</command> and <command>DELETE</command> operations will not
>  be replicated.
The UPDATE is an oversight from a previous version.


Does the publication only load the REPLICA IDENTITY columns into the oldtuple
when we execute a DELETE? If so, pgoutput_row_filter() cannot find the non
REPLICA IDENTITY columns, which causes it to return false, right?  If that's
right, the UPDATE might not be limited by REPLICA IDENTITY, because all
columns are in the newtuple.
No. The oldtuple could possibly be available for UPDATE and DELETE. However, the row
filter considers only one tuple for filtering. INSERT has only the newtuple; the row
filter uses it.  UPDATE has the newtuple and optionally the oldtuple (if the table has
a PK or REPLICA IDENTITY); the row filter uses the newtuple. DELETE optionally has only
the oldtuple; the row filter uses it (if available). Keep in mind that if the expression
evaluates to NULL, it is treated as false and the row won't be replicated.
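To illustrate (a minimal sketch; the table, publication, and filter here are
made up for this example, not taken from the patch's tests):

    CREATE TABLE t (pk int PRIMARY KEY, val int);
    CREATE PUBLICATION pub_t FOR TABLE t WHERE (val > 10);

    INSERT INTO t VALUES (1, 20);        -- newtuple has val = 20: replicated
    UPDATE t SET val = 30 WHERE pk = 1;  -- filter sees the newtuple: replicated
    DELETE FROM t WHERE pk = 1;          -- oldtuple carries only the PK column;
                                         -- val is NULL, so the filter evaluates
                                         -- to NULL, treated as false: not replicated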

After the commit 3696a600e2, the last patch does not apply cleanly. I'm
attaching another version to address the documentation issues.


--
Euler Taveira

Attachments

Re: row filtering for logical replication

From
Michael Paquier
Date:
On Mon, Feb 01, 2021 at 04:11:50PM -0300, Euler Taveira wrote:
> After the commit 3696a600e2, the last patch does not apply cleanly. I'm
> attaching another version to address the documentation issues.

I have bumped into this thread, and applied 0001.  My guess is that
one of the patches originally developed for logical replication
defined atttypmod in LogicalRepRelation, but it ended up unused.
Nice catch.
--
Michael

Attachments

Re: row filtering for logical replication

From
japin
Date:
On Tue, 02 Feb 2021 at 03:11, Euler Taveira <euler@eulerto.com> wrote:
> On Mon, Feb 1, 2021, at 6:11 AM, japin wrote:
>> Thanks for updating the patch.  Here are some comments:
> Thanks for your review. I updated the documentation accordingly.
>
>> The documentation says:
>> 
>> >  Columns used in the <literal>WHERE</literal> clause must be part of the
>> >  primary key or be covered by <literal>REPLICA IDENTITY</literal> otherwise
>> >  <command>UPDATE</command> and <command>DELETE</command> operations will not
>> >  be replicated.
> The UPDATE is an oversight from a previous version.
>
>> 
>> Does the publication only load the REPLICA IDENTITY columns into the oldtuple when we
>> execute DELETE? So pgoutput_row_filter() cannot find the non-REPLICA IDENTITY
>> columns, which causes it to return false, right?  If that's right, UPDATE might
>> not be limited by REPLICA IDENTITY, because all columns are in the newtuple --
>> is that correct?
> No. The oldtuple could possibly be available for UPDATE and DELETE. However, the row
> filter considers only one tuple for filtering. INSERT has only the newtuple; the row
> filter uses it.  UPDATE has the newtuple and optionally the oldtuple (if the table has
> a PK or REPLICA IDENTITY); the row filter uses the newtuple. DELETE optionally has only
> the oldtuple; the row filter uses it (if available). Keep in mind that if the expression
> evaluates to NULL, it is treated as false and the row won't be replicated.
>

Thanks for your clarification.

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: row filtering for logical replication

From
japin
Date:
On Tue, 02 Feb 2021 at 13:02, Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Feb 01, 2021 at 04:11:50PM -0300, Euler Taveira wrote:
>> After the commit 3696a600e2, the last patch does not apply cleanly. I'm
>> attaching another version to address the documentation issues.
>
> I have bumped into this thread, and applied 0001.  My guess is that
> one of the patches originally developed for logical replication
> defined atttypmod in LogicalRepRelation, but it ended up unused.
> Nice catch.

Since the 0001 patch has already been committed (4ad31bb2ef), we can remove it.

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: row filtering for logical replication

From
japin
Date:
On Tue, 02 Feb 2021 at 19:16, japin <japinli@hotmail.com> wrote:
> On Tue, 02 Feb 2021 at 13:02, Michael Paquier <michael@paquier.xyz> wrote:
>> On Mon, Feb 01, 2021 at 04:11:50PM -0300, Euler Taveira wrote:
>>> After the commit 3696a600e2, the last patch does not apply cleanly. I'm
>>> attaching another version to address the documentation issues.
>>
>> I have bumped into this thread, and applied 0001.  My guess is that
>> one of the patches originally developed for logical replication
>> defined atttypmod in LogicalRepRelation, but it ended up unused.
>> Nice catch.
>
> Since the 0001 patch has already been committed (4ad31bb2ef), we can remove it.

In the 0003 patch, the function GetPublicationRelationQuals() is defined, but it
is never used.  So why should we define it?

$ grep 'GetPublicationRelationQuals' -rn src/
src/include/catalog/pg_publication.h:116:extern List *GetPublicationRelationQuals(Oid pubid);
src/backend/catalog/pg_publication.c:347:GetPublicationRelationQuals(Oid pubid)

If we must keep it, here are some comments on it.

(1)
value_datum = heap_getattr(tup, Anum_pg_publication_rel_prqual, RelationGetDescr(pubrelsrel), &isnull);

This line looks too long; we can split it into two lines.

(2)
Since qual_value is only used in the "if (!isnull)" branch, we can narrow its scope.

(3)
Should we free the memory for qual_value?

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Tue, Feb 2, 2021, at 8:38 AM, japin wrote:
In the 0003 patch, the function GetPublicationRelationQuals() is defined, but it
is never used.  So why should we define it?
Thanks for taking a look again. It is an oversight. It was introduced in an
attempt to refactor ALTER PUBLICATION SET TABLE. In AlterPublicationTables, we
could possibly keep some publication-table mappings that do not change;
however, since commit 3696a600e2, it is required to find the qual for all
inheritors (see GetPublicationRelations). I explain this decision in the
following comment:

            /*
             * Remove all publication-table mappings.  We could possibly
             * remove (i) tables that are not found in the new table list and
             * (ii) tables that are being re-added with a different qual
             * expression. For (ii), simply updating the existing tuple is not
             * enough, because of qual expression dependencies.
             */

I will post a new patch set later.


--
Euler Taveira

Re: row filtering for logical replication

From
Rahila Syed
Date:
Hi Euler,

Please find below some review comments,

1. 
   +
   +     <row>
   +      <entry><structfield>prqual</structfield></entry>
   +      <entry><type>pg_node_tree</type></entry>
   +      <entry></entry>
   +      <entry>Expression tree (in <function>nodeToString()</function>
   +      representation) for the relation's qualifying condition</entry>
   +     </row>
I think the docs are being incorrectly updated to add a column to pg_partitioned_table
instead of pg_publication_rel.

2.   +typedef struct PublicationRelationQual
 +{
+       Oid                     relid;
+       Relation        relation;
+       Node       *whereClause;
+} PublicationRelationQual;

Can this be given a more generic name like PublicationRelationInfo, so that the same struct
can be used to store additional relation information in the future, e.g. column names, if column filtering is introduced?

3. Also, in the above structure, it seems we could store just the relid and derive the relation information from it
using table_open when needed. Am I missing something?

4.  Currently in logical replication, I noticed that an UPDATE is being applied on the subscriber even if the column values
are unchanged. Can the row-filtering feature be used to change this such that, when all the OLD.columns = NEW.columns, the
row is filtered out and not sent to the subscriber? I understand this would need REPLICA IDENTITY FULL to work, but it would
be an improvement over the existing state.

On subscriber:

postgres=# select xmin, * from tab_rowfilter_1;
 xmin | a |      b      
------+---+-------------
  555 | 1 | unfiltered
(1 row)

On publisher:
postgres=# ALTER TABLE tab_rowfilter_1 REPLICA IDENTITY FULL;
ALTER TABLE
postgres=# update tab_rowfilter_1 SET b = 'unfiltered' where a = 1;
UPDATE 1

On subscriber: The xmin has changed, indicating the update from the publisher was applied
even though nothing changed.

postgres=# select xmin, * from tab_rowfilter_1;
 xmin | a |      b      
------+---+-------------
  556 | 1 | unfiltered
(1 row)

5. Currently, any existing rows that were not replicated, when updated to match the publication quals
using UPDATE tab SET pub_qual_column = 'not_filtered' WHERE a = 1;, won't be applied, as the row
does not exist on the subscriber.  It would be good if ALTER SUBSCRIPTION ... REFRESH PUBLICATION
would help fetch such existing rows from the publisher that match the qual now (either because the row
changed or the qual changed). The scenario is sketched below.
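
A minimal sketch of the scenario (object names are illustrative):

    -- publisher
    CREATE TABLE tab (a int PRIMARY KEY, pub_qual_column text);
    CREATE PUBLICATION pub FOR TABLE tab
        WHERE (pub_qual_column = 'not_filtered');
    INSERT INTO tab VALUES (1, 'filtered');  -- filtered out: never replicated
    -- the row later changes so that it now matches the qual ...
    UPDATE tab SET pub_qual_column = 'not_filtered' WHERE a = 1;
    -- ... but the UPDATE cannot be applied on the subscriber, because the row
    -- was never copied there; only a resynchronization could bring it across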

Thank you,
Rahila Syed






On Tue, Mar 9, 2021 at 8:35 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
Hi Euler,

Please find some comments below:

1. If the WHERE clause contains non-replica identity columns, a delete performed on a replicated row
using DELETE FROM pub_tab WHERE repl_ident_col = n;
is not replicated, as logical replication does not have any info about whether the column has
to be filtered or not.
Shouldn't a warning be thrown in this case to notify the user that the delete is not replicated?

2. Same for update: even if I update a row to match the quals on the publisher, it is still not being replicated to
the subscriber (if the quals contain non-replica identity columns). I think for UPDATE at least, the new value
of the non-replica identity column is available, which can be used to filter and replicate the update.

3. 0001.patch, 
Why is the name of the existing ExclusionWhereClause node being changed, if the exact same definition is being used?

For 0002.patch,
4.   +
 +       memset(lrel, 0, sizeof(LogicalRepRelation));

Is this needed? Apart from the above, the patch does not use or update lrel at all in that function.

5.  PublicationRelationQual and PublicationTable have similar fields, can PublicationTable
be used in place of PublicationRelationQual instead of defining a new struct?

Thank you,
Rahila Syed

Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Tue, Mar 9, 2021, at 12:05 PM, Rahila Syed wrote:
Please find some comments below:
Thanks for your review.

1. If the WHERE clause contains non-replica identity columns, a delete performed on a replicated row
using DELETE FROM pub_tab WHERE repl_ident_col = n;
is not replicated, as logical replication does not have any info about whether the column has
to be filtered or not.
Shouldn't a warning be thrown in this case to notify the user that the delete is not replicated?
Isn't documentation enough? If you add a WARNING, it would be printed per row;
hence, a huge DELETE would flood the client with WARNING messages by default. If
you are thinking about LOG messages, that is a different story; however, we
should limit those messages to one per transaction. Even if we added such an aid,
it would impose a performance penalty: checking that a DELETE is not being
replicated because the row filter contains a column that is not part of the PK
or REPLICA IDENTITY. If I were to add any message, it would be a warning at
creation time (CREATE PUBLICATION or ALTER PUBLICATION ... [ADD|SET] TABLE).

2. Same for update: even if I update a row to match the quals on the publisher, it is still not being replicated to
the subscriber (if the quals contain non-replica identity columns). I think for UPDATE at least, the new value
of the non-replica identity column is available, which can be used to filter and replicate the update.
Indeed, the row filter for UPDATE uses the new tuple. Maybe your non-replica
identity column contains NULL, which makes the expression evaluate to NULL and thus be treated as false.

3. 0001.patch, 
Why is the name of the existing ExclusionWhereClause node being changed, if the exact same definition is being used?
Because the ExclusionWhereClause node is used for exclusion constraints. This
patch renames the node to make it clear that it is a generic node that could be used
for other filtering features in the future.

For 0002.patch,
4.   +
 +       memset(lrel, 0, sizeof(LogicalRepRelation));

Is this needed? Apart from the above, the patch does not use or update lrel at all in that function.
Good catch. It is a leftover from a previous patch. It will be fixed in the
next patch set.

5.  PublicationRelationQual and PublicationTable have similar fields, can PublicationTable
be used in place of PublicationRelationQual instead of defining a new struct?
I don't think it is a good idea to have additional fields in a parse node. The
DDL commands use Relation (PublicationRelationQual) and the parse code uses RangeVar
(PublicationTable). publicationcmds.c uses Relation everywhere, so I decided to
create a new struct to store the Relation and qual as a list item. It also minimizes the
places you have to modify.


--
Euler Taveira

Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Thu, Mar 18, 2021, at 7:51 AM, Rahila Syed wrote:
1. 
I think the docs are being incorrectly updated to add a column to pg_partitioned_table
instead of pg_publication_rel.
Good catch.

2.   +typedef struct PublicationRelationQual
 +{
+       Oid                     relid;
+       Relation        relation;
+       Node       *whereClause;
+} PublicationRelationQual;

Can this be given a more generic name like PublicationRelationInfo, so that the same struct
can be used to store additional relation information in the future, e.g. column names, if column filtering is introduced?
Good idea. I renamed it and it'll be in the next patch set.

3. Also, in the above structure, it seems we could store just the relid and derive the relation information from it
using table_open when needed. Am I missing something?
We need the Relation. See OpenTableList(). The way this code is organized, it
opens all publication tables and appends each Relation to a list. This list is
used in PublicationAddTables() to update the catalog. I tried to minimize the
number of refactors while introducing this feature. We could probably revise
this code in the future (someone said in a previous discussion that it is weird
to open relations in one source code file -- publicationcmds.c -- and use them
in another one -- pg_publication.c).

4.  Currently in logical replication, I noticed that an UPDATE is being applied on the subscriber even if the column values
are unchanged. Can the row-filtering feature be used to change this such that, when all the OLD.columns = NEW.columns, the
row is filtered out and not sent to the subscriber? I understand this would need REPLICA IDENTITY FULL to work, but it would
be an improvement over the existing state.
This is how Postgres works.

postgres=# create table foo (a integer, b integer);
CREATE TABLE
postgres=# insert into foo values(1, 100);
INSERT 0 1
postgres=# select ctid, xmin, xmax, a, b from foo;
 ctid  |  xmin  | xmax | a |  b
-------+--------+------+---+-----
 (0,1) | 488920 |    0 | 1 | 100
(1 row)

postgres=# update foo set b = 101 where a = 1;
UPDATE 1
postgres=# select ctid, xmin, xmax, a, b from foo;
 ctid  |  xmin  | xmax | a |  b
-------+--------+------+---+-----
 (0,2) | 488921 |    0 | 1 | 101
(1 row)

postgres=# update foo set b = 101 where a = 1;
UPDATE 1
postgres=# select ctid, xmin, xmax, a, b from foo;
 ctid  |  xmin  | xmax | a |  b
-------+--------+------+---+-----
 (0,3) | 488922 |    0 | 1 | 101
(1 row)

You could probably abuse this feature and skip some UPDATEs when the old tuple is
identical to the new tuple. The question is: why would someone issue the same
command multiple times? A broken application? I would say: don't do it. Besides
that, this feature could impose overhead on a code path that already consumes
substantial CPU time. I've seen some tables with REPLICA IDENTITY FULL and dozens
of columns that would certainly contribute to increasing the replication lag.

5. Currently, any existing rows that were not replicated, when updated to match the publication quals
using UPDATE tab SET pub_qual_column = 'not_filtered' WHERE a = 1;, won't be applied, as the row
does not exist on the subscriber.  It would be good if ALTER SUBSCRIPTION ... REFRESH PUBLICATION
would help fetch such existing rows from the publisher that match the qual now (either because the row
changed or the qual changed).
I see. This should be addressed by a resynchronize feature. Such an option is
useful when you have to change the row filter. It should certainly be implemented
as an ALTER SUBSCRIPTION subcommand.

I attached a new patch set that addresses:

* fix documentation;
* rename PublicationRelationQual to PublicationRelationInfo;
* remove the memset that was leftover from a previous patch set;
* add new tests to improve coverage (INSERT/UPDATE/DELETE to exercise the row
  filter code).


--
Euler Taveira

Attachments

Re: row filtering for logical replication

From
Peter Eisentraut
Date:
On 22.03.21 03:15, Euler Taveira wrote:
> I attached a new patch set that addresses:
> 
> * fix documentation;
> * rename PublicationRelationQual to PublicationRelationInfo;
> * remove the memset that was leftover from a previous patch set;
> * add new tests to improve coverage (INSERT/UPDATE/DELETE to exercise 
> the row
>    filter code).

I have committed the 0001 patch.

Attached are a few fixup patches that I recommend you integrate into 
your patch set.  They address backward compatibility with PG13, and a 
few more stylistic issues.

I suggest you combine your 0002, 0003, and 0004 patches into one.  They 
can't be used separately, and for example the psql changes in patch 0003 
already appear as regression test output changes in 0002, so this 
arrangement isn't useful.  (0005 can be kept separately, since it's 
mostly for debugging right now.)

Attachments

Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Thu, Mar 25, 2021, at 8:15 AM, Peter Eisentraut wrote:
I have committed the 0001 patch.

Attached are a few fixup patches that I recommend you integrate into 
your patch set.  They address backward compatibility with PG13, and a 
few more stylistic issues.

I suggest you combine your 0002, 0003, and 0004 patches into one.  They 
can't be used separately, and for example the psql changes in patch 0003 
already appear as regression test output changes in 0002, so this 
arrangement isn't useful.  (0005 can be kept separately, since it's 
mostly for debugging right now.)
I appreciate your work on it. I split out the psql and pg_dump support just
because it was developed after the main patch. I expected them to be combined
into the main patch (0002) before committing it. This new patch set integrates
them into the main patch.

I totally forgot about the backward compatibility support. Good catch.  While
inspecting the code again, I made a small fix to the psql support. I added an
else, as shown below, so the query always returns the same number of columns and
we can't possibly have an issue with using a column number that is out of
range in PQgetisnull() a few lines later.

            if (pset.sversion >= 140000)
                appendPQExpBuffer(&buf,
                                  ", pg_get_expr(pr.prqual, c.oid)");
            else
                appendPQExpBuffer(&buf,
                                  ", NULL");

While testing replication from v14 to v10, I realized that even if the
tables in the publication have row filters, the data synchronization code won't
evaluate the row filter expressions. That's because the subscriber (v10) is
responsible for assembling the COPY command (possibly adding row filters) for data
synchronization, and there is no such code in released versions. I added a new
sentence to the copy_data parameter documentation saying that row filters won't be
used if the subscriber's version is prior to 14. I also included this info in the
commit message.

At this time, I didn't include the patch that changes the log_min_messages in
the row filter regression test. It was part of this patch set for testing
purposes only.

I don't expect the patch that measures row filter performance to be included
but I'm including it again in case someone wants to inspect the performance
numbers.


--
Euler Taveira

Attachments

Re: row filtering for logical replication

From
Rahila Syed
Date:
Hi Euler,

While running some tests on the v13 patches, I noticed that, in case the published table data
already exists on the subscriber database before creating the subscription, at the time of
CREATE SUBSCRIPTION/table synchronization, an error is seen, as follows.

With the patch:

2021-03-29 14:32:56.265 IST [78467] STATEMENT:  CREATE_REPLICATION_SLOT "pg_16406_sync_16390_6944995860755251708" LOGICAL pgoutput USE_SNAPSHOT
2021-03-29 14:32:56.279 IST [78467] LOG:  could not send data to client: Broken pipe
2021-03-29 14:32:56.279 IST [78467] STATEMENT:  COPY (SELECT aid, bid, abalance, filler FROM public.pgbench_accounts WHERE (aid > 0)) TO STDOUT
2021-03-29 14:32:56.279 IST [78467] FATAL:  connection to client lost
2021-03-29 14:32:56.279 IST [78467] STATEMENT:  COPY (SELECT aid, bid, abalance, filler FROM public.pgbench_accounts WHERE (aid > 0)) TO STDOUT
2021-03-29 14:33:01.302 IST [78470] LOG:  logical decoding found consistent point at 0/4E2B8460
2021-03-29 14:33:01.302 IST [78470] DETAIL:  There are no running transactions.

Without the patch:

2021-03-29 15:05:01.581 IST [79029] ERROR:  duplicate key value violates unique constraint "pgbench_branches_pkey"
2021-03-29 15:05:01.581 IST [79029] DETAIL:  Key (bid)=(1) already exists.
2021-03-29 15:05:01.581 IST [79029] CONTEXT:  COPY pgbench_branches, line 1
2021-03-29 15:05:01.583 IST [78538] LOG:  background worker "logical replication worker" (PID 79029) exited with exit code 1
2021-03-29 15:05:06.593 IST [79031] LOG:  logical replication table synchronization worker for subscription "test_sub2", table "pgbench_branches" has started

Without the patch the COPY command throws an ERROR, but with the patch, a similar scenario results in the client connection being lost.

I didn't investigate it further, but it looks like we should maintain the existing behaviour when table synchronization fails
due to duplicate data.

Thank you,
Rahila Syed

Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Mon, Mar 29, 2021, at 6:45 AM, Rahila Syed wrote:
While running some tests on the v13 patches, I noticed that, in case the published table data
already exists on the subscriber database before creating the subscription, at the time of
CREATE SUBSCRIPTION/table synchronization, an error is seen, as follows.

With the patch:

2021-03-29 14:32:56.265 IST [78467] STATEMENT:  CREATE_REPLICATION_SLOT "pg_16406_sync_16390_6944995860755251708" LOGICAL pgoutput USE_SNAPSHOT
2021-03-29 14:32:56.279 IST [78467] LOG:  could not send data to client: Broken pipe
2021-03-29 14:32:56.279 IST [78467] STATEMENT:  COPY (SELECT aid, bid, abalance, filler FROM public.pgbench_accounts WHERE (aid > 0)) TO STDOUT
2021-03-29 14:32:56.279 IST [78467] FATAL:  connection to client lost
2021-03-29 14:32:56.279 IST [78467] STATEMENT:  COPY (SELECT aid, bid, abalance, filler FROM public.pgbench_accounts WHERE (aid > 0)) TO STDOUT
2021-03-29 14:33:01.302 IST [78470] LOG:  logical decoding found consistent point at 0/4E2B8460
2021-03-29 14:33:01.302 IST [78470] DETAIL:  There are no running transactions.
Rahila, I tried to reproduce this issue with the attached script but no luck. I always get

Without the patch:

2021-03-29 15:05:01.581 IST [79029] ERROR:  duplicate key value violates unique constraint "pgbench_branches_pkey"
2021-03-29 15:05:01.581 IST [79029] DETAIL:  Key (bid)=(1) already exists.
2021-03-29 15:05:01.581 IST [79029] CONTEXT:  COPY pgbench_branches, line 1
2021-03-29 15:05:01.583 IST [78538] LOG:  background worker "logical replication worker" (PID 79029) exited with exit code 1
2021-03-29 15:05:06.593 IST [79031] LOG:  logical replication table synchronization worker for subscription "test_sub2", table "pgbench_branches" has started
... this message. The code that reports this error is from the COPY command.
The row filter modifications have no control over it. It seems your
subscriber somehow closed the replication connection, causing this issue. Can you
reproduce it consistently? If so, please share your steps.


--
Euler Taveira

Attachments

Re: row filtering for logical replication

From
Rahila Syed
Date:

Hi, 

While running some tests on the v13 patches, I noticed that, in case the published table data
already exists on the subscriber database before creating the subscription, at the time of
CREATE SUBSCRIPTION/table synchronization, an error is seen, as follows.

With the patch:

2021-03-29 14:32:56.265 IST [78467] STATEMENT:  CREATE_REPLICATION_SLOT "pg_16406_sync_16390_6944995860755251708" LOGICAL pgoutput USE_SNAPSHOT
2021-03-29 14:32:56.279 IST [78467] LOG:  could not send data to client: Broken pipe
2021-03-29 14:32:56.279 IST [78467] STATEMENT:  COPY (SELECT aid, bid, abalance, filler FROM public.pgbench_accounts WHERE (aid > 0)) TO STDOUT
2021-03-29 14:32:56.279 IST [78467] FATAL:  connection to client lost
2021-03-29 14:32:56.279 IST [78467] STATEMENT:  COPY (SELECT aid, bid, abalance, filler FROM public.pgbench_accounts WHERE (aid > 0)) TO STDOUT
2021-03-29 14:33:01.302 IST [78470] LOG:  logical decoding found consistent point at 0/4E2B8460
2021-03-29 14:33:01.302 IST [78470] DETAIL:  There are no running transactions.
Rahila, I tried to reproduce this issue with the attached script but no luck. I always get

OK, sorry for the confusion. Actually, the two errors are happening on different servers: the *Broken pipe* error on the publisher and
the following error on the subscriber. The behaviour is consistent with or without row filtering.
Without the patch:

2021-03-29 15:05:01.581 IST [79029] ERROR:  duplicate key value violates unique constraint "pgbench_branches_pkey"
2021-03-29 15:05:01.581 IST [79029] DETAIL:  Key (bid)=(1) already exists.
2021-03-29 15:05:01.581 IST [79029] CONTEXT:  COPY pgbench_branches, line 1
2021-03-29 15:05:01.583 IST [78538] LOG:  background worker "logical replication worker" (PID 79029) exited with exit code 1
2021-03-29 15:05:06.593 IST [79031] LOG:  logical replication table synchronization worker for subscription "test_sub2", table "pgbench_branches" has started
... this message. The code that reports this error is from the COPY command.
The row filter modifications have no control over it. It seems your
subscriber somehow closed the replication connection, causing this issue. Can you
reproduce it consistently? If so, please share your steps.

Please ignore the report.

Thank you,
Rahila Syed 

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Mar 29, 2021 at 6:47 PM Euler Taveira <euler@eulerto.com> wrote:
>
Few comments:
==============
1. How can we specify row filters for multiple tables for a
publication? Consider a case as below:
postgres=# CREATE TABLE tab_rowfilter_1 (a int primary key, b text);
CREATE TABLE
postgres=# CREATE TABLE tab_rowfilter_2 (c int primary key);
CREATE TABLE

postgres=# CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1,
tab_rowfilter_2 WHERE (a > 1000 AND b <> 'filtered');
ERROR:  column "a" does not exist
LINE 1: ...FOR TABLE tab_rowfilter_1, tab_rowfilter_2 WHERE (a > 1000 A...

                                                             ^

postgres=# CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1,
tab_rowfilter_2  WHERE (c > 1000);
CREATE PUBLICATION

It gives an error when I tried to specify the columns corresponding to
the first relation, but is fine for columns of the second relation.
Then I tried a few more combinations, like the one below, but they didn't work.
CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1 As t1,
tab_rowfilter_2 As t2 WHERE (t1.a > 1000 AND t1.b <> 'filtered');

Will users be allowed to specify join conditions among columns from
multiple tables?

2.
+   /*
+    * Although ALTER PUBLICATION grammar allows WHERE clause to be specified
+    * for DROP TABLE action, it doesn't make sense to allow it. We implement
+    * this restriction here, instead of complicating the grammar to enforce
+    * it.
+    */
+   if (stmt->tableAction == DEFELEM_DROP)
+   {
+       ListCell   *lc;
+
+       foreach(lc, stmt->tables)
+       {
+           PublicationTable *t = lfirst(lc);
+
+           if (t->whereClause)
+               ereport(ERROR,
+                       (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                        errmsg("cannot use a WHERE clause when removing table from publication \"%s\"",
+                               NameStr(pubform->pubname))));
+       }
+   }

Is there a reason to deal with this here separately rather than in the
ALTER PUBLICATION grammar?


-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Tue, Mar 30, 2021, at 8:23 AM, Amit Kapila wrote:
On Mon, Mar 29, 2021 at 6:47 PM Euler Taveira <euler@eulerto.com> wrote:
>
Few comments:
==============
1. How can we specify row filters for multiple tables for a
publication? Consider a case as below:
It is not possible. The row filter is a per-table option. Isn't it clear from the
synopsis? The current design allows a different row filter for each table in the same
publication. It is more flexible than a single row filter for a set of tables
(even if we supported such a variant, there are cases where the condition would
have to differ because the column names are not the same). You can easily build a
CREATE PUBLICATION command that adds the same row filter multiple times using a
DO block, or use a similar approach in your favorite language; see the sketch
below.
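
For example, a sketch of such a DO block (the publication name, table names,
and filter column are assumptions for illustration):

    CREATE PUBLICATION mypub;
    DO $$
    DECLARE
        tab text;
    BEGIN
        FOREACH tab IN ARRAY ARRAY['t1', 't2', 't3']
        LOOP
            EXECUTE format(
                'ALTER PUBLICATION mypub ADD TABLE %I WHERE (region_id = 4)',
                tab);
        END LOOP;
    END;
    $$;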

postgres=# CREATE TABLE tab_rowfilter_1 (a int primary key, b text);
CREATE TABLE
postgres=# CREATE TABLE tab_rowfilter_2 (c int primary key);
CREATE TABLE

postgres=# CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1,
tab_rowfilter_2 WHERE (a > 1000 AND b <> 'filtered');
ERROR:  column "a" does not exist
LINE 1: ...FOR TABLE tab_rowfilter_1, tab_rowfilter_2 WHERE (a > 1000 A...

                                                             ^

postgres=# CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1,
tab_rowfilter_2  WHERE (c > 1000);
CREATE PUBLICATION

It gives an error when I tried to specify the columns corresponding to
the first relation, but is fine for columns of the second relation.
Then I tried a few more combinations, like the one below, but they didn't work.
CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1 As t1,
tab_rowfilter_2 As t2 WHERE (t1.a > 1000 AND t1.b <> 'filtered');

Will users be allowed to specify join conditions among columns from
multiple tables?
It seems you are envisioning the row filter as a publication property instead of a
publication-relation property. Due to the flexibility that the latter approach
provides, I decided to use it because it covers more use cases. Regarding
allowing joins, they could possibly slow down a critical path, no? This code path
is executed for every change. If there is interest in join support, we might add
it in a future patch. The supported per-table form is sketched below.
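
For completeness, a sketch of the per-table form, reusing the quoted tables
(the filter expressions are illustrative; each WHERE attaches to the preceding
table):

    CREATE PUBLICATION tap_pub_1 FOR TABLE
        tab_rowfilter_1 WHERE (a > 1000 AND b <> 'filtered'),
        tab_rowfilter_2 WHERE (c > 1000);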

2.
+   /*
+    * Although ALTER PUBLICATION grammar allows WHERE clause to be specified
+    * for DROP TABLE action, it doesn't make sense to allow it. We implement
+    * this restriction here, instead of complicating the grammar to enforce
+    * it.
+    */
+   if (stmt->tableAction == DEFELEM_DROP)
+   {
+       ListCell   *lc;
+
+       foreach(lc, stmt->tables)
+       {
+           PublicationTable *t = lfirst(lc);
+
+           if (t->whereClause)
+               ereport(ERROR,
+                       (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                        errmsg("cannot use a WHERE clause when removing table from publication \"%s\"",
+                               NameStr(pubform->pubname))));
+       }
+   }

Is there a reason to deal with this here separately rather than in the
ALTER PUBLICATION grammar?
Good question. IIRC the issue is that AlterPublicationStmt->tables has a list
element that was a relation_expr_list and was converted to
publication_table_list. If we share 'tables' with relation_expr_list (for ALTER
PUBLICATION ... DROP TABLE) and publication_table_list (for the other ALTER
PUBLICATION ... ADD|SET TABLE), OpenTableList() has to know what list
element it is dealing with. I think I came to the conclusion that it is less
ugly to avoid changing OpenTableList() and CloseTableList().

[Doing some experimentation...]

Here is a patch that removes the referred code. It uses 2 distinct list
elements: relation_expr_list for ALTER PUBLICATION ... DROP TABLE and
publication_table_list for ALTER PUBLICATION ... ADD|SET TABLE. A new
parameter was introduced to deal with the different elements of the list
'tables'.


--
Euler Taveira

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Mar 31, 2021 at 7:17 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Mar 30, 2021, at 8:23 AM, Amit Kapila wrote:
>
> On Mon, Mar 29, 2021 at 6:47 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> Few comments:
> ==============
> 1. How can we specify row filters for multiple tables for a
> publication? Consider a case as below:
>
> It is not possible. The row filter is a per-table option. Isn't it clear from the
> synopsis?
>

Sorry, it seems I didn't read it properly earlier; now I've got it.

>
> 2.
> +   /*
> +    * Although ALTER PUBLICATION grammar allows WHERE clause to be specified
> +    * for DROP TABLE action, it doesn't make sense to allow it. We implement
> +    * this restriction here, instead of complicating the grammar to enforce
> +    * it.
> +    */
> +   if (stmt->tableAction == DEFELEM_DROP)
> +   {
> +       ListCell   *lc;
> +
> +       foreach(lc, stmt->tables)
> +       {
> +           PublicationTable *t = lfirst(lc);
> +
> +           if (t->whereClause)
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                        errmsg("cannot use a WHERE clause when removing table from publication \"%s\"",
> +                               NameStr(pubform->pubname))));
> +       }
> +   }
>
> Is there a reason to deal with this here separately rather than in the
> ALTER PUBLICATION grammar?
>
> Good question. IIRC the issue is that AlterPublicationStmt->tables has a list
> element that was a relation_expr_list and was converted to
> publication_table_list. If we share 'tables' with relation_expr_list (for ALTER
> PUBLICATION ... DROP TABLE) and publication_table_list (for the other ALTER
> PUBLICATION ... ADD|SET TABLE), OpenTableList() has to know what list
> element it is dealing with. I think I came to the conclusion that it is less
> ugly to avoid changing OpenTableList() and CloseTableList().
>
> [Doing some experimentation...]
>
> Here is a patch that removes the referred code.
>

Thanks, a few more comments:
1. In pgoutput_change, we are always sending the schema even though we
don't send the actual data because of row filters. It may not be a problem
in many cases, but I guess for some odd cases we can avoid sending
the extra information.

2. In get_rel_sync_entry(), we are caching the qual for the rel_sync_entry
even though we won't publish it, which seems unnecessary.

3.
@@ -1193,5 +1365,11 @@ rel_sync_cache_publication_cb(Datum arg, int
cacheid, uint32 hashvalue)
  entry->pubactions.pubupdate = false;
  entry->pubactions.pubdelete = false;
  entry->pubactions.pubtruncate = false;
+
+ if (entry->qual != NIL)
+ list_free_deep(entry->qual);

Seeing one previous comment in this thread [1], I am wondering if
list_free_deep is enough here?

4. Can we write explicitly in the docs that row filters won't apply
to the TRUNCATE operation?

5. Getting some whitespace errors:
git am /d/PostgreSQL/Patches/logical_replication/row_filter/v14-0001-Row-filter-for-logical-replication.patch
.git/rebase-apply/patch:487: trailing whitespace.

warning: 1 line adds whitespace errors.
Applying: Row filter for logical replication


[1] - https://www.postgresql.org/message-id/20181123161933.jpepibtyayflz2xg%40alvherre.pgsql

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Andres Freund
Date:
Hi,

As far as I can tell you have not *AT ALL* addressed that it is *NOT
SAFE* to evaluate arbitrary expressions from within an output
plugin. Despite that having been brought up multiple times.


> +static ExprState *
> +pgoutput_row_filter_prepare_expr(Node *rfnode, EState *estate)
> +{
> +    ExprState  *exprstate;
> +    Oid            exprtype;
> +    Expr       *expr;
> +
> +    /* Prepare expression for execution */
> +    exprtype = exprType(rfnode);
> +    expr = (Expr *) coerce_to_target_type(NULL, rfnode, exprtype, BOOLOID,
> +                                          -1, COERCION_ASSIGNMENT,
> +                                          COERCE_IMPLICIT_CAST, -1);
> +
> +    if (expr == NULL)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_CANNOT_COERCE),
> +                 errmsg("row filter returns type %s that cannot be coerced to the expected type %s",
> +                        format_type_be(exprtype),
> +                        format_type_be(BOOLOID)),
> +                 errhint("You will need to rewrite the row filter.")));
> +
> +    exprstate = ExecPrepareExpr(expr, estate);
> +
> +    return exprstate;
> +}
> +
> +/*
> + * Evaluates row filter.
> + *
> + * If the row filter evaluates to NULL, it is taken as false i.e. the change
> + * isn't replicated.
> + */
> +static inline bool
> +pgoutput_row_filter_exec_expr(ExprState *state, ExprContext *econtext)
> +{
> +    Datum        ret;
> +    bool        isnull;
> +
> +    Assert(state != NULL);
> +
> +    ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
> +
> +    elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
> +         DatumGetBool(ret) ? "true" : "false",
> +         isnull ? "true" : "false");
> +
> +    if (isnull)
> +        return false;
> +
> +    return DatumGetBool(ret);
> +}

> +/*
> + * Change is checked against the row filter, if any.
> + *
> + * If it returns true, the change is replicated, otherwise, it is not.
> + */
> +static bool
> +pgoutput_row_filter(Relation relation, HeapTuple oldtuple, HeapTuple newtuple, List *rowfilter)
> +{
> +    TupleDesc    tupdesc;
> +    EState       *estate;
> +    ExprContext *ecxt;
> +    MemoryContext oldcxt;
> +    ListCell   *lc;
> +    bool        result = true;
> +
> +    /* Bail out if there is no row filter */
> +    if (rowfilter == NIL)
> +        return true;
> +
> +    elog(DEBUG3, "table \"%s.%s\" has row filter",
> +         get_namespace_name(get_rel_namespace(RelationGetRelid(relation))),
> +         get_rel_name(relation->rd_id));
> +
> +    tupdesc = RelationGetDescr(relation);
> +
> +    estate = create_estate_for_relation(relation);
> +
> +    /* Prepare context per tuple */
> +    ecxt = GetPerTupleExprContext(estate);
> +    oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
> +    ecxt->ecxt_scantuple = ExecInitExtraTupleSlot(estate, tupdesc, &TTSOpsHeapTuple);
> +    MemoryContextSwitchTo(oldcxt);
> +
> +    ExecStoreHeapTuple(newtuple ? newtuple : oldtuple, ecxt->ecxt_scantuple, false);
> +    /*
> +     * If the subscription has multiple publications and the same table has a
> +     * different row filter in these publications, all row filters must be
> +     * matched in order to replicate this change.
> +     */
> +    foreach(lc, rowfilter)
> +    {
> +        Node       *rfnode = (Node *) lfirst(lc);
> +        ExprState  *exprstate;
> +
> +        /* Prepare for expression execution */
> +        exprstate = pgoutput_row_filter_prepare_expr(rfnode, estate);
> +
> +        /* Evaluates row filter */
> +        result = pgoutput_row_filter_exec_expr(exprstate, ecxt);

Also, this still seems like an *extremely* expensive thing to do for
each tuple. It'll often be *vastly* faster to just send the data to the
other side than to evaluate these expressions for every tuple.

This just cannot be done once per tuple. It has to be cached.

I don't see how these issues can be addressed in the next 7 days,
therefore I think this unfortunately needs to be marked as returned with
feedback.

Greetings,

Andres Freund



Re: row filtering for logical replication

From
Peter Smith
Date:
On Wed, Mar 31, 2021 at 12:47 PM Euler Taveira <euler@eulerto.com> wrote:
>
....

> Good question. IIRC the issue is that AlterPublicationStmt->tables has a list
> element that was a relation_expr_list and was converted to
> publication_table_list. If we share 'tables' with relation_expr_list (for ALTER
> PUBLICATION ... DROP TABLE) and publication_table_list (for the other ALTER
> PUBLICATION ... ADD|SET TABLE), OpenTableList() has to know what list
> element it is dealing with. I think I came to the conclusion that it is less
> ugly to avoid changing OpenTableList() and CloseTableList().
>
> [Doing some experimentation...]
>
> Here is a patch that removes the referred code. It uses 2 distinct list
> elements: relation_expr_list for ALTER PUBLICATION ... DROP TABLE and
> publication_table_list for ALTER PUBLICATION ... ADD|SET TABLE. A new
> parameter was introduced to deal with the different elements of the list
> 'tables'.

AFAIK this is the latest patch available, but FYI it no longer applies
cleanly on HEAD.

git apply ../patches_misc/0001-Row-filter-for-logical-replication.patch
../patches_misc/0001-Row-filter-for-logical-replication.patch:518:
trailing whitespace.
error: patch failed: src/backend/parser/gram.y:426
error: src/backend/parser/gram.y: patch does not apply
error: patch failed: src/backend/replication/logical/worker.c:340
error: src/backend/replication/logical/worker.c: patch does not apply

--------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Mon, May 10, 2021, at 5:19 AM, Peter Smith wrote:
AFAIK this is the latest patch available, but FYI it no longer applies
cleanly on HEAD.
Peter, the last patch has been broken since f3b141c4825. I'm still working on it for
the next CF. I already addressed the points suggested by Amit in his last
review; however, I'm still working on a cache for evaluating expressions, as
suggested by Andres. I hope to post a new patch soon.


--
Euler Taveira

Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, May 10, 2021 at 11:42 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, May 10, 2021, at 5:19 AM, Peter Smith wrote:
>
> AFAIK this is the latest patch available, but FYI it no longer applies
> cleanly on HEAD.
>
> Peter, the last patch has been broken since f3b141c4825. I'm still working on it for
> the next CF. I already addressed the points suggested by Amit in his last
> review; however, I'm still working on a cache for evaluating expressions, as
> suggested by Andres. I hope to post a new patch soon.

Is there any ETA for your new patch?

In the interim can you rebase the old patch just so it builds and I can try it?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jun 9, 2021 at 5:33 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, May 10, 2021 at 11:42 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Mon, May 10, 2021, at 5:19 AM, Peter Smith wrote:
> >
> > AFAIK this is the latest patch available, but FYI it no longer applies
> > cleanly on HEAD.
> >
> > Peter, the last patch has been broken since f3b141c4825. I'm still working on it for
> > the next CF. I already addressed the points suggested by Amit in his last
> > review; however, I'm still working on a cache for evaluating expressions, as
> > suggested by Andres. I hope to post a new patch soon.
>
> Is there any ETA for your new patch?
>
> In the interim can you rebase the old patch just so it builds and I can try it?
>

I have rebased the patch so that you can try it out. The main thing I
have done is to remove the changes in worker.c and create a specialized
function to create an estate for pgoutput.c, as I don't think we need what
is done in worker.c.

Euler, do let me know if you are not happy with the change in pgoutput.c?

-- 
With Regards,
Amit Kapila.

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Fri, Jun 18, 2021 at 9:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
[...]
> I have rebased the patch so that you can try it out. The main thing I
> have done is to remove the changes in worker.c and create a specialized
> function to create an estate for pgoutput.c, as I don't think we need what
> is done in worker.c.

Thanks for the recent rebase.

- The v15 patch applies OK (albeit with whitespace warning)
- make check is passing OK
- the new TAP tests 020_row_filter is passing OK.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Fri, Jun 18, 2021, at 8:40 AM, Amit Kapila wrote:
I have rebased the patch so that you can try it out. The main thing I
have done is to remove the changes in worker.c and create a specialized
function to create an estate for pgoutput.c, as I don't think we need what
is done in worker.c.

Euler, do let me know if you are not happy with the change in pgoutput.c?
Amit, thanks for rebasing this patch. I already had a similar rebased patch in
my local tree. A recent commit broke your v15 version, so I rebased it.

I like the idea of a simple create_estate_for_relation() function (I fixed an
oversight regarding GetCurrentCommandId(false) because it is used only for
read-only purposes). This patch also replaces all references to version 14.

Commit ef948050 made some changes in the snapshot handling. Setting the current
active snapshot might not be required, but future changes to allow functions
will need it.

As with the previous patches, this one includes commits (0002 and 0003) that are
not intended to be committed. They are available for test-only purposes.


--
Euler Taveira

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
Hi.

I have been looking at the latest patch set (v16). Below are my review
comments and some patches.

The patches are:
v16-0001. This is identical to your previously posted 0001 patch. (I
only attached it again hoping it can allow the cfbot to keep working
OK).
v16-0002,0003. These are for demonstrating some of the review comments
v16-0004. This is a POC plan cache for your consideration.

//////////

REVIEW COMMENTS
===============

1. Patch 0001 comment - typo

you can optionally filter rows that does not satisfy a WHERE condition

typo: does/do

~~

2. Patch 0001 comment - typo

The WHERE clause should probably contain only columns that are part of
the primary key or that are covered by REPLICA IDENTITY. Otherwise,
and DELETEs won't be replicated.

typo: "Otherwise, and DELETEs" ??

~~

3. Patch 0001 comment - typo and clarification

If your publication contains partitioned table, the parameter
publish_via_partition_root determines if it uses the partition row filter (if
the parameter is false -- default) or the partitioned table row filter.

Typo: "contains partitioned table" -> "contains a partitioned table"

Also, perhaps the text "or the partitioned table row filter." should
say "or the root partitioned table row filter." to disambiguate the
case where there are more levels of partitions like A->B->C. e.g. What
filter does C use?

~~

4. src/backend/catalog/pg_publication.c - misleading names

-publication_add_relation(Oid pubid, Relation targetrel,
+publication_add_relation(Oid pubid, PublicationRelationInfo *targetrel,
  bool if_not_exists)

Leaving this parameter name as "targetrel" seems a bit misleading now
in the function code. Maybe this should be called something like "pri"
which is consistent with other places where you have declared
PublicationRelationInfo.

Also, consider declaring some local variables so that the patch may
have less impact on existing code. e.g.
Oid relid = pri->relid;
Relation targetrel = pri->relation;

~~

5. src/backend/commands/publicationcmds.c - simplify code

- rels = OpenTableList(stmt->tables);
+ if (stmt->tableAction == DEFELEM_DROP)
+ rels = OpenTableList(stmt->tables, true);
+ else
+ rels = OpenTableList(stmt->tables, false);

Consider writing that code more simply as just:

rels = OpenTableList(stmt->tables, stmt->tableAction == DEFELEM_DROP);

~~

6. src/backend/commands/publicationcmds.c - bug?

- CloseTableList(rels);
+ CloseTableList(rels, false);
 }

Is this a potential bug? When you called OpenTableList the 2nd param
was maybe true/false, so is it correct to be unconditionally false
here? I am not sure.

~~

7. src/backend/commands/publicationcmds.c - OpenTableList function comment.

  * Open relations specified by a RangeVar list.
+ * AlterPublicationStmt->tables has a different list element, hence, is_drop
+ * indicates if it has a RangeVar (true) or PublicationTable (false).
  * The returned tables are locked in ShareUpdateExclusiveLock mode in order to
  * add them to a publication.

I am not sure about this. Should that comment instead say "indicates
if it has a Relation (true) or PublicationTable (false)"?

~~

8. src/backend/commands/publicationcmds.c - OpenTableList

- RangeVar   *rv = castNode(RangeVar, lfirst(lc));
- bool recurse = rv->inh;
+ PublicationTable *t = NULL;
+ RangeVar   *rv;
+ bool recurse;
  Relation rel;
  Oid myrelid;

+ if (is_drop)
+ {
+ rv = castNode(RangeVar, lfirst(lc));
+ }
+ else
+ {
+ t = lfirst(lc);
+ rv = castNode(RangeVar, t->relation);
+ }
+
+ recurse = rv->inh;
+

For some reason it feels kind of clunky to me for this function to be
processing the list differently according to the 2nd param. e.g. the
name "is_drop" seems quite unrelated to the function code, and more to
do with where it was called from. Sorry, I don't have any better ideas
for improvement atm.

~~

9. src/backend/commands/publicationcmds.c - OpenTableList bug?

- rels = lappend(rels, rel);
+ pri = palloc(sizeof(PublicationRelationInfo));
+ pri->relid = myrelid;
+ pri->relation = rel;
+ if (!is_drop)
+ pri->whereClause = t->whereClause;
+ rels = lappend(rels, pri);

I felt there may be a possible bug here, because there seems to be no code
explicitly assigning whereClause = NULL if "is_drop" is true, so
it can have a garbage value which could cause problems later.
Maybe this is fixed by using palloc0.

The same thing occurs twice in this function.

~~

10. src/backend/commands/publicationcmds.c - CloseTableList function comment

@@ -587,16 +609,28 @@ OpenTableList(List *tables)
  * Close all relations in the list.
  */
 static void
-CloseTableList(List *rels)
+CloseTableList(List *rels, bool is_drop)
 {

Probably the meaning of "is_drop" should be described in this function comment.

~~

11. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry signature.

-static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Oid relid);
+static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Relation rel);

I see that this function signature is modified but I did not see how
this parameter refactoring is actually related to the RowFilter patch.
Perhaps I am mistaken, but IIUC this only changes the relid =
RelationGetRelid(rel); to be done inside this function instead of
being done outside by the callers.

It impacts other code like in pgoutput_truncate:

@@ -689,12 +865,11 @@ pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
  for (i = 0; i < nrelations; i++)
  {
  Relation relation = relations[i];
- Oid relid = RelationGetRelid(relation);

  if (!is_publishable_relation(relation))
  continue;

- relentry = get_rel_sync_entry(data, relid);
+ relentry = get_rel_sync_entry(data, relation);

  if (!relentry->pubactions.pubtruncate)
  continue;
@@ -704,10 +879,10 @@ pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
  * root tables through it.
  */
  if (relation->rd_rel->relispartition &&
- relentry->publish_as_relid != relid)
+ relentry->publish_as_relid != relentry->relid)
  continue;

- relids[nrelids++] = relid;
+ relids[nrelids++] = relentry->relid;
  maybe_send_schema(ctx, txn, change, relation, relentry);
  }
So maybe this is a good refactor or maybe not, but I felt this should
not be included as part of the RowFilter patch unless it is really
necessary.

~~

12. src/backend/replication/pgoutput/pgoutput.c - missing function comments

The static functions create_estate_for_relation and
pgoutput_row_filter_prepare_expr probably should be commented.

~~

13. src/backend/replication/pgoutput/pgoutput.c -
pgoutput_row_filter_prepare_expr function name

+static ExprState *pgoutput_row_filter_prepare_expr(Node *rfnode,
EState *estate);

This function has an unfortunate name with the word "prepare" in it. I
wonder if a different name can be found for this function to avoid any
confusion with pgoutput functions (coming soon) which are related to
the two-phase commit "prepare".

~~

14. src/bin/psql/describe.c

+ if (!PQgetisnull(tabres, j, 2))
+ appendPQExpBuffer(&buf, " WHERE (%s)",
+   PQgetvalue(tabres, j, 2));

Because the where-clause value already has enclosing parentheses, using
" WHERE (%s)" produces doubled parentheses here. e.g. you can see the effect
in your src/test/regress/expected/publication.out file. I think this
should be changed to " WHERE %s" to give better output.

~~

15. src/include/catalog/pg_publication.h - new typedef

+typedef struct PublicationRelationInfo
+{
+ Oid relid;
+ Relation relation;
+ Node    *whereClause;
+} PublicationRelationInfo;
+

The new PublicationRelationInfo should also be added
src/tools/pgindent/typedefs.list

~~

16. src/include/nodes/parsenodes.h - new typedef

+typedef struct PublicationTable
+{
+ NodeTag type;
+ RangeVar   *relation; /* relation to be published */
+ Node    *whereClause; /* qualifications */
+} PublicationTable;

The new PublicationTable should also be added src/tools/pgindent/typedefs.list

~~

17. sql/publication.sql - show more output

+CREATE PUBLICATION testpub5 FOR TABLE testpub_rf_tbl1,
testpub_rf_tbl2 WHERE (c <> 'test' AND d < 5);
+RESET client_min_messages;
+ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000
AND e < 2000);
+ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2;
+-- remove testpub_rf_tbl1 and add testpub_rf_tbl3 again (another
WHERE expression)
+ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300
AND e < 500);
+-- fail - functions disallowed
+ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl4 WHERE (length(g) < 6);
+-- fail - WHERE not allowed in DROP
+ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl3 WHERE (e < 27);
+\dRp+ testpub5

I felt that it would be better to have a "\dRp+ testpub5" after each
of the valid ALTER PUBLICATION steps to show the intermediate results
also; not just the final one at the end.

(PSA a temp patch showing what I mean by this review comment)

~~

18. src/test/subscription/t/020_row_filter.pl - rename file

I think this file should be renamed to 021_row_filter.pl as there is
already an 020 TAP test present.

~~

19. src/test/subscription/t/020_row_filter.pl - test comments

AFAIK the test cases are all OK, but it was really quite hard to
review these TAP tests to try to determine what the expected results
should be.

I found that I had to add my own comments to the file so I could
understand what was going on, so I think the TAP test can benefit a lot
from having many more comments describing how the expected results are
determined.

Also, the filtering does not take place at INSERT time; really it is
affected only by which publications the subscription has subscribed
to. So I thought some of the existing comments (although correct) are
misplaced.

(PSA a temp patch showing what I mean by this review comment)

~~~

20. src/test/subscription/t/020_row_filter.pl - missing test case?

There are some partition tests, but I did not see any test with partitions
3 levels deep, like A->B->C, so I was not sure whether there is any
case where C would ever make use of the filter of its parent B, or whether
it would only use the filter of the root A.

~~

21. src/test/subscription/t/020_row_filter.pl - missing test case?

If the same table is in multiple publications they can each have a row
filter. And a subscription might subscribe to some but not all of
those publications. I think this scenario is only partly tested.

e.g.
pub_1 has tableX with RowFilter1
pub_2 has tableX with RowFilter2

Then sub_12 subscribes to pub_1, pub_2
This is already tested in your TAP test (I think) and it makes sure
both filters are applied

But if there was also
pub_3 has tableX with RowFilter3

Then sub_12 should still be checking only RowFilter1 AND
RowFilter2 (but NOT RowFilter3). I think this scenario is not
tested; a sketch follows.
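
A sketch of the missing case (table and filters made up):

    -- publisher: tableX assumed to have integer columns a and b
    CREATE PUBLICATION pub_1 FOR TABLE tableX WHERE (a > 10);
    CREATE PUBLICATION pub_2 FOR TABLE tableX WHERE (b < 100);
    CREATE PUBLICATION pub_3 FOR TABLE tableX WHERE (a < 0);

    -- subscriber: subscribes to pub_1 and pub_2 only
    CREATE SUBSCRIPTION sub_12
        CONNECTION 'host=localhost dbname=postgres'
        PUBLICATION pub_1, pub_2;
    -- expected: a row is replicated only if it satisfies (a > 10) AND (b < 100);
    -- RowFilter3 from pub_3 must play no part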

////////////////

POC PATCH FOR PLAN CACHE
========================

PSA a POC patch for a plan cache which gets used inside the
pgoutput_row_filter function instead of calling prepare for every row.
I think this is implementing something like Andes was suggesting a
while back [1].

Measurements with/without this plan cache:

Time spent processing within the pgoutput_row_filter function
- Data was captured using the same technique as the
0002-Measure-row-filter-overhead.patch.
- Inserted 1000 rows, sampled data for the first 100 times in this function.
not cached: average ~ 28.48 us
cached: average ~ 9.75 us

Replication times:
- Using tables and row filters same as in Onder's commands_to_test_perf.sql [2]
100K rows - not cached: ~ 42sec, 43sec, 44sec
100K rows - cached: ~ 41sec, 42sec, 42 sec.

There does seem to be a tiny gain achieved by having the plan cache,
but I think the gain might be a lot less than what people were
expecting.

Unless there are millions of rows the speedup may be barely noticeable.

--------
[1] https://www.postgresql.org/message-id/20210128022032.eq2qqc6zxkqn5syt%40alap3.anarazel.de
[2] https://www.postgresql.org/message-id/CACawEhW_iMnY9XK2tEb1ig%2BA%2BgKeB4cxdJcxMsoCU0SaKPExxg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

Re: row filtering for logical replication

От
Greg Nancarrow
Дата:
On Thu, Jul 1, 2021 at 10:43 AM Euler Taveira <euler@eulerto.com> wrote:
>
> Amit, thanks for rebasing this patch. I already had a similar rebased patch in
> my local tree. A recent patch broke your version v15 so I rebased it.
>
> I like the idea of a simple create_estate_for_relation() function (I fixed an
> oversight regarding GetCurrentCommandId(false) because it is used only for
> read-only purposes). This patch also replaces all references to version 14.
>
> Commit ef948050 made some changes in the snapshot handling. Set the current
> active snapshot might not be required but future changes to allow functions
> will need it.
>
> As the previous patches, it includes commits (0002 and 0003) that are not
> intended to be committed. They are available for test-only purposes.
>

I have some review comments on the "Row filter for logical replication" patch:

(1) Suggested update to patch comment:
(There are some missing words and things which could be better expressed)


This feature adds row filtering for publication tables.
When a publication is defined or modified, rows that don't satisfy a WHERE
clause may be optionally filtered out. This allows a database or set of
tables to be partially replicated. The row filter is per table, which allows
different row filters to be defined for different tables. A new row filter
can be added simply by specifying a WHERE clause after the table name.
The WHERE clause must be enclosed by parentheses.

The WHERE clause should probably contain only columns that are part of the
primary key or that are covered by REPLICA IDENTITY. Otherwise, any DELETEs
won't be replicated. DELETE uses the old row version (that is limited to
primary key or REPLICA IDENTITY) to evaluate the row filter. INSERT and UPDATE
use the new row version to evaluate the row filter, hence, you can use any
column. If the row filter evaluates to NULL, it returns false. For simplicity,
functions are not allowed; that could possibly be addressed in a future patch.

If you choose to do the initial table synchronization, only data that satisfies
the row filters is sent. If the subscription has several publications in which
a table has been published with different WHERE clauses, rows must satisfy all
expressions to be copied. If the subscriber is a pre-15 version, data
synchronization won't use row filters even if they are defined in the
publisher; previous versions cannot handle row filters.

If your publication contains a partitioned table, the publication parameter
publish_via_partition_root determines if it uses the partition row filter (if
the parameter is false, the default) or the root partitioned table row filter.


(2) Some inconsistent error message wording:

Currently:
err = _("cannot use subquery in publication WHERE expression");

Suggest changing it to:
err = _("subqueries are not allowed in publication WHERE expressions");


Other examples from the patch:
err = _("aggregate functions are not allowed in publication WHERE expressions");
err = _("grouping operations are not allowed in publication WHERE expressions");
err = _("window functions are not allowed in publication WHERE expressions");
errmsg("functions are not allowed in publication WHERE expressions"),
err = _("set-returning functions are not allowed in publication WHERE
expressions");


(3) The current code still allows arbitrary code execution, e.g. via a
user-defined operator:

e.g.
publisher:

CREATE OR REPLACE FUNCTION myop(left_arg INTEGER, right_arg INTEGER)
RETURNS BOOL AS
$$
BEGIN
  RAISE NOTICE 'I can do anything here!';
  RETURN left_arg > right_arg;
 END;
$$ LANGUAGE PLPGSQL VOLATILE;

CREATE OPERATOR >>>> (
  PROCEDURE = myop,
  LEFTARG = INTEGER,
  RIGHTARG = INTEGER
);

CREATE PUBLICATION tap_pub FOR TABLE test_tab WHERE (a >>>> 5);

subscriber:
CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost dbname=test_pub
application_name=tap_sub' PUBLICATION tap_pub;


Perhaps add the following after the existing shell error-check in make_op():

/* User-defined operators are not allowed in publication WHERE clauses */
if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE && opform->oid
>= FirstNormalObjectId)
    ereport(ERROR,
    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    errmsg("user-defined operators are not allowed in publication
WHERE expressions"),
    parser_errposition(pstate, location)));


Also, I believe it's also allowing user-defined CASTs (so could add a
similar check to above in transformTypeCast()).
Ideally, it would be preferable to validate/check publication WHERE
expressions in one central place, rather than scattered all over the
place, but that might be easier said than done.
You need to update the patch comment accordingly.


(4) src/backend/replication/pgoutput/pgoutput.c
pgoutput_change()

The 3 added calls to pgoutput_row_filter() are returning from
pgoutput_change(), if false is returned, but instead they should break
from the switch, otherwise cleanup code is missed. This is surely a
bug.

e.g.
(3 similar cases of this)

+ if (!pgoutput_row_filter(relation, NULL, tuple, relentry->qual))
+ return;

should be:

+ if (!pgoutput_row_filter(relation, NULL, tuple, relentry->qual))
+ break;


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

От
Greg Nancarrow
Дата:
On Thu, Jul 1, 2021 at 10:43 AM Euler Taveira <euler@eulerto.com> wrote:
>
>
> Amit, thanks for rebasing this patch. I already had a similar rebased patch in
> my local tree. A recent patch broke your version v15 so I rebased it.
>

Hi,

I did some testing of the performance of the row filtering, in the
case of the publisher INSERTing 100,000 rows, using a similar test
setup and timing as previously used in the “commands_to_perf_test.sql“
script posted by Önder Kalacı.

I found that with the call to ExecInitExtraTupleSlot() in
pgoutput_row_filter(), then the performance of pgoutput_row_filter()
degrades considerably over the 100,000 invocations, and on my system
it took about 43 seconds to filter and send to the subscriber.
However, by caching the tuple table slot in RelationSyncEntry, this
duration can be dramatically reduced by 38+ seconds.
A further improvement can be made using this in combination with
Peter's plan cache (v16-0004).
I've attached a patch for this, which relies on the latest v16-0001
and v16-0004 patches posted by Peter Smith (noting that v16-0001 is
identical to your previously-posted 0001 patch).
Also attached is a graph (created by Peter Smith – thanks!) detailing
the performance improvement.

Regards,
Greg Nancarrow
Fujitsu Australia

Вложения

Re: row filtering for logical replication

От
"Euler Taveira"
Дата:
On Wed, Jul 7, 2021, at 2:24 AM, Greg Nancarrow wrote:
I found that with the call to ExecInitExtraTupleSlot() in
pgoutput_row_filter(), then the performance of pgoutput_row_filter()
degrades considerably over the 100,000 invocations, and on my system
it took about 43 seconds to filter and send to the subscriber.
However, by caching the tuple table slot in RelationSyncEntry, this
duration can be dramatically reduced by 38+ seconds.
A further improvement can be made using this in combination with
Peter's plan cache (v16-0004).
I've attached a patch for this, which relies on the latest v16-0001
and v16-0004 patches posted by Peter Smith (noting that v16-0001 is
identical to your previously-posted 0001 patch).
Also attached is a graph (created by Peter Smith – thanks!) detailing
the performance improvement.
Greg, I like your suggestion and already integrate it (I replaced
ExecAllocTableSlot() with MakeSingleTupleTableSlot() because we don't need the
List). I'm still working on a new version to integrate all suggestions that you
and Peter did. I have a similar code to Peter's plan cache and I'm working on
merging both ideas together. I'm done for today but I'll continue tomorrow.


--
Euler Taveira

Re: row filtering for logical replication

От
Greg Nancarrow
Дата:
On Thu, Jul 8, 2021 at 10:34 AM Euler Taveira <euler@eulerto.com> wrote:
>
> Greg, I like your suggestion and already integrate it (I replaced
> ExecAllocTableSlot() with MakeSingleTupleTableSlot() because we don't need the
> List).

Yes I agree, I found the same thing, it's not needed.

>I'm still working on a new version to integrate all suggestions that you
> and Peter did. I have a similar code to Peter's plan cache and I'm working on
> merging both ideas together. I'm done for today but I'll continue tomorrow.
>

I also realised that my 0005 patch wasn't handling RelationSyncEntry
invalidation, so I've updated it.
For completeness, I'm posting the complete patch set with the updates,
so you can look at it and compare with yours, and also it'll keep the
cfbot happy until you post your updated patch.

Regards,
Greg Nancarrow
Fujitsu Australia

Вложения

Re: row filtering for logical replication

От
"Euler Taveira"
Дата:
On Fri, Jul 2, 2021, at 4:29 AM, Peter Smith wrote:
Hi.

I have been looking at the latest patch set (v16). Below are my review
comments and some patches.

Peter, thanks for your detailed review. Comments are inline.

1. Patch 0001 comment - typo

you can optionally filter rows that does not satisfy a WHERE condition

typo: does/do
Fixed.


2. Patch 0001 comment - typo

The WHERE clause should probably contain only columns that are part of
the primary key or that are covered by REPLICA IDENTITY. Otherwise,
and DELETEs won't be replicated.

typo: "Otherwise, and DELETEs" ??
Fixed.

3. Patch 0001 comment - typo and clarification

If your publication contains partitioned table, the parameter
publish_via_partition_root determines if it uses the partition row filter (if
the parameter is false -- default) or the partitioned table row filter.

Typo: "contains partitioned table" -> "contains a partitioned table"
Fixed.

Also, perhaps the text "or the partitioned table row filter." should
say "or the root partitioned table row filter." to disambiguate the
case where there are more levels of partitions like A->B->C. e.g. What
filter does C use?
I agree it can be confusing. BTW, CREATE PUBLICATION does not mention that the
root partitioned table is used. We should improve that sentence too.

4. src/backend/catalog/pg_publication.c - misleading names

-publication_add_relation(Oid pubid, Relation targetrel,
+publication_add_relation(Oid pubid, PublicationRelationInfo *targetrel,
  bool if_not_exists)

Leaving this parameter name as "targetrel" seems a bit misleading now
in the function code. Maybe this should be called something like "pri"
which is consistent with other places where you have declared
PublicationRelationInfo.

Also, consider declaring some local variables so that the patch may
have less impact on existing code. e.g.
Oid relid = pri->relid
Relation *targetrel = relationinfo->relation
Done.

5. src/backend/commands/publicationcmds.c - simplify code

- rels = OpenTableList(stmt->tables);
+ if (stmt->tableAction == DEFELEM_DROP)
+ rels = OpenTableList(stmt->tables, true);
+ else
+ rels = OpenTableList(stmt->tables, false);

Consider writing that code more simply as just:

rels = OpenTableList(stmt->tables, stmt->tableAction == DEFELEM_DROP);
It is not a common pattern to use an expression as a function argument in
Postgres. I prefer to use a variable with a suggestive name.

6. src/backend/commands/publicationcmds.c - bug?

- CloseTableList(rels);
+ CloseTableList(rels, false);
}

Is this a potential bug? When you called OpenTableList the 2nd param
was maybe true/false, so is it correct to be unconditionally false
here? I am not sure.
Good catch.

7. src/backend/commands/publicationcmds.c - OpenTableList function comment.

  * Open relations specified by a RangeVar list.
+ * AlterPublicationStmt->tables has a different list element, hence, is_drop
+ * indicates if it has a RangeVar (true) or PublicationTable (false).
  * The returned tables are locked in ShareUpdateExclusiveLock mode in order to
  * add them to a publication.

I am not sure about this. Should that comment instead say "indicates
if it has a Relation (true) or PublicationTable (false)"?
Fixed.

8. src/backend/commands/publicationcmds.c - OpenTableList
>8

For some reason it feels kind of clunky to me for this function to be
processing the list differently according to the 2nd param. e.g. the
name "is_drop" seems quite unrelated to the function code, and more to
do with where it was called from. Sorry, I don't have any better ideas
for improvement atm.
My suggestion is to rename it to "pub_drop_table".

9. src/backend/commands/publicationcmds.c - OpenTableList bug?
>8

I felt this may be a possible bug because there seems to be no code
explicitly assigning whereClause = NULL if "is_drop" is true, so it
could have a garbage value causing problems later. Maybe this is fixed
by using palloc0.
Fixed.

10. src/backend/commands/publicationcmds.c - CloseTableList function comment
>8

Probably the meaning of "is_drop" should be described in this function comment.
Done.

11. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry signature.

-static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Oid relid);
+static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Relation rel);

I see that this function signature is modified but I did not see how
this parameter refactoring is actually related to the RowFilter patch.
Perhaps I am mistaken, but IIUC this only changes the relid =
RelationGetRelid(rel); to be done inside this function instead of
being done outside by the callers.
It is not critical for this patch so I removed it.

12. src/backend/replication/pgoutput/pgoutput.c - missing function comments

The static functions create_estate_for_relation and
pgoutput_row_filter_prepare_expr probably should be commented.
Done.

13. src/backend/replication/pgoutput/pgoutput.c -
pgoutput_row_filter_prepare_expr function name

+static ExprState *pgoutput_row_filter_prepare_expr(Node *rfnode,
EState *estate);

This function has an unfortunate name with the word "prepare" in it. I
wonder if a different name can be found for this function to avoid any
confusion with pgoutput functions (coming soon) which are related to
the two-phase commit "prepare".
The word "prepare" is related to the executor context. The function name
contains "row_filter" that is sufficient to distinguish it from any other
function whose context is "prepare". I replaced "prepare" with "init".

14. src/bin/psql/describe.c

+ if (!PQgetisnull(tabres, j, 2))
+ appendPQExpBuffer(&buf, " WHERE (%s)",
+   PQgetvalue(tabres, j, 2));

Because the where-clause value already has enclosing parentheses so
using " WHERE (%s)" seems overkill here. e.g. you can see the effect
in your src/test/regress/expected/publication.out file. I think this
should be changed to " WHERE %s" to give better output.
Peter E suggested that extra parentheses be added. See 0005 [1].

15. src/include/catalog/pg_publication.h - new typedef

+typedef struct PublicationRelationInfo
+{
+ Oid relid;
+ Relation relation;
+ Node    *whereClause;
+} PublicationRelationInfo;
+

The new PublicationRelationInfo should also be added
src/tools/pgindent/typedefs.list
Patches usually don't update typedefs.list. Check src/tools/pgindent/README.

16. src/include/nodes/parsenodes.h - new typedef

+typedef struct PublicationTable
+{
+ NodeTag type;
+ RangeVar   *relation; /* relation to be published */
+ Node    *whereClause; /* qualifications */
+} PublicationTable;

The new PublicationTable should also be added src/tools/pgindent/typedefs.list
Idem.

17. sql/publication.sql - show more output

+CREATE PUBLICATION testpub5 FOR TABLE testpub_rf_tbl1,
testpub_rf_tbl2 WHERE (c <> 'test' AND d < 5);
+RESET client_min_messages;
+ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000
AND e < 2000);
+ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2;
+-- remove testpub_rf_tbl1 and add testpub_rf_tbl3 again (another WHERE expression)
+ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300
AND e < 500);
+-- fail - functions disallowed
+ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl4 WHERE (length(g) < 6);
+-- fail - WHERE not allowed in DROP
+ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl3 WHERE (e < 27);
+\dRp+ testpub5

I felt that it would be better to have a "\dRp+ testpub5" after each
of the valid ALTER PUBLICATION steps to show the intermediate results
also; not just the final one at the end.
Done.

18. src/test/subscription/t/020_row_filter.pl - rename file

I think this file should be renamed to 021_row_filter.pl as there is
already an 020 TAP test present.
Done.

19. src/test/subscription/t/020_row_filter.pl - test comments

AFAIK the test cases are all OK, but it was really quite hard to
review these TAP tests to try to determine what the expected results
should be.
I included your comments but heavily changed them.

20. src/test/subscription/t/020_row_filter.pl - missing test case?

There are some partition tests, but I did not see any test that was
like 3 levels deep like A->B->C, so I was not sure if there is any
case C would ever make use of the filter of its parent B, or would it
only use the filter of the root A?
I didn't include it yet. There is an issue with initial synchronization and
a partitioned table when you set publish_via_partition_root. I'll start another
thread for this issue.

21. src/test/subscription/t/020_row_filter.pl - missing test case?

If the same table is in multiple publications they can each have a row
filter. And a subscription might subscribe to some but not all of
those publications. I think this scenario is only partly tested.
8<
e.g.
pub_1 has tableX with RowFilter1
pub_2 has tableX with RowFilter2

Then sub_12 subscribes to pub_1, pub_2
This is already tested in your TAP test (I think) and it makes sure
both filters are applied

But if there was also
pub_3 has tableX with RowFilter3

Then sub_12 still should only be checking the filtered RowFilter1 AND
RowFilter2 (but NOT row RowFilter3). I think this scenario is not
tested.
I added a new publication tap_pub_not_used to cover this case.

POC PATCH FOR PLAN CACHE
========================

PSA a POC patch for a plan cache which gets used inside the
pgoutput_row_filter function instead of calling prepare for every row.
I think this is implementing something like what Andres was suggesting
a while back [1].
I also had a WIP patch for it (that's very similar to your patch) so I merged
it.

This cache mechanism consists of caching the ExprState and avoiding a
call to pgoutput_row_filter_init_expr() for every single row. Greg N
suggested in another email that the tuple table slot should also be
cached to save a few cycles; that is also included in this new patch.
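
For illustration, a minimal sketch of the cached path inside
pgoutput_row_filter(), assuming the entry->qual / entry->exprstate
fields and the v18-style helper names discussed in this thread (the
helper takes no EState since the state is kept in CacheMemoryContext;
'entry' and the per-row ExprContext 'ecxt' are set up by the caller).
This is a sketch, not the patch itself:

    ListCell   *lc;

    /* Build the ExprState list once per RelationSyncEntry, instead of
     * calling pgoutput_row_filter_init_expr() for every single row. */
    if (entry->exprstate == NIL)
    {
        MemoryContext oldctx = MemoryContextSwitchTo(CacheMemoryContext);

        foreach(lc, entry->qual)
            entry->exprstate = lappend(entry->exprstate,
                                       pgoutput_row_filter_init_expr((Node *) lfirst(lc)));
        MemoryContextSwitchTo(oldctx);
    }

    /* Subsequent rows only pay for evaluation, not for preparation. */
    foreach(lc, entry->exprstate)
    {
        if (!pgoutput_row_filter_exec_expr((ExprState *) lfirst(lc), ecxt))
            return false;       /* row does not satisfy the filter */
    }
    return true;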

Measurements with/without this plan cache:

Time spent processing within the pgoutput_row_filter function
- Data was captured using the same technique as the
0002-Measure-row-filter-overhead.patch.
- Inserted 1000 rows, sampled data for the first 100 times in this function.
not cached: average ~ 28.48 us
cached: average ~ 9.75 us

Replication times:
- Using tables and row filters same as in Onder's commands_to_test_perf.sql [2]
100K rows - not cached: ~ 42sec, 43sec, 44sec
100K rows - cached: ~ 41sec, 42sec, 42 sec.

There does seem to be a tiny gain achieved by having the plan cache,
but I think the gain might be a lot less than what people were
expecting.
I did another measurement using the previous patch (v16) as the baseline.

without cache (v16)
---------------------------

mean:           1.46 us
stddev:         2.13 us
median:         1.39 us
min-max:        [0.69 .. 1456.69] us
percentile(99): 3.15 us
mode:           0.91 us

with cache (v18)
-----------------------

mean:           0.63 us
stddev:         1.07 us
median:         0.55 us
min-max:        [0.29 .. 844.87] us
percentile(99): 1.38 us
mode:           0.41 us

It represents -57%. It is a really good optimization for just a few extra lines
of code.




--
Euler Taveira

Вложения

Re: row filtering for logical replication

От
"Euler Taveira"
Дата:
On Mon, Jul 5, 2021, at 12:14 AM, Greg Nancarrow wrote:
I have some review comments on the "Row filter for logical replication" patch:

(1) Suggested update to patch comment:
(There are some missing words and things which could be better expressed)
I incorporated all your wording suggestions.

(2) Some inconsistent error message wording:

Currently:
err = _("cannot use subquery in publication WHERE expression");

Suggest changing it to:
err = _("subqueries are not allowed in publication WHERE expressions");
The same expression "cannot use subquery in ..." is used in the other switch
cases. If you think this message can be improved, I suggest that you submit a
separate patch to change all such messages.


Other examples from the patch:
err = _("aggregate functions are not allowed in publication WHERE expressions");
err = _("grouping operations are not allowed in publication WHERE expressions");
err = _("window functions are not allowed in publication WHERE expressions");
errmsg("functions are not allowed in publication WHERE expressions"),
err = _("set-returning functions are not allowed in publication WHERE
expressions");
This is a different function. I just followed the same wording from similar
sentences around it.


(3) The current code still allows arbitrary code execution, e.g. via a
user-defined operator:
I fixed it in v18.

Perhaps add the following after the existing shell error-check in make_op():

/* User-defined operators are not allowed in publication WHERE clauses */
if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE && opform->oid
>= FirstNormalObjectId)
    ereport(ERROR,
    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    errmsg("user-defined operators are not allowed in publication
WHERE expressions"),
    parser_errposition(pstate, location)));
I'm still working on a way to accept built-in functions but while we don't have
it, let's forbid custom operators too.



Also, I believe it's also allowing user-defined CASTs (so could add a
similar check to above in transformTypeCast()).
Ideally, it would be preferable to validate/check publication WHERE
expressions in one central place, rather than scattered all over the
place, but that might be easier said than done.
You need to update the patch comment accordingly.
I forgot to mention it in the patch I sent a few minutes ago. I'm not sure we
need to mention every error condition (especially one that will rarely be hit).

(4) src/backend/replication/pgoutput/pgoutput.c
pgoutput_change()

The 3 added calls to pgoutput_row_filter() are returning from
pgoutput_change(), if false is returned, but instead they should break
from the switch, otherwise cleanup code is missed. This is surely a
bug.
Fixed.

In summary, v18 contains

* Peter Smith's review
* Greg Nancarrow's review
* cache ExprState
* cache TupleTableSlot
* forbid custom operators
* various fixes


--
Euler Taveira

Re: row filtering for logical replication

От
Tomas Vondra
Дата:
Hi,

I took a look at this patch, which seems to have been in the CF since
2018. I have only some basic comments and observations at this point:

1) alter_publication.sgml

I think "expression is executed" sounds a bit strange, perhaps 
"evaluated" would be better?

2) create_publication.sgml

Why is the patch changing publish_via_partition_root docs? That seems 
like a rather unrelated bit.

    The <literal>WHERE</literal> clause should probably contain only
    columns that are part of the primary key or be covered by
    <literal>REPLICA ...

I'm not sure what exactly this is trying to say. What does "should 
probably ..." mean in practice for the users? Does that mean something 
bad will happen for other columns, or what? I'm sure this wording will 
be quite confusing for users.

It may also be unclear whether the condition is evaluated on the old or 
new row, so perhaps add an example illustrating that and a more detailed 
comment, or something. E.g. what will happen with

    UPDATE departments SET active = false WHERE active;


3) publication_add_relation

Does this need to build the parse state even for whereClause == NULL?


4) AlterPublicationTables

I wonder if this new reworked code might have issues with subscriptions 
containing many tables, but I haven't tried.


5) OpenTableList

I really dislike that the list can have two different node types 
(Relation and PublicationTable). In principle we don't actually need the 
extra flag, we can simply check the node type directly by IsA() and act 
based on that. However, I think it'd be better to just use a single node 
type from all places.

I don't see why not to set whereClause every time, I don't think the 
extra if saves anything, it's just a bit more complex.


6) CloseTableList

The comment about node types seems pointless, this function has no flag 
and the element type does not matter.


7) parse_agg.c

    ... are not allowed in publication WHERE expressions

I think all similar cases use "WHERE conditions" instead.


8) transformExprRecurse

The check at the beginning seems rather awkward / misplaced - it's way 
too specific for this location (there are no other p_expr_kind 
references in this function). Wouldn't transformFuncCall (or maybe 
ParseFuncOrColumn) be a more appropriate place?

Initially I was wondering why not to allow function calls in WHERE 
conditions, but I see that was discussed in the past as problematic. But 
that reminds me that I don't see any docs describing what expressions 
are allowed in WHERE conditions - maybe we should explicitly list what 
expressions are allowed?


9) pgoutput.c

I have not reviewed this in detail yet, but there seems to be something 
wrong because `make check-world` fails in subscription/010_truncate.pl 
after hitting an assert  (backtrace attached) during "START_REPLICATION 
SLOT" in get_rel_sync_entry in this code:

     /* Release tuple table slot */
     if (entry->scantuple != NULL)
     {
         ExecDropSingleTupleTableSlot(entry->scantuple);
         entry->scantuple = NULL;
     }

So there seems to be something wrong with how the slot is created.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: row filtering for logical replication

От
"Euler Taveira"
Дата:
On Sun, Jul 11, 2021, at 4:39 PM, Euler Taveira wrote:
with cache (v18)
-----------------------

mean:           0.63 us
stddev:         1.07 us
median:         0.55 us
min-max:        [0.29 .. 844.87] us
percentile(99): 1.38 us
mode:           0.41 us

It represents -57%. It is a really good optimization for just a few extra lines
of code.
cfbot seems to be unhappy with v18 on some of the hosts. Cirrus/FreeBSD failed
in the test 010_truncate. It also failed on a Cirrus/Linux box. I failed
to reproduce it on my local FreeBSD box. Since it passes appveyor and
Cirrus/macos, it might be a transient issue.

$ uname -a
FreeBSD freebsd12 12.2-RELEASE FreeBSD 12.2-RELEASE r366954 GENERIC  amd64
$ PROVE_TESTS="t/010_truncate.pl" gmake check
gmake -C ../../../src/backend generated-headers
gmake[1]: Entering directory '/usr/home/euler/pglr-row-filter-v17/src/backend'
gmake -C catalog distprep generated-header-symlinks
gmake[2]: Entering directory '/usr/home/euler/pglr-row-filter-v17/src/backend/catalog'
gmake[2]: Nothing to be done for 'distprep'.
gmake[2]: Nothing to be done for 'generated-header-symlinks'.
gmake[2]: Leaving directory '/usr/home/euler/pglr-row-filter-v17/src/backend/catalog'
gmake -C utils distprep generated-header-symlinks
gmake[2]: Entering directory '/usr/home/euler/pglr-row-filter-v17/src/backend/utils'
gmake[2]: Nothing to be done for 'distprep'.
gmake[2]: Nothing to be done for 'generated-header-symlinks'.
gmake[2]: Leaving directory '/usr/home/euler/pglr-row-filter-v17/src/backend/utils'
gmake[1]: Leaving directory '/usr/home/euler/pglr-row-filter-v17/src/backend'
rm -rf '/home/euler/pglr-row-filter-v17'/tmp_install
/bin/sh ../../../config/install-sh -c -d '/home/euler/pglr-row-filter-v17'/tmp_install/log
gmake -C '../../..' DESTDIR='/home/euler/pglr-row-filter-v17'/tmp_install install >'/home/euler/pglr-row-filter-v17'/tmp_install/log/install.log 2>&1
gmake -j1  checkprep >>'/home/euler/pglr-row-filter-v17'/tmp_install/log/install.log 2>&1
rm -rf '/usr/home/euler/pglr-row-filter-v17/src/test/subscription'/tmp_check
/bin/sh ../../../config/install-sh -c -d '/usr/home/euler/pglr-row-filter-v17/src/test/subscription'/tmp_check
cd . && TESTDIR='/usr/home/euler/pglr-row-filter-v17/src/test/subscription' PATH="/home/euler/pglr-row-filter-v17/tmp_install/home/euler/pgrf18/bin:$PATH" LD_LIBRARY_PATH="/home/euler/pglr-row-filter-v17/tmp_install/home/euler/pgrf18/lib" LD_LIBRARY_PATH_RPATH=1 PGPORT='69999' PG_REGRESS='/usr/home/euler/pglr-row-filter-v17/src/test/subscription/../../../src/test/regress/pg_regress' /usr/local/bin/prove -I ../../../src/test/perl/ -I .  t/010_truncate.pl
t/010_truncate.pl .. ok    
All tests successful.
Files=1, Tests=14,  5 wallclock secs ( 0.02 usr  0.00 sys +  1.09 cusr  0.99 csys =  2.10 CPU)
Result: PASS


--
Euler Taveira

Re: row filtering for logical replication

От
Alvaro Herrera
Дата:
Hi

Andres complained about the safety of doing general expression
evaluation in pgoutput; that was first in 

https://postgr.es/m/20210128022032.eq2qqc6zxkqn5syt@alap3.anarazel.de
where he described a possible approach to handle it by restricting
expressions to have limited shape; and later in
http://postgr.es/m/20210331191710.kqbiwe73lur7jo2e@alap3.anarazel.de

I was just scanning the patch trying to see if some sort of protection
had been added for this, but I couldn't find anything.  (Some functions
are under-commented, though).  So, is it there already, and if so what
is it?  And if it isn't, then I think it should definitely be put there
in some form.

Thanks

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/



Re: row filtering for logical replication

От
Greg Nancarrow
Дата:
On Mon, Jul 12, 2021 at 9:31 AM Euler Taveira <euler@eulerto.com> wrote:
>
> cfbot seems to be unhappy with v18 on some of the hosts. Cirrus/FreeBSD failed
> in the test 010_truncate. It also failed in a Cirrus/Linux box. I failed to
> reproduce in my local FreeBSD box. Since it passes appveyor and Cirrus/macos,
> it could probably be a transient issue.
>

I don't think it's a transient issue.
I also get a test failure in subscription/010_truncate.pl when I run
"make check-world" with the v18 patches applied.
The problem can be avoided with the following change (to match what
was originally in my v17-0005 performance-improvement patch):

diff --git a/src/backend/replication/pgoutput/pgoutput.c
b/src/backend/replication/pgoutput/pgoutput.c
index 08c018a300..800bae400b 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1256,8 +1256,8 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
         }

         /* create a tuple table slot for row filter */
-        tupdesc = RelationGetDescr(relation);
         oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+        tupdesc = CreateTupleDescCopy(RelationGetDescr(relation));
         entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
         MemoryContextSwitchTo(oldctx);

This creates a TupleDesc copy in CacheMemoryContext that is not
refcounted, so it side-steps the problem.
At this stage I am not sure why the original v18 patch code doesn't
work correctly for the TupleDesc refcounting here.
The TupleDesc refcount is zero when it's time to dealloc the tuple
slot (thus causing that Assert to fire), yet when the slot was
created, the TupleDesc refcount was incremented, so it seems
something else has already decremented the refcount by the time it
comes to deallocate the slot. Perhaps there's an order-of-cleanup or
MemoryContext issue here or some buggy code somewhere, not sure yet.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

От
Amit Kapila
Дата:
On Mon, Jul 12, 2021 at 7:19 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Hi
>
> Andres complained about the safety of doing general expression
> evaluation in pgoutput; that was first in
>
> https://postgr.es/m/20210128022032.eq2qqc6zxkqn5syt@alap3.anarazel.de
> where he described a possible approach to handle it by restricting
> expressions to have limited shape; and later in
> http://postgr.es/m/20210331191710.kqbiwe73lur7jo2e@alap3.anarazel.de
>
> I was just scanning the patch trying to see if some sort of protection
> had been added for this, but I couldn't find anything.  (Some functions
> are under-commented, though).  So, is it there already, and if so what
> is it?
>

I think the patch is trying to prohibit arbitrary expressions in the
WHERE clause via
transformWhereClause(..EXPR_KIND_PUBLICATION_WHERE..). You can notice
that at various places the expressions are prohibited via
EXPR_KIND_PUBLICATION_WHERE. I am not sure that the checks are correct
and sufficient but I think there is some attempt to do it. For
example, the below sort of ad-hoc check for func_call doesn't seem to
be a good idea.

@@ -119,6 +119,13 @@ transformExprRecurse(ParseState *pstate, Node *expr)
  /* Guard against stack overflow due to overly complex expressions */
  check_stack_depth();

+ /* Functions are not allowed in publication WHERE clauses */
+ if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE &&
nodeTag(expr) == T_FuncCall)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("functions are not allowed in publication WHERE expressions"),
+ parser_errposition(pstate, exprLocation(expr))));

Now, the other idea I had in mind was to traverse the WHERE clause
expression in publication_add_relation and identify if it contains
anything other than the ANDed list of 'foo.bar op constant'
expressions. OTOH, for index where clause expressions or policy check
expressions, we use a technique similar to what we have in the patch
to prohibit certain kinds of expressions.

Do you have any preference on how this should be addressed?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

От
Tomas Vondra
Дата:

On 7/12/21 6:46 AM, Amit Kapila wrote:
> On Mon, Jul 12, 2021 at 7:19 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>
>> Hi
>>
>> Andres complained about the safety of doing general expression
>> evaluation in pgoutput; that was first in
>>
>> https://postgr.es/m/20210128022032.eq2qqc6zxkqn5syt@alap3.anarazel.de
>> where he described a possible approach to handle it by restricting
>> expressions to have limited shape; and later in
>> http://postgr.es/m/20210331191710.kqbiwe73lur7jo2e@alap3.anarazel.de
>>
>> I was just scanning the patch trying to see if some sort of protection
>> had been added for this, but I couldn't find anything.  (Some functions
>> are under-commented, though).  So, is it there already, and if so what
>> is it?
>>
> 
> I think the patch is trying to prohibit arbitrary expressions in the
> WHERE clause via
> transformWhereClause(..EXPR_KIND_PUBLICATION_WHERE..). You can notice
> that at various places the expressions are prohibited via
> EXPR_KIND_PUBLICATION_WHERE. I am not sure that the checks are correct
> and sufficient but I think there is some attempt to do it. For
> example, the below sort of ad-hoc check for func_call doesn't seem to
> be a good idea.
> 
> @@ -119,6 +119,13 @@ transformExprRecurse(ParseState *pstate, Node *expr)
>    /* Guard against stack overflow due to overly complex expressions */
>    check_stack_depth();
> 
> + /* Functions are not allowed in publication WHERE clauses */
> + if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE &&
> nodeTag(expr) == T_FuncCall)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("functions are not allowed in publication WHERE expressions"),
> + parser_errposition(pstate, exprLocation(expr))));
> 

Yes, I mentioned this bit of code in my review, although I was mostly 
wondering if this is the wrong place to make this check.

> Now, the other idea I had in mind was to traverse the WHERE clause
> expression in publication_add_relation and identify if it contains
> anything other than the ANDed list of 'foo.bar op constant'
> expressions. OTOH, for index where clause expressions or policy check
> expressions, we use a technique similar to what we have in the patch
> to prohibit certain kinds of expressions.
> 
> Do you have any preference on how this should be addressed?
> 

I don't think this is sufficient, because who knows where "op" comes 
from? It might be from an extension, in which case the problem pointed 
out by Petr Jelinek [1] would apply. OTOH I suppose we could allow 
expressions like (Var op Var), i.e. "a < b" or something like that. And 
then why not allow (a+b < c-10) and similar "more complex" expressions, 
as long as all the operators are built-in?

In terms of implementation, I think there are two basic options - either 
we can define a new "expression" type in gram.y, which would be a subset 
of a_expr etc. Or we can do it as some sort of expression walker, kinda 
like what the transform* functions do now.
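
For the walker flavor, a minimal sketch of what such a check could look
like, assuming it runs on the already-transformed expression tree in
publication_add_relation (the function name and the exact set of checks
are illustrative only):

    #include "access/transam.h"      /* FirstNormalObjectId */
    #include "nodes/nodeFuncs.h"     /* expression_tree_walker */

    static bool
    rowfilter_expr_checker(Node *node, void *context)
    {
        if (node == NULL)
            return false;

        /* Reject operators that are not built-in. */
        if (IsA(node, OpExpr) &&
            ((OpExpr *) node)->opno >= FirstNormalObjectId)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("user-defined operators are not allowed in publication WHERE expressions")));

        /* Reject functions that are not built-in. */
        if (IsA(node, FuncExpr) &&
            ((FuncExpr *) node)->funcid >= FirstNormalObjectId)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("user-defined functions are not allowed in publication WHERE expressions")));

        /* Recurse into the rest of the tree. */
        return expression_tree_walker(node, rowfilter_expr_checker, context);
    }

publication_add_relation() could then call
rowfilter_expr_checker(whereclause, NULL) once, which would keep the
validation in one central place instead of spreading it across the
transform functions.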


regards

[1] 
https://www.postgresql.org/message-id/92e5587d-28b8-5849-2374-5ca3863256f1%402ndquadrant.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

От
Amit Kapila
Дата:
On Mon, Jul 12, 2021 at 1:09 AM Euler Taveira <euler@eulerto.com> wrote:
>
> I did another measure using as baseline the previous patch (v16).
>
> without cache (v16)
> ---------------------------
>
> mean:           1.46 us
> stddev:         2.13 us
> median:         1.39 us
> min-max:        [0.69 .. 1456.69] us
> percentile(99): 3.15 us
> mode:           0.91 us
>
> with cache (v18)
> -----------------------
>
> mean:           0.63 us
> stddev:         1.07 us
> median:         0.55 us
> min-max:        [0.29 .. 844.87] us
> percentile(99): 1.38 us
> mode:           0.41 us
>
> It represents -57%. It is a really good optimization for just a few extra lines
> of code.
>

Good improvement, but I think it is better to measure the performance
using synchronous replication, by listing the subscriber in
synchronous_standby_names, which will show the overall saving of time.
We can then compare the timings when no rows are filtered, when 10% of
rows are filtered, when 30% are filtered, and so on.

I think the way caching has been done in the patch is a bit
inefficient. Basically, it always invalidates and rebuilds the
expressions even when some unrelated operation has happened on the
publication. For example, say a publication initially has table t1 with
row filter r1 for which we have cached the state. Now if you alter the
publication and add table t2, it will invalidate the entire state of
t1 as well. I think we can avoid that if we invalidate the row filter
related state only on relcache invalidation, i.e. in
rel_sync_cache_relation_cb and save it the very first time we prepare
the expression. In that case, we don't need to do it in advance when
preparing relsyncentry, this will have the additional advantage that
we won't spend cycles on preparing state unless it is required (for
truncate we won't require row_filtering, so it won't be prepared).
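
A sketch of that shape, assuming the row-filter state lives in the
RelationSyncEntry (the exprstate reset is the assumed addition; the
rest mirrors what rel_sync_cache_relation_cb() already does):

    /* In rel_sync_cache_relation_cb(): reset only the invalidated
     * relation's entry; the row filter is then rebuilt lazily. */
    entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
                                              (void *) &relid,
                                              HASH_FIND, NULL);
    if (entry != NULL)
    {
        entry->replicate_valid = false;
        entry->exprstate = NIL;     /* assumed addition: rebuilt on next
                                     * use; real code must also free or
                                     * reset the old state's memory */
    }

pgoutput_row_filter() would then prepare entry->exprstate on first use
only, much like the caching code discussed earlier in this thread.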

A few other things I have noticed:
1.
I am seeing a tupledesc leak by following the steps below:
ERROR:  tupdesc reference 00000000008D7D18 is not owned by resource
owner TopTransaction
CONTEXT:  slot "tap_sub", output plugin "pgoutput", in the change
callback, associated LSN 0/170BD50

Publisher
CREATE TABLE tab_rowfilter_1 (a int primary key, b text);
CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1 WHERE (a > 1000
AND b <> 'filtered');

Subscriber
CREATE TABLE tab_rowfilter_1 (a int primary key, b text);
CREATE SUBSCRIPTION tap_sub
         CONNECTION 'host=localhost port=5432 dbname=postgres'
        PUBLICATION tap_pub_1;

Publisher
INSERT INTO tab_rowfilter_1 (a, b) VALUES (1980, 'not filtered');
Alter table tab_rowfilter_1 drop column b cascade;
INSERT INTO tab_rowfilter_1 (a) VALUES (1982);

2.
postgres=# Alter table tab_rowfilter_1 alter column b set data type varchar;
ERROR:  unexpected object depending on column: publication of table
tab_rowfilter_1 in publication tap_pub_1

I think for this you need to change ATExecAlterColumnType to handle
the publication case.


-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

От
Tomas Vondra
Дата:
While looking at the other logrep patch [1] (column filtering) I noticed
Alvaro's comment regarding a new parsenode (PublicationTable) not having
read/out/equal/copy funcs. I'd bet the same thing applies here, so
perhaps see if the patch needs the same fix.

[1]
https://www.postgresql.org/message-id/202107062342.eq6htmp2wgp2%40alvherre.pgsql

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

От
"Euler Taveira"
Дата:
On Mon, Jul 12, 2021, at 8:44 AM, Tomas Vondra wrote:
While looking at the other logrep patch [1] (column filtering) I noticed
Alvaro's comment regarding a new parsenode (PublicationTable) not having
read/out/equal/copy funcs. I'd bet the same thing applies here, so
perhaps see if the patch needs the same fix.
Good catch! I completely forgot about _copyPublicationTable() and
_equalPublicationTable().


--
Euler Taveira

Re: row filtering for logical replication

От
Peter Smith
Дата:
On Mon, Jul 12, 2021 at 5:39 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Jul 2, 2021, at 4:29 AM, Peter Smith wrote:
>
> Hi.
>
> I have been looking at the latest patch set (v16). Below are my review
> comments and some patches.
>
> Peter, thanks for your detailed review. Comments are inline.
>

Hi Euler,

Thanks for addressing my previous review comments.

I have reviewed the latest v18 patch. Below are some more review
comments and patches.

(The patches 0003,0004 are just examples of what is mentioned in my
comments; the patches 0001,0002 are there only to try to keep cfbot
green).

//////////

1. Commit comment - wording

"When a publication is defined or modified, rows that don't satisfy a
WHERE clause may be
optionally filtered out."

=>

I think this means to say: "Rows that don't satisfy an optional WHERE
clause will be filtered out."

------

2. Commit comment - wording

"The row filter is per table, which allows different row filters to be
defined for different tables."

=>

I think all that is the same as just saying: "The row filter is per table."

------

3. PG docs - independent improvement

You wrote (ref [1] point 3):

"I agree it can be confusing. BTW, CREATE PUBLICATION does not mention that the
root partitioned table is used. We should improve that sentence too."

I agree, but that PG docs improvement is independent of your RowFilter
patch; please make another thread for that idea.

------

4. doc/src/sgml/ref/create_publication.sgml - independent improvement

@@ -131,9 +135,9 @@ CREATE PUBLICATION <replaceable
class="parameter">name</replaceable>
           on its partitions) contained in the publication will be published
           using the identity and schema of the partitioned table rather than
           that of the individual partitions that are actually changed; the
-          latter is the default.  Enabling this allows the changes to be
-          replicated into a non-partitioned table or a partitioned table
-          consisting of a different set of partitions.
+          latter is the default (<literal>false</literal>).  Enabling this
+          allows the changes to be replicated into a non-partitioned table or a
+          partitioned table consisting of a different set of partitions.
          </para>

I think that Tomas wrote (ref [2] point 2) that this change seems
unrelated to your RowFilter patch.

I agree; I liked the change, but IMO you need to propose this one in
another thread too.

------

5. doc/src/sgml/ref/create_subscription.sgml - wording

@@ -102,7 +102,16 @@ CREATE SUBSCRIPTION <replaceable
class="parameter">subscription_name</replaceabl
          <para>
           Specifies whether the existing data in the publications that are
           being subscribed to should be copied once the replication starts.
-          The default is <literal>true</literal>.
+          The default is <literal>true</literal>. If any table in the
+          publications has a <literal>WHERE</literal> clause, rows that do not
+          satisfy the <replaceable class="parameter">expression</replaceable>
+          will not be copied. If the subscription has several publications in
+          which a table has been published with different
+          <literal>WHERE</literal> clauses, rows must satisfy all expressions
+          to be copied. If any table in the publications has a
+          <literal>WHERE</literal> clause, data synchronization does not use it
+          if the subscriber is a <productname>PostgreSQL</productname> version
+          before 15.

I felt that the sentence: "If any table in the publications has a
<literal>WHERE</literal> clause, data synchronization does not use it
if the subscriber is a <productname>PostgreSQL</productname> version
before 15."

Could be expressed more simply like: "If the subscriber is a
<productname>PostgreSQL</productname> version before 15 then any row
filtering is ignored."

------

6. src/backend/commands/publicationcmds.c - wrong function comment

@@ -585,6 +611,9 @@ OpenTableList(List *tables)

 /*
  * Close all relations in the list.
+ *
+ * Publication node can have a different list element, hence, pub_drop_table
+ * indicates if it has a Relation (true) or PublicationTable (false).
  */
 static void
 CloseTableList(List *rels)

=>

The 2nd parameter does not exist in v18, so that comment about
pub_drop_table seems to be a cut/paste error from the OpenTableList.

------

7. src/backend/replication/logical/tablesync.c - bug?

@@ -829,16 +883,23 @@ copy_table(Relation rel)
  relmapentry = logicalrep_rel_open(lrel.remoteid, NoLock);
  Assert(rel == relmapentry->localrel);

+ /* List of columns for COPY */
+ attnamelist = make_copy_attnamelist(relmapentry);
+
  /* Start copy on the publisher. */
=>

I did not understand the above call to make_copy_attnamelist. The
result seems unused before it is overwritten later in this same
function (??)

------

8. src/backend/replication/logical/tablesync.c -
fetch_remote_table_info enhancement

+ /* Get relation qual */
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 150000)
+ {
+ resetStringInfo(&cmd);
+ appendStringInfo(&cmd,
+ "SELECT pg_get_expr(prqual, prrelid) "
+ "  FROM pg_publication p "
+ "  INNER JOIN pg_publication_rel pr "
+ "       ON (p.oid = pr.prpubid) "
+ " WHERE pr.prrelid = %u "
+ "   AND p.pubname IN (", lrel->remoteid);

=>

I think a small improvement is possible in this SQL.

If we change that to "SELECT DISTINCT pg_get_expr(prqual, prrelid)"...
then it avoids the copy SQL from having multiple WHERE clauses which
are all identical. This could happen when subscribed to multiple
publications which had the same filter for the same table.

I attached a tmp POC patch for this change and it works as expected.
For example, I subscribe to 3 publications, but 2 of them have the
same filter for the table.

BEFORE
COPY (SELECT key, value, data FROM public.test WHERE (key > 0) AND
(key > 1000) AND (key > 1000)) TO STDOUT

AFTER
COPY (SELECT key, value, data FROM public.test WHERE (key > 0) AND
(key > 1000) ) TO STDOUT

------

9. src/backend/replication/pgoutput/pgoutput.c - qual member is redundant

@@ -99,6 +108,9 @@ typedef struct RelationSyncEntry

  bool replicate_valid;
  PublicationActions pubactions;
+ List    *qual; /* row filter */
+ List    *exprstate; /* ExprState for row filter */
+ TupleTableSlot *scantuple; /* tuple table slot for row filter */

=>

Now that the exprstate is introduced I think that the other member
"qual" is redundant, so it can be removed.

FYI - I attached a tmp patch with all the qual references deleted and
everything is fine.

------

10. src/backend/replication/pgoutput/pgoutput.c - comment typo?

+ /*
+ * Cache ExprState using CacheMemoryContext. This is the same code as
+ * ExecPrepareExpr() but it is not used because it doesn't use an EState.
+ * It should probably be another function in the executor to handle the
+ * execution outside a normal Plan tree context.
+ */

=>

typo: it/that ?

I think it ought to say "This is the same code as ExecPrepareExpr()
but that is not used because"...

------

11. src/backend/replication/pgoutput/pgoutput.c - redundant debug logging?

+ /* Evaluates row filter */
+ result = pgoutput_row_filter_exec_expr(exprstate, ecxt);
+
+ elog(DEBUG3, "row filter %smatched", result ? "" : "not ");

The above debug logging is really only a repeat (with different
wording) of the same information already being logged inside the
pgoutput_row_filter_exec_expr function isn't it? Consider removing the
redundant logging.

e.g. This is already getting logged by pgoutput_row_filter_exec_expr:

    elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
         DatumGetBool(ret) ? "true" : "false",
         isnull ? "true" : "false");


------
[1] https://www.postgresql.org/message-id/532a18d8-ce90-4444-8570-8a9fcf09f329%40www.fastmail.com
[2] https://www.postgresql.org/message-id/849ee491-bba3-c0ae-cc25-4fce1c03f105%40enterprisedb.com
[3] https://www.postgresql.org/message-id/532a18d8-ce90-4444-8570-8a9fcf09f329%40www.fastmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

Re: row filtering for logical replication

От
Amit Kapila
Дата:
On Mon, Jul 12, 2021 at 3:01 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 7/12/21 6:46 AM, Amit Kapila wrote:
> > On Mon, Jul 12, 2021 at 7:19 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > Now, the other idea I had in mind was to traverse the WHERE clause
> > expression in publication_add_relation and identify if it contains
> > anything other than the ANDed list of 'foo.bar op constant'
> > expressions. OTOH, for index where clause expressions or policy check
> > expressions, we use a technique similar to what we have in the patch
> > to prohibit certain kinds of expressions.
> >
> > Do you have any preference on how this should be addressed?
> >
>
> I don't think this is sufficient, because who knows where "op" comes
> from? It might be from an extension, in which case the problem pointed
> out by Petr Jelinek [1] would apply. OTOH I suppose we could allow
> expressions like (Var op Var), i.e. "a < b" or something like that. And
> then why not allow (a+b < c-10) and similar "more complex" expressions,
> as long as all the operators are built-in?
>

Yeah, and the patch already disallows the user-defined operators in
filters. Ideally, if the operator doesn't refer to UDFs, we could
allow such an operator to be used directly in the filter, since we can
add a dependency on it.

> In terms of implementation, I think there are two basic options - either
> we can define a new "expression" type in gram.y, which would be a subset
> of a_expr etc. Or we can do it as some sort of expression walker, kinda
> like what the transform* functions do now.
>

I think it is better to use some form of walker here rather than
extending the grammar for this. However, the question is do we need
some special kind of expression walker here or can we handle all
required cases via transformWhereClause() call as the patch is trying
to do. AFAIU, the main things we want to prohibit in the filter are:
(a) it doesn't refer to any relation other than catalog in where
clause, (b) it doesn't use UDFs in any way (in expressions, in
user-defined operators, user-defined types, etc.), (c) the columns
referred to in the filter should be part of PK or Replica Identity.
Now, if all such things can be detected by the approach patch has
taken then why do we need a special kind of expression walker? OTOH,
if we can't detect some of this then probably we can use a special
walker.

I think in the long run one idea to allow UDFs is probably by
explicitly allowing users to specify whether the function is
publication predicate safe and if so, then we can allow such functions
in the filter clause.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

От
Peter Smith
Дата:
Hi Euler,

Greg noticed that your patch set was missing any implementation of the
psql tab auto-completion for the new row filter WHERE syntax.

So I have added a POC patch for this missing feature.

Unfortunately, there is an existing HEAD problem overlapping with this
exact same code. I reported this already in another thread [1].

So there are 2 patches attached here:
0001 - Fixes the other reported problem (I hope this may be pushed soon)
0002 - Adds the tab-completion code for your row filter WHERE clauses
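
For reference, the shape of the 0002 change in tab-complete.c would be
along these lines (a sketch using that file's Matches()/COMPLETE_WITH()
conventions; the exact completions offered are illustrative, not the
actual patch hunk):

    /* After "CREATE PUBLICATION <name> FOR TABLE <table>", offer the
     * new row filter syntax alongside the existing completions. */
    else if (Matches("CREATE", "PUBLICATION", MatchAny, "FOR", "TABLE", MatchAny))
        COMPLETE_WITH(",", "WHERE (", "WITH (");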

------
[1] https://www.postgresql.org/message-id/CAHut+Ps-vkmnWAShWSRVCB3gx8aM=bFoDqWgBNTzofK0q1LpwA@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

> + * ExecPrepareExpr() but it is not used because it doesn't use an EState.
> + * It should probably be another function in the executor to handle the
> + * execution outside a normal Plan tree context.
> + */
>
> =>
>
> typo: it/that ?
>
> I think it ought to say "This is the same code as ExecPrepareExpr()
> but that is not used because"...
>
> ------
>
> 10. src/backend/replication/pgoutput/pgoutput.c - redundant debug logging?
>
> + /* Evaluates row filter */
> + result = pgoutput_row_filter_exec_expr(exprstate, ecxt);
> +
> + elog(DEBUG3, "row filter %smatched", result ? "" : "not ");
>
> The above debug logging is really only a repeat (with different
> wording) of the same information already being logged inside the
> pgoutput_row_filter_exec_expr function isn't it? Consider removing the
> redundant logging.
>
> e.g. This is already getting logged by pgoutput_row_filter_exec_expr:
>
>     elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
>          DatumGetBool(ret) ? "true" : "false",
>          isnull ? "true" : "false");
>
>
> ------
> [1] https://www.postgresql.org/message-id/532a18d8-ce90-4444-8570-8a9fcf09f329%40www.fastmail.com
> [2] https://www.postgresql.org/message-id/849ee491-bba3-c0ae-cc25-4fce1c03f105%40enterprisedb.com
> [3] https://www.postgresql.org/message-id/532a18d8-ce90-4444-8570-8a9fcf09f329%40www.fastmail.com
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jul 13, 2021 at 10:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jul 12, 2021 at 3:01 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>
> > In terms of implementation, I think there are two basic options - either
> > we can define a new "expression" type in gram.y, which would be a subset
> > of a_expr etc. Or we can do it as some sort of expression walker, kinda
> > like what the transform* functions do now.
> >
>
> I think it is better to use some form of walker here rather than
> extending the grammar for this. However, the question is do we need
> some special kind of expression walker here or can we handle all
> required cases via transformWhereClause() call as the patch is trying
> to do. AFAIU, the main things we want to prohibit in the filter are:
> (a) it doesn't refer to any relation other than catalog in where
> clause, (b) it doesn't use UDFs in any way (in expressions, in
> user-defined operators, user-defined types, etc.), (c) the columns
> referred to in the filter should be part of PK or Replica Identity.
> Now, if all such things can be detected by the approach patch has
> taken then why do we need a special kind of expression walker? OTOH,
> if we can't detect some of this then probably we can use a special
> walker.
>
> I think in the long run one idea to allow UDFs is probably by
> explicitly allowing users to specify whether the function is
> publication predicate safe and if so, then we can allow such functions
> in the filter clause.
>

Another idea here could be to read the publication-related catalogs
with the latest snapshot instead of a historic snapshot. If we do that
and the user then faces the problems described by Petr [1] due to
missing dependencies via UDFs, she can ALTER the publication to
remove or change the filter clause; after that, we would be able to
recognize the updated filter clause and the system would be able to
move forward.

I might be missing something, but reading publication catalogs with
non-historic snapshots shouldn't create problems, as historic
snapshots are only required to decode WAL.

I think the problem described by Petr [1] is also possible today if the
user drops the publication while there is a corresponding subscription;
basically, the system will get stuck with the error "ERROR:  publication
"mypub" does not exist". I think allowing the use of non-historic
snapshots just for publications will resolve that problem as well.

[1] - https://www.postgresql.org/message-id/92e5587d-28b8-5849-2374-5ca3863256f1%402ndquadrant.com

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Jeff Davis
Date:
On Tue, 2021-07-13 at 10:24 +0530, Amit Kapila wrote:
> to do. AFAIU, the main things we want to prohibit in the filter are:
> (a) it doesn't refer to any relation other than catalog in where
> clause,

Right, because the walsender is using a historical snapshot.

> (b) it doesn't use UDFs in any way (in expressions, in
> user-defined operators, user-defined types, etc.),

Is this a reasonable requirement? Postgres has a long history of
allowing UDFs nearly everywhere that a built-in is allowed. It feels
wrong to make built-ins special for this feature.

> (c) the columns
> referred to in the filter should be part of PK or Replica Identity.

Why?


Also:

* Andres also mentioned that the function should not leak memory.
* One use case for this feature is when sharding a table, so the
expression should allow things like "hashint8(x) between ...". I'd
really like to see this problem solved, as well.
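
For example, something along these lines (hypothetical table t with a
bigint id; the patch as posted would reject these expressions because
they contain function calls):

CREATE PUBLICATION shard0 FOR TABLE t WHERE (abs(hashint8(id)) % 4 = 0);
CREATE PUBLICATION shard1 FOR TABLE t WHERE (abs(hashint8(id)) % 4 = 1);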

> I think in the long run one idea to allow UDFs is probably by
> explicitly allowing users to specify whether the function is
> publication predicate safe and if so, then we can allow such
> functions
> in the filter clause.

This sounds like a better direction. We probably need some kind of
catalog information here to say what functions/operators are "safe" for
this kind of purpose. There are a couple questions:

1. Should this notion of safety be specific to this feature, or should
we try to generalize it so that other areas of the system might benefit
as well?

2. Should this marking be superuser-only, or user-specified?

3. Should it be related to the IMMUTABLE/STABLE/VOLATILE designation,
or completely separate?

Regards,
    Jeff Davis









Re: row filtering for logical replication

From
Tomas Vondra
Date:
On 7/13/21 5:44 PM, Jeff Davis wrote:
> On Tue, 2021-07-13 at 10:24 +0530, Amit Kapila wrote:
>> to do. AFAIU, the main things we want to prohibit in the filter are:
>> (a) it doesn't refer to any relation other than catalog in where
>> clause,
> 
> Right, because the walsender is using a historical snapshot.
> 
>> (b) it doesn't use UDFs in any way (in expressions, in
>> user-defined operators, user-defined types, etc.),
> 
> Is this a reasonable requirement? Postgres has a long history of
> allowing UDFs nearly everywhere that a built-in is allowed. It feels
> wrong to make built-ins special for this feature.
> 

Well, we can either prohibit UDFs or introduce a massive foot-gun.

The problem with functions in general (let's ignore SQL functions) is 
that they're black boxes, so we don't know what's inside. And if the 
function gets broken after an object gets dropped, the replication is 
broken and the only way to fix it is to recover the subscription.

And this is not a hypothetical issue; we've seen this repeatedly :-(

So as much as I'd like to see support for UDFs here, I think it's better 
to disallow them - at least for now. And maybe relax that restriction 
later, if possible.

>> (c) the columns
>> referred to in the filter should be part of PK or Replica Identity.
> 
> Why?
> 

I'm not sure either.

> 
> Also:
> 
> * Andres also mentioned that the function should not leak memory.
> * One use case for this feature is when sharding a table, so the
> expression should allow things like "hashint8(x) between ...". I'd
> really like to see this problem solved, as well.
> 

I think built-in functions should be fine, because they generally don't 
get dropped etc. (And if you drop a built-in function, well - sorry.)

Not sure about the memory leaks - I suppose we'd free memory for each 
row, so this shouldn't be an issue I guess ...

>> I think in the long run one idea to allow UDFs is probably by
>> explicitly allowing users to specify whether the function is
>> publication predicate safe and if so, then we can allow such
>> functions
>> in the filter clause.
> 
> This sounds like a better direction. We probably need some kind of
> catalog information here to say what functions/operators are "safe" for
> this kind of purpose. There are a couple questions:
> 

Not sure. It's true it's a bit like volatile/stable/immutable categories 
where we can't guarantee those labels are correct, and it's up to the 
user to keep the pieces if they pick the wrong category.

But we can achieve the same goal by introducing a simple GUC called 
dangerous_allow_udf_in_decoding, I think.



regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Tomas Vondra
Date:
On 7/13/21 12:57 PM, Amit Kapila wrote:
> On Tue, Jul 13, 2021 at 10:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Mon, Jul 12, 2021 at 3:01 PM Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>>
>>> In terms of implementation, I think there are two basic options - either
>>> we can define a new "expression" type in gram.y, which would be a subset
>>> of a_expr etc. Or we can do it as some sort of expression walker, kinda
>>> like what the transform* functions do now.
>>>
>>
>> I think it is better to use some form of walker here rather than
>> extending the grammar for this. However, the question is do we need
>> some special kind of expression walker here or can we handle all
>> required cases via transformWhereClause() call as the patch is trying
>> to do. AFAIU, the main things we want to prohibit in the filter are:
>> (a) it doesn't refer to any relation other than catalog in where
>> clause, (b) it doesn't use UDFs in any way (in expressions, in
>> user-defined operators, user-defined types, etc.), (c) the columns
>> referred to in the filter should be part of PK or Replica Identity.
>> Now, if all such things can be detected by the approach patch has
>> taken then why do we need a special kind of expression walker? OTOH,
>> if we can't detect some of this then probably we can use a special
>> walker.
>>
>> I think in the long run one idea to allow UDFs is probably by
>> explicitly allowing users to specify whether the function is
>> publication predicate safe and if so, then we can allow such functions
>> in the filter clause.
>>
> 
> Another idea here could be to read the publication-related catalog
> with the latest snapshot instead of a historic snapshot. If we do that
> then if the user faces problems as described by Petr [1] due to
> missing dependencies via UDFs then she can Alter the Publication to
> remove/change the filter clause and after that, we would be able to
> recognize the updated filter clause and the system will be able to
> move forward.
> 
> I might be missing something but reading publication catalogs with
> non-historic snapshots shouldn't create problems as we use the
> historic snapshots are required to decode WAL.
> 

IMHO the best option for v1 is to just restrict the filters to 
known-safe expressions. That is, just built-in operators, no UDFs etc. 
Yes, it's not great, but both alternative proposals (allowing UDFs or 
using current snapshot) are problematic for various reasons.

Even with those restrictions the row filtering seems quite useful, and 
we can relax those restrictions later if we find acceptable compromise 
and/or decide it's worth the risk. Seems better than having to introduce 
new restrictions later.
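
A restriction like that is usually enforced with an expression walker
over the parsed filter. A minimal sketch (function names are
hypothetical, and this assumes any object with an OID at or above
FirstNormalObjectId counts as user-defined):

#include "postgres.h"

#include "access/transam.h"		/* FirstNormalObjectId */
#include "nodes/nodeFuncs.h"	/* check_functions_in_node, expression_tree_walker */

/* Return true (i.e. reject) for any non-built-in function OID. */
static bool
rowfilter_func_checker(Oid funcid, void *context)
{
	return funcid >= FirstNormalObjectId;
}

/*
 * Recursively check a row filter expression and error out on anything
 * that invokes a user-defined function, operator, cast, etc.
 */
static bool
rowfilter_expr_walker(Node *node, void *context)
{
	if (node == NULL)
		return false;

	if (check_functions_in_node(node, rowfilter_func_checker, context))
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("user-defined functions are not allowed in publication WHERE expressions")));

	return expression_tree_walker(node, rowfilter_expr_walker, context);
}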

> I think the problem described by Petr[1] is also possible today if the
> user drops the publication and there is a corresponding subscription,
> basically, the system will stuck with error: "ERROR:  publication
> "mypub" does not exist. I think allowing to use non-historic snapshots
> just for publications will resolve that problem as well.
> 
> [1] - https://www.postgresql.org/message-id/92e5587d-28b8-5849-2374-5ca3863256f1%402ndquadrant.com
> 

That seems like a completely different problem, TBH. For example the 
slot is dropped too, which means the WAL is likely gone etc.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2021-Jul-13, Tomas Vondra wrote:

> On 7/13/21 5:44 PM, Jeff Davis wrote:

> > * Andres also mentioned that the function should not leak memory.
> > * One use case for this feature is when sharding a table, so the
> > expression should allow things like "hashint8(x) between ...". I'd
> > really like to see this problem solved, as well.
> 
> I think built-in functions should be fine, because generally don't get
> dropped etc. (And if you drop built-in function, well - sorry.)
> 
> Not sure about the memory leaks - I suppose we'd free memory for each row,
> so this shouldn't be an issue I guess ...

I'm not sure we need to be terribly strict about expression evaluation
not leaking any memory here. I'd rather have a memory context that can
be reset per row.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Tue, Jul 13, 2021, at 12:25 AM, Peter Smith wrote:
I have reviewed the latest v18 patch. Below are some more review
comments and patches.
Peter, thanks for quickly checking the new patch. I'm attaching a new patch (v19)
that addresses (a) this new review, (b) Tomas' review, and (c) Greg's review. I
also included the copy/equal node support for the new node (PublicationTable)
mentioned by Tomas in another email.

1. Commit comment - wording
8<

=>

I think this means to say: "Rows that don't satisfy an optional WHERE
clause will be filtered out."
Agreed.

2. Commit comment - wording

"The row filter is per table, which allows different row filters to be
defined for different tables."

=>

I think all that is the same as just saying: "The row filter is per table."
Agreed.

3. PG docs - independent improvement

You wrote (ref [1] point 3):

"I agree it can be confusing. BTW, CREATE PUBLICATION does not mention that the
root partitioned table is used. We should improve that sentence too."

I agree, but that PG docs improvement is independent of your RowFilter
patch; please make another thread for that idea.
I will. And I will also include the next item that I removed from the patch.

4. doc/src/sgml/ref/create_publication.sgml - independent improvement

@@ -131,9 +135,9 @@ CREATE PUBLICATION <replaceable
class="parameter">name</replaceable>
           on its partitions) contained in the publication will be published
           using the identity and schema of the partitioned table rather than
           that of the individual partitions that are actually changed; the
-          latter is the default.  Enabling this allows the changes to be
-          replicated into a non-partitioned table or a partitioned table
-          consisting of a different set of partitions.
+          latter is the default (<literal>false</literal>).  Enabling this
+          allows the changes to be replicated into a non-partitioned table or a
+          partitioned table consisting of a different set of partitions.
          </para>

I think that Tomas wrote (ref [2] point 2) that this change seems
unrelated to your RowFilter patch.

I agree; I liked the change, but IMO you need to propose this one in
another thread too.
Reverted.

5. doc/src/sgml/ref/create_subscription.sgml - wording
8<

I felt that the sentence: "If any table in the publications has a
<literal>WHERE</literal> clause, data synchronization does not use it
if the subscriber is a <productname>PostgreSQL</productname> version
before 15."

Could be expressed more simply like: "If the subscriber is a
<productname>PostgreSQL</productname> version before 15 then any row
filtering is ignored."
Agreed.

6. src/backend/commands/publicationcmds.c - wrong function comment
8<

/*
  * Close all relations in the list.
+ *
+ * Publication node can have a different list element, hence, pub_drop_table
+ * indicates if it has a Relation (true) or PublicationTable (false).
  */
static void
CloseTableList(List *rels)

=>

The 2nd parameter does not exist in v18, so that comment about
pub_drop_table seems to be a cut/paste error from the OpenTableList.
Oops. Removed.

src/backend/replication/logical/tablesync.c - bug ?

@@ -829,16 +883,23 @@ copy_table(Relation rel)
  relmapentry = logicalrep_rel_open(lrel.remoteid, NoLock);
  Assert(rel == relmapentry->localrel);

+ /* List of columns for COPY */
+ attnamelist = make_copy_attnamelist(relmapentry);
+
  /* Start copy on the publisher. */
=>

I did not understand the above call to make_copy_attnamelist. The
result seems unused before it is overwritten later in this same
function (??)
Good catch. This seems to be a leftover from an ancient version.

7. src/backend/replication/logical/tablesync.c  -
fetch_remote_table_info enhancement

+ /* Get relation qual */
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 150000)
+ {
+ resetStringInfo(&cmd);
+ appendStringInfo(&cmd,
+ "SELECT pg_get_expr(prqual, prrelid) "
+ "  FROM pg_publication p "
+ "  INNER JOIN pg_publication_rel pr "
+ "       ON (p.oid = pr.prpubid) "
+ " WHERE pr.prrelid = %u "
+ "   AND p.pubname IN (", lrel->remoteid);

=>

I think a small improvement is possible in this SQL.

If we change that to "SELECT DISTINCT pg_get_expr(prqual, prrelid)"...
then it avoids the copy SQL from having multiple WHERE clauses which
are all identical. This could happen when subscribed to multiple
publications which had the same filter for the same table.
Good catch!

8. src/backend/replication/pgoutput/pgoutput.c - qual member is redundant

@@ -99,6 +108,9 @@ typedef struct RelationSyncEntry

  bool replicate_valid;
  PublicationActions pubactions;
+ List    *qual; /* row filter */
+ List    *exprstate; /* ExprState for row filter */
+ TupleTableSlot *scantuple; /* tuple table slot for row filter */

=>

Now that the exprstate is introduced I think that the other member
"qual" is redundant, so it can be removed.
I was thinking about it for the next patch. Removed.

9. src/backend/replication/pgoutput/pgoutput.c - comment typo?
8<

typo: it/that ?

I think it ought to say "This is the same code as ExecPrepareExpr()
but that is not used because"...
Fixed.

10. src/backend/replication/pgoutput/pgoutput.c - redundant debug logging?

+ /* Evaluates row filter */
+ result = pgoutput_row_filter_exec_expr(exprstate, ecxt);
+
+ elog(DEBUG3, "row filter %smatched", result ? "" : "not ");

The above debug logging is really only a repeat (with different
wording) of the same information already being logged inside the
pgoutput_row_filter_exec_expr function isn't it? Consider removing the
redundant logging.
Agreed. Removed.


--
Euler Taveira

Attachments

Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Sun, Jul 11, 2021, at 8:09 PM, Tomas Vondra wrote:
I took a look at this patch, which seems to be in CF since 2018. I have 
only some basic comments and observations at this point:
Tomas, thanks for reviewing this patch again.

1) alter_publication.sgml

I think "expression is executed" sounds a bit strange, perhaps 
"evaluated" would be better?
Fixed.

2) create_publication.sgml

Why is the patch changing publish_via_partition_root docs? That seems 
like a rather unrelated bit.
Removed. I will submit a separate patch for this.

    The <literal>WHERE</literal> clause should probably contain only
    columns that are part of the primary key or be covered by
    <literal>REPLICA ...

I'm not sure what exactly is this trying to say. What does "should 
probably ..." mean in practice for the users? Does that mean something 
bad will happen for other columns, or what? I'm sure this wording will 
be quite confusing for users.
Reading again it seems "probably" is confusing. Let's remove it.

It may also be unclear whether the condition is evaluated on the old or 
new row, so perhaps add an example illustrating that & more detailed 
comment, or something. E.g. what will happen with

UPDATE departments SET active = false WHERE active;
Yeah. I avoided mentioning this internal detail about the old/new row, but it
seems better to be clear. How about the following paragraph?

  <para>
   The <literal>WHERE</literal> clause should contain only columns that are
   part of the primary key or are covered by <literal>REPLICA
   IDENTITY</literal>; otherwise, <command>DELETE</command> operations will not
   be replicated. That's because the old row is used and it only contains the
   primary key or the columns that are part of the <literal>REPLICA
   IDENTITY</literal>; the remaining columns are <literal>NULL</literal>. For
   <command>INSERT</command> and <command>UPDATE</command> operations, any
   column might be used in the <literal>WHERE</literal> clause. The new row is
   used and it contains all columns. A <literal>NULL</literal> value causes the
   expression to evaluate to false; avoid using columns without not-null
   constraints in the <literal>WHERE</literal> clause. The
   <literal>WHERE</literal> clause does not allow functions or user-defined
   operators.
  </para>
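
To make the old/new row behavior concrete, here is a sketch using the
proposed syntax (hypothetical table; "active" is not part of the
replica identity):

CREATE TABLE departments (id int PRIMARY KEY, active bool);
CREATE PUBLICATION active_depts FOR TABLE departments WHERE (active);

INSERT INTO departments VALUES (1, true);   -- new row: filter true, published
UPDATE departments SET active = false
    WHERE active;                           -- new row: filter false, skipped;
                                            -- the subscriber keeps the old row
DELETE FROM departments WHERE id = 1;       -- old row: "active" is NULL there
                                            -- (not in the PK), so the DELETE
                                            -- is skipped as well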

3) publication_add_relation

Does this need to build the parse state even for whereClause == NULL?
No. Fixed.

4) AlterPublicationTables

I wonder if this new reworked code might have issues with subscriptions 
containing many tables, but I haven't tried.
This piece of code is already complicated. Amit complained about it too [1].
Are you envisioning any specific issue (other than opening thousands of
relations, doing some stuff, and closing them all)? IMO the open/close of
relations should be postponed for as long as possible.

5) OpenTableList

I really dislike that the list can have two different node types 
(Relation and PublicationTable). In principle we don't actually need the 
extra flag, we can simply check the node type directly by IsA() and act 
based on that. However, I think it'd be better to just use a single node 
type from all places.
Amit complained about having a runtime test for ALTER PUBLICATION ... DROP
TABLE in case the user provides a WHERE clause [2]. I did it that way (a
runtime test) because it simplified the code. I would tend to avoid moving a
grammar task into runtime checks; that's why I agreed to change it. I didn't
like the multi-node argument handling for OpenTableList() (mainly because of
the extra argument in the function signature) but with your suggestion (IsA())
maybe it is acceptable. What do you think? I included IsA() in v19.

I don't see why not to set whereClause every time, I don't think the 
extra if saves anything, it's just a bit more complex.
See runtime test in [2].


5) CloseTableList

The comment about node types seems pointless, this function has no flag 
and the element type does not matter.
Fixed.

6) parse_agg.c

    ... are not allowed in publication WHERE expressions

I think all similar cases use "WHERE conditions" instead.
No. Policy, index, statistics, partition, and column generation use
"expressions". COPY and trigger use "conditions". It is also referred to as an
expression in the synopsis.

7) transformExprRecurse

The check at the beginning seems rather awkward / misplaced - it's way 
too specific for this location (there are no other p_expr_kind 
references in this function). Wouldn't transformFuncCall (or maybe 
ParseFuncOrColumn) be a more appropriate place?
Probably. I have to try multiple possibilities to make sure it forbids all the
cases.

Initially I was wondering why not to allow function calls in WHERE 
conditions, but I see that was discussed in the past as problematic. But 
that reminds me that I don't see any docs describing what expressions 
are allowed in WHERE conditions - maybe we should explicitly list what 
expressions are allowed?
I started to investigate how to safely allow built-in functions. There is a
long discussion about using functions in a logical decoding context. As I said
during the last CF for v14, I prefer this to be a separate feature. I realized
that I mentioned in the commit message that functions and user-defined
operators are not allowed, but forgot to mention it in the documentation.

8) pgoutput.c

I have not reviewed this in detail yet, but there seems to be something 
wrong because `make check-world` fails in subscription/010_truncate.pl 
after hitting an assert  (backtrace attached) during "START_REPLICATION 
SLOT" in get_rel_sync_entry in this code:
That's because I didn't copy the TupleDesc into CacheMemoryContext. Greg
pointed it out too in a previous email [3]. The new patch (v19) includes a fix for it.



--
Euler Taveira

Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Tue, Jul 13, 2021, at 4:07 PM, Tomas Vondra wrote:
On 7/13/21 5:44 PM, Jeff Davis wrote:
> On Tue, 2021-07-13 at 10:24 +0530, Amit Kapila wrote:
8<

>> (c) the columns
>> referred to in the filter should be part of PK or Replica Identity.

> Why?


I'm not sure either.
This patch uses the old row for DELETE operations and the new row for INSERT
and UPDATE operations. Since we usually don't use REPLICA IDENTITY FULL, all
columns in an old row that are not part of the PK or REPLICA IDENTITY are NULL.
The row filter evaluates NULL to false. The documentation says:

  <para>
   The <literal>WHERE</literal> clause should contain only columns that are
   part of the primary key or are covered by <literal>REPLICA
   IDENTITY</literal>; otherwise, <command>DELETE</command> operations will not
   be replicated. That's because the old row is used and it only contains the
   primary key or the columns that are part of the <literal>REPLICA
   IDENTITY</literal>; the remaining columns are <literal>NULL</literal>. For
   <command>INSERT</command> and <command>UPDATE</command> operations, any
   column might be used in the <literal>WHERE</literal> clause. The new row is
   used and it contains all columns. A <literal>NULL</literal> value causes the
   expression to evaluate to false; avoid using columns without not-null
   constraints in the <literal>WHERE</literal> clause. The
   <literal>WHERE</literal> clause does not allow functions or user-defined
   operators.
  </para>


--
Euler Taveira

Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2021-Jul-13, Euler Taveira wrote:

> +  <para>
> +   The <literal>WHERE</literal> clause should contain only columns that are
> +   part of the primary key or be covered  by <literal>REPLICA
> +   IDENTITY</literal> otherwise, <command>DELETE</command> operations will not
> +   be replicated. That's because old row is used and it only contains primary
> +   key or columns that are part of the <literal>REPLICA IDENTITY</literal>; the
> +   remaining columns are <literal>NULL</literal>. For <command>INSERT</command>
> +   and <command>UPDATE</command> operations, any column might be used in the
> +   <literal>WHERE</literal> clause. New row is used and it contains all
> +   columns. A <literal>NULL</literal> value causes the expression to evaluate
> +   to false; avoid using columns without not-null constraints in the
> +   <literal>WHERE</literal> clause. The <literal>WHERE</literal> clause does
> +   not allow functions and user-defined operators.
> +  </para>

There are a couple of points in this paragraph ...

1. if you use REPLICA IDENTITY FULL, then the expressions would work
even if they use any other column with DELETE.  Maybe it would be
reasonable to test for this in the code and raise an error if the
expression requires a column that's not part of the replica identity.
(But that could be relaxed if the publication does not publish
updates/deletes.)

2. For UPDATE, does the expression apply to the old tuple or to the new
tuple?  You say it's the new tuple, but from the user point of view I
think it would make more sense that it would apply to the old tuple.
(Of course, if you're thinking that the R.I. is the PK and the PK is
never changed, then you don't really care which one it is, but I bet
that some people would not like that assumption.)

I think it is sensible that it's the old tuple that is matched, not the
new; consider what happens if you change the PK in the update and the
replica already has that tuple.  If you match on the new tuple and it
doesn't match the expression (so you filter out the update), but the old
tuple does match the expression, then the replica will retain the
mismatching tuple forever.

3. You say that a NULL value in any of those columns causes the
expression to become false and thus the tuple is not published.  This
seems pretty unfriendly, but maybe it would be useful to have examples
of the behavior.  Does ExecInitCheck() handle things in the other way,
and if so does using a similar trick give more useful behavior?

<para>
 The WHERE clause may only contain references to columns that are part
 of the table's replica identity.
 If <>DELETE</> or <>UPDATE</> operations are published, this
 restriction can be bypassed by making the replica identity be the whole
 row with <command>ALTER TABLE .. SET REPLICA IDENTITY FULL</command>.
 The <literal>WHERE</literal> clause does not allow functions or
 user-defined operators.
</para>

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
"The Gord often wonders why people threaten never to come back after they've
been told never to return" (www.actsofgord.com)



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:
1. if you use REPLICA IDENTITY FULL, then the expressions would work
even if they use any other column with DELETE.  Maybe it would be
reasonable to test for this in the code and raise an error if the
expression requires a column that's not part of the replica identity.
(But that could be relaxed if the publication does not publish
updates/deletes.)
I thought about it but came to the conclusion that it isn't worth it. Even
with REPLICA IDENTITY FULL, the expression evaluates to false if the column
allows NULL values. Besides that, REPLICA IDENTITY is changed via another DDL
(ALTER TABLE), and you would have to make sure nobody changes the REPLICA
IDENTITY while some row filter uses a column you want to remove from it.

2. For UPDATE, does the expression apply to the old tuple or to the new
tuple?  You say it's the new tuple, but from the user point of view I
think it would make more sense that it would apply to the old tuple.
(Of course, if you're thinking that the R.I. is the PK and the PK is
never changed, then you don't really care which one it is, but I bet
that some people would not like that assumption.)
New tuple. The main reason is that new tuple is always there for UPDATEs.
Hence, the row filter might succeed even if it contains a column that is not
part of the PK or REPLICA IDENTITY. pglogical also chooses to use the new
tuple when it is available (e.g. for INSERT and UPDATE). If you don't like
this approach we can (a) create a new publication option to choose between the
old tuple and the new tuple for UPDATEs, or (b) qualify columns using a
special reference (such as NEW.id or OLD.foo). Both options provide
flexibility but (a) is simpler.

I think it is sensible that it's the old tuple that is matched, not the
new; consider what happens if you change the PK in the update and the
replica already has that tuple.  If you match on the new tuple and it
doesn't match the expression (so you filter out the update), but the old
tuple does match the expression, then the replica will retain the
mismatching tuple forever.

3. You say that a NULL value in any of those columns causes the
expression to become false and thus the tuple is not published.  This
seems pretty unfriendly, but maybe it would be useful to have examples
of the behavior.  Does ExecInitCheck() handle things in the other way,
and if so does using a similar trick give more useful behavior?
ExecInitCheck() is designed for CHECK constraints, and the SQL standard
requires that NULL constraint conditions are not treated as errors. This
feature uses a WHERE clause and behaves like one: a NULL result does not
return the row. See ExecQual().
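
A quick way to see the difference:

SELECT NULL::int > 10;    -- yields NULL, not false

Under WHERE semantics (ExecQual) a NULL result drops the row, while
under CHECK semantics (ExecCheck) a NULL result counts as success, so
borrowing the ExecInitCheck() behavior would publish such rows instead.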


--
Euler Taveira

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jul 14, 2021 at 6:28 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:
>
> 1. if you use REPLICA IDENTITY FULL, then the expressions would work
> even if they use any other column with DELETE.  Maybe it would be
> reasonable to test for this in the code and raise an error if the
> expression requires a column that's not part of the replica identity.
> (But that could be relaxed if the publication does not publish
> updates/deletes.)
>

+1.

> I thought about it but came to the conclusion that it doesn't worth it.  Even
> with REPLICA IDENTITY FULL expression evaluates to false if the column allows
> NULL values. Besides that REPLICA IDENTITY is changed via another DDL (ALTER
> TABLE) and you have to make sure you don't allow changing REPLICA IDENTITY
> because some row filter uses the column you want to remove from it.
>

Yeah, that is required but is it not feasible to do so?

> 2. For UPDATE, does the expression apply to the old tuple or to the new
> tuple?  You say it's the new tuple, but from the user point of view I
> think it would make more sense that it would apply to the old tuple.
> (Of course, if you're thinking that the R.I. is the PK and the PK is
> never changed, then you don't really care which one it is, but I bet
> that some people would not like that assumption.)
>
> New tuple. The main reason is that new tuple is always there for UPDATEs.
>

I am not sure if that is a very good reason to use a new tuple.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jul 14, 2021 at 12:51 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Jul-13, Tomas Vondra wrote:
>
> > On 7/13/21 5:44 PM, Jeff Davis wrote:
>
> > > * Andres also mentioned that the function should not leak memory.
> > > * One use case for this feature is when sharding a table, so the
> > > expression should allow things like "hashint8(x) between ...". I'd
> > > really like to see this problem solved, as well.
> >
..
> >
> > Not sure about the memory leaks - I suppose we'd free memory for each row,
> > so this shouldn't be an issue I guess ...
>
> I'm not sure we need to be terribly strict about expression evaluation
> not leaking any memory here.   I'd rather have a memory context that can
> be reset per row.
>

I also think that should be sufficient here, and if I am reading it
correctly the patch already evaluates the expression in a per-tuple
context and resets it for each tuple. Jeff, do you or Andres have
something else in mind?
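
For reference, the per-tuple pattern looks roughly like this (a sketch
of the pattern, with a hypothetical function name, not the patch's
exact code):

#include "postgres.h"

#include "executor/executor.h"

static bool
row_filter_matches(EState *estate, ExprState *filter, TupleTableSlot *slot)
{
	ExprContext *econtext = GetPerTupleExprContext(estate);
	Datum		ret;
	bool		isnull;
	bool		result;

	econtext->ecxt_scantuple = slot;
	ret = ExecEvalExprSwitchContext(filter, econtext, &isnull);
	result = !isnull && DatumGetBool(ret);

	/* anything allocated while evaluating the filter is freed here */
	ResetPerTupleExprContext(estate);

	return result;
}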

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

От
Amit Kapila
Дата:
On Wed, Jul 14, 2021 at 12:37 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 7/13/21 5:44 PM, Jeff Davis wrote:
> > On Tue, 2021-07-13 at 10:24 +0530, Amit Kapila wrote:
> > Also:
> >
> > * Andres also mentioned that the function should not leak memory.
> > * One use case for this feature is when sharding a table, so the
> > expression should allow things like "hashint8(x) between ...". I'd
> > really like to see this problem solved, as well.
> >
>
> I think built-in functions should be fine, because generally don't get
> dropped etc. (And if you drop built-in function, well - sorry.)
>

I am not sure that all built-in functions are safe either. I think we
can't allow volatile functions (e.g. setval) that can update the
database, which doesn't seem to be allowed under a historic snapshot.
Similarly, it might not be okay to invoke stable functions that access
the database, as those might expect a current snapshot. I think
immutable functions should be okay, but that brings us to Jeff's
question: can we tie the marking of functions that can be used here to
the IMMUTABLE/STABLE/VOLATILE designation? UDFs might have a higher
risk that something used in those functions gets dropped, but I guess
we can address that by using the current snapshot to access the
publication catalog.


> Not sure about the memory leaks - I suppose we'd free memory for each
> row, so this shouldn't be an issue I guess ...
>
> >> I think in the long run one idea to allow UDFs is probably by
> >> explicitly allowing users to specify whether the function is
> >> publication predicate safe and if so, then we can allow such
> >> functions
> >> in the filter clause.
> >
> > This sounds like a better direction. We probably need some kind of
> > catalog information here to say what functions/operators are "safe" for
> > this kind of purpose. There are a couple questions:
> >
>
> Not sure. It's true it's a bit like volatile/stable/immutable categories
> where we can't guarantee those labels are correct, and it's up to the
> user to keep the pieces if they pick the wrong category.
>
> But we can achieve the same goal by introducing a simple GUC called
> dangerous_allow_udf_in_decoding, I think.
>

One GUC for all UDFs sounds dangerous.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jul 14, 2021 at 12:45 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 7/13/21 12:57 PM, Amit Kapila wrote:
> > On Tue, Jul 13, 2021 at 10:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > I think the problem described by Petr[1] is also possible today if the
> > user drops the publication and there is a corresponding subscription,
> > basically, the system will stuck with error: "ERROR:  publication
> > "mypub" does not exist. I think allowing to use non-historic snapshots
> > just for publications will resolve that problem as well.
> >
> > [1] - https://www.postgresql.org/message-id/92e5587d-28b8-5849-2374-5ca3863256f1%402ndquadrant.com
> >
>
> That seems like a completely different problem, TBH. For example the
> slot is dropped too, which means the WAL is likely gone etc.
>

I think if we can use the WAL archive (if available) and re-create the
slot, the system should be able to move forward, but just recreating
the publication won't.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Tomas Vondra
Date:

On 7/14/21 7:39 AM, Amit Kapila wrote:
> On Wed, Jul 14, 2021 at 6:28 AM Euler Taveira <euler@eulerto.com> wrote:
>>
>> On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:
>>
>> 1. if you use REPLICA IDENTITY FULL, then the expressions would work
>> even if they use any other column with DELETE.  Maybe it would be
>> reasonable to test for this in the code and raise an error if the
>> expression requires a column that's not part of the replica identity.
>> (But that could be relaxed if the publication does not publish
>> updates/deletes.)
>>
> 
> +1.
> 
>> I thought about it but came to the conclusion that it doesn't worth it.  Even
>> with REPLICA IDENTITY FULL expression evaluates to false if the column allows
>> NULL values. Besides that REPLICA IDENTITY is changed via another DDL (ALTER
>> TABLE) and you have to make sure you don't allow changing REPLICA IDENTITY
>> because some row filter uses the column you want to remove from it.
>>
> 
> Yeah, that is required but is it not feasible to do so?
> 
>> 2. For UPDATE, does the expression apply to the old tuple or to the new
>> tuple?  You say it's the new tuple, but from the user point of view I
>> think it would make more sense that it would apply to the old tuple.
>> (Of course, if you're thinking that the R.I. is the PK and the PK is
>> never changed, then you don't really care which one it is, but I bet
>> that some people would not like that assumption.)
>>
>> New tuple. The main reason is that new tuple is always there for UPDATEs.
>>
> 
> I am not sure if that is a very good reason to use a new tuple.
> 

True. Perhaps we should look at other places with similar concept of 
WHERE conditions and old/new rows, and try to be consistent with those?

I can think of:

1) updatable views with CHECK option

2) row-level security

3) triggers

Is there some reasonable rule which of the old/new tuples (or both) to 
use for the WHERE condition? Or maybe it'd be handy to allow referencing 
OLD/NEW as in triggers?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 7/14/21 7:39 AM, Amit Kapila wrote:
> > On Wed, Jul 14, 2021 at 6:28 AM Euler Taveira <euler@eulerto.com> wrote:
> >>
> >> On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:
> >>
> >> 1. if you use REPLICA IDENTITY FULL, then the expressions would work
> >> even if they use any other column with DELETE.  Maybe it would be
> >> reasonable to test for this in the code and raise an error if the
> >> expression requires a column that's not part of the replica identity.
> >> (But that could be relaxed if the publication does not publish
> >> updates/deletes.)
> >>
> >
> > +1.
> >
> >> I thought about it but came to the conclusion that it doesn't worth it.  Even
> >> with REPLICA IDENTITY FULL expression evaluates to false if the column allows
> >> NULL values. Besides that REPLICA IDENTITY is changed via another DDL (ALTER
> >> TABLE) and you have to make sure you don't allow changing REPLICA IDENTITY
> >> because some row filter uses the column you want to remove from it.
> >>
> >
> > Yeah, that is required but is it not feasible to do so?
> >
> >> 2. For UPDATE, does the expression apply to the old tuple or to the new
> >> tuple?  You say it's the new tuple, but from the user point of view I
> >> think it would make more sense that it would apply to the old tuple.
> >> (Of course, if you're thinking that the R.I. is the PK and the PK is
> >> never changed, then you don't really care which one it is, but I bet
> >> that some people would not like that assumption.)
> >>
> >> New tuple. The main reason is that new tuple is always there for UPDATEs.
> >>
> >
> > I am not sure if that is a very good reason to use a new tuple.
> >
>
> True. Perhaps we should look at other places with similar concept of
> WHERE conditions and old/new rows, and try to be consistent with those?
>
> I can think of:
>
> 1) updatable views with CHECK option
>
> 2) row-level security
>
> 3) triggers
>
> Is there some reasonable rule which of the old/new tuples (or both) to
> use for the WHERE condition? Or maybe it'd be handy to allow referencing
> OLD/NEW as in triggers?

I think for insert we only allow those rows which match the filter
conditions to replicate, so if we are updating any row then we should
maintain that sanity too, right? That means we should apply the filter
at least to the NEW rows, IMHO.  That said, if a row which satisfied
the filter was inserted and replicated, and we then update it with new
values which do not satisfy the filter, it will not be replicated. I
think that makes sense, because if an insert does not send a row that
fails the filter to a replica, then why should an update have to do
that, right?
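
For instance, with a hypothetical table t(id, value) published with
WHERE (value > 100):

INSERT INTO t VALUES (1, 200);          -- matches the filter: replicated
UPDATE t SET value = 50 WHERE id = 1;   -- new row no longer matches:
                                        -- the UPDATE is not replicated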

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Wed, Jul 14, 2021 at 6:38 AM Euler Taveira <euler@eulerto.com> wrote:
>
> Peter, thanks for quickly check the new patch. I'm attaching a new patch (v19)
> that addresses (a) this new review, (b) Tomas' review and (c) Greg's review. I
> also included the copy/equal node support for the new node (PublicationTable)
> mentioned by Tomas in another email.
>

Some minor v19 patch review points you might consider for your next
patch version:
(I'm still considering the other issues raised about WHERE clauses and
filtering)


(1) src/backend/commands/publicationcmds.c
OpenTableList

Some suggested abbreviations:

BEFORE:
if (IsA(lfirst(lc), PublicationTable))
   whereclause = true;
else
   whereclause = false;

AFTER:
whereclause = IsA(lfirst(lc), PublicationTable);


BEFORE:
if (whereclause)
   pri->whereClause = t->whereClause;
else
   pri->whereClause = NULL;

AFTER:
pri->whereClause = whereclause ? t->whereClause : NULL;


(2) src/backend/parser/parse_expr.c

I think that the check below:

/* Functions are not allowed in publication WHERE clauses */
if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE &&
nodeTag(expr) == T_FuncCall)
    ereport(ERROR,
        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
        errmsg("functions are not allowed in publication WHERE expressions"),
        parser_errposition(pstate, exprLocation(expr))));

should be moved down into the "T_FuncCall" case of the switch
statement below it, so that "if (pstate->p_expr_kind ==
EXPR_KIND_PUBLICATION_WHERE" doesn't get checked every call to
transformExprRecurse() regardless of the expression Node type.


(3) Save a nanosecond when entry->exprstate is already NIL:

BEFORE:
if (entry->exprstate != NIL)
   list_free_deep(entry->exprstate);
entry->exprstate = NIL;

AFTER:
if (entry->exprstate != NIL)
{
   list_free_deep(entry->exprstate);
   entry->exprstate = NIL;
}


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 7/14/21 7:39 AM, Amit Kapila wrote:
> > On Wed, Jul 14, 2021 at 6:28 AM Euler Taveira <euler@eulerto.com> wrote:
> >>
> >> On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:
> >>
> >> 1. if you use REPLICA IDENTITY FULL, then the expressions would work
> >> even if they use any other column with DELETE.  Maybe it would be
> >> reasonable to test for this in the code and raise an error if the
> >> expression requires a column that's not part of the replica identity.
> >> (But that could be relaxed if the publication does not publish
> >> updates/deletes.)
> >>
> >
> > +1.
> >
> >> I thought about it but came to the conclusion that it doesn't worth it.  Even
> >> with REPLICA IDENTITY FULL expression evaluates to false if the column allows
> >> NULL values. Besides that REPLICA IDENTITY is changed via another DDL (ALTER
> >> TABLE) and you have to make sure you don't allow changing REPLICA IDENTITY
> >> because some row filter uses the column you want to remove from it.
> >>
> >
> > Yeah, that is required but is it not feasible to do so?
> >
> >> 2. For UPDATE, does the expression apply to the old tuple or to the new
> >> tuple?  You say it's the new tuple, but from the user point of view I
> >> think it would make more sense that it would apply to the old tuple.
> >> (Of course, if you're thinking that the R.I. is the PK and the PK is
> >> never changed, then you don't really care which one it is, but I bet
> >> that some people would not like that assumption.)
> >>
> >> New tuple. The main reason is that new tuple is always there for UPDATEs.
> >>
> >
> > I am not sure if that is a very good reason to use a new tuple.
> >
>
> True. Perhaps we should look at other places with similar concept of
> WHERE conditions and old/new rows, and try to be consistent with those?
>
> I can think of:
>
> 1) updatable views with CHECK option
>
> 2) row-level security
>
> 3) triggers
>
> Is there some reasonable rule which of the old/new tuples (or both) to
> use for the WHERE condition? Or maybe it'd be handy to allow referencing
> OLD/NEW as in triggers?
>

I think apart from the above, it might be good if we can find out what
some other databases do in this regard.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2021-Jul-14, Dilip Kumar wrote:

> I think for insert we are only allowing those rows to replicate which
> are matching filter conditions, so if we updating any row then also we
> should maintain that sanity right? That means at least on the NEW rows
> we should apply the filter, IMHO.  Said that, now if there is any row
> inserted which were satisfying the filter and replicated, if we update
> it with the new value which is not satisfying the filter then it will
> not be replicated,  I think that makes sense because if an insert is
> not sending any row to a replica which is not satisfying the filter
> then why update has to do that, right?

Right, that's a good aspect to think about.

I think the guiding principle for which tuple to use for the filter is
what is most useful to the potential user of the feature, rather than
what is the easiest to implement.

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Álvarez)



Re: row filtering for logical replication

From
Tomas Vondra
Date:

On 7/14/21 4:01 PM, Alvaro Herrera wrote:
> On 2021-Jul-14, Dilip Kumar wrote:
> 
>> I think for insert we are only allowing those rows to replicate which
>> are matching filter conditions, so if we updating any row then also we
>> should maintain that sanity right? That means at least on the NEW rows
>> we should apply the filter, IMHO.  Said that, now if there is any row
>> inserted which were satisfying the filter and replicated, if we update
>> it with the new value which is not satisfying the filter then it will
>> not be replicated,  I think that makes sense because if an insert is
>> not sending any row to a replica which is not satisfying the filter
>> then why update has to do that, right?
> 
> Right, that's a good aspect to think about.
> 

I agree, that seems like a reasonable approach.

The way I'm thinking about this is that for INSERT and DELETE it's clear 
which row version should be used (because there's just one). And for 
UPDATE we could see that as DELETE + INSERT, and apply the same rule to 
each action.

On the other hand, I can imagine cases where it'd be useful to send the 
UPDATE when the old row matches the condition and new row does not.
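
In code, that DELETE+INSERT interpretation would amount to something
like the following sketch (hypothetical names; note the current patch
only checks the new tuple):

typedef enum RowFilterAction
{
	RF_SKIP,					/* publish nothing */
	RF_PUBLISH_INSERT,
	RF_PUBLISH_UPDATE,
	RF_PUBLISH_DELETE
} RowFilterAction;

/*
 * Treat UPDATE as DELETE+INSERT for filtering: a row leaving the
 * filtered set becomes a DELETE, a row entering it becomes an INSERT,
 * and a row staying inside it remains an UPDATE.
 */
static RowFilterAction
filter_update(bool old_matches, bool new_matches)
{
	if (old_matches && new_matches)
		return RF_PUBLISH_UPDATE;
	if (old_matches)
		return RF_PUBLISH_DELETE;	/* row leaves the set */
	if (new_matches)
		return RF_PUBLISH_INSERT;	/* row enters the set */
	return RF_SKIP;
}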

> I think the guiding principle for which tuple to use for the filter is
> what is most useful to the potential user of the feature, rather than
> what is the easiest to implement.
> 

+1

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Tomas Vondra
Date:
On 7/14/21 2:50 PM, Amit Kapila wrote:
> On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 7/14/21 7:39 AM, Amit Kapila wrote:
>>> On Wed, Jul 14, 2021 at 6:28 AM Euler Taveira <euler@eulerto.com> wrote:
>>>>
>>>> On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:
>>>>
>>>> 1. if you use REPLICA IDENTITY FULL, then the expressions would work
>>>> even if they use any other column with DELETE.  Maybe it would be
>>>> reasonable to test for this in the code and raise an error if the
>>>> expression requires a column that's not part of the replica identity.
>>>> (But that could be relaxed if the publication does not publish
>>>> updates/deletes.)
>>>>
>>>
>>> +1.
>>>
>>>> I thought about it but came to the conclusion that it doesn't worth it.  Even
>>>> with REPLICA IDENTITY FULL expression evaluates to false if the column allows
>>>> NULL values. Besides that REPLICA IDENTITY is changed via another DDL (ALTER
>>>> TABLE) and you have to make sure you don't allow changing REPLICA IDENTITY
>>>> because some row filter uses the column you want to remove from it.
>>>>
>>>
>>> Yeah, that is required but is it not feasible to do so?
>>>
>>>> 2. For UPDATE, does the expression apply to the old tuple or to the new
>>>> tuple?  You say it's the new tuple, but from the user point of view I
>>>> think it would make more sense that it would apply to the old tuple.
>>>> (Of course, if you're thinking that the R.I. is the PK and the PK is
>>>> never changed, then you don't really care which one it is, but I bet
>>>> that some people would not like that assumption.)
>>>>
>>>> New tuple. The main reason is that new tuple is always there for UPDATEs.
>>>>
>>>
>>> I am not sure if that is a very good reason to use a new tuple.
>>>
>>
>> True. Perhaps we should look at other places with similar concept of
>> WHERE conditions and old/new rows, and try to be consistent with those?
>>
>> I can think of:
>>
>> 1) updatable views with CHECK option
>>
>> 2) row-level security
>>
>> 3) triggers
>>
>> Is there some reasonable rule which of the old/new tuples (or both) to
>> use for the WHERE condition? Or maybe it'd be handy to allow referencing
>> OLD/NEW as in triggers?
>>
> 
> I think apart from the above, it might be good if we can find what
> some other databases does in this regard?
> 

Yeah, that might tell us what the users would like to do with it. I did 
a quick search, but haven't found much :-( The one thing I found is 
that Debezium [1] allows accessing both the "old" and "new" rows through 
value.before and value.after, and using both for filtering.

I haven't found much about how this works in other databases, sadly.

Perhaps the best way forward is to stick to the approach that INSERT 
uses new, DELETE uses old and UPDATE works as DELETE+INSERT (probably), 
and leave anything fancier (like being able to reference both versions 
of the row) for a future patch.


[1] 
https://wanna-joke.com/wp-content/uploads/2015/01/german-translation-comics-science.jpg

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Wed, Jul 14, 2021 at 8:04 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>

> Perhaps the best way forward is to stick to the approach that INSERT
> uses new, DELETE uses old and UPDATE works as DELETE+INSERT (probably),
> and leave anything fancier (like being able to reference both versions
> of the row) for a future patch.

If UPDATE works as DELETE + INSERT, does that mean both the OLD row
and the NEW row must satisfy the filter before the change is sent?
That would mean that if we insert a row that does not satisfy the
condition (and so is not sent to the subscriber) and later update that
row so that the modified values match the filter, we will not send the
update, because only the NEW row satisfies the condition and the OLD
row doesn't.  I am just trying to understand your idea.  Or are you
saying that in this case we will not send anything for the OLD row, as
it did not satisfy the condition, but the modified row will be sent as
an INSERT operation because it does satisfy the condition?
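
For concreteness, the scenario in question looks like this (a sketch
using the patch's WHERE syntax; the table and values are invented):

CREATE TABLE t (id int PRIMARY KEY, v int);
CREATE PUBLICATION p FOR TABLE t WHERE (v > 100);

INSERT INTO t VALUES (1, 50);        -- v = 50 fails the filter: not replicated
UPDATE t SET v = 200 WHERE id = 1;   -- OLD (v = 50) fails, NEW (v = 200)
                                     -- passes: is this skipped, or sent
                                     -- (and if sent, as what)?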

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2021-Jul-14, Tomas Vondra wrote:

> The way I'm thinking about this is that for INSERT and DELETE it's clear
> which row version should be used (because there's just one). And for UPDATE
> we could see that as DELETE + INSERT, and apply the same rule to each
> action.
> 
> On the other hand, I can imagine cases where it'd be useful to send the
> UPDATE when the old row matches the condition and new row does not.

In any case, it seems to me that the condition expression should be
scanned to see which columns are used in Vars (pull_varattnos?), and
verify if those columns are in the REPLICA IDENTITY; and if they are
not, raise an error.  Most of the time the REPLICA IDENTITY is going to
be the primary key; but if the user wants to use other columns in the
expression, we can HINT that they can set REPLICA IDENTITY FULL.
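
For illustration, the proposed restriction could behave like this (a
sketch: the WHERE clause is the patch's syntax, the error and hint
texts are invented; only the ALTER TABLE is standard syntax):

CREATE TABLE emp (id int PRIMARY KEY, dept int, name text);

-- dept is not part of the replica identity (the PK), so the old tuple
-- of a DELETE would not carry it:
CREATE PUBLICATION pub_emp FOR TABLE emp WHERE (dept = 42);
-- ERROR:  row filter column "dept" is not part of the replica identity
-- HINT:  consider ALTER TABLE emp REPLICA IDENTITY FULL

ALTER TABLE emp REPLICA IDENTITY FULL;  -- old tuples now carry all columns
CREATE PUBLICATION pub_emp FOR TABLE emp WHERE (dept = 42);  -- OK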

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
<Schwern> It does it in a really, really complicated way
<crab> why does it need to be complicated?
<Schwern> Because it's MakeMaker.



Re: row filtering for logical replication

From
Tomas Vondra
Date:

On 7/14/21 4:48 PM, Dilip Kumar wrote:
> On Wed, Jul 14, 2021 at 8:04 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
> 
>> Perhaps the best way forward is to stick to the approach that INSERT
>> uses new, DELETE uses old and UPDATE works as DELETE+INSERT (probably),
>> and leave anything fancier (like being able to reference both versions
>> of the row) for a future patch.
> 
> If UPDATE works as DELETE + INSERT, does that mean both the OLD row
> and the NEW row must satisfy the filter before the change is sent?
> That would mean that if we insert a row that does not satisfy the
> condition (and so is not sent to the subscriber) and later update that
> row so that the modified values match the filter, we will not send the
> update, because only the NEW row satisfies the condition and the OLD
> row doesn't.  I am just trying to understand your idea.  Or are you
> saying that in this case we will not send anything for the OLD row, as
> it did not satisfy the condition, but the modified row will be sent as
> an INSERT operation because it does satisfy the condition?
> 

Good questions. I'm not sure, I probably have not thought it through.

So yeah, I think we should probably stick to the principle that what we 
send needs to match the filter condition, which applied to this case 
would mean we should be looking at the new row version.

The more elaborate scenarios can be added later by a patch allowing to 
explicitly reference the old/new row versions.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Tomas Vondra
Date:

On 7/14/21 4:52 PM, Alvaro Herrera wrote:
> On 2021-Jul-14, Tomas Vondra wrote:
> 
>> The way I'm thinking about this is that for INSERT and DELETE it's clear
>> which row version should be used (because there's just one). And for UPDATE
>> we could see that as DELETE + INSERT, and apply the same rule to each
>> action.
>>
>> On the other hand, I can imagine cases where it'd be useful to send the
>> UPDATE when the old row matches the condition and new row does not.
> 
> In any case, it seems to me that the condition expression should be
> scanned to see which columns are used in Vars (pull_varattnos?), and
> verify if those columns are in the REPLICA IDENTITY; and if they are
> not, raise an error.  Most of the time the REPLICA IDENTITY is going to
> be the primary key; but if the user wants to use other columns in the
> expression, we can HINT that they can set REPLICA IDENTITY FULL.
> 

Yeah, but AFAIK that's needed only when replicating DELETEs, so perhaps 
we could ignore this for subscriptions without DELETE.

The other question is when to check/enforce this. I guess we'll have to 
do that during decoding, not just when the publication is being created, 
because the user can do ALTER TABLE later.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2021-Jul-14, Tomas Vondra wrote:

> On 7/14/21 4:52 PM, Alvaro Herrera wrote:

> > In any case, it seems to me that the condition expression should be
> > scanned to see which columns are used in Vars (pull_varattnos?), and
> > verify if those columns are in the REPLICA IDENTITY; and if they are
> > not, raise an error.  Most of the time the REPLICA IDENTITY is going to
> > be the primary key; but if the user wants to use other columns in the
> > expression, we can HINT that they can set REPLICA IDENTITY FULL.
> 
> Yeah, but AFAIK that's needed only when replicating DELETEs, so perhaps we
> could ignore this for subscriptions without DELETE.

Yeah, I said that too in my older reply :-)

> The other question is when to check/enforce this. I guess we'll have to do
> that during decoding, not just when the publication is being created,
> because the user can do ALTER TABLE later.

... if you're saying the user can change the replica identity after we
have some publications with filters defined, then I think we should
verify during ALTER TABLE and not allow the change if there's a
publication that requires it.  I mean, during decoding we should be able
to simply assume that the tuple is correct for what we need at that
point.

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Wed, Jul 14, 2021, at 11:48 AM, Dilip Kumar wrote:
On Wed, Jul 14, 2021 at 8:04 PM Tomas Vondra
>

> Perhaps the best way forward is to stick to the approach that INSERT
> uses new, DELETE uses old and UPDATE works as DELETE+INSERT (probably),
> and leave anything fancier (like being able to reference both versions
> of the row) for a future patch.

If UPDATE works as DELETE + INSERT, does that mean both the OLD row
and the NEW row must satisfy the filter before the change is sent?
That would mean that if we insert a row that does not satisfy the
condition (and so is not sent to the subscriber) and later update that
row so that the modified values match the filter, we will not send the
update, because only the NEW row satisfies the condition and the OLD
row doesn't.  I am just trying to understand your idea.  Or are you
saying that in this case we will not send anything for the OLD row, as
it did not satisfy the condition, but the modified row will be sent as
an INSERT operation because it does satisfy the condition?
That's a fair argument for the default UPDATE behavior. It seems we have a
consensus that the UPDATE operation will use the old row. If there are no
objections, I will change it in the next version.

We can certainly discuss the possibilities for UPDATE operations: old, new,
or both rows could be used (an additional publication argument, or OLD and
NEW placeholders to reference the old and new rows, are both feasible ideas).


--
Euler Taveira

Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Wed, Jul 14, 2021, at 12:08 PM, Tomas Vondra wrote:
Yeah, but AFAIK that's needed only when replicating DELETEs, so perhaps 
we could ignore this for subscriptions without DELETE.
... and UPDATE. It seems we have a consensus to use the old row in the row filter
for UPDATEs. I think you meant publication.

The other question is when to check/enforce this. I guess we'll have to 
do that during decoding, not just when the publication is being created, 
because the user can do ALTER TABLE later.
I'm afraid this check during decoding has a considerable cost. If we want to
enforce this condition, I suggest that we add it to CREATE PUBLICATION, ALTER
PUBLICATION ... ADD|SET TABLE and ALTER TABLE ... REPLICA IDENTITY. Data are
being constantly modified; schema is not.
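
A sketch of the three DDL entry points where the check would run (the
error wording is invented):

CREATE PUBLICATION pub FOR TABLE emp WHERE (dept = 42);   -- checked here
ALTER PUBLICATION pub SET TABLE emp WHERE (dept = 42);    -- and here
ALTER TABLE emp REPLICA IDENTITY USING INDEX emp_pkey;
-- ERROR:  cannot change replica identity of "emp": column "dept" is
--         used in a row filter of publication "pub"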


--
Euler Taveira

Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Wed, Jul 14, 2021, at 8:21 AM, Greg Nancarrow wrote:
Some minor v19 patch review points you might consider for your next
patch version:
Greg, thanks for another review. I agree with all of these changes. It will be
in the next patch.


--
Euler Taveira

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jul 14, 2021 at 8:43 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Jul-14, Tomas Vondra wrote:
>
> > The other question is when to check/enforce this. I guess we'll have to do
> > that during decoding, not just when the publication is being created,
> > because the user can do ALTER TABLE later.
>
> ... if you're saying the user can change the replica identity after we
> have some publications with filters defined, then I think we should
> verify during ALTER TABLE and not allow the change if there's a
> publication that requires it.  I mean, during decoding we should be able
> to simply assume that the tuple is correct for what we need at that
> point.
>

+1.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jul 14, 2021 at 10:55 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Jul 14, 2021, at 12:08 PM, Tomas Vondra wrote:
>
> Yeah, but AFAIK that's needed only when replicating DELETEs, so perhaps
> we could ignore this for subscriptions without DELETE.
>
> ... and UPDATE. It seems we have a consensus to use the old row in the row filter
> for UPDATEs. I think you meant publication.
>

If I read correctly, people are suggesting using the new row for updates,
but I still suggest completing the analysis (or at least spending some
more time on it) that Tomas and I requested in the emails above before
concluding on this point.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Thu, Jul 15, 2021 at 7:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jul 14, 2021 at 10:55 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Wed, Jul 14, 2021, at 12:08 PM, Tomas Vondra wrote:
> >
> > Yeah, but AFAIK that's needed only when replicating DELETEs, so perhaps
> > we could ignore this for subscriptions without DELETE.
> >
> > ... and UPDATE. It seems we have a consensus to use the old row in the row filter
> > for UPDATEs. I think you meant publication.
> >
>
> If I read correctly, people are suggesting using the new row for updates,

Right

> but I still suggest completing the analysis (or at least spending some
> more time on it) that Tomas and I requested in the emails above before
> concluding on this point.

+1

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Wed, Jul 14, 2021 at 10:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I think apart from the above, it might be good if we can find what
> some other databases do in this regard?
>

I did a bit of investigation in the case of Oracle Database and SQL Server.
(purely from my interpretation of available documentation; I did not
actually use the replication software)

For Oracle (GoldenGate), it appears that it provides the ability for
filters to reference both OLD and NEW rows in replication of UPDATEs:
"For update operations, it can be advantageous to retrieve the before
values of source columns: the values before the update occurred. These
values are stored in the trail and can be used in filters and column
mappings"
It provides @BEFORE and @AFTER functions for this.

For SQL Server, the available replication models seem quite different
to that in PostgreSQL, and not all seem to support row filtering.
For "snapshot replication", it seems that it effectively supports
filtering rows on the NEW values.
It seems that the snapshot is taken at a transactional boundary and
rows are included according to any filtering, and then replicated.
So to include the result of a particular UPDATE in the replication,
the replication row filtering would effectively be done on the result
(NEW) rows.
Another type of replication that supports row filtering is "merge
replication", which again seems to be effectively based on NEW rows:
"For merge replication to process a row, the data in the row must
satisfy the row filter, and it must have changed since the last
synchronization"
It's not clear to me if there is ANY way to filter on the OLD row
values by using some option.

If anybody has experience with the replication software for these
other databases and I've interpreted the documentation for these
incorrectly, please let me know.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Jul 15, 2021 at 4:30 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Jul 14, 2021, at 8:21 AM, Greg Nancarrow wrote:
>
> Some minor v19 patch review points you might consider for your next
> patch version:
>
> Greg, thanks for another review. I agree with all of these changes. It will be
> in the next patch.

Hi, here are a couple more minor review comments for the V19 patch.

(The 2nd one overlaps a bit with one that Greg previously gave).

//////

1. doc/src/sgml/ref/create_publication.sgml

+   columns. A <literal>NULL</literal> value causes the expression to evaluate
+   to false; avoid using columns without not-null constraints in the
+   <literal>WHERE</literal> clause. The <literal>WHERE</literal> clause does
+   not allow functions and user-defined operators.
+  </para>

=>

typo: "and user-defined operators." --> "or user-defined operators."

------

2. src/backend/commands/publicationcmds.c - OpenTableList IsA logic

IIUC the tables list can only consist of one kind of list element.

Since there is no expected/permitted "mixture" of kinds, there is
no need to check the IsA within the loop as v19 is doing; instead
you can check only the list head element. If you want to, you
could Assert that every list element has the same kind as the
initial one, but maybe that is overkill too?

PSA a small tmp patch to demonstrate what this comment is about.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jul 14, 2021 at 4:30 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > Is there some reasonable rule which of the old/new tuples (or both) to
> > use for the WHERE condition? Or maybe it'd be handy to allow referencing
> > OLD/NEW as in triggers?
>
> I think for insert we are only allowing those rows to replicate which
> are matching filter conditions, so if we are updating any row then also we
> should maintain that sanity right? That means at least on the NEW rows
> we should apply the filter, IMHO.  That said, now if there is any row
> inserted which were satisfying the filter and replicated, if we update
> it with the new value which is not satisfying the filter then it will
> not be replicated,  I think that makes sense because if an insert is
> not sending any row to a replica which is not satisfying the filter
> then why update has to do that, right?
>

There is another theory in this regard which is what if the old row
(created by the previous insert) is not sent to the subscriber as that
didn't match the filter but after the update, we decide to send it
because the updated row (new row) matches the filter condition. In
this case, I think it will generate an update conflict on the
subscriber as the old row won't be present. As of now, we just skip
the update but in the future, we might have some conflict handling
there. If this is true then even if the new row matches the filter,
there is no guarantee that it will be applied on the subscriber-side
unless the old row also matches the filter. Sure, there could be a
case where the user might have changed the filter between insert and
update but maybe we can have a separate way to deal with such cases if
required like providing some provision where the user can specify
whether they would like to match the old/new row in updates?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Fri, Jul 16, 2021 at 8:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jul 14, 2021 at 4:30 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> > >
> > > Is there some reasonable rule which of the old/new tuples (or both) to
> > > use for the WHERE condition? Or maybe it'd be handy to allow referencing
> > > OLD/NEW as in triggers?
> >
> > I think for insert we are only allowing those rows to replicate which
> > are matching filter conditions, so if we are updating any row then also we
> > should maintain that sanity right? That means at least on the NEW rows
> > we should apply the filter, IMHO.  That said, now if there is any row
> > inserted which were satisfying the filter and replicated, if we update
> > it with the new value which is not satisfying the filter then it will
> > not be replicated,  I think that makes sense because if an insert is
> > not sending any row to a replica which is not satisfying the filter
> > then why update has to do that, right?
> >
>
> There is another theory in this regard which is what if the old row
> (created by the previous insert) is not sent to the subscriber as that
> didn't match the filter but after the update, we decide to send it
> because the updated row (new row) matches the filter condition. In
> this case, I think it will generate an update conflict on the
> subscriber as the old row won't be present. As of now, we just skip
> the update but in the future, we might have some conflict handling
> there. If this is true then even if the new row matches the filter,
> there is no guarantee that it will be applied on the subscriber-side
> unless the old row also matches the filter.

Yeah, it's a valid point.

> Sure, there could be a
> case where the user might have changed the filter between insert and
> update but maybe we can have a separate way to deal with such cases if
> required like providing some provision where the user can specify
> whether they would like to match the old/new row in updates?

Yeah, I think the best way is that users should get an option whether
they want to apply the filter on the old row or on the new row, or
both; in fact, they should be able to apply different filters on
old and new rows.  I have one more thought in mind: currently, we are
providing a filter for the publication table, doesn't it make sense to
provide filters for operations of the publication table?  I mean the
different filters for Insert, delete, and the old row of update and
the new row of the update.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Jul 16, 2021 at 10:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Jul 16, 2021 at 8:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jul 14, 2021 at 4:30 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
> > > <tomas.vondra@enterprisedb.com> wrote:
> > > >
> > > > Is there some reasonable rule which of the old/new tuples (or both) to
> > > > use for the WHERE condition? Or maybe it'd be handy to allow referencing
> > > > OLD/NEW as in triggers?
> > >
> > > I think for insert we are only allowing those rows to replicate which
> > > are matching filter conditions, so if we are updating any row then also we
> > > should maintain that sanity right? That means at least on the NEW rows
> > > we should apply the filter, IMHO.  That said, now if there is any row
> > > inserted which were satisfying the filter and replicated, if we update
> > > it with the new value which is not satisfying the filter then it will
> > > not be replicated,  I think that makes sense because if an insert is
> > > not sending any row to a replica which is not satisfying the filter
> > > then why update has to do that, right?
> > >
> >
> > There is another theory in this regard which is what if the old row
> > (created by the previous insert) is not sent to the subscriber as that
> > didn't match the filter but after the update, we decide to send it
> > because the updated row (new row) matches the filter condition. In
> > this case, I think it will generate an update conflict on the
> > subscriber as the old row won't be present. As of now, we just skip
> > the update but in the future, we might have some conflict handling
> > there. If this is true then even if the new row matches the filter,
> > there is no guarantee that it will be applied on the subscriber-side
> > unless the old row also matches the filter.
>
> Yeah, it's a valid point.
>
> > Sure, there could be a
> > case where the user might have changed the filter between insert and
> > update but maybe we can have a separate way to deal with such cases if
> > required like providing some provision where the user can specify
> > whether they would like to match the old/new row in updates?
>
> Yeah, I think the best way is that users should get an option whether
> they want to apply the filter on the old row or on the new row, or
> both; in fact, they should be able to apply different filters on
> old and new rows.
>

I am not so sure about different filters for old and new rows but it
makes sense to apply the filter to both old and new rows by default.
Then also provide a way for the user to specify that the filter should
be applied to just the old or new row.

>  I have one more thought in mind: currently, we are
> providing a filter for the publication table, doesn't it make sense to
> provide filters for operations of the publication table?  I mean the
> different filters for Insert, delete, and the old row of update and
> the new row of the update.
>

Hmm, I think this sounds like a bit of a stretch, but if there is a field
use case then we can consider this in the future.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Fri, Jul 16, 2021 at 3:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I am not so sure about different filters for old and new rows but it
> makes sense to apply the filter to both old and new rows by default.
> Then also provide a way for the user to specify that the filter should
> be applied to just the old or new row.
>

I'm having some doubts and concerns about what is being suggested.

My current thought and opinion is that the row filter should
(initially, or at least by default) specify the condition of the row
data at the publication boundary (i.e. what is actually sent to and
received by the subscriber). That means for UPDATE, I think that the
filter should operate on the new value.
This has the clear advantage of knowing (from the WHERE expression)
what restrictions are placed on the data that is actually published
and what subscribers will actually receive. So it's more predictable.
If we filter on OLD rows, then we would need to know exactly what is
updated by the UPDATE in order to know what is actually published (for
example, the UPDATE could modify the columns being checked in the
publication WHERE expression).
I'm not saying that's wrong, or a bad idea, but it's more complicated
and potentially confusing. Maybe there could be an option for it.
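
A small example of that predictability point (a sketch; the table and
filter are invented):

CREATE TABLE emp (id int PRIMARY KEY, dept int);
CREATE PUBLICATION p FOR TABLE emp WHERE (dept = 42);
INSERT INTO emp VALUES (1, 42);

-- Filtering on the NEW row keeps a simple invariant: every change the
-- subscriber receives satisfies dept = 42 at the time it is sent.
UPDATE emp SET dept = 7 WHERE id = 1;  -- NEW row fails the filter: not sent
-- Filtering on the OLD row instead would mean you also need to know
-- which columns the UPDATE modified to know what was actually published.
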
Also, even if we allowed OLD/NEW to be specified in the WHERE
expression, OLD wouldn't make sense for INSERT and NEW wouldn't make
sense for DELETE, so one WHERE expression with OLD/NEW references
wouldn't seem valid to cover all operations INSERT, UPDATE and DELETE.
I think that was what Dilip was essentially referring to, with his
suggestion of using different filters for different operations (though
I think that may be going too far for the initial implementation).

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Tomas Vondra
Date:

On 7/16/21 5:26 AM, Amit Kapila wrote:
> On Wed, Jul 14, 2021 at 4:30 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>
>> On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>>>
>>> Is there some reasonable rule which of the old/new tuples (or both) to
>>> use for the WHERE condition? Or maybe it'd be handy to allow referencing
>>> OLD/NEW as in triggers?
>>
>> I think for insert we are only allowing those rows to replicate which
>> are matching filter conditions, so if we are updating any row then also we
>> should maintain that sanity right? That means at least on the NEW rows
>> we should apply the filter, IMHO.  That said, now if there is any row
>> inserted which were satisfying the filter and replicated, if we update
>> it with the new value which is not satisfying the filter then it will
>> not be replicated,  I think that makes sense because if an insert is
>> not sending any row to a replica which is not satisfying the filter
>> then why update has to do that, right?
>>
> 
> There is another theory in this regard which is what if the old row
> (created by the previous insert) is not sent to the subscriber as that
> didn't match the filter but after the update, we decide to send it
> because the updated row (new row) matches the filter condition. In
> this case, I think it will generate an update conflict on the
> subscriber as the old row won't be present. As of now, we just skip
> the update but in the future, we might have some conflict handling
> there.

Right.

> If this is true then even if the new row matches the filter,
> there is no guarantee that it will be applied on the subscriber-side
> unless the old row also matches the filter. Sure, there could be a
> case where the user might have changed the filter between insert and
> update but maybe we can have a separate way to deal with such cases if
> required like providing some provision where the user can specify
> whether they would like to match the old/new row in updates?
> 

I think the best we can do for now is to document this. AFAICS it can't 
be solved without a conflict resolution that would turn the UPDATE to 
INSERT. And that would require REPLICA IDENTITY FULL, otherwise the 
UPDATE would not have data for all the columns.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2021-Jul-16, Greg Nancarrow wrote:

> On Fri, Jul 16, 2021 at 3:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I am not so sure about different filters for old and new rows but it
> > makes sense to apply the filter to both old and new rows by default.
> > Then also provide a way for the user to specify that the filter should
> > be applied to just the old or new row.
> 
> I'm having some doubts and concerns about what is being suggested.

Yeah.  I think the idea that some updates fail to reach the replica,
leaving the downstream database in a different state than it would be if
those updates had reached it, is unsettling.  It makes me wish we raised
an error at UPDATE time if both rows would not pass the filter test in
the same way -- that is, if the old row passes the filter, then the new
row must pass as well.

Maybe a second option is to have replication change any UPDATE into
either an INSERT or a DELETE, if the old or the new row do not pass the
filter, respectively.  That way, the databases would remain consistent.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick."              (Andrew Sullivan)



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Sat, Jul 17, 2021 at 3:05 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Jul-16, Greg Nancarrow wrote:
>
> > On Fri, Jul 16, 2021 at 3:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I am not so sure about different filters for old and new rows but it
> > > makes sense to apply the filter to both old and new rows by default.
> > > Then also provide a way for the user to specify that the filter should
> > > be applied to just the old or new row.
> >
> > I'm having some doubts and concerns about what is being suggested.
>
> Yeah.  I think the idea that some updates fail to reach the replica,
> leaving the downstream database in a different state than it would be if
> those updates had reached it, is unsettling.  It makes me wish we raised
> an error at UPDATE time if both rows would not pass the filter test in
> the same way -- that is, if the old row passes the filter, then the new
> row must pass as well.
>

Hmm, do you mean that we should raise an error in the walsender while
decoding if the old or new row doesn't match the filter clause? How
would the walsender come out of that error? Even if, on seeing the
error, the user changed the filter clause for the publication, I think
it would still see the old one due to the historical snapshot and keep
getting the same error. One idea could be that we use the current
snapshot to read the publications catalog table; then the user could
change the filter or do something else to move past this error. The
other options could be:

a. Just log it and move to the next row
b. send to stats collector some info about this which can be displayed
in a view and then move ahead
c. just skip it like any other row that doesn't match the filter clause.

I am not sure if there is any use of sending a row if one of the
old/new rows doesn't match the filter. Because if the old row doesn't
match but the new one matches the criteria, we will anyway just throw
such a row on the subscriber instead of applying it. OTOH, if the old
row matches but the new one doesn't, it probably doesn't fit the
analogy that new rows should behave similarly to inserts. I am of the
opinion that we should do either (a) or (c) when one of the old or new
rows doesn't match the filter clause.

> Maybe a second option is to have replication change any UPDATE into
> either an INSERT or a DELETE, if the old or the new row do not pass the
> filter, respectively.  That way, the databases would remain consistent.
>

I guess such things should be handled via conflict resolution on the
subscriber side.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Mon, Jul 19, 2021 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> a. Just log it and move to the next row
> b. send to stats collector some info about this which can be displayed
> in a view and then move ahead
> c. just skip it like any other row that doesn't match the filter clause.
>
> I am not sure if there is any use of sending a row if one of the
> old/new rows doesn't match the filter. Because if the old row doesn't
> match but the new one matches the criteria, we will anyway just throw
> such a row on the subscriber instead of applying it.

But at some time that will be true even if we skip the row based on
(a) or (c), right?  Suppose the OLD row was not satisfying the
condition but the NEW row is satisfying the condition, now even if we
skip this operation then in the next operation on the same row even if
both OLD and NEW rows are satisfying the filter the operation will
just be dropped by the subscriber right? because we did not send the
previous row when it was first updated to a value satisfying the
condition.  So basically, any row is inserted which did not satisfy
the condition first then post that no matter how many updates we do to
that row either it will be skipped by the publisher because the OLD
row was not satisfying the condition or it will be skipped by the
subscriber as there was no matching row.

> > Maybe a second option is to have replication change any UPDATE into
> > either an INSERT or a DELETE, if the old or the new row do not pass the
> > filter, respectively.  That way, the databases would remain consistent.

Yeah, I think this is the best way to keep the data consistent.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jul 14, 2021 at 8:03 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 7/14/21 2:50 PM, Amit Kapila wrote:
> > On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
> >
> > I think apart from the above, it might be good if we can find what
> > some other databases do in this regard?
> >
>
> Yeah, that might tell us what the users would like to do with it. I did
> some quick search, but haven't found much :-( The one thing I found is
> that Debezium [1] allows accessing both the "old" and "new" rows through
> value.before and value.after, and using both for filtering.
>

Okay, but does it apply a filter to both rows for an Update event?

>
> [1]
> https://wanna-joke.com/wp-content/uploads/2015/01/german-translation-comics-science.jpg
>

This link doesn't provide Debezium information, seems like a typo.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Tomas Vondra
Date:

On 7/19/21 1:00 PM, Dilip Kumar wrote:
> On Mon, Jul 19, 2021 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> a. Just log it and move to the next row
>> b. send to stats collector some info about this which can be displayed
>> in a view and then move ahead
>> c. just skip it like any other row that doesn't match the filter clause.
>>
>> I am not sure if there is any use of sending a row if one of the
>> old/new rows doesn't match the filter. Because if the old row doesn't
>> match but the new one matches the criteria, we will anyway just throw
>> such a row on the subscriber instead of applying it.
> 
> But at some time that will be true even if we skip the row based on
> (a) or (c), right?  Suppose the OLD row was not satisfying the
> condition but the NEW row is satisfying the condition, now even if we
> skip this operation then in the next operation on the same row even if
> both OLD and NEW rows are satisfying the filter the operation will
> just be dropped by the subscriber right? because we did not send the
> previous row when it was first updated to a value satisfying the
> condition.  So basically, any row is inserted which did not satisfy
> the condition first then post that no matter how many updates we do to
> that row either it will be skipped by the publisher because the OLD
> row was not satisfying the condition or it will be skipped by the
> subscriber as there was no matching row.
> 

I have a feeling it's getting overly complicated, to the extent that
it'll be hard to explain to users and reason about. I don't think
there's a "perfect" solution for cases when the filter expression gives
different answers for old/new row - it'll always be surprising for some
users :-(

So maybe the best thing is to stick to the simple approach already used
e.g. by pglogical, which simply uses the new row when available (insert,
update) and the old one for deletes.

I think that behaves more or less sensibly and it's easy to explain.

All the other things (e.g. turning UPDATE to INSERT, advanced conflict
resolution etc.) will require a lot of other stuff, and I see them as
improvements of this simple approach.

>>> Maybe a second option is to have replication change any UPDATE into
>>> either an INSERT or a DELETE, if the old or the new row do not pass the
>>> filter, respectively.  That way, the databases would remain consistent.
> 
> Yeah, I think this is the best way to keep the data consistent.
> 

It'd also require REPLICA IDENTITY FULL, which seems like it'd add a
rather significant overhead.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Tomas Vondra
Date:
On 7/19/21 1:30 PM, Amit Kapila wrote:
> On Wed, Jul 14, 2021 at 8:03 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 7/14/21 2:50 PM, Amit Kapila wrote:
>>> On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
>>>
>>> I think apart from the above, it might be good if we can find what
>>> some other databases do in this regard?
>>>
>>
>> Yeah, that might tell us what the users would like to do with it. I did
>> some quick search, but haven't found much :-( The one thing I found is
>> that Debezium [1] allows accessing both the "old" and "new" rows through
>> value.before and value.after, and using both for filtering.
>>
> 
> Okay, but does it apply a filter to both rows for an Update event?
> 
>>
>> [1]
>> https://wanna-joke.com/wp-content/uploads/2015/01/german-translation-comics-science.jpg
>>
> 
> This link doesn't provide Debezium information, seems like a typo.
> 

Uh, yeah - I copied a different link. I meant to send this one:

https://debezium.io/documentation/reference/configuration/filtering.html


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Mon, Jul 19, 2021 at 11:32 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> I have a feeling it's getting overly complicated, to the extent that
> it'll be hard to explain to users and reason about. I don't think
> there's a "perfect" solution for cases when the filter expression gives
> different answers for old/new row - it'll always be surprising for some
> users :-(
>
> So maybe the best thing is to stick to the simple approach already used
> e.g. by pglogical, which simply uses the new row when available (insert,
> update) and the old one for deletes.
>
> I think that behaves more or less sensibly and it's easy to explain.
>
> All the other things (e.g. turning UPDATE to INSERT, advanced conflict
> resolution etc.) will require a lot of other stuff, and I see them as
> improvements of this simple approach.
>

+1
My thoughts on this are very similar.


Regards,
Greg Nancarrow
Fujitsu Australia



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
Hi,

I am interested in this feature and took a quick a look at the patch.
Here are a few comments.

(1)
+                appendStringInfo(&cmd, "%s", q);

We'd better use appendStringInfoString(&cmd, q);


(2)
+    whereclause = transformWhereClause(pstate,
+                                       copyObject(pri->whereClause),
+                                       EXPR_KIND_PUBLICATION_WHERE,
+                                       "PUBLICATION");
+
+    /* Fix up collation information */
+    assign_expr_collations(pstate, whereclause);

Is it better to invoke eval_const_expressions or canonicalize_qual here to
simplify the expression?


(3)
+                appendPQExpBuffer(&buf,
+                                  ", pg_get_expr(pr.prqual, c.oid)");
+            else
+                appendPQExpBuffer(&buf,
+                                  ", NULL");

We'd better use appendPQExpBufferStr instead of appendPQExpBuffer here.

(4)
nodeTag(expr) == T_FuncCall)

It might look clearer to use IsA(expr, FuncCall) here.

Best regards,
Houzj

Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2021-Jul-19, Tomas Vondra wrote:

> I have a feeling it's getting overly complicated, to the extent that
> it'll be hard to explain to users and reason about. I don't think
> there's a "perfect" solution for cases when the filter expression gives
> different answers for old/new row - it'll always be surprising for some
> users :-(
> 
> So maybe the best thing is to stick to the simple approach already used
> e.g. by pglogical, which simply uses the new row when available (insert,
> update) and the old one for deletes.

OK, no objection to that plan.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"No es bueno caminar con un hombre muerto"



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Jul 19, 2021 at 4:31 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Jul 19, 2021 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > Maybe a second option is to have replication change any UPDATE into
> > > either an INSERT or a DELETE, if the old or the new row do not pass the
> > > filter, respectively.  That way, the databases would remain consistent.
>
> Yeah, I think this is the best way to keep the data consistent.
>

Today, while studying the behavior of this particular operation in
other databases, I found that IBM's InfoSphere Data Replication does
exactly this. See [1]. I think there is merit in following this
idea.

[1] - https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-search-conditions

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Jul 19, 2021 at 7:02 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 7/19/21 1:00 PM, Dilip Kumar wrote:
> > On Mon, Jul 19, 2021 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> a. Just log it and move to the next row
> >> b. send to stats collector some info about this which can be displayed
> >> in a view and then move ahead
> >> c. just skip it like any other row that doesn't match the filter clause.
> >>
> >> I am not sure if there is any use of sending a row if one of the
> >> old/new rows doesn't match the filter. Because if the old row doesn't
> >> match but the new one matches the criteria, we will anyway just throw
> >> such a row on the subscriber instead of applying it.
> >
> > But at some time that will be true even if we skip the row based on
> > (a) or (c), right?  Suppose the OLD row was not satisfying the
> > condition but the NEW row is satisfying the condition, now even if we
> > skip this operation then in the next operation on the same row even if
> > both OLD and NEW rows are satisfying the filter the operation will
> > just be dropped by the subscriber right? because we did not send the
> > previous row when it was first updated to a value satisfying the
> > condition.  So basically, any row is inserted which did not satisfy
> > the condition first then post that no matter how many updates we do to
> > that row either it will be skipped by the publisher because the OLD
> > row was not satisfying the condition or it will be skipped by the
> > subscriber as there was no matching row.
> >
>
> I have a feeling it's getting overly complicated, to the extent that
> it'll be hard to explain to users and reason about. I don't think
> there's a "perfect" solution for cases when the filter expression gives
> different answers for old/new row - it'll always be surprising for some
> users :-(
>


It is possible but OTOH, the three replication solutions (Debezium,
Oracle, IBM's InfoSphere Data Replication) which have this feature
seem to filter based on both old and new rows in one way or another.
Also, I am not sure if the simple approach of just filtering based on
the new row is very clear, because it can also confuse users in a way
that even if all the new rows match the filter, they don't see anything
on the subscriber and, in fact, that can cause a lot of network
overhead without any gain.

> So maybe the best thing is to stick to the simple approach already used
>> e.g. by pglogical, which simply uses the new row when available (insert,
>> update) and the old one for deletes.
>
> I think that behaves more or less sensibly and it's easy to explain.
>

Okay, if nothing better comes up, then we can fall back to this option.

> All the other things (e.g. turning UPDATE to INSERT, advanced conflict
> resolution etc.) will require a lot of other stuff,
>

I have not evaluated this yet but I think spending some time thinking
about turning Update to Insert/Delete (yesterday's suggestion by
Alvaro) might be worth especially as that seems to be followed by some
other replication solution as well.

> and I see them as
> improvements of this simple approach.
>
> >>> Maybe a second option is to have replication change any UPDATE into
> >>> either an INSERT or a DELETE, if the old or the new row do not pass the
> >>> filter, respectively.  That way, the databases would remain consistent.
> >
> > Yeah, I think this is the best way to keep the data consistent.
> >
>
> It'd also require REPLICA IDENTITY FULL, which seems like it'd add a
> rather significant overhead.
>

Why? I think it would just need similar restrictions as we are
planning for Delete operation such that filter columns must be either
present in primary or replica identity columns.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Tue, Jul 20, 2021 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Today, while studying the behavior of this particular operation in
> other databases, I found that IBM's InfoSphere Data Replication does
> exactly this. See [1]. I think there is merit in following this
> idea.
>

So in this model (after initial sync of rows according to the filter),
for UPDATE, the OLD row is checked against the WHERE clause, to know
if the row had been previously published. If it hadn't, and the NEW
row satisfies the WHERE clause, then it needs to be published as an
INSERT. If it had been previously published, but the NEW row doesn't
satisfy the WHERE condition, then it needs to be published as a
DELETE. Otherwise, if both OLD and NEW rows satisfy the WHERE clause,
it needs to be published as an UPDATE.
At least, that seems to be the model when the WHERE clause refers to
the NEW (updated) values, as used in most of their samples (i.e. in
that database "the current log record", indicated by a ":" prefix on
the column name).
I think that allowing the OLD values ("old log record") to be
referenced in the WHERE clause, as that model does, could be
potentially confusing.
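
The transformation model described above, written out as a truth table
(illustrative SQL only, but runnable as-is in psql):

SELECT old_matches, new_matches,
       CASE
         WHEN old_matches AND new_matches     THEN 'publish as UPDATE'
         WHEN old_matches AND NOT new_matches THEN 'publish as DELETE'
         WHEN NOT old_matches AND new_matches THEN 'publish as INSERT'
         ELSE 'skip'
       END AS published_as
FROM (VALUES (true, true), (true, false),
             (false, true), (false, false)) AS f(old_matches, new_matches);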

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jul 20, 2021 at 11:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Jul 20, 2021 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Today, while studying the behavior of this particular operation in
> > other databases, I found that IBM's InfoSphere Data Replication does
> > exactly this. See [1]. I think there is a merit if want to follow this
> > idea.
> >
>
> So in this model (after initial sync of rows according to the filter),
> for UPDATE, the OLD row is checked against the WHERE clause, to know
> if the row had been previously published. If it hadn't, and the NEW
> row satisfies the WHERE clause, then it needs to be published as an
> INSERT. If it had been previously published, but the NEW row doesn't
> satisfy the WHERE condition, then it needs to be published as a
> DELETE. Otherwise, if both OLD and NEW rows satisfy the WHERE clause,
> it needs to be published as an UPDATE.
>

Yeah, this is what I also understood.

> At least, that seems to be the model when the WHERE clause refers to
> the NEW (updated) values, as used in most of their samples (i.e. in
> that database "the current log record", indicated by a ":" prefix on
> the column name).
> I think that allowing the OLD values ("old log record") to be
> referenced in the WHERE clause, as that model does, could be
> potentially confusing.
>

I think in terms of referring to old and new rows, we already have
terminology which we use at various other similar places. See the
CREATE RULE docs [1]. For the WHERE clause, it says "Within condition and
command, the special table names NEW and OLD can be used to refer to
values in the referenced table. NEW is valid in ON INSERT and ON
UPDATE rules to refer to the new row being inserted or updated. OLD is
valid in ON UPDATE and ON DELETE rules to refer to the existing row
being updated or deleted.". We need similar things for the WHERE
clause in publication if we want special syntax to refer to old and
new rows.

I think if we use some existing way to refer to old/new values then it
shouldn't be confusing to users.
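
For reference, the existing OLD/NEW syntax in CREATE RULE looks like
this (standard PostgreSQL syntax, not the proposed publication syntax):

CREATE TABLE items (id int PRIMARY KEY, price numeric);
CREATE RULE items_price_watch AS
    ON UPDATE TO items
    WHERE NEW.price <> OLD.price
    DO ALSO NOTIFY price_changed;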

[1] - https://www.postgresql.org/docs/devel/sql-createrule.html

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jul 20, 2021 at 9:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jul 19, 2021 at 4:31 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Jul 19, 2021 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > Maybe a second option is to have replication change any UPDATE into
> > > > either an INSERT or a DELETE, if the old or the new row do not pass the
> > > > filter, respectively.  That way, the databases would remain consistent.
> >
> > Yeah, I think this is the best way to keep the data consistent.
> >
>
> Today, while studying the behavior of this particular operation in
> other databases, I found that IBM's InfoSphere Data Replication does
> exactly this. See [1]. I think there is merit in following this
> idea.
>

As per my initial analysis, there shouldn't be much difficulty in
implementing this behavior. We need to change the filter API
(pgoutput_row_filter) such that it tells us whether the filter is
satisfied by the old row, new row or both and then the caller should
be able to make a decision based on that. I think that should be
sufficient to turn update to insert/delete when required. I might be
missing something here but this doesn't appear to require any drastic
changes in the patch.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jul 14, 2021 at 2:08 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Jul 13, 2021, at 12:25 AM, Peter Smith wrote:
>
> I have reviewed the latest v18 patch. Below are some more review
> comments and patches.
>
> Peter, thanks for quickly check the new patch. I'm attaching a new patch (v19).
>

The latest patch doesn't apply cleanly. Can you please rebase it and
see if you can address some simpler comments till we reach a consensus
on some of the remaining points?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Tomas Vondra
Date:
On 7/20/21 7:23 AM, Amit Kapila wrote:
> On Mon, Jul 19, 2021 at 7:02 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 7/19/21 1:00 PM, Dilip Kumar wrote:
>>> On Mon, Jul 19, 2021 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> a. Just log it and move to the next row
>>>> b. send to stats collector some info about this which can be displayed
>>>> in a view and then move ahead
>>>> c. just skip it like any other row that doesn't match the filter clause.
>>>>
>>>> I am not sure if there is any use of sending a row if one of the
>>>> old/new rows doesn't match the filter. Because if the old row doesn't
>>>> match but the new one matches the criteria, we will anyway just throw
>>>> such a row on the subscriber instead of applying it.
>>>
>>> But at some time that will be true even if we skip the row based on
>>> (a) or (c), right?  Suppose the OLD row was not satisfying the
>>> condition but the NEW row is satisfying the condition, now even if we
>>> skip this operation then in the next operation on the same row even if
>>> both OLD and NEW rows are satisfying the filter the operation will
>>> just be dropped by the subscriber right? because we did not send the
>>> previous row when it was first updated to a value satisfying the
>>> condition.  So basically, any row is inserted which did not satisfy
>>> the condition first then post that no matter how many updates we do to
>>> that row either it will be skipped by the publisher because the OLD
>>> row was not satisfying the condition or it will be skipped by the
>>> subscriber as there was no matching row.
>>>
>>
>> I have a feeling it's getting overly complicated, to the extent that
>> it'll be hard to explain to users and reason about. I don't think
>> there's a "perfect" solution for cases when the filter expression gives
>> different answers for old/new row - it'll always be surprising for some
>> users :-(
>>
> 
> 
> It is possible but OTOH, the three replication solutions (Debezium,
> Oracle, IBM's InfoSphere Data Replication) which have this feature
> seem to filter based on both old and new rows in one way or another.
> Also, I am not sure if the simple approach of just filtering based on
> the new row is very clear, because it can also confuse users in a way
> that even if all the new rows match the filter, they don't see anything
> on the subscriber and, in fact, that can cause a lot of network
> overhead without any gain.
> 

True. My point is that it's easier to explain than when using some
combination of old/new row, and the approach "replicate if the filter
matches both rows" proposed in this thread would be confusing too.

If the subscriber database can be modified, we kinda have this issue
already - the row can be deleted, and all UPDATEs will be lost.
Yes, for read-only replicas that won't happen, but I think we're moving
to use cases more advanced than that.

I think there are only two ways to *guarantee* this does not happen:

* prohibit updates of columns referenced in row filters

* some sort of conflict resolution, turning UPDATE to INSERT etc.

>> So maybe the best thing is to stick to the simple approach already used
>> e.g. by pglogical, which simply uses the new row when available (insert,
>> update) and the old one for deletes.
>>
>> I think that behaves more or less sensibly and it's easy to explain.
>>
> 
> Okay, if nothing better comes up, then we can fall back to this option.
> 
>> All the other things (e.g. turning UPDATE to INSERT, advanced conflict
>> resolution etc.) will require a lot of other stuff,
>>
> 
> I have not evaluated this yet but I think spending some time thinking
> about turning Update to Insert/Delete (yesterday's suggestion by
> Alvaro) might be worth especially as that seems to be followed by some
> other replication solution as well.
> 

I think that requires quite a bit of infrastructure, and I'd bet we'll
need to handle other types of conflicts too. I don't have a clear
opinion if that's required to get this patch working - I'd try getting
the simplest implementation with reasonable behavior, with those more
advanced things as future enhancements.

>> and I see them as
>> improvements of this simple approach.
>>
>>>>> Maybe a second option is to have replication change any UPDATE into
>>>>> either an INSERT or a DELETE, if the old or the new row do not pass the
>>>>> filter, respectively.  That way, the databases would remain consistent.
>>>
>>> Yeah, I think this is the best way to keep the data consistent.
>>>
>>
>> It'd also require REPLICA IDENTITY FULL, which seems like it'd add a
>> rather significant overhead.
>>
> 
> Why? I think it would just need similar restrictions as we are
> planning for Delete operation such that filter columns must be either
> present in primary or replica identity columns.
> 

How else would you turn UPDATE to INSERT? For UPDATE we only send the
identity columns and modified columns, and the decision happens on the
subscriber. So we need to send everything if there's a risk we'll need
those columns. But it's early and I've only had one coffee, so I may be
missing something glaringly obvious.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jul 20, 2021 at 2:39 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 7/20/21 7:23 AM, Amit Kapila wrote:
> > On Mon, Jul 19, 2021 at 7:02 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
>
> >> So maybe the best thing is to stick to the simple approach already used
> >> e.g. by pglogical, which simply uses the new row when available (insert,
> >> update) and the old one for deletes.
> >>
> >> I think that behaves more or less sensibly and it's easy to explain.
> >>
> >
> > Okay, if nothing better comes up, then we can fall back to this option.
> >
> >> All the other things (e.g. turning UPDATE to INSERT, advanced conflict
> >> resolution etc.) will require a lot of other stuff,
> >>
> >
> > I have not evaluated this yet but I think spending some time thinking
> > about turning Update to Insert/Delete (yesterday's suggestion by
> > Alvaro) might be worth especially as that seems to be followed by some
> > other replication solution as well.
> >
>
> I think that requires quite a bit of infrastructure, and I'd bet we'll
> need to handle other types of conflicts too.
>

Hmm, I don't see why we need any additional infrastructure here if we
do this at the publisher. I think this could be done without many
changes to the patch as explained in one of my previous emails [1].

> I don't have a clear
> opinion if that's required to get this patch working - I'd try getting
> the simplest implementation with reasonable behavior, with those more
> advanced things as future enhancements.
>
> >> and I see them as
> >> improvements of this simple approach.
> >>
> >>>>> Maybe a second option is to have replication change any UPDATE into
> >>>>> either an INSERT or a DELETE, if the old or the new row do not pass the
> >>>>> filter, respectively.  That way, the databases would remain consistent.
> >>>
> >>> Yeah, I think this is the best way to keep the data consistent.
> >>>
> >>
> >> It'd also require REPLICA IDENTITY FULL, which seems like it'd add a
> >> rather significant overhead.
> >>
> >
> > Why? I think it would just need similar restrictions as we are
> > planning for Delete operation such that filter columns must be either
> > present in primary or replica identity columns.
> >
>
> How else would you turn UPDATE to INSERT? For UPDATE we only send the
> identity columns and modified columns, and the decision happens on the
> subscriber.
>

Hmm, we log the entire new tuple and replica identity columns for the
old tuple in WAL for Update. And, we are going to use a new tuple for
Insert, so we have everything we need.
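For illustration (the table and values are invented, reusing the kind
of filter discussed above), the publisher-side transformation would
behave like this:

-- filter: WHERE (active); replica identity: (id)
-- the old row (1, false) failed the filter, so the subscriber never saw it
UPDATE t SET active = true WHERE id = 1;
-- old row fails the filter but the new row passes: publish the new tuple
-- as an INSERT; symmetrically, if the old row passes and the new row
-- fails, publish a DELETE of the old key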


[1] - https://www.postgresql.org/message-id/CAA4eK1%2BAXEd5bO-qPp6L9Ptckk09nbWvP8V7q5UW4hg%2BkHjXwQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Tue, Jul 20, 2021 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > Why? I think it would just need similar restrictions as we are
> > > planning for Delete operation such that filter columns must be either
> > > present in primary or replica identity columns.
> > >
> >
> > How else would you turn UPDATE to INSERT? For UPDATE we only send the
> > identity columns and modified columns, and the decision happens on the
> > subscriber.
> >
>
> Hmm, we log the entire new tuple and replica identity columns for the
> old tuple in WAL for Update. And, we are going to use a new tuple for
> Insert, so we have everything we need.
>

But for making that decision we need to apply the filter on the old
rows as well, right?  So if we want to apply the filter on the old rows,
then either the filter should only be on the replica identity key or
we need to use REPLICA IDENTITY FULL.  I think that is what Tomas
wants to point out.
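For reference, these are the two options in existing syntax (the table
name is illustrative):

-- option 1: log the whole old row for every UPDATE/DELETE, at a
-- noticeable WAL cost
ALTER TABLE t REPLICA IDENTITY FULL;

-- option 2: keep the REPLICA IDENTITY as the primary key and restrict
-- the filter to those columns, whose old values are always logged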

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jul 20, 2021 at 3:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jul 20, 2021 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > Why? I think it would just need similar restrictions as we are
> > > > planning for Delete operation such that filter columns must be either
> > > > present in primary or replica identity columns.
> > > >
> > >
> > > How else would you turn UPDATE to INSERT? For UPDATE we only send the
> > > identity columns and modified columns, and the decision happens on the
> > > subscriber.
> > >
> >
> > Hmm, we log the entire new tuple and replica identity columns for the
> > old tuple in WAL for Update. And, we are going to use a new tuple for
> > Insert, so we have everything we need.
> >
>
> But for making that decision we need to apply the filter on the old
> rows as well right.  So if we want to apply the filter on the old rows
> then either the filter should only be on the replica identity key or
> we need to use REPLICA IDENTITY FULL.  I think that is what Tomas
> wants to point out.
>

I have already mentioned that for Updates the filter needs criteria
similar to Deletes; it is exactly the same requirement as for Delete.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Tomas Vondra
Date:
On 7/20/21 11:42 AM, Amit Kapila wrote:
> On Tue, Jul 20, 2021 at 2:39 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 7/20/21 7:23 AM, Amit Kapila wrote:
>>> On Mon, Jul 19, 2021 at 7:02 PM Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>
>>>> So maybe the best thing is to stick to the simple approach already used
>>>> e.g. by pglogical, which simply uses the new row when available (insert,
>>>> update) and the old one for deletes.
>>>>
>>>> I think that behaves more or less sensibly and it's easy to explain.
>>>>
>>>
>>> Okay, if nothing better comes up, then we can fall back to this option.
>>>
>>>> All the other things (e.g. turning UPDATE to INSERT, advanced conflict
>>>> resolution etc.) will require a lot of other stuff,
>>>>
>>>
>>> I have not evaluated this yet but I think spending some time thinking
>>> about turning Update to Insert/Delete (yesterday's suggestion by
>>> Alvaro) might be worth especially as that seems to be followed by some
>>> other replication solution as well.
>>>
>>
>> I think that requires quite a bit of infrastructure, and I'd bet we'll
>> need to handle other types of conflicts too.
>>
> 
> Hmm, I don't see why we need any additional infrastructure here if we
> do this at the publisher. I think this could be done without many
> changes to the patch as explained in one of my previous emails [1].
> 

Oh, I see. I've been thinking about doing the "usual" conflict
resolution on the subscriber side. I'm not sure about doing this on the
publisher ...

>> I don't have a clear
>> opinion if that's required to get this patch working - I'd try getting
>> the simplest implementation with reasonable behavior, with those more
>> advanced things as future enhancements.
>>
>>>> and I see them as
>>>> improvements of this simple approach.
>>>>
>>>>>>> Maybe a second option is to have replication change any UPDATE into
>>>>>>> either an INSERT or a DELETE, if the old or the new row do not pass the
>>>>>>> filter, respectively.  That way, the databases would remain consistent.
>>>>>
>>>>> Yeah, I think this is the best way to keep the data consistent.
>>>>>
>>>>
>>>> It'd also require REPLICA IDENTITY FULL, which seems like it'd add a
>>>> rather significant overhead.
>>>>
>>>
>>> Why? I think it would just need similar restrictions as we are
>>> planning for Delete operation such that filter columns must be either
>>> present in primary or replica identity columns.
>>>
>>
>> How else would you turn UPDATE to INSERT? For UPDATE we only send the
>> identity columns and modified columns, and the decision happens on the
>> subscriber.
>>
> 
> Hmm, we log the entire new tuple and replica identity columns for the
> old tuple in WAL for Update. And, we are going to use a new tuple for
> Insert, so we have everything we need.
> 

Do we log the TOAST-ed values that were not updated?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Tue, Jul 20, 2021 at 3:43 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Do we log the TOAST-ed values that were not updated?

No, we don't, I have submitted a patch sometime back to fix that [1]

[1] https://commitfest.postgresql.org/33/3162/

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Tue, Jul 20, 2021 at 6:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I think in terms of referring to old and new rows, we already have
> terminology which we used at various other similar places. See Create
> Rule docs [1]. For where clause, it says "Within condition and
> command, the special table names NEW and OLD can be used to refer to
> values in the referenced table. NEW is valid in ON INSERT and ON
> UPDATE rules to refer to the new row being inserted or updated. OLD is
> valid in ON UPDATE and ON DELETE rules to refer to the existing row
> being updated or deleted.". We need similar things for the WHERE
> clause in publication if we want special syntax to refer to old and
> new rows.
>

I have no doubt we COULD allow references to OLD and NEW in the WHERE
clause, but do we actually want to?
This is what I thought could cause confusion, when mixed with the
model that I previously described.
It's not entirely clear to me exactly how it would work when the WHERE
clause is applied to the OLD and NEW rows while the WHERE condition
itself can refer to OLD and/or NEW (coupled with the fact that NEW
doesn't make sense for DELETE and OLD doesn't make sense for INSERT).
Combine that with the fact that a publication can have multiple tables
each with their own WHERE clause, and tables can be dropped/(re)added
to the publication with a different WHERE clause, and it starts to get
a little complicated working out exactly what the result should be.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jul 20, 2021 at 5:13 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Jul 20, 2021 at 6:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I think in terms of referring to old and new rows, we already have
> > terminology which we used at various other similar places. See Create
> > Rule docs [1]. For where clause, it says "Within condition and
> > command, the special table names NEW and OLD can be used to refer to
> > values in the referenced table. NEW is valid in ON INSERT and ON
> > UPDATE rules to refer to the new row being inserted or updated. OLD is
> > valid in ON UPDATE and ON DELETE rules to refer to the existing row
> > being updated or deleted.". We need similar things for the WHERE
> > clause in publication if we want special syntax to refer to old and
> > new rows.
> >
>
> I have no doubt we COULD allow references to OLD and NEW in the WHERE
> clause, but do we actually want to?
> This is what I thought could cause confusion, when mixed with the
> model that I previously described.
> It's not entirely clear to me exactly how it works, when the WHERE
> clause is applied to the OLD and NEW rows, when the WHERE condition
> itself can refer to OLD and/or NEW (coupled with the fact that NEW
> doesn't make sense for DELETE and OLD doesn't make sense for INSERT).
>

It is not new; the same is true when they are used in RULES and
probably in other places where we use them.
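For reference, this is the established OLD/NEW usage in rules (the
table, column, and rule names here are only illustrative):

CREATE RULE log_balance AS ON UPDATE TO accounts
    WHERE OLD.balance IS DISTINCT FROM NEW.balance
    DO ALSO INSERT INTO accounts_audit
        VALUES (OLD.id, OLD.balance, NEW.balance);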

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jul 20, 2021 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jul 20, 2021 at 3:43 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > Do we log the TOAST-ed values that were not updated?
>
> No, we don't, I have submitted a patch sometime back to fix that [1]
>

That patch seems to log WAL for unchanged key columns. What about
unchanged non-key columns? Do they get logged as part of the new tuple,
or is there some other way we can get those? If not, then we probably
need to think of restricting the filter clause in some way.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Thu, Jul 22, 2021 at 5:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 20, 2021 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Jul 20, 2021 at 3:43 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> > >
> > > Do we log the TOAST-ed values that were not updated?
> >
> > No, we don't, I have submitted a patch sometime back to fix that [1]
> >
>
> That patch seems to log WAL for key unchanged columns. What about if
> unchanged non-key columns? Do they get logged as part of the new tuple
> or is there some other way we can get those? If not, then we need to
> probably think of restricting filter clause in some way.

But what sort of restrictions? I mean we cannot restrict based on data
type, right - that would be too restrictive. The other option is to allow
only replica identity key columns in the filter condition?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Jul 22, 2021 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Jul 22, 2021 at 5:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jul 20, 2021 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Jul 20, 2021 at 3:43 PM Tomas Vondra
> > > <tomas.vondra@enterprisedb.com> wrote:
> > > >
> > > > Do we log the TOAST-ed values that were not updated?
> > >
> > > No, we don't, I have submitted a patch sometime back to fix that [1]
> > >
> >
> > That patch seems to log WAL for key unchanged columns. What about if
> > unchanged non-key columns? Do they get logged as part of the new tuple
> > or is there some other way we can get those? If not, then we need to
> > probably think of restricting filter clause in some way.
>
> But what sort of restrictions? I mean we can not put based on data
> type right that will be too restrictive,
>

Yeah, a data type restriction sounds too restrictive, and unless the
data is toasted, the data will anyway be available. I think such a
restriction should be the last resort, but let's try to see if we can
do something better.

> other option is only to allow
> replica identity keys columns in the filter condition?
>

Yes, that is what I had in mind, because if the key column(s) are
changed then we will have data for both old and new tuples. But if they
are not changed then we will probably have it only for the old tuple,
unless we decide to fix the bug you mentioned in a different way, in
which case we might either need to log it for the purpose of this
feature (but that will anyway be for HEAD) or need to come up with some
other solution here. I think we can't even fetch such columns' data
during decoding because we have catalog-only historic snapshots here.
Do you have any better ideas?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Jul 23, 2021 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jul 22, 2021 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Jul 22, 2021 at 5:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Jul 20, 2021 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Tue, Jul 20, 2021 at 3:43 PM Tomas Vondra
> > > > <tomas.vondra@enterprisedb.com> wrote:
> > > > >
> > > > > Do we log the TOAST-ed values that were not updated?
> > > >
> > > > No, we don't, I have submitted a patch sometime back to fix that [1]
> > > >
> > >
> > > That patch seems to log WAL for key unchanged columns. What about if
> > > unchanged non-key columns? Do they get logged as part of the new tuple
> > > or is there some other way we can get those? If not, then we need to
> > > probably think of restricting filter clause in some way.
> >
> > But what sort of restrictions? I mean we can not put based on data
> > type right that will be too restrictive,
> >
>
> Yeah, data type restriction sounds too restrictive and unless the data
> is toasted, the data will be anyway available. I think such kind of
> restriction should be the last resort but let's try to see if we can
> do something better.
>
> > other option is only to allow
> > replica identity keys columns in the filter condition?
> >
>
> Yes, that is what I had in mind because if key column(s) is changed
> then we will have data for both old and new tuples. But if it is not
> changed then we will have it probably for the old tuple unless we
> decide to fix the bug you mentioned in a different way in which case
> we might either need to log it for the purpose of this feature (but
> that will be any way for HEAD) or need to come up with some other
> solution here. I think we can't even fetch such columns data during
> decoding because we have catalog-only historic snapshots here. Do you
> have any better ideas?
>

BTW, I wonder how pglogical handles this, because if these unchanged
toasted values are not logged in WAL for the new tuple, then how will
the comparison for such columns work? Either they force WAL in some
way, or don't allow a WHERE clause on such columns, or maybe they have
dealt with it in some other way - unless they are unaware of this
problem.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Rahila Syed
Date:


On Fri, Jul 23, 2021 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Jul 23, 2021 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jul 22, 2021 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Jul 22, 2021 at 5:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Tue, Jul 20, 2021 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jul 20, 2021 at 3:43 PM Tomas Vondra
> > > > > <tomas.vondra@enterprisedb.com> wrote:
> > > > > >
> > > > > > Do we log the TOAST-ed values that were not updated?
> > > > >
> > > > > No, we don't, I have submitted a patch sometime back to fix that [1]
> > > > >
> > > >
> > > > That patch seems to log WAL for key unchanged columns. What about if
> > > > unchanged non-key columns? Do they get logged as part of the new tuple
> > > > or is there some other way we can get those? If not, then we need to
> > > > probably think of restricting filter clause in some way.
> > >
> > > But what sort of restrictions? I mean we can not put based on data
> > > type right that will be too restrictive,
> > >
> >
> > Yeah, data type restriction sounds too restrictive and unless the data
> > is toasted, the data will be anyway available. I think such kind of
> > restriction should be the last resort but let's try to see if we can
> > do something better.
> >
> > > other option is only to allow
> > > replica identity keys columns in the filter condition?
> > >
> >
> > Yes, that is what I had in mind because if key column(s) is changed
> > then we will have data for both old and new tuples. But if it is not
> > changed then we will have it probably for the old tuple unless we
> > decide to fix the bug you mentioned in a different way in which case
> > we might either need to log it for the purpose of this feature (but
> > that will be any way for HEAD) or need to come up with some other
> > solution here. I think we can't even fetch such columns data during
> > decoding because we have catalog-only historic snapshots here. Do you
> > have any better ideas?
> >
>
> BTW, I wonder how pglogical can handle this because if these unchanged
> toasted values are not logged in WAL for the new tuple then how the
> comparison for such columns will work? Either they are forcing WAL in
> some way or don't allow WHERE clause on such columns or maybe they
> have dealt with it in some other way unless they are unaware of this
> problem.


The column comparison for row filtering happens before the unchanged toast
columns are filtered out. Unchanged toast columns are filtered out just before
writing the tuple to the output stream. I think this is the case both for
pglogical and the proposed patch. So, I can't see why not logging unchanged
toast columns would be a problem for row filtering. Am I missing something?


Thank you,
Rahila Syed

 

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Jul 23, 2021 at 2:27 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> On Fri, Jul 23, 2021 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Jul 23, 2021 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
>> > On Thu, Jul 22, 2021 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> > >
>> > > On Thu, Jul 22, 2021 at 5:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> > > >
>> > > > On Tue, Jul 20, 2021 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> > > > >
>> > > > > On Tue, Jul 20, 2021 at 3:43 PM Tomas Vondra
>> > > > > <tomas.vondra@enterprisedb.com> wrote:
>> > > > > >
>> > > > > > Do we log the TOAST-ed values that were not updated?
>> > > > >
>> > > > > No, we don't, I have submitted a patch sometime back to fix that [1]
>> > > > >
>> > > >
>> > > > That patch seems to log WAL for key unchanged columns. What about if
>> > > > unchanged non-key columns? Do they get logged as part of the new tuple
>> > > > or is there some other way we can get those? If not, then we need to
>> > > > probably think of restricting filter clause in some way.
>> > >
>> > > But what sort of restrictions? I mean we can not put based on data
>> > > type right that will be too restrictive,
>> > >
>> >
>> > Yeah, data type restriction sounds too restrictive and unless the data
>> > is toasted, the data will be anyway available. I think such kind of
>> > restriction should be the last resort but let's try to see if we can
>> > do something better.
>> >
>> > > other option is only to allow
>> > > replica identity keys columns in the filter condition?
>> > >
>> >
>> > Yes, that is what I had in mind because if key column(s) is changed
>> > then we will have data for both old and new tuples. But if it is not
>> > changed then we will have it probably for the old tuple unless we
>> > decide to fix the bug you mentioned in a different way in which case
>> > we might either need to log it for the purpose of this feature (but
>> > that will be any way for HEAD) or need to come up with some other
>> > solution here. I think we can't even fetch such columns data during
>> > decoding because we have catalog-only historic snapshots here. Do you
>> > have any better ideas?
>> >
>>
>> BTW, I wonder how pglogical can handle this because if these unchanged
>> toasted values are not logged in WAL for the new tuple then how the
>> comparison for such columns will work? Either they are forcing WAL in
>> some way or don't allow WHERE clause on such columns or maybe they
>> have dealt with it in some other way unless they are unaware of this
>> problem.
>>
>
> The column comparison for row filtering happens before the unchanged toast
> columns are filtered. Unchanged toast columns are filtered just before writing the tuple
> to output stream.
>

To perform filtering, you need to use the tuple from WAL and that
tuple doesn't seem to have unchanged toast values, so how can we do
filtering? I think it is a good idea to test this once.

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On July 23, 2021 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Jul 23, 2021 at 2:27 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
> >
> > The column comparison for row filtering happens before the unchanged
> > toast columns are filtered. Unchanged toast columns are filtered just
> > before writing the tuple to output stream.
> >
> 
> To perform filtering, you need to use the tuple from WAL and that tuple doesn't
> seem to have unchanged toast values, so how can we do filtering? I think it is a
> good idea to test this once.

Agreed.

Currently, neither unchanged toasted key columns nor unchanged toasted non-key
columns are logged. So, we cannot get the toasted values directly for these
columns when doing row filtering.

I tested the current patch for toasted data and found a problem: the current
patch will try to fetch the toast data from the toast table when doing row
filtering [1]. But it's unsafe to do that in the walsender. We can see that it
uses a HISTORIC snapshot in heap_fetch_toast_slice(), and the comments of
init_toast_snapshot() also say "Detoasting *must* happen in the same
transaction that originally fetched the toast pointer.". The toast data could
have been changed by the time we do row filtering. For example, I tested the
following steps and got an error.

1) UPDATE a nonkey column in publisher.
2) Use debugger to block the walsender process in function
   pgoutput_row_filter_exec_expr().
3) Open another psql session connected to the publisher, and drop the table
   which was updated in 1).
4) Unblock the debugger in 2), and then I can see the following error:
---
ERROR:  could not read block 0 in file "base/13675/16391"
---

[1]
(1)------publisher------
CREATE TABLE toasted_key (
    id serial,
    toasted_key text PRIMARY KEY,
    toasted_col1 text,
    toasted_col2 text
);
select repeat('9999999999', 200) as tvalue \gset
CREATE PUBLICATION pub FOR TABLE toasted_key WHERE (toasted_col2 = :'tvalue');
ALTER TABLE toasted_key REPLICA IDENTITY USING INDEX toasted_key_pkey;
ALTER TABLE toasted_key ALTER COLUMN toasted_key SET STORAGE EXTERNAL;
ALTER TABLE toasted_key ALTER COLUMN toasted_col1 SET STORAGE EXTERNAL;
ALTER TABLE toasted_key ALTER COLUMN toasted_col2 SET STORAGE EXTERNAL;
INSERT INTO toasted_key(toasted_key, toasted_col1, toasted_col2) VALUES(repeat('1234567890', 200), repeat('9876543210', 200), repeat('9999999999', 200));
 

(2)------subscriber------
CREATE TABLE toasted_key (
    id serial,
    toasted_key text PRIMARY KEY,
    toasted_col1 text,
    toasted_col2 text
);

CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=10000' PUBLICATION pub;

(3)------publisher------
UPDATE toasted_key SET toasted_col1 = repeat('1111113113', 200);

Based on the above steps, the row filter will go through the following path
and fetch toast data in the walsender.
------
pgoutput_row_filter_exec_expr
    ...
    texteq
        ...
        text *targ1 = DatumGetTextPP(arg1);
            pg_detoast_datum_packed
                detoast_attr
------

Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Tue, Jul 27, 2021 at 6:21 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

> 1) UPDATE a nonkey column in publisher.
> 2) Use debugger to block the walsender process in function
>    pgoutput_row_filter_exec_expr().
> 3) Open another psql to connect the publisher, and drop the table which updated
>    in 1).
> 4) Unblock the debugger in 2), and then I can see the following error:
> ---
> ERROR:  could not read block 0 in file "base/13675/16391"

Yeah, that's a big problem; it seems the expression evaluation
machinery goes straight to detoasting the externally stored data
using some random snapshot.  Ideally, in the walsender we can never
attempt to detoast the data, because there is no guarantee that those
data are preserved.  Somehow, before going to the expression evaluation
machinery, I think we will have to deform that tuple and do something
about the externally stored data; otherwise it will be very difficult
to control that inside the expression evaluation.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Peter Smith
Date:
FYI - v19 --> v20

(Only very minimal changes. Nothing functional)

Changes:

* The v19 patch was broken due to changes in commit [1], so I have
rebased it to keep the cfbot happy.

* I also renamed the TAP test 021_row_filter.pl ==> 023_row_filter.pl
because commit [2] already added another TAP test numbered 021.

------
[1] https://github.com/postgres/postgres/commit/2b00db4fb0c7f02f000276bfadaab65a14059168
[2] https://github.com/postgres/postgres/commit/a8fd13cab0ba815e9925dc9676e6309f699b5f72

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
FYI - v20 --> v21

(Only very minimal changes)

* I noticed that the v20 TAP test (023_row_filter.pl) began failing
due to a recent commit [1], so I have rebased it to keep the cfbot
happy.

------
[1] https://github.com/postgres/postgres/commit/201a76183e2056c2217129e12d68c25ec9c559c8

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jul 27, 2021 at 9:56 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jul 27, 2021 at 6:21 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
>
> > 1) UPDATE a nonkey column in publisher.
> > 2) Use debugger to block the walsender process in function
> >    pgoutput_row_filter_exec_expr().
> > 3) Open another psql to connect the publisher, and drop the table which updated
> >    in 1).
> > 4) Unblock the debugger in 2), and then I can see the following error:
> > ---
> > ERROR:  could not read block 0 in file "base/13675/16391"
>
> Yeah, that's a big problem, seems like the expression evaluation
> machinery directly going and detoasting the externally stored data
> using some random snapshot.  Ideally, in walsender we can never
> attempt to detoast the data because there is no guarantee that those
> data are preserved.  Somehow before going to the expression evaluation
> machinery, I think we will have to deform that tuple and need to do
> something for the externally stored data otherwise it will be very
> difficult to control that inside the expression evaluation.
>

True, I think it would be possible after we fix the issue reported in
another thread [1], where we will log the key values as part of
old_tuple_key for toast tuples even if they are not changed. We can
have a restriction that in the WHERE clause the user can specify only
key columns for Updates, similar to Deletes. Then we have the data
required for the filter columns: basically, if the toasted key values
are changed, then they will anyway be part of the old and new tuples,
and if they are not changed then they will be part of the old tuple. I
have not checked the implementation part of it but theoretically, it
seems possible. If my understanding is correct then it becomes
necessary to solve the other bug [1] in order to solve this part of
the problem for this patch. The other possibility is to disallow
columns (datatypes) that can lead to toasted data (at least for
Updates), which doesn't sound like a good idea to me. Do you have any
other ideas for this problem?

[1] -
https://www.postgresql.org/message-id/OS0PR01MB611342D0A92D4F4BF26C0F47FB229%40OS0PR01MB6113.jpnprd01.prod.outlook.com
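A sketch of how such a restriction would look to users (the table and
publication names are invented; id is the primary key and hence the
default replica identity):

CREATE TABLE orders (id int PRIMARY KEY, note text);

-- would be allowed: the filter references only replica identity columns
CREATE PUBLICATION pub_ok FOR TABLE orders WHERE (id < 1000);

-- would be rejected under the proposed restriction: note is not part of
-- the replica identity, and its old value may be an unchanged TOASTed
-- datum that is absent from the WAL record
CREATE PUBLICATION pub_bad FOR TABLE orders WHERE (note <> '');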

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
v21 --> v22

(This small change is only to keep the patch up-to-date with HEAD)

Changes:

* A recent commit [1] added a new TAP subscription test file 023, so
now this patch's test file (previously "023_row_filter.pl") has been
bumped to "024_row_filter.pl".

------
[1] https://github.com/postgres/postgres/commit/63cf61cdeb7b0450dcf3b2f719c553177bac85a2

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
v22 --> v23

Changes:

* A rebase was needed (due to commit [1]) to keep the patch working with cfbot.

------
[1] https://github.com/postgres/postgres/commit/93d573d86571d148e2d14415166ec6981d34ea9d

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Tue, Aug 3, 2021 at 4:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 27, 2021 at 9:56 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > Yeah, that's a big problem, seems like the expression evaluation
> > machinery directly going and detoasting the externally stored data
> > using some random snapshot.  Ideally, in walsender we can never
> > attempt to detoast the data because there is no guarantee that those
> > data are preserved.  Somehow before going to the expression evaluation
> > machinery, I think we will have to deform that tuple and need to do
> > something for the externally stored data otherwise it will be very
> > difficult to control that inside the expression evaluation.
> >
>
> True, I think it would be possible after we fix the issue reported in
> another thread [1] where we will log the key values as part of
> old_tuple_key for toast tuples even if they are not changed. We can
> have a restriction that in the WHERE clause that user can specify only
> Key columns for Updates similar to Deletes. Then, we have the data
> required for filter columns basically if the toasted key values are
> changed, then they will be anyway part of the old and new tuple and if
> they are not changed then they will be part of the old tuple.

Right.

> I have
> not checked the implementation part of it but theoretically, it seems
> possible.

Yeah, it would be possible because at least after fixing [1] we
would have the required column data.  The only thing I am worried
about is that while applying the filter on the new tuple, the toasted
unchanged key data will not be part of the new tuple.  So we cannot
directly call the expression evaluation machinery; basically, somehow
we need to deform the new tuple and then replace the data from the old
tuple before passing it to expression evaluation.  Anyway, this is an
implementation detail, so we can look into it while implementing.

> If my understanding is correct then it becomes necessary to
> solve the other bug [1] to solve this part of the problem for this
> patch.

Right.

> The other possibility is to disallow columns (datatypes) that
> can lead to toasted data (at least for Updates) which doesn't sound
> like a good idea to me.

Yeah, that would be a big limitation; then we won't be able to allow
expressions on any varlena types.

> Do you have any other ideas for this problem?

As of now, I have no better idea to suggest.

[1] -
https://www.postgresql.org/message-id/OS0PR01MB611342D0A92D4F4BF26C0F47FB229%40OS0PR01MB6113.jpnprd01.prod.outlook.com



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, Jul 12, 2021 at 7:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jul 12, 2021 at 1:09 AM Euler Taveira <euler@eulerto.com> wrote:
> >
> > I did another measure using as baseline the previous patch (v16).
> >
> > without cache (v16)
> > ---------------------------
> >
> > mean:           1.46 us
> > stddev:         2.13 us
> > median:         1.39 us
> > min-max:        [0.69 .. 1456.69] us
> > percentile(99): 3.15 us
> > mode:           0.91 us
> >
> > with cache (v18)
> > -----------------------
> >
> > mean:           0.63 us
> > stddev:         1.07 us
> > median:         0.55 us
> > min-max:        [0.29 .. 844.87] us
> > percentile(99): 1.38 us
> > mode:           0.41 us
> >
> > It represents -57%. It is a really good optimization for just a few extra lines
> > of code.
> >
>
> Good improvement but I think it is better to measure the performance
> by using synchronous replication, by setting the subscriber in
> synchronous_standby_names, which will provide the overall saving of
> time. We can probably see the timings when no rows are filtered,
> when 10% of rows are filtered, when 30% are filtered, and so on.
>
> I think the way caching has been done in the patch is a bit
> inefficient. Basically, it always invalidates and rebuilds the
> expressions even though some unrelated operation has happened on
> publication. For example, say publication has initially table t1 with
> rowfilter r1 for which we have cached the state. Now you altered
> publication and added table t2, it will invalidate the entire state of
> t1 as well. I think we can avoid that if we invalidate the rowfilter
> related state only on relcache invalidation i.e in
> rel_sync_cache_relation_cb and save it the very first time we prepare
> the expression. In that case, we don't need to do it in advance when
> preparing relsyncentry, this will have the additional advantage that
> we won't spend cycles on preparing state unless it is required (for
> truncate we won't require row_filtering, so it won't be prepared).
>

I have used debug logging to confirm that what Amit wrote [1] is
correct; the row-filter ExprState of *every* table's row_filter will
be invalidated (and so subsequently gets rebuilt) when the user
changes the PUBLICATION tables. This was a side-effect of
rel_sync_cache_publication_cb, which frees the cached ExprState
and sets entry->replicate_valid = false for *every* entry.

So yes, the ExprCache is getting rebuilt for some situations where it
is not strictly necessary to do so.
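In SQL terms, the sequence that triggers the unnecessary rebuild looks
like this (the table and publication names are invented):

CREATE PUBLICATION pub FOR TABLE t1 WHERE (a > 0);
-- ... the walsender builds and caches t1's row-filter ExprState ...
ALTER PUBLICATION pub ADD TABLE t2;
-- rel_sync_cache_publication_cb marks *every* entry invalid, so t1's
-- cached ExprState is freed and rebuilt even though only t2 changed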

But...

1. Although the ExprState cache is effective, in practice the
performance improvement was not very much. My previous results [2]
showed only about 2sec saving for 100K calls to the
pgoutput_row_filter function. So I think eliminating just one or two
unnecessary calls in the get_rel_sync_entry is going to make zero
observable difference.

2. IMO it is safe to expect that the ALTER PUBLICATION is a rare
operation relative to the number of times that pgoutput_row_filter
will be called (the pgoutput_row_filter is quite a "hot" function
since it is called for every INSERT/UPDATE/DELETE). The difference will
be orders of magnitude: 1:1000, 1:100000, etc.

~~

Anyway, I have implemented the suggested cache change because I agree
it is probably theoretically superior, even if in practice there is
almost no difference.

PSA 2 new patches (v24*)

Summary:

1. Now the rfnode_list row-filter cache is built 1 time only in
function get_rel_sync_entry.

2. Now the ExprState list cache is lazy-built 1 time only when first
needed in function pgoutput_row_filter

3. Now those caches are invalidated in function
rel_sync_cache_relation_cb; invalidation of one relation's caches will
no longer cause the other relations' row-filter caches to be rebuilt.

------

I also ran performance tests to compare the old/new ExprState caching.
These tests insert 1 million rows using different percentages
of row filtering.

Please refer to the attached result data/results.

The main takeaway points from the test results are:

1. Using row-filter ExprState caching is slightly better than having
no ExprState caching.

2. The old/new style ExprState caches have approximately the same
performance. Essentially the *only* runtime difference with the
old/new cache is the added condition in the pgouput_row_filter to
check if the ExprState cache needs to be lazy-built or not. Over a
million rows maybe this extra condition accounts for a tiny difference
or maybe the small before/after differences can be attributed just to
natural runtime variations.

------
[1] https://www.postgresql.org/message-id/CAA4eK1%2BxQb06NGs6Y7OzwMtKYYixEqR8tdWV5THAVE4SAqNrDg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPs3GgPKUJ2npfY4bQdxAmYW%2ByQin%2BhQuBsMYvX%3DkBqEpA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Tue, Aug 24, 2021, at 4:46 AM, Peter Smith wrote:
> I have used debug logging to confirm that what Amit wrote [1] is
> correct; the row-filter ExprState of *every* table's row_filter will
> be invalidated (and so subsequently gets rebuilt) when the user
> changes the PUBLICATION tables. This was a side-effect of the
> rel_sync_cache_publication_cb which is freeing the cached ExprState
> and setting the entry->replicate_valid = false; for *every* entry.
>
> So yes, the ExprCache is getting rebuilt for some situations where it
> is not strictly necessary to do so.
I'm afraid we are overengineering this feature. We already have a cache
mechanism that was suggested (and that shows a small improvement). As you
said, the gain from this new improvement is zero or minimal (it depends on
your logical replication setup/maintenance).

> 1. Although the ExprState cache is effective, in practice the
> performance improvement was not very much. My previous results [2]
> showed only about 2sec saving for 100K calls to the
> pgoutput_row_filter function. So I think eliminating just one or two
> unnecessary calls in the get_rel_sync_entry is going to make zero
> observable difference.
>
> 2. IMO it is safe to expect that the ALTER PUBLICATION is a rare
> operation relative to the number of times that pgoutput_row_filter
> will be called (the pgoutput_row_filter is quite a "hot" function
> since it is called for every INSERT/UPDATE/DELETE). It will be orders
> of magnitude difference 1:1000, 1:100000 etc.
>
> ~~
>
> Anyway, I have implemented the suggested cache change because I agree
> it is probably theoretically superior, even if in practice there is
> almost no difference.
I didn't inspect your patch carefully but it seems you add another List to
control this new cache mechanism. I don't like it. IMO if we can use the data
structures that we have now, let's implement your idea; otherwise, -1 for this
new micro optimization.

[By the way, it took some time to extract what you changed. Since we're trading
patches, I would personally appreciate it if you could send a patch on top of
the current one. I have some changes too, and it is time-consuming to
incorporate changes into the main patch.]


--
Euler Taveira

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Aug 25, 2021 at 5:52 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Aug 24, 2021, at 4:46 AM, Peter Smith wrote:
>
> I have used debug logging to confirm that what Amit wrote [1] is
> correct; the row-filter ExprState of *every* table's row_filter will
> be invalidated (and so subsequently gets rebuilt) when the user
> changes the PUBLICATION tables. This was a side-effect of the
> rel_sync_cache_publication_cb which is freeing the cached ExprState
> and setting the entry->replicate_valid = false; for *every* entry.
>
> So yes, the ExprCache is getting rebuilt for some situations where it
> is not strictly necessary to do so.
>
> I'm afraid we are overenginnering this feature. We already have a cache
> mechanism that was suggested (that shows a small improvement). As you said the
> gain for this new improvement is zero or minimal (it depends on your logical
> replication setup/maintenance).
>

Hmm, I think the gain via caching is not visible because we are using
simple expressions. It will be visible when we use somewhat complex
expressions where expression evaluation cost is significant.
Similarly, the impact of this change will magnify and it will also be
visible when a publication has many tables. Apart from performance,
this change is logically correct as well because it would anyway be
better if we don't invalidate the cached expressions unless required.

> 1. Although the ExprState cache is effective, in practice the
> performance improvement was not very much. My previous results [2]
> showed only about 2sec saving for 100K calls to the
> pgoutput_row_filter function. So I think eliminating just one or two
> unnecessary calls in the get_rel_sync_entry is going to make zero
> observable difference.
>
> 2. IMO it is safe to expect that the ALTER PUBLICATION is a rare
> operation relative to the number of times that pgoutput_row_filter
> will be called (the pgoutput_row_filter is quite a "hot" function
> since it is called for every INSERT/UPDATE/DELETE). It will be orders
> of magnitude difference 1:1000, 1:100000 etc.
>
> ~~
>
> Anyway, I have implemented the suggested cache change because I agree
> it is probably theoretically superior, even if in practice there is
> almost no difference.
>
> I didn't inspect your patch carefully but it seems you add another List to
> control this new cache mechanism. I don't like it. IMO if we can use the data
> structures that we have now, let's implement your idea; otherwise, -1 for this
> new micro optimization.
>

As mentioned above, without this we will invalidate many cached
expressions even though it is not required. I don't deny that there
might be a better way to achieve the same, and if you or Peter have any
ideas, I am all ears. If there are technical challenges to achieving
the same, or it makes the patch complex, then certainly we can discuss
it, but according to me, this should not introduce additional complexity.

> [By the way, it took some time to extract what you changed. Since we're trading
> patches, I personally appreciate if you can send a patch on the top of the
> current one.
>

+1.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Aug 25, 2021 at 10:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 25, 2021 at 5:52 AM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Tue, Aug 24, 2021, at 4:46 AM, Peter Smith wrote:

> >
> > Anyway, I have implemented the suggested cache change because I agree
> > it is probably theoretically superior, even if in practice there is
> > almost no difference.
> >
> > I didn't inspect your patch carefully but it seems you add another List to
> > control this new cache mechanism. I don't like it. IMO if we can use the data
> > structures that we have now, let's implement your idea; otherwise, -1 for this
> > new micro optimization.
> >
>
> As mentioned above, without this we will invalidate many cached
> expressions even though it is not required. I don't deny that there
> might be a better way to achieve the same and if you or Peter have any
> ideas, I am all ears.
>

I see that the new list is added to store the row_filter node, which we
later use to compute the expression. This is not required for invalidation
but for delaying the expression evaluation till it is required (for
example, for truncate, we may not need the row evaluation, so there is
no need to compute it). Can we try to postpone the syscache lookup to
a later stage, when we are actually doing row filtering? If we can do
that, then I think we can avoid having this extra list.


-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
...
>
> Hmm, I think the gain via caching is not visible because we are using
> simple expressions. It will be visible when we use somewhat complex
> expressions where expression evaluation cost is significant.
> Similarly, the impact of this change will magnify and it will also be
> visible when a publication has many tables. Apart from performance,
> this change is logically correct as well because it would be any way
> better if we don't invalidate the cached expressions unless required.

Please tell me what your idea of a "complex" row filter expression is.
Do you just mean a filter that has multiple AND conditions in it? I
don't really know if a few complex expressions would amount to any
significant evaluation cost, so I would like to run some timing tests
with some real examples to see the results.
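As one possible reading of "complex" - a purely hypothetical filter
(not from the patch) whose per-row evaluation cost is non-trivial:

CREATE PUBLICATION pub FOR TABLE t
    WHERE (length(payload) > 100
           AND substring(payload from 1 for 4) = 'abcd'
           AND id % 7 = 0);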

On Wed, Aug 25, 2021 at 6:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 25, 2021 at 10:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Aug 25, 2021 at 5:52 AM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Tue, Aug 24, 2021, at 4:46 AM, Peter Smith wrote:
>
> > >
> > > Anyway, I have implemented the suggested cache change because I agree
> > > it is probably theoretically superior, even if in practice there is
> > > almost no difference.
> > >
> > > I didn't inspect your patch carefully but it seems you add another List to
> > > control this new cache mechanism. I don't like it. IMO if we can use the data
> > > structures that we have now, let's implement your idea; otherwise, -1 for this
> > > new micro optimization.
> > >
> >
> > As mentioned above, without this we will invalidate many cached
> > expressions even though it is not required. I don't deny that there
> > might be a better way to achieve the same and if you or Peter have any
> > ideas, I am all ears.
> >
>
> I see that the new list is added to store row_filter node which we
> later use to compute expression. This is not required for invalidation
> but for delaying the expression evaluation till it is required (for
> example, for truncate, we may not need the row evaluation, so there is
> no need to compute it). Can we try to postpone the syscache lookup to
> a later stage when we are actually doing row_filtering? If we can do
> that, then I think we can avoid having this extra list?

Yes, you are correct - that Node list was reinstated only because you
had requested that the ExprState evaluation should be deferred until
it is needed by the pgoutput_row_filter. Otherwise, the additional
list would not be needed, so everything would be much the same as in
v23, except the invalidations would be more focused on single tables.

I don't think the syscache lookup can be easily postponed. That logic
of get_rel_sync_entry processes the table filters of *all*
publications, so moving that publications loop (including the
partition logic) into the pgoutput_row_filter seems a bridge too far
IMO.

Furthermore, I am not yet convinced that this ExprState postponement
is very useful. It may be true that for truncate there is no need to
compute it, but consider that the user would never even define a row
filter in the first place unless they intended there to be some CRUD
operations. So even if the truncate does not need the filter,
*something* is surely going to need it. In other words, IIUC this
postponement is not going to save any time overall - it only shifts
when the (one-time) expression evaluation will happen.

I feel it would be better to just remove the postponed evaluation of
the ExprState added in v24. That will remove any need for the extra
Node list (which I think is Euler's concern). The ExprState cache will
still be slightly improved from how it was implemented before because
it is "logically correct" that we don't invalidate the cached
expressions unless required.

Thoughts?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Aug 26, 2021 at 7:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> ...
> >
> > Hmm, I think the gain via caching is not visible because we are using
> > simple expressions. It will be visible when we use somewhat complex
> > expressions where expression evaluation cost is significant.
> > Similarly, the impact of this change will magnify and it will also be
> > visible when a publication has many tables. Apart from performance,
> > this change is logically correct as well because it would be any way
> > better if we don't invalidate the cached expressions unless required.
>
> Please tell me what is your idea of a "complex" row filter expression.
> Do you just mean a filter that has multiple AND conditions in it? I
> don't really know if few complex expressions would amount to any
> significant evaluation costs, so I would like to run some timing tests
> with some real examples to see the results.
>

I think this means you didn't understand, or aren't convinced, why the
patch has the cache in the first place. As per your theory, even if we
didn't have the cache it wouldn't matter, but that is not true;
otherwise, the patch wouldn't have it.

> On Wed, Aug 25, 2021 at 6:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Aug 25, 2021 at 10:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Aug 25, 2021 at 5:52 AM Euler Taveira <euler@eulerto.com> wrote:
> > > >
> > > > On Tue, Aug 24, 2021, at 4:46 AM, Peter Smith wrote:
> >
> > > >
> > > > Anyway, I have implemented the suggested cache change because I agree
> > > > it is probably theoretically superior, even if in practice there is
> > > > almost no difference.
> > > >
> > > > I didn't inspect your patch carefully but it seems you add another List to
> > > > control this new cache mechanism. I don't like it. IMO if we can use the data
> > > > structures that we have now, let's implement your idea; otherwise, -1 for this
> > > > new micro optimization.
> > > >
> > >
> > > As mentioned above, without this we will invalidate many cached
> > > expressions even though it is not required. I don't deny that there
> > > might be a better way to achieve the same and if you or Peter have any
> > > ideas, I am all ears.
> > >
> >
> > I see that the new list is added to store row_filter node which we
> > later use to compute expression. This is not required for invalidation
> > but for delaying the expression evaluation till it is required (for
> > example, for truncate, we may not need the row evaluation, so there is
> > no need to compute it). Can we try to postpone the syscache lookup to
> > a later stage when we are actually doing row_filtering? If we can do
> > that, then I think we can avoid having this extra list?
>
> Yes, you are correct - that Node list was re-instated only because you
> had requested that the ExprState evaluation should be deferred until
> it is needed by the pgoutput_row_filter. Otherwise, the additional
> list would not be needed so everything would be much the same as in
> v23 except the invalidations would be more focussed on single tables.
>
> I don't think the syscache lookup can be easily postponed. That logic
> of get_rel_sync_entry processes the table filters of *all*
> publications, so moving that publications loop (including the
> partition logic) into the pgoutput_row_filter seems a bridge too far
> IMO.
>

Hmm, I don't think that is true. You just need it for the relation
being processed.

> Furthermore, I am not yet convinced that this ExprState postponement
> is very useful. It may be true that for truncate there is no need to
> compute it, but consider that the user would never even define a row
> filter in the first place unless they intended there will be some CRUD
> operations. So even if the truncate does not need the filter,
> *something* is surely going to need it.
>

Sure, but we don't need to add additional computation until it is required.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Aug 26, 2021 at 1:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 26, 2021 at 7:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > ...
> > >
> > > Hmm, I think the gain via caching is not visible because we are using
> > > simple expressions. It will be visible when we use somewhat complex
> > > expressions where expression evaluation cost is significant.
> > > Similarly, the impact of this change will magnify and it will also be
> > > visible when a publication has many tables. Apart from performance,
> > > this change is logically correct as well because it would be any way
> > > better if we don't invalidate the cached expressions unless required.
> >
> > Please tell me what is your idea of a "complex" row filter expression.
> > Do you just mean a filter that has multiple AND conditions in it? I
> > don't really know if few complex expressions would amount to any
> > significant evaluation costs, so I would like to run some timing tests
> > with some real examples to see the results.
> >
>
> I think this means you didn't even understand or are convinced why the
> patch has cache in the first place. As per your theory, even if we
> didn't have cache, it won't matter but that is not true otherwise, the
> patch wouldn't have it.

I have never said there should be no caching. On the contrary, my
performance test results [1] already confirmed that caching ExprState
is of benefit for the millions of times it may be used in the
pgoutput_row_filter function. My only doubts are with regard to how much
observable impact there would be from re-evaluating the filter expression
just a few extra times in the get_rel_sync_entry function.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPs5j7mkO0xLmNW%3DkXh0eezGoKyzBCiQc9bfkCiM_MVDrg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Aug 26, 2021 at 9:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Aug 26, 2021 at 1:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Aug 26, 2021 at 7:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > ...
> > > >
> > > > Hmm, I think the gain via caching is not visible because we are using
> > > > simple expressions. It will be visible when we use somewhat complex
> > > > expressions where expression evaluation cost is significant.
> > > > Similarly, the impact of this change will magnify and it will also be
> > > > visible when a publication has many tables. Apart from performance,
> > > > this change is logically correct as well because it would be any way
> > > > better if we don't invalidate the cached expressions unless required.
> > >
> > > Please tell me what is your idea of a "complex" row filter expression.
> > > Do you just mean a filter that has multiple AND conditions in it? I
> > > don't really know if few complex expressions would amount to any
> > > significant evaluation costs, so I would like to run some timing tests
> > > with some real examples to see the results.
> > >
> >
> > I think this means you didn't even understand or are convinced why the
> > patch has cache in the first place. As per your theory, even if we
> > didn't have cache, it won't matter but that is not true otherwise, the
> > patch wouldn't have it.
>
> I have never said there should be no caching. On the contrary, my
> performance test results [1] already confirmed that caching ExprState
> is of benefit for the millions of times it may be used in the
> pgoutput_row_filter function. My only doubts are in regard to how much
> observable impact there would be re-evaluating the filter expression
> just a few extra times by the get_rel_sync_entry function.
>

I think it depends, but why do you want to allow re-evaluation in the
first place when there is a way to avoid it?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Wed, Aug 25, 2021 at 10:22 AM Euler Taveira <euler@eulerto.com> wrote:
>
....
>
> [By the way, it took some time to extract what you changed. Since we're trading
> patches, I personally appreciate if you can send a patch on the top of the
> current one. I have some changes too and it is time consuming incorporating
> changes in the main patch.]
>

OK. Sorry for causing you trouble.

Here I am re-posting the ExprState cache changes as an incremental
patch on top of the last rebased row-filter patch (v23).

v25-0001 <--- v23 (last rebased main patch)
v25-0002 ExprState cache mods
v25-0003 ExprState cache extra debug logging (temp)

Hopefully, this will make it easier to deal with this change in isolation.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Aug 26, 2021 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 26, 2021 at 9:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Thu, Aug 26, 2021 at 1:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Aug 26, 2021 at 7:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > ...
> > > > >
> > > > > Hmm, I think the gain via caching is not visible because we are using
> > > > > simple expressions. It will be visible when we use somewhat complex
> > > > > expressions where expression evaluation cost is significant.
> > > > > Similarly, the impact of this change will magnify and it will also be
> > > > > visible when a publication has many tables. Apart from performance,
> > > > > this change is logically correct as well because it would be any way
> > > > > better if we don't invalidate the cached expressions unless required.
> > > >
> > > > Please tell me what is your idea of a "complex" row filter expression.
> > > > Do you just mean a filter that has multiple AND conditions in it? I
> > > > don't really know if few complex expressions would amount to any
> > > > significant evaluation costs, so I would like to run some timing tests
> > > > with some real examples to see the results.
> > > >
> > >
> > > I think this means you didn't even understand or are convinced why the
> > > patch has cache in the first place. As per your theory, even if we
> > > didn't have cache, it won't matter but that is not true otherwise, the
> > > patch wouldn't have it.
> >
> > I have never said there should be no caching. On the contrary, my
> > performance test results [1] already confirmed that caching ExprState
> > is of benefit for the millions of times it may be used in the
> > pgoutput_row_filter function. My only doubts are in regard to how much
> > observable impact there would be re-evaluating the filter expression
> > just a few extra times by the get_rel_sync_entry function.
> >
>
> I think it depends but why in the first place do you want to allow
> re-evaluation when there is a way for not doing that?

Because the current code logic of having the "delayed" ExprState
evaluation does come at some cost. And the cost is -
a. Needing an extra condition and more code in the function pgoutput_row_filter
b. Needing to maintain the additional Node list

If we chose not to implement a delayed ExprState cache evaluation then
there would still be a (one-time) ExprState cache evaluation, but it
would happen whenever get_rel_sync_entry is called (regardless of
whether pgoutput_row_filter is subsequently called). E.g. there can be
some rebuilds of the ExprState cache if the user calls TRUNCATE.

I guess I felt the only justification for implementing more
sophisticated cache logic is if it gives a performance gain. But if
there is no observable difference, then maybe it's better to just keep
the code simpler. That is why I have been questioning how much time a
one-time ExprState cache evaluation really takes, and whether a few
extra ones would even be noticeable.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Aug 26, 2021 at 3:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Aug 26, 2021 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Aug 26, 2021 at 9:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Thu, Aug 26, 2021 at 1:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 26, 2021 at 7:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > >
> > > > > On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > ...
> > > > > >
> > > > > > Hmm, I think the gain via caching is not visible because we are using
> > > > > > simple expressions. It will be visible when we use somewhat complex
> > > > > > expressions where expression evaluation cost is significant.
> > > > > > Similarly, the impact of this change will magnify and it will also be
> > > > > > visible when a publication has many tables. Apart from performance,
> > > > > > this change is logically correct as well because it would be any way
> > > > > > better if we don't invalidate the cached expressions unless required.
> > > > >
> > > > > Please tell me what is your idea of a "complex" row filter expression.
> > > > > Do you just mean a filter that has multiple AND conditions in it? I
> > > > > don't really know if few complex expressions would amount to any
> > > > > significant evaluation costs, so I would like to run some timing tests
> > > > > with some real examples to see the results.
> > > > >
> > > >
> > > > I think this means you didn't even understand or are convinced why the
> > > > patch has cache in the first place. As per your theory, even if we
> > > > didn't have cache, it won't matter but that is not true otherwise, the
> > > > patch wouldn't have it.
> > >
> > > I have never said there should be no caching. On the contrary, my
> > > performance test results [1] already confirmed that caching ExprState
> > > is of benefit for the millions of times it may be used in the
> > > pgoutput_row_filter function. My only doubts are in regard to how much
> > > observable impact there would be re-evaluating the filter expression
> > > just a few extra times by the get_rel_sync_entry function.
> > >
> >
> > I think it depends but why in the first place do you want to allow
> > re-evaluation when there is a way for not doing that?
>
> Because the current code logic of having the "delayed" ExprState
> evaluation does come at some cost.
>

So, now you have mixed it with the second point. Here, I was talking
about the need for correct invalidation, but you started discussing
when to evaluate the expression for the first time; those are
different things.

>  And the cost is -
> a. Needing an extra condition and more code in the function pgoutput_row_filter
> b. Needing to maintain the additional Node list
>

I am not sure you need (b) above and I think (a) should make the
overall code look clean.

> If we chose not to implement a delayed ExprState cache evaluation then
> there would still be a (one-time) ExprState cache evaluation but it
> would happen whenever get_rel_sync_entry is called (regardless of if
> pgoutput_row_filter is subsequently called). E.g. there can be some
> rebuilds of the ExprState cache if the user calls TRUNCATE.
>

Apart from Truncate, it will also be a waste if any error happens
before actually evaluating the filter. Tomorrow there could be other
operations, like replication of sequences (I have checked that the
proposed patch for sequences uses get_rel_sync_entry), where we don't
need to build the ExprState (as filters might or might not be there).
So, it would be better to avoid cache lookups in those cases if
possible. I still think doing expensive things like preparing
expressions should ideally be done only when it is required.
--
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Aug 26, 2021 at 9:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 26, 2021 at 3:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Thu, Aug 26, 2021 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Aug 26, 2021 at 9:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 26, 2021 at 1:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Thu, Aug 26, 2021 at 7:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > >
> > > > > > ...
> > > > > > >
> > > > > > > Hmm, I think the gain via caching is not visible because we are using
> > > > > > > simple expressions. It will be visible when we use somewhat complex
> > > > > > > expressions where expression evaluation cost is significant.
> > > > > > > Similarly, the impact of this change will magnify and it will also be
> > > > > > > visible when a publication has many tables. Apart from performance,
> > > > > > > this change is logically correct as well because it would be any way
> > > > > > > better if we don't invalidate the cached expressions unless required.
> > > > > >
> > > > > > Please tell me what is your idea of a "complex" row filter expression.
> > > > > > Do you just mean a filter that has multiple AND conditions in it? I
> > > > > > don't really know if few complex expressions would amount to any
> > > > > > significant evaluation costs, so I would like to run some timing tests
> > > > > > with some real examples to see the results.
> > > > > >
> > > > >
> > > > > I think this means you didn't even understand or are convinced why the
> > > > > patch has cache in the first place. As per your theory, even if we
> > > > > didn't have cache, it won't matter but that is not true otherwise, the
> > > > > patch wouldn't have it.
> > > >
> > > > I have never said there should be no caching. On the contrary, my
> > > > performance test results [1] already confirmed that caching ExprState
> > > > is of benefit for the millions of times it may be used in the
> > > > pgoutput_row_filter function. My only doubts are in regard to how much
> > > > observable impact there would be re-evaluating the filter expression
> > > > just a few extra times by the get_rel_sync_entry function.
> > > >
> > >
> > > I think it depends but why in the first place do you want to allow
> > > re-evaluation when there is a way for not doing that?
> >
> > Because the current code logic of having the "delayed" ExprState
> > evaluation does come at some cost.
> >
>
> So, now you mixed it with the second point. Here, I was talking about
> the need for correct invalidation but you started discussing when to
> first time evaluate the expression, both are different things.
>
> >  And the cost is -
> > a. Needing an extra condition and more code in the function pgoutput_row_filter
> > b. Needing to maintain the additional Node list
> >
>
> I am not sure you need (b) above and I think (a) should make the
> overall code look clean.
>
> > If we chose not to implement a delayed ExprState cache evaluation then
> > there would still be a (one-time) ExprState cache evaluation but it
> > would happen whenever get_rel_sync_entry is called (regardless of if
> > pgoutput_row_filter is subsequently called). E.g. there can be some
> > rebuilds of the ExprState cache if the user calls TRUNCATE.
> >
>
> Apart from Truncate, it will also be a waste if any error happens
> before actually evaluating the filter, tomorrow there could be other
> operations like replication of sequences (I have checked that proposed
> patch for sequences uses get_rel_sync_entry) where we don't need to
> build ExprState (as filters might or might not be there). So, it would
> be better to avoid cache lookups in those cases if possible. I still
> think doing expensive things like preparing expressions should ideally
> be done only when it is required.

OK. Per your suggestion, I will try to move as much of the row-filter
cache code as possible out of the get_rel_sync_entry function and into
the pgoutput_row_filter function.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Aug 27, 2021 at 3:31 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Aug 26, 2021 at 9:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Aug 26, 2021 at 3:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Apart from Truncate, it will also be a waste if any error happens
> > before actually evaluating the filter, tomorrow there could be other
> > operations like replication of sequences (I have checked that proposed
> > patch for sequences uses get_rel_sync_entry) where we don't need to
> > build ExprState (as filters might or might not be there). So, it would
> > be better to avoid cache lookups in those cases if possible. I still
> > think doing expensive things like preparing expressions should ideally
> > be done only when it is required.
>
> OK. Per your suggestion, I will try to move as much of the row-filter
> cache code as possible out of the get_rel_sync_entry function and into
> the pgoutput_row_filter function.
>

I could think of more scenarios where doing this work in
get_rel_sync_entry() could cost us without any actual need for it.
Consider that the user has published only the 'update' and 'delete'
operations for a publication, and the system then performs inserts
followed by a truncate or any DDL that generates an invalidation. In
such a case, we would need to rebuild the row filters for each change
but would never use them. Similarly, this can happen in any other
combination of DML and DDL operations where the DML operation is not
published. I don't want to say that this is the most common scenario,
but it is important to do expensive work only when it is actually
required; otherwise, there could be cases where it hits us.
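
To make the scenario concrete, here is a hypothetical illustration
(the table and publication names are invented, using the patch's
proposed WHERE syntax):

CREATE TABLE orders (id int PRIMARY KEY, amount int);
CREATE PUBLICATION pub_upd_del FOR TABLE orders
    WHERE (amount > 100) WITH (publish = 'update, delete');

-- The INSERT is decoded but not published, and the TRUNCATE
-- invalidates the relation's cache entry; an eager approach would
-- then rebuild the row-filter ExprState without ever using it.
INSERT INTO orders VALUES (1, 50);
TRUNCATE orders;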

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Fri, Aug 27, 2021 at 8:01 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Aug 26, 2021 at 9:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Aug 26, 2021 at 3:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Thu, Aug 26, 2021 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 26, 2021 at 9:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > >
> > > > > On Thu, Aug 26, 2021 at 1:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Aug 26, 2021 at 7:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > >
> > > > > > > ...
> > > > > > > >
> > > > > > > > Hmm, I think the gain via caching is not visible because we are using
> > > > > > > > simple expressions. It will be visible when we use somewhat complex
> > > > > > > > expressions where expression evaluation cost is significant.
> > > > > > > > Similarly, the impact of this change will magnify and it will also be
> > > > > > > > visible when a publication has many tables. Apart from performance,
> > > > > > > > this change is logically correct as well because it would be any way
> > > > > > > > better if we don't invalidate the cached expressions unless required.
> > > > > > >
> > > > > > > Please tell me what is your idea of a "complex" row filter expression.
> > > > > > > Do you just mean a filter that has multiple AND conditions in it? I
> > > > > > > don't really know if few complex expressions would amount to any
> > > > > > > significant evaluation costs, so I would like to run some timing tests
> > > > > > > with some real examples to see the results.
> > > > > > >
> > > > > >
> > > > > > I think this means you didn't even understand or are convinced why the
> > > > > > patch has cache in the first place. As per your theory, even if we
> > > > > > didn't have cache, it won't matter but that is not true otherwise, the
> > > > > > patch wouldn't have it.
> > > > >
> > > > > I have never said there should be no caching. On the contrary, my
> > > > > performance test results [1] already confirmed that caching ExprState
> > > > > is of benefit for the millions of times it may be used in the
> > > > > pgoutput_row_filter function. My only doubts are in regard to how much
> > > > > observable impact there would be re-evaluating the filter expression
> > > > > just a few extra times by the get_rel_sync_entry function.
> > > > >
> > > >
> > > > I think it depends but why in the first place do you want to allow
> > > > re-evaluation when there is a way for not doing that?
> > >
> > > Because the current code logic of having the "delayed" ExprState
> > > evaluation does come at some cost.
> > >
> >
> > So, now you mixed it with the second point. Here, I was talking about
> > the need for correct invalidation but you started discussing when to
> > first time evaluate the expression, both are different things.
> >
> > >  And the cost is -
> > > a. Needing an extra condition and more code in the function pgoutput_row_filter
> > > b. Needing to maintain the additional Node list
> > >
> >
> > I am not sure you need (b) above and I think (a) should make the
> > overall code look clean.
> >
> > > If we chose not to implement a delayed ExprState cache evaluation then
> > > there would still be a (one-time) ExprState cache evaluation but it
> > > would happen whenever get_rel_sync_entry is called (regardless of if
> > > pgoutput_row_filter is subsequently called). E.g. there can be some
> > > rebuilds of the ExprState cache if the user calls TRUNCATE.
> > >
> >
> > Apart from Truncate, it will also be a waste if any error happens
> > before actually evaluating the filter, tomorrow there could be other
> > operations like replication of sequences (I have checked that proposed
> > patch for sequences uses get_rel_sync_entry) where we don't need to
> > build ExprState (as filters might or might not be there). So, it would
> > be better to avoid cache lookups in those cases if possible. I still
> > think doing expensive things like preparing expressions should ideally
> > be done only when it is required.
>
> OK. Per your suggestion, I will try to move as much of the row-filter
> cache code as possible out of the get_rel_sync_entry function and into
> the pgoutput_row_filter function.
>

Here are the new v26* patches. This is a refactoring of the row-filter
caches to remove all the logic from the get_rel_sync_entry function
and delay it until if/when needed in the pgoutput_row_filter function.
This is now implemented per Amit's suggestion to move all the cache
code [1]. It is a replacement for the v25* patches.

The make check and TAP subscription tests are all OK. I have repeated
the performance tests [2] and those results are good too.

v26-0001 <--- v23 (base RF patch)
v26-0002 <--- ExprState cache mods (refactored row filter caching)
v26-0003 <--- ExprState cache extra debug logging (temp)

------
[1] https://www.postgresql.org/message-id/CAA4eK1%2Btio46goUKBUfAKFsFVxtgk8nOty%3DTxKoKH-gdLzHD2g%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPs5j7mkO0xLmNW%3DkXh0eezGoKyzBCiQc9bfkCiM_MVDrg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.

Attachments

Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Sun, Aug 29, 2021, at 11:14 PM, Peter Smith wrote:
Here are the new v26* patches. This is a refactoring of the row-filter
caches to remove all the logic from the get_rel_sync_entry function
and delay it until if/when needed in the pgoutput_row_filter function.
This is now implemented per Amit's suggestion to move all the cache
code [1]. It is a replacement for the v25* patches.

The make check and TAP subscription tests are all OK. I have repeated
the performance tests [2] and those results are good too.

v26-0001 <--- v23 (base RF patch)
v26-0002 <--- ExprState cache mods (refactored row filter caching)
v26-0003 <--- ExprState cache extra debug logging (temp)
Peter, I'm still reviewing this new cache mechanism. I will provide
feedback as soon as I integrate it into this recent modification.

I'm attaching a new version that simply includes Houzj's review [1].
It is based on v23.

There has been a discussion about which row should be used by the row
filter. We don't have a unanimous choice, so I think it is prudent to
provide a way for the user to change it. I suggested in a previous
email [2] that a publication option should be added. Hence, the row
filter can be applied to the old tuple, the new tuple, or both. This
approach is simpler than using OLD/NEW references (less code, and it
avoids validation such as a NEW reference for DELETEs or an OLD
reference for INSERTs). I thought about a reasonable default value,
and the _new_ tuple seems a good one because (i) it is always
available and (ii) the user doesn't have to figure out that
replication is broken due to a column that is not part of the replica
identity. I'm attaching a POC that implements it. I'm still polishing
it. Adding tests for multiple row filters and integrating Peter's
caching mechanism [3] are the next steps.
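
Just to sketch how such an option might be spelled (the option name
and values below are invented for illustration; they are not the POC's
final syntax):

CREATE PUBLICATION pub FOR TABLE orders
    WHERE (amount > 100)
    WITH (row_filter_tuple = 'new');  -- hypothetical: 'old', 'new' or 'both'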



--
Euler Taveira

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Sep 1, 2021 at 4:53 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Sun, Aug 29, 2021, at 11:14 PM, Peter Smith wrote:
>
> Here are the new v26* patches. This is a refactoring of the row-filter
> caches to remove all the logic from the get_rel_sync_entry function
> and delay it until if/when needed in the pgoutput_row_filter function.
> This is now implemented per Amit's suggestion to move all the cache
> code [1]. It is a replacement for the v25* patches.
>
> The make check and TAP subscription tests are all OK. I have repeated
> the performance tests [2] and those results are good too.
>
> v26-0001 <--- v23 (base RF patch)
> v26-0002 <--- ExprState cache mods (refactored row filter caching)
> v26-0002 <--- ExprState cache extra debug logging (temp)
>
> Peter, I'm still reviewing this new cache mechanism. I will provide a feedback
> as soon as I integrate it as part of this recent modification.
>
> I'm attaching a new version that simply including Houzj review [1]. This is
> based on v23.
>
> There has been a discussion about which row should be used by row filter. We
> don't have a unanimous choice, so I think it is prudent to provide a way for
> the user to change it. I suggested in a previous email [2] that a publication
> option should be added. Hence, row filter can be applied to old tuple, new
> tuple, or both. This approach is simpler than using OLD/NEW references (less
> code and avoid validation such as NEW reference for DELETEs and OLD reference
> for INSERTs). I think about a reasonable default value and it seems _new_ tuple
> is a good one because (i) it is always available and (ii) user doesn't have
> to figure out that replication is broken due to a column that is not part
> of replica identity.
>

I think this or any other similar solution for row filters (on
updates) won't work till we solve the problem reported by Hou-San [1].
The main reason is that we don't have data for unchanged toast columns
in WAL. For that, we have discussed some probable solutions in email
[2], however, that also required us to solve one of the existing
bugs[3].

[1] -
https://www.postgresql.org/message-id/OS0PR01MB571618736E7E79309A723BBE94E99%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/CAA4eK1JLQqNZypOpN7h3%3DVt0JJW4Yb_FsLJS%3DT8J9J-WXgFMYg%40mail.gmail.com
[3] -
https://www.postgresql.org/message-id/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Wed, Sep 1, 2021, at 9:36 AM, Amit Kapila wrote:
I think this or any other similar solution for row filters (on
updates) won't work till we solve the problem reported by Hou-San [1].
The main reason is that we don't have data for unchanged toast columns
in WAL. For that, we have discussed some probable solutions in email
[2], however, that also required us to solve one of the existing
bugs[3].
I didn't mention it, but I'm working on it in parallel.

I agree with you that including TOAST values in the WAL is a possible solution
for this issue. This is a popular request for wal2json [1][2][3] and I think
other output plugins have the same request too. It is useful for CDC solutions.

I'm experimenting with 2 approaches: (i) always include unchanged
TOAST values in the new tuple if a GUC is set, and (ii) include
unchanged TOAST values in the new tuple only if they weren't included
in the old tuple. The advantage of the first option is that you fix
the problem by adjusting a parameter in your configuration file.
However, the disadvantage is that, depending on your setup -- REPLICA
IDENTITY FULL -- you might have the same TOAST value for a single
change twice in the WAL. The second option solves the disadvantage of
(i), but it only works if you have REPLICA IDENTITY FULL and Dilip's
patch applied [4] (I expect to review it soon). In the output plugin,
(i) requires a simple modification (remove the restriction for
unchanged TOAST values) but (ii) needs more complex surgery.
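
For approach (i), the knob could be as simple as a configuration
parameter, e.g. (the GUC name here is invented, just to illustrate):

ALTER SYSTEM SET wal_include_unchanged_toast = on;
SELECT pg_reload_conf();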



--
Euler Taveira

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Sep 1, 2021 at 8:29 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Sep 1, 2021, at 9:36 AM, Amit Kapila wrote:
>
> I think this or any other similar solution for row filters (on
> updates) won't work till we solve the problem reported by Hou-San [1].
> The main reason is that we don't have data for unchanged toast columns
> in WAL. For that, we have discussed some probable solutions in email
> [2], however, that also required us to solve one of the existing
> bugs[3].
>
> I didn't mention but I'm working on it in parallel.
>
> I agree with you that including TOAST values in the WAL is a possible solution
> for this issue. This is a popular request for wal2json [1][2][3] and I think
> other output plugins have the same request too. It is useful for CDC solutions.
>
> I'm experimenting 2 approaches: (i) always include unchanged TOAST values to
> new tuple if a GUC is set and (ii) include unchanged TOAST values to new tuple
> iif it wasn't include in the old tuple.
>

In the second approach, we will always end up having unchanged toast
columns for non-key columns in the WAL, which will be a significant
overhead, so I am not sure that would be acceptable if we want to do
it by default.

> The advantage of the first option is
> that you fix the problem adjusting a parameter in your configuration file.
> However, the disadvantage is that, depending on your setup -- REPLICA IDENTITY
> FULL, you might have the same TOAST value for a single change twice in the WAL.
> The second option solves the disadvantage of (i) but it only works if you have
> REPLICA IDENTITY FULL and Dilip's patch applied [4] (I expect to review it
> soon).
>

Thanks for offering the review of that patch. I think it will be good
to get it committed.

> In the output plugin, (i) requires a simple modification (remove
> restriction for unchanged TOAST values) but (ii) needs a more complex surgery.
>

I think if we get Dilip's patch, then we can have a rule for filter
columns such that they can contain only replica identity key columns.
This rule is anyway required for Deletes, and we can have it for
Updates. At this stage, I haven't checked what it takes to implement
such a solution, but it would be worth investigating.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Sep 2, 2021 at 1:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
...
> I think if get Dilip's patch then we can have a rule for filter
> columns such that it can contain only replica identity key columns.
> This rule is anyway required for Deletes and we can have it for
> Updates. At this stage, I haven't checked what it takes to implement
> such a solution but it would be worth investigating it.

Yes, I have been experimenting with part of this puzzle. I have
already implemented some POC code to extract the list of table columns
contained within the row filter expression. I can share it, after I
clean it up some more, if that is helpful.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Wed, Sep 1, 2021 at 9:23 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Sun, Aug 29, 2021, at 11:14 PM, Peter Smith wrote:
...
> Peter, I'm still reviewing this new cache mechanism. I will provide a feedback
> as soon as I integrate it as part of this recent modification.

Hi Euler, for your next version can you please also integrate the
tab-autocomplete change back into the main patch?

This autocomplete change was originally posted quite a few weeks ago
here [1] but seems to have been overlooked.
I've rebased it and it applies OK to your latest v27* set. PSA.

Thanks!
------
[1] https://www.postgresql.org/message-id/CAHut%2BPuLoZuHD_A%3Dn8GshC84Nc%3D8guReDsTmV1RFsCYojssD8Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Wed, Sep 1, 2021 at 9:23 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Sun, Aug 29, 2021, at 11:14 PM, Peter Smith wrote:
>
> Here are the new v26* patches. This is a refactoring of the row-filter
> caches to remove all the logic from the get_rel_sync_entry function
> and delay it until if/when needed in the pgoutput_row_filter function.
> This is now implemented per Amit's suggestion to move all the cache
> code [1]. It is a replacement for the v25* patches.
>
> The make check and TAP subscription tests are all OK. I have repeated
> the performance tests [2] and those results are good too.
>
> v26-0001 <--- v23 (base RF patch)
> v26-0002 <--- ExprState cache mods (refactored row filter caching)
> v26-0002 <--- ExprState cache extra debug logging (temp)
>
> Peter, I'm still reviewing this new cache mechanism. I will provide a feedback
> as soon as I integrate it as part of this recent modification.
>
> I'm attaching a new version that simply including Houzj review [1]. This is
> based on v23.
>
> There has been a discussion about which row should be used by row filter. We
> don't have a unanimous choice, so I think it is prudent to provide a way for
> the user to change it. I suggested in a previous email [2] that a publication
> option should be added. Hence, row filter can be applied to old tuple, new
> tuple, or both. This approach is simpler than using OLD/NEW references (less
> code and avoid validation such as NEW reference for DELETEs and OLD reference
> for INSERTs). I think about a reasonable default value and it seems _new_ tuple
> is a good one because (i) it is always available and (ii) user doesn't have
> to figure out that replication is broken due to a column that is not part
> of replica identity. I'm attaching a POC that implements it. I'm still
> polishing it. Add tests for multiple row filters and integrate Peter's caching
> mechanism [3] are the next steps.
>

Assuming this _new_tuple option is enabled:
1. An UPDATE where the new_tuple satisfies the row filter but the
old_tuple did not (it is not checked). Since the row filter check
passed but the actual row never existed on the subscriber, would this
patch convert the UPDATE to an INSERT, or would this UPDATE be
ignored? Based on the tests that I did, I see that it is ignored.
2. An UPDATE where the new_tuple does not satisfy the row filter but
the old_tuple did. Since the new_tuple did not match the row filter,
wouldn't this row now remain divergent on the replica?

Somehow this approach of either new_tuple or old_tuple doesn't seem
to be very fruitful if the user requires that the replica stay
up-to-date based on the filter condition. For that, I think you will
need to convert UPDATEs to either INSERTs or DELETEs when only one of
new_tuple and old_tuple matches the filter condition, but not both.

UPDATE
old-row (match)       new-row (no match)  -> DELETE
old-row (no match)  new row (match)       -> INSERT
old-row (match)       new row (match)       -> UPDATE
old-row (no match)  new-row (no match)  -> (drop change)
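
For example, with a hypothetical publication filter (id > 1000), the
transformations above would play out like this:

CREATE PUBLICATION pub FOR TABLE t WHERE (id > 1000);

-- old row matches, new row does not: replicate as a DELETE
UPDATE t SET id = 500 WHERE id = 1500;
-- old row does not match, new row does: replicate as an INSERT
UPDATE t SET id = 1500 WHERE id = 500;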

regards,
Ajin Cherian
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
Hi Euler,

As you probably know the "base" Row-Filter patch v27-0001 got
seriously messed up by a recent commit that had lots of overlaps with
your code [1].

e.g. It broke trying to apply on HEAD as follows:

[postgres@CentOS7-x64 oss_postgres_RowFilter]$ git apply
v27-0001-Row-filter-for-logical-replication.patch
error: patch failed: src/backend/catalog/pg_publication.c:141
error: src/backend/catalog/pg_publication.c: patch does not apply
error: patch failed: src/backend/commands/publicationcmds.c:384
error: src/backend/commands/publicationcmds.c: patch does not apply
error: patch failed: src/backend/parser/gram.y:426
error: src/backend/parser/gram.y: patch does not apply
error: patch failed: src/include/catalog/pg_publication.h:83
error: src/include/catalog/pg_publication.h: patch does not apply
error: patch failed: src/include/nodes/nodes.h:490
error: src/include/nodes/nodes.h: patch does not apply
error: patch failed: src/include/nodes/parsenodes.h:3625
error: src/include/nodes/parsenodes.h: patch does not apply
error: patch failed: src/test/regress/expected/publication.out:158
error: src/test/regress/expected/publication.out: patch does not apply
error: patch failed: src/test/regress/sql/publication.sql:93
error: src/test/regress/sql/publication.sql: patch does not apply

~~

I know you are having discussions in the other (Col-Filtering) thread
about the names PublicationRelationInfo versus PublicationRelInfo etc,
but meanwhile, I am in need of a working "base" Row-Filter patch so
that I can post my incremental work, and so that the cfbot can
continue to run ok.

Since your v27 has been broken for several days already I've taken it
upon myself to re-base it. PSA.

v27-0001 --> v28-0001.

(AFAIK this new v28 applies ok and passes all regression and TAP
subscription tests)

Note: This v28 patch was made only so that I can (soon) post some
other small incremental patches on top of it, and also so the cfbot
will be able to run them ok. If you do not like it then just overwrite
it - I am happy to work with whatever latest "base" patch you provide
so long as it is compatible with the current master code.

------

[1] https://github.com/postgres/postgres/commit/0c6828fa987b791744b9c8685aadf1baa21f8977#

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
PSA my new incremental patch (v28-0002) that introduces row filter
validation for the publish mode "delete". The validation requires that
any columns referred to in the filter expression must also be part of
REPLICA IDENTITY or PK.
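
For example (hypothetical table), with the default replica identity
(the primary key) the validation would behave like this:

CREATE TABLE t (pk int PRIMARY KEY, info text);
-- OK: "pk" is part of the replica identity
CREATE PUBLICATION p1 FOR TABLE t WHERE (pk > 100);
-- rejected when "delete" is published, because "info" is not part of
-- the replica identity
CREATE PUBLICATION p2 FOR TABLE t WHERE (info <> '');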

[This v28-0001 is identical to the most recently posted rebased base
patch. It is included again here only so the cfbot will be happy]

~~

A requirement for some filter validation like this has been mentioned
several times in this thread [1][2][3][4][5].

I also added some test code for various kinds of replica identity.

A couple of existing tests had to be modified so they could continue
to work  (e.g. changed publish = "insert" or REPLICA IDENTITY FULL)

Feedback is welcome.

~~

NOTE: This validation currently happens only when the filters are
first created. Probably there are many other scenarios that need to be
properly handled. What to do if something that impacts an existing
filter is changed?

e.g.
- what if the user changes the publish parameter using ALTER
PUBLICATION set (publish="delete") etc?
- what if the user changes the replication identity?
- what if the user changes the filter using ALTER PUBLICATION in a way
that is no longer compatible with the necessary cols?
- what if the user changes the table (e.g. removes a column referred
to by a filter)?
- what if the user changes a referred column name?
- more...

(None of those are addressed yet - thoughts?)
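
Concretely, the kinds of statements in question would be things like
(hypothetical names):

ALTER PUBLICATION p SET (publish = 'delete');        -- publish parameter changes
ALTER TABLE t REPLICA IDENTITY FULL;                 -- replica identity changes
ALTER PUBLICATION p SET TABLE t WHERE (info <> '');  -- filter changed incompatibly
ALTER TABLE t DROP COLUMN info;                      -- filter column removed
ALTER TABLE t RENAME COLUMN info TO note;            -- filter column renamed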

------

[1] https://www.postgresql.org/message-id/92e5587d-28b8-5849-2374-5ca3863256f1%402ndquadrant.com
[2] https://www.postgresql.org/message-id/CAA4eK1JL2q%2BHENgiCf1HLRU7nD9jCcttB9sEqV1tech4mMv_0A%40mail.gmail.com
[3] https://www.postgresql.org/message-id/202107132106.wvjgvjgcyezo%40alvherre.pgsql
[4] https://www.postgresql.org/message-id/202107141452.edncq4ot5zkg%40alvherre.pgsql
[5] https://www.postgresql.org/message-id/CAA4eK1Kyax-qnVPcXzODu3JmA4vtgAjUSYPUK1Pm3vBL5gC81g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Sep 9, 2021 at 11:43 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> PSA my new incremental patch (v28-0002) that introduces row filter
> validation for the publish mode "delete". The validation requires that
> any columns referred to in the filter expression must also be part of
> REPLICA IDENTITY or PK.
>
> [This v28-0001 is identical to the most recently posted rebased base
> patch. It is included again here only so the cfbot will be happy]
>
> ~~
>
> A requirement for some filter validation like this has been mentioned
> several times in this thread [1][2][3][4][5].
>
> I also added some test code for various kinds of replica identity.
>
> A couple of existing tests had to be modified so they could continue
> to work  (e.g. changed publish = "insert" or REPLICA IDENTITY FULL)
>
> Feedback is welcome.
>
> ~~
>
> NOTE: This validation currently only checks when the filters are first
> created. Probably there are many other scenarios that need to be
> properly handled. What to do if something which impacts the existing
> filter is changed?
>
> e.g.
> - what if the user changes the publish parameter using ALTER
> PUBLICATION set (publish="delete") etc?
> - what if the user changes the replication identity?
> - what if the user changes the filter using ALTER PUBLICATION in a way
> that is no longer compatible with the necessary cols?
> - what if the user changes the table (e.g. removes a column referred
> to by a filter)?
> - what if the user changes a referred column name?
> - more...
>
> (None of those are addressed yet - thoughts?)
>

I think we need to remove the filter or the table from the
publication in such cases. Now, one can think of just removing the
condition related to the column being removed/changed in some way, but
I think that won't be appropriate because it would change the meaning
of the filter. We are discussing similar stuff in the column filter
thread and we might want to do the same for row filters as well. I
would prefer to remove the table in both cases, as Rahila has proposed
in the column filter patch.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
I have attached a POC row-filter validation patch implemented using a
parse-tree 'walker' function.

PSA the incremental patch v28-0003.

v28-0001 --> v28-0001 (same as before - base patch)
v28-0002 --> v28-0002 (same as before - replica identity validation patch)
                     v28-0003 (NEW POC PATCH using "walker" validation)

~~

This kind of 'walker' validation has already been proposed/recommended
several times up-thread [1][2][3].

For this POC patch, I have removed all the existing
EXPR_KIND_PUBLICATION_WHERE parser errors. I am not 100% sure this is
the best idea (see below), but for now, the parser errors are
temporarily disabled with #if 0 in the code. I will clean up this
patch and re-post later when there is some feedback/consensus on how
to proceed.

~

1. PROS

1.1 Using a 'walker' validator allows the row filter expression
validation to be 'opt-in' instead of 'opt-out' checking logic. This
may be considered *safer* because now we can have a very
controlled/restricted set of allowed nodes - e.g. only allow simple
(Var op Const) expressions. This eliminates the risk that some
unforeseen dangerous loophole could be exploited.

1.2 It is convenient to have all the row-filter validation errors in
one place, instead of being scattered across the parser code based on
EXPR_KIND_PUBLICATION_WHERE. Indeed, there already seems to be some
confusion caused by the existing scattering of row-filter validation
(patch 0001). For example, I found some of the new "aggregate
functions are not allowed" errors are not even reachable, because they
are shielded by the earlier "functions are not allowed" error.

2. CONS

2.1 Error messages thrown from the parser can include the character
location of the problem. Actually, this is also possible using the
'walker' (I have done it locally) but it requires passing the
ParseState into the walker code - something I thought seemed a bit
unusual, so I did not include that in this 0003 POC patch.

~~

Perhaps a hybrid validation is preferred. e.g. retain some/all of the
parser validation errors from the 0001 patch, but also keep the walker
validation as a 'catch-all' to trap anything unforeseen that may slip
through the parsing. Or perhaps this 'walker' validator is fine as the
only validator and all the current parser errors for
EXPR_KIND_PUBLICATION_WHERE can just be permanently removed.

I am not sure what is the best approach, so I am hoping for some
feedback and/or review comments.
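
To give a feel for the strict 'opt-in' rule, the walker would accept
only something like the first filter below and reject the others
(hypothetical examples):

CREATE PUBLICATION p1 FOR TABLE t WHERE (id > 10);          -- Var op Const: allowed
CREATE PUBLICATION p2 FOR TABLE t WHERE (random() < 0.5);   -- function call: rejected
CREATE PUBLICATION p3 FOR TABLE t
    WHERE (id = (SELECT max(id) FROM t));                   -- sub-select: rejected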

------
[1] https://www.postgresql.org/message-id/33c033f7-be44-e241-5fdf-da1b328c288d%40enterprisedb.com
[2] https://www.postgresql.org/message-id/CAA4eK1Jumuio6jZK8AVQd6z7gpDsZydQhK6d%3DMUARxk3nS7%2BPw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAA4eK1JL2q%2BHENgiCf1HLRU7nD9jCcttB9sEqV1tech4mMv_0A%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
Hi Euler,

FYI - the last-known-good "main" patch has been broken in the cfbot
for the last couple of days due to a recent commit [1] on the HEAD.

To keep the cfbot happy I have re-based it.

In this same post (so that they will not be misplaced and so they
remain working with HEAD) I am also re-attaching all of my currently
pending "incremental" patches. These are either awaiting merge back
into the "main" patch and/or they are awaiting review.

~

PSA 5 patches:

v29-0001 = the latest "main" patch (was
v28-0001-Row-filter-for-logical-replication.patch from [2]) is now
rebased to HEAD.

v29-0002 = my tab auto-complete patch (was
v1-0001-Add-tab-auto-complete-support-for-the-Row-Filter-.patch from
[3]) awaiting merge.

v29-0003 = my cache updates patch (was
v26-0002-ExprState-cache-modifications.patch from [4]) awaiting merge.

v29-0004 = my filter validation replica identity patch (was
v28-0002-Row-filter-validation-replica-identity.patch from [5])
awaiting review/merge.

v29-0005 = my filter validation walker POC patch (was
v28-0003-POC-row-filter-walker-validation.patch from [6]) awaiting
feedback.

~

It is getting increasingly time-consuming to maintain and track all
these separate patches. If possible, please merge them back into the
"main" patch.

------
[1] https://github.com/postgres/postgres/commit/1882d6cca161dcf9fa05ecab5abeb1a027a5cfd2
[2] https://www.postgresql.org/message-id/CAHut%2BPv-Gz_bA6djDOnTz0OT-fMykKwidsK6bLDU5mZ1KWX9KQ%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAHut%2BPsi7EygLemHnQbdLSZhBqyxqHY-3Mov1RS5xFAR%3Dxg-wg%40mail.gmail.com
[4] https://www.postgresql.org/message-id/CAHut%2BPsgRHymwLhJ9t3By6%2BKNaVDzfjf6Y4Aq%3DJRD-y8t1mEFg%40mail.gmail.com
[5] https://www.postgresql.org/message-id/CAHut%2BPukNh_HsN1Au1p9YhG5KCOr3dH5jnwm%3DRmeX75BOtXTEg%40mail.gmail.com
[6]
https://www.postgresql.org/message-id/CAHut%2BPt6%2B%3Dw7_r%3DCHBCS%2ByZXk5V%2BtnrzHLi3b2ZOVP1LHL2W9w%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Wed, Sep 8, 2021 at 7:59 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Wed, Sep 1, 2021 at 9:23 PM Euler Taveira <euler@eulerto.com> wrote:
> >

> Somehow this approach of either new_tuple or old_tuple doesn't seem to
> be very fruitful if the user requires that his replica is up-to-date
> based on the filter condition. For that, I think you will need to
> convert UPDATES to either INSERTS or DELETES if only new_tuple or
> old_tuple matches the filter condition but not both matches the filter
> condition.
>
> UPDATE
> old-row (match)       new-row (no match)  -> DELETE
> old-row (no match)  new row (match)       -> INSERT
> old-row (match)       new row (match)       -> UPDATE
> old-row (no match)  new-row (no match)  -> (drop change)
>

Adding a patch that strives to implement the logic that I described
above. For updates, the row filter is applied on both the old_tuple
and the new_tuple. This patch assumes that the row filter only uses
columns that are part of the REPLICA IDENTITY. (The current patch-set
only restricts this for row filters that are delete only.)
The old_tuple only has the columns that are part of the replica
identity and have been changed, which is a problem when applying the
row filter. Since unchanged REPLICA IDENTITY columns are not present
in the old_tuple, this patch creates a temporary old_tuple by getting
such column values from the new_tuple, and then applies the filter on
this hand-created temp old_tuple. The way the old_tuple is created can
be better optimised in future versions.

This patch also handles the problem reported by Houz in [1]. The
patch assumes a fix proposed by Dilip in [2]. This is the case where
toasted unchanged RI columns are not detoasted in the new_tuple and
have to be retrieved from disk during decoding. Dilip's fix involves
updating the detoasted value in the old_tuple when writing to the WAL.
In the problem reported by Hou, the row filter is applied on the
new_tuple and the decoder attempts to detoast a value in the
new_tuple; if the table was deleted at that time, the decoding fails.
To avoid this, in such a situation, the untoasted value in the
old_tuple (Dilip's fix) is copied to the new_tuple before the row
filter is applied.
I have also refactored the way Peter initializes the row filter by
moving it into a separate function, before the insert/update/delete
specific logic is applied.

I have not changed any of the first 5 patches, just added my patch 006
at the end. Do let me know if you have any comments on this approach.

[1] -
https://www.postgresql.org/message-id/OS0PR01MB571618736E7E79309A723BBE94E99%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] -
https://www.postgresql.org/message-id/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
regards,
Ajin Cherian
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Sep 20, 2021 at 3:17 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Wed, Sep 8, 2021 at 7:59 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > On Wed, Sep 1, 2021 at 9:23 PM Euler Taveira <euler@eulerto.com> wrote:
> > >
>
> > Somehow this approach of either new_tuple or old_tuple doesn't seem to
> > be very fruitful if the user requires that his replica is up-to-date
> > based on the filter condition. For that, I think you will need to
> > convert UPDATES to either INSERTS or DELETES if only new_tuple or
> > old_tuple matches the filter condition but not both matches the filter
> > condition.
> >
> > UPDATE
> > old-row (match)       new-row (no match)  -> DELETE
> > old-row (no match)  new row (match)       -> INSERT
> > old-row (match)       new row (match)       -> UPDATE
> > old-row (no match)  new-row (no match)  -> (drop change)
> >
>
> Adding a patch that strives to do the logic that I described above.
> For updates, the row filter is applied on both old_tuple
> and new_tuple. This patch assumes that the row filter only uses
> columns that are part of the REPLICA IDENTITY. (the current patch-set
> only
> restricts this for row-filters that are delete only)
> The old_tuple only has columns that are part of the old_tuple and have
> been changed, which is a problem while applying the row-filter. Since
> unchanged REPLICA IDENTITY columns
> are not present in the old_tuple, this patch creates a temporary
> old_tuple by getting such column values from the new_tuple and then
> applies the filter on this hand-created temp old_tuple. The way the
> old_tuple is created can be better optimised in future versions.
>

Yeah, this is the kind of idea that can work. One thing you might
want to check is the overhead of the additional deform/form cycle. You
might want to use Peter's tests above. I think you only need to form
the old/new tuples when you have changed something in them, but on a
quick look it seems you are always re-forming both tuples.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Mon, Sep 20, 2021 at 5:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > Adding a patch that strives to do the logic that I described above.
> > For updates, the row filter is applied on both old_tuple
> > and new_tuple. This patch assumes that the row filter only uses
> > columns that are part of the REPLICA IDENTITY. (the current patch-set
> > only
> > restricts this for row-filters that are delete only)
> > The old_tuple only has columns that are part of the old_tuple and have
> > been changed, which is a problem while applying the row-filter. Since
> > unchanged REPLICA IDENTITY columns
> > are not present in the old_tuple, this patch creates a temporary
> > old_tuple by getting such column values from the new_tuple and then
> > applies the filter on this hand-created temp old_tuple. The way the
> > old_tuple is created can be better optimised in future versions.

I understand why this is done, but I have 2 concerns here: 1) We are
doing an extra deform, and copying the fields from new to old when
they are unchanged replica identity columns.  2) The same unchanged
attribute values get qualified in the old tuple as well as in the new
tuple.  What exactly needs to be done is that only the updated fields
should be validated as part of the old as well as the new tuple;
redundant validation of the unchanged fields does not make sense.  For
that we will have to change the filter for the old tuple to validate
just the attributes which are actually modified; the remaining
unchanged and new values will anyway get validated in the new tuple.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Tue, Sep 21, 2021 at 12:03 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Sep 20, 2021 at 5:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > >
> > > Adding a patch that strives to do the logic that I described above.
> > > For updates, the row filter is applied on both old_tuple
> > > and new_tuple. This patch assumes that the row filter only uses
> > > columns that are part of the REPLICA IDENTITY. (the current patch-set
> > > only
> > > restricts this for row-filters that are delete only)
> > > The old_tuple only has columns that are part of the old_tuple and have
> > > been changed, which is a problem while applying the row-filter. Since
> > > unchanged REPLICA IDENTITY columns
> > > are not present in the old_tuple, this patch creates a temporary
> > > old_tuple by getting such column values from the new_tuple and then
> > > applies the filter on this hand-created temp old_tuple. The way the
> > > old_tuple is created can be better optimised in future versions.
>
> I understand why this is done, but I have 2 concerns here 1) We are
> having extra deform and copying the field from new to old in case it
> is unchanged replica identity.  2) The same unchanged attribute values
> get qualified in the old tuple as well as in the new tuple.  What
> exactly needs to be done is that the only updated field should be
> validated as part of the old as well as the new tuple, the unchanged
> field does not make sense to have redundant validation.   For that we
> will have to change the filter for the old tuple to just validate the
> attributes which are actually modified and remaining unchanged and new
> values will anyway get validated in the new tuple.
>
But what if the filter expression depends on multiple columns, say
(a+b) > 100, where a is unchanged while b is changed? Then we will
still need both columns to apply the filter even though one is
unchanged. Also, I am not aware of any mechanism by which we can apply
a filter expression on individual attributes; the current mechanism
does it on a tuple. Do let me know if you have any ideas there.

Even if it were done, there would still be the overhead of deforming
the tuple. I will run some performance tests as Amit suggested, see
what the overhead is, and try to minimise it.

regards,
Ajin Cherian
Fujitsu Australia



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Tue, Sep 21, 2021 at 8:58 AM Ajin Cherian <itsajin@gmail.com> wrote:
> > I understand why this is done, but I have 2 concerns here 1) We are
> > having extra deform and copying the field from new to old in case it
> > is unchanged replica identity.  2) The same unchanged attribute values
> > get qualified in the old tuple as well as in the new tuple.  What
> > exactly needs to be done is that the only updated field should be
> > validated as part of the old as well as the new tuple, the unchanged
> > field does not make sense to have redundant validation.   For that we
> > will have to change the filter for the old tuple to just validate the
> > attributes which are actually modified and remaining unchanged and new
> > values will anyway get validated in the new tuple.
> >
> But what if the filter expression depends on multiple columns, say (a+b) > 100
> where a is unchanged while b is changed. Then we will still need both
> columns for applying

In such a case, we need to.

> the filter even though one is unchanged. Also, I am not aware of any
> mechanism by which
> we can apply a filter expression on individual attributes. The current
> mechanism does it
> on a tuple. Do let me know if you have any ideas there?

What I suggested is to modify the filter for the old tuple. E.g. if
the filter is (a > 10 and b < 20 and c+d = 20), and only a and c are
modified, then we can process the expression and transform this filter
to (a > 10 and c+d = 20).

>
> Even if it were done, there would still be the overhead of deforming the tuple.

Suppose the filter is just (a > 10 and b < 20) and only a is updated;
if we are able to modify the filter for the old tuple to be just
(a > 10), then do we still need to deform?  Even if we have to, we can
save a lot by avoiding duplicate expression evaluation.

> I will run some performance tests like Amit suggested and see what the
> overhead is and
> try to minimise it.

It is good to know.  I think you must try some worst-case
scenarios, e.g. we have 10 text columns and 1 int column in the REPLICA
IDENTITY, only the int column gets updated while all the text columns
are unchanged, and you have a filter on all the columns.
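
For instance, a setup along these lines (a rough, untested sketch with
made-up names, trimmed to three text columns) should exercise that:

    CREATE TABLE wide (id int NOT NULL, t1 text, t2 text, t3 text);
    ALTER TABLE wide REPLICA IDENTITY FULL;
    CREATE PUBLICATION pub_wide FOR TABLE wide
        WHERE (id > 0 AND t1 <> '' AND t2 <> '' AND t3 <> '');
    -- only the int column changes; the text columns stay unchanged
    UPDATE wide SET id = id + 1;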

Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Sep 21, 2021 at 9:54 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 8:58 AM Ajin Cherian <itsajin@gmail.com> wrote:
> > > I understand why this is done, but I have 2 concerns here: 1) We are
> > > doing an extra deform and copying the field from new to old when it
> > > is an unchanged replica identity column.  2) The same unchanged
> > > attribute values get qualified in the old tuple as well as in the new
> > > tuple.  What exactly needs to be done is that only the updated fields
> > > should be validated as part of the old as well as the new tuple; it
> > > does not make sense to validate the unchanged fields redundantly.  For
> > > that we will have to change the filter for the old tuple to validate
> > > just the attributes which are actually modified; the remaining
> > > unchanged and new values will anyway get validated in the new tuple.
> > >
> > But what if the filter expression depends on multiple columns, say (a+b) > 100
> > where a is unchanged while b is changed. Then we will still need both
> > columns for applying
>
> In such a case, we need to.
>
> > the filter even though one is unchanged. Also, I am not aware of any
> > mechanism by which
> > we can apply a filter expression on individual attributes. The current
> > mechanism does it
> > on a tuple. Do let me know if you have any ideas there?
>
> What I suggested is to modify the filter for the old tuple. E.g. if the
> filter is (a > 10 and b < 20 and c+d = 20) and only a and c are
> modified, then we can process the expression and transform this
> filter to (a > 10 and c+d = 20).
>

If you have only a and c in the old tuple, how will it evaluate
the expression c + d? I think the point is that if for some expression
some values are in the old tuple and others are in the new, then the
idea proposed in the patch seems sane. Moreover, I think in your idea
for each tuple we might need to build a new expression, and sometimes
twice, which will defeat the purpose of the cache we have kept in the
patch, and I am not sure it is less costly.

See another example where splitting the filter might not give the
desired results:

Say the filter expression is: (a = 10 and b = 20 and c = 30)

Now, the old_tuple has values for columns a and c, say the values are 10
and 30. So, the old_tuple will match the filter if we split it as per
your suggestion. Now say the new_tuple has values (a = 5, b = 15, c = 25).
In such a situation dividing the filter will give us the result that
the old_tuple is matching but the new_tuple is not, which seems
incorrect. I think dividing filter conditions among old and new tuples
might not retain their sanctity.

> >
> > Even if it were done, there would still be the overhead of deforming the tuple.
>
> Suppose the filter is just (a > 10 and b < 20) and only a is
> updated; if we are able to modify the filter for the old tuple to
> be just (a > 10), do we still need to deform?
>

Without deforming, how will you determine which columns are part of
the old tuple?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Tue, Sep 21, 2021 at 10:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>

> If you have only a and c in the old tuple, how will it evaluate
> the expression c + d?

Well, what I said is that if we have such a dependency then we will have
to copy that field to the old tuple, e.g. if we convert the filter for
the old tuple from (a > 10 and b < 20 and c+d = 20) to (a > 10 and
c+d = 20), then we will not have to copy 'b' to the old tuple but we
still have to copy 'd' because there is a dependency.

> I think the point is that if for some expression some values are in
> the old tuple and others are in the new, then the idea proposed in
> the patch seems sane. Moreover, I think in your idea for each tuple
> we might need to build a new expression, and sometimes twice, which
> will defeat the purpose of the cache we have kept in the patch, and
> I am not sure it is less costly.

Basically, expression initialization should happen only once in most
cases, so with my suggestion you might have to do it twice.  But the
overhead of the extra expression initialization is far less than doing
duplicate evaluation, because that will happen for each update
operation we send, right?

> See another example where splitting the filter might not give the
> desired results:
>
> Say the filter expression is: (a = 10 and b = 20 and c = 30)
>
> Now, the old_tuple has values for columns a and c, say the values are
> 10 and 30. So, the old_tuple will match the filter if we split it as
> per your suggestion. Now say the new_tuple has values (a = 5, b = 15,
> c = 25). In such a situation dividing the filter will give us the
> result that the old_tuple is matching but the new_tuple is not, which
> seems incorrect. I think dividing filter conditions among old and new
> tuples might not retain their sanctity.

Yeah, that is a good example of the need to apply a duplicate filter;
basically, as in the above example, some filters might not even get
evaluated on the new tuple, and if we have removed such an expression
from the other tuple we might break something.  Maybe for now this
suggests that we might not be able to avoid the duplicate execution of
the expression.

> > >
> > > Even if it were done, there would still be the overhead of deforming the tuple.
> >
> > Suppose the filter is just (a > 10 and b < 20) and only a is
> > updated; if we are able to modify the filter for the old tuple to
> > be just (a > 10), do we still need to deform?
> >
>
> Without deforming, how will you determine which columns are part of
> the old tuple?

Okay, then we might have to deform, but at least are we ensuring that
once we have deformed the tuple for the expression evaluation we
are not doing that again while sending the tuple?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Sep 21, 2021 at 11:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 10:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I think the point is that if for some expression some values are in
> > the old tuple and others are in the new, then the idea proposed in
> > the patch seems sane. Moreover, I think in your idea for each tuple
> > we might need to build a new expression, and sometimes twice, which
> > will defeat the purpose of the cache we have kept in the patch, and
> > I am not sure it is less costly.
>
> Basically, expression initialization should happen only once in most
> cases, so with my suggestion you might have to do it twice.
>

No, the situation will be that we might have to do it twice per update
whereas now it is just done at the very first operation on a
relation.

>  But the
> overhead of the extra expression initialization is far less than doing
> duplicate evaluation, because that will happen for each update
> operation we send, right?
>

Expression evaluation has to be done twice because every update can
have a different set of values in the old and new tuple.

> > See another example where splitting the filter might not give the
> > desired results:
> >
> > Say the filter expression is: (a = 10 and b = 20 and c = 30)
> >
> > Now, the old_tuple has values for columns a and c, say the values are
> > 10 and 30. So, the old_tuple will match the filter if we split it as
> > per your suggestion. Now say the new_tuple has values (a = 5, b = 15,
> > c = 25). In such a situation dividing the filter will give us the
> > result that the old_tuple is matching but the new_tuple is not, which
> > seems incorrect. I think dividing filter conditions among old and new
> > tuples might not retain their sanctity.
>
> Yeah, that is a good example of the need to apply a duplicate filter;
> basically, as in the above example, some filters might not even get
> evaluated on the new tuple, and if we have removed such an expression
> from the other tuple we might break something.
>

Right.

>  Maybe for now this suggests that we might not
> be able to avoid the duplicate execution of the expression
>

So, IIUC, you agree that we should proceed with the proposed approach and
we can later do optimizations if possible or if we get better ideas.

> > > >
> > > > Even if it were done, there would still be the overhead of deforming the tuple.
> > >
> > > Suppose the filter is just (a > 10 and b < 20) and only a is
> > > updated; if we are able to modify the filter for the old tuple to
> > > be just (a > 10), do we still need to deform?
> > >
> >
> > Without deforming, how will you determine which columns are part of
> > the old tuple?
>
> Okay, then we might have to deform, but at least are we ensuring that
> once we have deformed the tuple for the expression evaluation we
> are not doing that again while sending the tuple?
>

I think this is possible but we might want to be careful not to send
extra unchanged values as we are doing now.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Tue, Sep 21, 2021 at 2:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 11:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Sep 21, 2021 at 10:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I think the point is that if for some expression some values are in
> > > the old tuple and others are in the new, then the idea proposed in
> > > the patch seems sane. Moreover, I think in your idea for each tuple
> > > we might need to build a new expression, and sometimes twice, which
> > > will defeat the purpose of the cache we have kept in the patch, and
> > > I am not sure it is less costly.
> >
> > Basically, expression initialization should happen only once in most
> > cases, so with my suggestion you might have to do it twice.
> >
>
> No, the situation will be that we might have to do it twice per update
> whereas now it is just done at the very first operation on a
> relation.

Yeah, right.  Actually, I mean it will not get initialized for decoding
each tuple, so instead of once it will be done twice; but anyway, now
we agree that we cannot proceed in this direction because of the
issue you pointed out.

> >  Maybe for now this suggests that we might not
> > be able to avoid the duplicate execution of the expression
> >
>
> So, IIUC, you agree that we should proceed with the proposed approach and
> we can later do optimizations if possible or if we get better ideas.

Makes sense.

> > Okay, then we might have to deform, but at least are we ensuring that
> > once we have deformed the tuple for the expression evaluation we
> > are not doing that again while sending the tuple?
> >
>
> I think this is possible but we might want to be careful not to send
> extra unchanged values as we are doing now.

Right.

Some more comments:

In pgoutput_row_filter_update(), first we are deforming the tuple into
local datums, then modifying the tuple, and then reforming the tuple.
I think we can surely do better here.  Currently, you are reforming
the tuple so that you can store it in the scan slot by calling
ExecStoreHeapTuple, and the slot is then used for expression evaluation.
Instead of that, what you need to do is deform the tuple into the
tts_values of the scan slot and later call ExecStoreVirtualTuple(). The
advantages are 1) you don't need to reform the tuple 2) the expression
evaluation machinery doesn't need to deform again to fetch the value
of an attribute; instead it can get the value directly from the
virtual tuple.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Tue, Sep 21, 2021 at 4:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Some more comments:
>
> In pgoutput_row_filter_update(), first we are deforming the tuple into
> local datums, then modifying the tuple, and then reforming the tuple.
> I think we can surely do better here.  Currently, you are reforming
> the tuple so that you can store it in the scan slot by calling
> ExecStoreHeapTuple, and the slot is then used for expression evaluation.
> Instead of that, what you need to do is deform the tuple into the
> tts_values of the scan slot and later call ExecStoreVirtualTuple(). The
> advantages are 1) you don't need to reform the tuple 2) the expression
> evaluation machinery doesn't need to deform again to fetch the value
> of an attribute; instead it can get the value directly from the
> virtual tuple.
>

I have one more question, while looking into the
ExtractReplicaIdentity() function, it seems that if any of the "rep
ident key" fields is changed then we will write all the key fields in
the WAL as part of the old tuple, not just the changed fields.  That
means either the old tuple will be NULL or it will have all the
key attributes.  So if we are supporting filter only on the "rep ident
key fields" then is there any need to copy the fields from the new
tuple to the old tuple?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Tue, Sep 21, 2021 at 9:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 4:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > Some more comments:
> >
> > In pgoutput_row_filter_update(), first we are deforming the tuple into
> > local datums, then modifying the tuple, and then reforming the tuple.
> > I think we can surely do better here.  Currently, you are reforming
> > the tuple so that you can store it in the scan slot by calling
> > ExecStoreHeapTuple, and the slot is then used for expression evaluation.
> > Instead of that, what you need to do is deform the tuple into the
> > tts_values of the scan slot and later call ExecStoreVirtualTuple(). The
> > advantages are 1) you don't need to reform the tuple 2) the expression
> > evaluation machinery doesn't need to deform again to fetch the value
> > of an attribute; instead it can get the value directly from the
> > virtual tuple.
> >
>
> I have one more question, while looking into the
> ExtractReplicaIdentity() function, it seems that if any of the "rep
> ident key" fields is changed then we will write all the key fields in
> the WAL as part of the old tuple, not just the changed fields.  That
> means either the old tuple will be NULL or it will have all the
> key attributes.  So if we are supporting filter only on the "rep ident
> key fields" then is there any need to copy the fields from the new
> tuple to the old tuple?
>

Yes, I just figured this out while testing. So we don't need to copy fields
from the new tuple to the old tuple.

But there is still the case of your fix for the unchanged toasted RI
key fields in the new tuple, which need to be copied from the old
tuple to the new tuple. This particular case seems to violate both
rules: that an old tuple will be present only when there are changed
RI key fields, and that if there is an old tuple it will contain all
RI key fields. I think we still need to deform both the old tuple and
the new tuple, just to handle this case.

There is currently logic in ReorderBufferToastReplace() which already
deforms the new tuple
to detoast changed toasted fields in the new tuple. I think if we can
enhance this logic for our
purpose, then we can avoid an extra deform of the new tuple.
But I think you had earlier indicated that having untoasted unchanged
values in  the new tuple
can be bothersome.

Any suggestions?

regards,
Ajin Cherian
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Sep 22, 2021 at 6:42 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 9:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Sep 21, 2021 at 4:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I have one more question, while looking into the
> > ExtractReplicaIdentity() function, it seems that if any of the "rep
> > ident key" fields is changed then we will write all the key fields in
> > the WAL as part of the old tuple, not just the changed fields.  That
> > means either the old tuple will be NULL or it will have all the
> > key attributes.  So if we are supporting filter only on the "rep ident
> > key fields" then is there any need to copy the fields from the new
> > tuple to the old tuple?
> >
>
> Yes, I just figured this out while testing. So we don't need to copy fields
> from the new tuple to the old tuple.
>
> But there is still the case of your fix for the unchanged toasted RI
> key fields in the new tuple, which need to be copied from the old
> tuple to the new tuple. This particular case seems to violate both
> rules: that an old tuple will be present only when there are changed
> RI key fields, and that if there is an old tuple it will contain all
> RI key fields.
>

Why do you think that the second assumption (if there is an old tuple
it will contain all RI key fields) is broken? It seems to me even
when we are planning to include unchanged toast as part of old_key, it
will contain all the key columns, isn't that true?

> I think we
> still need to deform both the old tuple and the new tuple, just to
> handle this case.
>

Yeah, but we are anyway talking about saving that cost for later if
we decide to send that tuple. I think we can further try to optimize
it by first checking whether the new tuple has any toasted value; only
if so do we need this extra pass of deforming.

> There is currently logic in ReorderBufferToastReplace() which already
> deforms the new tuple
> to detoast changed toasted fields in the new tuple. I think if we can
> enhance this logic for our
> purpose, then we can avoid an extra deform of the new tuple.
> But I think you had earlier indicated that having untoasted unchanged
> values in  the new tuple
> can be bothersome.
>

I think it will be too costly on the subscriber side during apply
because it will update all the unchanged toasted values which will
lead to extra writes both for WAL and data.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Wed, Sep 22, 2021 at 1:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Sep 22, 2021 at 6:42 AM Ajin Cherian <itsajin@gmail.com> wrote:
> >
>
> Why do you think that the second assumption (if there is an old tuple
> it will contain all RI key fields) is broken? It seems to me even
> when we are planning to include unchanged toast as part of old_key, it
> will contain all the key columns, isn't that true?

Yes, I assumed wrongly. Just checked. What you say is correct.

>
> > I think we
> > still need to deform both the old tuple and the new tuple, just to
> > handle this case.
> >
>
> Yeah, but we are anyway talking about saving that cost for later if
> we decide to send that tuple. I think we can further try to optimize
> it by first checking whether the new tuple has any toasted value; only
> if so do we need this extra pass of deforming.

Ok, I will go ahead with this approach.

>
> > There is currently logic in ReorderBufferToastReplace() which already
> > deforms the new tuple
> > to detoast changed toasted fields in the new tuple. I think if we can
> > enhance this logic for our
> > purpose, then we can avoid an extra deform of the new tuple.
> > But I think you had earlier indicated that having untoasted unchanged
> > values in  the new tuple
> > can be bothersome.
> >
>
> I think it will be too costly on the subscriber side during apply
> because it will update all the unchanged toasted values which will
> lead to extra writes both for WAL and data.
>

Ok, agreed.

regards,
Ajin Cherian
Fujitsu Australia



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Wed, Sep 22, 2021 at 9:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Sep 22, 2021 at 6:42 AM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > On Tue, Sep 21, 2021 at 9:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Sep 21, 2021 at 4:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > I have one more question, while looking into the
> > > ExtractReplicaIdentity() function, it seems that if any of the "rep
> > > ident key" fields is changed then we will write all the key fields in
> > > the WAL as part of the old tuple, not just the changed fields.  That
> > > means either the old tuple will be NULL or it will have all the
> > > key attributes.  So if we are supporting filter only on the "rep ident
> > > key fields" then is there any need to copy the fields from the new
> > > tuple to the old tuple?
> > >
> >
> > Yes, I just figured this out while testing. So we don't need to copy fields
> > from the new tuple to the old tuple.
> >
> > But there is still the case of your fix for the unchanged toasted RI
> > key fields in the new tuple, which need to be copied from the old
> > tuple to the new tuple.

Yes, we will have to do that.

> > There is currently logic in ReorderBufferToastReplace() which already
> > deforms the new tuple
> > to detoast changed toasted fields in the new tuple. I think if we can
> > enhance this logic for our
> > purpose, then we can avoid an extra deform of the new tuple.
> > But I think you had earlier indicated that having untoasted unchanged
> > values in  the new tuple
> > can be bothersome.
> >
>
> I think it will be too costly on the subscriber side during apply
> because it will update all the unchanged toasted values which will
> lead to extra writes both for WAL and data.

Right, we should not do that.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Tomas Vondra
Date:
Hi,

I finally had time to take a closer look at the patch again, so here are 
some review comments. The thread is moving fast, so chances are some of 
the comments are obsolete or were already raised in the past.


1) I wonder if we should use WHERE or WHEN to specify the expression. 
WHERE is not wrong, but WHEN (as used in triggers) might be better.
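
For comparison, triggers already use WHEN for exactly this kind of 
row-level condition (names here are made up):

    CREATE TRIGGER log_balance_change
        AFTER UPDATE ON accounts
        FOR EACH ROW
        WHEN (OLD.balance IS DISTINCT FROM NEW.balance)
        EXECUTE FUNCTION log_change();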


2) create_publication.sgml says:

    A <literal>NULL</literal> value causes the expression to evaluate
    to false; avoid using columns without not-null constraints in the
    <literal>WHERE</literal> clause.

That's not quite correct, I think - doesn't the expression evaluate to 
NULL (which is not TRUE, so it counts as a mismatch)?

I suspect this whole paragraph (talking about NULL in old/new rows) 
might be a bit too detailed / low-level for user docs.
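
A quick illustration of the three-valued logic involved:

    SELECT (NULL::int > 10);          -- yields NULL, not false
    SELECT (NULL::int > 10) IS NULL;  -- true

The row fails to match because the result is not TRUE, which is not 
quite the same thing as the expression evaluating to false.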


3) create_subscription.sgml

     <literal>WHERE</literal> clauses, rows must satisfy all expressions
     to be copied. If the subscriber is a

I'm rather skeptical about the principle that all expressions have to 
match - I'd have expected exactly the opposite behavior, actually.

I see a subscription as "a union of all publications". Imagine for 
example you have a data set for all customers, and you create a 
publication for different parts of the world, like

   CREATE PUBLICATION customers_france
      FOR TABLE customers WHERE (country = 'France');

   CREATE PUBLICATION customers_germany
      FOR TABLE customers WHERE (country = 'Germany');

   CREATE PUBLICATION customers_usa
      FOR TABLE customers WHERE (country = 'USA');

and now you want to subscribe to multiple publications, because you want 
to replicate data for multiple countries (e.g. you want EU countries). 
But if you do

   CREATE SUBSCRIPTION customers_eu
          PUBLICATION customers_france, customers_germany;

then you won't get anything, because each customer belongs to just a 
single country. Yes, I could create multiple individual subscriptions, 
one for each country, but that's inefficient and may have a different 
set of issues (e.g. keeping them in sync when a customer moves between 
countries).

I might have missed something, but I haven't found any explanation why 
the requirement to satisfy all expressions is the right choice.

IMHO this should be 'satisfies at least one expression' i.e. we should 
connect the expressions by OR, not AND.
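
In other words, subscribing to several publications would ideally 
behave like a single publication with the filters OR-ed together, 
something like:

    CREATE PUBLICATION customers_eu_equiv FOR TABLE customers
        WHERE (country = 'France' OR country = 'Germany');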


4) pg_publication.c

It's a bit suspicious we're adding includes for parser to a place where 
there were none before. I wonder if this might indicate some layering 
issue, i.e. doing something in the wrong place ...


5) publicationcmds.c

I mentioned this in my last review [1] already, but I really dislike the 
fact that OpenTableList accepts a list containing one of two entirely 
separate node types (PublicationTable or Relation). It was modified to 
use IsA() instead of a flag, but I still find it ugly, confusing and 
possibly error-prone.

Also, not sure mentioning the two different callers explicitly in the 
OpenTableList comment is a great idea - it's likely to get stale if 
someone adds another caller.


6) parse_oper.c

I'm having some second thoughts about (not) allowing UDFs ...

Yes, I get that if the function starts failing, e.g. because querying a 
dropped table or something, that breaks the replication and can't be 
fixed without a resync.

That's pretty annoying, but maybe disallowing anything user-defined 
(functions and operators) is perhaps overly anxious? Also, extensibility 
is one of the hallmarks of Postgres, and disallowing all custom UDF and 
operators seems to contradict that ...

Perhaps just explaining what the expression can / can't do in the docs, 
with clear warnings of the risks, would be acceptable.


7) exprstate_list

I'd just call the field / variable "exprstates", without indicating the 
data type. I don't think we do that anywhere.


8) RfCol

Do we actually need this struct? Why not track just the name or attnum, 
and look up the other value in syscache when needed?


9)  rowfilter_expr_checker

    * Walk the parse-tree to decide if the row-filter is valid or not.

I don't see any clear explanation what does "valid" mean.


10) WHERE expression vs. data type

Seems ATExecAlterColumnType might need some changes, because changing a 
data type for column referenced by the expression triggers this:

   test=# alter table t alter COLUMN c type text;
   ERROR:  unexpected object depending on column: publication of
           table t in publication p


11) extra (unnecessary) parens in the deparsed expression

test=# alter publication p add table t where ((b < 100) and (c < 100));
ALTER PUBLICATION
test=# \dRp+ p
                               Publication p
  Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
-------+------------+---------+---------+---------+-----------+----------
  user  | f          | t       | t       | t       | t         | f
Tables:
     "public.t" WHERE (((b < 100) AND (c < 100)))


12) WHERE expression vs. changing replica identity

Peter Smith already mentioned this in [3], but there's a bunch of places 
that need to check the expression vs. replica identity. Consider for 
example this:

test=# alter publication p add table t where (b < 100);
ERROR:  cannot add relation "t" to publication
DETAIL:  Row filter column "b" is not part of the REPLICA IDENTITY

test=# alter table t replica identity full;
ALTER TABLE

test=# alter publication p add table t where (b < 100);
ALTER PUBLICATION

test=# alter table t replica identity using INDEX t_pkey ;
ALTER TABLE

Which means the expression is not covered by the replica identity.


12) misuse of REPLICA IDENTITY

The more I think about this, the more I think we're actually misusing 
REPLICA IDENTITY for something entirely different. The whole purpose of 
RI was to provide a row identifier for the subscriber.

But now we're using it to ensure we have all the necessary columns, 
which is entirely orthogonal to the original purpose. I predict this 
will have rather negative consequences.

People will either switch everything to REPLICA IDENTITY FULL, or create 
bogus unique indexes with extra columns. Which is really silly, because 
it wastes network bandwidth (transfers more data) or local resources 
(CPU and disk space to maintain extra indexes).

IMHO this needs more infrastructure to request extra columns to decode 
(e.g. for the filter expression), and then remove them before sending 
the data to the subscriber.


13) turning update into insert

I agree with Ajin Cherian [4] that looking at just old or new row for 
updates is not the right solution, because each option will "break" the 
replica in some case. So I think the goal "keeping the replica in sync" 
is the right perspective, and converting the update to insert/delete if 
needed seems appropriate.

This seems somewhat similar to what pglogical does, because that may 
also convert updates (although only to inserts, IIRC) when handling 
replication conflicts. The difference is pglogical does all this on the 
subscriber, while this makes the decision on the publisher.

I wonder if this might have some negative consequences, or whether 
"moving" this to downstream would be useful for other purposes in the 
future (e.g. it might be reused for handling other conflicts).


14) pgoutput_row_filter_update

The function name seems a bit misleading, as it might seem like it 
updates the row_filter, or something. It should indicate it's about 
deciding what to do with the update.


15) pgoutput_row_filter initializing filter

I'm not sure I understand why the filter initialization gets moved from 
get_rel_sync_entry. Presumably, most of what the replication does is 
replicating rows, so I see little point in not initializing this along 
with the rest of the rel_sync_entry.


regards


[1] 
https://www.postgresql.org/message-id/849ee491-bba3-c0ae-cc25-4fce1c03f105%40enterprisedb.com

[2] 
https://www.postgresql.org/message-id/7106a0fc-8017-c0fe-a407-9466c9407ff8%402ndquadrant.com

[3] 
https://www.postgresql.org/message-id/CAHut%2BPukNh_HsN1Au1p9YhG5KCOr3dH5jnwm%3DRmeX75BOtXTEg%40mail.gmail.com

[4] 
https://www.postgresql.org/message-id/CAFPTHDb7bpkuc4SxaL9B5vEvF2aEi0EOERdrG%2BxgVeAyMJsF%3DQ%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> 6) parse_oper.c
>
> I'm having some second thoughts about (not) allowing UDFs ...
>
> Yes, I get that if the function starts failing, e.g. because querying a
> dropped table or something, that breaks the replication and can't be
> fixed without a resync.
>

The other problem is that users can access/query any table inside the
function, and that also won't work in a logical decoding environment,
as we use historic snapshots, with which we can access only catalog
tables.

> That's pretty annoying, but maybe disallowing anything user-defined
> (functions and operators) is perhaps overly anxious? Also, extensibility
> is one of the hallmarks of Postgres, and disallowing all custom UDF and
> operators seems to contradict that ...
>
> Perhaps just explaining what the expression can / can't do in the docs,
> with clear warnings of the risks, would be acceptable.
>

I think the right way to support functions is by the explicit marking
of functions and in one of the emails above Jeff Davis also agreed
with the same. I think we should probably introduce a new marking for
this. I feel this is important because without this it won't be safe
to access even some of the built-in functions that can access/update
the database (non-immutable functions) due to logical decoding environment
restrictions.

>
> 12) misuse of REPLICA IDENTITY
>
> The more I think about this, the more I think we're actually misusing
> REPLICA IDENTITY for something entirely different. The whole purpose of
> RI was to provide a row identifier for the subscriber.
>
> But now we're using it to ensure we have all the necessary columns,
> which is entirely orthogonal to the original purpose. I predict this
> will have rather negative consequences.
>
> People will either switch everything to REPLICA IDENTITY FULL, or create
> bogus unique indexes with extra columns. Which is really silly, because
> it wastes network bandwidth (transfers more data) or local resources
> (CPU and disk space to maintain extra indexes).
>
> IMHO this needs more infrastructure to request extra columns to decode
> (e.g. for the filter expression), and then remove them before sending
> the data to the subscriber.
>

Yeah, but that would have an additional load on write operations and I
am not sure at this stage but maybe there could be other ways to
extend the current infrastructure wherein we build the snapshots using
which we can access the user tables instead of only catalog tables.
Such enhancements if feasible would be useful not only for allowing
additional column access in row filters but for other purposes like
allowing access to functions that access user tables. I feel we can
extend this later as well seeing the usage and requests. For the first
version, this doesn't sound too limiting to me.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Fri, Sep 24, 2021 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

> > 12) misuse of REPLICA IDENTITY
> >
> > The more I think about this, the more I think we're actually misusing
> > REPLICA IDENTITY for something entirely different. The whole purpose of
> > RI was to provide a row identifier for the subscriber.
> >
> > But now we're using it to ensure we have all the necessary columns,
> > which is entirely orthogonal to the original purpose. I predict this
> > will have rather negative consequences.
> >
> > People will either switch everything to REPLICA IDENTITY FULL, or create
> > bogus unique indexes with extra columns. Which is really silly, because
> > it wastes network bandwidth (transfers more data) or local resources
> > (CPU and disk space to maintain extra indexes).
> >
> > IMHO this needs more infrastructure to request extra columns to decode
> > (e.g. for the filter expression), and then remove them before sending
> > the data to the subscriber.
> >
>
> Yeah, but that would have an additional load on write operations and I
> am not sure at this stage but maybe there could be other ways to
> extend the current infrastructure wherein we build the snapshots using
> which we can access the user tables instead of only catalog tables.
> Such enhancements if feasible would be useful not only for allowing
> additional column access in row filters but for other purposes like
> allowing access to functions that access user tables. I feel we can
> extend this later as well seeing the usage and requests. For the first
> version, this doesn't sound too limiting to me.

I agree with one point from Tomas: if we bind the row filter to
the RI, then if the user wants to use the row filter on any column 1)
they have to add an unnecessary column to the index 2) since they have
to add it to the RI, we will now have to send it over the network as
well 3) we anyway have to WAL-log it if it is modified, because we
forced users to add some columns to the RI just because they wanted to
use the row filter on them.  Now suppose we remove that limitation and
somehow make these changes orthogonal to the RI, i.e. if we have a row
filter on some column then we WAL-log it; now the only extra cost
we are paying is to WAL-log that column, but the user is not
forced to add it to an index, nor forced to send it over the network.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> 13) turning update into insert
>
> I agree with Ajin Cherian [4] that looking at just old or new row for
> updates is not the right solution, because each option will "break" the
> replica in some case. So I think the goal "keeping the replica in sync"
> is the right perspective, and converting the update to insert/delete if
> needed seems appropriate.
>
> This seems somewhat similar to what pglogical does, because that may
> also convert updates (although only to inserts, IIRC) when handling
> replication conflicts. The difference is pglogical does all this on the
> subscriber, while this makes the decision on the publisher.
>
> I wonder if this might have some negative consequences, or whether
> "moving" this to downstream would be useful for other purposes in the
> future (e.g. it might be reused for handling other conflicts).
>

Apart from additional traffic, I am not sure how we will handle all
the conditions on subscribers: say the new row doesn't match, how
will subscribers know about this unless we pass the row_filter or some
additional information along with the tuple? Previously, I have done some
research and shared in one of the emails above that IBM's InfoSphere
Data Replication [1] performs filtering in this way which also
suggests that we won't be off here.

>
>
> 15) pgoutput_row_filter initializing filter
>
> I'm not sure I understand why the filter initialization gets moved from
> get_rel_sync_entry. Presumably, most of what the replication does is
> replicating rows, so I see little point in not initializing this along
> with the rest of the rel_sync_entry.
>

Sorry, IIRC, this has been suggested by me and I thought it was best
to do any expensive computation the first time it is required. I have
shared a few cases, like in [2], where it would lead to additional cost
without any gain. Unless I am missing something, I don't see any
downside of doing it in a delayed fashion.

[1] - https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-search-conditions
[2] - https://www.postgresql.org/message-id/CAA4eK1JBHo2U2sZemFdJmcwEinByiJVii8wzGCDVMxOLYB3CUw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Sep 24, 2021 at 11:06 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Sep 24, 2021 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > 12) misuse of REPLICA IDENTITY
> > >
> > > The more I think about this, the more I think we're actually misusing
> > > REPLICA IDENTITY for something entirely different. The whole purpose of
> > > RI was to provide a row identifier for the subscriber.
> > >
> > > But now we're using it to ensure we have all the necessary columns,
> > > which is entirely orthogonal to the original purpose. I predict this
> > > will have rather negative consequences.
> > >
> > > People will either switch everything to REPLICA IDENTITY FULL, or create
> > > bogus unique indexes with extra columns. Which is really silly, because
> > > it wastes network bandwidth (transfers more data) or local resources
> > > (CPU and disk space to maintain extra indexes).
> > >
> > > IMHO this needs more infrastructure to request extra columns to decode
> > > (e.g. for the filter expression), and then remove them before sending
> > > the data to the subscriber.
> > >
> >
> > Yeah, but that would have an additional load on write operations and I
> > am not sure at this stage but maybe there could be other ways to
> > extend the current infrastructure wherein we build the snapshots using
> > which we can access the user tables instead of only catalog tables.
> > Such enhancements if feasible would be useful not only for allowing
> > additional column access in row filters but for other purposes like
> > allowing access to functions that access user tables. I feel we can
> > extend this later as well seeing the usage and requests. For the first
> > version, this doesn't sound too limiting to me.
>
> I agree with one point from Tomas: if we bind the row filter to
> the RI, then if the user wants to use the row filter on any column 1)
> they have to add an unnecessary column to the index 2) since they have
> to add it to the RI, we will now have to send it over the network as
> well 3) we anyway have to WAL-log it if it is modified, because we
> forced users to add some columns to the RI just because they wanted to
> use the row filter on them.  Now suppose we remove that limitation and
> somehow make these changes orthogonal to the RI, i.e. if we have a row
> filter on some column then we WAL-log it; now the only extra cost
> we are paying is to WAL-log that column, but the user is not
> forced to add it to an index, nor forced to send it over the network.
>

I am not suggesting adding additional columns to RI just for using
filter expressions. If most users that intend to publish delete/update
wanted to use filter conditions apart from replica identity then we
can later extend this functionality, but I am not sure if the only way to
accomplish that is to log additional data in WAL. I am just trying to
see if we can provide meaningful functionality without extending too
much the scope of this work.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Sep 24, 2021 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Sep 24, 2021 at 11:06 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Sep 24, 2021 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > 12) misuse of REPLICA IDENTITY
> > > >
> > > > The more I think about this, the more I think we're actually misusing
> > > > REPLICA IDENTITY for something entirely different. The whole purpose of
> > > > RI was to provide a row identifier for the subscriber.
> > > >
> > > > But now we're using it to ensure we have all the necessary columns,
> > > > which is entirely orthogonal to the original purpose. I predict this
> > > > will have rather negative consequences.
> > > >
> > > > People will either switch everything to REPLICA IDENTITY FULL, or create
> > > > bogus unique indexes with extra columns. Which is really silly, because
> > > > it wastes network bandwidth (transfers more data) or local resources
> > > > (CPU and disk space to maintain extra indexes).
> > > >
> > > > IMHO this needs more infrastructure to request extra columns to decode
> > > > (e.g. for the filter expression), and then remove them before sending
> > > > the data to the subscriber.
> > > >
> > >
> > > Yeah, but that would have an additional load on write operations and I
> > > am not sure at this stage but maybe there could be other ways to
> > > extend the current infrastructure wherein we build the snapshots using
> > > which we can access the user tables instead of only catalog tables.
> > > Such enhancements if feasible would be useful not only for allowing
> > > additional column access in row filters but for other purposes like
> > > allowing access to functions that access user tables. I feel we can
> > > extend this later as well seeing the usage and requests. For the first
> > > version, this doesn't sound too limiting to me.
> >
> > I agree with one point from Tomas: if we bind the row filter to
> > the RI, then if the user wants to use the row filter on any column 1)
> > they have to add an unnecessary column to the index 2) since they have
> > to add it to the RI, we will now have to send it over the network as
> > well 3) we anyway have to WAL-log it if it is modified, because we
> > forced users to add some columns to the RI just because they wanted to
> > use the row filter on them.  Now suppose we remove that limitation and
> > somehow make these changes orthogonal to the RI, i.e. if we have a row
> > filter on some column then we WAL-log it; now the only extra cost
> > we are paying is to WAL-log that column, but the user is not
> > forced to add it to an index, nor forced to send it over the network.
> >
>
> I am not suggesting adding additional columns to RI just for using
> filter expressions. If most users that intend to publish delete/update
> wanted to use filter conditions apart from replica identity then we
> can later extend this functionality, but I am not sure if the only way to
> accomplish that is to log additional data in WAL.
>

One possibility in this regard could be that we enhance Replica
Identity .. Include (column_list), where all the columns in the include
list won't be sent, but I think it is better to postpone such
enhancements to a later version. Like I suggested above, we might
want to extend our infrastructure in a way where not only this extra
columns request can be accomplished but we are also able to allow
UDFs (where user tables can be accessed) and probably sub-queries as
well.
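
To illustrate what I have in mind (purely hypothetical syntax, nothing
like this exists today):

    ALTER TABLE customers REPLICA IDENTITY USING INDEX customers_pkey
        INCLUDE (country);
    -- 'country' would be WAL-logged and available for evaluating row
    -- filters, but stripped before the tuple is sent downstream.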

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Fri, Sep 24, 2021 at 12:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> One possibility in this regard could be that we enhance Replica
> Identity .. Include (column_list), where all the columns in the include
> list won't be sent

Instead of RI's include column list, why can we not think of a
row_filter column list?  I mean, like we log the old RI columns, can't
we do something similar for the row filter columns?  With that, we
don't have to log all the columns; instead we only log the columns
which are in the row filter. Or is this too hard to identify during a
write operation?  So now the WAL logging requirements for the RI and
the row filter are orthogonal, and if some columns are common then we
can log them only once?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Sep 24, 2021 at 12:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Sep 24, 2021 at 12:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > One possibility in this regard could be that we enhance Replica
> > Identity .. Include (column_list), where all the columns in the include
> > list won't be sent
>
> Instead of RI's include column list, why can we not think of a
> row_filter column list?  I mean, like we log the old RI columns, can't
> we do something similar for the row filter columns?  With that, we
> don't have to log all the columns; instead we only log the columns
> which are in the row filter. Or is this too hard to identify during a
> write operation?
>

Yeah, we can do that as well, but my guess is that will have some
additional work (to find common columns and log them only once) in
heap_delete/update and then probably during decoding (to assemble the
required filter and RI key). I am not very sure on this point; one has
to write the code and test it.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Tomas Vondra
Date:

On 9/24/21 8:09 AM, Amit Kapila wrote:
> On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> 13) turning update into insert
>>
>> I agree with Ajin Cherian [4] that looking at just old or new row for
>> updates is not the right solution, because each option will "break" the
>> replica in some case. So I think the goal "keeping the replica in sync"
>> is the right perspective, and converting the update to insert/delete if
>> needed seems appropriate.
>>
>> This seems somewhat similar to what pglogical does, because that may
>> also convert updates (although only to inserts, IIRC) when handling
>> replication conflicts. The difference is pglogical does all this on the
>> subscriber, while this makes the decision on the publisher.
>>
>> I wonder if this might have some negative consequences, or whether
>> "moving" this to downstream would be useful for other purposes in the
>> future (e.g. it might be reused for handling other conflicts).
>>
> 
> Apart from additional traffic, I am not sure how we will handle all
> the conditions on subscribers: say the new row doesn't match, how
> will subscribers know about this unless we pass the row_filter or some
> additional information along with the tuple? Previously, I have done some
> research and shared in one of the emails above that IBM's InfoSphere
> Data Replication [1] performs filtering in this way which also
> suggests that we won't be off here.
> 

I'm certainly not suggesting what we're doing is wrong. Given the design 
of built-in logical replication it makes sense to do it this way; I was 
just thinking aloud about what we might want to do in the future (e.g. 
pglogical uses this to deal with conflicts between multiple sources, and 
so on).

>>
>>
>> 15) pgoutput_row_filter initializing filter
>>
>> I'm not sure I understand why the filter initialization gets moved from
>> get_rel_sync_entry. Presumably, most of what the replication does is
>> replicating rows, so I see little point in not initializing this along
>> with the rest of the rel_sync_entry.
>>
> 
> Sorry, IIRC, this has been suggested by me and I thought it was best
> to do any expensive computation the first time it is required. I have
> shared a few cases, like in [2], where it would lead to additional cost
> without any gain. Unless I am missing something, I don't see any
> downside of doing it in a delayed fashion.
> 

Not sure, but the arguments presented there seem a bit wonky ...

Yes, the work would be wasted if we discard the cached data without 
using it (it might happen for truncate, I'm not sure). But how likely is 
it that such operations happen *in isolation*? I'd bet the workload is 
almost never just a stream of truncates - there are always some 
operations in between that would actually use this.

Similarly for the errors - IIRC hitting an error means the replication 
restarts, which is orders of magnitude more expensive than anything we 
can save by this delayed evaluation.

I'd keep it simple, for the sake of simplicity of the whole patch.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Tomas Vondra
Date:

On 9/24/21 7:20 AM, Amit Kapila wrote:
> On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> 6) parse_oper.c
>>
>> I'm having some second thoughts about (not) allowing UDFs ...
>>
>> Yes, I get that if the function starts failing, e.g. because querying a
>> dropped table or something, that breaks the replication and can't be
>> fixed without a resync.
>>
> 
> The other problem is that users can access/query any table inside the
> function, and that also won't work in a logical decoding environment,
> as we use historic snapshots, with which we can access only catalog
> tables.
> 

True. I always forget about some of these annoying issues. Let's 
document all of this in some comment / README. I see we still don't have

   src/backend/replication/logical/README

which is a bit surprising, considering how complex this code is.

>> That's pretty annoying, but maybe disallowing anything user-defined
>> (functions and operators) is perhaps overly anxious? Also, extensibility
>> is one of the hallmarks of Postgres, and disallowing all custom UDF and
>> operators seems to contradict that ...
>>
>> Perhaps just explaining what the expression can / can't do in the docs,
>> with clear warnings of the risks, would be acceptable.
>>
> 
> I think the right way to support functions is by the explicit marking
> of functions and in one of the emails above Jeff Davis also agreed
> with the same. I think we should probably introduce a new marking for
> this. I feel this is important because without this it won't be safe
> to access even some of the built-in functions that can access/update
> the database (non-immutable functions) due to logical decoding environment
> restrictions.
> 

I agree that seems reasonable. Is there any reason not to just use 
IMMUTABLE for this purpose? Seems like a good match to me.

Yes, the user can lie and label something that is not really IMMUTABLE, 
but that's his fault. Yes, it's harder to fix than e.g. for indexes.
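
To illustrate, with such marking a filter like the one below (made-up 
names) could be allowed, because the function is declared IMMUTABLE and 
can't touch any table:

    CREATE FUNCTION normalize_country(c text) RETURNS text
        IMMUTABLE LANGUAGE sql
        AS $$ SELECT upper(trim(c)) $$;

    CREATE PUBLICATION customers_de FOR TABLE customers
        WHERE (normalize_country(country) = 'GERMANY');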

>>
>> 12) misuse of REPLICA IDENTITY
>>
>> The more I think about this, the more I think we're actually misusing
>> REPLICA IDENTITY for something entirely different. The whole purpose of
>> RI was to provide a row identifier for the subscriber.
>>
>> But now we're using it to ensure we have all the necessary columns,
>> which is entirely orthogonal to the original purpose. I predict this
>> will have rather negative consequences.
>>
>> People will either switch everything to REPLICA IDENTITY FULL, or create
>> bogus unique indexes with extra columns. Which is really silly, because
>> it wastes network bandwidth (transfers more data) or local resources
>> (CPU and disk space to maintain extra indexes).
>>
>> IMHO this needs more infrastructure to request extra columns to decode
>> (e.g. for the filter expression), and then remove them before sending
>> the data to the subscriber.
>>
> 
> Yeah, but that would have an additional load on write operations and I
> am not sure at this stage but maybe there could be other ways to
> extend the current infrastructure wherein we build the snapshots using
> which we can access the user tables instead of only catalog tables.
> Such enhancements if feasible would be useful not only for allowing
> additional column access in row filters but for other purposes like
> allowing access to functions that access user tables. I feel we can
> extend this later as well seeing the usage and requests. For the first
> version, this doesn't sound too limiting to me.
> 

I'm not really buying the argument that this means overhead for write 
operations. Well, it does, but the current RI approach is forcing users 
to either use RIF or add an index covering the filter attributes. 
Neither of those options is free, and I'd bet the extra overhead of 
adding just the row filter columns would be actually lower.

If the argument is merely to limit the scope of this patch, fine. But 
I'd bet the amount of code we'd have to add to ExtractReplicaIdentity 
(or maybe somewhere close to it) would be fairly small. We'd need to 
cache which columns are needed (like RelationGetIndexAttrBitmap), and 
this might be a bit more complex, due to having to consider all the 
publications etc.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Sat, Sep 25, 2021 at 3:30 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 9/24/21 7:20 AM, Amit Kapila wrote:
> >
> > I think the right way to support functions is by the explicit marking
> > of functions and in one of the emails above Jeff Davis also agreed
> > with the same. I think we should probably introduce a new marking for
> > this. I feel this is important because without this it won't be safe
> > to access even some of the built-in functions that can access/update
> > the database (non-immutable functions) due to logical decoding environment
> > restrictions.
> >
>
> I agree that seems reasonable. Is there any reason not to just use
> IMMUTABLE for this purpose? Seems like a good match to me.
>

It will just solve one part of the puzzle (related to database access),
but it is better to avoid the risk of broken replication by explicit
marking, especially for UDFs or other user-defined objects. You seem to
be okay documenting such a risk, but I am not sure we have an agreement
on that, especially because that was one of the key points of
discussion in this thread and various people said that we need to do
something about it. I personally feel we should do something if we
want to allow user-defined functions or operators, because this problem
has been reported multiple times in the thread. I think
we can go ahead with IMMUTABLE built-ins for the first version and
then allow UDFs later, or let's try to find a way for explicit marking.

> Yes, the user can lie and label something that is not really IMMUTABLE,
> but that's his fault. Yes, it's harder to fix than e.g. for indexes.
>

Agreed and I think we can't do anything about this.

> >>
> >> 12) misuse of REPLICA IDENTITY
> >>
> >> The more I think about this, the more I think we're actually misusing
> >> REPLICA IDENTITY for something entirely different. The whole purpose of
> >> RI was to provide a row identifier for the subscriber.
> >>
> >> But now we're using it to ensure we have all the necessary columns,
> >> which is entirely orthogonal to the original purpose. I predict this
> >> will have rather negative consequences.
> >>
> >> People will either switch everything to REPLICA IDENTITY FULL, or create
> >> bogus unique indexes with extra columns. Which is really silly, because
> >> it wastes network bandwidth (transfers more data) or local resources
> >> (CPU and disk space to maintain extra indexes).
> >>
> >> IMHO this needs more infrastructure to request extra columns to decode
> >> (e.g. for the filter expression), and then remove them before sending
> >> the data to the subscriber.
> >>
> >
> > Yeah, but that would have an additional load on write operations and I
> > am not sure at this stage but maybe there could be other ways to
> > extend the current infrastructure wherein we build the snapshots using
> > which we can access the user tables instead of only catalog tables.
> > Such enhancements if feasible would be useful not only for allowing
> > additional column access in row filters but for other purposes like
> > allowing access to functions that access user tables. I feel we can
> > extend this later as well seeing the usage and requests. For the first
> > version, this doesn't sound too limiting to me.
> >
>
> I'm not really buying the argument that this means overhead for write
> operations. Well, it does, but the current RI approach is forcing users
> to either use RIF or add an index covering the filter attributes.
> Neither of those options is free, and I'd bet the extra overhead of
> adding just the row filter columns would be actually lower.
>
> If the argument is merely to limit the scope of this patch, fine.
>

Yeah, that is one and I am not sure that adding extra WAL is the best
or only solution for this problem. As mentioned in my previous
response, I think we eventually need to find a way to access user
tables to support UDFs (that access database) or sub-query which other
databases already support, and for that, we might need to enhance the
current snapshot mechanism after which we might not need any
additional WAL even for additional columns in row filter. I don't
think anyone of us has evaluated in detail the different ways this
problem can be solved and the pros/cons of each approach, so limiting
the scope for this purpose doesn't seem like a bad idea to me.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Sat, Sep 25, 2021 at 3:07 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 9/24/21 8:09 AM, Amit Kapila wrote:
> > On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> 13) turning update into insert
> >>
> >> I agree with Ajin Cherian [4] that looking at just old or new row for
> >> updates is not the right solution, because each option will "break" the
> >> replica in some case. So I think the goal "keeping the replica in sync"
> >> is the right perspective, and converting the update to insert/delete if
> >> needed seems appropriate.
> >>
> >> This seems a somewhat similar to what pglogical does, because that may
> >> also convert updates (although only to inserts, IIRC) when handling
> >> replication conflicts. The difference is pglogical does all this on the
> >> subscriber, while this makes the decision on the publisher.
> >>
> >> I wonder if this might have some negative consequences, or whether
> >> "moving" this to downstream would be useful for other purposes in the
> >> fuure (e.g. it might be reused for handling other conflicts).
> >>
> >
> > Apart from additional traffic, I am not sure how will we handle all
> > the conditions on subscribers, say if the new row doesn't match, how
> > will subscribers know about this unless we pass row_filter or some
> > additional information along with tuple. Previously, I have done some
> > research and shared in one of the emails above that IBM's InfoSphere
> > Data Replication [1] performs filtering in this way which also
> > suggests that we won't be off here.
> >
>
> I'm certainly not suggesting what we're doing is wrong. Given the design
> of built-in logical replication it makes sense doing it this way, I was
> just thinking aloud about what we might want to do in the future (e.g.
> pglogical uses this to deal with conflicts between multiple sources, and
> so on).
>

Fair enough.

> >>
> >>
> >> 15) pgoutput_row_filter initializing filter
> >>
> >> I'm not sure I understand why the filter initialization gets moved from
> >> get_rel_sync_entry. Presumably, most of what the replication does is
> >> replicating rows, so I see little point in not initializing this along
> >> with the rest of the rel_sync_entry.
> >>
> >
> > Sorry, IIRC, this has been suggested by me and I thought it was best
> > to do any expensive computation the first time it is required. I have
> > shared few cases like in [2] where it would lead to additional cost
> > without any gain. Unless I am missing something, I don't see any
> > downside of doing it in a delayed fashion.
> >
>
> Not sure, but the arguments presented there seem a bit wonky ...
>
> Yes, the work would be wasted if we discard the cached data without
> using it (it might happen for truncate, I'm not sure). But how likely is
> it that such operations happen *in isolation*? I'd bet the workload is
> almost never just a stream of truncates - there are always some
> operations in between that would actually use this.
>

It could also happen with a mix of truncate and other operations as we
decide whether to publish an operation or not after
get_rel_sync_entry.

> Similarly for the errors - IIRC hitting an error means the replication
> restarts, which is orders of magnitude more expensive than anything we
> can save by this delayed evaluation.
>
> I'd keep it simple, for the sake of simplicity of the whole patch.
>

The current version proposed by Peter is not reviewed yet, and by
looking at it I have some questions too, which I'll clarify in a
separate email. I am not sure whether you are against delaying the
expression initialization because of the current code or against the
concept in general, because if it is the latter then we have other
instances as well where we don't do all the work in
get_rel_sync_entry, such as building the tuple conversion map, which
is cached as well.
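
To be clear about the pattern being debated, here is a rough sketch (the
entry fields shown are illustrative; pgoutput_row_filter_init_expr is the
patch's helper):

    /* in pgoutput_row_filter(), the first time a row reaches the filter */
    if (!entry->exprstate_valid)
    {
        MemoryContext oldctx = MemoryContextSwitchTo(CacheMemoryContext);

        entry->exprstate = pgoutput_row_filter_init_expr(entry->rfnode);
        entry->exprstate_valid = true;

        MemoryContextSwitchTo(oldctx);
    }
    /* ... evaluate entry->exprstate against the current tuple ... */

A truncate-only or error-only stream never reaches this block, which is the
saving being argued for; a normal DML stream pays the cost once either way.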

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Sep 20, 2021 at 3:17 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> I have not changed any of the first 5 patches, just added my patch 006
> at the end. Do let me know of any comments on this approach.
>

I have a question regarding v29-0003-PS-ExprState-cache-modifications.
In pgoutput_row_filter, for row_filter, we are traversing ancestors of
a partition to find pub_relid but isn't that already available in
RelationSyncEntry as publish_as_relid?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Tomas Vondra
Date:

On 9/25/21 6:23 AM, Amit Kapila wrote:
> On Sat, Sep 25, 2021 at 3:30 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 9/24/21 7:20 AM, Amit Kapila wrote:
>>>
>>> I think the right way to support functions is by the explicit marking
>>> of functions and in one of the emails above Jeff Davis also agreed
>>> with the same. I think we should probably introduce a new marking for
>>> this. I feel this is important because without this it won't be safe
>>> to access even some of the built-in functions that can access/update
>>> database (non-immutable functions) due to logical decoding environment
>>> restrictions.
>>>
>>
>> I agree that seems reasonable. Is there any reason why not to just use
>> IMMUTABLE for this purpose? Seems like a good match to me.
>>
> 
> It will just solve one part of the puzzle (related to database access)
> but it is better to avoid the risk of broken replication by explicit
> marking especially for UDFs or other user-defined objects. You seem to
> be okay documenting such risk but I am not sure we have an agreement
> on that especially because that was one of the key points of
> discussions in this thread and various people told that we need to do
> something about it. I personally feel we should do something if we
> want to allow user-defined functions or operators because as reported
> in the thread this problem has been reported multiple times. I think
> we can go ahead with IMMUTABLE built-ins for the first version and
> then allow UDFs later or let's try to find a way for explicit marking.
> 

Well, I know multiple people mentioned that issue. And I certainly agree 
just documenting the risk would not be an ideal solution. Requiring the 
functions to be labeled helps, but we've seen people marking volatile 
functions as immutable in order to allow indexing, so we'll have to 
document the risks anyway.

All I'm saying is that allowing built-in functions/operators but not 
user-defined variants seems like an annoying break of extensibility. 
People are used to user-defined stuff being usable just like built-in 
functions and operators.

>> Yes, the user can lie and label something that is not really IMMUTABLE,
>> but that's his fault. Yes, it's harder to fix than e.g. for indexes.
>>
> 
> Agreed and I think we can't do anything about this.
> 
>>>>
>>>> 12) misuse of REPLICA IDENTITY
>>>>
>>>> The more I think about this, the more I think we're actually misusing
>>>> REPLICA IDENTITY for something entirely different. The whole purpose of
>>>> RI was to provide a row identifier for the subscriber.
>>>>
>>>> But now we're using it to ensure we have all the necessary columns,
>>>> which is entirely orthogonal to the original purpose. I predict this
>>>> will have rather negative consequences.
>>>>
>>>> People will either switch everything to REPLICA IDENTITY FULL, or create
>>>> bogus unique indexes with extra columns. Which is really silly, because
>>>> it wastes network bandwidth (transfers more data) or local resources
>>>> (CPU and disk space to maintain extra indexes).
>>>>
>>>> IMHO this needs more infrastructure to request extra columns to decode
>>>> (e.g. for the filter expression), and then remove them before sending
>>>> the data to the subscriber.
>>>>
>>>
>>> Yeah, but that would have an additional load on write operations and I
>>> am not sure at this stage but maybe there could be other ways to
>>> extend the current infrastructure wherein we build the snapshots using
>>> which we can access the user tables instead of only catalog tables.
>>> Such enhancements if feasible would be useful not only for allowing
>>> additional column access in row filters but for other purposes like
>>> allowing access to functions that access user tables. I feel we can
>>> extend this later as well seeing the usage and requests. For the first
>>> version, this doesn't sound too limiting to me.
>>>
>>
>> I'm not really buying the argument that this means overhead for write
>> operations. Well, it does, but the current RI approach is forcing users
>> to either use RIF or add an index covering the filter attributes.
>> Neither of those options is free, and I'd bet the extra overhead of
>> adding just the row filter columns would be actually lower.
>>
>> If the argument is merely to limit the scope of this patch, fine.
>>
> 
> Yeah, that is one and I am not sure that adding extra WAL is the best
> or only solution for this problem. As mentioned in my previous
> response, I think we eventually need to find a way to access user
> tables to support UDFs (that access database) or sub-query which other
> databases already support, and for that, we might need to enhance the
> current snapshot mechanism after which we might not need any
> additional WAL even for additional columns in row filter. I don't
> think anyone of us has evaluated in detail the different ways this
> problem can be solved and the pros/cons of each approach, so limiting
> the scope for this purpose doesn't seem like a bad idea to me.
> 

Understood. I don't have a very good idea which of those options is the 
best one either, although I think enhancing the snapshot mechanism would 
be rather tricky.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Sat, Sep 25, 2021 at 3:36 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 9/25/21 6:23 AM, Amit Kapila wrote:
> > On Sat, Sep 25, 2021 at 3:30 AM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> On 9/24/21 7:20 AM, Amit Kapila wrote:
> >>>
> >>> I think the right way to support functions is by the explicit marking
> >>> of functions and in one of the emails above Jeff Davis also agreed
> >>> with the same. I think we should probably introduce a new marking for
> >>> this. I feel this is important because without this it won't be safe
> >>> to access even some of the built-in functions that can access/update
> >>> database (non-immutable functions) due to logical decoding environment
> >>> restrictions.
> >>>
> >>
> >> I agree that seems reasonable. Is there any reason why not to just use
> >> IMMUTABLE for this purpose? Seems like a good match to me.
> >>
> >
> > It will just solve one part of the puzzle (related to database access)
> > but it is better to avoid the risk of broken replication by explicit
> > marking especially for UDFs or other user-defined objects. You seem to
> > be okay documenting such risk but I am not sure we have an agreement
> > on that especially because that was one of the key points of
> > discussions in this thread and various people told that we need to do
> > something about it. I personally feel we should do something if we
> > want to allow user-defined functions or operators because as reported
> > in the thread this problem has been reported multiple times. I think
> > we can go ahead with IMMUTABLE built-ins for the first version and
> > then allow UDFs later or let's try to find a way for explicit marking.
> >
>
> Well, I know multiple people mentioned that issue. And I certainly agree
> just documenting the risk would not be an ideal solution. Requiring the
> functions to be labeled helps, but we've seen people marking volatile
> functions as immutable in order to allow indexing, so we'll have to
> document the risks anyway.
>
> All I'm saying is that allowing built-in functions/operators but not
> user-defined variants seems like an annoying break of extensibility.
> People are used to user-defined stuff being usable just like built-in
> functions and operators.
>

I agree with you that allowing UDFs in some way would be good for this
feature. I think once we get the base feature committed then we can
discuss whether and how to allow UDFs. Do we want to have an
additional label for it or can we come up with something which allows
the user to continue replication even if she has dropped the object
used in the function? It seems like we can limit the scope of base
patch functionality to allow the use of immutable built-in functions
in row filter expressions.
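
For example, under that rule (the table and filters below are invented for
illustration):

    -- allowed: comparison operators on built-in types are immutable
    CREATE PUBLICATION pub_big_orders
        FOR TABLE orders WHERE (amount > 1000);

    -- would be rejected: now() is STABLE, not IMMUTABLE
    CREATE PUBLICATION pub_recent_orders
        FOR TABLE orders WHERE (ordered_at > now() - interval '1 day');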

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Tomas Vondra
Date:
Hi,

I see no one responded to this important part of my review so far:

On 9/23/21 2:33 PM, Tomas Vondra wrote:
> 3) create_subscription.sgml
> 
>      <literal>WHERE</literal> clauses, rows must satisfy all expressions
>      to be copied. If the subscriber is a
> 
> I'm rather skeptical about the principle that all expressions have to 
> match - I'd have expected exactly the opposite behavior, actually.
> 
> I see a subscription as "a union of all publications". Imagine for 
> example you have a data set for all customers, and you create a 
> publication for different parts of the world, like
> 
>    CREATE PUBLICATION customers_france
>       FOR TABLE customers WHERE (country = 'France');
> 
>    CREATE PUBLICATION customers_germany
>       FOR TABLE customers WHERE (country = 'Germany');
> 
>    CREATE PUBLICATION customers_usa
>       FOR TABLE customers WHERE (country = 'USA');
> 
> and now you want to subscribe to multiple publications, because you want 
> to replicate data for multiple countries (e.g. you want EU countries). 
> But if you do
> 
>    CREATE SUBSCRIPTION customers_eu
>           PUBLICATION customers_france, customers_germany;
> 
> then you won't get anything, because each customer belongs to just a 
> single country. Yes, I could create multiple individual subscriptions, 
> one for each country, but that's inefficient and may have a different 
> set of issues (e.g. keeping them in sync when a customer moves between 
> countries).
> 
> I might have missed something, but I haven't found any explanation why 
> the requirement to satisfy all expressions is the right choice.
> 
> IMHO this should be 'satisfies at least one expression' i.e. we should 
> connect the expressions by OR, not AND.

Am I the only one finding the current behavior strange? What's the 
reasoning supporting the current approach?
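
To spell out the difference with a concrete (made-up) subscriber setup,
using the publications from the quoted example:

    -- on the subscriber; the connection string is invented
    CREATE SUBSCRIPTION customers_eu
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION customers_france, customers_germany;

    -- OR semantics: receives rows WHERE (country = 'France')
    --                                OR (country = 'Germany')
    -- AND semantics (current patch): receives nothing, because no
    -- single row can satisfy both country filters at once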


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Mon, Sep 27, 2021, at 10:34 AM, Tomas Vondra wrote:
> Hi,
>
> I see no one responded to this important part of my review so far:

I'm still preparing a new patch and a summary.

> Am I the only one finding the current behavior strange? What's the
> reasoning supporting the current approach?

I think it is an oversight from my side. It used to work the way you mentioned
but I changed it. I'll include this change in the next patch.


--
Euler Taveira

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Sep 27, 2021 at 7:19 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Sep 27, 2021, at 10:34 AM, Tomas Vondra wrote:
>
> Hi,
>
> I see no one responded to this important part of my review so far:
>
> I'm still preparing a new patch and a summary.
>
> Am I the only one finding the current behavior strange? What's the
> reasoning supporting the current approach?
>
> I think it is an oversight from my side. It used to work the way you mentioned
> but I changed it. I'll include this change in the next patch.
>

+1.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Wed, Sep 22, 2021 at 2:05 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Wed, Sep 22, 2021 at 1:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Sep 22, 2021 at 6:42 AM Ajin Cherian <itsajin@gmail.com> wrote:
> > >
> >
> > Why do you think that the second assumption (if there is an old tuple
> > it will contain all RI key fields.) is broken? It seems to me even
> > when we are planning to include unchanged toast as part of old_key, it
> > will contain all the key columns, isn't that true?
>
> Yes, I assumed wrongly. Just checked. What you say is correct.
>
> >
> > > I think we
> > > still need to deform both old tuple and new tuple, just to handle this case.
> > >
> >
> > Yeah, but we will anyway talking about saving that cost for later if
> > we decide to send that tuple. I think we can further try to optimize
> > it by first checking whether the new tuple has any toasted value, if
> > so then only we need this extra pass of deforming.
>
> Ok, I will go ahead with this approach.
>
> >
> > > There is currently logic in ReorderBufferToastReplace() which already
> > > deforms the new tuple
> > > to detoast changed toasted fields in the new tuple. I think if we can
> > > enhance this logic for our
> > > purpose, then we can avoid an extra deform of the new tuple.
> > > But I think you had earlier indicated that having untoasted unchanged
> > > values in  the new tuple
> > > can be bothersome.
> > >
> >
> > I think it will be too costly on the subscriber side during apply
> > because it will update all the unchanged toasted values which will
> > lead to extra writes both for WAL and data.
> >

Based on the discussion above, I've added two more slot pointers in
the RelationSyncEntry structure to store tuples that have been
deformed. Once a tuple (old or new) is deformed, it is stored in the
structure, where it can be retrieved while writing to the stream. I
have also changed the logic so that the old tuple is not populated
since, as Dilip pointed out, it will have all the RI columns if it is
changed.
I've added two new APIs in proto.c for writing a cached tuple and
writing a cached update. These are called if the slots contain
previously deformed tuples.

I have for now also rebased the patch and merged the first 5 patches
into 1, and added my changes for the above into the second patch.

regards,
Ajin Cherian
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Sat, Oct 2, 2021 at 5:44 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> I have for now also rebased the patch and merged the first 5 patches
> into 1, and added my changes for the above into the second patch.

I have split the patches back again, just to be consistent with the
original state of the patches. Sorry for the inconvenience.

regards,
Ajin Cherian
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Wed, Oct 6, 2021 at 2:33 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Sat, Oct 2, 2021 at 5:44 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > I have for now also rebased the patch and merged the first 5 patches
> > into 1, and added my changes for the above into the second patch.
>
> I have split the patches back again, just to be consistent with the
> original state of the patches. Sorry for the inconvenience.

Thanks for the updated version of the patch, I was looking into the
latest version and I have a few comments.


+        if ((att->attlen == -1 &&
+             VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i])) &&
+                (!old_slot->tts_isnull[i] &&
+                    !(VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))))
+        {
+            tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
+            newtup_changed = true;
+        }

If the attribute is stored EXTERNAL_ONDISK in the new tuple and it is
not null in the old tuple, then it must be logged completely in the
old tuple, so instead of checking
!(VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i])), it should be
asserted.
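
i.e. something like this sketch (not tested):

    if (att->attlen == -1 &&
        VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i]) &&
        !old_slot->tts_isnull[i])
    {
        /* a logged old value can never be a toast pointer */
        Assert(!VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]));

        tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
        newtup_changed = true;
    }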


+    heap_deform_tuple(newtuple, desc, new_slot->tts_values,
+                      new_slot->tts_isnull);
+    heap_deform_tuple(oldtuple, desc, old_slot->tts_values,
+                      old_slot->tts_isnull);
+
+    if (newtup_changed)
+        tmpnewtuple = heap_form_tuple(desc, tmp_new_slot->tts_values,
+                                      new_slot->tts_isnull);
+
+    old_matched = pgoutput_row_filter(relation, NULL, oldtuple, entry);
+    new_matched = pgoutput_row_filter(relation, NULL,
+                                      newtup_changed ? tmpnewtuple : newtuple,
+                                      entry);

I do not like the fact that we first deform the tuples, then hand the
HeapTuple to the expression evaluation machinery, which deforms them
again during expression evaluation.

So why don't you store the deformed tuple as-is in a virtual tuple slot?

In fact, if newtup_changed is true, you are forming the tuple back
just to have it deformed again during expression evaluation.

I think I have already given this comment on the last version.
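
For illustration, the suggestion is roughly this (a sketch using the
standard slot APIs; desc, newtuple, exprstate and econtext are assumed from
the surrounding code, the rest of the names are mine):

    TupleTableSlot *slot = MakeTupleTableSlot(desc, &TTSOpsVirtual);
    Datum       ret;
    bool        isnull;

    /* deform exactly once, straight into the virtual slot's arrays */
    heap_deform_tuple(newtuple, desc, slot->tts_values, slot->tts_isnull);
    ExecStoreVirtualTuple(slot);

    /* the filter evaluates the slot directly; nothing is re-formed */
    econtext->ecxt_scantuple = slot;
    ret = ExecEvalExprSwitchContext(exprstate, econtext, &isnull);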


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Tue, Oct 12, 2021 at 1:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Oct 6, 2021 at 2:33 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > On Sat, Oct 2, 2021 at 5:44 PM Ajin Cherian <itsajin@gmail.com> wrote:
> > >
> > > I have for now also rebased the patch and merged the first 5 patches
> > > into 1, and added my changes for the above into the second patch.
> >
> > I have split the patches back again, just to be consistent with the
> > original state of the patches. Sorry for the inconvenience.
>
> Thanks for the updated version of the patch, I was looking into the
> latest version and I have a few comments.
>
>
> +        if ((att->attlen == -1 &&
> +             VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i])) &&
> +                (!old_slot->tts_isnull[i] &&
> +                    !(VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))))
> +        {
> +            tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
> +            newtup_changed = true;
> +        }
>
> If the attribute is stored EXTERNAL_ONDISK in the new tuple and it is
> not null in the old tuple, then it must be logged completely in the
> old tuple, so instead of checking
> !(VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i])), it should be
> asserted.
>
>
> +    heap_deform_tuple(newtuple, desc, new_slot->tts_values,
> +                      new_slot->tts_isnull);
> +    heap_deform_tuple(oldtuple, desc, old_slot->tts_values,
> +                      old_slot->tts_isnull);
> +
> +    if (newtup_changed)
> +        tmpnewtuple = heap_form_tuple(desc, tmp_new_slot->tts_values,
> +                                      new_slot->tts_isnull);
> +
> +    old_matched = pgoutput_row_filter(relation, NULL, oldtuple, entry);
> +    new_matched = pgoutput_row_filter(relation, NULL,
> +                                      newtup_changed ? tmpnewtuple : newtuple,
> +                                      entry);
>
> I do not like the fact that we first deform the tuples, then hand the
> HeapTuple to the expression evaluation machinery, which deforms them
> again during expression evaluation.
>
> So why don't you store the deformed tuple as-is in a virtual tuple slot?
>
> In fact, if newtup_changed is true, you are forming the tuple back
> just to have it deformed again during expression evaluation.
>
> I think I have already given this comment on the last version.

Right, I only used the deformed tuple later when it was written to the
stream. I will modify this as well.

regards,
Ajin Cherian
Fujitsu Australia



Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Tue, Oct 12, 2021 at 1:33 PM Ajin Cherian <itsajin@gmail.com> wrote:
> > I do not like the fact that we first deform the tuples, then hand the
> > HeapTuple to the expression evaluation machinery, which deforms them
> > again during expression evaluation.
> >
> > So why don't you store the deformed tuple as-is in a virtual tuple slot?
> >
> > In fact, if newtup_changed is true, you are forming the tuple back
> > just to have it deformed again during expression evaluation.
> >
> > I think I have already given this comment on the last version.
>
> Right, I only used the deformed tuple later when it was written to the
> stream. I will modify this as well.

I have made the change to use the virtual slot for expression
evaluation and avoided tuple deformation.

regards,
Ajin Cherian
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Wed, Oct 13, 2021 at 10:00 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> I have made the change to use the virtual slot for expression
> evaluation and avoided tuple deformation.
>

I started looking at the v32-0006 patch and have some initial comments.
Shouldn't old_slot, new_slot and tmp_new_slot be cached in the
RelationSyncEntry, similar to scantuple?
Currently, these slots are always getting newly allocated each call to
pgoutput_row_filter_update() - and also, seemingly never deallocated.
We previously found that allocating slots each time for each row
filtered (over 1000s of rows) had a huge performance overhead.
As an example, scantuple was originally newly allocated each row
filtered, and to filter 1,000,000 rows in a test case it was taking 40
seconds. Caching the allocation in RelationSyncEntry reduced it down
to about 5 seconds.
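
The caching pattern is the same one scantuple ended up using, roughly (the
field name below is illustrative; tupdesc and entry come from the
surrounding code):

    /* allocate once per RelationSyncEntry, in a long-lived context */
    if (entry->old_slot == NULL)
    {
        MemoryContext oldctx = MemoryContextSwitchTo(CacheMemoryContext);

        entry->old_slot = MakeTupleTableSlot(tupdesc, &TTSOpsVirtual);
        MemoryContextSwitchTo(oldctx);
    }

    ExecClearTuple(entry->old_slot);    /* reuse; don't reallocate */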

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Fri, Oct 15, 2021 at 3:30 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Oct 13, 2021 at 10:00 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > I have made the change to use the virtual slot for expression
> > evaluation and avoided tuple deformation.
> >
>
> I started looking at the v32-0006 patch and have some initial comments.
> Shouldn't old_slot, new_slot and tmp_new_slot be cached in the
> RelationSyncEntry, similar to scantuple?
> Currently, these slots are always getting newly allocated each call to
> pgoutput_row_filter_update() - and also, seemingly never deallocated.
> We previously found that allocating slots each time for each row
> filtered (over 1000s of rows) had a huge performance overhead.
> As an example, scantuple was originally newly allocated each row
> filtered, and to filter 1,000,000 rows in a test case it was taking 40
> seconds. Caching the allocation in RelationSyncEntry reduced it down
> to about 5 seconds.

Thanks for the comment, I have modified patch 6 to cache old_tuple,
new_tuple and tmp_new_tuple.

On Tue, Oct 12, 2021 at 1:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> +        if ((att->attlen == -1 &&
> +             VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i])) &&
> +                (!old_slot->tts_isnull[i] &&
> +                    !(VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))))
> +        {
> +            tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
> +            newtup_changed = true;
> +        }
>
> If the attribute is stored EXTERNAL_ONDISK in the new tuple and it is
> not null in the old tuple, then it must be logged completely in the
> old tuple, so instead of checking
> !(VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i])), it should be
> asserted.

Sorry, I missed this in my last update.
For this to be true, shouldn't the fix in [1] be committed? I will
change this once that change is committed.

[1] -
https://www.postgresql.org/message-id/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com

regards,
Ajin Cherian
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Sep 23, 2021 at 10:33 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> 11) extra (unnecessary) parens in the deparsed expression
>
> test=# alter publication p add table t where ((b < 100) and (c < 100));
> ALTER PUBLICATION
> test=# \dRp+ p
>                                Publication p
>   Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
> -------+------------+---------+---------+---------+-----------+----------
>   user  | f          | t       | t       | t       | t         | f
> Tables:
>      "public.t" WHERE (((b < 100) AND (c < 100)))
>

I also reported the same issue some months back, but at that time it
was rejected, citing some pg_dump patch. (Please see [1] #14).

------
[1] https://www.postgresql.org/message-id/532a18d8-ce90-4444-8570-8a9fcf09f329%40www.fastmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
PSA new set of patches:

v34-0001 = the "main" patch from Euler. No change

v34-0002 = tab auto-complete. No change

v34-0003 = cache updates. Addresses Tomas review comment #3 [1].

v34-0004 = filter validation replica identity. Addresses Tomas review
comment #8 and #9 [1].

v34-0005 = filter validation walker. Addresses Tomas review comment #6 [1]

v34-0006 = support old/new tuple logic for row-filters. Modified, but
no functional change.

------
[1] https://www.postgresql.org/message-id/574b4e78-2f35-acf3-4bdc-4b872582e739%40enterprisedb.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Sep 23, 2021 at 10:33 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> 7) exprstate_list
>
> I'd just call the field / variable "exprstates", without indicating the
> data type. I don't think we do that anywhere.

Fixed in v34. [1]

>
>
> 8) RfCol
>
> Do we actually need this struct? Why not to track just name or attnum,
> and lookup the other value in syscache when needed?
>

Fixed in v34. [1]

>
> 9)  rowfilter_expr_checker
>
>     * Walk the parse-tree to decide if the row-filter is valid or not.
>
> I don't see any clear explanation what does "valid" mean.
>

Updated comment in v34. [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPvWk4w%2BNEAqB32YkQa75tSkXi50cq6suV9f3fASn5C9NA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, Sep 27, 2021 at 2:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Sep 25, 2021 at 3:36 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > On 9/25/21 6:23 AM, Amit Kapila wrote:
> > > On Sat, Sep 25, 2021 at 3:30 AM Tomas Vondra
> > > <tomas.vondra@enterprisedb.com> wrote:
> > >>
> > >> On 9/24/21 7:20 AM, Amit Kapila wrote:
> > >>>
> > >>> I think the right way to support functions is by the explicit marking
> > >>> of functions and in one of the emails above Jeff Davis also agreed
> > >>> with the same. I think we should probably introduce a new marking for
> > >>> this. I feel this is important because without this it won't be safe
> > >>> to access even some of the built-in functions that can access/update
> > >>> database (non-immutable functions) due to logical decoding environment
> > >>> restrictions.
> > >>>
> > >>
> > >> I agree that seems reasonable. Is there any reason why not to just use
> > >> IMMUTABLE for this purpose? Seems like a good match to me.
> > >>
> > >
> > > It will just solve one part of the puzzle (related to database access)
> > > but it is better to avoid the risk of broken replication by explicit
> > > marking especially for UDFs or other user-defined objects. You seem to
> > > be okay documenting such risk but I am not sure we have an agreement
> > > on that especially because that was one of the key points of
> > > discussions in this thread and various people told that we need to do
> > > something about it. I personally feel we should do something if we
> > > want to allow user-defined functions or operators because as reported
> > > in the thread this problem has been reported multiple times. I think
> > > we can go ahead with IMMUTABLE built-ins for the first version and
> > > then allow UDFs later or let's try to find a way for explicit marking.
> > >
> >
> > Well, I know multiple people mentioned that issue. And I certainly agree
> > just documenting the risk would not be an ideal solution. Requiring the
> > functions to be labeled helps, but we've seen people marking volatile
> > functions as immutable in order to allow indexing, so we'll have to
> > document the risks anyway.
> >
> > All I'm saying is that allowing built-in functions/operators but not
> > user-defined variants seems like an annoying break of extensibility.
> > People are used to user-defined stuff being usable just like built-in
> > functions and operators.
> >
>
> I agree with you that allowing UDFs in some way would be good for this
> feature. I think once we get the base feature committed then we can
> discuss whether and how to allow UDFs. Do we want to have an
> additional label for it or can we come up with something which allows
> the user to continue replication even if she has dropped the object
> used in the function? It seems like we can limit the scope of base
> patch functionality to allow the use of immutable built-in functions
> in row filter expressions.
>

OK, immutable system functions are now allowed in v34 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPvWk4w%2BNEAqB32YkQa75tSkXi50cq6suV9f3fASn5C9NA%40mail.gmail.com

Kind Regards,
Peter Smith
Fujitsu Australia



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Tue, Oct 26, 2021 at 3:24 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> PSA new set of patches:
>
> v34-0001 = the "main" patch from Euler. No change
>
> v34-0002 = tab auto-complete. No change
>
> v34-0003 = cache updates. Addresses Tomas review comment #3 [1].
>
> v34-0004 = filter validation replica identity. Addresses Tomas review
> comment #8 and #9 [1].
>
> v34-0005 = filter validation walker. Addresses Tomas review comment #6 [1]
>
> v34-0006 = support old/new tuple logic for row-filters. Modified, but
> no functional change.
>
> ------
> [1] https://www.postgresql.org/message-id/574b4e78-2f35-acf3-4bdc-4b872582e739%40enterprisedb.com
>

A few comments for some things I have noticed so far:

1) scantuple cleanup seems to be missing since the v33-0001 patch.

2) I don't think that the ResetExprContext() calls (before
FreeExecutorState()) are needed in the pgoutput_row_filter() and
pgoutput_row_filter_virtual() functions.

3) make check-world fails, due to recent changes to PostgresNode.pm.
I found that the following updates are needed:

diff --git a/src/test/subscription/t/025_row_filter.pl
b/src/test/subscription/t/025_row_filter.pl
index 742bbbe8a8..3fc503f2e4 100644
--- a/src/test/subscription/t/025_row_filter.pl
+++ b/src/test/subscription/t/025_row_filter.pl
@@ -1,17 +1,17 @@
 # Test logical replication behavior with row filtering
 use strict;
 use warnings;
-use PostgresNode;
-use TestLib;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
 use Test::More tests => 7;

 # create publisher node
-my $node_publisher = PostgresNode->new('publisher');
+my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
 $node_publisher->init(allows_streaming => 'logical');
 $node_publisher->start;

 # create subscriber node
-my $node_subscriber = PostgresNode->new('subscriber');
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
 $node_subscriber->init(allows_streaming => 'logical');
 $node_subscriber->start;


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Tue, Oct 26, 2021 at 3:24 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> PSA new set of patches:
>
> v34-0001 = the "main" patch from Euler. No change
>
> v34-0002 = tab auto-complete. No change
>
> v34-0003 = cache updates. Addresses Tomas review comment #3 [1].
>
> v34-0004 = filter validation replica identity. Addresses Tomas review
> comment #8 and #9 [1].
>
> v34-0005 = filter validation walker. Addresses Tomas review comment #6 [1]
>
> v34-0006 = support old/new tuple logic for row-filters. Modified, but
> no functional change.
>

Regarding the v34-0006 patch, shouldn't it also include an update to
the rowfilter_expr_checker() function added by the v34-0002 patch, for
validating the referenced row-filter columns in the case of UPDATE?
I was thinking something like the following (or is it more complex than this?):

diff --git a/src/backend/catalog/pg_publication.c
b/src/backend/catalog/pg_publication.c
index dc2f4597e6..579e727b10 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -162,12 +162,10 @@ rowfilter_expr_checker(Publication *pub,
ParseState *pstate, Node *rfnode, Relat
    rowfilter_validator(relname, rfnode);

    /*
-    * Rule 2: For "delete", check that filter cols are also valid replica
-    * identity cols.
-    *
-    * TODO - check later for publish "update" case.
+    * Rule 2: For "delete" and "update", check that filter cols are also
+    * valid replica identity cols.
     */
-   if (pub->pubactions.pubdelete)
+   if (pub->pubactions.pubdelete || pub->pubactions.pubupdate)
    {
        char replica_identity = rel->rd_rel->relreplident;


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
The v34* patch set is temporarily broken.

It was impacted quite a lot by the recently committed "schema
publication" patch [1].

We are actively fixing the full v34* patch set and will re-post it
here as soon as the re-base hurdles can be overcome.

Meanwhile, the small tab-complete patch (which is independent of the
others) is the only patch currently working, so I am attaching it so
at least the cfbot can have something to run.

------
[1] https://github.com/postgres/postgres/commit/5a2832465fd8984d089e8c44c094e6900d987fcd

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Ajin Cherian
Date:
Here's a rebase of the first 4 patches of the row-filter patch. Some
issues still remain:

1. the following changes for adding OptWhereClause to the
PublicationObjSpec have not been added,
as the test cases for this have not yet been rebased:

PublicationObjSpec:
...
+ TABLE relation_expr OptWhereClause
...
+ | ColId OptWhereClause
...
 + | ColId indirection OptWhereClause
...
+ | extended_relation_expr OptWhereClause

2. Changes made to AlterPublicationTables() undid changes that were
made as part of the schema publication patch. This needs to be
resolved with the correct approach.

Patches 0005 and 0006 have not yet been rebased but will be updated
in a few days.

regards,
Ajin Cherian

Attachments

Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Tue, Nov 2, 2021 at 10:44 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Patches 0005 and 0006 have not yet been rebased but will be updated
> in a few days.
>

Here's a rebase of all the 6 patches. Issue remaining:

1. Changes made to AlterPublicationTables() undid changes that were
made as part of the schema publication patch. This needs to be
resolved with the correct approach.

regards,
Ajin Cherian
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Nov 2, 2021 at 10:44 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Here's a rebase of the first 4 patches of the row-filter patch. Some
> issues still remain:
>
> 1. the following changes for adding OptWhereClause to the
> PublicationObjSpec have not been added,
> as the test cases for this have not yet been rebased:
>
> PublicationObjSpec:
> ...
> + TABLE relation_expr OptWhereClause
> ...
> + | ColId OptWhereClause
> ...
>  + | ColId indirection OptWhereClause
> ...
> + | extended_relation_expr OptWhereClause
>

This is addressed in the v36-0001 patch [1]

------
[1] https://www.postgresql.org/message-id/CAFPTHDYKfxTr2zpA-fC12u%2BhL2abCc%3D276OpJQUTyc6FBgYX9g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
Hi.

During some ad-hoc filter testing I observed a quirk when there are
duplicate tables. I think we need to define/implement some proper
rules for this behaviour.

=====

BACKGROUND

When the same table appears multiple times in a CREATE PUBLICATION
then those duplicates are simply ignored. The end result is that the
table is only one time in the publication.

This is fine and makes no difference when there are no row-filters
(because the duplicates are all exactly the same as each other), but
if there *are* row-filters then there is a quirky behaviour.

=====

PROBLEM

Apparently it is the *first* of the occurrences that is used and all
the other duplicates are ignored.

In practice it looks like this.

ex.1)

DROP PUBLICATION
test_pub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a=1), t1 WHERE (a=2);
CREATE PUBLICATION
test_pub=# \dRp+ p1
                               Publication p1
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root
----------+------------+---------+---------+---------+-----------+----------
 postgres | f          | t       | t       | t       | t         | f
Tables:
    "public.t1" WHERE ((a = 1))

** Notice that the 2nd filter (a=2) was ignored

~

IMO ex1 is wrong behaviour. I think that any subsequent duplicate
table names should behave the same as if the CREATE was a combination
of CREATE PUBLICATION then ALTER PUBLICATION SET.

Like this:

ex.2)

test_pub=# CREATE PUBLICATION p1 FOR TABLE t1 WHERE (a=1);
CREATE PUBLICATION
test_pub=# ALTER PUBLICATION p1 SET TABLE t1 WHERE (a=2);
ALTER PUBLICATION
test_pub=# \dRp+ p1
                               Publication p1
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root
----------+------------+---------+---------+---------+-----------+----------
 postgres | f          | t       | t       | t       | t         | f
Tables:
    "public.t1" WHERE ((a = 2))

** Notice that the 2nd filter (a=2) overwrites the 1st filter (a=1) as expected.

~~

The current behaviour of duplicates becomes even more "unexpected" if
duplicate tables occur in a single ALTER PUBLICATION ... SET command.

ex.3)

test_pub=# CREATE PUBLICATION p1;
CREATE PUBLICATION
test_pub=# ALTER PUBLICATION p1 SET TABLE t1 WHERE (a=1), t1 WHERE (a=2);
ALTER PUBLICATION
test_pub=# \dRp+ p1
                               Publication p1
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root
----------+------------+---------+---------+---------+-----------+----------
 postgres | f          | t       | t       | t       | t         | f
Tables:
    "public.t1" WHERE ((a = 1))


** Notice the 2nd filter (a=2) did not overwrite the 1st filter (a=1).
I think a user would be quite surprised by this behaviour.

=====

PROPOSAL

I propose that we change the way duplicate tables are processed to
make it so that it is always the *last* one that takes effect (instead
of the *first* one). AFAIK doing this won't affect any current PG
behaviour, but doing this will let the new row-filter feature work in
a consistent/predictable/sane way.

Thoughts?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Nov 4, 2021 at 8:17 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
>
> PROPOSAL
>
> I propose that we change the way duplicate tables are processed to
> make it so that it is always the *last* one that takes effect (instead
> of the *first* one).
>

I don't have a good reason to prefer one over another but I think if
we do this then we should document the chosen behavior. BTW, why not
give an error if the duplicate table is present and any one of them or
both have row-filters? I think the current behavior makes sense
because it makes no difference if the table is present more than once
in the list, but with a row-filter it can make a difference, so it
seems to me that giving an error should be considered.

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Wednesday, November 3, 2021 8:51 PM Ajin Cherian <itsajin@gmail.com> wrote:
> On Tue, Nov 2, 2021 at 10:44 PM Ajin Cherian <itsajin@gmail.com> wrote:
> .
> >
> > The patch 0005 and 0006 has not yet been rebased but will be updated
> > in a few days.
> >
> 
> Here's a rebase of all the 6 patches. Issue remaining:

Thanks for the patches.
I started to review the patches and here are a few comments.

1)
        /*
         * ALTER PUBLICATION ... ADD TABLE provides a PublicationTable List
         * (Relation, Where clause). ALTER PUBLICATION ... DROP TABLE provides
         * a Relation List. Check the List element to be used.
         */
        if (IsA(lfirst(lc), PublicationTable))
            whereclause = true;
        else
            whereclause = false;

I am not sure about the comments here; wouldn't it be better to always provide
a PublicationTable List, which would be more consistent?

2)
+                    if ($3)
+                    {
+                        $$->pubtable->whereClause = $3;
+                    }

It seems we can remove the if ($3) check here.


3)

+                    oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+                    rfnode = stringToNode(TextDatumGetCString(rfdatum));
+                    exprstate = pgoutput_row_filter_init_expr(rfnode);
+                    entry->exprstates = lappend(entry->exprstates, exprstate);
+                    MemoryContextSwitchTo(oldctx);
+                }

Currently the patch saves and executes each expression separately. I was
thinking it might be better if we could use "AND" to combine all the
expressions into one expression; then we could initialize and optimize the
final expression and execute it only once.
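
Something like the following sketch, perhaps (pgoutput_row_filter_init_expr
is the patch's helper; rfnodes, oldctx and entry are assumed from the
surrounding code):

    Node       *rfnode;

    /* fold all filters into one expression so we build one ExprState */
    if (list_length(rfnodes) == 1)
        rfnode = (Node *) linitial(rfnodes);
    else
        rfnode = (Node *) makeBoolExpr(AND_EXPR, rfnodes, -1);

    oldctx = MemoryContextSwitchTo(CacheMemoryContext);
    entry->exprstate = pgoutput_row_filter_init_expr(rfnode);
    MemoryContextSwitchTo(oldctx);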

Best regards,
Hou zj

Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Nov 4, 2021 at 2:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Nov 4, 2021 at 8:17 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> >
> > PROPOSAL
> >
> > I propose that we change the way duplicate tables are processed to
> > make it so that it is always the *last* one that takes effect (instead
> > of the *first* one).
> >
>
> I don't have a good reason to prefer one over another but I think if
> we do this then we should document the chosen behavior. BTW, why not
> give an error if the duplicate table is present and any one of them or
> both have row-filters? I think the current behavior makes sense
> because it makes no difference if the table is present more than once
> in the list, but with a row-filter it can make a difference, so it
> seems to me that giving an error should be considered.

Yes, giving an error if any duplicate table has a filter is also a
good alternative solution.

I only wanted to demonstrate the current problem, and get some
consensus on the solution before implementing a fix. If others are
happy to give an error for this case then that is fine by me too.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

От
Peter Smith
Дата:
PSA new set of v37* patches.

This addresses some pending review comments as follows:

v34-0001 = the "main" patch.
- fixed Houz review comment #1 [1]
- fixed Houz review comment #2 [1]
- fixed Tomas review comment #5 [2]

v34-0002 = tab auto-complete.
- not changed

v34-0003 = cache updates.
- not changed

v34-0004 = filter validation replica identity.
- not changed

v34-0005 = filter validation walker.
- not changed

v34-0006 = support old/new tuple logic for row-filters.
- Ajin fixed Tomas review comment #14 [2]
- Ajin fixed Greg review comment #1 [3]
- Ajin fixed Greg review comment #2 [3]
- Ajin fixed Greg review comment #3 [3]
- Ajin fixed Greg review comment #1 [4]

------
[1] Houz 4/11 -
https://www.postgresql.org/message-id/OS0PR01MB5716090A70A73ADF58C58950948D9%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] Tomas 23/9 -
https://www.postgresql.org/message-id/574b4e78-2f35-acf3-4bdc-4b872582e739%40enterprisedb.com
[3] Greg 26/10 -
https://www.postgresql.org/message-id/CAJcOf-dNDy%3DrzUD%3D2H54J-VVUJCxq94o_2Sqc35RovFLKkSj7Q%40mail.gmail.com
[4] Greg 27/10 -
https://www.postgresql.org/message-id/CAJcOf-dViJh-F4oJkMQchAD19LELuCNbCqKfia5S7jsOASO6yA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Sep 23, 2021 at 10:33 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> 5) publicationcmds.c
>
> I mentioned this in my last review [1] already, but I really dislike the
> fact that OpenTableList accepts a list containing one of two entirely
> separate node types (PublicationTable or Relation). It was modified to
> use IsA() instead of a flag, but I still find it ugly, confusing and
> possibly error-prone.
>
> Also, not sure mentioning the two different callers explicitly in the
> OpenTableList comment is a great idea - it's likely to get stale if
> someone adds another caller.

Fixed in v37-0001 [1]

> 14) pgoutput_row_filter_update
>
> The function name seems a bit misleading, as it suggests might seem like
> it updates the row_filter, or something. Should indicate it's about
> deciding what to do with the update.

Fixed in v37-0006 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtRdXzPpm3qv3cEYWWfVUkGT84EopEHxwt95eo_cG_3eQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Oct 26, 2021 at 6:26 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> A few comments for some things I have noticed so far:
>
> 1) scantuple cleanup seems to be missing since the v33-0001 patch.
>
> 2) I don't think that the ResetExprContext() calls (before
> FreeExecutorState()) are needed in the pgoutput_row_filter() and
> pgoutput_row_filter_virtual() functions.
>
> 3) make check-world fails, due to recent changes to PostgresNode.pm.

These 3 comments all addressed in v37-0006 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtRdXzPpm3qv3cEYWWfVUkGT84EopEHxwt95eo_cG_3eQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Wed, Oct 27, 2021 at 7:21 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Regarding the v34-0006 patch, shouldn't it also include an update to
> the rowfilter_expr_checker() function added by the v34-0002 patch, for
> validating the referenced row-filter columns in the case of UPDATE?
> I was thinking something like the following (or is it more complex than this?):
>
> diff --git a/src/backend/catalog/pg_publication.c
> b/src/backend/catalog/pg_publication.c
> index dc2f4597e6..579e727b10 100644
> --- a/src/backend/catalog/pg_publication.c
> +++ b/src/backend/catalog/pg_publication.c
> @@ -162,12 +162,10 @@ rowfilter_expr_checker(Publication *pub,
> ParseState *pstate, Node *rfnode, Relat
>     rowfilter_validator(relname, rfnode);
>
>     /*
> -    * Rule 2: For "delete", check that filter cols are also valid replica
> -    * identity cols.
> -    *
> -    * TODO - check later for publish "update" case.
> +    * Rule 2: For "delete" and "update", check that filter cols are also
> +    * valid replica identity cols.
>      */
> -   if (pub->pubactions.pubdelete)
> +   if (pub->pubactions.pubdelete || pub->pubactions.pubupdate)
>     {
>         char replica_identity = rel->rd_rel->relreplident;
>

Fixed in v37-0006 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtRdXzPpm3qv3cEYWWfVUkGT84EopEHxwt95eo_cG_3eQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Nov 4, 2021 at 2:21 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Thanks for the patches.
> I started to review the patches and here are a few comments.
>
> 1)
>                 /*
>                  * ALTER PUBLICATION ... ADD TABLE provides a PublicationTable List
>                  * (Relation, Where clause). ALTER PUBLICATION ... DROP TABLE provides
>                  * a Relation List. Check the List element to be used.
>                  */
>                 if (IsA(lfirst(lc), PublicationTable))
>                         whereclause = true;
>                 else
>                         whereclause = false;
>
> I am not sure about the comments here; wouldn't it be better to always provide
> a PublicationTable List, which would be more consistent?

Fixed in v37-0001 [1].

>
> 2)
> +                                       if ($3)
> +                                       {
> +                                               $$->pubtable->whereClause = $3;
> +                                       }
>
> It seems we can remove the if ($3) check here.
>

Fixed in v37-0001 [1].

>
> 3)
>
> +                                       oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> +                                       rfnode = stringToNode(TextDatumGetCString(rfdatum));
> +                                       exprstate = pgoutput_row_filter_init_expr(rfnode);
> +                                       entry->exprstates = lappend(entry->exprstates, exprstate);
> +                                       MemoryContextSwitchTo(oldctx);
> +                               }
>
> Currently the patch saves and executes each expression separately. I was
> thinking it might be better if we could use "AND" to combine all the
> expressions into one expression; then we could initialize and optimize the
> final expression and execute it only once.

Yes, thanks for this suggestion - it is an interesting idea. I had
thought the same as this some time ago but never acted on it. I will
try implementing this idea as a separate new patch because it probably
needs to be performance tested against the current code just in case
the extra effort to combine the expressions outweighs any execution
benefits.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtRdXzPpm3qv3cEYWWfVUkGT84EopEHxwt95eo_cG_3eQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Nov 5, 2021 at 10:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> PSA new set of v37* patches.
>

Few comments about changes made to the patch to rebase it:
1.
+#if 1
+	// FIXME - can we do a better job if integrating this with the schema changes
+	/*
+	 * Remove all publication-table mappings.  We could possibly remove (i)
+	 * tables that are not found in the new table list and (ii) tables that
+	 * are being re-added with a different qual expression. For (ii),
+	 * simply updating the existing tuple is not enough, because of qual
+	 * expression dependencies.
+	 */
+	foreach(oldlc, oldrelids)
+	{
+		Oid			oldrelid = lfirst_oid(oldlc);
+		PublicationRelInfo *oldrel;
+
+		oldrel = palloc(sizeof(PublicationRelInfo));
+		oldrel->relid = oldrelid;
+		oldrel->whereClause = NULL;
+		oldrel->relation = table_open(oldrel->relid,
+									  ShareUpdateExclusiveLock);
+		delrels = lappend(delrels, oldrel);
+	}
+#else
 	CheckObjSchemaNotAlreadyInPublication(rels, schemaidlist,
 										  PUBLICATIONOBJ_TABLE);

I think for the correct merge you need to just call
CheckObjSchemaNotAlreadyInPublication() before this for loop. BTW, I
have a question regarding this implementation. Here, it has been
assumed that the new rel will always be specified with a different
qual, what if there is no qual or if the qual is the same?

2.
+preprocess_pubobj_list(List *pubobjspec_list, core_yyscan_t
yyscanner, bool alter_drop)
 {
  ListCell   *cell;
  PublicationObjSpec *pubobj;
@@ -17341,7 +17359,15 @@ preprocess_pubobj_list(List *pubobjspec_list,
core_yyscan_t yyscanner)
  errcode(ERRCODE_SYNTAX_ERROR),
  errmsg("invalid table name at or near"),
  parser_errposition(pubobj->location));
- else if (pubobj->name)
+
+ /* cannot use WHERE w-filter for DROP TABLE from publications */
+ if (pubobj->pubtable && pubobj->pubtable->whereClause && alter_drop)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid use of WHERE row-filter in ALTER PUBLICATION ... DROP TABLE"),
+ parser_errposition(pubobj->location));
+

This change looks a bit ad-hoc to me. Can we handle this at a later
point of time in publicationcmds.c?

3.
- | ColId
+ | ColId OptWhereClause
  {
  $$ = makeNode(PublicationObjSpec);
  $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
- $$->name = $1;
+ if ($2)
+ {
+ $$->pubtable = makeNode(PublicationTable);
+ $$->pubtable->relation = makeRangeVar(NULL, $1, @1);
+ $$->pubtable->whereClause = $2;
+ }
+ else
+ {
+ $$->name = $1;
+ }

Again this doesn't appear to be the right way. I think this should be
handled at a later point.

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Fri, Nov 5, 2021 1:14 PM Peter Smith <smithpb2250@gmail.com> wrote:
> PSA new set of v37* patches.

Thanks for updating the patches.
Few comments:

1) v37-0001

I think it might be better to also show the filter expression in '\d+
tablename' command after publication description.

2) v37-0004

+    /* Scan the expression tree for referenceable objects */
+    find_expr_references_walker(expr, &context);
+
+    /* Remove any duplicates */
+    eliminate_duplicate_dependencies(context.addrs);
+

The 0004 patch currently uses find_expr_references_walker to get all the
referenced objects. I am thinking: do we only need to get the columns in the
expression? I think maybe we can check the replica identity like [1].

3) v37-0005

- no parse nodes of any kind other than Var, OpExpr, Const, BoolExpr, FuncExpr

I think there could be other node types which can also be considered simple
expressions, for example T_NullIfExpr.

Personally, I think it's natural to check only the IMMUTABLE and
user-defined restrictions in the new function rowfilter_walker. We can keep the
other row-filter errors which were thrown for EXPR_KIND_PUBLICATION_WHERE in
the 0001 patch.

[1]
rowfilter_expr_checker
...
            if (replica_identity == REPLICA_IDENTITY_DEFAULT)
                context.bms_replident = RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_PRIMARY_KEY);
            else
                context.bms_replident = RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_IDENTITY_KEY);

            (void) rowfilter_expr_replident_walker(rfnode, &context);

...
static bool
rowfilter_expr_replident_walker(Node *node, rf_context *context)
{
    if (node == NULL)
        return false;

    if (IsA(node, Var))
    {
        Oid            relid = RelationGetRelid(context->rel);
        Var           *var = (Var *) node;
        AttrNumber     attnum = var->varattno - FirstLowInvalidHeapAttributeNumber;

        if (!bms_is_member(attnum, context->bms_replident))
        {
            const char *colname = get_attname(relid, attnum, false);
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
                    errmsg("cannot add relation \"%s\" to publication",
                            RelationGetRelationName(context->rel)),
                    errdetail("Row filter column \"%s\" is not part of the REPLICA IDENTITY",
                                colname)));

            return false;
        }

        return true;
    }

    return expression_tree_walker(node, rowfilter_expr_replident_walker,
                                  (void *) context);
}

Best regards,
Hou zj


RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Friday, November 5, 2021 1:14 PM, Peter Smith <smithpb2250@gmail.com> wrote:
> 
> PSA new set of v37* patches.
> 

Thanks for your patch. I have a problem when using this patch.

The document about "create publication" in patch says:

   The <literal>WHERE</literal> clause should contain only columns that are
   part of the primary key or be covered  by <literal>REPLICA
   IDENTITY</literal> otherwise, <command>DELETE</command> operations will not
   be replicated.

But when I tried this patch, the columns contained in the WHERE clause had to be
covered by REPLICA IDENTITY, and it didn't matter whether they were part of the primary key.
(We can see it in Case 4 of publication.sql, too.) So maybe we should modify the document.
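
A minimal sketch of the behaviour described above (hypothetical table and
index names, assuming the current validation patch):

create table t (pk int primary key, b int not null);
create unique index t_b_idx on t (b);
alter table t replica identity using index t_b_idx;
create publication p1 for table t where (b > 0);   -- accepted: b is in the replica identity
create publication p2 for table t where (pk > 0);  -- rejected: pk is a primary key column
                                                   -- but not part of the replica identity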

Regards
Tang

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Fri, Nov 5, 2021 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Nov 5, 2021 at 10:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > PSA new set of v37* patches.
> 3.
> - | ColId
> + | ColId OptWhereClause
>   {
>   $$ = makeNode(PublicationObjSpec);
>   $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
> - $$->name = $1;
> + if ($2)
> + {
> + $$->pubtable = makeNode(PublicationTable); $$->pubtable->relation =
> + makeRangeVar(NULL, $1, @1); $$->pubtable->whereClause = $2; } else {
> + $$->name = $1; }
> 
> Again this doesn't appear to be the right way. I think this should be handled at
> a later point.

I think the difficulty in handling this at a later point is that we need to make
sure we don't lose the whereclause. Currently, we can only save the whereclause
in PublicationTable structure and the PublicationTable is only used for TABLE,
but '| ColId' can be used for either a SCHEMA or TABLE. We cannot distinguish
the actual type at this stage, so we always need to save the whereclause if
it's NOT NULL.

I think the possible approaches to delay this check are:

(1) we can delete the PublicationTable structure and put all the vars(relation,
whereclause) in PublicationObjSpec. In this approach, we don't need check if
the whereclause is NULL in the '| ColId', we can check this at a later point.

Or

(2) Add a new pattern for whereclause in PublicationObjSpec:

The change could be:

PublicationObjSpec:
...
| ColId
    ... 
+ | ColId WHERE '(' a_expr ')'
+ {
+ $$ = makeNode(PublicationObjSpec);
+ $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
+ $$->pubtable = makeNode(PublicationTable);
+ $$->pubtable->relation = makeRangeVar(NULL, $1, @1);
+ $$->pubtable->whereClause = $2;
+ }

In this approach, we also don't need the "if ($2)" check.

What do you think ?

Best regards,
Hou zj

Re: row filtering for logical replication

From
Peter Smith
Date:
PSA new set of v38* patches.

This addresses some review comments as follows:

v38-0001 = the "main" patch.
- rebased to HEAD
- fixed Amit review comment about ALTER DROP [1]
- fixed Houz review comment about psql \d+ [2]

v38-0002 = tab auto-complete.
- not changed

v38-0003 = cache updates.
- fixed Houz review comment about combining multiple filters [3]

v38-0004 = filter validation replica identity.
- fixed Tang review comment about REPLICA IDENTITY docs [4]

v38-0005 = filter validation walker.
- not changed

v38-0006 = support old/new tuple logic for row-filters.
- not changed

------
[1] Amit 5/11 #2 -
https://www.postgresql.org/message-id/CAA4eK1KN5gsTo6Qaomt-9vpC61cgw5ikgzLhOunf3o22G3uc_Q%40mail.gmail.com
[2] Houz 8/11 #1 -
https://www.postgresql.org/message-id/OS0PR01MB571625D4A5CC1DAB4045B2BB94919%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[3] Houz 4/11 #3 -
https://www.postgresql.org/message-id/OS0PR01MB5716090A70A73ADF58C58950948D9%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[4] Tang 9/11 -
https://www.postgresql.org/message-id/OS0PR01MB6113895D7964F03E9F57F9C7FB929%40OS0PR01MB6113.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Fri, Nov 5, 2021 at 7:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> 2.
> +preprocess_pubobj_list(List *pubobjspec_list, core_yyscan_t
> yyscanner, bool alter_drop)
>  {
>   ListCell   *cell;
>   PublicationObjSpec *pubobj;
> @@ -17341,7 +17359,15 @@ preprocess_pubobj_list(List *pubobjspec_list,
> core_yyscan_t yyscanner)
>   errcode(ERRCODE_SYNTAX_ERROR),
>   errmsg("invalid table name at or near"),
>   parser_errposition(pubobj->location));
> - else if (pubobj->name)
> +
> + /* cannot use WHERE w-filter for DROP TABLE from publications */
> + if (pubobj->pubtable && pubobj->pubtable->whereClause && alter_drop)
> + ereport(ERROR,
> + errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("invalid use of WHERE row-filter in ALTER PUBLICATION ... DROP TABLE"),
> + parser_errposition(pubobj->location));
> +
>
> This change looks a bit ad-hoc to me. Can we handle this at a later
> point of time in publicationcmds.c?
>

Fixed in v38-0001 [1].

------
[1] https://www.postgresql.org/message-id/CAHut%2BPvWCS%2BW_OLV60AZJucY1RFpkXS%3DhfvYWwpwyMvifdJxiQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, Nov 8, 2021 at 5:53 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Fri, Nov 5, 2021 1:14 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > PSA new set of v37* patches.
>
> Thanks for updating the patches.
> Few comments:
>
> 1) v37-0001
>
> I think it might be better to also show the filter expression in '\d+
> tablename' command after publication description.
>

Fixed in v38-0001 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPvWCS%2BW_OLV60AZJucY1RFpkXS%3DhfvYWwpwyMvifdJxiQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Nov 9, 2021 at 2:03 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> On Friday, November 5, 2021 1:14 PM, Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > PSA new set of v37* patches.
> >
>
> Thanks for your patch. I have a problem when using this patch.
>
> The document about "create publication" in patch says:
>
>    The <literal>WHERE</literal> clause should contain only columns that are
>    part of the primary key or be covered  by <literal>REPLICA
>    IDENTITY</literal> otherwise, <command>DELETE</command> operations will not
>    be replicated.
>
> But I tried this patch, the columns which could be contained in WHERE clause must be
> covered by REPLICA IDENTITY, but it doesn't matter if they are part of the primary key.
> (We can see it in Case 4 of publication.sql, too.) So maybe we should modify the document.
>

PG Docs is changed in v38-0004 [1]. Please check if it is OK.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPvWCS%2BW_OLV60AZJucY1RFpkXS%3DhfvYWwpwyMvifdJxiQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Nov 4, 2021 at 2:21 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> 3)
>
> +                                       oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> +                                       rfnode = stringToNode(TextDatumGetCString(rfdatum));
> +                                       exprstate = pgoutput_row_filter_init_expr(rfnode);
> +                                       entry->exprstates = lappend(entry->exprstates, exprstate);
> +                                       MemoryContextSwitchTo(oldctx);
> +                               }
>
> Currently in the patch, it save and execute each expression separately. I was
> thinking it might be better if we can use "AND" to combine all the expressions
> into one expression, then we can initialize and optimize the final expression
> and execute it only once.
>

Fixed in v38-0003 [1].

------
[1] https://www.postgresql.org/message-id/CAHut%2BPvWCS%2BW_OLV60AZJucY1RFpkXS%3DhfvYWwpwyMvifdJxiQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Thu, Nov 4, 2021 10:47 AM Peter Smith <smithpb2250@gmail.com> wrote:
> PROPOSAL
> 
> I propose that we change the way duplicate tables are processed to make it so
> that it is always the *last* one that takes effect (instead of the *first* one). AFAIK
> doing this won't affect any current PG behaviour, but doing this will let the new
> row-filter feature work in a consistent/predictable/sane way.
> 
> Thoughts?

Last one taking effect sounds reasonable to me.

OTOH, I think we should make the behavior here consistent with the Column Filter
patch in another thread. IIRC, in the current column filter patch, only the
first one's filter takes effect. So it may be better to get Rahila's and Alvaro's
thoughts on this.
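
For illustration, the proposal amounts to the following (hypothetical table
and filters; note that a later message in this thread argues for raising an
error instead):

create table t (a int primary key);
create publication pub_dup for table t where (a > 100), t where (a > 200);
-- proposed: the *last* entry takes effect, so only rows with a > 200 replicate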

Best regards,
Hou zj


Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Nov 9, 2021 at 2:22 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Fri, Nov 5, 2021 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Nov 5, 2021 at 10:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > PSA new set of v37* patches.
> > 3.
> > - | ColId
> > + | ColId OptWhereClause
> >   {
> >   $$ = makeNode(PublicationObjSpec);
> >   $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
> > - $$->name = $1;
> > + if ($2)
> > + {
> > + $$->pubtable = makeNode(PublicationTable); $$->pubtable->relation =
> > + makeRangeVar(NULL, $1, @1); $$->pubtable->whereClause = $2; } else {
> > + $$->name = $1; }
> >
> > Again this doesn't appear to be the right way. I think this should be handled at
> > a later point.
>
> I think the difficulty to handle this at a later point is that we need to make
> sure we don't lose the whereclause. Currently, we can only save the whereclause
> in PublicationTable structure and the PublicationTable is only used for TABLE,
> but '| ColId' can be used for either a SCHEMA or TABLE. We cannot distinguish
> the actual type at this stage, so we always need to save the whereclause if
> it's NOT NULL.
>

I see your point. But, I think we can add some comments here
indicating that the user might have mistakenly given where clause with
some schema which we will identify later and give an appropriate
error. Then, in preprocess_pubobj_list(), identify if the user has
given the where clause with schema name and give an appropriate error.

> I think the possible approaches to delay this check are:
>
> (1) we can delete the PublicationTable structure and put all the vars(relation,
> whereclause) in PublicationObjSpec. In this approach, we don't need check if
> the whereclause is NULL in the '| ColId', we can check this at a later point.
>

Yeah, we can do this but I don't think it will reduce any checks later
to identify if the user has given where clause only for tables. So,
let's keep this structure around as that will at least keep all things
related to the table together in one structure.

> Or
>
> (2) Add a new pattern for whereclause in PublicationObjSpec:
>
> The change could be:
>
> PublicationObjSpec:
> ...
> | ColId
>         ...
> + | ColId WHERE '(' a_expr ')'
> + {
> + $$ = makeNode(PublicationObjSpec);
> + $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
> + $$->pubtable = makeNode(PublicationTable);
> + $$->pubtable->relation = makeRangeVar(NULL, $1, @1);
> + $$->pubtable->whereClause = $2;
> + }
>
> In this approach, we also don't need the "if ($2)" check.
>

This seems redundant and we still need same checks later to see if the
where clause is given with the table object.

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Wed, Nov 10, 2021 10:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Nov 9, 2021 at 2:22 PM houzj.fnst@fujitsu.com wrote:
> >
> > On Fri, Nov 5, 2021 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Fri, Nov 5, 2021 at 10:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > PSA new set of v37* patches.
> > > 3.
> > > - | ColId
> > > + | ColId OptWhereClause
> > >   {
> > >   $$ = makeNode(PublicationObjSpec);
> > >   $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
> > > - $$->name = $1;
> > > + if ($2)
> > > + {
> > > + $$->pubtable = makeNode(PublicationTable); $$->pubtable->relation
> > > + = makeRangeVar(NULL, $1, @1); $$->pubtable->whereClause = $2; }
> > > + else { $$->name = $1; }
> > >
> > > Again this doesn't appear to be the right way. I think this should
> > > be handled at a later point.
> >
> > I think the difficulty to handle this at a later point is that we need
> > to make sure we don't lose the whereclause. Currently, we can only
> > save the whereclause in PublicationTable structure and the
> > PublicationTable is only used for TABLE, but '| ColId' can be used for
> > either a SCHEMA or TABLE. We cannot distinguish the actual type at
> > this stage, so we always need to save the whereclause if it's NOT NULL.
> >
> 
> I see your point. But, I think we can add some comments here indicating that
> the user might have mistakenly given where clause with some schema which we
> will identify later and give an appropriate error. Then, in
> preprocess_pubobj_list(), identify if the user has given the where clause with
> schema name and give an appropriate error.
> 

OK, IIRC, in this approach, we need to set both $$->name and $$->pubtable in
'| ColId OptWhereClause'. And in preprocess_pubobj_list, we can add a check
for the case where both name and pubtable are NOT NULL.

the grammar code could be:

| ColId OptWhereClause
{
    $$ = makeNode(PublicationObjSpec);
    $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;

    $$->name = $1;
+    /* xxx */
+    $$->pubtable = makeNode(PublicationTable);
+    $$->pubtable->relation = makeRangeVar(NULL, $1, @1);
+        $$->pubtable->whereClause = $2;
    $$->location = @1;
}

preprocess_pubobj_list
...
else if (pubobj->pubobjtype == PUBLICATIONOBJ_REL_IN_SCHEMA ||
pubobj->pubobjtype == PUBLICATIONOBJ_CURRSCHEMA)
{
    ...
+    if (pubobj->name &&
+        (!pubobj->pubtable || !pubobj->pubtable->whereClause))
            pubobj->pubobjtype = PUBLICATIONOBJ_REL_IN_SCHEMA;
    else if (!pubobj->name && !pubobj->pubtable)
            pubobj->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
    else
            ereport(ERROR,
                            errcode(ERRCODE_SYNTAX_ERROR),
                            errmsg("invalid schema name at or near"),
                            parser_errposition(pubobj->location));
}


Best regards,
Hou zj

Re: row filtering for logical replication

From
Peter Smith
Date:
On Wed, Nov 10, 2021 at 4:57 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Wed, Nov 10, 2021 10:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Nov 9, 2021 at 2:22 PM houzj.fnst@fujitsu.com wrote:
> > >
> > > On Fri, Nov 5, 2021 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > On Fri, Nov 5, 2021 at 10:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > >
> > > > > PSA new set of v37* patches.
> > > > 3.
> > > > - | ColId
> > > > + | ColId OptWhereClause
> > > >   {
> > > >   $$ = makeNode(PublicationObjSpec);
> > > >   $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
> > > > - $$->name = $1;
> > > > + if ($2)
> > > > + {
> > > > + $$->pubtable = makeNode(PublicationTable); $$->pubtable->relation
> > > > + = makeRangeVar(NULL, $1, @1); $$->pubtable->whereClause = $2; }
> > > > + else { $$->name = $1; }
> > > >
> > > > Again this doesn't appear to be the right way. I think this should
> > > > be handled at a later point.
> > >
> > > I think the difficulty to handle this at a later point is that we need
> > > to make sure we don't lose the whereclause. Currently, we can only
> > > save the whereclause in PublicationTable structure and the
> > > PublicationTable is only used for TABLE, but '| ColId' can be used for
> > > either a SCHEMA or TABLE. We cannot distinguish the actual type at
> > > this stage, so we always need to save the whereclause if it's NOT NULL.
> > >
> >
> > I see your point. But, I think we can add some comments here indicating that
> > the user might have mistakenly given where clause with some schema which we
> > will identify later and give an appropriate error. Then, in
> > preprocess_pubobj_list(), identify if the user has given the where clause with
> > schema name and give an appropriate error.
> >
>
> OK, IIRC, in this approach, we need to set both $$->name and $$->pubtable in
> '| ColId OptWhereClause'. And In preprocess_pubobj_list, we can add some check
> if both name and pubtable is NOT NULL.
>
> the grammar code could be:
>
> | ColId OptWhereClause
> {
>         $$ = makeNode(PublicationObjSpec);
>         $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
>
>         $$->name = $1;
> +       /* xxx */
> +       $$->pubtable = makeNode(PublicationTable);
> +       $$->pubtable->relation = makeRangeVar(NULL, $1, @1);
> +       $$->pubtable->whereClause = $2;
>         $$->location = @1;
> }
>
> preprocess_pubobj_list
> ...
> else if (pubobj->pubobjtype == PUBLICATIONOBJ_REL_IN_SCHEMA ||
> pubobj->pubobjtype == PUBLICATIONOBJ_CURRSCHEMA)
> {
>     ...
> +    if (pubobj->name &&
> +        (!pubobj->pubtable || !pubobj->pubtable->whereClause))
>             pubobj->pubobjtype = PUBLICATIONOBJ_REL_IN_SCHEMA;
>     else if (!pubobj->name && !pubobj->pubtable)
>             pubobj->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
>     else
>             ereport(ERROR,
>                             errcode(ERRCODE_SYNTAX_ERROR),
>                             errmsg("invalid schema name at or near"),
>                             parser_errposition(pubobj->location));
> }
>

Hi Hou-san. Actually, I have already implemented this part according
to my understanding of Amit's suggestion and it seems to be working
well.

Please wait for v39-0001, then feel free to post review comments about
it if you think there are still problems.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, Nov 8, 2021 at 5:53 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> 3) v37-0005
>
> - no parse nodes of any kind other than Var, OpExpr, Const, BoolExpr, FuncExpr
>
> I think there could be other node types which can also be considered simple
> expressions, for example T_NullIfExpr.

The current walker restrictions are from a previously agreed decision
by Amit/Tomas [1] and from an earlier suggestion from Andres [2] to
keep everything very simple for a first version.

Yes, you are right, there might be some additional node types that
are fine, but at this time I don't want to add anything different
without getting their approval to do so. Anyway, additions like this
are all candidates for a future version of this row-filter feature.
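
As a sketch of what the current restrictions mean at the SQL level
(hypothetical table and filters):

create table t (a int primary key, b text);
create publication p_ok for table t where (a > 10 and b <> 'x');
-- allowed: only Var, Const, OpExpr and BoolExpr nodes
create publication p_bad for table t where (a > (select max(a) from t));
-- rejected by the walker: a sub-select is not a simple expression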

>
> Personally, I think it's natural to only check the IMMUTABLE and
> whether-user-defined in the new function rowfilter_walker. We can keep the
> other row-filter errors which were thrown for EXPR_KIND_PUBLICATION_WHERE in
> the 0001 patch.
>

YMMV. IMO it is much more convenient for all the filter validations to
be centralized just in one walker function instead of scattered all
over the place like they were in the 0001 patch.

-----
[1] https://www.postgresql.org/message-id/CAA4eK1%2BXoD49bz5-2TtiD0ugq4PHSRX2D1sLPR_X4LNtdMc4OQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20210128022032.eq2qqc6zxkqn5syt%40alap3.anarazel.de

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Ajin Cherian
Date:
Attaching version 39-

V39 fixes the following review comments:

On Fri, Nov 5, 2021 at 7:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>  CheckObjSchemaNotAlreadyInPublication(rels, schemaidlist,
>    PUBLICATIONOBJ_TABLE);
>
>I think for the correct merge you need to just call
>CheckObjSchemaNotAlreadyInPublication() before this for loop. BTW, I
>have a question regarding this implementation. Here, it has been
>assumed that the new rel will always be specified with a different
>qual, what if there is no qual or if the qual is the same?

Actually, with this code it does not matter whether there is no qual or a
different qual; it recreates everything as specified by the ALTER ... SET command.
I have added CheckObjSchemaNotAlreadyInPublication as you specified, since this
is required to match the schema patch behaviour. I've also added
a test case for this particular case.
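
To illustrate the SET semantics being described (hypothetical tables and
filters):

create table t1 (a int primary key);
create table t2 (a int primary key);
create publication p for table t1 where (a > 1), t2;
alter publication p set table t1 where (a > 5);
-- after SET, the publication contains only t1 with the new filter; t2 is gone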


On Mon, Nov 8, 2021 at 5:53 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
>2) v37-0004
>
>+       /* Scan the expression tree for referenceable objects */
>+       find_expr_references_walker(expr, &context);
>+
>+       /* Remove any duplicates */
>+       eliminate_duplicate_dependencies(context.addrs);
>+
>
>The 0004 patch currently use find_expr_references_walker to get all the
>reference objects. I am thinking do we only need get the columns in the
>expression ? I think maybe we can check the replica indentity like[1].

Changed as suggested.


On Thu, Nov 4, 2021 at 2:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
>I see your point. But, I think we can add some comments here
>indicating that the user might have mistakenly given where clause with
>some schema which we will identify later and give an appropriate
>error. Then, in preprocess_pubobj_list(), identify if the user has
>given the where clause with schema name and give an appropriate erro

Changed as suggested.


On Thu, Nov 4, 2021 at 2:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>BTW, why not give an error if the duplicate table is present and any one of them or
>both have row-filters? The current behavior makes sense in that it makes
>no difference if the table is present more than once in the list, but with
>a row-filter it can make a difference, so it seems to me that giving an
>error should be considered.

Changed as suggested, also added test cases for the same.
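
A sketch of the changed behaviour (hypothetical table; the exact error
wording is assumed, not quoted from the patch):

create table t (a int primary key);
create publication p for table t where (a > 100), t where (a > 200);
-- now raises an error instead of silently picking one of the two filters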


regards,
Ajin Cherian
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Fri, Nov 12, 2021 at 9:19 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Attaching version 39-

Here are some review comments for v39-0006:

1)
@@ -261,9 +261,9 @@ rowfilter_expr_replident_walker(Node *node, rf_context *context)
  * Rule 1. Walk the parse-tree and reject anything other than very simple
  * expressions (See rowfilter_validator for details on what is permitted).
  *
- * Rule 2. If the publish operation contains "delete" then only columns that
- * are allowed by the REPLICA IDENTITY rules are permitted to be used in the
- * row-filter WHERE clause.
+ * Rule 2. If the publish operation contains "delete" or "delete" then only
+ * columns that are allowed by the REPLICA IDENTITY rules are permitted to
+ * be used in the row-filter WHERE clause.
  */
 static void
 rowfilter_expr_checker(Publication *pub, ParseState *pstate, Node *rfnode, Relation rel)
@@ -276,12 +276,10 @@ rowfilter_expr_checker(Publication *pub, ParseState *pstate, Node *rfnode, Relat
  rowfilter_validator(relname, rfnode);

  /*
- * Rule 2: For "delete", check that filter cols are also valid replica
+ * Rule 2: For "delete" and "update", check that filter cols are also valid replica
  * identity cols.
- *
- * TODO - check later for publish "update" case.
  */
- if (pub->pubactions.pubdelete)

1a)
Typo - the function comment: "delete" or "delete"; should say:
"delete" or "update"

1b)
I felt it would be better (for the comment in the function body) to
write it as "or" instead of "and" because then it matches with the
code "if ||" that follows this comment.

====

2)
@@ -746,6 +780,92 @@ logicalrep_read_typ(StringInfo in, LogicalRepTyp *ltyp)
 }

 /*
+ * Write a tuple to the outputstream using cached slot, in the most efficient format possible.
+ */
+static void
+logicalrep_write_tuple_cached(StringInfo out, Relation rel, TupleTableSlot *slot, bool binary)

The function logicalrep_write_tuple_cached seems to have almost all of
its function body in common with logicalrep_write_tuple. Is there any
good way to combine these functions to avoid ~80 lines mostly
duplicated code?

====

3)
+ if (!old_matched && !new_matched)
+ return false;
+
+ if (old_matched && new_matched)
+ *action = REORDER_BUFFER_CHANGE_UPDATE;
+ else if (old_matched && !new_matched)
+ *action = REORDER_BUFFER_CHANGE_DELETE;
+ else if (new_matched && !old_matched)
+ *action = REORDER_BUFFER_CHANGE_INSERT;
+
+ return true;

I felt it is slightly confusing to have inconsistent ordering of the
old_matched and new_matched in those above conditions.

I suggest to use the order like:
* old-row (no match) new-row (no match)
* old-row (no match) new row (match)
* old-row (match) new-row (no match)
* old-row (match) new row (match)

And then be sure to keep consistent ordering in all places it is mentioned:
* in the code
* in the function header comment
* in the commit comment
* in docs?
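
To make the four combinations concrete, here is a hedged illustration
(assuming a hypothetical publication with row filter a > 10 on table t):

update t set a = 6  where a = 5;   -- old (no match), new (no match): skipped
update t set a = 20 where a = 5;   -- old (no match), new (match): sent as INSERT
update t set a = 5  where a = 20;  -- old (match), new (no match): sent as DELETE
update t set a = 30 where a = 20;  -- old (match), new (match): sent as UPDATE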

====

4)
+/*
+ * Change is checked against the row filter, if any.
+ *
+ * If it returns true, the change is replicated, otherwise, it is not.
+ */
+static bool
+pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot, RelationSyncEntry *entry)
+{
+ EState    *estate;
+ ExprContext *ecxt;
+ bool result = true;
+ Oid         relid = RelationGetRelid(relation);
+
+ /* Bail out if there is no row filter */
+ if (!entry->exprstate)
+ return true;
+
+ elog(DEBUG3, "table \"%s.%s\" has row filter",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid));

It seems like that elog may consume unnecessary CPU most of the time.
I think it might be better to remove the relid declaration and rewrite
that elog as:

if (message_level_is_interesting(DEBUG3))
    elog(DEBUG3, "table \"%s.%s\" has row filter",
            get_namespace_name(get_rel_namespace(entry->relid)),
            get_rel_name(entry->relid));

====

5)
diff --git a/src/include/replication/reorderbuffer.h
b/src/include/replication/reorderbuffer.h
index 5b40ff7..aec0059 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -51,7 +51,7 @@ typedef struct ReorderBufferTupleBuf
  * respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.  Users of
  * logical decoding don't have to care about these.
  */
-enum ReorderBufferChangeType
+typedef enum ReorderBufferChangeType
 {
  REORDER_BUFFER_CHANGE_INSERT,
  REORDER_BUFFER_CHANGE_UPDATE,
@@ -65,7 +65,7 @@ enum ReorderBufferChangeType
  REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
  REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
  REORDER_BUFFER_CHANGE_TRUNCATE
-};
+} ReorderBufferChangeType;

This new typedef can be added to src/tools/pgindent/typedefs.list.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Wednesday, November 10, 2021 7:46 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> On Tue, Nov 9, 2021 at 2:03 PM tanghy.fnst@fujitsu.com
> <tanghy.fnst@fujitsu.com> wrote:
> >
> > On Friday, November 5, 2021 1:14 PM, Peter Smith <smithpb2250@gmail.com>
> wrote:
> > >
> > > PSA new set of v37* patches.
> > >
> >
> > Thanks for your patch. I have a problem when using this patch.
> >
> > The document about "create publication" in patch says:
> >
> >    The <literal>WHERE</literal> clause should contain only columns that are
> >    part of the primary key or be covered  by <literal>REPLICA
> >    IDENTITY</literal> otherwise, <command>DELETE</command> operations will
> not
> >    be replicated.
> >
> > But I tried this patch, the columns which could be contained in WHERE clause
> must be
> > covered by REPLICA IDENTITY, but it doesn't matter if they are part of the
> primary key.
> > (We can see it in Case 4 of publication.sql, too.) So maybe we should modify the
> document.
> >
> 
> PG Docs is changed in v38-0004 [1]. Please check if it is OK.
> 

Thanks, this change looks good to me.

Regards
Tang

RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Friday, November 12, 2021 6:20 PM Ajin Cherian <itsajin@gmail.com> wrote:
> 
> Attaching version 39-
> 

Thanks for the new patch.

I met a problem when using "ALTER PUBLICATION ... SET TABLE ... WHERE ...": the
publisher crashed after executing this statement.

Here is some information about this problem.

Steps to reproduce:
-- publisher
create table t(a int primary key, b int);
create publication pub for table t where (a>5);

-- subscriber
create table t(a int primary key, b int);
create subscription sub connection 'dbname=postgres port=5432' publication pub;

-- publisher
insert into t values (1, 2);
alter publication pub set table t where (a>7);


Publisher log:
2021-11-15 13:36:54.997 CST [3319891] LOG:  logical decoding found consistent point at 0/15208B8
2021-11-15 13:36:54.997 CST [3319891] DETAIL:  There are no running transactions.
2021-11-15 13:36:54.997 CST [3319891] STATEMENT:  START_REPLICATION SLOT "sub" LOGICAL 0/0 (proto_version '3', publication_names '"pub"')

double free or corruption (out)
2021-11-15 13:36:55.072 CST [3319746] LOG:  received fast shutdown request
2021-11-15 13:36:55.073 CST [3319746] LOG:  aborting any active transactions
2021-11-15 13:36:55.105 CST [3319746] LOG:  background worker "logical replication launcher" (PID 3319874) exited with exit code 1

2021-11-15 13:36:55.105 CST [3319869] LOG:  shutting down
2021-11-15 13:36:55.554 CST [3319746] LOG:  server process (PID 3319891) was terminated by signal 6: Aborted
2021-11-15 13:36:55.554 CST [3319746] DETAIL:  Failed process was running: START_REPLICATION SLOT "sub" LOGICAL 0/0 (proto_version '3', publication_names '"pub"')

2021-11-15 13:36:55.554 CST [3319746] LOG:  terminating any other active server processes


Backtrace is attached. I think maybe the problem is related to the below change in 0003 patch:

+            free(entry->exprstate);

Regards
Tang

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Nov 12, 2021 at 3:49 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Attaching version 39-
>
> V39 fixes the following review comments:
>
> On Fri, Nov 5, 2021 at 7:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >  CheckObjSchemaNotAlreadyInPublication(rels, schemaidlist,
> >    PUBLICATIONOBJ_TABLE);
> >
> >I think for the correct merge you need to just call
> >CheckObjSchemaNotAlreadyInPublication() before this for loop. BTW, I
> >have a question regarding this implementation. Here, it has been
> >assumed that the new rel will always be specified with a different
> >qual, what if there is no qual or if the qual is the same?
>
> Actually with this code, no qual or a different qual does not matter,
> it recreates everything as specified by the ALTER SET command.
> I have added CheckObjSchemaNotAlreadyInPublication as you specified since this
> is required to match the schema patch behaviour. I've also added
> a test case that tests this particular case.
>

What I meant was that with this new code we have regressed the old
behavior. Basically, imagine a case where no filter was given for any
of the tables. Then after the patch, we will remove all the old tables
whereas before the patch it will remove the oldrels only when they are
not specified as part of new rels. If you agree with this, then we can
retain the old behavior and for the new tables, we can always override
the where clause for the SET variant of the command.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Fri, Nov 12, 2021 at 3:49 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Attaching version 39-
>

Some comments on 0006

--
 /*
+ * Write UPDATE to the output stream using cached virtual slots.
+ * Cached updates will have both old tuple and new tuple.
+ */
+void
+logicalrep_write_update_cached(StringInfo out, TransactionId xid, Relation rel,
+                TupleTableSlot *oldtuple, TupleTableSlot *newtuple, bool binary)
+{


The function logicalrep_write_update_cached is exactly the same as
logicalrep_write_update, except that it calls logicalrep_write_tuple_cached
instead of logicalrep_write_tuple.  So I don't like the idea of making a
complete duplicate copy; instead we can either keep an if check or
pass logicalrep_write_tuple(_cached) as a function pointer.

--

Looking further, I realized that "logicalrep_write_tuple" and
"logicalrep_write_tuple_cached" are complete duplicates, except that the first
one calls "heap_deform_tuple" and then uses a local values[] array while the
second one uses the slot->values[] array directly. So in fact we can pass
this as a parameter too, or we can use just one if check to populate the
values[] and nulls[] arrays: if it is cached we point directly at
slot->values[], otherwise we call heap_deform_tuple(). I think this should be
just one simple check.
--
+
+/*
+ * Change is checked against the row filter, if any.
+ *
+ * If it returns true, the change is replicated, otherwise, it is not.
+ */
+static bool
+pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot, RelationSyncEntry *entry)

IMHO, the comments should explain how it is different from the
pgoutput_row_filter function.  Also, the comment says "If it returns
true, the change is replicated, otherwise, it is not", which is not
exactly true for this function; based on the return value the caller will
change the action.  So I think it is enough to say what this function
does, without saying what the caller will do based on what it returns.


--

+    for (i = 0; i < desc->natts; i++)
+    {
+        Form_pg_attribute att = TupleDescAttr(desc, i);
+
+        /* if the column in the new_tuple is null, nothing to do */
+        if (tmp_new_slot->tts_isnull[i])
+            continue;

Put some comments over this loop about what it is trying to do, and
overall I think there are not sufficient comments in the
pgoutput_row_filter_update_check function.

--
+        /*
+          * Unchanged toasted replica identity columns are
+          * only detoasted in the old tuple, copy this over to the newtuple.
+          */
+        if ((att->attlen == -1 && VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i])) &&
+                (!old_slot->tts_isnull[i] &&
+                    !(VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))))

Is it ever possible that the attribute is not NULL in the old slot but is
still stored as VARATT_IS_EXTERNAL_ONDISK? I think not, so instead of adding
this last condition to the check, it should be an assertion inside the if
block.

--
 static bool
-pgoutput_row_filter(PGOutputData *data, Relation relation, HeapTuple oldtuple, HeapTuple newtuple, RelationSyncEntry *entry)
+pgoutput_row_filter_update_check(Relation relation, HeapTuple oldtuple, HeapTuple newtuple, RelationSyncEntry *entry, ReorderBufferChangeType *action)
+{

This function definition header is too long to fit in one line, so
better to break it.  I think running will be a good idea.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Nov 15, 2021 at 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Nov 12, 2021 at 3:49 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
>
> This function definition header is too long to fit in one line, so
> better to break it.  I think running will be a good idea.
>

It seems in the last line you are suggesting to run pgindent but it is
not clear as the word 'pgindent' is missing?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Mon, 15 Nov 2021 at 3:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Nov 15, 2021 at 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Nov 12, 2021 at 3:49 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
>
> This function definition header is too long to fit in one line, so
> better to break it.  I think running will be a good idea.
>

It seems in the last line you are suggesting to run pgindent but it is
not clear as the word 'pgindent' is missing?

Yeah I intended to suggest pgindent

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Mon, Nov 15, 2021 at 5:09 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> I met a problem when using "ALTER PUBLICATION ... SET TABLE ... WHERE ...", the
> publisher was crashed after executing this statement.
>
>
>
> Backtrace is attached. I think maybe the problem is related to the below change in 0003 patch:
>
> +                       free(entry->exprstate);
>

I had a look at this crash problem and could reproduce it.

I made the following changes and it seemed to resolve the problem:

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index e7f2fd4bad..f0cb9b8265 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -969,8 +969,6 @@ pgoutput_row_filter_init(PGOutputData *data, Relation relation, RelationSyncEntr
             oldctx = MemoryContextSwitchTo(CacheMemoryContext);
             rfnode = n_filters > 1 ? makeBoolExpr(AND_EXPR, rfnodes, -1) : linitial(rfnodes);
             entry->exprstate = pgoutput_row_filter_init_expr(rfnode);
-
-            list_free(rfnodes);
         }

         entry->rowfilter_valid = true;
@@ -1881,7 +1879,7 @@ rel_sync_cache_relation_cb(Datum arg, Oid relid)
         }
         if (entry->exprstate != NULL)
         {
-            free(entry->exprstate);
+            pfree(entry->exprstate);
             entry->exprstate = NULL;
         }
     }


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

От
Amit Kapila
Дата:
On Wed, Nov 10, 2021 at 12:36 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Nov 8, 2021 at 5:53 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > 3) v37-0005
> >
> > - no parse nodes of any kind other than Var, OpExpr, Const, BoolExpr, FuncExpr
> >
> > I think there could be other node types which can also be considered simple
> > expressions, for example T_NullIfExpr.
>
> The current walker restrictions are from a previously agreed decision
> by Amit/Tomas [1] and from an earlier suggestion from Andres [2] to
> keep everything very simple for a first version.
>
> Yes, you are right, there might be some additional node types that
> might be fine, but at this time I don't want to add anything different
> without getting their approval to do so. Anyway, additions like this
> are all candidates for a future version of this row-filter feature.
>

I think we can consider T_NullIfExpr unless you see any problem with the same.

> >
> > Personally, I think it's natural to only check the IMMUTABLE and
> > whether-user-defined in the new function rowfilter_walker. We can keep the
> > other row-filter errors which were thrown for EXPR_KIND_PUBLICATION_WHERE in
> > the 0001 patch.
> >
>
> YMMV. IMO it is much more convenient for all the filter validations to
> be centralized just in one walker function instead of scattered all
> over the place like they were in the 0001 patch.
>

+1.

Few comments on the latest set of patches (v39*)
=======================================
0001*
1.
 ObjectAddress
-publication_add_relation(Oid pubid, PublicationRelInfo *targetrel,
+publication_add_relation(Oid pubid, PublicationRelInfo *pri,
  bool if_not_exists)
 {
  Relation rel;
  HeapTuple tup;
  Datum values[Natts_pg_publication_rel];
  bool nulls[Natts_pg_publication_rel];
- Oid relid = RelationGetRelid(targetrel->relation);
+ Relation    targetrel = pri->relation;

I don't think such a renaming (targetrel-->pri) is warranted for this
patch. If we really want something like this, I suggest we do it as a
separate patch.

2.
+ * The OptWhereClause (row-filter) must be stored here
+ * but it is valid only for tables. If the ColId was
+ * mistakenly not a table this will be detected later
+ * in preprocess_pubobj_list() and an error thrown.

/error thrown/error is thrown

0003*
3. In pgoutput_row_filter(), the patch is finding pub_relid when it
should already be there in RelationSyncEntry->publish_as_relid found
during get_rel_sync_entry call. Is there a reason to do this work
again?

4. I think we should add some comments in pgoutput_row_filter() as to
why we are caching the row_filter here instead of
get_rel_sync_entry()? That has been discussed multiple times so it is
better to capture that in comments.

5. Why do you need a separate variable rowfilter_valid to indicate
whether a valid row filter exists? Why is exprstate not sufficient?
Can you update comments to indicate why we need this variable
separately?

0004*
6. In rowfilter_expr_checker(), the expression tree is traversed
twice; can't we traverse it once to detect all the non-allowed stuff? It
can sometimes be costly to traverse the tree multiple times, especially
when the expression is complex, and it doesn't seem acceptable to do so
unless there is some genuine reason for it.

7.
+static void
+rowfilter_expr_checker(Publication *pub, Node *rfnode, Relation rel)

Keep the rel argument before whereclause as that makes the function
signature better.


With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Fri, Nov 12, 2021 at 9:19 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Attaching version 39-
>

Thanks for the updated patch.
Some review comments:

doc/src/sgml/ref/create_publication.sgml
(1) improve comment
+ /* Set up a pstate to parse with */

"pstate" is the variable name, better to use "ParseState".

src/test/subscription/t/025_row_filter.pl
(2) rename TAP test 025 to 026
I suggest that the t/025_row_filter.pl TAP test should be renamed to
026 now because 025 is being used by some schema TAP test.

(3) whitespace errors
The 0006 patch applies with several whitespace errors.

(4) fix crash
The pgoutput.c patch that I previously posted on this thread needs to
be applied to fix the coredump issue reported by Tang-san.
While that fixes the crash, I haven't tracked through to see
where/whether the expression nodes are actually freed or whether now
there is a possible memory leak issue that may need further
investigation.


Regards,
Greg Nancarrow



RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Friday, November 12, 2021 6:20 PM Ajin Cherian <itsajin@gmail.com> wrote:
> 
> Attaching version 39-
>

I met another problem when filtering with the operator '~'.
Data isn't replicated as expected.

For example:
-- publisher
create table t (a text primary key);
create publication pub for table t where (a ~ 'aaa');

-- subscriber
create table t (a text primary key);
create subscription sub connection 'port=5432' publication pub;

-- publisher
insert into t values ('aaaaab');
insert into t values ('aaaaabc');
postgres=# select * from t where (a ~ 'aaa');
    a
---------
 aaaaab
 aaaaabc
(2 rows)

-- subscriber
postgres=# select * from t;
   a
--------
 aaaaab
(1 row)

The second record can’t be replicated.

By the way, when only applied 0001 patch, I couldn't reproduce this bug.
So, I think it was related to the later patches.

Regards
Tang

Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Mon, Nov 15, 2021 at 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Nov 12, 2021 at 3:49 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Attaching version 39-

I have reviewed, 0001* and I have a few comments on it

---
>If you choose to do the initial table synchronization, only data that satisfies
>the row filters is sent.

I think this comment is not correct; the correct statement
would be "only data that satisfies the row filters is pulled by the
subscriber"

---

---
+   The <literal>WHERE</literal> clause should contain only columns that are
+   part of the primary key or be covered  by <literal>REPLICA
+   IDENTITY</literal> otherwise, <command>DELETE</command> operations will not
+   be replicated. That's because old row is used and it only contains primary
+   key or columns that are part of the <literal>REPLICA IDENTITY</literal>; the
+   remaining columns are <literal>NULL</literal>. For <command>INSERT</command>
+   and <command>UPDATE</command> operations, any column might be used in the
+   <literal>WHERE</literal> clause.

I think this message is not correct, because for UPDATE we also cannot
have filters on non-key attributes, right?  Even with the first patch,
if there are non-updated, non-key toast columns we cannot apply filters
on those.  So this comment seems misleading to me.

---

-    Oid            relid = RelationGetRelid(targetrel->relation);
..
+    relid = RelationGetRelid(targetrel);
+

Why is this change required? I mean, instead of fetching the relid
in the variable declaration, why do we need to do it separately
now?

---

+    if (expr == NULL)
+        ereport(ERROR,
+                (errcode(ERRCODE_CANNOT_COERCE),
+                 errmsg("row filter returns type %s that cannot be
coerced to the expected type %s",

Instead of "coerced to" can we use "cast to"?  That will be in sync
with other simmilar kind od user exposed error message.
----

+static ExprState *
+pgoutput_row_filter_init_expr(Node *rfnode)
+{
.....
+    /*
+     * Cache ExprState using CacheMemoryContext. This is the same code as
+     * ExecPrepareExpr() but that is not used because it doesn't use an EState.
+     * It should probably be another function in the executor to handle the
+     * execution outside a normal Plan tree context.
+     */
+    oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+    expr = expression_planner(expr);
+    exprstate = ExecInitExpr(expr, NULL);
+    MemoryContextSwitchTo(oldctx);
+
+    return exprstate;
+}

I can see the caller of this function is already switching to
CacheMemoryContext, so what is the point in doing it again here?
Maybe, if the caller is expected to do so, we can Assert on the
CurrentMemoryContext.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Tue, Nov 16, 2021 at 7:33 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> The second record can’t be replicated.
>
> By the way, when only applied 0001 patch, I couldn't reproduce this bug.
> So, I think it was related to the later patches.
>

The problem seems to be caused by the 0006 patch (when I remove that
patch, the problem doesn't occur).
Still needs investigation.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Tue, Nov 16, 2021 at 7:33 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> On Friday, November 12, 2021 6:20 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Attaching version 39-
> >
>
> I met another problem when filtering out with the operator '~'.
> Data can't be replicated as expected.
>
> For example:
> -- publisher
> create table t (a text primary key);
> create publication pub for table t where (a ~ 'aaa');
>
> -- subscriber
> create table t (a text primary key);
> create subscription sub connection 'port=5432' publication pub;
>
> -- publisher
> insert into t values ('aaaaab');
> insert into t values ('aaaaabc');
> postgres=# select * from t where (a ~ 'aaa');
>     a
> ---------
>  aaaaab
>  aaaaabc
> (2 rows)
>
> -- subscriber
> postgres=# select * from t;
>    a
> --------
>  aaaaab
> (1 row)
>
> The second record can’t be replicated.
>
> By the way, when only applied 0001 patch, I couldn't reproduce this bug.
> So, I think it was related to the later patches.
>

I found that the problem was caused by allocating the WHERE clause
expression nodes in the wrong memory context (so they'd end up getting
freed after first-time use).

The following additions are needed in pgoutput_row_filter_init()  - patch 0005.

+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
  rfnode = stringToNode(TextDatumGetCString(rfdatum));
  rfnodes = lappend(rfnodes, rfnode);
+ MemoryContextSwitchTo(oldctx);

(these changes are needed in addition to the fixes I posted on this
thread for the crash problem that was previously reported)


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
PSA new set of v40* patches.

This addresses multiple review comments as follows:

v40-0001 = the "main" patch
- not changed

v40-0002 = tab auto-complete.
- not changed

v40-0003 = cache updates.
- fix memory bug reported by Tang, using Greg's fix [Tang 15/11]
- fix unnecessary publish_as_relid code [Amit 15/11] #3
- add more comments about delayed caching [Amit 15/11] #4
- update comment for rowfilter_valid [Amit 15/11] #5
- fix regex bug reported by Tang, using Greg's fix [Tang 16/11]

v40-0004 = combine using OR instead of AND
- this is a new patch
- new behavior. multiple filters now combine by OR instead of AND
[Tomas 23/9] #3

v40-0005 = filter validation replica identity.
- previously this was v39-0004
- rearrange args for rowfilter_expr_checker [Amit 15/11] #7

v40-0006 = filter validation walker.
- previously this was v39-0005
- now allows NULLIF [Houz 8/11] #3

v40-0007 = support old/new tuple logic for row-filters.
- previously this was v39-0006
- fix typos [Peter 15/11] #1
- function logicalrep_write_tuple_cached use more common code [Peter
15/11] #2, [Dilip 15/11] #1
- make order of old/new consistent [Peter 15/11] #3
- guard elog to be more efficient [Peter 15/11] #4
- update typedefs.list [Peter 15/11] #5
- update comment for pgoutput_row_filter_virtual function [Dilip 15/11] #2
- add more comments in pgoutput_row_filter_update_check [Dilip 15/11] #3
- add assertion [Dilip 15/11] #4

------
[Tomas 23/9] https://www.postgresql.org/message-id/574b4e78-2f35-acf3-4bdc-4b872582e739%40enterprisedb.com
[Houz 8/11]
https://www.postgresql.org/message-id/OS0PR01MB571625D4A5CC1DAB4045B2BB94919%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[Tang 15/11]
https://www.postgresql.org/message-id/OS0PR01MB61138751816E2BF9A0BD6EC9FB989%40OS0PR01MB6113.jpnprd01.prod.outlook.com
[Amit 15/11]
https://www.postgresql.org/message-id/CAA4eK1L4ddTpc%3D-3bq%3D%3DU8O-BJ%3DsvkAFefRDpATKCG4hKYKAig%40mail.gmail.com
[Tang 16/11]
https://www.postgresql.org/message-id/OS0PR01MB61132C0E4FFEE73D34AE9823FB999%40OS0PR01MB6113.jpnprd01.prod.outlook.com
[Peter 15/11]
https://www.postgresql.org/message-id/CAHut%2BPsZ2xsRZw4AyRQuLfO4gYiqCpNVNDRbv_RN1XUUo3KWsw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Sep 23, 2021 at 10:33 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>

> 3) create_subscription.sgml
>
>      <literal>WHERE</literal> clauses, rows must satisfy all expressions
>      to be copied. If the subscriber is a
>
> I'm rather skeptical about the principle that all expressions have to
> match - I'd have expected exactly the opposite behavior, actually.
>
> I see a subscription as "a union of all publications". Imagine for
> example you have a data set for all customers, and you create a
> publication for different parts of the world, like
>
>    CREATE PUBLICATION customers_france
>       FOR TABLE customers WHERE (country = 'France');
>
>    CREATE PUBLICATION customers_germany
>       FOR TABLE customers WHERE (country = 'Germany');
>
>    CREATE PUBLICATION customers_usa
>       FOR TABLE customers WHERE (country = 'USA');
>
> and now you want to subscribe to multiple publications, because you want
> to replicate data for multiple countries (e.g. you want EU countries).
> But if you do
>
>    CREATE SUBSCRIPTION customers_eu
>           PUBLICATION customers_france, customers_germany;
>
> then you won't get anything, because each customer belongs to just a
> single country. Yes, I could create multiple individual subscriptions,
> one for each country, but that's inefficient and may have a different
> set of issues (e.g. keeping them in sync when a customer moves between
> countries).
>
> I might have missed something, but I haven't found any explanation why
> the requirement to satisfy all expressions is the right choice.
>
> IMHO this should be 'satisfies at least one expression' i.e. we should
> connect the expressions by OR, not AND.
>

Fixed in V40 [1]

-----
[1] https://www.postgresql.org/message-id/CAHut%2BPv-D4rQseRO_OzfEz2dQsTKEnKjBCET9Z-iJppyT1XNMQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 10, 2021 at 12:36 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Mon, Nov 8, 2021 at 5:53 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > 3) v37-0005
> > >
> > > - no parse nodes of any kind other than Var, OpExpr, Const, BoolExpr, FuncExpr
> > >
> > > I think there could be other node types which can also be considered as simple
> > > expressions, for example T_NullIfExpr.
> >
> > The current walker restrictions are from a previously agreed decision
> > by Amit/Tomas [1] and from an earlier suggestion from Andres [2] to
> > keep everything very simple for a first version.
> >
> > Yes, you are right, there might be some additional node types that
> > might be fine, but at this time I don't want to add anything different
> > without getting their approval to do so. Anyway, additions like this
> > are all candidates for a future version of this row-filter feature.
> >
>
> I think we can consider T_NullIfExpr unless you see any problem with the same.

Added in v40 [1]

> Few comments on the latest set of patches (v39*)
> =======================================
...
> 0003*
> 3. In pgoutput_row_filter(), the patch is finding pub_relid when it
> should already be there in RelationSyncEntry->publish_as_relid found
> during get_rel_sync_entry call. Is there a reason to do this work
> again?

Fixed in v40 [1]

>
> 4. I think we should add some comments in pgoutput_row_filter() as to
> why we are caching the row_filter here instead of
> get_rel_sync_entry()? That has been discussed multiple times so it is
> better to capture that in comments.

Added comment in v40 [1]

>
> 5. Why do you need a separate variable rowfilter_valid to indicate
> whether a valid row filter exists? Why exprstate is not sufficient?
> Can you update comments to indicate why we need this variable
> separately?

I have improved the (existing) comment in v40 [1].

>
> 0004*
> 6. In rowfilter_expr_checker(), the expression tree is traversed
> twice, can't we traverse it once to detect all non-allowed stuff? It
> can be sometimes costly to traverse the tree multiple times especially
> when the expression is complex and it doesn't seem acceptable to do so
> unless there is some genuine reason for the same.

I kind of doubt there would be any perceptible difference for 2
traversals instead of 1 because:
a) filters are limited to simple expressions. Yes, a large boolean
expression is possible but I don't think it is likely.
b) the validation part is mostly a one-time execution only when the
filter is created or changed.

Anyway, I am happy to try to refactor the logic to a single traversal
as suggested, but I'd like to combine those "validation" patches
(v40-0005, v40-0006) first, so I can combine their walker logic. Is it
OK?

>
> 7.
> +static void
> +rowfilter_expr_checker(Publication *pub, Node *rfnode, Relation rel)
>
> Keep the rel argument before whereclause as that makes the function
> signature better.

Fixed in v40 [1]

-----
[1] https://www.postgresql.org/message-id/CAHut%2BPv-D4rQseRO_OzfEz2dQsTKEnKjBCET9Z-iJppyT1XNMQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, Nov 15, 2021 at 5:09 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> On Friday, November 12, 2021 6:20 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Attaching version 39-
> >
>
> Thanks for the new patch.
>
> I met a problem when using "ALTER PUBLICATION ... SET TABLE ... WHERE ...", the
> publisher was crashed after executing this statement.
>
> Here is some information about this problem.
>
> Steps to reproduce:
> -- publisher
> create table t(a int primary key, b int);
> create publication pub for table t where (a>5);
>
> -- subscriber
> create table t(a int primary key, b int);
> create subscription sub connection 'dbname=postgres port=5432' publication pub;
>
> -- publisher
> insert into t values (1, 2);
> alter publication pub set table t where (a>7);
>
>
> Publisher log:
> 2021-11-15 13:36:54.997 CST [3319891] LOG:  logical decoding found consistent point at 0/15208B8
> 2021-11-15 13:36:54.997 CST [3319891] DETAIL:  There are no running transactions.
> 2021-11-15 13:36:54.997 CST [3319891] STATEMENT:  START_REPLICATION SLOT "sub" LOGICAL 0/0 (proto_version '3', publication_names '"pub"')
> double free or corruption (out)
> 2021-11-15 13:36:55.072 CST [3319746] LOG:  received fast shutdown request
> 2021-11-15 13:36:55.073 CST [3319746] LOG:  aborting any active transactions
> 2021-11-15 13:36:55.105 CST [3319746] LOG:  background worker "logical replication launcher" (PID 3319874) exited with exit code 1
> 2021-11-15 13:36:55.105 CST [3319869] LOG:  shutting down
> 2021-11-15 13:36:55.554 CST [3319746] LOG:  server process (PID 3319891) was terminated by signal 6: Aborted
> 2021-11-15 13:36:55.554 CST [3319746] DETAIL:  Failed process was running: START_REPLICATION SLOT "sub" LOGICAL 0/0 (proto_version '3', publication_names '"pub"')
> 2021-11-15 13:36:55.554 CST [3319746] LOG:  terminating any other active server processes
>
>
> Backtrace is attached. I think maybe the problem is related to the below change in 0003 patch:
>
> +                       free(entry->exprstate);
>

Fixed in V40 [1] using a fix provided by Greg Nancarrow.

-----
[1] https://www.postgresql.org/message-id/CAHut%2BPv-D4rQseRO_OzfEz2dQsTKEnKjBCET9Z-iJppyT1XNMQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Nov 16, 2021 at 7:33 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> On Friday, November 12, 2021 6:20 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Attaching version 39-
> >
>
> I met another problem when filtering out with the operator '~'.
> Data can't be replicated as expected.
>
> For example:
> -- publisher
> create table t (a text primary key);
> create publication pub for table t where (a ~ 'aaa');
>
> -- subscriber
> create table t (a text primary key);
> create subscription sub connection 'port=5432' publication pub;
>
> -- publisher
> insert into t values ('aaaaab');
> insert into t values ('aaaaabc');
> postgres=# select * from t where (a ~ 'aaa');
>     a
> ---------
>  aaaaab
>  aaaaabc
> (2 rows)
>
> -- subscriber
> postgres=# select * from t;
>    a
> --------
>  aaaaab
> (1 row)
>
> The second record can’t be replicated.
>
> By the way, when only applied 0001 patch, I couldn't reproduce this bug.
> So, I think it was related to the later patches.
>

Fixed in V40-0003 [1] using a fix provided by Greg Nancarrow.

-----
[1] https://www.postgresql.org/message-id/CAHut%2BPv-D4rQseRO_OzfEz2dQsTKEnKjBCET9Z-iJppyT1XNMQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, Nov 15, 2021 at 12:01 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Nov 12, 2021 at 9:19 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Attaching version 39-
>
> Here are some review comments for v39-0006:
>
> 1)
> @@ -261,9 +261,9 @@ rowfilter_expr_replident_walker(Node *node,
> rf_context *context)
>   * Rule 1. Walk the parse-tree and reject anything other than very simple
>   * expressions (See rowfilter_validator for details on what is permitted).
>   *
> - * Rule 2. If the publish operation contains "delete" then only columns that
> - * are allowed by the REPLICA IDENTITY rules are permitted to be used in the
> - * row-filter WHERE clause.
> + * Rule 2. If the publish operation contains "delete" or "delete" then only
> + * columns that are allowed by the REPLICA IDENTITY rules are permitted to
> + * be used in the row-filter WHERE clause.
>   */
>  static void
>  rowfilter_expr_checker(Publication *pub, ParseState *pstate, Node
> *rfnode, Relation rel)
> @@ -276,12 +276,10 @@ rowfilter_expr_checker(Publication *pub,
> ParseState *pstate, Node *rfnode, Relat
>   rowfilter_validator(relname, rfnode);
>
>   /*
> - * Rule 2: For "delete", check that filter cols are also valid replica
> + * Rule 2: For "delete" and "update", check that filter cols are also
> valid replica
>   * identity cols.
> - *
> - * TODO - check later for publish "update" case.
>   */
> - if (pub->pubactions.pubdelete)
>
> 1a)
> Typo - the function comment: "delete" or "delete"; should say:
> "delete" or "update"
>
> 1b)
> I felt it would be better (for the comment in the function body) to
> write it as "or" instead of "and" because then it matches with the
> code "if ||" that follows this comment.
>
> ====
>
> 2)
> @@ -746,6 +780,92 @@ logicalrep_read_typ(StringInfo in, LogicalRepTyp *ltyp)
>  }
>
>  /*
> + * Write a tuple to the outputstream using cached slot, in the most
> efficient format possible.
> + */
> +static void
> +logicalrep_write_tuple_cached(StringInfo out, Relation rel,
> TupleTableSlot *slot, bool binary)
>
> The function logicalrep_write_tuple_cached seems to have almost all of
> its function body in common with logicalrep_write_tuple. Is there any
> good way to combine these functions to avoid ~80 lines mostly
> duplicated code?
>
> ====
>
> 3)
> + if (!old_matched && !new_matched)
> + return false;
> +
> + if (old_matched && new_matched)
> + *action = REORDER_BUFFER_CHANGE_UPDATE;
> + else if (old_matched && !new_matched)
> + *action = REORDER_BUFFER_CHANGE_DELETE;
> + else if (new_matched && !old_matched)
> + *action = REORDER_BUFFER_CHANGE_INSERT;
> +
> + return true;
>
> I felt it is slightly confusing to have inconsistent ordering of the
> old_matched and new_matched in those above conditions.
>
> I suggest to use the order like:
> * old-row (no match) new-row (no match)
> * old-row (no match) new row (match)
> * old-row (match) new-row (no match)
> * old-row (match) new row (match)
>
> And then be sure to keep consistent ordering in all places it is mentioned:
> * in the code
> * in the function header comment
> * in the commit comment
> * in docs?
>
> ====
>
> 4)
> +/*
> + * Change is checked against the row filter, if any.
> + *
> + * If it returns true, the change is replicated, otherwise, it is not.
> + */
> +static bool
> +pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot,
> RelationSyncEntry *entry)
> +{
> + EState    *estate;
> + ExprContext *ecxt;
> + bool result = true;
> + Oid         relid = RelationGetRelid(relation);
> +
> + /* Bail out if there is no row filter */
> + if (!entry->exprstate)
> + return true;
> +
> + elog(DEBUG3, "table \"%s.%s\" has row filter",
> + get_namespace_name(get_rel_namespace(relid)),
> + get_rel_name(relid));
>
> It seems like that elog may consume unnecessary CPU most of the time.
> I think it might be better to remove the relid declaration and rewrite
> that elog as:
>
> if (message_level_is_interesting(DEBUG3))
>     elog(DEBUG3, "table \"%s.%s\" has row filter",
>             get_namespace_name(get_rel_namespace(entry->relid)),
>             get_rel_name(entry->relid));
>
> ====
>
> 5)
> diff --git a/src/include/replication/reorderbuffer.h
> b/src/include/replication/reorderbuffer.h
> index 5b40ff7..aec0059 100644
> --- a/src/include/replication/reorderbuffer.h
> +++ b/src/include/replication/reorderbuffer.h
> @@ -51,7 +51,7 @@ typedef struct ReorderBufferTupleBuf
>   * respectively.  They're used by INSERT .. ON CONFLICT .. UPDATE.  Users of
>   * logical decoding don't have to care about these.
>   */
> -enum ReorderBufferChangeType
> +typedef enum ReorderBufferChangeType
>  {
>   REORDER_BUFFER_CHANGE_INSERT,
>   REORDER_BUFFER_CHANGE_UPDATE,
> @@ -65,7 +65,7 @@ enum ReorderBufferChangeType
>   REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
>   REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
>   REORDER_BUFFER_CHANGE_TRUNCATE
> -};
> +} ReorderBufferChangeType;
>
> This new typedef can be added to src/tools/pgindent/typedefs.list.
>

All above are fixed by Ajin Cherian in V40-0006 [1].
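For reference, the old/new matching transform discussed in comment 3 maps the
filter results onto the published action roughly as below (a sketch only, with
the suggested consistent ordering; the function name is illustrative, the enum
values are from the patch):

static bool
row_filter_transform_update(bool old_matched, bool new_matched,
                            ReorderBufferChangeType *action)
{
    if (!old_matched && !new_matched)
        return false;                           /* old (no match), new (no match): skip */
    else if (!old_matched && new_matched)
        *action = REORDER_BUFFER_CHANGE_INSERT; /* row has entered the filtered set */
    else if (old_matched && !new_matched)
        *action = REORDER_BUFFER_CHANGE_DELETE; /* row has left the filtered set */
    else
        *action = REORDER_BUFFER_CHANGE_UPDATE; /* row stays in the filtered set */

    return true;
}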

-----
[1] https://www.postgresql.org/message-id/CAHut%2BPv-D4rQseRO_OzfEz2dQsTKEnKjBCET9Z-iJppyT1XNMQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, Nov 15, 2021 at 8:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Nov 12, 2021 at 3:49 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Attaching version 39-
> >
>
> Some comments on 0006
>
> --
...
> --
>
> Looking further, I realized that "logicalrep_write_tuple" and
> "logicalrep_write_tuple_cached" are completely duplicated except that the
> first one calls "heap_deform_tuple" and then uses a local values[] array,
> while the second one directly uses the slot->values[] array. So in fact we
> can pass this as a parameter too, or we can put in just one if-check to
> populate the values[] and nulls arrays: if it is cached we point directly
> to slot->values[], otherwise we call heap_deform_tuple(). I think this
> should be just one simple check.

Fixed in v40 [1]
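
Presumably the shape of that fix is something like the following (a sketch
only; the surrounding variable names are assumed):

    Datum      *values;
    bool       *isnull;

    if (slot != NULL)
    {
        /* cached slot: the deformed arrays are already populated */
        values = slot->tts_values;
        isnull = slot->tts_isnull;
    }
    else
    {
        /* no cached slot: deform the heap tuple into local arrays */
        values = (Datum *) palloc(desc->natts * sizeof(Datum));
        isnull = (bool *) palloc(desc->natts * sizeof(bool));
        heap_deform_tuple(tuple, desc, values, isnull);
    }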

> --
> +
> +/*
> + * Change is checked against the row filter, if any.
> + *
> + * If it returns true, the change is replicated, otherwise, it is not.
> + */
> +static bool
> +pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot,
> RelationSyncEntry *entry)
>
> IMHO, the comments should explain how it is different from the
> pgoutput_row_filter function.  Also comments are saying "If it returns
> true, the change is replicated, otherwise, it is not" which is not
> exactly true for this function, I mean based on that the caller will
> change the action.  So I think it is enough to say what this function
> is doing but not required to say what the caller will do based on what
> this function returns.

Fixed in v40 [1].

>
>
> --
>
> +    for (i = 0; i < desc->natts; i++)
> +    {
> +        Form_pg_attribute att = TupleDescAttr(desc, i);
> +
> +        /* if the column in the new_tuple is null, nothing to do */
> +        if (tmp_new_slot->tts_isnull[i])
> +            continue;
>
> Put some comments over this loop about what it is trying to do, and
> overall I think there are not sufficient comments in the
> pgoutput_row_filter_update_check function.

Fixed in v40 [1].

>
> --
> +        /*
> +          * Unchanged toasted replica identity columns are
> +          * only detoasted in the old tuple, copy this over to the newtuple.
> +          */
> +        if ((att->attlen == -1 &&
> VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i])) &&
> +                (!old_slot->tts_isnull[i] &&
> +                    !(VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))))
>
> Is it ever possible that if the attribute is not NULL in the old slot
> still it is stored as VARATT_IS_EXTERNAL_ONDISK? I think no, so
> instead of adding
> this last condition in check it should be asserted inside the if check.
>

Fixed in v40 [1]

-----
[1] https://www.postgresql.org/message-id/CAHut%2BPv-D4rQseRO_OzfEz2dQsTKEnKjBCET9Z-iJppyT1XNMQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Thu, Nov 18, 2021 at 12:33 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> PSA new set of v40* patches.
>

Thanks for the patch updates.

A couple of comments so far:

(1) compilation warning
With the patches applied, there's a single compilation warning when
Postgres is built:

pgoutput.c: In function ‘pgoutput_row_filter_init’:
pgoutput.c:854:8: warning: unused variable ‘relid’ [-Wunused-variable]
  Oid   relid = RelationGetRelid(relation);
        ^~~~~

> v40-0004 = combine using OR instead of AND
> - this is a new patch
> - new behavior. multiple filters now combine by OR instead of AND
> [Tomas 23/9] #3
>

(2) missing test case
It seems that the current tests are not testing the
multiple-row-filter case (n_filters > 1) in the following code in
pgoutput_row_filter_init():

    rfnode = n_filters > 1 ? makeBoolExpr(OR_EXPR, rfnodes, -1) :
linitial(rfnodes);

I think a test needs to be added similar to the customers+countries
example that Tomas gave (where there is a single subscription to
multiple publications of the same table, each of which has a
row-filter).


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Nov 18, 2021 at 11:02 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 5. Why do you need a separate variable rowfilter_valid to indicate
> > whether a valid row filter exists? Why exprstate is not sufficient?
> > Can you update comments to indicate why we need this variable
> > separately?
>
> I have improved the (existing) comment in v40 [1].
>
> >
> > 0004*
> > 6. In rowfilter_expr_checker(), the expression tree is traversed
> > twice, can't we traverse it once to detect all non-allowed stuff? It
> > can be sometimes costly to traverse the tree multiple times especially
> > when the expression is complex and it doesn't seem acceptable to do so
> > unless there is some genuine reason for the same.
>
> I kind of doubt there would be any perceptible difference for 2
> traversals instead of 1 because:
> a) filters are limited to simple expressions. Yes, a large boolean
> expression is possible but I don't think it is likely.
>

But in such cases it will be quite costly, and more importantly, I
don't see any good reason why we need to traverse it twice.

> b) the validation part is mostly a one-time execution only when the
> filter is created or changed.
>
> Anyway, I am happy to try to refactor the logic to a single traversal
> as suggested, but I'd like to combine those "validation" patches
> (v40-0005, v40-0006) first, so I can combine their walker logic. Is it
> OK?
>

That should be okay. You can combine the logic of v40-0005 and
v40-0006, and then change it so that you need to traverse the
expression once.
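
(For illustration, a combined single-pass walker could look roughly like the
following; a sketch only, using the rf_context members from v40 and eliding
the FuncExpr/immutable-function handling:)

static bool
rowfilter_walker(Node *node, rf_context *context)
{
    if (node == NULL)
        return false;

    if (IsA(node, Var))
    {
        /* Rule 2: filter columns must be replica-identity columns. */
        AttrNumber  attnum = ((Var *) node)->varattno;

        if (context->check_replident &&
            !bms_is_member(attnum - FirstLowInvalidHeapAttributeNumber,
                           context->bms_replident))
            elog(ERROR, "row filter column is not part of the REPLICA IDENTITY");
    }
    else if (!IsA(node, Const) && !IsA(node, OpExpr) &&
             !IsA(node, BoolExpr) && !IsA(node, NullIfExpr))
    {
        /* Rule 1: reject anything other than simple expressions. */
        elog(ERROR, "invalid publication WHERE expression");
    }

    /* One recursive descent validates both rules in a single pass. */
    return expression_tree_walker(node, rowfilter_walker, (void *) context);
}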

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Nov 18, 2021 at 4:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Nov 10, 2021 at 12:36 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Mon, Nov 8, 2021 at 5:53 PM houzj.fnst@fujitsu.com
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > 3) v37-0005
> > > >
> > > > - no parse nodes of any kind other than Var, OpExpr, Const, BoolExpr, FuncExpr
> > > >
> > > > I think there could be other node types which can also be considered as simple
> > > > expressions, for example T_NullIfExpr.
> > >
> > > The current walker restrictions are from a previously agreed decision
> > > by Amit/Tomas [1] and from an earlier suggestion from Andres [2] to
> > > keep everything very simple for a first version.
> > >
> > > Yes, you are right, there might be some additional node types that
> > > might be fine, but at this time I don't want to add anything different
> > > without getting their approval to do so. Anyway, additions like this
> > > are all candidates for a future version of this row-filter feature.
> > >
> >
> > I think we can consider T_NullIfExpr unless you see any problem with the same.
>
> Added in v40 [1]
>

I've noticed that row-filters that are testing NULL cannot pass the
current expression validation restrictions.

e.g.1
test_pub=# create publication ptest for table t1 where (a is null);
ERROR:  invalid publication WHERE expression for relation "t1"
HINT:  only simple expressions using columns, constants and immutable
system functions are allowed

e.g.2
test_pub=# create publication ptest for table t1 where (a is not null);
ERROR:  invalid publication WHERE expression for relation "t1"
HINT:  only simple expressions using columns, constants and immutable
system functions are allowed

So I think it would be useful to permit the NullTest also. Is it OK?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
...
>
> Few comments on the latest set of patches (v39*)
> =======================================
> 0001*
> 1.
>  ObjectAddress
> -publication_add_relation(Oid pubid, PublicationRelInfo *targetrel,
> +publication_add_relation(Oid pubid, PublicationRelInfo *pri,
>   bool if_not_exists)
>  {
>   Relation rel;
>   HeapTuple tup;
>   Datum values[Natts_pg_publication_rel];
>   bool nulls[Natts_pg_publication_rel];
> - Oid relid = RelationGetRelid(targetrel->relation);
> + Relation    targetrel = pri->relation;
>
> I don't think such a renaming (targetrel-->pri) is warranted for this
> patch. If we really want something like this, we can probably do it in
> a separate patch but I suggest we can do that as a separate patch.
>

The name "targetrel" implies it is a Relation. (and historically, this
arg once was "Relation *targetrel").

Then when the PublicationRelInfo struct was introduced the arg name
was not changed and it became "PublicationRelInfo *targetrel". But at
that time PublicationRelInfo was just a simple wrapper for a Relation
so that was probably ok.

But now this Row-Filter patch has added more new members to
PublicationRelInfo, so IMO the name change is helpful otherwise it
seems misleading to continue calling it like it was still just a
Relation.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Nov 19, 2021 at 3:16 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Nov 18, 2021 at 4:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Nov 10, 2021 at 12:36 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 8, 2021 at 5:53 PM houzj.fnst@fujitsu.com
> > > > <houzj.fnst@fujitsu.com> wrote:
> > > > >
> > > > > 3) v37-0005
> > > > >
> > > > > - no parse nodes of any kind other than Var, OpExpr, Const, BoolExpr, FuncExpr
> > > > >
> > > > > I think there could be other node types which can also be considered as simple
> > > > > expressions, for example T_NullIfExpr.
> > > >
> > > > The current walker restrictions are from a previously agreed decision
> > > > by Amit/Tomas [1] and from an earlier suggestion from Andres [2] to
> > > > keep everything very simple for a first version.
> > > >
> > > > Yes, you are right, there might be some additional node types that
> > > > might be fine, but at this time I don't want to add anything different
> > > > without getting their approval to do so. Anyway, additions like this
> > > > are all candidates for a future version of this row-filter feature.
> > > >
> > >
> > > I think we can consider T_NullIfExpr unless you see any problem with the same.
> >
> > Added in v40 [1]
> >
>
> I've noticed that row-filters that are testing NULL cannot pass the
> current expression validation restrictions.
>
> e.g.1
> test_pub=# create publication ptest for table t1 where (a is null);
> ERROR:  invalid publication WHERE expression for relation "t1"
> HINT:  only simple expressions using columns, constants and immutable
> system functions are allowed
>
> e.g.2
> test_pub=# create publication ptest for table t1 where (a is not null);
> ERROR:  invalid publication WHERE expression for relation "t1"
> HINT:  only simple expressions using columns, constants and immutable
> system functions are allowed
>
> So I think it would be useful to permit the NullTest also. Is it OK?
>

Yeah, I think such simple expressions should be okay but we need to
test left-side expressions for simplicity.
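
(For example, the walker's whitelist could simply grow a NullTest case;
because the walker recurses into the NullTest argument, the left side still
gets the same simplicity checks. A sketch, with an invented helper name:)

/* Sketch: the set of node types accepted as "simple" by the walker. */
static bool
rowfilter_node_is_simple(Node *node)
{
    return IsA(node, Var) || IsA(node, Const) || IsA(node, OpExpr) ||
           IsA(node, BoolExpr) || IsA(node, NullIfExpr) ||
           IsA(node, NullTest);
}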

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Nov 19, 2021 at 5:35 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> ...
> >
> > Few comments on the latest set of patches (v39*)
> > =======================================
> > 0001*
> > 1.
> >  ObjectAddress
> > -publication_add_relation(Oid pubid, PublicationRelInfo *targetrel,
> > +publication_add_relation(Oid pubid, PublicationRelInfo *pri,
> >   bool if_not_exists)
> >  {
> >   Relation rel;
> >   HeapTuple tup;
> >   Datum values[Natts_pg_publication_rel];
> >   bool nulls[Natts_pg_publication_rel];
> > - Oid relid = RelationGetRelid(targetrel->relation);
> > + Relation    targetrel = pri->relation;
> >
> > I don't think such a renaming (targetrel-->pri) is warranted for this
> > patch. If we really want something like this, we can probably do it in
> > a separate patch but I suggest we can do that as a separate patch.
> >
>
> The name "targetrel" implies it is a Relation. (and historically, this
> arg once was "Relation *targetrel").
>
> Then when the PublicationRelInfo struct was introduced the arg name
> was not changed and it became "PublicationRelInfo *targetrel". But at
> that time PublicationRelInfo was just a simple wrapper for a Relation
> so that was probably ok.
>
> But now this Row-Filter patch has added more new members to
> PublicationRelInfo, so IMO the name change is helpful otherwise it
> seems misleading to continue calling it like it was still just a
> Relation.
>

Okay, that sounds reasonable.


-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Thu, Nov 18, 2021 at 12:33 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> PSA new set of v40* patches.
>

I notice that in the 0001 patch, it adds a "relid" member to the
PublicationRelInfo struct:

src/include/catalog/pg_publication.h

 typedef struct PublicationRelInfo
 {
+  Oid relid;
    Relation relation;
+  Node     *whereClause;
 } PublicationRelInfo;

It appears that this new member is not actually required, as the relid
can be simply obtained from the existing "relation" member - using the
RelationGetRelid() macro.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Fri, Nov 19, 2021 at 4:15 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Thu, Nov 18, 2021 at 12:33 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > PSA new set of v40* patches.
> >
>
> I notice that in the 0001 patch, it adds a "relid" member to the
> PublicationRelInfo struct:
>
> src/include/catalog/pg_publication.h
>
>  typedef struct PublicationRelInfo
>  {
> +  Oid relid;
>     Relation relation;
> +  Node     *whereClause;
>  } PublicationRelInfo;
>
> It appears that this new member is not actually required, as the relid
> can be simply obtained from the existing "relation" member - using the
> RelationGetRelid() macro.
>

+1

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Thu, Nov 18, 2021 at 12:33 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> PSA new set of v40* patches.
>

Another thing I noticed was in the 0004 patch, list_free_deep() should
be used instead of list_free() in the following code block, otherwise
the rfnodes themselves (allocated by stringToNode()) are not freed:

src/backend/replication/pgoutput/pgoutput.c

+ if (rfnodes)
+ {
+ list_free(rfnodes);
+ rfnodes = NIL;
+ }
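
i.e. something like this (a sketch):

+ if (rfnodes)
+ {
+ list_free_deep(rfnodes); /* frees the node elements as well as the cells */
+ rfnodes = NIL;
+ }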


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Mon, Nov 22, 2021 at 7:14 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Thu, Nov 18, 2021 at 12:33 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > PSA new set of v40* patches.
> >

I have a few more comments on 0007,


@@ -783,9 +887,28 @@ pgoutput_row_filter(PGOutputData *data, Relation relation, HeapTuple oldtuple, H
             ExecDropSingleTupleTableSlot(entry->scantuple);
             entry->scantuple = NULL;
         }
+        if (entry->old_tuple != NULL)
+        {
+            ExecDropSingleTupleTableSlot(entry->old_tuple);
+            entry->old_tuple = NULL;
+        }
+        if (entry->new_tuple != NULL)
+        {
+            ExecDropSingleTupleTableSlot(entry->new_tuple);
+            entry->new_tuple = NULL;
+        }
+        if (entry->tmp_new_tuple != NULL)
+        {
+            ExecDropSingleTupleTableSlot(entry->tmp_new_tuple);
+            entry->tmp_new_tuple = NULL;
+        }

In pgoutput_row_filter, we are dropping the slots if there are some
old slots in the RelationSyncEntry.  But then I noticed that in
rel_sync_cache_relation_cb() we are also doing that, but only for the
scantuple slot.  So IMHO, since rel_sync_cache_relation_cb() is the only
place setting entry->rowfilter_valid to false, why not drop all the slots
at that time only, and in pgoutput_row_filter() just put an assert?
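
(Something like a common helper, perhaps; a sketch only, the function name is
invented:)

/*
 * Sketch: drop all of the entry's cached slots in one place, called from
 * rel_sync_cache_relation_cb() where rowfilter_valid is reset, so that
 * pgoutput_row_filter() can just Assert that the slots are NULL.
 */
static void
row_filter_free_slots(RelationSyncEntry *entry)
{
    if (entry->scantuple != NULL)
    {
        ExecDropSingleTupleTableSlot(entry->scantuple);
        entry->scantuple = NULL;
    }
    if (entry->old_tuple != NULL)
    {
        ExecDropSingleTupleTableSlot(entry->old_tuple);
        entry->old_tuple = NULL;
    }
    if (entry->new_tuple != NULL)
    {
        ExecDropSingleTupleTableSlot(entry->new_tuple);
        entry->new_tuple = NULL;
    }
    if (entry->tmp_new_tuple != NULL)
    {
        ExecDropSingleTupleTableSlot(entry->tmp_new_tuple);
        entry->tmp_new_tuple = NULL;
    }
}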

2.
+static bool
+pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot,
RelationSyncEntry *entry)
+{
+    EState       *estate;
+    ExprContext *ecxt;


pgoutput_row_filter_virtual and pgoutput_row_filter are exactly the same
except for ExecStoreHeapTuple(), so why not just put in one check based on
whether a slot is passed or not, instead of making a complete duplicate
copy of the function?
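
(Roughly along these lines, perhaps; a sketch with assumed names, where
pgoutput_row_filter_exec_expr stands in for the shared expression-evaluation
code:)

static bool
pgoutput_row_filter(Relation relation, HeapTuple tuple, TupleTableSlot *slot,
                    RelationSyncEntry *entry)
{
    /* Bail out if there is no row filter. */
    if (!entry->exprstate)
        return true;

    /*
     * Callers that only have a heap tuple pass slot == NULL and we
     * materialize into the cached scan slot; callers that already have a
     * populated (virtual) slot pass tuple == NULL.
     */
    if (tuple != NULL)
    {
        slot = entry->scantuple;
        ExecStoreHeapTuple(tuple, slot, false);
    }

    return pgoutput_row_filter_exec_expr(entry->exprstate, slot);
}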

3.
         oldctx = MemoryContextSwitchTo(CacheMemoryContext);
         tupdesc = CreateTupleDescCopy(tupdesc);
         entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);

Why do we need to copy the tupledesc? Do we think that we need to keep
this slot even after we close the relation? If so, can you add comments
explaining why we are making a copy here?



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Thursday, November 18, 2021 9:34 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> PSA new set of v40* patches.
> 

I found a problem in v40. The check for Replica Identity in the WHERE clause is not working properly.

For example:
postgres=# create table tbl(a int primary key, b int);
CREATE TABLE
postgres=# create publication pub1 for table tbl where (a>10 and b>10);
CREATE PUBLICATION

I think it should report an error because column b is not part of Replica Identity.
This seems due to "return true" in rowfilter_expr_replident_walker function,
maybe we should remove it.

Besides, a small comment on 0004 patch:

+         * Multiple row-filter expressions for the same publication will later be
+         * combined by the COPY using OR, but this means if any of the filters is

Should we change it to: 
Multiple row-filter expressions for the same table ...

Regards,
Tang

Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Nov 23, 2021 at 4:40 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> On Thursday, November 18, 2021 9:34 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > PSA new set of v40* patches.
> >
>
> I found a problem on v40. The check for Replica Identity in WHERE clause is not working properly.
>
> For example:
> postgres=# create table tbl(a int primary key, b int);
> CREATE TABLE
> postgres=# create publication pub1 for table tbl where (a>10 and b>10);
> CREATE PUBLICATION
>
> I think it should report an error because column b is not part of Replica Identity.
> This seems due to "return true" in rowfilter_expr_replident_walker function,
> maybe we should remove it.

This has already been fixed in v41* updates. Please retest when v41* is posted.

>
> Besides, a small comment on 0004 patch:
>
> +                * Multiple row-filter expressions for the same publication will later be
> +                * combined by the COPY using OR, but this means if any of the filters is
>
> Should we change it to:
> Multiple row-filter expressions for the same table ...

Yes, thanks for reporting. (added to my TODO list)

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
vignesh C
Date:
On Thu, Nov 18, 2021 at 7:04 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> PSA new set of v40* patches.

Few comments:
1) When a table is added to the publication, replica identity is
checked. But while modifying the publish action to include
delete/update, replica identity is not checked for the existing
tables. I felt it should be checked for the existing tables too.
@@ -315,6 +405,9 @@ publication_add_relation(Oid pubid, PublicationRelInfo *pri,

                /* Fix up collation information */
                assign_expr_collations(pstate, whereclause);
+
+               /* Validate the row-filter. */
+               rowfilter_expr_checker(pub, targetrel, whereclause);

postgres=# create publication pub1 for table t1 where ( c1 = 10);
ERROR:  cannot add relation "t1" to publication
DETAIL:  Row filter column "c1" is not part of the REPLICA IDENTITY

postgres=# create publication pub1 for table t1 where ( c1 = 10) with
(PUBLISH = INSERT);
CREATE PUBLICATION
postgres=# alter publication pub1 set (PUBLISH=DELETE);
ALTER PUBLICATION

2) Since the error is raised because the publication publishes delete/update
operations, the error message should mention publish delete/update. Can we
change the error message:
+               if (!bms_is_member(attnum - FirstLowInvalidHeapAttributeNumber,
+                                  context->bms_replident))
+               {
+                       const char *colname = get_attname(relid, attnum, false);
+
+                       ereport(ERROR,
+                               (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+                                errmsg("cannot add relation \"%s\" to publication",
+                                       RelationGetRelationName(context->rel)),
+                                errdetail("Row filter column \"%s\" is not part of the REPLICA IDENTITY",
+                                          colname)));
+               }

To something like:
ereport(ERROR,
        (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
         errmsg("cannot add relation \"%s\" to publication because row filter column \"%s\" does not have a replica identity and publishes deletes/updates",
                RelationGetRelationName(context->rel), colname),
         errhint("To enable deleting/updating from the table, set REPLICA IDENTITY using ALTER TABLE")));

Regards,
Vignesh



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Tues, Nov 23, 2021 2:27 PM vignesh C <vignesh21@gmail.com> wrote:
> On Thu, Nov 18, 2021 at 7:04 AM Peter Smith <smithpb2250@gmail.com>
> wrote:
> >
> > PSA new set of v40* patches.
> 
> Few comments:
> 1) When a table is added to the publication, replica identity is checked. But
> while modifying the publish action to include delete/update, replica identity is
> not checked for the existing tables. I felt it should be checked for the existing
> tables too.

In addition to this, I think we might also need some check to prevent users from
changing the REPLICA IDENTITY index which is used in the filter expression.

I was wondering whether it is possible to do the check related to REPLICA
IDENTITY in the function CheckCmdReplicaIdentity() or in
GetRelationPublicationActions(). If we move the REPLICA IDENTITY check to this
function, it would be consistent with the existing behavior of the checks
related to REPLICA IDENTITY (see the comments in CheckCmdReplicaIdentity), and
it seems it can cover all the cases mentioned above.

Another comment about v40-0001 patch:


+            char *relname = pstrdup(RelationGetRelationName(rel));
+
             table_close(rel, ShareUpdateExclusiveLock);
+
+            /* Disallow duplicate tables if there are any with row-filters. */
+            if (t->whereClause || list_member_oid(relids_with_rf, myrelid))
+                ereport(ERROR,
+                        (errcode(ERRCODE_DUPLICATE_OBJECT),
+                         errmsg("conflicting or redundant row-filters for \"%s\"",
+                                relname)));
+            pfree(relname);

Maybe we can do the error check before table_close(), so that we don't need to
invoke pstrdup() and pfree().
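
i.e. roughly (a sketch based on the quoted hunk):

            /* Disallow duplicate tables if there are any with row-filters. */
            if (t->whereClause || list_member_oid(relids_with_rf, myrelid))
                ereport(ERROR,
                        (errcode(ERRCODE_DUPLICATE_OBJECT),
                         errmsg("conflicting or redundant row-filters for \"%s\"",
                                RelationGetRelationName(rel))));

            table_close(rel, ShareUpdateExclusiveLock);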


Best regards,
Hou zj

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Nov 18, 2021 at 11:02 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > 4. I think we should add some comments in pgoutput_row_filter() as to
> > why we are caching the row_filter here instead of
> > get_rel_sync_entry()? That has been discussed multiple times so it is
> > better to capture that in comments.
>
> Added comment in v40 [1]
>

I think apart from truncate and error cases, it can also happen for
other operations because we decide whether to publish a change
(operation) after calling get_rel_sync_entry() in pgoutput_change. I
think we can reflect that as well in the comment.

> >
> > 5. Why do you need a separate variable rowfilter_valid to indicate
> > whether a valid row filter exists? Why exprstate is not sufficient?
> > Can you update comments to indicate why we need this variable
> > separately?
>
> I have improved the (existing) comment in v40 [1].
>

One more thing related to this code:
pgoutput_row_filter()
{
..
+ if (!entry->rowfilter_valid)
{
..
+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+ tupdesc = CreateTupleDescCopy(tupdesc);
+ entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
+ MemoryContextSwitchTo(oldctx);
..
}

Why do we need to initialize scantuple here unless we are sure that
the row filter is going to get associated with this relentry? I think
when there is no row filter then this allocation is not required.
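
(i.e. something like the following, where has_filter is an assumed flag set
once the publications have been checked for a filter on this relation:)

    if (has_filter)
    {
        /* Create a cached scan slot only when a row filter actually exists. */
        oldctx = MemoryContextSwitchTo(CacheMemoryContext);
        tupdesc = CreateTupleDescCopy(tupdesc);
        entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
        MemoryContextSwitchTo(oldctx);
    }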

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Nov 23, 2021 at 1:29 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Tues, Nov 23, 2021 2:27 PM vignesh C <vignesh21@gmail.com> wrote:
> > On Thu, Nov 18, 2021 at 7:04 AM Peter Smith <smithpb2250@gmail.com>
> > wrote:
> > >
> > > PSA new set of v40* patches.
> >
> > Few comments:
> > 1) When a table is added to the publication, replica identity is checked. But
> > while modifying the publish action to include delete/update, replica identity is
> > not checked for the existing tables. I felt it should be checked for the existing
> > tables too.
>
> In addition to this, I think we might also need some check to prevent users from
> changing the REPLICA IDENTITY index which is used in the filter expression.
>
> I was wondering whether it is possible to do the check related to REPLICA
> IDENTITY in the function CheckCmdReplicaIdentity() or in
> GetRelationPublicationActions(). If we move the REPLICA IDENTITY check to this
> function, it would be consistent with the existing behavior of the checks
> related to REPLICA IDENTITY (see the comments in CheckCmdReplicaIdentity), and
> it seems it can cover all the cases mentioned above.
>

Yeah, adding the replica identity check in CheckCmdReplicaIdentity()
would cover all the above cases but I think that would put a premium
on each update/delete operation. I think traversing the expression
tree (it could be multiple traversals if the relation is part of
multiple publications) during each update/delete would be costly.
Don't you think so?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Ajin Cherian
Date:
Attaching a new patchset v41 which includes changes by both Peter and myself.

Patches v40-0005 and v40-0006 have been merged to create patch
v41-0005 which reduces the patches to 6 again.
This patch-set contains changes addressing the following review comments:

On Mon, Nov 15, 2021 at 5:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> What I meant was that with this new code we have regressed the old
> behavior. Basically, imagine a case where no filter was given for any
> of the tables. Then after the patch, we will remove all the old tables
> whereas before the patch it will remove the oldrels only when they are
> not specified as part of new rels. If you agree with this, then we can
> retain the old behavior and for the new tables, we can always override
> the where clause for a SET variant of command.

Fixed and modified the behaviour to match what the schema patch
implemented.

On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> 2.
> + * The OptWhereClause (row-filter) must be stored here
> + * but it is valid only for tables. If the ColId was
> + * mistakenly not a table this will be detected later
> + * in preprocess_pubobj_list() and an error thrown.
>
> /error thrown/error is thrown

Fixed.
...
> 6. In rowfilter_expr_checker(), the expression tree is traversed
> twice, can't we traverse it once to detect all non-allowed stuff? It
> can be sometimes costly to traverse the tree multiple times especially
> when the expression is complex and it doesn't seem acceptable to do so
> unless there is some genuine reason for the same.
>

Fixed.

On Tue, Nov 16, 2021 at 7:24 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> doc/src/sgml/ref/create_publication.sgml
> (1) improve comment
> + /* Set up a pstate to parse with */
>
> "pstate" is the variable name, better to use "ParseState".

Fixed.

> src/test/subscription/t/025_row_filter.pl
> (2) rename TAP test 025 to 026
> I suggest that the t/025_row_filter.pl TAP test should be renamed to
> 026 now because 025 is being used by some schema TAP test.
>

Fixed

On Tue, Nov 16, 2021 at 7:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> ---
> >If you choose to do the initial table synchronization, only data that satisfies
> >the row filters is sent.
>
> I think this comment is not correct, I think the correct statement
> would be "only data that satisfies the row filters is pulled by the
> subscriber"

Fixed

>
> I think this message is not correct, because for update also we cannot
> have filters on the non-key attributes, right?  Even w.r.t. the first
> patch, if there are non-updated non-key toast columns we cannot
> apply filters on those.  So this comment seems misleading to me.
>

Fixed

>
> -    Oid            relid = RelationGetRelid(targetrel->relation);
> ..
> +    relid = RelationGetRelid(targetrel);
> +
>
> Why is this change required? I mean, instead of fetching the relid
> during the variable declaration, why do we need to do it separately
> now?
>

Fixed

> +    if (expr == NULL)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_CANNOT_COERCE),
> +                 errmsg("row filter returns type %s that cannot be
> coerced to the expected type %s",
>
> Instead of "coerced to" can we use "cast to"?  That will be in sync
> with other simmilar kind od user exposed error message.
> ----

Fixed

>
> I can see the caller of this function is already switching to
> CacheMemoryContext, so what is the point in doing it again here?
> Maybe if the caller is expected to do so, we can Assert on the
> CurrentMemoryContext.
>

Fixed.

On Thu, Nov 18, 2021 at 9:36 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> (2) missing test case
> It seems that the current tests are not testing the
> multiple-row-filter case (n_filters > 1) in the following code in
> pgoutput_row_filter_init():
>
>     rfnode = n_filters > 1 ? makeBoolExpr(OR_EXPR, rfnodes, -1) :
> linitial(rfnodes);
>
> I think a test needs to be added similar to the customers+countries
> example that Tomas gave (where there is a single subscription to
> multiple publications of the same table, each of which has a
> row-filter).

Test case added.

On Fri, Nov 19, 2021 at 4:15 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> I notice that in the 0001 patch, it adds a "relid" member to the
> PublicationRelInfo struct:
>
> src/include/catalog/pg_publication.h
>
>  typedef struct PublicationRelInfo
>  {
> +  Oid relid;
>     Relation relation;
> +  Node     *whereClause;
>  } PublicationRelInfo;
>
> It appears that this new member is not actually required, as the relid
> can be simply obtained from the existing "relation" member - using the
> RelationGetRelid() macro.

Fixed.

On Mon, Nov 22, 2021 at 12:44 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Another thing I noticed was in the 0004 patch, list_free_deep() should
> be used instead of list_free() in the following code block, otherwise
> the rfnodes themselves (allocated by stringToNode()) are not freed:
>
> src/backend/replication/pgoutput/pgoutput.c
>
> + if (rfnodes)
> + {
> + list_free(rfnodes);
> + rfnodes = NIL;
> + }

Fixed.

We will be addressing the rest of the comments in the next patch.

regards,
Ajin Cherian
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Attaching a new patchset v41 which includes changes by both Peter and myself.
>

In the 0003 patch, why is the below change required?
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1,4 +1,4 @@
-/*-------------------------------------------------------------------------
+/*------------------------------------------------------------------------
  *
  * pgoutput.c

I suggest at this stage we can combine 0001, 0003, and 0004. Then move
the pg_dump and psql (describe.c) related changes to 0002 and make 0002
the last patch in the series. This will help reviewers look at the
backend changes first, and then we can look at the client-side changes.

After the above, rearrange the code in pgoutput_row_filter() so that the
two different checks related to 'rfisnull' (introduced by different
patches) can be combined as an if .. else check.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
vignesh C
Date:
On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Attaching a new patchset v41 which includes changes by both Peter and myself.

Few comments on v41-0002 patch:
1) Tab completion for "WITH (" should also be handled after the WHERE
clause in "create publication pub1 for table t1 where (c1 > 10)":
@@ -2757,10 +2765,13 @@ psql_completion(const char *text, int start, int end)
        else if (Matches("CREATE", "PUBLICATION", MatchAny, "FOR",
"ALL", "TABLES"))
                COMPLETE_WITH("IN SCHEMA", "WITH (");
        else if (Matches("CREATE", "PUBLICATION", MatchAny, "FOR",
"TABLE", MatchAny))
-               COMPLETE_WITH("WITH (");
+               COMPLETE_WITH("WHERE (", "WITH (");
        /* Complete "CREATE PUBLICATION <name> FOR TABLE" with "<table>, ..." */
        else if (Matches("CREATE", "PUBLICATION", MatchAny, "FOR", "TABLE"))
                COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
+       /* "CREATE PUBLICATION <name> FOR TABLE <name> WHERE (" -
complete with table attributes */
+       else if (HeadMatches("CREATE", "PUBLICATION", MatchAny) &&
TailMatches("WHERE", "("))
+               COMPLETE_WITH_ATTR(prev3_wd, "");

2) Tab completion completes with "WHERE (" in case of "alter
publication pub1 add table t1,":
+       /* ALTER PUBLICATION <name> SET TABLE <name> */
+       /* ALTER PUBLICATION <name> ADD TABLE <name> */
+       else if (Matches("ALTER", "PUBLICATION", MatchAny, "SET|ADD",
"TABLE", MatchAny))
+               COMPLETE_WITH("WHERE (");

Should this be changed to:
+       /* ALTER PUBLICATION <name> SET TABLE <name> */
+       /* ALTER PUBLICATION <name> ADD TABLE <name> */
+       else if (Matches("ALTER", "PUBLICATION", MatchAny, "SET|ADD",
+                        "TABLE", MatchAny) && !ends_with(prev_wd, ','))
+               COMPLETE_WITH("WHERE (");

Regards,
Vignesh



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Tues, Nov 23, 2021 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Nov 23, 2021 at 1:29 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Tues, Nov 23, 2021 2:27 PM vignesh C <vignesh21@gmail.com> wrote:
> > > On Thu, Nov 18, 2021 at 7:04 AM Peter Smith <smithpb2250@gmail.com>
> > > wrote:
> > > >
> > > > PSA new set of v40* patches.
> > >
> > > Few comments:
> > > 1) When a table is added to the publication, replica identity is checked. But
> > > while modifying the publish action to include delete/update, replica identity is
> > > not checked for the existing tables. I felt it should be checked for the existing
> > > tables too.
> >
> > In addition to this, I think we might also need some check to prevent user from
> > changing the REPLICA IDENTITY index which is used in the filter expression.
> >
> > I was thinking is it possible do the check related to REPLICA IDENTITY in
> > function CheckCmdReplicaIdentity() or In GetRelationPublicationActions(). If we
> > move the REPLICA IDENTITY check to this function, it would be consistent with
> > the existing behavior about the check related to REPLICA IDENTITY(see the
> > comments in CheckCmdReplicaIdentity) and seems can cover all the cases
> > mentioned above.
> >
> 
> Yeah, adding the replica identity check in CheckCmdReplicaIdentity()
> would cover all the above cases but I think that would put a premium
> on each update/delete operation. I think traversing the expression
> tree (it could be multiple traversals if the relation is part of
> multiple publications) during each update/delete would be costly.
> Don't you think so?

Yes, I agree that traversing the expression every time would be costly.

I thought maybe we can cache the columns used in the row filter, or cache only
a flag (can_update|delete) in the relcache. I think every operation that
affects the row-filter or replica-identity will invalidate the relcache, and
the cost of the check seems acceptable with the cache.

The reason I thought it might be better to do the check in
CheckCmdReplicaIdentity is that otherwise we might need to add duplicate check
code in a couple of places; for example, we might need to check
replica-identity when:

[ALTER REPLICA IDENTITY |
DROP INDEX |
ALTER PUBLICATION ADD TABLE |
ALTER PUBLICATION SET (pubaction)]
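
(Purely illustrative, the cached field and where it is computed are
assumptions; but the check in CheckCmdReplicaIdentity() could then be as
cheap as:)

    /* Sketch: rd_rfcols_valid is a hypothetical flag cached alongside the
     * publication info, meaning "all row-filter columns are covered by the
     * replica identity". */
    PublicationActions *pubactions = GetRelationPublicationActions(rel);

    if (cmd == CMD_UPDATE && pubactions->pubupdate && !rel->rd_rfcols_valid)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
                 errmsg("cannot update table \"%s\"",
                        RelationGetRelationName(rel)),
                 errdetail("Row filter columns are not part of the replica identity.")));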

Best regards,
Hou zj

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Nov 24, 2021 at 6:51 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Tues, Nov 23, 2021 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Nov 23, 2021 at 1:29 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > On Tues, Nov 23, 2021 2:27 PM vignesh C <vignesh21@gmail.com> wrote:
> > > > On Thu, Nov 18, 2021 at 7:04 AM Peter Smith <smithpb2250@gmail.com>
> > > > wrote:
> > > > >
> > > > > PSA new set of v40* patches.
> > > >
> > > > Few comments:
> > > > 1) When a table is added to the publication, replica identity is checked. But
> > > > while modifying the publish action to include delete/update, replica identity is
> > > > not checked for the existing tables. I felt it should be checked for the existing
> > > > tables too.
> > >
> > > In addition to this, I think we might also need some check to prevent users from
> > > changing the REPLICA IDENTITY index which is used in the filter expression.
> > >
> > > I was wondering whether it is possible to do the check related to REPLICA
> > > IDENTITY in the function CheckCmdReplicaIdentity() or in
> > > GetRelationPublicationActions(). If we move the REPLICA IDENTITY check to this
> > > function, it would be consistent with the existing behavior of the checks
> > > related to REPLICA IDENTITY (see the comments in CheckCmdReplicaIdentity), and
> > > it seems it can cover all the cases mentioned above.
> > >
> >
> > Yeah, adding the replica identity check in CheckCmdReplicaIdentity()
> > would cover all the above cases but I think that would put a premium
> > on each update/delete operation. I think traversing the expression
> > tree (it could be multiple traversals if the relation is part of
> > multiple publications) during each update/delete would be costly.
> > Don't you think so?
>
> Yes, I agree that traversing the expression every time would be costly.
>
> I thought maybe we can cache the columns used in the row filter, or cache only
> a flag (can_update|delete) in the relcache. I think every operation that
> affects the row-filter or replica-identity will invalidate the relcache, and
> the cost of the check seems acceptable with the cache.
>

I think if we can cache this information especially as a bool flag
then that should probably be better.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
vignesh C
Date:
On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Attaching a new patchset v41 which includes changes by both Peter and myself.
>
> Patches v40-0005 and v40-0006 have been merged to create patch
> v41-0005 which reduces the patches to 6 again.

Few comments:
1) I'm not sure if we will be able to throw a better error message in
this case ("ERROR:  missing FROM-clause entry for table "t4""); if
possible, you could improve it.

+       if (pri->whereClause != NULL)
+       {
+               /* Set up a pstate to parse with */
+               pstate = make_parsestate(NULL);
+               pstate->p_sourcetext = nodeToString(pri->whereClause);
+
+               nsitem = addRangeTableEntryForRelation(pstate, targetrel,
+                                                      AccessShareLock,
+                                                      NULL, false, false);
+               addNSItemToQuery(pstate, nsitem, false, true, true);
+
+               whereclause = transformWhereClause(pstate,
+                                                  copyObject(pri->whereClause),
+                                                  EXPR_KIND_PUBLICATION_WHERE,
+                                                  "PUBLICATION");
+
+               /* Fix up collation information */
+               assign_expr_collations(pstate, whereclause);
+       }

alter publication pub1 add table t5 where ( t4.c1 = 10);
ERROR:  missing FROM-clause entry for table "t4"
LINE 1: alter publication pub1 add table t5 where ( t4.c1 = 10);
                                                    ^
pstate->p_expr_kind is stored as EXPR_KIND_PUBLICATION_WHERE, so we could
differentiate using the expr_kind.
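
(e.g. a sketch of what the differentiation might look like, in whichever
parser routine raises the missing-RTE error; the wording is illustrative:)

    if (pstate->p_expr_kind == EXPR_KIND_PUBLICATION_WHERE)
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_TABLE),
                 errmsg("invalid reference to table \"%s\" in publication WHERE expression",
                        relation->relname),
                 parser_errposition(pstate, relation->location)));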

2) Should '"delete" or "delete"' be '"delete" or "update"'
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -340,7 +340,7 @@ rowfilter_walker(Node *node, rf_context *context)
  * 1. Only certain simple node types are permitted in the expression. See
  * function rowfilter_walker for details.
  *
- * 2. If the publish operation contains "delete" then only columns that
+ * 2. If the publish operation contains "delete" or "delete" then only columns that
  * are allowed by the REPLICA IDENTITY rules are permitted to be used in the
  * row-filter WHERE clause.
  */
@@ -352,12 +352,10 @@ rowfilter_expr_checker(Publication *pub, Relation rel, Node *rfnode)
        context.rel = rel;

        /*
-        * For "delete", check that filter cols are also valid replica identity
+        * For "delete" or "update", check that filter cols are also valid replica identity
         * cols.

3) Should we include row filter condition in pg_publication_tables
view like in describe publication(\dRp+) , since the prqual is not
easily readable in pg_publication_rel table:
select * from pg_publication_tables ;
 pubname | schemaname | tablename
---------+------------+-----------
 pub1    | public     | t1
(1 row)

 select * from pg_publication_rel ;
  oid  | prpubid | prrelid | prqual
-------+---------+---------+------------------------------------------------------------
 16389 |   16388 |   16384 | {OPEXPR :opno 518 :opfuncid 144 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location 45} {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 51 :constvalue 4 [ 0 0 0 0 0 0 0 0 ]}) :location 48}
(1 row)

4) This should be included in typedefs.list, also we could add some
comments for this structure
+typedef struct {
+       Relation        rel;
+       Bitmapset  *bms_replident;
+}
+rf_context;

5) A few includes are not required: #include "miscadmin.h" is not required
in pg_publication.c, #include "executor/executor.h" is not required in
proto.c, and #include "access/xact.h", #include "executor/executor.h" and
#include "replication/logicalrelation.h" are not required in pgoutput.c.

6) typo "filte" should be "filter":
+/*
+ * The row filte walker checks that the row filter expression is legal.
+ *
+ * Rules: Node-type validation
+ * ---------------------------
+ * Allow only simple or compound expressions like:
+ * - "(Var Op Const)" or
+ * - "(Var Op Const) Bool (Var Op Const)"

Regards,
Vignesh



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Attaching a new patchset v41 which includes changes by both Peter and myself.
>
> Patches v40-0005 and v40-0006 have been merged to create patch
> v41-0005 which reduces the patches to 6 again.
> This patch-set contains changes addressing the following review comments:
>
> On Mon, Nov 15, 2021 at 5:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > What I meant was that with this new code we have regressed the old
> > behavior. Basically, imagine a case where no filter was given for any
> > of the tables. Then after the patch, we will remove all the old tables
> > whereas before the patch it will remove the oldrels only when they are
> > not specified as part of new rels. If you agree with this, then we can
> > retain the old behavior and for the new tables, we can always override
> > the where clause for a SET variant of command.
>
> Fixed and modified the behaviour to match with what the schema patch
> implemented.
>

+
+ /*
+ * If the new relation or the old relation has a where clause,
+ * we need to remove it so that it can be added afresh later.
+ */
+ if (RelationGetRelid(newpubrel->relation) == oldrelid &&
+ newpubrel->whereClause == NULL && rfisnull)

Can't we use _equalPublicationTable() here? It compares the whereClause as well.

Few more comments:
=================
0001
1.
@@ -1039,10 +1081,11 @@ PublicationAddTables(Oid pubid, List *rels,
bool if_not_exists,
  {
  PublicationRelInfo *pub_rel = (PublicationRelInfo *) lfirst(lc);
  Relation rel = pub_rel->relation;
+ Oid relid = RelationGetRelid(rel);
  ObjectAddress obj;

  /* Must be owner of the table or superuser. */
- if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId()))
+ if (!pg_class_ownercheck(relid, GetUserId()))

Here, you can directly use RelationGetRelid as was used in the
previous code without using an additional variable.

0005
2.
+typedef struct {
+ Relation rel;
+ bool check_replident;
+ Bitmapset  *bms_replident;
+}
+rf_context;

Add rf_context in the same line where } ends.

3. In the function header comment of rowfilter_walker, you mentioned
the simple expressions that are allowed, but we should also write why
we are doing so. It has been discussed in detail in various emails in
this thread. AFAIR, below are the reasons:
A. We don't want to allow user-defined functions or operators because
(a) if the user drops such a function/operator or if there is any
other error via that function, the walsender won't be able to recover
from such an error even if we fix the function's problem because it
uses a historic snapshot to access row-filter; (b) any other table
could be accessed via a function which won't work because of historic
snapshots in logical decoding environment.

B. We don't allow anything other than immutable built-in functions, as
those can access the database and would lead to the problem (b)
mentioned in the previous paragraph.

Don't we need to check for user-defined types similar to user-defined
functions and operators? If not why?

4.
+ * Rules: Node-type validation
+ * ---------------------------
+ * Allow only simple or compound expressions like:
+ * - "(Var Op Const)" or

It seems Var Op Var is allowed. I tried below and it works:
create publication pub for table t1 where (c1 < c2) WITH (publish = 'insert');

I think it should be okay to allow it provided we ensure that we never
access some other table/view etc. as part of the expression. Also, we
should document the behavior correctly.
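
To illustrate, here is a sketch of what the rules above would allow or
reject. The outcomes are assumptions based on this discussion, not
confirmed behavior of the final patch, and my_udf is a hypothetical
user-defined function:

create table t1 (c1 int primary key, c2 int);
create publication pub_ok1 for table t1 where (c1 > 10);  -- Var Op Const
create publication pub_ok2 for table t1 where (c1 < c2)
    with (publish = 'insert');                            -- Var Op Var
-- create publication pub_bad for table t1 where (my_udf(c1) > 0);
-- expected to be rejected per rule A (no user-defined functions/operators)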

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
Thanks for all the review comments so far! We are endeavouring to keep
pace with them.

All feedback is being tracked and we will fix and/or reply to everything ASAP.

Meanwhile, PSA the latest set of v42* patches.

This version was mostly a patch restructuring exercise but it also
addresses some minor review comments in passing.

~~

Patches have been merged and rearranged based on Amit's suggestions
[Amit 23/11].

BEFORE:
v41-0001 Euler's main patch
v41-0002 Tab-complete
v41-0003 ExprState cache
v41-0004 OR/AND
v41-0005 Validation walker
v41-0006 new/old tuple updates

AFTER:
v42-0001 main patch <== v41-0001 + v41-0003 + v41-0004
v42-0002 validation walker <== v41-0005
v42-0003 new/old tuple updates <== v41-0006
v42-0004 tab-complete and pgdump <== v41-0002 (plus pgdump code from v41-0001)

~~

Some review comments were addressed as follows:

v42-0001 main patch
- improve comments about caching [Amit 15/Nov] #4.
- fix comment typo [Tang 23/11]

v42-0002 validation walker
- fix comment typo [Vignesh 24/11] #2
- add comment for rf_context [Vignesh 24/11] #4
- fix comment typo [Vignesh 24/11] #6
- code formatting [Amit 24/11] #2

v42-0003 new/old tuple
- fix compilation warning [Greg 18/11] #1

v42-0004 tab-complete and pgdump
- NA

------
[Amit 15/11]
https://www.postgresql.org/message-id/CAA4eK1L4ddTpc%3D-3bq%3D%3DU8O-BJ%3DsvkAFefRDpATKCG4hKYKAig%40mail.gmail.com
[Amit 23/11]
https://www.postgresql.org/message-id/CAA4eK1%2B7R_%3DLFXHvfjjR88m3oTLYeLV%3D2zdAZEH3n7n8nhj%3D%3Dw%40mail.gmail.com
[Tang 23/11]
https://www.postgresql.org/message-id/OS0PR01MB611389E3A5685B53930A4833FB609%40OS0PR01MB6113.jpnprd01.prod.outlook.com
[Vignesh 24/11]
https://www.postgresql.org/message-id/CALDaNm08Ynr_FzNg%2BdoHj%3D_nBet%2BKZAvNbqmkEEw7M2SPpPEAw%40mail.gmail.com
[Amit 24/11]
https://www.postgresql.org/message-id/CAA4eK1%2BXd%3DkM5D3jtXyN%2BW7J%2BwU-yyQAdyq66a6Wcq_PKRTbSw%40mail.gmail.com
[Greg 18/11]
https://www.postgresql.org/message-id/CAJcOf-fcDRsC4MYv2ZpUwFe68tPchbM-0fpb2z5ks%3DyLKDH2-g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Nov 18, 2021 at 9:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Thu, Nov 18, 2021 at 12:33 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > PSA new set of v40* patches.
> >
>
> Thanks for the patch updates.
>
> A couple of comments so far:
>
> (1) compilation warning
> WIth the patches applied, there's a single compilation warning when
> Postgres is built:
>
> pgoutput.c: In function ‘pgoutput_row_filter_init’:
> pgoutput.c:854:8: warning: unused variable ‘relid’ [-Wunused-variable]
>   Oid   relid = RelationGetRelid(relation);
>         ^~~~~
>

Fixed in v42* [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPsGZHvafa3K_RAJ0Agm28W2owjNN%2BqU0EUsSjBNbuXFsQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Nov 23, 2021 at 8:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Nov 18, 2021 at 11:02 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > 4. I think we should add some comments in pgoutput_row_filter() as to
> > > why we are caching the row_filter here instead of
> > > get_rel_sync_entry()? That has been discussed multiple times so it is
> > > better to capture that in comments.
> >
> > Added comment in v40 [1]
> >
>
> I think apart from truncate and error cases, it can also happen for
> other operations because we decide whether to publish a change
> (operation) after calling get_rel_sync_entry() in pgoutput_change. I
> think we can reflect that as well in the comment.

Fixed in v42* [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPsGZHvafa3K_RAJ0Agm28W2owjNN%2BqU0EUsSjBNbuXFsQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Nov 23, 2021 at 4:40 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> On Thursday, November 18, 2021 9:34 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > PSA new set of v40* patches.
> >
>

> Besides, a small comment on 0004 patch:
>
> +                * Multiple row-filter expressions for the same publication will later be
> +                * combined by the COPY using OR, but this means if any of the filters is
>
> Should we change it to:
> Multiple row-filter expressions for the same table ...

Fixed in v42* [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPsGZHvafa3K_RAJ0Agm28W2owjNN%2BqU0EUsSjBNbuXFsQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Wed, Nov 24, 2021 at 8:52 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Attaching a new patchset v41 which includes changes by both Peter and myself.
> >
> > Patches v40-0005 and v40-0006 have been merged to create patch
> > v41-0005 which reduces the patches to 6 again.
>
> Few comments:
...
> 2) Should '"delete" or "delete"' be '"delete" or "update"'
> --- a/src/backend/catalog/pg_publication.c
> +++ b/src/backend/catalog/pg_publication.c
> @@ -340,7 +340,7 @@ rowfilter_walker(Node *node, rf_context *context)
>   * 1. Only certain simple node types are permitted in the expression. See
>   * function rowfilter_walker for details.
>   *
> - * 2. If the publish operation contains "delete" then only columns that
> + * 2. If the publish operation contains "delete" or "delete" then
> only columns that
>   * are allowed by the REPLICA IDENTITY rules are permitted to be used in the
>   * row-filter WHERE clause.
>   */
> @@ -352,12 +352,10 @@ rowfilter_expr_checker(Publication *pub,
> Relation rel, Node *rfnode)
>         context.rel = rel;
>
>         /*
> -        * For "delete", check that filter cols are also valid replica identity
> +        * For "delete" or "update", check that filter cols are also
> valid replica identity
>          * cols.

Fixed in v42* [1]


> 4) This should be included in typedefs.list, also we could add some
> comments for this structure
> +typedef struct {
> +       Relation        rel;
> +       Bitmapset  *bms_replident;
> +}
> +rf_context;
>

Fixed in v42* [1]


> 6) typo "filte" should be "filter":
> +/*
> + * The row filte walker checks that the row filter expression is legal.
> + *
> + * Rules: Node-type validation
> + * ---------------------------
> + * Allow only simple or compound expressions like:
> + * - "(Var Op Const)" or
> + * - "(Var Op Const) Bool (Var Op Const)"

Fixed in v42* [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPsGZHvafa3K_RAJ0Agm28W2owjNN%2BqU0EUsSjBNbuXFsQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Nov 25, 2021 at 12:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Attaching a new patchset v41 which includes changes by both Peter and myself.
> >
...
> Few more comments:
> =================
...
> 0005
> 2.
> +typedef struct {
> + Relation rel;
> + bool check_replident;
> + Bitmapset  *bms_replident;
> +}
> +rf_context;
>
> Add rf_context in the same line where } ends.
>

Fixed in v42* [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPsGZHvafa3K_RAJ0Agm28W2owjNN%2BqU0EUsSjBNbuXFsQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Nov 23, 2021 at 10:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Attaching a new patchset v41 which includes changes by both Peter and myself.
> >
>
...
> I suggest at this stage we can combine 0001, 0003, and 0004. Then move
> pg_dump and psql (describe.c) related changes to 0002 and make 0002 as
> the last patch in the series. This will help review backend changes
> first and then we can look at client-side changes.
>

The patch combining and reordering was as suggested.

BEFORE:
v41-0001 Euler's main patch
v41-0002 Tab-complete
v41-0003 ExprState cache
v41-0004 OR/AND
v41-0005 Validation walker
v41-0006 new/old tuple updates

AFTER:
v42-0001 main patch <== v41-0001 + v41-0003 + v41-0004
v42-0002 validation walker <== v41-0005
v42-0003 new/old tuple updates <== v41-0006
v42-0004 tab-complete and pgdump <== v41-0002 (plus pgdump code from v41-0001)

~

Please note, I did not remove the describe.c changes from the
v42-0001 patch at this time. I left this as-is because I felt the
ability of psql \d+ or \dRp+ etc. to display the current row-filter is
*essential* functionality for testing and debugging the 0001 patch
properly.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Wed, Nov 24, 2021 1:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Nov 24, 2021 at 6:51 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Tues, Nov 23, 2021 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Tue, Nov 23, 2021 at 1:29 PM houzj.fnst@fujitsu.com
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > On Tues, Nov 23, 2021 2:27 PM vignesh C <vignesh21@gmail.com> wrote:
> > > > > On Thu, Nov 18, 2021 at 7:04 AM Peter Smith
> > > > > <smithpb2250@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > PSA new set of v40* patches.
> > > > >
> > > > > Few comments:
> > > > > 1) When a table is added to the publication, replica identity is
> > > > > checked. But while modifying the publish action to include
> > > > > delete/update, replica identity is not checked for the existing
> > > > > tables. I felt it should be checked for the existing tables too.
> > > >
> > > > In addition to this, I think we might also need some check to
> > > > prevent user from changing the REPLICA IDENTITY index which is used in
> > > > the filter expression.
> > > >
> > > > I was thinking is it possible do the check related to REPLICA
> > > > IDENTITY in function CheckCmdReplicaIdentity() or In
> > > > GetRelationPublicationActions(). If we move the REPLICA IDENTITY
> > > > check to this function, it would be consistent with the existing
> > > > behavior about the check related to REPLICA IDENTITY(see the
> > > > comments in CheckCmdReplicaIdentity) and seems can cover all the cases
> > > > mentioned above.
> > >
> > > Yeah, adding the replica identity check in CheckCmdReplicaIdentity()
> > > would cover all the above cases but I think that would put a premium
> > > on each update/delete operation. I think traversing the expression
> > > tree (it could be multiple traversals if the relation is part of
> > > multiple publications) during each update/delete would be costly.
> > > Don't you think so?
> >
> > Yes, I agreed that traversing the expression every time would be costly.
> >
> > I thought maybe we can cache the columns used in row filter or cache
> > only the a
> > flag(can_update|delete) in the relcache. I think every operation that
> > affect the row-filter or replica-identity will invalidate the relcache
> > and the cost of check seems acceptable with the cache.
> >
> 
> I think if we can cache this information especially as a bool flag then that should
> probably be better.

When researching and writing a top-up patch about this, I found a
possible issue which I'd like to confirm first.

It's possible that a table is published in two publications A and B,
where publication A only publishes "insert" and publication B publishes
"update". On UPDATE, the row filters in both A and B will be executed.
Is this behavior expected?

For example:
---- Publication
create table tbl1 (a int primary key, b int);
create publication A for table tbl1 where (b<2) with(publish='insert');
create publication B for table tbl1 where (a>1) with(publish='update');

---- Subscription
create table tbl1 (a int primary key);
CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres host=localhost
port=10000' PUBLICATION A,B;

---- Publication
update tbl1 set a = 2;

The publication can be created, and on UPDATE the row filter in A (b<2)
will also be executed, but the column in it is not part of the replica
identity. (I am not against this behavior, just confirming.)

Best regards,
Hou zj 

Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Thu, Nov 25, 2021, at 10:39 AM, houzj.fnst@fujitsu.com wrote:
When researching and writing a top-up patch about this.
I found a possible issue which I'd like to confirm first.

It's possible the table is published in two publications A and B, publication A
only publish "insert" , publication B publish "update". When UPDATE, both row
filter in A and B will be executed. Is this behavior expected?
Good question. No. The code should check the action before combining the
multiple row filters.


--
Euler Taveira

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Wednesday, November 24, 2021 1:46 PM Amit Kapila <amit.kapila16@gmail.com>
> On Wed, Nov 24, 2021 at 6:51 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> >
> > On Tues, Nov 23, 2021 6:16 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > > On Tue, Nov 23, 2021 at 1:29 PM houzj.fnst@fujitsu.com
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > On Tues, Nov 23, 2021 2:27 PM vignesh C <vignesh21@gmail.com>
> wrote:
> > > > > On Thu, Nov 18, 2021 at 7:04 AM Peter Smith
> > > > > <smithpb2250@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > PSA new set of v40* patches.
> > > > >
> > > > > Few comments:
> > > > > 1) When a table is added to the publication, replica identity is
> > > > > checked. But while modifying the publish action to include
> > > > > delete/update, replica identity is not checked for the existing
> > > > > tables. I felt it should be checked for the existing tables too.
> > > >
> > > > In addition to this, I think we might also need some check to
> > > > prevent user from changing the REPLICA IDENTITY index which is used in
> > > > the filter expression.
> > > >
> > > > I was thinking is it possible do the check related to REPLICA
> > > > IDENTITY in function CheckCmdReplicaIdentity() or In
> > > > GetRelationPublicationActions(). If we move the REPLICA IDENTITY
> > > > check to this function, it would be consistent with the existing
> > > > behavior about the check related to REPLICA IDENTITY(see the
> > > > comments in CheckCmdReplicaIdentity) and seems can cover all the
> > > > cases mentioned above.
> > > >
> > >
> > > Yeah, adding the replica identity check in CheckCmdReplicaIdentity()
> > > would cover all the above cases but I think that would put a premium
> > > on each update/delete operation. I think traversing the expression
> > > tree (it could be multiple traversals if the relation is part of
> > > multiple publications) during each update/delete would be costly.
> > > Don't you think so?
> >
> > Yes, I agreed that traversing the expression every time would be costly.
> >
> > I thought maybe we can cache the columns used in row filter or cache
> > only the a
> > flag(can_update|delete) in the relcache. I think every operation that
> > affect the row-filter or replica-identity will invalidate the relcache
> > and the cost of check seems acceptable with the cache.
> >
> 
> I think if we can cache this information especially as a bool flag then that should
> probably be better.

Based on this direction, I tried to write a top-up POC patch (0005)
which I'd like to share.

The top-up patch mainly does the following things.

* Move the row filter columns validation to CheckCmdReplicaIdentity, so that
the validation is executed only when an actual UPDATE or DELETE is executed
on the published relation. It's consistent with the existing check about
replica identity.

* Cache the results of the validation for row filter columns in relcache to
reduce the cost of the validation. It's safe because every operation that
changes the row filter or replica identity will invalidate the relcache.

Also attaching the v42 patch set to keep cfbot happy.

Best regards,
Hou zj


Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Nov 25, 2021 at 7:39 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Nov 25, 2021, at 10:39 AM, houzj.fnst@fujitsu.com wrote:
>
> When researching and writing a top-up patch about this.
> I found a possible issue which I'd like to confirm first.
>
> It's possible the table is published in two publications A and B, publication A
> only publish "insert" , publication B publish "update". When UPDATE, both row
> filter in A and B will be executed. Is this behavior expected?
>
> Good question. No. The code should check the action before combining the
> multiple row filters.
>

Do you mean to say that we should give an error on Update/Delete if
any of the publications contains a table row-filter that has columns
that are not part of the primary key or replica identity? I think this
is what Hou-san has implemented in his top-up patch, and I also think
this is the right behavior.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Fri, Nov 26, 2021 at 1:16 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Based on this direction, I tried to write a top up POC patch(0005) which I'd like to share.
>

I noticed a minor issue.
In the top-up patch, the following error message detail:

+ errdetail("Not all row filter columns are not part of the REPLICA
IDENTITY")));

should be:

+ errdetail("Not all row filter columns are part of the REPLICA IDENTITY")));


Regards,
Greg Nancarrow
Fujitsu Australia



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Fri, Nov 26, 2021 11:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Nov 25, 2021 at 7:39 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Thu, Nov 25, 2021, at 10:39 AM, houzj.fnst@fujitsu.com wrote:
> >
> > When researching and writing a top-up patch about this.
> > I found a possible issue which I'd like to confirm first.
> >
> > It's possible the table is published in two publications A and B,
> > publication A only publish "insert" , publication B publish "update".
> > When UPDATE, both row filter in A and B will be executed. Is this behavior
> expected?
> >
> > Good question. No. The code should check the action before combining
> > the multiple row filters.
> >
> 
> Do you mean to say that we should give an error on Update/Delete if any of the
> publications contain table rowfilter that has columns that are not part of the
> primary key or replica identity? I think this is what Hou-san has implemented in
> his top-up patch and I also think this is the right behavior.

Yes, the top-up patch will give an error if the columns in the row
filter are not part of the replica identity on UPDATE and DELETE.

But the point I want to confirm is that:

---
create publication A for table tbl1 where (b<2) with(publish='insert');
create publication B for table tbl1 where (a>1) with(publish='update');
---

When UPDATE on the table 'tbl1', is it correct to combine and execute both of
the row filter in A(b<2) and B(a>1) ?(it's the current behavior)

Because the filter in A has an unlogged column (b) and publication A
only publishes "insert", should we, for UPDATE, skip the row filter in
A and only execute the row filter in B?

Best regards,
Hou zj

Re: row filtering for logical replication

From
Peter Smith
Date:
On Fri, Nov 26, 2021 at 4:05 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Fri, Nov 26, 2021 11:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Nov 25, 2021 at 7:39 PM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Thu, Nov 25, 2021, at 10:39 AM, houzj.fnst@fujitsu.com wrote:
> > >
> > > When researching and writing a top-up patch about this.
> > > I found a possible issue which I'd like to confirm first.
> > >
> > > It's possible the table is published in two publications A and B,
> > > publication A only publish "insert" , publication B publish "update".
> > > When UPDATE, both row filter in A and B will be executed. Is this behavior
> > expected?
> > >
> > > Good question. No. The code should check the action before combining
> > > the multiple row filters.
> > >
> >
> > Do you mean to say that we should give an error on Update/Delete if any of the
> > publications contain table rowfilter that has columns that are not part of the
> > primary key or replica identity? I think this is what Hou-san has implemented in
> > his top-up patch and I also think this is the right behavior.
>
> Yes, the top-up patch will give an error if the columns in row filter are not part of
> replica identity when UPDATE and DELETE.
>
> But the point I want to confirm is that:
>
> ---
> create publication A for table tbl1 where (b<2) with(publish='insert');
> create publication B for table tbl1 where (a>1) with(publish='update');
> ---
>
> When UPDATE on the table 'tbl1', is it correct to combine and execute both of
> the row filter in A(b<2) and B(a>1) ?(it's the current behavior)
>
> Because the filter in A has an unlogged column(b) and the publication A only
> publish "insert", so for UPDATE, should we skip the row filter in A and only
> execute the row filter in B ?
>

But since the filters are OR'ed together, does it even matter?

Now that your top-up patch prevents invalid updates/deletes, this
other point is only really a question about the cache performance,
isn't it?

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Fri, Nov 26, 2021 at 4:18 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Nov 26, 2021 at 4:05 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Fri, Nov 26, 2021 11:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Thu, Nov 25, 2021 at 7:39 PM Euler Taveira <euler@eulerto.com> wrote:
> > > >
> > > > On Thu, Nov 25, 2021, at 10:39 AM, houzj.fnst@fujitsu.com wrote:
> > > >
> > > > When researching and writing a top-up patch about this.
> > > > I found a possible issue which I'd like to confirm first.
> > > >
> > > > It's possible the table is published in two publications A and B,
> > > > publication A only publish "insert" , publication B publish "update".
> > > > When UPDATE, both row filter in A and B will be executed. Is this behavior
> > > expected?
> > > >
> > > > Good question. No. The code should check the action before combining
> > > > the multiple row filters.
> > > >
> > >
> > > Do you mean to say that we should give an error on Update/Delete if any of the
> > > publications contain table rowfilter that has columns that are not part of the
> > > primary key or replica identity? I think this is what Hou-san has implemented in
> > > his top-up patch and I also think this is the right behavior.
> >
> > Yes, the top-up patch will give an error if the columns in row filter are not part of
> > replica identity when UPDATE and DELETE.
> >
> > But the point I want to confirm is that:
> >
> > ---
> > create publication A for table tbl1 where (b<2) with(publish='insert');
> > create publication B for table tbl1 where (a>1) with(publish='update');
> > ---
> >
> > When UPDATE on the table 'tbl1', is it correct to combine and execute both of
> > the row filter in A(b<2) and B(a>1) ?(it's the current behavior)
> >
> > Because the filter in A has an unlogged column(b) and the publication A only
> > publish "insert", so for UPDATE, should we skip the row filter in A and only
> > execute the row filter in B ?
> >
>
> But since the filters are OR'ed together does it even matter?
>
> Now that your top-up patch now prevents invalid updates/deletes, this
> other point is only really a question about the cache performance,
> isn't it?
>

Irrespective of replica identity, I think there is still a functional
behaviour question, right?

e.g.
create publication p1 for table census where (country = 'Aust') with
(publish='update');
create publication p2 for table census where (country = 'NZ') with
(publish='insert');

Should it be possible to UPDATE for country 'NZ' or not?
Is this the same as your question Hou-san?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Nov 26, 2021 at 12:01 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Nov 26, 2021 at 4:18 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > > > Do you mean to say that we should give an error on Update/Delete if any of the
> > > > publications contain table rowfilter that has columns that are not part of the
> > > > primary key or replica identity? I think this is what Hou-san has implemented in
> > > > his top-up patch and I also think this is the right behavior.
> > >
> > > Yes, the top-up patch will give an error if the columns in row filter are not part of
> > > replica identity when UPDATE and DELETE.
> > >
> > > But the point I want to confirm is that:
> > >

Okay, I see your point now.

> > > ---
> > > create publication A for table tbl1 where (b<2) with(publish='insert');
> > > create publication B for table tbl1 where (a>1) with(publish='update');
> > > ---
> > >
> > > When UPDATE on the table 'tbl1', is it correct to combine and execute both of
> > > the row filter in A(b<2) and B(a>1) ?(it's the current behavior)
> > >
> > > Because the filter in A has an unlogged column(b) and the publication A only
> > > publish "insert", so for UPDATE, should we skip the row filter in A and only
> > > execute the row filter in B ?
> > >
> >
> > But since the filters are OR'ed together does it even matter?
> >

Even if it is OR'ed, if the value is not logged (as it was not part of
the replica identity or primary key), as per Hou-San's example, how
will we evaluate such a filter?
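
To make the point concrete: with the default replica identity, only
the primary key columns are written to WAL for the old tuple, so a
filter on any other column has nothing to evaluate against on
UPDATE/DELETE. A minimal sketch (REPLICA IDENTITY FULL is existing core
syntax, shown only as one way to get every column logged, not something
this patch requires):

create table tbl1 (a int primary key, b int);
-- with the default replica identity, only "a" is available in the old
-- tuple, so a filter like (b < 2) cannot be evaluated on UPDATE/DELETE
alter table tbl1 replica identity full;  -- now all columns are logged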

> > Now that your top-up patch now prevents invalid updates/deletes, this
> > other point is only really a question about the cache performance,
> > isn't it?
> >
>
> Irrespective of replica identity I think there is still a functional
> behaviour question, right?
>
> e.g.
> create publication p1 for table census where (country = 'Aust') with
> (publish="update")
> create publication p2 for table census where (country = 'NZ') with
> (publish='insert')
>
> Should it be possible to UPDATE for country 'NZ' or not?
> Is this the same as your question Hou-san?
>

I am not sure if it is the same, because in Hou-San's example the
publications refer to different columns, where one of the columns was
part of the PK and another was not, whereas in your example both refer
to the same column. I think in your example the error will happen at
the time of update/delete, whereas in Hou-San's example it won't happen
at the time of update/delete.

With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Fri, Nov 26, 2021 at 1:16 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
...
> Based on this direction, I tried to write a top up POC patch(0005) which I'd like to share.
>
> The top up patch mainly did the following things.
>
> * Move the row filter columns invalidation to CheckCmdReplicaIdentity, so that
> the invalidation is executed only when actual UPDATE or DELETE executed on the
> published relation. It's consistent with the existing check about replica
> identity.
>
> * Cache the results of the validation for row filter columns in relcache to
> reduce the cost of the validation. It's safe because every operation that
> change the row filter and replica identity will invalidate the relcache.
>
> Also attach the v42 patch set to keep cfbot happy.

Hi Hou-san.

Thanks for providing your "top-up" 0005 patch!

I suppose the goal will be to later merge this top-up with the current
0002 validation patch, but in the meantime here are my review comments
for 0005.

======

1) src/include/catalog/pg_publication.h - PublicationInfo
+typedef struct PublicationInfo
+{
+ PublicationActions pubactions;
+
+ /*
+ * True if pubactions don't include UPDATE and DELETE or
+ * all the columns in the row filter expression are part
+ * of replica identity.
+ */
+ bool rfcol_valid_for_replid;
+} PublicationInfo;
+

IMO "PublicationInfo" sounded too much like it is about the
Publication only, but IIUC it is really *per* Relation publication
info, right? So I thought perhaps it should be called more like struct
"RelationPubInfo".

======

2) src/include/catalog/pg_publication.h - PublicationInfo

The member "rfcol_valid_for_replid" also seems a little bit mis-named
because in some scenario (not UPDATE/DELETE) it can be true even if
there is not replica identity columns. So I thought perhaps it should
be called more like just "rfcols_valid"

Another thing - IIUC this is a kind of "unified" boolean that covers
*all* filters for this Relation (across multiple publications). If
that is right, then the comment for this member should say something
about this.

======

3) src/include/catalog/pg_publication.h - PublicationInfo

This new typedef should be added to src/tools/pgindent/typedefs.list

======

4) src/backend/catalog/pg_publication.c - check_rowfilter_replident
+/*
+ * Check if all the columns used in the row-filter WHERE clause are part of
+ * REPLICA IDENTITY
+ */
+bool
+check_rowfilter_replident(Node *node, Bitmapset *bms_replident)
+{

IIUC here false means "valid" and true means "invalid", which is
counter-intuitive to me. So at the least the true/false meaning ought
to be clarified in the function comment, and/or perhaps also rename the
function so that the return meaning is more obvious.

======

5) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity
+ pubinfo = RelationGetPublicationInfo(rel);
+

IIUC this pubinfo* is palloced *every* time by
RelationGetPublicationInfo, isn't it? If that is the case, shouldn't
CheckCmdReplicaIdentity be doing a pfree(pubinfo)?

======

6) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity
+ pubinfo = RelationGetPublicationInfo(rel);
+
+ /*
+ * if not all columns in the publication row filter are part of the REPLICA
+ * IDENTITY, then it's unsafe to execute it for UPDATE and DELETE.
+ */
+ if (!pubinfo->rfcol_valid_for_replid)
+ {
+ if (cmd == CMD_UPDATE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot update table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("Not all row filter columns are not part of the REPLICA
IDENTITY")));
+ else if (cmd == CMD_DELETE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot delete from table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("Not all row filter columns are not part of the REPLICA
IDENTITY")));

The comment seemed worded in a confusingly negative way.

Before:
+ * if not all columns in the publication row filter are part of the REPLICA
+ * IDENTITY, then it's unsafe to execute it for UPDATE and DELETE.

My Suggestion:
It is only safe to execute UPDATE/DELETE when all columns of the
publication row filters are part of the REPLICA IDENTITY.

~~

Also, is "publication row filter" really the correct terminology?
AFAIK it is more like *all* filters for this Relation across multiple
publications, but I have not got a good idea how to word that in a
comment. Anyway, I have a feeling this whole idea might be impacted by
other discussions in this RF thread.

======

7) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

Error messages have double negative wording? I think Greg already
commented on this same point.

+ errdetail("Not all row filter columns are not part of the REPLICA
IDENTITY")));

======

8) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

But which are the bad filter columns?

Previously the Row Filter column validation gave errors for the
invalid filter column, but in this top-up patch there is no indication
which column or which filter or which publication was the bad one -
only that "something" bad was detected. IMO this might make it very
difficult for the user to know enough about the cause of the problem
to be able to fix the offending filter.

======

9) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

  /* If relation has replica identity we are always good. */
  if (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
  OidIsValid(RelationGetReplicaIndex(rel)))

I was wondering if the check for REPLICA_IDENTITY_FULL should go
*before* your new call to pubinfo = RelationGetPublicationInfo(rel);
because IIUC if *every* column is a member of the replica identity
then the filter validation is not really necessary at all.

======

10) src/backend/utils/cache/relcache.c - function
GetRelationPublicationActions
@@ -5547,22 +5548,45 @@ RelationGetExclusionInfo(Relation indexRelation,
 struct PublicationActions *
 GetRelationPublicationActions(Relation relation)
 {
- List    *puboids;
- ListCell   *lc;
- MemoryContext oldcxt;
- Oid schemaid;
- PublicationActions *pubactions = palloc0(sizeof(PublicationActions));
+ PublicationInfo    *pubinfo;
+ PublicationActions *pubactions = palloc0(sizeof(PublicationInfo));
+
+ pubinfo = RelationGetPublicationInfo(relation);

Just assign pubinfo at the declaration instead of later in the function body.

======

11) src/backend/utils/cache/relcache.c - function
GetRelationPublicationActions

+ pubactions = memcpy(pubactions, relation->rd_pubinfo,
+ sizeof(PublicationActions));

Isn't that memcpy slightly incorrect and only working because the
pubactions happens to be the first member of the PublicationInfo? I
thought it should really be copying from
"&relation->rd_pubinfo->pubactions", right?

======

12) src/backend/utils/cache/relcache.c - function
GetRelationPublicationActions

Excessive blank lines following this function.

======

13). src/backend/utils/cache/relcache.c - function RelationGetPublicationInfo
+/*
+ * Get publication information for the given relation.
+ */
+struct PublicationInfo *
+RelationGetPublicationInfo(Relation relation)
+{
+ List    *puboids;
+ ListCell    *lc;
+ MemoryContext oldcxt;
+ Oid schemaid;
+ Bitmapset    *bms_replident = NULL;
+ PublicationInfo *pubinfo = palloc0(sizeof(PublicationInfo));
+
+ pubinfo->rfcol_valid_for_replid = true;

It is not entirely clear to me why this function is always pallocing
the PublicationInfo and then returning a copy of what is stored in
relation->rd_pubinfo. This then puts a burden on the callers (like
GetRelationPublicationActions etc.) to make sure to free that memory.
Why can't we just return relation->rd_pubinfo directly and avoid all
the extra palloc/memcpy/free?

======

14). src/backend/utils/cache/relcache.c - function RelationGetPublicationInfo
+ /*
+ * Find what are the cols that are part of the REPLICA IDENTITY.
+ * Note that REPLICA IDENTIY DEFAULT means primary key or nothing.
+ */

typo "IDENTIY" -> "IDENTITY"

======

15). src/backend/utils/cache/relcache.c - function RelationGetPublicationInfo

/* Now save copy of the actions in the relcache entry. */
  oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
- relation->rd_pubactions = palloc(sizeof(PublicationActions));
- memcpy(relation->rd_pubactions, pubactions, sizeof(PublicationActions));
+ relation->rd_pubinfo = palloc(sizeof(PublicationInfo));
+ memcpy(relation->rd_pubinfo, pubinfo, sizeof(PublicationInfo));
  MemoryContextSwitchTo(oldcxt);

The code comment looks a bit stale now. e.g. Perhaps now it should say
"save a copy of the info" instead of "save a copy of the actions".

======

16) Tests... CREATE PUBLICATION succeeds

I have not yet reviewed any of the 0005 tests, but there was some big
behaviour difference that I noticed.

I think now with the 0005 top-up patch the replica identity validation
is deferred to when UPDATE/DELETE is executed. I don't know if this
will be very user friendly. It means now sometimes you can
successfully CREATE a PUBLICATION even though it will fail as soon as
you try to use it.

e.g. Below I create a publication with only pubaction "update", and
although it creates OK you cannot use it as intended.

test_pub=# create table t1(a int, b int, c int);
CREATE TABLE
test_pub=# create publication ptest for table t1 where (a > 3) with
(publish="update");
CREATE PUBLICATION
test_pub=# update t1 set a = 3;
ERROR:  cannot update table "t1"
DETAIL:  Not all row filter columns are not part of the REPLICA IDENTITY

Should we *also* be validating the replica identity at the time of
CREATE PUBLICATION so the user can be for-warned of problems?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Sun, Nov 28, 2021 at 6:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Nov 26, 2021 at 1:16 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> ...
> > Based on this direction, I tried to write a top up POC patch(0005) which I'd like to share.
> >
> > The top up patch mainly did the following things.
> >
> > * Move the row filter columns invalidation to CheckCmdReplicaIdentity, so that
> > the invalidation is executed only when actual UPDATE or DELETE executed on the
> > published relation. It's consistent with the existing check about replica
> > identity.
> >
> > * Cache the results of the validation for row filter columns in relcache to
> > reduce the cost of the validation. It's safe because every operation that
> > change the row filter and replica identity will invalidate the relcache.
> >
> > Also attach the v42 patch set to keep cfbot happy.
>

Now I have looked at the patch 0005 test cases. Since this patch does
the RI validation at UPDATE/DELETE execution instead of at the time of
CREATE PUBLICATION, it means that currently the CREATE PUBLICATION is
always going to succeed. So IIUC it is accidentally missing a DROP
PUBLICATION for one of the tests, because the "ERROR:  publication
"testpub6" already exists" should not be happening. Below is a
fragment from the regression test publication.out I am referring to:

CREATE PUBLICATION testpub6 FOR TABLE rf_tbl_abcd_pk WHERE (a > 99);
-- fail - "a" is in PK but it is not part of REPLICA IDENTITY NOTHING
UPDATE rf_tbl_abcd_pk set a = 1;
ERROR:  cannot update table "rf_tbl_abcd_pk"
DETAIL:  Not all row filter columns are not part of the REPLICA IDENTITY
CREATE PUBLICATION testpub6 FOR TABLE rf_tbl_abcd_pk WHERE (c > 99);
ERROR:  publication "testpub6" already exists

------
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Sun, Nov 28, 2021 3:18 PM Peter Smith <smithpb2250@gmail.com> wrote:
> On Fri, Nov 26, 2021 at 1:16 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> >
> ...
> > Based on this direction, I tried to write a top up POC patch(0005) which I'd
> > like to share.
> >
> > The top up patch mainly did the following things.
> >
> > * Move the row filter columns invalidation to CheckCmdReplicaIdentity, so
> > that the invalidation is executed only when actual UPDATE or DELETE executed on
> > the published relation. It's consistent with the existing check about replica
> > identity.
> >
> > * Cache the results of the validation for row filter columns in relcache to
> > reduce the cost of the validation. It's safe because every operation that
> > change the row filter and replica identity will invalidate the relcache.
> >
> > Also attach the v42 patch set to keep cfbot happy.
> 
> Hi Hou-san.
> 
> Thanks for providing your "top-up" 0005 patch!
> 
> I suppose the goal will be to later merge this top-up with the current
> 0002 validation patch, but in the meantime here are my review comments
> for 0005.

Thanks for the review and many valuable comments !

> 8) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity
> 
> But which are the bad filter columns?
> 
> Previously the Row Filter column validation gave errors for the
> invalid filter column, but in this top-up patch there is no indication
> which column or which filter or which publication was the bad one -
> only that "something" bad was detected. IMO this might make it very
> difficult for the user to know enough about the cause of the problem
> to be able to fix the offending filter.

If we want to report the invalid filter column, I can see two possibilities.

1) Instead of a bool flag, we cache an AttrNumber which indicates the
   invalid column number (0 means all valid). We can report it in the
   error message.

2) Every time we decide to report an error, we traverse all the
   publications to find the invalid column again and report it.

What do you think ?

> 13). src/backend/utils/cache/relcache.c - function RelationGetPublicationInfo
> +/*
> + * Get publication information for the given relation.
> + */
> +struct PublicationInfo *
> +RelationGetPublicationInfo(Relation relation)
> +{
> + List    *puboids;
> + ListCell    *lc;
> + MemoryContext oldcxt;
> + Oid schemaid;
> + Bitmapset    *bms_replident = NULL;
> + PublicationInfo *pubinfo = palloc0(sizeof(PublicationInfo));
> +
> + pubinfo->rfcol_valid_for_replid = true;
> 
> It is not entirely clear to me why this function is always pallocing
> the PublicationInfo and then returning a copy of what is stored in the
> relation->rd_pubinfo. This then puts a burden on the callers (like the
> GetRelationPublicationActions etc) to make sure to free that memory.
> Why can't we just return the relation->rd_pubinfo directly And avoid
> all the extra palloc/memcpy/free?

Normally, I think only the cache management function should change the
data in the relcache. Returning relation->xx directly might carry a
risk that the user could change the data in the relcache. So the
management function usually returns a copy of the cache data, so that
the user is free to change it without affecting the real cache data.

> 16) Tests... CREATE PUBLICATION succeeds

> I have not yet reviewed any of the 0005 tests, but there was some big
> behaviour difference that I noticed.
> 
> I think now with the 0005 top-up patch the replica identify validation
> is deferred to when UPDATE/DELETE is executed. I don’t know if this
> will be very user friendly. It means now sometimes you can
> successfully CREATE a PUBLICATION even though it will fail as soon as
> you try to use it.

I am not sure; the initial idea here is to make the check of replica
identity consistent.

Currently, if a user creates a publication which publishes "update" but
the relation in the publication is not marked with a replica identity,
the user can create the publication successfully, but a later UPDATE
will report an error.

Best regards,
Hou zj

Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, Nov 29, 2021 at 1:54 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Sun, Nov 28, 2021 3:18 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > On Fri, Nov 26, 2021 at 1:16 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> > >
> > ...
> > > Based on this direction, I tried to write a top up POC patch(0005) which I'd
> > > like to share.
> > >
> > > The top up patch mainly did the following things.
> > >
> > > * Move the row filter columns invalidation to CheckCmdReplicaIdentity, so
> > > that the invalidation is executed only when actual UPDATE or DELETE executed on
> > > the published relation. It's consistent with the existing check about replica
> > > identity.
> > >
> > > * Cache the results of the validation for row filter columns in relcache to
> > > reduce the cost of the validation. It's safe because every operation that
> > > change the row filter and replica identity will invalidate the relcache.
> > >
> > > Also attach the v42 patch set to keep cfbot happy.
> >
> > Hi Hou-san.
> >
> > Thanks for providing your "top-up" 0005 patch!
> >
> > I suppose the goal will be to later merge this top-up with the current
> > 0002 validation patch, but in the meantime here are my review comments
> > for 0005.
>
> Thanks for the review and many valuable comments !
>
> > 8) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity
> >
> > But which are the bad filter columns?
> >
> > Previously the Row Filter column validation gave errors for the
> > invalid filter column, but in this top-up patch there is no indication
> > which column or which filter or which publication was the bad one -
> > only that "something" bad was detected. IMO this might make it very
> > difficult for the user to know enough about the cause of the problem
> > to be able to fix the offending filter.
>
> If we want to report the invalid filter column, I can see two possibilities.
>
> 1) Instead of a bool flag, we cache a AttrNumber flag which indicates the
>    invalid column number(0 means all valid). We can report it in the error
>    message.
>
> 2) Everytime we decide to report an error, we traverse all the publications to
>    find the invalid column again and report it.
>
> What do you think ?

Perhaps your idea #1 is good enough. At least if we provide just the
bad column name then the user can use psql \d+ to find all filter
publications that include that bad column. Maybe that can be a HINT
for the error message.

>
> > 13). src/backend/utils/cache/relcache.c - function RelationGetPublicationInfo
> > +/*
> > + * Get publication information for the given relation.
> > + */
> > +struct PublicationInfo *
> > +RelationGetPublicationInfo(Relation relation)
> > +{
> > + List    *puboids;
> > + ListCell    *lc;
> > + MemoryContext oldcxt;
> > + Oid schemaid;
> > + Bitmapset    *bms_replident = NULL;
> > + PublicationInfo *pubinfo = palloc0(sizeof(PublicationInfo));
> > +
> > + pubinfo->rfcol_valid_for_replid = true;
> >
> > It is not entirely clear to me why this function is always pallocing
> > the PublicationInfo and then returning a copy of what is stored in the
> > relation->rd_pubinfo. This then puts a burden on the callers (like the
> > GetRelationPublicationActions etc) to make sure to free that memory.
> > Why can't we just return the relation->rd_pubinfo directly And avoid
> > all the extra palloc/memcpy/free?
>
> Normally, I think only the cache management function should change the data in
> relcache.  Return relation->xx directly might have a risk that user could
> change the data in relcache. So, the management function usually return a copy
> of cache data so that user is free to change it without affecting the real
> cache data.

OK.

> 16) Tests... CREATE PUBLICATION succeeds
>
> > I have not yet reviewed any of the 0005 tests, but there was some big
> > behaviour difference that I noticed.
> >
> > I think now with the 0005 top-up patch the replica identify validation
> > is deferred to when UPDATE/DELETE is executed. I don’t know if this
> > will be very user friendly. It means now sometimes you can
> > successfully CREATE a PUBLICATION even though it will fail as soon as
> > you try to use it.
>
> I am not sure, the initial idea here is to make the check of replica identity
> consistent.
>
> Currently, if user create a publication which publish "update" but the relation
> in the publication didn't mark as replica identity, then user can create the
> publication successfully. but the later UPDATE will report an error.
>

OK. I see there is a different perspective; I will leave this to see
what other people think.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Fri, Nov 26, 2021 at 12:40 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> When researching and writing a top-up patch about this.
> I found a possible issue which I'd like to confirm first.
>
> It's possible the table is published in two publications A and B, publication A
> only publish "insert" , publication B publish "update". When UPDATE, both row
> filter in A and B will be executed. Is this behavior expected?
>
> For example:
> ---- Publication
> create table tbl1 (a int primary key, b int);
> create publication A for table tbl1 where (b<2) with(publish='insert');
> create publication B for table tbl1 where (a>1) with(publish='update');
>
> ---- Subscription
> create table tbl1 (a int primary key);
> CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres host=localhost
> port=10000' PUBLICATION A,B;
>
> ---- Publication
> update tbl1 set a = 2;
>
> The publication can be created, and when UPDATE, the rowfilter in A (b<2) will
> also been executed but the column in it is not part of replica identity.
> (I am not against this behavior just confirm)
>

There seem to be problems related to allowing the row filter to
include columns that are not part of the replica identity (in the case
of publish=insert).
In your example scenario, the tbl1 WHERE clause "(b < 2)" for
publication A, that publishes inserts only, causes a problem, because
column "b" is not part of the replica identity.
To see this, follow the simple example below:
(and note, for the Subscription, the provided tbl1 definition has an
error; it should also include the 2nd column "b int", same as in the
publisher)

---- Publisher:
INSERT INTO tbl1 VALUES (1,1);
UPDATE tbl1 SET a = 2;

Prior to the UPDATE above:
On pub side, tbl1 contains (1,1).
On sub side, tbl1 contains (1,1)

After the above UPDATE:
On pub side, tbl1 contains (2,1).
On sub side, tbl1 contains (1,1), (2,1)

So the UPDATE on the pub side has resulted in an INSERT of (2,1) on
the sub side.

This is because when (1,1) is UPDATEd to (2,1), it attempts to use the
"insert" filter "(b<2)" to determine whether the old value had been
inserted (published to the subscriber), but finds there is no "b" value
(because it only uses RI cols for UPDATE) and so has to assume the old
tuple doesn't exist on the subscriber; hence the UPDATE ends up doing
an INSERT.
Now if the use of RI cols were enforced for the insert filter case,
we'd properly know whether the old row value had been published, and
it would have correctly performed an UPDATE instead of an INSERT in
this case.
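
To spell out what enforcing RI cols for the insert filter case would
mean here, a sketch (this enforcement is not implemented; the rejection
below is an assumption):

-- would be rejected under the suggested rule, because "b" is not part
-- of tbl1's replica identity (the primary key on "a"):
create publication A for table tbl1 where (b<2) with (publish='insert');
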
Thoughts?


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Nov 29, 2021 at 12:10 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Nov 26, 2021 at 12:40 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > When researching and writing a top-up patch about this.
> > I found a possible issue which I'd like to confirm first.
> >
> > It's possible the table is published in two publications A and B, publication A
> > only publish "insert" , publication B publish "update". When UPDATE, both row
> > filter in A and B will be executed. Is this behavior expected?
> >
> > For example:
> > ---- Publication
> > create table tbl1 (a int primary key, b int);
> > create publication A for table tbl1 where (b<2) with(publish='insert');
> > create publication B for table tbl1 where (a>1) with(publish='update');
> >
> > ---- Subscription
> > create table tbl1 (a int primary key);
> > CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres host=localhost
> > port=10000' PUBLICATION A,B;
> >
> > ---- Publication
> > update tbl1 set a = 2;
> >
> > The publication can be created, and when UPDATE, the rowfilter in A (b<2) will
> > also been executed but the column in it is not part of replica identity.
> > (I am not against this behavior just confirm)
> >
>
> There seems to be problems related to allowing the row filter to
> include columns that are not part of the replica identity (in the case
> of publish=insert).
> In your example scenario, the tbl1 WHERE clause "(b < 2)" for
> publication A, that publishes inserts only, causes a problem, because
> column "b" is not part of the replica identity.
> To see this, follow the simple example below:
> (and note, for the Subscription, the provided tbl1 definition has an
> error, it should also include the 2nd column "b int", same as in the
> publisher)
>
> ---- Publisher:
> INSERT INTO tbl1 VALUES (1,1);
> UPDATE tbl1 SET a = 2;
>
> Prior to the UPDATE above:
> On pub side, tbl1 contains (1,1).
> On sub side, tbl1 contains (1,1)
>
> After the above UPDATE:
> On pub side, tbl1 contains (2,1).
> On sub side, tbl1 contains (1,1), (2,1)
>
> So the UPDATE on the pub side has resulted in an INSERT of (2,1) on
> the sub side.
>
> This is because when (1,1) is UPDATEd to (2,1), it attempts to use the
> "insert" filter "(b<2)" to determine whether the old value had been
> inserted (published to subscriber), but finds there is no "b" value
> (because it only uses RI cols for UPDATE) and so has to assume the old
> tuple doesn't exist on the subscriber, hence the UPDATE ends up doing
> an INSERT.
> INow if the use of RI cols were enforced for the insert filter case,
> we'd properly know the answer as to whether the old row value had been
> published and it would have correctly performed an UPDATE instead of
> an INSERT in this case.
>

I don't think it is a good idea to combine the row-filter from a
publication that publishes just 'insert' with the row-filter from one that
publishes 'update'. We shouldn't apply the 'insert' filter for
'update', and similarly for the other publication operations. We can combine
the filters when the published operations are the same. So, this means
that we might need to cache multiple row-filters, but I think that is
better than having another restriction that the publish operation 'insert'
should also honor the RI columns restriction.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Mon, Nov 29, 2021 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> > ---- Publisher:
> > INSERT INTO tbl1 VALUES (1,1);
> > UPDATE tbl1 SET a = 2;
> >
> > Prior to the UPDATE above:
> > On pub side, tbl1 contains (1,1).
> > On sub side, tbl1 contains (1,1)
> >
> > After the above UPDATE:
> > On pub side, tbl1 contains (2,1).
> > On sub side, tbl1 contains (1,1), (2,1)
> >
> > So the UPDATE on the pub side has resulted in an INSERT of (2,1) on
> > the sub side.
> >
> > This is because when (1,1) is UPDATEd to (2,1), it attempts to use the
> > "insert" filter "(b<2)" to determine whether the old value had been
> > inserted (published to subscriber), but finds there is no "b" value
> > (because it only uses RI cols for UPDATE) and so has to assume the old
> > tuple doesn't exist on the subscriber, hence the UPDATE ends up doing
> > an INSERT.
> > INow if the use of RI cols were enforced for the insert filter case,
> > we'd properly know the answer as to whether the old row value had been
> > published and it would have correctly performed an UPDATE instead of
> > an INSERT in this case.
> >
>
> I don't think it is a good idea to combine the row-filter from the
> publication that publishes just 'insert' with the row-filter that
> publishes 'updates'. We shouldn't apply the 'insert' filter for
> 'update' and similarly for publication operations. We can combine the
> filters when the published operations are the same. So, this means
> that we might need to cache multiple row-filters but I think that is
> better than having another restriction that publish operation 'insert'
> should also honor RI columns restriction.

I am just wondering: if we don't combine the filters in the above case,
then what data will we send to the subscriber if the operation is
"UPDATE tbl1 SET a = 2, b = 3"? In this case, we will apply only the
update filter, i.e. (a > 1), so this will become an INSERT
operation because the old row did not pass the filter.  So now we
will insert a new row on the subscriber side with the value (2,3).  It looks
a bit odd to me that the value b=3 would have been rejected by a
direct insert but is allowed via the indirect insert done by the update.
Does this behavior look odd only to me?
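
For illustration, a minimal sketch of that sequence (assuming the same tbl1
and publications A/B from the example above):

---- Publisher:
UPDATE tbl1 SET a = 2, b = 3;
-- Only publication B's filter (a > 1) is consulted for the UPDATE.
-- The old row (1,1) fails it and the new row (2,3) passes it, so the
-- subscriber receives an INSERT of (2,3), even though publication A's
-- insert filter (b < 2) would have rejected a direct INSERT of b = 3.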

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Nov 29, 2021 at 8:24 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Sun, Nov 28, 2021 3:18 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > On Fri, Nov 26, 2021 at 1:16 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> > >
> > ...
> > > Based on this direction, I tried to write a top up POC patch(0005) which I'd
> > > like to share.
> > >
> > > The top up patch mainly did the following things.
> > >
> > > * Move the row filter columns invalidation to CheckCmdReplicaIdentity, so
> > > that the invalidation is executed only when actual UPDATE or DELETE executed on
> > > the published relation. It's consistent with the existing check about replica
> > > identity.
> > >
> > > * Cache the results of the validation for row filter columns in relcache to
> > > reduce the cost of the validation. It's safe because every operation that
> > > change the row filter and replica identity will invalidate the relcache.
> > >
> > > Also attach the v42 patch set to keep cfbot happy.
> >
> > Hi Hou-san.
> >
> > Thanks for providing your "top-up" 0005 patch!
> >
> > I suppose the goal will be to later merge this top-up with the current
> > 0002 validation patch, but in the meantime here are my review comments
> > for 0005.
>
> Thanks for the review and many valuable comments !
>
> > 8) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity
> >
> > But which are the bad filter columns?
> >
> > Previously the Row Filter column validation gave errors for the
> > invalid filter column, but in this top-up patch there is no indication
> > which column or which filter or which publication was the bad one -
> > only that "something" bad was detected. IMO this might make it very
> > difficult for the user to know enough about the cause of the problem
> > to be able to fix the offending filter.
>
> If we want to report the invalid filter column, I can see two possibilities.
>
> 1) Instead of a bool flag, we cache a AttrNumber flag which indicates the
>    invalid column number(0 means all valid). We can report it in the error
>    message.
>
> 2) Everytime we decide to report an error, we traverse all the publications to
>    find the invalid column again and report it.
>
> What do you think ?
>

I think we can probably give an error inside
RelationGetPublicationInfo (we can change the name of the function
based on the changed functionality). Basically, if the row_filter is valid
then we can copy the publication info from the relcache and return it at the
beginning; otherwise, allow it to check the publications again. In error
cases, it shouldn't matter much to not use the cached information.
This is to some extent how other parameters like rd_fkeyvalid and
rd_partcheckvalid work. One more thing: similar to some of the other
things, isn't it better to manage pubactions and the new bool flag directly
in the relation instead of using PublicationInfo?
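
As a generic, runnable sketch of the validity-flag pattern being referenced
(all names here are illustrative, not the actual relcache fields or the
patch's code):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical cache entry: the validation runs once and its result is
 * kept until an invalidation clears check_done again. */
typedef struct CacheEntry
{
    bool check_done;      /* has the expensive validation run? */
    int  invalid_column;  /* 0 means all filter columns are valid */
} CacheEntry;

static int
expensive_validation(void)
{
    /* stand-in for scanning the publications for a bad filter column */
    return 0;
}

static int
get_invalid_column(CacheEntry *entry)
{
    if (!entry->check_done)
    {
        entry->invalid_column = expensive_validation();
        entry->check_done = true;   /* cleared on relcache invalidation */
    }
    return entry->invalid_column;
}

int
main(void)
{
    CacheEntry e = {false, 0};

    printf("invalid column: %d\n", get_invalid_column(&e));
    return 0;
}

In error cases, the cached result can simply be ignored and the expensive
check rerun, which is what the suggestion above amounts to.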

> 16) Tests... CREATE PUBLICATION succeeds
>
> > I have not yet reviewed any of the 0005 tests, but there was some big
> > behaviour difference that I noticed.
> >
> > I think now with the 0005 top-up patch the replica identify validation
> > is deferred to when UPDATE/DELETE is executed. I don’t know if this
> > will be very user friendly. It means now sometimes you can
> > successfully CREATE a PUBLICATION even though it will fail as soon as
> > you try to use it.
>
> I am not sure, the initial idea here is to make the check of replica identity
> consistent.
>
> Currently, if user create a publication which publish "update" but the relation
> in the publication didn't mark as replica identity, then user can create the
> publication successfully. but the later UPDATE will report an error.
>

Yeah, I think giving an error on Update/Delete should be okay.

--
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Sun, Nov 28, 2021 at 12:48 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Nov 26, 2021 at 1:16 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
>
> 4) src/backend/catalog/pg_publication.c - check_rowfilter_replident
> +/*
> + * Check if all the columns used in the row-filter WHERE clause are part of
> + * REPLICA IDENTITY
> + */
> +bool
> +check_rowfilter_replident(Node *node, Bitmapset *bms_replident)
> +{
>
> IIUC here the false means "valid" and true means "invalid" which is
> counter-intuitive to me. So at least true/false meaning ought to be
> clarified in the function comment, and/or perhaps also rename the
> function so that the return meaning is more obvious.
>

+1 to rename the function in this case.
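
As a toy sketch of the clearer polarity being asked for, where true means
"valid" (columns are modeled as plain bitmasks; all names are illustrative):

#include <stdbool.h>
#include <stdio.h>

/* Returns true when every column used by the row filter is covered by
 * the replica identity -- the intuitive reading of the result. */
static bool
rowfilter_covered_by_replident(unsigned filter_cols, unsigned replident_cols)
{
    return (filter_cols & ~replident_cols) == 0;
}

int
main(void)
{
    unsigned replident = 0x1;   /* say, column "a" */

    printf("%d\n", rowfilter_covered_by_replident(0x1, replident)); /* 1 */
    printf("%d\n", rowfilter_covered_by_replident(0x2, replident)); /* 0: "b" is not covered */
    return 0;
}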

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Nov 29, 2021 at 4:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 29, 2021 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > ---- Publisher:
> > > INSERT INTO tbl1 VALUES (1,1);
> > > UPDATE tbl1 SET a = 2;
> > >
> > > Prior to the UPDATE above:
> > > On pub side, tbl1 contains (1,1).
> > > On sub side, tbl1 contains (1,1)
> > >
> > > After the above UPDATE:
> > > On pub side, tbl1 contains (2,1).
> > > On sub side, tbl1 contains (1,1), (2,1)
> > >
> > > So the UPDATE on the pub side has resulted in an INSERT of (2,1) on
> > > the sub side.
> > >
> > > This is because when (1,1) is UPDATEd to (2,1), it attempts to use the
> > > "insert" filter "(b<2)" to determine whether the old value had been
> > > inserted (published to subscriber), but finds there is no "b" value
> > > (because it only uses RI cols for UPDATE) and so has to assume the old
> > > tuple doesn't exist on the subscriber, hence the UPDATE ends up doing
> > > an INSERT.
> > > INow if the use of RI cols were enforced for the insert filter case,
> > > we'd properly know the answer as to whether the old row value had been
> > > published and it would have correctly performed an UPDATE instead of
> > > an INSERT in this case.
> > >
> >
> > I don't think it is a good idea to combine the row-filter from the
> > publication that publishes just 'insert' with the row-filter that
> > publishes 'updates'. We shouldn't apply the 'insert' filter for
> > 'update' and similarly for publication operations. We can combine the
> > filters when the published operations are the same. So, this means
> > that we might need to cache multiple row-filters but I think that is
> > better than having another restriction that publish operation 'insert'
> > should also honor RI columns restriction.
>
> I am just wondering that if we don't combine filter in the above case
> then what data we will send to the subscriber if the operation is
> "UPDATE tbl1 SET a = 2, b=3", so in this case, we will apply only the
> update filter i.e. a > 1 so as per that this will become the INSERT
> operation because the old row was not passing the filter.
>

If we want, I think for inserts (new row) we can consider the insert
filter as well but that makes it tricky to explain. I feel we can
change it later as well if there is a valid use case for this. What do
you think?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Mon, Nov 29, 2021 at 5:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > I don't think it is a good idea to combine the row-filter from the
> > > publication that publishes just 'insert' with the row-filter that
> > > publishes 'updates'. We shouldn't apply the 'insert' filter for
> > > 'update' and similarly for publication operations. We can combine the
> > > filters when the published operations are the same. So, this means
> > > that we might need to cache multiple row-filters but I think that is
> > > better than having another restriction that publish operation 'insert'
> > > should also honor RI columns restriction.
> >
> > I am just wondering that if we don't combine filter in the above case
> > then what data we will send to the subscriber if the operation is
> > "UPDATE tbl1 SET a = 2, b=3", so in this case, we will apply only the
> > update filter i.e. a > 1 so as per that this will become the INSERT
> > operation because the old row was not passing the filter.
> >
>
> If we want, I think for inserts (new row) we can consider the insert
> filter as well but that makes it tricky to explain. I feel we can
> change it later as well if there is a valid use case for this. What do
> you think?

Yeah, that makes sense.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Mon, Nov 29, 2021, at 7:11 AM, Amit Kapila wrote:
I don't think it is a good idea to combine the row-filter from the
publication that publishes just 'insert' with the row-filter that
publishes 'updates'. We shouldn't apply the 'insert' filter for
'update' and similarly for publication operations. We can combine the
filters when the published operations are the same. So, this means
that we might need to cache multiple row-filters but I think that is
better than having another restriction that publish operation 'insert'
should also honor RI columns restriction.
That's exactly what I meant to say, but apparently I didn't explain it in detail.
If a subscriber has multiple publications and a table is part of these
publications with different row filters, it should check the publication action
*before* including it in the row filter list. It means that an UPDATE operation
cannot apply a row filter that is part of a publication that has only INSERT as
an action. Having said that, we cannot always combine multiple row filter
expressions into one. Instead, it should cache the individual row filter
expressions and apply the OR during the row filter execution (as I did in the
initial patches before this caching stuff). The other idea is to have a
separate cache for each action. The main disadvantage of that approach is that
it creates 4x the entries.

I'm experimenting with the first approach, which stores multiple row filters
and their publication actions, right now. Unfortunately we cannot use
relentry->pubactions because it aggregates this information if you have
multiple entries. It seems a separate array should store this information,
to be used later while evaluating the row filter -- around the
pgoutput_row_filter_exec_expr() call.


--
Euler Taveira

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Mon, Nov 29, 2021 6:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Nov 29, 2021 at 12:10 PM Greg Nancarrow <gregn4422@gmail.com>
> wrote:
> >
> > On Fri, Nov 26, 2021 at 12:40 AM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > When researching and writing a top-up patch about this.
> > > I found a possible issue which I'd like to confirm first.
> > >
> > > It's possible the table is published in two publications A and B,
> > > publication A only publish "insert" , publication B publish
> > > "update". When UPDATE, both row filter in A and B will be executed. Is this
> behavior expected?
> > >
> > > For example:
> > > ---- Publication
> > > create table tbl1 (a int primary key, b int); create publication A
> > > for table tbl1 where (b<2) with(publish='insert'); create
> > > publication B for table tbl1 where (a>1) with(publish='update');
> > >
> > > ---- Subscription
> > > create table tbl1 (a int primary key); CREATE SUBSCRIPTION sub
> > > CONNECTION 'dbname=postgres host=localhost port=10000'
> PUBLICATION
> > > A,B;
> > >
> > > ---- Publication
> > > update tbl1 set a = 2;
> > >
> > > The publication can be created, and when UPDATE, the rowfilter in A
> > > (b<2) will also been executed but the column in it is not part of replica
> identity.
> > > (I am not against this behavior just confirm)
> > >
> >
> > There seems to be problems related to allowing the row filter to
> > include columns that are not part of the replica identity (in the case
> > of publish=insert).
> > In your example scenario, the tbl1 WHERE clause "(b < 2)" for
> > publication A, that publishes inserts only, causes a problem, because
> > column "b" is not part of the replica identity.
> > To see this, follow the simple example below:
> > (and note, for the Subscription, the provided tbl1 definition has an
> > error, it should also include the 2nd column "b int", same as in the
> > publisher)
> >
> > ---- Publisher:
> > INSERT INTO tbl1 VALUES (1,1);
> > UPDATE tbl1 SET a = 2;
> >
> > Prior to the UPDATE above:
> > On pub side, tbl1 contains (1,1).
> > On sub side, tbl1 contains (1,1)
> >
> > After the above UPDATE:
> > On pub side, tbl1 contains (2,1).
> > On sub side, tbl1 contains (1,1), (2,1)
> >
> > So the UPDATE on the pub side has resulted in an INSERT of (2,1) on
> > the sub side.
> >
> > This is because when (1,1) is UPDATEd to (2,1), it attempts to use the
> > "insert" filter "(b<2)" to determine whether the old value had been
> > inserted (published to subscriber), but finds there is no "b" value
> > (because it only uses RI cols for UPDATE) and so has to assume the old
> > tuple doesn't exist on the subscriber, hence the UPDATE ends up doing
> > an INSERT.
> > INow if the use of RI cols were enforced for the insert filter case,
> > we'd properly know the answer as to whether the old row value had been
> > published and it would have correctly performed an UPDATE instead of
> > an INSERT in this case.
> >
> 
> I don't think it is a good idea to combine the row-filter from the publication
> that publishes just 'insert' with the row-filter that publishes 'updates'. We
> shouldn't apply the 'insert' filter for 'update' and similarly for publication
> operations. We can combine the filters when the published operations are the
> same. So, this means that we might need to cache multiple row-filters but I
> think that is better than having another restriction that publish operation
> 'insert'
> should also honor RI columns restriction.

Personally, I agree that an UPDATE operation should only apply a row filter
that is part of a publication that publishes UPDATE.

Best regards,
Hou zj

RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Thursday, November 25, 2021 11:22 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> Thanks for all the review comments so far! We are endeavouring to keep
> pace with them.
> 
> All feedback is being tracked and we will fix and/or reply to everything ASAP.
> 
> Meanwhile, PSA the latest set of v42* patches.
> 
> This version was mostly a patch restructuring exercise but it also
> addresses some minor review comments in passing.
> 

Thanks for your patch.
I have two comments on the documentation in the 0001 patch.

1.
+   New row is used and it contains all columns. A <literal>NULL</literal> value
+   causes the expression to evaluate to false; avoid using columns without

I don't quite understand the sentence 'A NULL value causes the expression to evaluate to false'.
An expression containing a NULL value can also return true. Could you be more specific?

For example:

postgres=# select null or true;
 ?column?
----------
 t
(1 row)
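
On the other hand, a comparison involving NULL yields NULL rather than true
or false, and a row filter treats that as not matching; for example:

postgres=# select null::int < 2;
 ?column?
----------

(1 row)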


2.
+   at all then all other filters become redundant. If the subscriber is a
+   <productname>PostgreSQL</productname> version before 15 then any row filtering
+   is ignored.

If the subscriber is a PostgreSQL version before 15, it seems row filtering will
be ignored only when copying the initial data; the later changes will still be
row filtered. Should we make this clear in the document?

Regards,
Tang

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Nov 29, 2021 at 8:40 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Nov 29, 2021, at 7:11 AM, Amit Kapila wrote:
>
> I don't think it is a good idea to combine the row-filter from the
> publication that publishes just 'insert' with the row-filter that
> publishes 'updates'. We shouldn't apply the 'insert' filter for
> 'update' and similarly for publication operations. We can combine the
> filters when the published operations are the same. So, this means
> that we might need to cache multiple row-filters but I think that is
> better than having another restriction that publish operation 'insert'
> should also honor RI columns restriction.
>
> That's exactly what I meant to say but apparently I didn't explain in details.
> If a subscriber has multiple publications and a table is part of these
> publications with different row filters, it should check the publication action
> *before* including it in the row filter list. It means that an UPDATE operation
> cannot apply a row filter that is part of a publication that has only INSERT as
> an action. Having said that we cannot always combine multiple row filter
> expressions into one. Instead, it should cache individual row filter expression
> and apply the OR during the row filter execution (as I did in the initial
> patches before this caching stuff). The other idea is to have multiple caches
> for each action.  The main disadvantage of this approach is to create 4x
> entries.
>
> I'm experimenting the first approach that stores multiple row filters and its
> publication action right now.
>

We can try that way, but I think we should still be able to combine in
many cases, like where all the operations are specified for the
publications having the table, or where the pubactions are the same. So, we
should not give up on those cases. We can apply this new logic only when
we find that the pubactions are different, and probably store them as
independent expressions with their corresponding pubactions at the
current location in the v42* patch (in pgoutput_row_filter). It is
okay to combine them at a later stage during execution when we can't
do it at the time of forming the cache entry.
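
As a runnable sketch of what "independent expressions with corresponding
pubactions" could look like at evaluation time (every name and field here is
illustrative, not the actual patch):

#include <stdbool.h>
#include <stdio.h>

/* One cached filter, tagged with the pubactions of its publication. */
typedef struct RowFilter
{
    const char *expr;       /* stand-in for a prepared expression */
    bool        pubinsert;
    bool        pubupdate;
    bool        pubdelete;
} RowFilter;

static bool
eval_filter(const char *expr)
{
    /* stand-in for executing the expression against the tuple */
    printf("evaluating %s\n", expr);
    return false;
}

/* OR together only the filters whose publication publishes UPDATE. */
static bool
row_passes_for_update(const RowFilter *filters, int n)
{
    bool have_applicable = false;

    for (int i = 0; i < n; i++)
    {
        if (!filters[i].pubupdate)
            continue;           /* skip, e.g., an insert-only publication */
        have_applicable = true;
        if (eval_filter(filters[i].expr))
            return true;
    }
    return !have_applicable;    /* no applicable filter: publish the row */
}

int
main(void)
{
    RowFilter filters[] = {
        {"(b < 2)", true, false, false},    /* publication A: insert only */
        {"(a > 1)", false, true, false},    /* publication B: update only */
    };

    printf("passes: %d\n", row_passes_for_update(filters, 2));
    return 0;
}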

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Tue, Nov 30, 2021 at 10:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 29, 2021 at 8:40 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Mon, Nov 29, 2021, at 7:11 AM, Amit Kapila wrote:
> >
> > I don't think it is a good idea to combine the row-filter from the
> > publication that publishes just 'insert' with the row-filter that
> > publishes 'updates'. We shouldn't apply the 'insert' filter for
> > 'update' and similarly for publication operations. We can combine the
> > filters when the published operations are the same. So, this means
> > that we might need to cache multiple row-filters but I think that is
> > better than having another restriction that publish operation 'insert'
> > should also honor RI columns restriction.
> >
> > That's exactly what I meant to say but apparently I didn't explain in details.
> > If a subscriber has multiple publications and a table is part of these
> > publications with different row filters, it should check the publication action
> > *before* including it in the row filter list. It means that an UPDATE operation
> > cannot apply a row filter that is part of a publication that has only INSERT as
> > an action. Having said that we cannot always combine multiple row filter
> > expressions into one. Instead, it should cache individual row filter expression
> > and apply the OR during the row filter execution (as I did in the initial
> > patches before this caching stuff). The other idea is to have multiple caches
> > for each action.  The main disadvantage of this approach is to create 4x
> > entries.
> >
> > I'm experimenting the first approach that stores multiple row filters and its
> > publication action right now.
> >
>
> We can try that way but I think we should still be able to combine in
> many cases like where all the operations are specified for
> publications having the table or maybe pubactions are same. So, we
> should not give up on those cases. We can do this new logic only when
> we find that pubactions are different and probably store them as
> independent expressions and corresponding pubactions for it at the
> current location in the v42* patch (in pgoutput_row_filter). It is
> okay to combine them at a later stage during execution when we can't
> do it at the time of forming cache entry.

What about the initial table sync? During that, are we going to
combine all the filters, or are we going to apply only the insert
filters?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Thu, Nov 25, 2021 at 2:22 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Thanks for all the review comments so far! We are endeavouring to keep
> pace with them.
>
> All feedback is being tracked and we will fix and/or reply to everything ASAP.
>
> Meanwhile, PSA the latest set of v42* patches.
>
> This version was mostly a patch restructuring exercise but it also
> addresses some minor review comments in passing.
>

Addressed more review comments in the attached patch-set v43; 5
patches are carried forward from v42.
This patch-set contains the following fixes:

On Tue, Nov 23, 2021 at 1:28 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> in pgoutput_row_filter, we are dropping the slots if there are some
> old slots in the RelationSyncEntry.  But then I noticed that in
> rel_sync_cache_relation_cb(), also we are doing that but only for the
> scantuple slot.  So IMHO, rel_sync_cache_relation_cb(), is only place
> setting entry->rowfilter_valid to false; so why not drop all the slot
> that time only and in pgoutput_row_filter(), you can just put an
> assert?
>

Moved all the dropping of slots to rel_sync_cache_relation_cb()

> +static bool
> +pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot,
> RelationSyncEntry *entry)
> +{
> +    EState       *estate;
> +    ExprContext *ecxt;
>
>
> pgoutput_row_filter_virtual and pgoutput_row_filter are exactly same
> except, ExecStoreHeapTuple(), so why not just put one check based on
> whether a slot is passed or not, instead of making complete duplicate
> copy of the function.

Removed pgoutput_row_filter_virtual

>          oldctx = MemoryContextSwitchTo(CacheMemoryContext);
>          tupdesc = CreateTupleDescCopy(tupdesc);
>          entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
>
> Why do we need to copy the tupledesc? do we think that we need to have
> this slot even if we close the relation, if so can you add the
> comments explaining why we are making a copy here.

This code has been modified, and comments added.

On Tue, Nov 23, 2021 at 8:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> One more thing related to this code:
> pgoutput_row_filter()
> {
> ..
> + if (!entry->rowfilter_valid)
> {
> ..
> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> + tupdesc = CreateTupleDescCopy(tupdesc);
> + entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
> + MemoryContextSwitchTo(oldctx);
> ..
> }
>
> Why do we need to initialize scantuple here unless we are sure that
> the row filter is going to get associated with this relentry? I think
> when there is no row filter then this allocation is not required.
>

Modified as suggested.

On Tue, Nov 23, 2021 at 10:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> In 0003 patch, why is below change required?
> --- a/src/backend/replication/pgoutput/pgoutput.c
> +++ b/src/backend/replication/pgoutput/pgoutput.c
> @@ -1,4 +1,4 @@
> -/*-------------------------------------------------------------------------
> +/*------------------------------------------------------------------------
>   *
>   * pgoutput.c
>

Removed.

>
> After above, rearrange the code in pgoutput_row_filter(), so that two
> different checks related to 'rfisnull'  (introduced by different
> patches) can be combined as if .. else check.
>
Fixed.

On Thu, Nov 25, 2021 at 12:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> + * If the new relation or the old relation has a where clause,
> + * we need to remove it so that it can be added afresh later.
> + */
> + if (RelationGetRelid(newpubrel->relation) == oldrelid &&
> + newpubrel->whereClause == NULL && rfisnull)
>
> Can't we use _equalPublicationTable() here? It compares the whereClause as well.
>

Tried this; it can't be done, because one is an ALTER statement while the
other is a publication catalog entry, so the whereClause is not
the same node type. In the statement, the whereClause is a T_A_Expr,
while in the publication catalog it is a T_OpExpr.

>   /* Must be owner of the table or superuser. */
> - if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId()))
> + if (!pg_class_ownercheck(relid, GetUserId()))
>
> Here, you can directly use RelationGetRelid as was used in the
> previous code without using an additional variable.
>

Fixed.

> 2.
> +typedef struct {
> + Relation rel;
> + bool check_replident;
> + Bitmapset  *bms_replident;
> +}
> +rf_context;
>
> Add rf_context in the same line where } ends.

Code has been modified, this comment no longer applies.

> 4.
> + * Rules: Node-type validation
> + * ---------------------------
> + * Allow only simple or compound expressions like:
> + * - "(Var Op Const)" or
>
> It seems Var Op Var is allowed. I tried below and it works:
> create publication pub for table t1 where (c1 < c2) WITH (publish = 'insert');
>
> I think it should be okay to allow it provided we ensure that we never
> access some other table/view etc. as part of the expression. Also, we
> should document the behavior correctly.

Fixed.

On Wed, Nov 24, 2021 at 8:52 PM vignesh C <vignesh21@gmail.com> wrote:
>
> 4) This should be included in typedefs.list, also we could add some
> comments for this structure
> +typedef struct {
> +       Relation        rel;
> +       Bitmapset  *bms_replident;
> +}
> +rf_context;

This has been removed in the last patch, so the comment no longer applies.

> 5) Few includes are not required. #include "miscadmin.h" not required
> in pg_publication.c, #include "executor/executor.h" not required in
> proto.c, #include "access/xact.h", #include "executor/executor.h" and
> #include "replication/logicalrelation.h" not required in pgoutput.c
>

Optimized this: removed "executor/executor.h" from patch 0003, removed
"access/xact.h" from patch 0001, and
removed "replication/logicalrelation.h" from 0001. The others are required.

> 6) typo "filte" should be "filter":
> +/*
> + * The row filte walker checks that the row filter expression is legal.
> + *
> + * Rules: Node-type validation
> + * ---------------------------
> + * Allow only simple or compound expressions like:
> + * - "(Var Op Const)" or
> + * - "(Var Op Const) Bool (Var Op Const)"

Fixed.


regards,
Ajin Cherian
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Nov 30, 2021 at 11:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 30, 2021 at 10:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Nov 29, 2021 at 8:40 PM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Mon, Nov 29, 2021, at 7:11 AM, Amit Kapila wrote:
> > >
> > > I don't think it is a good idea to combine the row-filter from the
> > > publication that publishes just 'insert' with the row-filter that
> > > publishes 'updates'. We shouldn't apply the 'insert' filter for
> > > 'update' and similarly for publication operations. We can combine the
> > > filters when the published operations are the same. So, this means
> > > that we might need to cache multiple row-filters but I think that is
> > > better than having another restriction that publish operation 'insert'
> > > should also honor RI columns restriction.
> > >
> > > That's exactly what I meant to say but apparently I didn't explain in details.
> > > If a subscriber has multiple publications and a table is part of these
> > > publications with different row filters, it should check the publication action
> > > *before* including it in the row filter list. It means that an UPDATE operation
> > > cannot apply a row filter that is part of a publication that has only INSERT as
> > > an action. Having said that we cannot always combine multiple row filter
> > > expressions into one. Instead, it should cache individual row filter expression
> > > and apply the OR during the row filter execution (as I did in the initial
> > > patches before this caching stuff). The other idea is to have multiple caches
> > > for each action.  The main disadvantage of this approach is to create 4x
> > > entries.
> > >
> > > I'm experimenting the first approach that stores multiple row filters and its
> > > publication action right now.
> > >
> >
> > We can try that way but I think we should still be able to combine in
> > many cases like where all the operations are specified for
> > publications having the table or maybe pubactions are same. So, we
> > should not give up on those cases. We can do this new logic only when
> > we find that pubactions are different and probably store them as
> > independent expressions and corresponding pubactions for it at the
> > current location in the v42* patch (in pgoutput_row_filter). It is
> > okay to combine them at a later stage during execution when we can't
> > do it at the time of forming cache entry.
>
> What about the initial table sync? during that, we are going to
> combine all the filters or we are going to apply only the insert
> filters?
>

AFAIK, currently, initial table sync doesn't respect publication
actions so it should combine all the filters. What do you think?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Tue, Nov 30, 2021 at 3:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > We can try that way but I think we should still be able to combine in
> > > many cases like where all the operations are specified for
> > > publications having the table or maybe pubactions are same. So, we
> > > should not give up on those cases. We can do this new logic only when
> > > we find that pubactions are different and probably store them as
> > > independent expressions and corresponding pubactions for it at the
> > > current location in the v42* patch (in pgoutput_row_filter). It is
> > > okay to combine them at a later stage during execution when we can't
> > > do it at the time of forming cache entry.
> >
> > What about the initial table sync? during that, we are going to
> > combine all the filters or we are going to apply only the insert
> > filters?
> >
>
> AFAIK, currently, initial table sync doesn't respect publication
> actions so it should combine all the filters. What do you think?

Yeah, I have the same opinion.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
vignesh C
Date:
On Tue, Nov 30, 2021 at 12:33 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Thu, Nov 25, 2021 at 2:22 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Thanks for all the review comments so far! We are endeavouring to keep
> > pace with them.
> >
> > All feedback is being tracked and we will fix and/or reply to everything ASAP.
> >
> > Meanwhile, PSA the latest set of v42* patches.
> >
> > This version was mostly a patch restructuring exercise but it also
> > addresses some minor review comments in passing.
> >
>
> Addressed more review comments, in the attached patch-set v43. 5
> patches carried forward from v42.
> This patch-set contains the following fixes:
>
> On Tue, Nov 23, 2021 at 1:28 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > in pgoutput_row_filter, we are dropping the slots if there are some
> > old slots in the RelationSyncEntry.  But then I noticed that in
> > rel_sync_cache_relation_cb(), also we are doing that but only for the
> > scantuple slot.  So IMHO, rel_sync_cache_relation_cb(), is only place
> > setting entry->rowfilter_valid to false; so why not drop all the slot
> > that time only and in pgoutput_row_filter(), you can just put an
> > assert?
> >
>
> Moved all the dropping of slots to rel_sync_cache_relation_cb()
>
> > +static bool
> > +pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot,
> > RelationSyncEntry *entry)
> > +{
> > +    EState       *estate;
> > +    ExprContext *ecxt;
> >
> >
> > pgoutput_row_filter_virtual and pgoutput_row_filter are exactly same
> > except, ExecStoreHeapTuple(), so why not just put one check based on
> > whether a slot is passed or not, instead of making complete duplicate
> > copy of the function.
>
> Removed pgoutput_row_filter_virtual
>
> >          oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> >          tupdesc = CreateTupleDescCopy(tupdesc);
> >          entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
> >
> > Why do we need to copy the tupledesc? do we think that we need to have
> > this slot even if we close the relation, if so can you add the
> > comments explaining why we are making a copy here.
>
> This code has been modified, and comments added.
>
> On Tue, Nov 23, 2021 at 8:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > One more thing related to this code:
> > pgoutput_row_filter()
> > {
> > ..
> > + if (!entry->rowfilter_valid)
> > {
> > ..
> > + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > + tupdesc = CreateTupleDescCopy(tupdesc);
> > + entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
> > + MemoryContextSwitchTo(oldctx);
> > ..
> > }
> >
> > Why do we need to initialize scantuple here unless we are sure that
> > the row filter is going to get associated with this relentry? I think
> > when there is no row filter then this allocation is not required.
> >
>
> Modified as suggested.
>
> On Tue, Nov 23, 2021 at 10:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > In 0003 patch, why is below change required?
> > --- a/src/backend/replication/pgoutput/pgoutput.c
> > +++ b/src/backend/replication/pgoutput/pgoutput.c
> > @@ -1,4 +1,4 @@
> > -/*-------------------------------------------------------------------------
> > +/*------------------------------------------------------------------------
> >   *
> >   * pgoutput.c
> >
>
> Removed.
>
> >
> > After above, rearrange the code in pgoutput_row_filter(), so that two
> > different checks related to 'rfisnull'  (introduced by different
> > patches) can be combined as if .. else check.
> >
> Fixed.
>
> On Thu, Nov 25, 2021 at 12:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > + * If the new relation or the old relation has a where clause,
> > + * we need to remove it so that it can be added afresh later.
> > + */
> > + if (RelationGetRelid(newpubrel->relation) == oldrelid &&
> > + newpubrel->whereClause == NULL && rfisnull)
> >
> > Can't we use _equalPublicationTable() here? It compares the whereClause as well.
> >
>
> Tried this, can't do this because one is an alter statement while the
> other is a publication, the whereclause is not
> the same Nodetype. In the statement, the whereclause is T_A_Expr,
> while in the publication
> catalog, it is T_OpExpr.
>
> >   /* Must be owner of the table or superuser. */
> > - if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId()))
> > + if (!pg_class_ownercheck(relid, GetUserId()))
> >
> > Here, you can directly use RelationGetRelid as was used in the
> > previous code without using an additional variable.
> >
>
> Fixed.
>
> > 2.
> > +typedef struct {
> > + Relation rel;
> > + bool check_replident;
> > + Bitmapset  *bms_replident;
> > +}
> > +rf_context;
> >
> > Add rf_context in the same line where } ends.
>
> Code has been modified, this comment no longer applies.
>
> > 4.
> > + * Rules: Node-type validation
> > + * ---------------------------
> > + * Allow only simple or compound expressions like:
> > + * - "(Var Op Const)" or
> >
> > It seems Var Op Var is allowed. I tried below and it works:
> > create publication pub for table t1 where (c1 < c2) WITH (publish = 'insert');
> >
> > I think it should be okay to allow it provided we ensure that we never
> > access some other table/view etc. as part of the expression. Also, we
> > should document the behavior correctly.
>
> Fixed.
>
> On Wed, Nov 24, 2021 at 8:52 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > 4) This should be included in typedefs.list, also we could add some
> > comments for this structure
> > +typedef struct {
> > +       Relation        rel;
> > +       Bitmapset  *bms_replident;
> > +}
> > +rf_context;
>
> this has been removed in last patch, so comment no longer applies
>
> > 5) Few includes are not required. #include "miscadmin.h" not required
> > in pg_publication.c, #include "executor/executor.h" not required in
> > proto.c, #include "access/xact.h", #include "executor/executor.h" and
> > #include "replication/logicalrelation.h" not required in pgoutput.c
> >
>
> Optimized this. removed "executor/executor.h" from patch 0003, removed
> "access/xact.h" from patch 0001
> removed "replication/logicalrelation.h” from 0001. Others required.
>
> > 6) typo "filte" should be "filter":
> > +/*
> > + * The row filte walker checks that the row filter expression is legal.
> > + *
> > + * Rules: Node-type validation
> > + * ---------------------------
> > + * Allow only simple or compound expressions like:
> > + * - "(Var Op Const)" or
> > + * - "(Var Op Const) Bool (Var Op Const)"
>
> Fixed.

Thanks for the updated patch, a few comments:
1) Should this be changed to also mention that non-IMMUTABLE system
functions are not allowed:
+   not-null constraints in the <literal>WHERE</literal> clause. The
+   <literal>WHERE</literal> clause does not allow functions or user-defined
+   operators.
+  </para>

2) We can remove the #if 0 code if we don't plan to keep it in the final patch.
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -552,11 +552,12 @@ check_agglevels_and_constraints(ParseState
*pstate, Node *expr)

                        break;
                case EXPR_KIND_PUBLICATION_WHERE:
+#if 0
                        if (isAgg)
                                err = _("aggregate functions are not
allowed in publication WHERE expressions");
                        else
                                err = _("grouping operations are not
allowed in publication WHERE expressions");
-
+#endif

3) After creating the publication, can a user remove the row filter without
removing the table from the publication, or should the user drop the table
and re-add it in this case?

4) Should this be changed, since we error out if a publisher without a
replica identity performs a delete or update:
+   The <literal>WHERE</literal> clause must contain only columns that are
+   covered  by <literal>REPLICA IDENTITY</literal>, or are part of the primary
+   key (when <literal>REPLICA IDENTITY</literal> is not set), otherwise
+   <command>DELETE</command> or <command>UPDATE</command> operations will not
+   be replicated. That's because old row is used and it only contains primary
+   key or columns that are part of the <literal>REPLICA IDENTITY</literal>; the
+   remaining columns are <literal>NULL</literal>. For <command>INSERT</command>

to:
+   The <literal>WHERE</literal> clause must contain only columns that are
+   covered  by <literal>REPLICA IDENTITY</literal>, or are part of the primary
+   key (when <literal>REPLICA IDENTITY</literal> is not set), otherwise
+   <command>DELETE</command> or <command>UPDATE</command> operations will be
+   disallowed on those tables. That's because old row is used and it
only contains primary
+   key or columns that are part of the <literal>REPLICA IDENTITY</literal>; the
+   remaining columns are <literal>NULL</literal>. For <command>INSERT</command>

Regards,
Vignesh



Re: row filtering for logical replication

From
vignesh C
Date:
On Tue, Nov 30, 2021 at 12:33 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Thu, Nov 25, 2021 at 2:22 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Thanks for all the review comments so far! We are endeavouring to keep
> > pace with them.
> >
> > All feedback is being tracked and we will fix and/or reply to everything ASAP.
> >
> > Meanwhile, PSA the latest set of v42* patches.
> >
> > This version was mostly a patch restructuring exercise but it also
> > addresses some minor review comments in passing.
> >
>
> Addressed more review comments, in the attached patch-set v43. 5
> patches carried forward from v42.
> This patch-set contains the following fixes:
>
> On Tue, Nov 23, 2021 at 1:28 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > in pgoutput_row_filter, we are dropping the slots if there are some
> > old slots in the RelationSyncEntry.  But then I noticed that in
> > rel_sync_cache_relation_cb(), also we are doing that but only for the
> > scantuple slot.  So IMHO, rel_sync_cache_relation_cb(), is only place
> > setting entry->rowfilter_valid to false; so why not drop all the slot
> > that time only and in pgoutput_row_filter(), you can just put an
> > assert?
> >
>
> Moved all the dropping of slots to rel_sync_cache_relation_cb()
>
> > +static bool
> > +pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot,
> > RelationSyncEntry *entry)
> > +{
> > +    EState       *estate;
> > +    ExprContext *ecxt;
> >
> >
> > pgoutput_row_filter_virtual and pgoutput_row_filter are exactly same
> > except, ExecStoreHeapTuple(), so why not just put one check based on
> > whether a slot is passed or not, instead of making complete duplicate
> > copy of the function.
>
> Removed pgoutput_row_filter_virtual
>
> >          oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> >          tupdesc = CreateTupleDescCopy(tupdesc);
> >          entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
> >
> > Why do we need to copy the tupledesc? do we think that we need to have
> > this slot even if we close the relation, if so can you add the
> > comments explaining why we are making a copy here.
>
> This code has been modified, and comments added.
>
> On Tue, Nov 23, 2021 at 8:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > One more thing related to this code:
> > pgoutput_row_filter()
> > {
> > ..
> > + if (!entry->rowfilter_valid)
> > {
> > ..
> > + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > + tupdesc = CreateTupleDescCopy(tupdesc);
> > + entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
> > + MemoryContextSwitchTo(oldctx);
> > ..
> > }
> >
> > Why do we need to initialize scantuple here unless we are sure that
> > the row filter is going to get associated with this relentry? I think
> > when there is no row filter then this allocation is not required.
> >
>
> Modified as suggested.
>
> On Tue, Nov 23, 2021 at 10:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > In 0003 patch, why is below change required?
> > --- a/src/backend/replication/pgoutput/pgoutput.c
> > +++ b/src/backend/replication/pgoutput/pgoutput.c
> > @@ -1,4 +1,4 @@
> > -/*-------------------------------------------------------------------------
> > +/*------------------------------------------------------------------------
> >   *
> >   * pgoutput.c
> >
>
> Removed.
>
> >
> > After above, rearrange the code in pgoutput_row_filter(), so that two
> > different checks related to 'rfisnull'  (introduced by different
> > patches) can be combined as if .. else check.
> >
> Fixed.
>
> On Thu, Nov 25, 2021 at 12:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > + * If the new relation or the old relation has a where clause,
> > + * we need to remove it so that it can be added afresh later.
> > + */
> > + if (RelationGetRelid(newpubrel->relation) == oldrelid &&
> > + newpubrel->whereClause == NULL && rfisnull)
> >
> > Can't we use _equalPublicationTable() here? It compares the whereClause as well.
> >
>
> Tried this, can't do this because one is an alter statement while the
> other is a publication, the whereclause is not
> the same Nodetype. In the statement, the whereclause is T_A_Expr,
> while in the publication
> catalog, it is T_OpExpr.

Here we will not be able to do a direct comparison as we store the
transformed where clause in the pg_publication_rel table. We will have
to transform the where clause and then check. I have attached a patch
where we can check the transformed where clause and see if the where
clause is the same or not. If you are ok with this approach you could
make similar changes.

Regards,
Vignesh

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Nov 30, 2021 at 9:34 PM vignesh C <vignesh21@gmail.com> wrote:
>
> 3) Can a user remove the row filter without removing the table from
> the publication after creating the publication or should the user drop
> the table and add the table in this case?
>

AFAIK, to remove an existing filter, use ALTER PUBLICATION ... SET TABLE
without specifying any filter.
For example,

test_pub=# create table t1(a int primary key);
CREATE TABLE
test_pub=# create publication p1 for table t1 where (a > 1);
CREATE PUBLICATION
test_pub=# create publication p2 for table t1 where (a > 2);
CREATE PUBLICATION
test_pub=# \d+ t1
                                           Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | integer |           | not null |         | plain   |
    |              |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (a)
Publications:
    "p1" WHERE ((a > 1))
    "p2" WHERE ((a > 2))
Access method: heap

test_pub=# alter publication p1 set table t1;
ALTER PUBLICATION
test_pub=# \d+ t1
                                           Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | integer |           | not null |         | plain   |
    |              |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (a)
Publications:
    "p1"
    "p2" WHERE ((a > 2))
Access method: heap

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Tue, Nov 30, 2021, at 7:25 AM, Amit Kapila wrote:
On Tue, Nov 30, 2021 at 11:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> What about the initial table sync? during that, we are going to
> combine all the filters or we are going to apply only the insert
> filters?
>

AFAIK, currently, initial table sync doesn't respect publication
actions so it should combine all the filters. What do you think?
I agree. If you think that a row might be needed to apply DML commands (UPDATE,
DELETE) in the future, or that due to a row filter that row should be available
on the subscriber (the INSERT-only case), it makes sense to send all rows that
satisfy any row filter.

The current code already works this way. All row filters are combined into a
WHERE clause using OR. If any of the publications doesn't have a row filter,
there is no WHERE clause.
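
For example, with the tbl1 publications discussed upthread, the initial sync
would conceptually copy something like the following (a sketch of the
effective query, not the literal text that tablesync builds):

COPY (SELECT a, b FROM tbl1 WHERE (b < 2) OR (a > 1)) TO STDOUT;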


--
Euler Taveira

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Dec 1, 2021 at 6:55 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Tue, Nov 30, 2021, at 7:25 AM, Amit Kapila wrote:
>
> On Tue, Nov 30, 2021 at 11:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > What about the initial table sync? during that, we are going to
> > combine all the filters or we are going to apply only the insert
> > filters?
> >
>
> AFAIK, currently, initial table sync doesn't respect publication
> actions so it should combine all the filters. What do you think?
>
> I agree. If you think that it might need a row to apply DML commands (UPDATE,
> DELETE) in the future or that due to a row filter that row should be available
> in the subscriber (INSERT-only case), it makes sense to send all rows that
> satisfies any row filter.
>

Right and Good point.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
PSA the v44* set of patches.

The following review comments are addressed:

v44-0001 main patch
- Renamed the TAP test 026->027 due to clash caused by recent commit [1]
- Refactored table_close [Houz 23/11] #2
- Alter compare where clauses [Amit 24/11] #0
- PG docs CREATE SUBSCRIPTION [Tang 30/11] #2
- PG docs CREATE PUBLICATION [Vignesh 30/11] #1, #4, [Tang 30/11] #1,
[Tomas 23/9] #2

v44-0002 validation walker
- Add NullTest support [Peter 18/11]
- Update comments [Amit 24/11] #3
- Disallow user-defined types [Amit 24/11] #4
- Errmsg - skipped because handled by top-up [Vignesh 23/11] #2
- Removed #if 0 [Vignesh 30/11] #2

v44-0003 new/old tuple
- NA

v44-0004 tab-complete and pgdump
- Handle table-list commas better [Vignesh 23/11] #2

v44-0005 top-up patch for validation
- (This patch will be added again later)

------
[1] https://github.com/postgres/postgres/commit/8d74fc96db5fd547e077bf9bf4c3b67f821d71cd
[Tomas 23/9] https://www.postgresql.org/message-id/574b4e78-2f35-acf3-4bdc-4b872582e739%40enterprisedb.com
[Peter 18/11]
https://www.postgresql.org/message-id/flat/CAFPTHDa67_H%3DsALy%2BEqXDGmUKm1MO-83apffZkO34RELjt_Prg%40mail.gmail.com#e5fb0d17564d7ffb11a64858598f5185
[Houz 23/11]
https://www.postgresql.org/message-id/OS0PR01MB57162EB465A0E6BCFDF9B3F394609%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[Vignesh 23/11]
https://www.postgresql.org/message-id/CALDaNm2bq-Zab3i5pvuA3UTxHvo3BqPwmgXbyznpw5vz4%3DfxpA%40mail.gmail.com
[Amit 24/11]
https://www.postgresql.org/message-id/CAA4eK1%2BXd%3DkM5D3jtXyN%2BW7J%2BwU-yyQAdyq66a6Wcq_PKRTbSw%40mail.gmail.com
[Tang 30/11]
https://www.postgresql.org/message-id/OS0PR01MB6113F2E024961A9C7F36BEADFB679%40OS0PR01MB6113.jpnprd01.prod.outlook.com
[Vignesh 30/11]
https://www.postgresql.org/message-id/CALDaNm2T3yXJkuKXARUUh%2B%3D_36Ry7gYxUqhpgW8AxECug9nH6Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Sep 23, 2021 at 10:33 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> 2) create_publication.sgml says:
>
>     A <literal>NULL</literal> value causes the expression to evaluate
>     to false; avoid using columns without not-null constraints in the
>     <literal>WHERE</literal> clause.
>
> That's not quite correct, I think - doesn't the expression evaluate to
> NULL (which is not TRUE, so it counts as mismatch)?
>
> I suspect this whole paragraph (talking about NULL in old/new rows)
> might be a bit too detailed / low-level for user docs.
>

Updated docs in v44 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtjxzedJPbSZyb9pd72%2BUrGEj6HagQQbCdO0YJvr7OyJg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Nov 23, 2021 at 5:27 PM vignesh C <vignesh21@gmail.com> wrote:
>

> 2) Since the error message is because it publishes delete/update
> operations, it should include publish delete/update in the error
> message. Can we change the error message:
> +               if (!bms_is_member(attnum -
> FirstLowInvalidHeapAttributeNumber, context->bms_replident))
> +               {
> +                       const char *colname = get_attname(relid, attnum, false);
> +
> +                       ereport(ERROR,
> +
> (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> +                                       errmsg("cannot add relation
> \"%s\" to publication",
> +
> RelationGetRelationName(context->rel)),
> +                                       errdetail("Row filter column
> \"%s\" is not part of the REPLICA IDENTITY",
> +                                                         colname)));
> +               }
>
> To something like:
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> errmsg("cannot add relation \"%s\" to publication because row filter
> column \"%s\" does not have a replica identity and publishes
> deletes/updates",
>    RelationGetRelationName(context->rel), colname),
> errhint("To enable deleting/updating from the table, set REPLICA
> IDENTITY using ALTER TABLE")));
>

The "top-up" patch 0005 (see v43*) is already addressing this now.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Nov 23, 2021 at 6:59 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Tues, Nov 23, 2021 2:27 PM vignesh C <vignesh21@gmail.com> wrote:
> > On Thu, Nov 18, 2021 at 7:04 AM Peter Smith <smithpb2250@gmail.com>
> > wrote:
> > >
> > > PSA new set of v40* patches.
> >
> > Few comments:
...
> Another comment about v40-0001 patch:
>
>
> +                       char *relname = pstrdup(RelationGetRelationName(rel));
> +
>                         table_close(rel, ShareUpdateExclusiveLock);
> +
> +                       /* Disallow duplicate tables if there are any with row-filters. */
> +                       if (t->whereClause || list_member_oid(relids_with_rf, myrelid))
> +                               ereport(ERROR,
> +                                               (errcode(ERRCODE_DUPLICATE_OBJECT),
> +                                                errmsg("conflicting or redundant row-filters for \"%s\"",
> +                                                               relname)));
> +                       pfree(relname);
>
> Maybe we can do the error check before table_close(), so that we don't need to
> invoke pstrdup() and pfree().
>

Fixed in v44 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtjxzedJPbSZyb9pd72%2BUrGEj6HagQQbCdO0YJvr7OyJg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Wed, Nov 24, 2021 at 3:37 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Attaching a new patchset v41 which includes changes by both Peter and myself.
>
> Few comments on v41-0002 patch:
...
> 2) Tab completion completes with "WHERE (" in case of "alter
> publication pub1 add table t1,":
> +       /* ALTER PUBLICATION <name> SET TABLE <name> */
> +       /* ALTER PUBLICATION <name> ADD TABLE <name> */
> +       else if (Matches("ALTER", "PUBLICATION", MatchAny, "SET|ADD",
> "TABLE", MatchAny))
> +               COMPLETE_WITH("WHERE (");
>
> Should this be changed to:
> +       /* ALTER PUBLICATION <name> SET TABLE <name> */
> +       /* ALTER PUBLICATION <name> ADD TABLE <name> */
> +       else if (Matches("ALTER", "PUBLICATION", MatchAny, "SET|ADD",
> "TABLE", MatchAny) && (!ends_with(prev_wd, ','))
> +               COMPLETE_WITH("WHERE (");
>

Fixed in v44 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtjxzedJPbSZyb9pd72%2BUrGEj6HagQQbCdO0YJvr7OyJg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Nov 25, 2021 at 12:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Attaching a new patchset v41 which includes changes by both Peter and myself.
> >
> > Patches v40-0005 and v40-0006 have been merged to create patch
> > v41-0005 which reduces the patches to 6 again.
> > This patch-set contains changes addressing the following review comments:
> >
> > On Mon, Nov 15, 2021 at 5:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > What I meant was that with this new code we have regressed the old
> > > behavior. Basically, imagine a case where no filter was given for any
> > > of the tables. Then after the patch, we will remove all the old tables
> > > whereas before the patch it will remove the oldrels only when they are
> > > not specified as part of new rels. If you agree with this, then we can
> > > retain the old behavior and for the new tables, we can always override
> > > the where clause for a SET variant of command.
> >
> > Fixed and modified the behaviour to match with what the schema patch
> > implemented.
> >
>
> +
> + /*
> + * If the new relation or the old relation has a where clause,
> + * we need to remove it so that it can be added afresh later.
> + */
> + if (RelationGetRelid(newpubrel->relation) == oldrelid &&
> + newpubrel->whereClause == NULL && rfisnull)
>
> Can't we use _equalPublicationTable() here? It compares the whereClause as well.
>

Fixed in v44 [1]

> Few more comments:
> =================
> 0001
...

.
> 3. In the function header comment of rowfilter_walker, you mentioned
> the simple expressions allowed but we should write why we are doing
> so. It has been discussed in detail in various emails in this thread.
> AFAIR, below are the reasons:
> A. We don't want to allow user-defined functions or operators because
> (a) if the user drops such a function/operator or if there is any
> other error via that function, the walsender won't be able to recover
> from such an error even if we fix the function's problem because it
> uses a historic snapshot to access row-filter; (b) any other table
> could be accessed via a function which won't work because of historic
> snapshots in logical decoding environment.
>
> B. We don't allow anything other immutable built-in functions as those
> can access database and would lead to the problem (b) mentioned in the
> previous paragraph.
>

Updated comment in v44 [1]

> Don't we need to check for user-defined types similar to user-defined
> functions and operators? If not why?

Fixed in v44 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtjxzedJPbSZyb9pd72%2BUrGEj6HagQQbCdO0YJvr7OyJg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Nov 30, 2021 at 2:49 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> On Thursday, November 25, 2021 11:22 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Thanks for all the review comments so far! We are endeavouring to keep
> > pace with them.
> >
> > All feedback is being tracked and we will fix and/or reply to everything ASAP.
> >
> > Meanwhile, PSA the latest set of v42* patches.
> >
> > This version was mostly a patch restructuring exercise but it also
> > addresses some minor review comments in passing.
> >
>
> Thanks for your patch.
> I have two comments on the document in 0001 patch.
>
> 1.
> +   New row is used and it contains all columns. A <literal>NULL</literal> value
> +   causes the expression to evaluate to false; avoid using columns without
>
> I don't quite understand this sentence 'A NULL value causes the expression to evaluate to false'.
> The expression contains NULL value can also return true. Could you be more specific?
>
> For example:
>
> postgres=# select null or true;
>  ?column?
> ----------
>  t
> (1 row)
>

Updated publication docs in v44 [1].

>
> 2.
> +   at all then all other filters become redundant. If the subscriber is a
> +   <productname>PostgreSQL</productname> version before 15 then any row filtering
> +   is ignored.
>
> If the subscriber is a PostgreSQL version before 15, it seems row filtering will
> be ignored only when copying initial data, the later changes will not be ignored in row
> filtering. Should we make it clear in document?

Updated subscription docs in v44 [1].

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtjxzedJPbSZyb9pd72%2BUrGEj6HagQQbCdO0YJvr7OyJg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Nov 30, 2021 at 9:34 PM vignesh C <vignesh21@gmail.com> wrote:
>
...
> Thanks for the updated patch, few comments:
> 1) Should this be changed to include non IMMUTABLE system functions
> are not allowed:
> +   not-null constraints in the <literal>WHERE</literal> clause. The
> +   <literal>WHERE</literal> clause does not allow functions or user-defined
> +   operators.
> +  </para>
>

Updated docs in v44 [1]

> 2) We can remove the #if 0 code if we don't plan to keep it in the final patch.
> --- a/src/backend/parser/parse_agg.c
> +++ b/src/backend/parser/parse_agg.c
> @@ -552,11 +552,12 @@ check_agglevels_and_constraints(ParseState
> *pstate, Node *expr)
>
>                         break;
>                 case EXPR_KIND_PUBLICATION_WHERE:
> +#if 0
>                         if (isAgg)
>                                 err = _("aggregate functions are not
> allowed in publication WHERE expressions");
>                         else
>                                 err = _("grouping operations are not
> allowed in publication WHERE expressions");
> -
> +#endif
>

Fixed in v44 [1]

> 4) Should this be changed, since we error out if publisher without
> replica identify performs delete or update:
> +   The <literal>WHERE</literal> clause must contain only columns that are
> +   covered  by <literal>REPLICA IDENTITY</literal>, or are part of the primary
> +   key (when <literal>REPLICA IDENTITY</literal> is not set), otherwise
> +   <command>DELETE</command> or <command>UPDATE</command> operations will not
> +   be replicated. That's because old row is used and it only contains primary
> +   key or columns that are part of the <literal>REPLICA IDENTITY</literal>; the
> +   remaining columns are <literal>NULL</literal>. For <command>INSERT</command>
>
> to:
> +   The <literal>WHERE</literal> clause must contain only columns that are
> +   covered  by <literal>REPLICA IDENTITY</literal>, or are part of the primary
> +   key (when <literal>REPLICA IDENTITY</literal> is not set), otherwise
> +   <command>DELETE</command> or <command>UPDATE</command> operations will be
> +   disallowed on those tables. That's because old row is used and it
> only contains primary
> +   key or columns that are part of the <literal>REPLICA IDENTITY</literal>; the
> +   remaining columns are <literal>NULL</literal>. For <command>INSERT</command>
>

Updated docs in v44 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtjxzedJPbSZyb9pd72%2BUrGEj6HagQQbCdO0YJvr7OyJg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Thursday, December 2, 2021 5:21 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> PSA the v44* set of patches.
> 

Thanks for the new patch. A few comments:

1. This is an example in the publication doc, but in fact it's not allowed. Should we
change this example?

+CREATE PUBLICATION active_departments FOR TABLE departments WHERE (active IS TRUE);

postgres=# CREATE PUBLICATION active_departments FOR TABLE departments WHERE (active IS TRUE);
ERROR:  invalid publication WHERE expression for relation "departments"
HINT:  only simple expressions using columns, constants and immutable system functions are allowed

2. A typo in 0002 patch.

+ * drops such a user-defnition or if there is any other error via its function,

"user-defnition" should be "user-definition".

Regards,
Tang

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Thur, Dec 2, 2021 5:21 AM Peter Smith <smithpb2250@gmail.com> wrote:
> PSA the v44* set of patches.
> 
> The following review comments are addressed:
> 
> v44-0001 main patch
> - Renamed the TAP test 026->027 due to clash caused by recent commit [1]
> - Refactored table_close [Houz 23/11] #2
> - Alter compare where clauses [Amit 24/11] #0
> - PG docs CREATE SUBSCRIPTION [Tang 30/11] #2
> - PG docs CREATE PUBLICATION [Vignesh 30/11] #1, #4, [Tang 30/11] #1, [Tomas
> 23/9] #2
> 
> v44-0002 validation walker
> - Add NullTest support [Peter 18/11]
> - Update comments [Amit 24/11] #3
> - Disallow user-defined types [Amit 24/11] #4
> - Errmsg - skipped because handled by top-up [Vignesh 23/11] #2
> - Removed #if 0 [Vignesh 30/11] #2
> 
> v44-0003 new/old tuple
> - NA
> 
> v44-0004 tab-complete and pgdump
> - Handle table-list commas better [Vignesh 23/11] #2
> 
> v44-0005 top-up patch for validation
> - (This patch will be added again later)

Attached is the v44-0005 top-up patch.
This version addresses all the comments received so far,
mainly including the following changes:
1) rename rfcol_valid_for_replica to rfcol_valid
2) Remove the struct PublicationInfo and add the rfcol_valid flag directly in relation
3) report the invalid column number in the error message.
4) Rename some function to match the usage.
5) Fix some typos and add some code comments.
6) Fix an omission in a testcase.

Best regards,
Hou zj




Attachments

Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Wed, Dec 1, 2021 at 3:27 AM vignesh C <vignesh21@gmail.com> wrote:
>
> Here we will not be able to do a direct comparison as we store the
> transformed where clause in the pg_publication_rel table. We will have
> to transform the where clause and then check. I have attached a patch
> where we can check the transformed where clause and see if the where
> clause is the same or not. If you are ok with this approach you could
> make similar changes.

Thanks for your patch. I have used the same logic with minor changes
and shared it with Peter for v44.

regards,
Ajin Cherian
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Nov 30, 2021 at 3:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 29, 2021 at 8:40 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Mon, Nov 29, 2021, at 7:11 AM, Amit Kapila wrote:
> >
> > I don't think it is a good idea to combine the row-filter from the
> > publication that publishes just 'insert' with the row-filter that
> > publishes 'updates'. We shouldn't apply the 'insert' filter for
> > 'update' and similarly for publication operations. We can combine the
> > filters when the published operations are the same. So, this means
> > that we might need to cache multiple row-filters but I think that is
> > better than having another restriction that publish operation 'insert'
> > should also honor RI columns restriction.
> >
> > That's exactly what I meant to say but apparently I didn't explain in details.
> > If a subscriber has multiple publications and a table is part of these
> > publications with different row filters, it should check the publication action
> > *before* including it in the row filter list. It means that an UPDATE operation
> > cannot apply a row filter that is part of a publication that has only INSERT as
> > an action. Having said that we cannot always combine multiple row filter
> > expressions into one. Instead, it should cache individual row filter expression
> > and apply the OR during the row filter execution (as I did in the initial
> > patches before this caching stuff). The other idea is to have multiple caches
> > for each action.  The main disadvantage of this approach is to create 4x
> > entries.
> >
> > I'm experimenting the first approach that stores multiple row filters and its
> > publication action right now.
> >
>
> We can try that way but I think we should still be able to combine in
> many cases like where all the operations are specified for
> publications having the table or maybe pubactions are same. So, we
> should not give up on those cases. We can do this new logic only when
> we find that pubactions are different and probably store them as
> independent expressions and corresponding pubactions for it at the
> current location in the v42* patch (in pgoutput_row_filter). It is
> okay to combine them at a later stage during execution when we can't
> do it at the time of forming cache entry.
>

PSA a new v44* patch set.

It includes a new patch 0006 which implements the idea above.

The ExprState cache logic is basically the same as before (including
all the OR combining), but there are now 4x ExprState caches, keyed and
separated by the 4x different pubactions.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
vignesh C
Date:
On Thu, Dec 2, 2021 at 9:29 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Thur, Dec 2, 2021 5:21 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > PSA the v44* set of patches.
> >
> > The following review comments are addressed:
> >
> > v44-0001 main patch
> > - Renamed the TAP test 026->027 due to clash caused by recent commit [1]
> > - Refactored table_close [Houz 23/11] #2
> > - Alter compare where clauses [Amit 24/11] #0
> > - PG docs CREATE SUBSCRIPTION [Tang 30/11] #2
> > - PG docs CREATE PUBLICATION [Vignesh 30/11] #1, #4, [Tang 30/11] #1, [Tomas
> > 23/9] #2
> >
> > v44-0002 validation walker
> > - Add NullTest support [Peter 18/11]
> > - Update comments [Amit 24/11] #3
> > - Disallow user-defined types [Amit 24/11] #4
> > - Errmsg - skipped because handled by top-up [Vignesh 23/11] #2
> > - Removed #if 0 [Vignesh 30/11] #2
> >
> > v44-0003 new/old tuple
> > - NA
> >
> > v44-0004 tab-complete and pgdump
> > - Handle table-list commas better [Vignesh 23/11] #2
> >
> > v44-0005 top-up patch for validation
> > - (This patch will be added again later)
>
> Attach the v44-0005 top-up patch.

Thanks for the updated patch. A few comments:
1) The testpub5a and testpub5c publications are the same; one of them can be removed:
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION testpub5a FOR TABLE testpub_rf_tbl1 WHERE (a > 1)
WITH (publish="insert");
+CREATE PUBLICATION testpub5b FOR TABLE testpub_rf_tbl1;
+CREATE PUBLICATION testpub5c FOR TABLE testpub_rf_tbl1 WHERE (a > 3)
WITH (publish="insert");
+RESET client_min_messages;
+\d+ testpub_rf_tbl1
+DROP PUBLICATION testpub5a, testpub5b, testpub5c;

testpub5b will be covered in the earlier existing case above:
ALTER PUBLICATION testpib_ins_trunct ADD TABLE pub_test.testpub_nopk,
testpub_tbl1;

\d+ pub_test.testpub_nopk
\d+ testpub_tbl1

I felt the test related to testpub5b is also not required.

2) testpub5 and testpub_syntax2 are similar; one of them can be removed:
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION testpub5 FOR TABLE testpub_rf_tbl1,
testpub_rf_tbl2 WHERE (c <> 'test' AND d < 5);
+RESET client_min_messages;
+\dRp+ testpub5

+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION testpub_syntax2 FOR TABLE testpub_rf_tbl1,
testpub_rf_myschema.testpub_rf_tbl5 WHERE (h < 999);
+RESET client_min_messages;
+\dRp+ testpub_syntax2
+DROP PUBLICATION testpub_syntax2;

3) testpub7 can be renamed to testpub6 to maintain continuity, since the
previous testpub6 did not succeed:
+CREATE OPERATOR =#> (PROCEDURE = testpub_rf_func, LEFTARG = integer,
RIGHTARG = integer);
+CREATE PUBLICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27);
+-- fail - WHERE not allowed in DROP
+ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl3 WHERE (e < 27);
+-- fail - cannot ALTER SET table which is a member of a pre-existing schema
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION testpub7 FOR ALL TABLES IN SCHEMA testpub_rf_myschema1;
+ALTER PUBLICATION testpub7 SET ALL TABLES IN SCHEMA
testpub_rf_myschema1, TABLE testpub_rf_myschema1.testpub_rf_tb16;
+RESET client_min_messages;

4) Did this test intend to include a WHERE clause for testpub_rf_tb16? If
so, it can be added:
+-- fail - cannot ALTER SET table which is a member of a pre-existing schema
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION testpub7 FOR ALL TABLES IN SCHEMA testpub_rf_myschema1;
+ALTER PUBLICATION testpub7 SET ALL TABLES IN SCHEMA
testpub_rf_myschema1, TABLE testpub_rf_myschema1.testpub_rf_tb16;
+RESET client_min_messages;

5) It should be removed from typedefs.list too:
-/* For rowfilter_walker. */
-typedef struct {
-       Relation        rel;
-       bool            check_replident; /* check if Var is
bms_replident member? */
-       Bitmapset  *bms_replident;
-} rf_context;
-

Regards,
Vignesh



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Thu, Dec 2, 2021, at 4:18 AM, Peter Smith wrote:
PSA a new v44* patch set.

It includes a new patch 0006 which implements the idea above.

ExprState cache logic is basically all the same as before (including
all the OR combining), but there are now 4x ExprState caches keyed and
separated by the 4x different pubactions.
The row filter is not applied for TRUNCATE, so it is just 3 operations.


--
Euler Taveira

Re: row filtering for logical replication

From
Peter Smith
Date:
On Fri, Dec 3, 2021 at 12:59 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Dec 2, 2021, at 4:18 AM, Peter Smith wrote:
>
> PSA a new v44* patch set.
>
> It includes a new patch 0006 which implements the idea above.
>
> ExprState cache logic is basically all the same as before (including
> all the OR combining), but there are now 4x ExprState caches keyed and
> separated by the 4x different pubactions.
>
> row filter is not applied for TRUNCATEs so it is just 3 operations.
>

Correct. The patch 0006 comment/code will be updated for this point in
the next version posted.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Dec 2, 2021 at 2:32 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> On Thursday, December 2, 2021 5:21 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > PSA the v44* set of patches.
> >
>
> Thanks for the new patch. Few comments:
>
> 1. This is an example in publication doc, but in fact it's not allowed. Should we
> change this example?
>
> +CREATE PUBLICATION active_departments FOR TABLE departments WHERE (active IS TRUE);
>
> postgres=# CREATE PUBLICATION active_departments FOR TABLE departments WHERE (active IS TRUE);
> ERROR:  invalid publication WHERE expression for relation "departments"
> HINT:  only simple expressions using columns, constants and immutable system functions are allowed
>

Thanks for finding this. Actually, the documentation looks correct to
me. The problem was that the validation walker of patch 0002 was being
overly restrictive: it needed to also allow a BooleanTest node.

Now it works (locally) for me. For example:

test_pub=# create table departments(depno int primary key, active boolean);
CREATE TABLE
test_pub=# create publication pdept for table departments where
(active is true) with (publish="insert");
CREATE PUBLICATION
test_pub=# create publication pdept2 for table departments where
(active is false) with (publish="insert");
CREATE PUBLICATION

This fix will be available in v45*.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Thu, Dec 2, 2021 at 6:18 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> PSA a new v44* patch set.
>

Some initial comments:

0001

src/backend/replication/logical/tablesync.c
(1) In the fetch_remote_table_info update, "list_free(*qual);" should be
"list_free_deep(*qual);".

doc/src/sgml/ref/create_subscription.sgml
(2) Refer to Notes

Perhaps a link to the Notes section should be used here, as follows:

-          copied. Refer to the Notes section below.
+          copied. Refer to the <xref
linkend="sql-createsubscription-notes"/> section below.

- <refsect1>
+ <refsect1 id="sql-createsubscription-notes" xreflabel="Notes">


0002

1) Typo in patch comment
"Specifially"

src/backend/catalog/pg_publication.c
2) bms_replident comment
Member "Bitmapset  *bms_replident;" in rf_context should have a
comment, maybe something like "set of replica identity col indexes".

3) errdetail message
In rowfilter_walker(), the "forbidden" errdetail message is loaded
using gettext() in one instance, but just a raw formatted string in
other cases. Shouldn't they all consistently be translated strings?


0003

src/backend/replication/logical/proto.c
1) logicalrep_write_tuple

(i)
if (slot == NULL || TTS_EMPTY(slot))
can be replaced with:
if (TupIsNull(slot))

(ii) In the above case (where values and nulls are palloc'd),
shouldn't the values and nulls be pfree()d at the end of the function?


0005

src/backend/utils/cache/relcache.c
(1) RelationGetInvalRowFilterCol
Shouldn't "rfnode" be pfree()d after use?


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Thu, Dec 2, 2021, at 4:18 AM, Peter Smith wrote:
PSA a new v44* patch set.
We have been actively developing this feature for some months and have improved
it a lot. It has been good teamwork. This seems a good time to provide a
retrospective on the feature based on the consensus we have reached so far.

The current design has one row filter per publication-table mapping. It allows
flexible choices while using the same table for multiple replication purposes.
The WHERE clause was chosen as the syntax to declare the row filter expression
(enclosed by parentheses).

There was a lot of discussion about which columns are allowed in the row
filter expression. The consensus was that publications that publish UPDATE
and/or DELETE operations should check that the columns in the row filter
expression are part of the replica identity. Otherwise, these DML operations
couldn't be replicated.
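
As a hypothetical illustration of that rule (all names here are invented, and
the exact error wording is not shown):

CREATE TABLE t (a integer PRIMARY KEY, b text);
-- allowed: "a" is part of the replica identity (the primary key)
CREATE PUBLICATION p_ok FOR TABLE t WHERE (a > 100);
-- a filter on "b" (not in the replica identity) should only be acceptable
-- if the publication does not publish UPDATE/DELETE
CREATE PUBLICATION p_ins_only FOR TABLE t WHERE (b <> 'x') WITH (publish="insert");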

We also discussed which expressions would be allowed. We couldn't allow
all kinds of expressions because, given the way the logical decoding
infrastructure was designed, some expressions could break the replication.
Hence, we decided to allow only "simple expressions". By "simple expression",
we mean to restrict (a) user-defined objects (functions, operators, types)
and (b) immutable builtin functions.
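
For instance (hypothetical objects), a filter that calls a user-defined
function would be rejected under this rule, while one using only columns,
constants and allowed builtin functions would be accepted:

CREATE FUNCTION f(integer) RETURNS boolean AS 'SELECT $1 > 0' LANGUAGE sql;
CREATE PUBLICATION p_bad FOR TABLE t WHERE (f(a));   -- rejected: user-defined function
CREATE PUBLICATION p_good FOR TABLE t WHERE (a > 0); -- accepted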

A subscription can subscribe to multiple publications. These publications can
publish the same table. In this case, we have to combine the row filter
expressions to decide whether the row will be replicated or not. The consensus
was to replicate a row if any of the row filters returns true. It means that if
one publication-table mapping does not have a row filter, the row will be
replicated. There is an optimization for this case that provides an empty
expression for this table; hence, the code bails out and replicates the row
without running the row filter code.
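
For example, a hypothetical pair of publications (names invented) that a
single subscription combines with OR:

CREATE PUBLICATION p_low FOR TABLE t WHERE (a < 10);
CREATE PUBLICATION p_high FOR TABLE t WHERE (a > 90);
-- a subscription to both receives rows satisfying (a < 10) OR (a > 90);
-- if either publication had no filter for t, every row of t would be
-- replicated
CREATE SUBSCRIPTION s CONNECTION '...' PUBLICATION p_low, p_high;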

The same logic applies to the initial table synchronization if there are
multiple row filters: copy all rows that satisfy at least one row filter
expression. If the subscriber is a pre-15 version, data synchronization won't
use row filters even if they are defined in the publisher.

If we are dealing with partitioned tables, the publication parameter
publish_via_partition_root determines whether it uses the partition's row
filter (false) or the root partitioned table's row filter (true).
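
A sketch of the partitioned-table case (hypothetical names, assuming the
semantics described above):

CREATE TABLE parent (a integer) PARTITION BY RANGE (a);
CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (100);
CREATE PUBLICATION p FOR TABLE parent WHERE (a > 50)
    WITH (publish_via_partition_root = true);
-- with publish_via_partition_root = true, changes to "child" are published
-- using the root table "parent" and its row filter; with false (the
-- default), each partition's own row filter would be used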

I used the last patch series (v44) posted by Peter Smith [1]. I made a lot of
improvements in this new version (v45). I merged 0001 (it is basically the main
patch I wrote) and 0004 (autocomplete). As I explained in [2], I implemented a
patch (that is incorporated in v45-0001) to fix this issue. I saw that
Peter already proposed a slightly different patch (0006). I read this patch and
concluded that it would be better to keep the version I have. It fixes a few
things and also includes more comments. I attached another patch (v45-0002)
that includes the expression validation. It is based on 0002. I completely
overhauled it. There are additional expressions that were not supported by the
previous version (such as conditional expressions [CASE, COALESCE, NULLIF,
GREATEST, LEAST], array operators, XML operators). I probably didn't finish the
supported node list (there are a few primitive nodes that need to be checked).
However, the current "simple expression" routine seems promising. I plan to
integrate v45-0002 in the next patch version. I attached it here for comparison
purposes only.

My next step is to review 0003. As I said before, I would like to treat it as a
separate feature. I know that it is useful for data consistency but this patch
is already too complex. Having said that, I didn't include it in this patch
series because it doesn't apply cleanly. If Ajin would like to provide a new
version, I would appreciate it.

PS> I will update the commit message in the next version. I barely changed the
documentation to reflect the current behavior. I probably missed some changes
but I will fix them in the next version.




--
Euler Taveira

Attachments

Re: row filtering for logical replication

From
Sascha Kuhl
Date:
This is great work, thanks for the implementation update.


Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
PS> I will update the commit message in the next version. I barely changed the
documentation to reflect the current behavior. I probably missed some changes
but I will fix in the next version.
I realized that I forgot to mention a few things about the UPDATE behavior.
Regardless of 0003, we need to define which tuple will be used to evaluate the
row filter for UPDATEs. We already discussed it circa [1]. The current version
chooses the *new* tuple. Is that the best choice?

Let's check all cases. There are 2 rows on the provider. One row satisfies the
row filter and the other one doesn't. For each case, I expect the initial rows
to be there (no modifications). The DDLs are:

CREATE TABLE foo (a integer, b text, PRIMARY KEY(a));
INSERT INTO foo (a, b) VALUES(10, 'abc'),(30, 'abc');
CREATE PUBLICATION bar FOR TABLE foo WHERE (a > 20);

The tables describe what happens on the subscriber. BEFORE is the current row
on the subscriber. OLD, NEW and OLD & NEW give the action/row when the row
filter is evaluated against the old tuple, the new tuple, or both.

-- case 1: old tuple (10, abc) ; new tuple (10, def)
UPDATE foo SET b = 'def' WHERE a = 10;

+-----------+--------------------+------------------+------------------+
|   BEFORE  |       OLD          |        NEW       |    OLD & NEW     |
+-----------+--------------------+------------------+------------------+
|    NA     |       NA           |       NA         |       NA         |
+-----------+--------------------+------------------+------------------+

If the old and new tuple don't satisfy the row filter, there is no issue.

-- case 2: old tuple (30, abc) ; new tuple (30, def)
UPDATE foo SET b = 'def' WHERE a = 30;

+-----------+--------------------+------------------+------------------+
|   BEFORE  |       OLD          |        NEW       |    OLD & NEW     |
+-----------+--------------------+------------------+------------------+
| (30, abc) | UPDATE (30, def)   | UPDATE (30, def) | UPDATE (30, def) |
+-----------+--------------------+------------------+------------------+

If the old and new tuple satisfy the row filter, there is no issue.

-- case 3: old tuple (30, abc) ; new tuple (10, def)
UPDATE foo SET a = 10, b = 'def' WHERE a = 30;

+-----------+--------------------+------------------+------------------+
|   BEFORE  |       OLD          |        NEW       |    OLD & NEW     |
+-----------+--------------------+------------------+------------------+
| (30, abc) | UPDATE (10, def) * | KEEP (30, abc) * | KEEP (30, abc) * |
+-----------+--------------------+------------------+------------------+

If the old tuple satisfies the row filter but the new tuple doesn't, we have a
data consistency issue. Since the old tuple satisfies the row filter, the
initial table synchronization copies this row. However, after the UPDATE the
new tuple doesn't satisfy the row filter, so from the data consistency
perspective that row should be removed on the subscriber.

The OLD sends the UPDATE because it satisfies the row filter (if it is a
sharding solution this new row should be moved to another node). The new row
would likely not be modified by replication again. That's a data inconsistency
according to the row filter.

The NEW and OLD & NEW don't send the UPDATE because it doesn't satisfy the row
filter. Keeping the old row is undesirable because it doesn't reflect what we
have on the source. This row on the subscriber would likely not be modified by
replication again. If someone inserted a new row with a = 30, replication would
stop because there is already a row with that value.

-- case 4: old tuple (10, abc) ; new tuple (30, def)
UPDATE foo SET a = 30, b = 'def' WHERE a = 10;

+-----------+--------------------+------------------+------------------+
|   BEFORE  |       OLD          |        NEW       |    OLD & NEW     |
+-----------+--------------------+------------------+------------------+
|    NA     |       NA !         |       NA !       |       NA         |
+-----------+--------------------+------------------+------------------+

The OLD and OLD & NEW don't send the UPDATE because it doesn't satisfy the row
filter. The NEW sends the UPDATE because it satisfies the row filter, but there
is no row to modify. The current behavior does nothing; however, it should
INSERT the new tuple. Subsequent UPDATEs or DELETEs have no effect. It could be
a surprise for an application that expects the same data set as the provider.

If we have to choose the default behavior, I would say use the old tuple to
evaluate the row filter. Why? The validation already restricts the columns to
the replica identity, so there isn't an issue with missing (NULL) columns. Case
3 updates the row with a value that is not consistent, but keeping the old row
is worse because it could stop the replication if someone inserted the old key
in a new row on the provider. Case 4 ignores the UPDATE if it cannot find the
tuple, but it could raise an error if there were a strict mode.

Since this change is very simple to revert, this new version contains this
modification. I also improved the documentation and removed extra parentheses
from psql/pg_dump. As I said in the previous email, I merged the validation
patch too.

FWIW, in the previous version I removed code that compares nodes to decide if
it is necessary to remove the publication-relation entry. I had similar code in
an ancient version of this patch but decided that the additional code is not
worth it.

There is at least one issue in the current code that should be addressed: PK or
REPLICA IDENTITY modification could break the publication check for UPDATEs and
DELETEs.




--
Euler Taveira

Attachments

Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
>
> PS> I will update the commit message in the next version. I barely changed the
> documentation to reflect the current behavior. I probably missed some changes
> but I will fix in the next version.
>
> I realized that I forgot to mention a few things about the UPDATE behavior.
> Regardless of 0003, we need to define which tuple will be used to evaluate the
> row filter for UPDATEs. We already discussed it circa [1]. This current version
> chooses *new* tuple. Is it the best choice?

But with 0003, we are using both tuples for evaluating the row
filter, so instead of fixing 0001, why don't we just merge 0003 with
0001? I mean, eventually 0003 is doing what is the agreed behavior,
i.e. if just OLD matches the filter then convert the UPDATE to
DELETE; OTOH if only NEW matches the filter then convert the UPDATE
to INSERT. Do you think that even if we merge 0001 and 0003 there is
still an open issue regarding which row to select for the filter?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Sat, Dec 4, 2021 at 4:43 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Dec 2, 2021, at 4:18 AM, Peter Smith wrote:
>
> PSA a new v44* patch set.
>
> We are actively developing this feature for some months and we improved this
> feature a lot. This has been a good team work. It seems a good time to provide
> a retrospective for this feature based on the consensus we reached until now.
>
> The current design has one row filter per publication-table mapping. It allows
> flexible choices while using the same table for multiple replication purposes.
> The WHERE clause was chosen as the syntax to declare the row filter expression
> (enclosed by parentheses).
>
> There was a lot of discussion about which columns are allowed to use in the row
> filter expression. The consensus was that publications that publish UPDATE
> and/or DELETE operations, should check if the columns in the row filter
> expression is part of the replica identity. Otherwise, these DML operations
> couldn't be replicated.
>
> We also discussed about which expression would be allowed. We couldn't allow
> all kind of expressions because the way logical decoding infrastructure was
> designed, some expressions could break the replication. Hence, we decided to
> allow only "simple expressions". By "simple expression", we mean to restrict
> (a) user-defined objects (functions, operators, types) and (b) immutable
> builtin functions.
>

I think what you said as (b) is wrong because we want to allow builtin
immutable functions. See discussion [1].

> A subscription can subscribe to multiple publications. These publication can
> publish the same table. In this case, we have to combine the row filter
> expression to decide if the row will be replicated or not. The consensus was to
> replicate a row if any of the row filters returns true. It means that if one
> publication-table mapping does not have a row filter, the row will be
> replicated. There is an optimization for this case that provides an empty
> expression for this table. Hence, it bails out and replicate the row without
> running the row filter code.
>

In addition to this, we have decided to have an exception/optimization
where we need to consider publish actions while combining multiple
filters, as we can't always combine insert/update filters.
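
For example (hypothetical publications, just to illustrate why such filters
can't simply be OR-ed into one expression):

CREATE PUBLICATION p_ins FOR TABLE t WHERE (a > 10) WITH (publish="insert");
CREATE PUBLICATION p_upd FOR TABLE t WHERE (a > 20) WITH (publish="update");
-- for an INSERT only the (a > 10) filter should apply; for an UPDATE only
-- the (a > 20) filter; OR-ing them would apply the wrong filter to each
-- operation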

> The same logic applies to the initial table synchronization if there are
> multiple row filters. Copy all rows that satisfies at least one row filter
> expression. If the subscriber is a pre-15 version, data synchronization won't
> use row filters if they are defined in the publisher.
>
> If we are dealing with partitioned tables, the publication parameter
> publish_via_partition_root determines if it uses the partition row filter
> (false) or the root partitioned table row filter (true).
>
> I used the last patch series (v44) posted by Peter Smith [1]. I did a lot of
> improvements in this new version (v45). I merged 0001 (it is basically the main
> patch I wrote) and 0004 (autocomplete). As I explained in [2], I implemented a
> patch (that is incorporated in the v45-0001) to fix this issue. I saw that
> Peter already proposed a slightly different patch (0006). I read this patch and
> concludes that it  would be better to keep the version I have. It fixes a few
> things and also includes more comments. I attached another patch (v45-0002)
> that includes the expression validation. It is based on 0002. I completely
> overhaul it. There are additional expressions that was not supported by the
> previous version (such as conditional expressions [CASE, COALESCE, NULLIF,
> GREATEST, LEAST], array operators, XML operators). I probably didn't finish the
> supported node list (there are a few primitive nodes that need to be checked).
> However, the current "simple expression" routine seems promising. I plan to
> integrate v45-0002 in the next patch version. I attached it here for comparison
> purposes only.
>
> My next step is to review 0003. As I said before it would like to treat it as a
> separate feature.
>

I don't think that would be the right decision, as we already discussed
that in detail and reached the current conclusion, on which Ajin's 0003
patch is based.

> I know that it is useful for data consistency but this patch
> is already too complex.
>

True, but that is the main reason the review and development are being
done as separate sub-features. I suggest still keeping a similar
separation until some of the reviews of each of the patches are done;
otherwise, we need to rethink how to divide the work for easier review.
We need to retain the 0005 patch because it handles many problems without
which the main patch is incomplete and buggy w.r.t. replica identity.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BXoD49bz5-2TtiD0ugq4PHSRX2D1sLPR_X4LNtdMc4OQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Dec 6, 2021 at 12:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
> >
> > PS> I will update the commit message in the next version. I barely changed the
> > documentation to reflect the current behavior. I probably missed some changes
> > but I will fix in the next version.
> >
> > I realized that I forgot to mention a few things about the UPDATE behavior.
> > Regardless of 0003, we need to define which tuple will be used to evaluate the
> > row filter for UPDATEs. We already discussed it circa [1]. This current version
> > chooses *new* tuple. Is it the best choice?
>
> But with 0003, we are using both the tuple for evaluating the row
> filter, so instead of fixing 0001, why we don't just merge 0003 with
> 0001?
>

I agree that would be better than coming up with an entirely new
approach, especially when the current approach has been discussed and
agreed upon.

>  I mean eventually, 0003 is doing what is the agreed behavior,
> i.e. if just OLD is matching the filter then convert the UPDATE to
> DELETE OTOH if only new is matching the filter then convert the UPDATE
> to INSERT.

+1.

>  Do you think that even we merge 0001 and 0003 then also
> there is an open issue regarding which row to select for the filter?
>

I think eventually we should merge 0001 and 0003 to avoid any sort of
data consistency issue, but it is better to keep them separate for
review purposes at this stage. If I am not wrong, 0003 still needs the
bug-fix we are discussing as part of CF entry [1], right? If so, isn't
it better to review that bug-fix patch and the 0003 patch being
discussed here [2] to avoid missing any already-reported issues in
this thread?

[1] - https://commitfest.postgresql.org/36/3162/
[2] - https://www.postgresql.org/message-id/CAHut%2BPtJnnM8MYQDf7xCyFAp13U_0Ya2dv-UQeFD%3DghixFLZiw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
>
> PS> I will update the commit message in the next version. I barely changed the
> documentation to reflect the current behavior. I probably missed some changes
> but I will fix in the next version.
>
> I realized that I forgot to mention a few things about the UPDATE behavior.
> Regardless of 0003, we need to define which tuple will be used to evaluate the
> row filter for UPDATEs. We already discussed it circa [1]. This current version
> chooses *new* tuple. Is it the best choice?
>

Apart from the data inconsistency problems you outlined below, I think
there is a major design problem with that w.r.t. toast tuples, as
unchanged key values won't be part of the *new* tuple.
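
A sketch of that scenario (hypothetical table; this assumes the usual toast
behavior where values unchanged by an UPDATE are not included in the decoded
new tuple):

CREATE TABLE t (a integer PRIMARY KEY, b text);
ALTER TABLE t REPLICA IDENTITY FULL;
CREATE PUBLICATION p FOR TABLE t WHERE (b <> 'x');
-- if b holds a large (toasted) value and an UPDATE leaves it unchanged, the
-- value is not present in the decoded new tuple, so the filter on b cannot
-- be evaluated against the new tuple alone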

> Let's check all cases. There are 2 rows on the provider. One row satisfies the
> row filter and the other one doesn't. For each case, I expect the initial rows
> to be there (no modifications). The DDLs are:
>
> CREATE TABLE foo (a integer, b text, PRIMARY KEY(a));
> INSERT INTO foo (a, b) VALUES(10, 'abc'),(30, 'abc');
> CREATE PUBLICATION bar FOR TABLE foo WHERE (a > 20);
>
> The table describes what happen on the subscriber. BEFORE is the current row on
> subscriber. OLD, NEW and OLD & NEW are action/row if we consider different ways
> to evaluate the row filter.
>
> -- case 1: old tuple (10, abc) ; new tuple (10, def)
> UPDATE foo SET b = 'def' WHERE a = 10;
>
> +-----------+--------------------+------------------+------------------+
> |   BEFORE  |       OLD          |        NEW       |    OLD & NEW     |
> +-----------+--------------------+------------------+------------------+
> |    NA     |       NA           |       NA         |       NA         |
> +-----------+--------------------+------------------+------------------+
>
> If the old and new tuple don't satisfy the row filter, there is no issue.
>
> -- case 2: old tuple (30, abc) ; new tuple (30, def)
> UPDATE foo SET b = 'def' WHERE a = 30;
>
> +-----------+--------------------+------------------+------------------+
> |   BEFORE  |       OLD          |        NEW       |    OLD & NEW     |
> +-----------+--------------------+------------------+------------------+
> | (30, abc) | UPDATE (30, def)   | UPDATE (30, def) | UPDATE (30, def) |
> +-----------+--------------------+------------------+------------------+
>
> If the old and new tuple satisfy the row filter, there is no issue.
>
> -- case 3: old tuple (30, abc) ; new tuple (10, def)
> UPDATE foo SET a = 10, b = 'def' WHERE a = 30;
>
> +-----------+--------------------+------------------+------------------+
> |   BEFORE  |       OLD          |        NEW       |    OLD & NEW     |
> +-----------+--------------------+------------------+------------------+
> | (30, abc) | UPDATE (10, def) * | KEEP (30, abc) * | KEEP (30, abc) * |
> +-----------+--------------------+------------------+------------------+
>
> If the old tuple satisfies the row filter but the new tuple doesn't,  we have a
> data consistency issue. Since the old tuple satisfies the row filter, the
> initial table synchronization copies this row. However, after the UPDATE the
> new tuple doesn't satisfy the row filter then, from the data consistency
> perspective, that row should be removed on the subscriber.
>

This is the reason we decided to transform UPDATE to DELETE in such cases.

> The OLD sends the UPDATE because it satisfies the row filter (if it is a
> sharding solution this new row should be moved to another node). The new row
> would likely not be modified by replication again. That's a data inconsistency
> according to the row filter.
>
> The NEW and OLD & NEW don't send the UPDATE because it doesn't satisfy the row
> filter. Keep the old row is undesirable because it doesn't reflect what we have
> on the source. This row on the subscriber would likely not be modified by
> replication again. If someone inserted a new row with a = 30, replication will
> stop because there is already a row with that value.
>

This shouldn't be a problem with the v44 patch version (0003 handles it).

> -- case 4: old tuple (10, abc) ; new tuple (30, def)
> UPDATE foo SET a = 30, b = 'def' WHERE a = 10;
>
> +-----------+--------------------+------------------+------------------+
> |   BEFORE  |       OLD          |        NEW       |    OLD & NEW     |
> +-----------+--------------------+------------------+------------------+
> |    NA     |       NA !         |       NA !       |       NA         |
> +-----------+--------------------+------------------+------------------+
>
> The OLD and OLD & NEW don't send the UPDATE because it doesn't satisfy the row
> filter. The NEW sends the UPDATE because it satisfies the row filter but there
> is no row to modify. The current behavior does nothing. However, it should
> INSERT the new tuple. Subsequent UPDATE or DELETE have no effect. It could be a
> surprise for an application that expects the same data set from the provider.
>

Again, this is addressed by v44, as an INSERT would be performed in this case.

> If we have to choose the default behavior I would say use the old tuple for
> evaluates row filter. Why? The validation already restricts the columns to
> replica identity so there isn't an issues with missing (NULL) columns. The case
> 3 updates the row with a value that is not consistent but keeping the old row
> is worse because it could stop the replication if someone inserted the old key
> in a new row on the provider. The case 4 ignores the UPDATE if it cannot find
> the tuple but it could provide an error if there was an strict mode.
>

Hmm, I think it is much better to translate the UPDATE to a DELETE in
case 3 and to an INSERT in case 4, as there shouldn't be any data
consistency issues after that. All these issues have been discussed in
detail in this thread, and based on that we decided to follow the v44
(0003) patch approach. We have also investigated some other
replication solutions, and they were doing similar translations to
avoid such issues.
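
Using the DDL from the quoted cases (publication bar with WHERE (a > 20) on
table foo), the intended v44 (0003) behavior would be roughly:

UPDATE foo SET a = 10, b = 'def' WHERE a = 30;  -- case 3: replicated as a DELETE
UPDATE foo SET a = 30, b = 'def' WHERE a = 10;  -- case 4: replicated as an INSERT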


> Since this change is very simple to revert, this new version contains this
> modification. I also improve the documentation, remove extra parenthesis from
> psql/pg_dump. As I said in the previous email, I merged the validation patch too.
>

As said previously, it might be better to keep those separate for
easier review. It is anyway better to split such a big patch for ease
of review, even if in the end we combine all the work.

> FWIW in the previous version, I removed a code that compares nodes to decide if
> it is necessary to remove the publication-relation entry. I had a similar code
> in a ancient version of this patch but decided that the additional code is not
> worth.
>
> There is at least one issue in the current code that should be addressed: PK or
> REPLICA IDENTITY modification could break the publication check for UPDATEs and
> DELETEs.
>

Please see patch 0005 [1]. I think it tries to address the issues
w.r.t. the replica identity interaction with this feature. Feel free to
test/review it and let us know if you see any issues.

[1] - https://www.postgresql.org/message-id/CAHut%2BPtJnnM8MYQDf7xCyFAp13U_0Ya2dv-UQeFD%3DghixFLZiw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Sat, Dec 4, 2021 at 10:13 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Dec 2, 2021, at 4:18 AM, Peter Smith wrote:
>
> PSA a new v44* patch set.
>
...

> I used the last patch series (v44) posted by Peter Smith [1]. I did a lot of
> improvements in this new version (v45). I merged 0001 (it is basically the main
> patch I wrote) and 0004 (autocomplete). As I explained in [2], I implemented a
> patch (that is incorporated in the v45-0001) to fix this issue. I saw that
> Peter already proposed a slightly different patch (0006). I read this patch and
> concludes that it  would be better to keep the version I have. It fixes a few
> things and also includes more comments.
> [1] https://postgr.es/m/CAHut%2BPtJnnM8MYQDf7xCyFAp13U_0Ya2dv-UQeFD%3DghixFLZiw%40mail.gmail.com
> [2] https://postgr.es/m/ca8d270d-f930-4d15-9f24-60f95b364173%40www.fastmail.com

>> As I explained in [2], I implemented a
>> patch (that is incorporated in the v45-0001) to fix this issue. I saw that
>> Peter already proposed a slightly different patch (0006). I read this patch and
>> concludes that it  would be better to keep the version I have. It fixes a few
>> things and also includes more comments.

Your ExprState exprstate array code is essentially the same
logic that was in patch v44-0006, isn't it?

The main difference I saw was:
1. I pass the cache index (e.g. IDX_PUBACTION_DELETE etc.) to
pgoutput_filter, but
2. you are passing in the ReorderBufferChangeType value.

IMO the ability to directly access the cache array is more efficient.

The function is called for every row operation (e.g. consider 1
million rows), so I felt the overhead of an unnecessary if/else chain
should be avoided.
e.g.
------
if (action == REORDER_BUFFER_CHANGE_INSERT)
    result = pgoutput_row_filter_exec_expr(entry->exprstate[0], ecxt);
else if (action == REORDER_BUFFER_CHANGE_UPDATE)
    result = pgoutput_row_filter_exec_expr(entry->exprstate[1], ecxt);
else if (action == REORDER_BUFFER_CHANGE_DELETE)
    result = pgoutput_row_filter_exec_expr(entry->exprstate[2], ecxt);
else
    Assert(false);
------

Why not just use a direct index like was in patch v44-0006 in the first place?
e.g.
------
result = pgoutput_row_filter_exec_expr(entry->exprstate[idx_pubaction], ecxt);
------

Conveniently, the first 3 ReorderBufferChangeType enum values are the
ones you want, so you can still pass them:
REORDER_BUFFER_CHANGE_INSERT,
REORDER_BUFFER_CHANGE_UPDATE,
REORDER_BUFFER_CHANGE_DELETE,

Just use them to index directly into entry->exprstate[action] and so
remove the excessive if/else.

What do you think?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Mon, Dec 6, 2021, at 3:35 AM, Dilip Kumar wrote:
On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
>
> PS> I will update the commit message in the next version. I barely changed the
> documentation to reflect the current behavior. I probably missed some changes
> but I will fix in the next version.
>
> I realized that I forgot to mention a few things about the UPDATE behavior.
> Regardless of 0003, we need to define which tuple will be used to evaluate the
> row filter for UPDATEs. We already discussed it circa [1]. This current version
> chooses *new* tuple. Is it the best choice?

But with 0003, we are using both the tuple for evaluating the row
filter, so instead of fixing 0001, why we don't just merge 0003 with
0001?  I mean eventually, 0003 is doing what is the agreed behavior,
i.e. if just OLD is matching the filter then convert the UPDATE to
DELETE OTOH if only new is matching the filter then convert the UPDATE
to INSERT.  Do you think that even we merge 0001 and 0003 then also
there is an open issue regarding which row to select for the filter?
Maybe I was not clear. IIUC we are still discussing 0003, and I would like to
propose a different default based on the conclusion I came up with. If we merge
0003, that's fine; this change becomes unnecessary. If we don't, or if it is
optional, it still has merit.

Do we want to pay the overhead of evaluating both tuples for UPDATEs? I'm
still considering whether it is worth it. If, in general, the row filter
contains the primary key and it is rare to change it, we will waste cycles
evaluating the same expression twice. It seems this behavior could be
controlled by a parameter.


--
Euler Taveira

Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Mon, Dec 6, 2021, at 3:44 AM, Amit Kapila wrote:
I think what you said as (b) is wrong because we want to allow builtin
immutable functions. See discussion [1].
It was a typo. I meant "non-immutable" functions.
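
To make the corrected statement concrete, an illustrative pair of
commands (table t and column b are hypothetical, not from the patch's
test suite):

------
-- allowed: lower() is an immutable built-in function
CREATE PUBLICATION p_ok FOR TABLE t WHERE (lower(b) = 'sales');

-- rejected: random() is volatile, i.e. non-immutable
CREATE PUBLICATION p_bad FOR TABLE t WHERE (random() > 0.5);
------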

True, but that is the main reason the review and development are being
done as separate sub-features. I suggest still keeping the similar
separation till some of the reviews of each of the patches are done,
otherwise, we need to rethink how to divide for easier review. We need
to retain the 0005 patch because that handles many problems without
which the main patch is incomplete and buggy w.r.t replica identity.
IMO we should merge sub-features as soon as we reach consensus. Every new
sub-feature breaks comments, tests and documentation if you want to remove or
rearrange patches. It seems I misread 0005. I agree that it is important. I'll
check it.


--
Euler Taveira

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Dec 6, 2021 at 6:18 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Dec 6, 2021, at 3:44 AM, Amit Kapila wrote:
>
> True, but that is the main reason the review and development are being
> done as separate sub-features. I suggest still keeping the similar
> separation till some of the reviews of each of the patches are done,
> otherwise, we need to rethink how to divide for easier review. We need
> to retain the 0005 patch because that handles many problems without
> which the main patch is incomplete and buggy w.r.t replica identity.
>
> IMO we should merge sub-features as soon as we reach consensus. Every new
> sub-feature breaks comments, tests and documentation if you want to remove or
> rearrange patches.
>

I agree that there is some effort but OTOH, it gives the flexibility
to do a focused review, and as soon as some patch is ready or close to
ready we can merge it into the main patch. This was just a humble
suggestion based on how this patch was making progress and how it has
helped to keep some parts separate by allowing different people to
work on different parts of the problem.

> It seems I misread 0005. I agree that it is important. I'll
> check it.
>

Okay, thanks!

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
Hi Euler –

As you know we have been posting patch update versions to the
Row-Filter thread several times a week now for a few months. We are
carefully tracking all open review comments of the thread and fixing
as many as possible with each version posted.

~~

It is true that the multiple patches are difficult to maintain
(particularly for test cases impacting other patches), but

- this is the arrangement that Amit preferred (without whose support
as a committer this patch would likely be stalled).

- separate patches have allowed us to spread the work across multiple
people to improve the velocity (e.g. the Hou-san top-up patch 0005).

- having multiple patches also allows the review comments to be more focused.

~~

We were mid-way putting together the next v45* when your latest
attachment was posted over the weekend. So we will proceed with our
original plan to post our v45* (tomorrow).

After v45* is posted we will pause to find what are all the
differences between your unified patch and our v45* patch set. Our
intention is to integrate as many improvements as possible from your
changes into the v46* etc that will follow tomorrow’s v45*. On some
points, we will most likely need further discussion.

With luck, soon everything can be more in sync again.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Friday, December 3, 2021 10:09 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> On Thu, Dec 2, 2021 at 2:32 PM tanghy.fnst@fujitsu.com
> <tanghy.fnst@fujitsu.com> wrote:
> >
> > On Thursday, December 2, 2021 5:21 AM Peter Smith
> <smithpb2250@gmail.com> wrote:
> > >
> > > PSA the v44* set of patches.
> > >
> >
> > Thanks for the new patch. Few comments:
> >
> > 1. This is an example in publication doc, but in fact it's not allowed. Should we
> > change this example?
> >
> > +CREATE PUBLICATION active_departments FOR TABLE departments WHERE
> (active IS TRUE);
> >
> > postgres=# CREATE PUBLICATION active_departments FOR TABLE departments
> WHERE (active IS TRUE);
> > ERROR:  invalid publication WHERE expression for relation "departments"
> > HINT:  only simple expressions using columns, constants and immutable system
> functions are allowed
> >
> 
> Thanks for finding this. Actually, the documentation looks correct to
> me. The problem was the validation walker of patch 0002 was being
> overly restrictive. It needed to also allow a BooleanTest node.
> 
> Now it works (locally) for me. For example.
> 
> test_pub=# create table departments(depno int primary key, active boolean);
> CREATE TABLE
> test_pub=# create publication pdept for table departments where
> (active is true) with (publish="insert");
> CREATE PUBLICATION
> test_pub=# create publication pdept2 for table departments where
> (active is false) with (publish="insert");
> CREATE PUBLICATION
> 
> This fix will be available in v45*.
> 

Thanks for looking into it.

I have another problem with your patch. The document says:

... If the subscription has several publications in
+   which the same table has been published with different filters, those
+   expressions get OR'ed together so that rows satisfying any of the expressions
+   will be replicated. Notice this means if one of the publications has no filter
+   at all then all other filters become redundant.

Then, what if one of the publications is specified as 'FOR ALL TABLES' or 'FOR
ALL TABLES IN SCHEMA'.

For example:
create table tbl (a int primary key);
create publication p1 for table tbl where (a > 10);
create publication p2 for all tables;
create subscription sub connection 'dbname=postgres port=5432' publication p1, p2;

I think for a "FOR ALL TABLES" publication (p2 in my case), table tbl should be
treated as having no filter, and table tbl should have no filter in subscription sub. Thoughts?

But for now, the filter(a > 10) works both when copying initial data and later changes.

To fix it, I think we can check if the table is published in a 'FOR ALL TABLES'
publication or published as part of schema in function pgoutput_row_filter_init
(which was introduced in v44-0003 patch), also we need to make some changes in
tablesync.c.

Regards
Tang

Re: row filtering for logical replication

From
Ashutosh Bapat
Date:
On Tue, Dec 7, 2021 at 12:18 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> On Friday, December 3, 2021 10:09 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Thu, Dec 2, 2021 at 2:32 PM tanghy.fnst@fujitsu.com
> > <tanghy.fnst@fujitsu.com> wrote:
> > >
> > > On Thursday, December 2, 2021 5:21 AM Peter Smith
> > <smithpb2250@gmail.com> wrote:
> > > >
> > > > PSA the v44* set of patches.
> > > >
> > >
> > > Thanks for the new patch. Few comments:
> > >
> > > 1. This is an example in publication doc, but in fact it's not allowed. Should we
> > > change this example?
> > >
> > > +CREATE PUBLICATION active_departments FOR TABLE departments WHERE
> > (active IS TRUE);
> > >
> > > postgres=# CREATE PUBLICATION active_departments FOR TABLE departments
> > WHERE (active IS TRUE);
> > > ERROR:  invalid publication WHERE expression for relation "departments"
> > > HINT:  only simple expressions using columns, constants and immutable system
> > functions are allowed
> > >
> >
> > Thanks for finding this. Actually, the documentation looks correct to
> > me. The problem was the validation walker of patch 0002 was being
> > overly restrictive. It needed to also allow a BooleanTest node.
> >
> > Now it works (locally) for me. For example.
> >
> > test_pub=# create table departments(depno int primary key, active boolean);
> > CREATE TABLE
> > test_pub=# create publication pdept for table departments where
> > (active is true) with (publish="insert");
> > CREATE PUBLICATION
> > test_pub=# create publication pdept2 for table departments where
> > (active is false) with (publish="insert");
> > CREATE PUBLICATION
> >
> > This fix will be available in v45*.
> >
>
> Thanks for looking into it.
>
> I have another problem with your patch. The document says:
>
> ... If the subscription has several publications in
> +   which the same table has been published with different filters, those
> +   expressions get OR'ed together so that rows satisfying any of the expressions
> +   will be replicated. Notice this means if one of the publications has no filter
> +   at all then all other filters become redundant.
>
> Then, what if one of the publications is specified as 'FOR ALL TABLES' or 'FOR
> ALL TABLES IN SCHEMA'.
>
> For example:
> create table tbl (a int primary key);
> create publication p1 for table tbl where (a > 10);
> create publication p2 for all tables;
> create subscription sub connection 'dbname=postgres port=5432' publication p1, p2;

Thanks for the example. I was wondering about this case myself.

>
> I think for a "FOR ALL TABLES" publication (p2 in my case), table tbl should be
> treated as having no filter, and table tbl should have no filter in subscription sub. Thoughts?
>
> But for now, the filter(a > 10) works both when copying initial data and later changes.
>
> To fix it, I think we can check if the table is published in a 'FOR ALL TABLES'
> publication or published as part of schema in function pgoutput_row_filter_init
> (which was introduced in v44-0003 patch), also we need to make some changes in
> tablesync.c.

In order to check "FOR ALL TABLES", we might need to fetch publication
metadata. Instead of that, can we add a "TRUE" filter on all the tables
which are part of a FOR ALL TABLES publication?

-- 
Best Wishes,
Ashutosh Bapat



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Dec 7, 2021 at 6:31 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Tue, Dec 7, 2021 at 12:18 PM tanghy.fnst@fujitsu.com
> <tanghy.fnst@fujitsu.com> wrote:
> >
> > I have another problem with your patch. The document says:
> >
> > ... If the subscription has several publications in
> > +   which the same table has been published with different filters, those
> > +   expressions get OR'ed together so that rows satisfying any of the expressions
> > +   will be replicated. Notice this means if one of the publications has no filter
> > +   at all then all other filters become redundant.
> >
> > Then, what if one of the publications is specified as 'FOR ALL TABLES' or 'FOR
> > ALL TABLES IN SCHEMA'.
> >
> > For example:
> > create table tbl (a int primary key);
> > create publication p1 for table tbl where (a > 10);
> > create publication p2 for all tables;
> > create subscription sub connection 'dbname=postgres port=5432' publication p1, p2;
>
> Thanks for the example. I was wondering about this case myself.
>

I think we should handle this case.

> >
> > I think for a "FOR ALL TABLES" publication (p2 in my case), table tbl should be
> > treated as having no filter, and table tbl should have no filter in subscription sub. Thoughts?
> >
> > But for now, the filter(a > 10) works both when copying initial data and later changes.
> >
> > To fix it, I think we can check if the table is published in a 'FOR ALL TABLES'
> > publication or published as part of schema in function pgoutput_row_filter_init
> > (which was introduced in v44-0003 patch), also we need to make some changes in
> > tablesync.c.
>
> In order to check "FOR ALL_TABLES", we might need to fetch publication
> metadata.
>

Do we really need to perform a separate fetch for this? In
get_rel_sync_entry(), we already have this information; can't we
somehow stash that in the corresponding RelationSyncEntry so that the
same can be used later for row filtering?

> Instead of that, can we add a "TRUE" filter on all the tables
> which are part of a FOR ALL TABLES publication?
>

How? We won't have an entry for such tables in pg_publication_rel
where we store row_filter information.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Dec 2, 2021 at 2:59 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
...
> Attach the v44-0005 top-up patch.
> This version addressed all the comments received so far,
> mainly including the following changes:
> 1) rename rfcol_valid_for_replica to rfcol_valid
> 2) Remove the struct PublicationInfo and add the rfcol_valid flag directly in relation
> 3) report the invalid column number in the error message.
> 4) Rename some function to match the usage.
> 5) Fix some typos and add some code comments.
> 6) Fix an omission in a test case.

Below are my review comments for the most recent v44-0005 (top-up) patch:

======

1. src/backend/executor/execReplication.c
+ invalid_rfcol = RelationGetInvalRowFilterCol(rel);
+
+ /*
+ * It is only safe to execute UPDATE/DELETE when all columns of the row
+ * filters from publications which the relation is in are part of the
+ * REPLICA IDENTITY.
+ */
+ if (invalid_rfcol != InvalidAttrNumber)
+ {

It seemed confusing that when the invalid_rfcol is NOT invalid at all
then it is InvalidAttrNumber, so perhaps this code would be easier to
read if instead the condition was written just as:
---
if (invalid_rfcol)
{
...
}
---

====

2. invalid_rfcol var name
This variable name is used in a few places but I thought it was too
closely named with the "rfcol_valid" variable even though it has a
completely different meaning. IMO "invalid_rfcol" might be better
named "invalid_rfcolnum" or something like that to reinforce that it
is an AttributeNumber.

====

3. src/backend/utils/cache/relcache.c - function comment
+ * If not all the row filter columns are part of REPLICA IDENTITY, return the
+ * invalid column number, InvalidAttrNumber otherwise.
+ */

Minor rewording:
"InvalidAttrNumber otherwise." --> "otherwise InvalidAttrNumber."

====

4. src/backend/utils/cache/relcache.c - function name
+AttrNumber
+RelationGetInvalRowFilterCol(Relation relation)

IMO nothing was gained by saving 2 chars of the name.
"RelationGetInvalRowFilterCol" --> "RelationGetInvalidRowFilterCol"

====

5. src/backend/utils/cache/relcache.c
+/* For invalid_rowfilter_column_walker. */
+typedef struct {
+ AttrNumber invalid_rfcol;
+ Bitmapset  *bms_replident;
+} rf_context;
+

The members should be commented.
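
For example, something like this (a sketch only; the field
descriptions are inferred from the surrounding review comments, not
taken from the patch):

------
/* Context for invalid_rowfilter_column_walker. */
typedef struct
{
    AttrNumber  invalid_rfcol;   /* first row-filter column found that
                                  * is not part of the replica
                                  * identity, or InvalidAttrNumber if
                                  * none */
    Bitmapset  *bms_replident;   /* columns of the replica identity,
                                  * which row-filter columns are
                                  * checked against */
} rf_context;
------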

====

6. src/include/utils/rel.h
  /*
+ * true if the columns of row filters from all the publications the
+ * relation is in are part of replica identity.
+ */
+ bool rd_rfcol_valid;

I felt the member comment is not quite telling the full story. e.g.
IIUC this member is also true when pubaction is something other than
update/delete - but that case doesn't even do replica identity
checking at all. There might not even be any replica identity.

====

7. src/test/regress/sql/publication.sql
 CREATE PUBLICATION testpub6 FOR TABLE rf_tbl_abcd_pk WHERE (a > 99);
+-- fail - "a" is in PK but it is not part of REPLICA IDENTITY INDEX
+update rf_tbl_abcd_pk set a = 1;
+DROP PUBLICATION testpub6;
 -- ok - "c" is not in PK but it is part of REPLICA IDENTITY INDEX
-SET client_min_messages = 'ERROR';
 CREATE PUBLICATION testpub6 FOR TABLE rf_tbl_abcd_pk WHERE (c > 99);
 DROP PUBLICATION testpub6;
-RESET client_min_messages;
--- fail - "a" is not in REPLICA IDENTITY INDEX
 CREATE PUBLICATION testpub6 FOR TABLE rf_tbl_abcd_nopk WHERE (a > 99);
+-- fail - "a" is not in REPLICA IDENTITY INDEX
+update rf_tbl_abcd_nopk set a = 1;

The "update" DML should be uppercase "UPDATE" for consistency with the
surrounding tests.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Dec 6, 2021 at 6:04 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Dec 6, 2021, at 3:35 AM, Dilip Kumar wrote:
>
> On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
> >
> > PS> I will update the commit message in the next version. I barely changed the
> > documentation to reflect the current behavior. I probably missed some changes
> > but I will fix in the next version.
> >
> > I realized that I forgot to mention a few things about the UPDATE behavior.
> > Regardless of 0003, we need to define which tuple will be used to evaluate the
> > row filter for UPDATEs. We already discussed it circa [1]. This current version
> > chooses *new* tuple. Is it the best choice?
>
> But with 0003, we are using both tuples for evaluating the row
> filter, so instead of fixing 0001, why don't we just merge 0003 with
> 0001?  I mean, eventually 0003 is doing what is the agreed behavior,
> i.e. if just the OLD tuple matches the filter then convert the UPDATE
> to a DELETE; OTOH if only the new one matches the filter then convert
> the UPDATE to an INSERT.  Do you think that even if we merge 0001 and
> 0003 there is still an open issue regarding which row to select for
> the filter?
>
> Maybe I was not clear. IIUC we are still discussing 0003, and I would like to
> propose a different default based on the conclusion I came up with. If we merge
> 0003, that's fine; this change will be useless. If we don't, or it is optional,
> it still has its merit.
>
> Do we want to pay the overhead of evaluating both tuples for UPDATEs? I'm still
> assessing whether it is worth it. If, as is common, the row filter
> contains the primary key and it is rarely changed, we will waste cycles
> evaluating the same expression twice. It seems this behavior could be
> controlled by a parameter.
>

I think the first thing we should do in this regard is to evaluate the
performance for both cases (when we apply a filter to both tuples vs.
to one of the tuples). If the performance difference turns out to be
unacceptable, I think it would still be better to compare both tuples
by default, to avoid data inconsistency issues, and to have an option
allowing comparison of just one of the tuples.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Ashutosh Bapat
Date:
On Wed, Dec 8, 2021 at 10:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

>
> Do we really need to perform a separate fetch for this? In
> get_rel_sync_entry(), we already have this information; can't we
> somehow stash that in the corresponding RelationSyncEntry so that the
> same can be used later for row filtering?
>
> > Instead of that can we add a "TRUE" filter on all the tables
> > which are part of FOR ALL TABLES publication?
> >
>
> How? We won't have an entry for such tables in pg_publication_rel
> where we store row_filter information.

I missed that. Your solution works. Thanks.

-- 
Best Wishes,
Ashutosh Bapat



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Sep 23, 2021 at 10:33 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> I finally had time to take a closer look at the patch again, so here's
> some review comments. The thread is moving fast, so chances are some of
> the comments are obsolete or were already raised in the past.
>
...
> 11) extra (unnecessary) parens in the deparsed expression
>
> test=# alter publication p add table t where ((b < 100) and (c < 100));
> ALTER PUBLICATION
> test=# \dRp+ p
>                                Publication p
>   Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
> -------+------------+---------+---------+---------+-----------+----------
>   user  | f          | t       | t       | t       | t         | f
> Tables:
>      "public.t" WHERE (((b < 100) AND (c < 100)))
>

Euler's fix for this was integrated into v45 [1]

------
[1] https://www.postgresql.org/message-id/CAFPTHDYB4nbxCMAFQGowJtDf7E6uBc%3D%3D_HupBKy7MaMhM%2B9QQA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Wed, Dec 8, 2021 7:52 PM Ajin Cherian <itsajin@gmail.com> wrote:
> On Tue, Dec 7, 2021 at 5:36 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > We were mid-way putting together the next v45* when your latest
> > attachment was posted over the weekend. So we will proceed with our
> > original plan to post our v45* (tomorrow).
> >
> > After v45* is posted we will pause to find what are all the
> > differences between your unified patch and our v45* patch set. Our
> > intention is to integrate as many improvements as possible from your
> > changes into the v46* etc that will follow tomorrow’s v45*. On some
> > points, we will most likely need further discussion.
> 
> 
> Posting an update for review comments, using contributions mainly from
> Peter Smith.
> I've also included changes based on Euler's combined patch, especially changes
> to documentation and test cases.
> I have left out Hou-san's 0005 in this patch set. Hou-san will provide a rebased
> update based on this.

Attached is the top-up patch (as 0006), which does the replica identity
validation when an actual UPDATE/DELETE happens. I adjusted the patch name to
make the change clearer.

The new version of the top-up patch addressed all comments from Peter [1] and
Greg [2]. I also fixed a validation issue in the top-up patch reported by Tang.
The fix is: if we add a partitioned table with a filter and pubviaroot is true,
we need to validate the parent table's row filter when an UPDATE happens on the
child table, and we should convert the parent table's columns to the child's
during validation in case the column order of the parent table differs from the
child table.

[1] https://www.postgresql.org/message-id/CAHut%2BPuBdXGLw1%2BCBoNxXUp3bHcHcKYWHx1RSGF6tY5aSLu5ZA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAJcOf-dgxGmRs54nxQSZWDc0gaHZWFf3n%2BBhOChNXhi_cb8g9A%40mail.gmail.com

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Wednesday, December 8, 2021 7:52 PM Ajin Cherian <itsajin@gmail.com>
> On Tue, Dec 7, 2021 at 5:36 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > We were mid-way putting together the next v45* when your latest
> > attachment was posted over the weekend. So we will proceed with our
> > original plan to post our v45* (tomorrow).
> >
> > After v45* is posted we will pause to find what are all the
> > differences between your unified patch and our v45* patch set. Our
> > intention is to integrate as many improvements as possible from your
> > changes into the v46* etc that will follow tomorrow’s v45*. On some
> > points, we will most likely need further discussion.
> 
> 
> Posting an update for review comments, using contributions mainly from
> Peter Smith.
> I've also included changes based on Euler's combined patch, especially changes
> to documentation and test cases.
> I have left out Hou-san's 0005 in this patch set. Hou-san will provide a rebased
> update based on this.
> 
> This patch addresses the following review comments:

Hi,

Thanks for updating the patch.
I noticed a possible issue.

+                /* Check row filter. */
+                if (!pgoutput_row_filter(data, relation, oldtuple, NULL, relentry))
+                    break;
+
+                maybe_send_schema(ctx, change, relation, relentry);
+
                 /* Switch relation if publishing via root. */
                 if (relentry->publish_as_relid != RelationGetRelid(relation))
                 {
...
                    /* Convert tuple if needed. */
                    if (relentry->map)
                        tuple = execute_attr_map_tuple(tuple, relentry->map);

Currently, we execute the row filter before converting the tuple. I think it could
give a wrong result if we are executing a parent table's row filter and the column
order of the parent table is different from the child table's. For example:

----
create table parent(a int primary key, b int) partition by range (a);
create table child (b int, a int primary key);
alter table parent attach partition child default;
create publication pub for table parent where(a>10) with(PUBLISH_VIA_PARTITION_ROOT);

The column number of 'a' is '1' in the filter expression, while column 'a' is the
second one in the original tuple. I think we might need to execute the filter
expression after converting; see the sketch below.
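
A sketch of that reordering, using only the calls visible in the
quoted fragment (the exact placement inside pgoutput_change() may
differ):

------
/* Convert the tuple to the parent's column order first ... */
if (relentry->map)
    tuple = execute_attr_map_tuple(tuple, relentry->map);

/* ... so that the attnums in the parent's filter expression line up
 * with the tuple that is actually evaluated. */
if (!pgoutput_row_filter(data, relation, NULL, tuple, relentry))
    break;
------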

Best regards,
Hou zj

Re: row filtering for logical replication

From
Peter Smith
Date:
PSA the v46* patch set.

Here are the main differences from v45:
0. Rebased to HEAD
1. Integrated many comments, docs, messages, code etc from Euler's
patch [Euler 6/12]
2. Several bugfixes
3. Patches are merged/added

~~

Bugfix and Patch Merge details:

v46-0001 (main)
- Merged from v45-0001 (main) + v45-0005 (exprstate)
- Fix for mem leak reported by Greg (off-list)

v46-0002 (validation)
- Merged from v45-0002 (node validation) + v45-0006 (replica identity
validation)

v46-0003
- Rebased from v45-0003
- Fix for partition column order [Houz 9/12]
- Fix for core dump reported by Tang (off-list)

v46-0004 (tab-complete and dump)
- Rebased from v45-0004

v46-0005 (for all tables)
- New patch
- Fix for FOR ALL TABLES [Tang 7/12]

------
[Euler 6/12] https://www.postgresql.org/message-id/b676aef0-00c7-4c19-85f8-33786594e807%40www.fastmail.com
[Tang 7/12]
https://www.postgresql.org/message-id/OS0PR01MB6113D82113AA081ACF710D0CFB6E9%40OS0PR01MB6113.jpnprd01.prod.outlook.com
[Houz 9/12]
https://www.postgresql.org/message-id/OS0PR01MB5716EB3137D194030EB694F194709%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Dec 9, 2021 at 1:37 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Wednesday, December 8, 2021 7:52 PM Ajin Cherian <itsajin@gmail.com>
> > On Tue, Dec 7, 2021 at 5:36 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > We were mid-way putting together the next v45* when your latest
> > > attachment was posted over the weekend. So we will proceed with our
> > > original plan to post our v45* (tomorrow).
> > >
> > > After v45* is posted we will pause to find what are all the
> > > differences between your unified patch and our v45* patch set. Our
> > > intention is to integrate as many improvements as possible from your
> > > changes into the v46* etc that will follow tomorrow’s v45*. On some
> > > points, we will most likely need further discussion.
> >
> >
> > Posting an update for review comments, using contributions majorly from
> > Peter Smith.
> > I've also included changes based on Euler's combined patch, specially changes
> > to documentation and test cases.
> > I have left out Hou-san's 0005, in this patch-set. Hou-san will provide a rebased
> > update based on this.
> >
> > This patch addresses the following review comments:
>
> Hi,
>
> Thanks for updating the patch.
> I noticed a possible issue.
>
> +                               /* Check row filter. */
> +                               if (!pgoutput_row_filter(data, relation, oldtuple, NULL, relentry))
> +                                       break;
> +
> +                               maybe_send_schema(ctx, change, relation, relentry);
> +
>                                 /* Switch relation if publishing via root. */
>                                 if (relentry->publish_as_relid != RelationGetRelid(relation))
>                                 {
> ...
>                                         /* Convert tuple if needed. */
>                                         if (relentry->map)
>                                                 tuple = execute_attr_map_tuple(tuple, relentry->map);
>
> Currently, we execute the row filter before converting the tuple. I think it could
> give a wrong result if we are executing a parent table's row filter and the column
> order of the parent table is different from the child table's. For example:
>
> ----
> create table parent(a int primary key, b int) partition by range (a);
> create table child (b int, a int primary key);
> alter table parent attach partition child default;
> create publication pub for table parent where(a>10) with(PUBLISH_VIA_PARTITION_ROOT);
>
> The column number of 'a' is '1' in the filter expression, while column 'a' is the
> second one in the original tuple. I think we might need to execute the filter
> expression after converting.
>

Fixed in v46* [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtoxjo6hpDFTya6WYH-zdspKQ5j%2BwZHBRc6EZkAkq7Nfw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Dec 7, 2021 at 5:48 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
...
> Thanks for looking into it.
>
> I have another problem with your patch. The document says:
>
> ... If the subscription has several publications in
> +   which the same table has been published with different filters, those
> +   expressions get OR'ed together so that rows satisfying any of the expressions
> +   will be replicated. Notice this means if one of the publications has no filter
> +   at all then all other filters become redundant.
>
> Then, what if one of the publications is specified as 'FOR ALL TABLES' or 'FOR
> ALL TABLES IN SCHEMA'.
>
> For example:
> create table tbl (a int primary key);
> create publication p1 for table tbl where (a > 10);
> create publication p2 for all tables;
> create subscription sub connection 'dbname=postgres port=5432' publication p1, p2;
>
> I think for a "FOR ALL TABLES" publication (p2 in my case), table tbl should be
> treated as having no filter, and table tbl should have no filter in subscription sub. Thoughts?
>
> But for now, the filter(a > 10) works both when copying initial data and later changes.
>
> To fix it, I think we can check if the table is published in a 'FOR ALL TABLES'
> publication or published as part of schema in function pgoutput_row_filter_init
> (which was introduced in v44-0003 patch), also we need to make some changes in
> tablesync.c.
>

Partly fixed in v46-0005 [1]

NOTE
- The initial COPY part of the tablesync does not take the publish
operation into account, which means that if any of the subscribed
publications has the "puballtables" flag then all data will be copied
sans filters. I guess this is consistent with the other decision to
ignore publication operations [2].

TODO
- Documentation
- IIUC there is a similar case yet to be addressed - FOR ALL TABLES IN SCHEMA

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtoxjo6hpDFTya6WYH-zdspKQ5j%2BwZHBRc6EZkAkq7Nfw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1L3r%2BURSLFotOT5Y88ffscCskRoGC15H3CSAU1jj_0Rdg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Dec 14, 2021 at 4:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Tue, Dec 7, 2021 at 5:48 PM tanghy.fnst@fujitsu.com
> <tanghy.fnst@fujitsu.com> wrote:
> >
> > I think for "FOR ALL TABLE" publication(p2 in my case), table tbl should be
> > treated as no filter, and table tbl should have no filter in subscription sub. Thoughts?
> >
> > But for now, the filter(a > 10) works both when copying initial data and later changes.
> >
> > To fix it, I think we can check if the table is published in a 'FOR ALL TABLES'
> > publication or published as part of schema in function pgoutput_row_filter_init
> > (which was introduced in v44-0003 patch), also we need to make some changes in
> > tablesync.c.
> >
>
> Partly fixed in v46-0005  [1]
>
> NOTE
> > - The initial COPY part of the tablesync does not take the publish
> > operation into account, which means that if any of the subscribed
> > publications has the "puballtables" flag then all data will be copied
> > sans filters.
>

I think this should be okay but the way you have implemented it in the
patch doesn't appear to be the optimal way. Can't we fetch
allpubtables info and qual info as part of one query instead of using
separate queries?

> I guess this is consistent with the other decision to
> ignore publication operations [2].
>
> TODO
> - Documentation
> - IIUC there is a similar case yet to be addressed - FOR ALL TABLES IN SCHEMA
>

Yeah, "FOR ALL TABLES IN SCHEMA" should also be addressed. In this
case, the difference would be that we need to check the presence of
schema corresponding to the table (for which we are fetching
row_filter information) is there in pg_publication_namespace. If it
exists then we don't need to apply row_filter for the table. I feel it
is better to fetch all this information as part of the query which you
are using to fetch row_filter info. The idea is to avoid the extra
round-trip between subscriber and publisher.

Few other comments:
===================
1.
@@ -926,6 +928,22 @@ pgoutput_row_filter_init(PGOutputData *data,
Relation relation, RelationSyncEntr
  bool rfisnull;

  /*
+ * If the publication is FOR ALL TABLES then it is treated same as if this
+ * table has no filters (even if for some other publication it does).
+ */
+ if (pub->alltables)
+ {
+ if (pub->pubactions.pubinsert)
+ no_filter[idx_ins] = true;
+ if (pub->pubactions.pubupdate)
+ no_filter[idx_upd] = true;
+ if (pub->pubactions.pubdelete)
+ no_filter[idx_del] = true;
+
+ continue;
+ }

Is there a reason to continue checking the other publications if
no_filter is true for all kinds of pubactions?
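
Something like this sketch of an early exit, with the variable names
taken from the quoted patch code:

------
/* Once every pubaction is already unfiltered, no later publication
 * can change the outcome, so stop scanning the publication list. */
if (no_filter[idx_ins] && no_filter[idx_upd] && no_filter[idx_del])
    break;
------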

2.
+ * All row filter expressions will be discarded if there is one
+ * publication-relation entry without a row filter. That's because
+ * all expressions are aggregated by the OR operator. The row
+ * filter absence means replicate all rows so a single valid
+ * expression means publish this row.

This same comment is at two places, remove from one of the places. I
think keeping it atop for loop is better.

3.
+ {
+ int idx;
+ bool found_filters = false;

I am not sure if starting such ad-hoc braces in the code to localize
the scope of variables is a regular practice. Can we please remove
this?


-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Dec 14, 2021 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 14, 2021 at 4:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Few other comments:
> ===================

Few more comments:
==================
v46-0001/0002
===============
1. After rowfilter_walker() why do we need
EXPR_KIND_PUBLICATION_WHERE? I thought this is primarily to identify
the expressions that are not allowed in rowfilter which we are now
able to detect upfront with the help of a walker. Can't we instead use
EXPR_KIND_WHERE?

2.
+Node *
+GetTransformedWhereClause(ParseState *pstate, PublicationRelInfo *pri,
+   bool bfixupcollation)

Can we add comments atop this function?

3. In GetTransformedWhereClause, can we change the name of variables
(a) bfixupcollation to fixup_collation or assign_collation, (b)
transformedwhereclause to whereclause. I think that will make the
function more readable.


v46-0002
========
4.
+ else if (IsA(node, List) || IsA(node, Const) || IsA(node, BoolExpr)
|| IsA(node, NullIfExpr) ||
+ IsA(node, NullTest) || IsA(node, BooleanTest) || IsA(node, CoalesceExpr) ||
+ IsA(node, CaseExpr) || IsA(node, CaseTestExpr) || IsA(node, MinMaxExpr) ||
+ IsA(node, ArrayExpr) || IsA(node, ScalarArrayOpExpr) || IsA(node, XmlExpr))

Can we move this to a separate function, say IsValidRowFilterExpr() or
something along those lines, and use switch (nodeTag(node)) to identify
these nodes?
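
For instance, a sketch along those lines (the function name is the one
suggested above; the case list simply mirrors the quoted IsA() chain):

------
static bool
IsValidRowFilterExpr(Node *node)
{
    switch (nodeTag(node))
    {
        case T_List:
        case T_Const:
        case T_BoolExpr:
        case T_NullIfExpr:
        case T_NullTest:
        case T_BooleanTest:
        case T_CoalesceExpr:
        case T_CaseExpr:
        case T_CaseTestExpr:
        case T_MinMaxExpr:
        case T_ArrayExpr:
        case T_ScalarArrayOpExpr:
        case T_XmlExpr:
            return true;
        default:
            return false;
    }
}
------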

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Dec 14, 2021 at 10:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 14, 2021 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Dec 14, 2021 at 4:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Few other comments:
> > ===================
>
> Few more comments:
> ==================
> v46-0001/0002
> ===============
> 1. After rowfilter_walker() why do we need
> EXPR_KIND_PUBLICATION_WHERE? I thought this is primarily to identify
> the expressions that are not allowed in rowfilter which we are now
> able to detect upfront with the help of a walker. Can't we instead use
> EXPR_KIND_WHERE?

FYI - I have tried this locally and all tests pass.

~~

If the EXPR_KIND_PUBLICATION_WHERE is removed then there will be some
differences:
- we would get errors for aggregate/grouping functions from the EXPR_KIND_WHERE
- we would get errors for window functions from the EXPR_KIND_WHERE
- we would get errors for set-returning functions from the EXPR_KIND_WHERE

Actually, IMO this would be a *good* change because AFAIK those are
not all being checked by the row-filter walker. I think the only
reason all tests pass is that there are no specific regression tests
for these cases.
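
For instance, regression tests along these lines would pin the
behaviour down (illustrative statements assuming a table t(a int);
none of these exist in the current test suite):

------
-- aggregate function in a row filter: should fail
CREATE PUBLICATION p_agg FOR TABLE t WHERE (sum(a) > 10);

-- window function in a row filter: should fail
CREATE PUBLICATION p_win FOR TABLE t WHERE (row_number() OVER () > 1);

-- set-returning function in a row filter: should fail
CREATE PUBLICATION p_srf FOR TABLE t WHERE (generate_series(1, a) = 1);
------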

OTOH, there would also be a difference where an error message would
not be as nice. Please see the review comment from Vignesh. [1] The
improved error message is only possible by checking the
EXPR_KIND_PUBLICATION_WHERE.

~~

I think the best thing to do here is to leave
EXPR_KIND_PUBLICATION_WHERE but simplify the code so that the improved
error message remains the *only* difference in behaviour from
EXPR_KIND_WHERE, i.e. we should let the other
aggregate/grouping/window/set-returning function checks give errors
exactly the same as for the EXPR_KIND_WHERE case.

------
[1] https://www.postgresql.org/message-id/CALDaNm08Ynr_FzNg%2BdoHj%3D_nBet%2BKZAvNbqmkEEw7M2SPpPEAw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Dec 15, 2021 at 6:47 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Tue, Dec 14, 2021 at 10:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Dec 14, 2021 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Dec 14, 2021 at 4:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > Few other comments:
> > > ===================
> >
> > Few more comments:
> > ==================
> > v46-0001/0002
> > ===============
> > 1. After rowfilter_walker() why do we need
> > EXPR_KIND_PUBLICATION_WHERE? I thought this is primarily to identify
> > the expressions that are not allowed in rowfilter which we are now
> > able to detect upfront with the help of a walker. Can't we instead use
> > EXPR_KIND_WHERE?
>
> FYI - I have tried this locally and all tests pass.
>
> ~~
>
> If the EXPR_KIND_PUBLICATION_WHERE is removed then there will be some
> differences:
> - we would get errors for aggregate/grouping functions from the EXPR_KIND_WHERE
> - we would get errors for window functions from the EXPR_KIND_WHERE
> - we would get errors for set-returning functions from the EXPR_KIND_WHERE
>
> Actually, IMO this would be a *good* change because AFAIK those are
> not all being checked by the row-filter walker. I think the only
> reason all tests pass is that there are no specific regression tests
> for these cases.
>
> OTOH, there would also be a difference where an error message would
> not be as nice. Please see the review comment from Vignesh. [1] The
> improved error message is only possible by checking the
> EXPR_KIND_PUBLICATION_WHERE.
>
> ~~
>
> I think the best thing to do here is to leave
> EXPR_KIND_PUBLICATION_WHERE but simplify the code so that the improved
> error message remains the *only* difference in behaviour from
> EXPR_KIND_WHERE, i.e. we should let the other
> aggregate/grouping/window/set-returning function checks give errors
> exactly the same as for the EXPR_KIND_WHERE case.
>

I am not sure if "the better error message" is a good enough reason
to introduce this new kind. I think it is better to deal with that
in rowfilter_walker.


-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Mon, Dec 13, 2021 at 8:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> PSA the v46* patch set.
>

0001

(1)

"If a subscriber is a pre-15 version, the initial table
synchronization won't use row filters even if they are defined in the
publisher."

Won't this lead to data inconsistencies or errors that otherwise
wouldn't happen? Should such subscriptions be allowed?

(2) In the 0001 patch comment, the term "publication filter" is used
in one place, and in others "row filter" or "row-filter".


src/backend/catalog/pg_publication.c
(3) GetTransformedWhereClause() is missing a function comment.

(4)
The following comment seems incomplete:

+ /* Fix up collation information */
+ whereclause = GetTransformedWhereClause(pstate, pri, true);


src/backend/parser/parse_relation.c
(5)
wording? consistent?
Shouldn't it be "publication WHERE expression" for consistency?

+ errmsg("publication row-filter WHERE invalid reference to table \"%s\"",
+ relation->relname),


src/backend/replication/logical/tablesync.c
(6)

(i) Improve wording:

BEFORE:
 /*
  * Get information about remote relation in similar fashion the RELATION
- * message provides during replication.
+ * message provides during replication. This function also returns the relation
+ * qualifications to be used in COPY command.
  */

AFTER:
 /*
- * Get information about remote relation in similar fashion the RELATION
- * message provides during replication.
+ * Get information about a remote relation, in a similar fashion to
how the RELATION
+ * message provides information during replication. This function
also returns the relation
+ * qualifications to be used in the COPY command.
  */

(ii) fetch_remote_table_info() doesn't currently account for ALL
TABLES and ALL TABLES IN SCHEMA.


src/backend/replication/pgoutput/pgoutput.c
(7) pgoutput_row_filter()
I think that the "ExecDropSingleTupleTableSlot(entry->scantuple);" is
not needed in pgoutput_row_filter() - I don't think it can be non-NULL
when entry->exprstate_valid is false.

(8) I am a little unsure about this "combine filters on copy
(irrespective of pubaction)" functionality. What if a filter is
specified and the only pubaction is DELETE?


0002

src/backend/catalog/pg_publication.c
(1) rowfilter_walker()
One of the errdetail messages doesn't begin with an uppercase letter:

+   errdetail_msg = _("user-defined types are not allowed");


src/backend/executor/execReplication.c
(2) CheckCmdReplicaIdentity()

Strictly speaking, the following:

+ if (invalid_rfcolnum)

should be:

+ if (invalid_rfcolnum != InvalidAttrNumber)


0003

src/backend/replication/logical/tablesync.c
(1)
Column name in comment should be "puballtables" not "puballtable":

+ * If any publication has puballtable true then all row-filtering is

(2) pgoutput_row_filter_init()

There should be a space before the final "*/" (so the asterisks align).
Also, should say "... treated the same".

  /*
+ * If the publication is FOR ALL TABLES then it is treated same as if this
+ * table has no filters (even if for some other publication it does).
+ */


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Dec 15, 2021 at 10:20 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Dec 13, 2021 at 8:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > PSA the v46* patch set.
> >
>
> 0001
>
> (1)
>
> "If a subscriber is a pre-15 version, the initial table
> synchronization won't use row filters even if they are defined in the
> publisher."
>
> Won't this lead to data inconsistencies or errors that otherwise
> wouldn't happen?
>

How? The subscribers will get all the initial data.

> Should such subscriptions be allowed?
>

I am not sure what you have in mind here? How can we change the
already released code pre-15 for this new feature?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Wed, Dec 15, 2021 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > "If a subscriber is a pre-15 version, the initial table
> > synchronization won't use row filters even if they are defined in the
> > publisher."
> >
> > Won't this lead to data inconsistencies or errors that otherwise
> > wouldn't happen?
> >
>
> How? The subscribers will get all the initial data.
>

But couldn't getting all the initial data (i.e. not filtering) break
the rules used by the old/new row processing (see v46-0003 patch)?
Those rules effectively assume rows have been previously published
with filtering.
So, for example, for the following case for UPDATE:
    old-row (no match)    new row (match)  -> INSERT
the old-row check (no match) infers that the old row was never
published, but that row could in fact have been in the initial
unfiltered rows, so in that case an INSERT gets erroneously published
instead of an UPDATE, doesn't it?
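
A concrete (hypothetical) timeline of this concern, with filter
(a > 10) on table tbl(a int primary key):

------
-- on the publisher
INSERT INTO tbl VALUES (5);   -- a pre-15 initial sync copies this row,
                              -- because its COPY ignores the filter
UPDATE tbl SET a = 20;        -- old row (5) fails the filter, new row
                              -- (20) passes, so an INSERT is published
-- the subscriber now holds both the stale row (5) and the new row
-- (20), while the publisher holds only (20)
------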

> > Should such subscriptions be allowed?
> >
>
> I am not sure what you have in mind here? How can we change the
> already released code pre-15 for this new feature?
>

I was thinking such subscription requests could be rejected by the
server, based on the subscriber version and whether the publications
use filtering etc.


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Dec 15, 2021 at 1:52 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Dec 15, 2021 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > "If a subscriber is a pre-15 version, the initial table
> > > synchronization won't use row filters even if they are defined in the
> > > publisher."
> > >
> > > Won't this lead to data inconsistencies or errors that otherwise
> > > wouldn't happen?
> > >
> >
> > How? The subscribers will get all the initial data.
> >
>
> But couldn't getting all the initial data (i.e. not filtering) break
> the rules used by the old/new row processing (see v46-0003 patch)?
> Those rules effectively assume rows have been previously published
> with filtering.
> So, for example, for the following case for UPDATE:
>     old-row (no match)    new row (match)  -> INSERT
> the old-row check (no match) infers that the old row was never
> published, but that row could in fact have been in the initial
> unfiltered rows, so in that case an INSERT gets erroneously published
> instead of an UPDATE, doesn't it?
>

But this can happen even when both the publisher and subscriber are
on v15, say if the user defines a filter at some later point or changes
the filter conditions via ALTER PUBLICATION. So I am not sure we need
to invent something new for this.
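
A sketch of that all-v15 scenario (assuming the patch's
ALTER PUBLICATION ... SET TABLE ... WHERE syntax):

------
CREATE PUBLICATION p FOR TABLE tbl;               -- no filter yet
-- the subscriber syncs; rows with a <= 10 are replicated and kept
ALTER PUBLICATION p SET TABLE tbl WHERE (a > 10); -- filter added later
-- from here on, the old/new row rules assume earlier rows were
-- filtered, even though unfiltered rows already sit on the subscriber
------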

> > > Should such subscriptions be allowed?
> > >
> >
> > I am not sure what you have in mind here? How can we change the
> > already released code pre-15 for this new feature?
> >
>
> I was thinking such subscription requests could be rejected by the
> server, based on the subscriber version and whether the publications
> use filtering etc.
>

Normally, the client sends some parameters to the server (streaming,
two_pc, etc.) based on which the server can take such decisions. We
may need to include some such thing here, though I am not sure it is
required for this particular case, especially because the same
situation can arise otherwise as well.

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Mon, Dec 13, 2021 5:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
> PSA the v46* patch set.
> 
> Here are the main differences from v45:
> 0. Rebased to HEAD
> 1. Integrated many comments, docs, messages, code etc from Euler's patch
> [Euler 6/12] 2. Several bugfixes 3. Patches are merged/added
> 
> ~~
> 
> Bugfix and Patch Merge details:
> 
> v46-0001 (main)
> - Merged from v45-0001 (main) + v45-0005 (exprstate)
> - Fix for mem leak reported by Greg (off-list)
> 
> v46-0002 (validation)
> - Merged from v45-0002 (node validation) + v45-0006 (replica identity
> validation)
> 
> v46-0003
> - Rebased from v45-0003
> - Fix for partition column order [Houz 9/12]
> - Fix for core dump reported by Tang (off-list)
> 
> v46-0004 (tab-complete and dump)
> - Rebased from v45-0004
> 
> v46-0005 (for all tables)
> - New patch
> - Fix for FOR ALL TABLES [Tang 7/12]
> 

Thanks for updating the patch.

When reviewing the patch, I found that it allows using system columns in
row filter expressions.
---
create publication pub for table test WHERE ('(0,1)'::tid=ctid);
---

Since we can't create an index on a system column, and most existing
expression features (index expressions, partition expressions, table
constraints) don't allow using system columns, I think it might be better
to disallow using system columns when creating or altering the
publication. We can check like:

rowfilter_walker(Node *node, Relation relation)
...
if (var->varattno < 0)
    ereport(ERROR,
            errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
            errmsg("cannot use system column \"%s\" in column generation expression",
...

Best regards,
Hou zj

Re: row filtering for logical replication

From
Alvaro Herrera
Date:
Kindly do not change the mode of src/backend/parser/gram.y.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: row filtering for logical replication

From
Peter Smith
Date:
PSA the v47* patch set.

Main differences from v46:
0. Rebased to HEAD
1. Addressed multiple review comments

~~

Details:

v47-0001 (main)
- Quick loop exit if no filter for all pubactions [Amit 14/12] #1
- Remove duplicated comment [Amit 14/12] #2
- Remove code block parens [Amit 14/12] #3
- GetTransformedWhereClause add function comment [Amit 14/12] #2,
[Greg 15/12] #3
- GetTransformedWhereClause change variable names [Amit 14/12] #3
- Commit comment wording [Greg 15/12] #2
- Fix incomplete comment [Greg 15/12] #4
- Wording of error message [Greg 15/12] #5
- Wording in tablesync comment [Greg 15/2] #6
- PG docs for FOR ALL TABLES
- Added regression tests for aggregate functions

v47-0002 (validation)
- Remove EXPR_KIND_PUBLICATION_WHERE [Amit 14/12] #1
- Refactor function for simple nodes [Amit 14/12] #4
- Fix case of error message [Greg 15/12] #1
- Cleanup code not using InvalidAttrNumber [Greg 15/12] #2

v47-0003 (new/old tuple)
- No change

v47-0004 (tab-complete and dump)
- No change

v47-0005 (for all tables)
- Fix comment in tablesync [Greg 15/12] #1
- Fix comment alignment [Greg 15/12] #2
- Add support for ALL TABLES IN SCHEMA [Amit 14/12]
- Use a unified SQL in the tablesync COPY [Amit 14/12]
- Quick loop exits if no filter for all pubactions [Amit 14/12] #1
- Added new TAP test case for FOR ALL TABLES
- Added new TAP test case for ALL TABLES IN SCHEMA
- Updated commit comment

------
[Amit 14/12]
https://www.postgresql.org/message-id/CAA4eK1JdLzJEmxxzEEYAOg41Om3Y88uL%2B7CgXdvnAaj7hkw8BQ%40mail.gmail.com
[Amit 14/12]
https://www.postgresql.org/message-id/CAA4eK1%2BaiyjD4C1gohBZyZivrMruCE%3D9Mgmgtaq1gFvfRBU-wA%40mail.gmail.com
[Greg 15/12]
https://www.postgresql.org/message-id/CAJcOf-dFo_kTroR2_k1x80TqN%3D-3oZC_2BGYe1O6e5JinrLKYg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Fri, Dec 17, 2021 at 7:11 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Kindly do not change the mode of src/backend/parser/gram.y.
>

Oops. Sorry that was not deliberate.

I will correct that in the next version.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Wed, Dec 15, 2021 at 3:50 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Dec 13, 2021 at 8:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > PSA the v46* patch set.
> >
>
> 0001
>
...

> (2) In the 0001 patch comment, the term "publication filter" is used
> in one place, and in others "row filter" or "row-filter".
>

Fixed in v47 [1]

>
> src/backend/catalog/pg_publication.c
> (3) GetTransformedWhereClause() is missing a function comment.
>

Fixed in v47 [1]

> (4)
> The following comment seems incomplete:
>
> + /* Fix up collation information */
> + whereclause = GetTransformedWhereClause(pstate, pri, true);
>
>

Fixed in v47 [1]

> src/backend/parser/parse_relation.c
> (5)
> wording? consistent?
> Shouldn't it be "publication WHERE expression" for consistency?
>

In v47 [1] this message is removed when the KIND is removed.

> + errmsg("publication row-filter WHERE invalid reference to table \"%s\"",
> + relation->relname),
>
>
> src/backend/replication/logical/tablesync.c
> (6)
>
> (i) Improve wording:
>
> BEFORE:
>  /*
>   * Get information about remote relation in similar fashion the RELATION
> - * message provides during replication.
> + * message provides during replication. This function also returns the relation
> + * qualifications to be used in COPY command.
>   */
>
> AFTER:
>  /*
> - * Get information about remote relation in similar fashion the RELATION
> - * message provides during replication.
> + * Get information about a remote relation, in a similar fashion to
> how the RELATION
> + * message provides information during replication. This function
> also returns the relation
> + * qualifications to be used in the COPY command.
>   */
>

Fixed in v47 [1]

> (ii) fetch_remote_table_info() doesn't currently account for ALL
> TABLES and ALL TABLES IN SCHEMA.
>
>

Fixed in v47 [1]

...

>
>
> 0002
>
> src/backend/catalog/pg_publication.c
> (1) rowfilter_walker()
> One of the errdetail messages doesn't begin with an uppercase letter:
>
> +   errdetail_msg = _("user-defined types are not allowed");
>
>

Fixed in v47 [1]

> src/backend/executor/execReplication.c
> (2) CheckCmdReplicaIdentity()
>
> Strictly speaking, the following:
>
> + if (invalid_rfcolnum)
>
> should be:
>
> + if (invalid_rfcolnum != InvalidAttrNumber)
>
>
Fixed in v47 [1]

> 0003
>
> src/backend/replication/logical/tablesync.c
> (1)
> Column name in comment should be "puballtables" not "puballtable":
>
> + * If any publication has puballtable true then all row-filtering is
>

Fixed in v47 [1]

> (2) pgoutput_row_filter_init()
>
> There should be a space before the final "*/" (so the asterisks align).
> Also, should say "... treated the same".
>
>   /*
> + * If the publication is FOR ALL TABLES then it is treated same as if this
> + * table has no filters (even if for some other publication it does).
> + */
>
>
Fixed in v47 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtjsj_OVMWEdYp2Wq19%3DH5D4Vgta43FbFVDYr2LuS_djg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Dec 14, 2021 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 14, 2021 at 4:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Tue, Dec 7, 2021 at 5:48 PM tanghy.fnst@fujitsu.com
> > <tanghy.fnst@fujitsu.com> wrote:
> > >
> > > I think for a "FOR ALL TABLES" publication (p2 in my case), table tbl should be
> > > treated as having no filter, and table tbl should have no filter in subscription sub. Thoughts?
> > >
> > > But for now, the filter(a > 10) works both when copying initial data and later changes.
> > >
> > > To fix it, I think we can check if the table is published in a 'FOR ALL TABLES'
> > > publication or published as part of schema in function pgoutput_row_filter_init
> > > (which was introduced in v44-0003 patch), also we need to make some changes in
> > > tablesync.c.
> > >
> >
> > Partly fixed in v46-0005  [1]
> >
> > NOTE
> > - The initial COPY part of the tablesync does not take the publish
> > operation into account, which means that if any of the subscribed
> > publications has the "puballtables" flag then all data will be copied
> > sans filters.
> >
>
> I think this should be okay but the way you have implemented it in the
> patch doesn't appear to be the optimal way. Can't we fetch
> allpubtables info and qual info as part of one query instead of using
> separate queries?

Fixed in v47 [1]. Now code uses a unified SQL query provided by Vignesh.

>
> > I guess this is consistent with the other decision to
> > ignore publication operations [2].
> >
> > TODO
> > - Documentation
> > - IIUC there is a similar case yet to be addressed - FOR ALL TABLES IN SCHEMA
> >
>
> Yeah, "FOR ALL TABLES IN SCHEMA" should also be addressed. In this
> case, the difference would be that we need to check the presence of
> schema corresponding to the table (for which we are fetching
> row_filter information) is there in pg_publication_namespace. If it
> exists then we don't need to apply row_filter for the table. I feel it
> is better to fetch all this information as part of the query which you
> are using to fetch row_filter info. The idea is to avoid the extra
> round-trip between subscriber and publisher.
>

Fixed in v47 [1]. Added code and TAP test case for ALL TABLES IN SCHEMA.

> Few other comments:
> ===================
> 1.
> @@ -926,6 +928,22 @@ pgoutput_row_filter_init(PGOutputData *data,
> Relation relation, RelationSyncEntr
>   bool rfisnull;
>
>   /*
> + * If the publication is FOR ALL TABLES then it is treated same as if this
> + * table has no filters (even if for some other publication it does).
> + */
> + if (pub->alltables)
> + {
> + if (pub->pubactions.pubinsert)
> + no_filter[idx_ins] = true;
> + if (pub->pubactions.pubupdate)
> + no_filter[idx_upd] = true;
> + if (pub->pubactions.pubdelete)
> + no_filter[idx_del] = true;
> +
> + continue;
> + }
>
> Is there a reason to continue checking the other publications if
> no_filter is true for all kind of pubactions?
>
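
(As a minimal sketch, the early exit being suggested could look like this,
using the names from the quoted patch:)

    if (no_filter[idx_ins] && no_filter[idx_upd] && no_filter[idx_del])
        break;    /* every pubaction already publishes unfiltered rows */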

Fixed in v47 [1].

> 2.
> + * All row filter expressions will be discarded if there is one
> + * publication-relation entry without a row filter. That's because
> + * all expressions are aggregated by the OR operator. The row
> + * filter absence means replicate all rows so a single valid
> + * expression means publish this row.
>
> This same comment is at two places, remove from one of the places. I
> think keeping it atop for loop is better.
>

Fixed in v47 [1]

> 3.
> + {
> + int idx;
> + bool found_filters = false;
>
> I am not sure if starting such ad-hoc braces in the code to localize
> the scope of variables is a regular practice. Can we please remove
> this?
>

Fixed in v47 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtjsj_OVMWEdYp2Wq19%3DH5D4Vgta43FbFVDYr2LuS_djg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Dec 14, 2021 at 10:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 14, 2021 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Dec 14, 2021 at 4:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Few other comments:
> > ===================
>
> Few more comments:
> ==================
> v46-0001/0002
> ===============
> 1. After rowfilter_walker() why do we need
> EXPR_KIND_PUBLICATION_WHERE? I thought this is primarily to identify
> the expressions that are not allowed in rowfilter which we are now
> able to detect upfront with the help of a walker. Can't we instead use
> EXPR_KIND_WHERE?
>

Fixed in v47 [1]

> 2.
> +Node *
> +GetTransformedWhereClause(ParseState *pstate, PublicationRelInfo *pri,
> +   bool bfixupcollation)
>
> Can we add comments atop this function?
>

Fixed in v47 [1]

> 3. In GetTransformedWhereClause, can we change the name of variables
> (a) bfixupcollation to fixup_collation or assign_collation, (b)
> transformedwhereclause to whereclause. I think that will make the
> function more readable.
>

Fixed in v47 [1]

>
> v46-0002
> ========
> 4.
> + else if (IsA(node, List) || IsA(node, Const) || IsA(node, BoolExpr)
> || IsA(node, NullIfExpr) ||
> + IsA(node, NullTest) || IsA(node, BooleanTest) || IsA(node, CoalesceExpr) ||
> + IsA(node, CaseExpr) || IsA(node, CaseTestExpr) || IsA(node, MinMaxExpr) ||
> + IsA(node, ArrayExpr) || IsA(node, ScalarArrayOpExpr) || IsA(node, XmlExpr))
>
> Can we move this to a separate function say IsValidRowFilterExpr() or
> something on those lines and use Switch (nodetag(node)) to identify
> these nodes?
>
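
(As a rough sketch, transcribing the node list from the quoted patch, the
suggested helper could take this shape:)

static bool
IsValidRowFilterExpr(Node *node)
{
    switch (nodeTag(node))
    {
        case T_List:
        case T_Const:
        case T_BoolExpr:
        case T_NullIfExpr:
        case T_NullTest:
        case T_BooleanTest:
        case T_CoalesceExpr:
        case T_CaseExpr:
        case T_CaseTestExpr:
        case T_MinMaxExpr:
        case T_ArrayExpr:
        case T_ScalarArrayOpExpr:
        case T_XmlExpr:
            return true;
        default:
            return false;
    }
}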

Fixed in v47 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPtjsj_OVMWEdYp2Wq19%3DH5D4Vgta43FbFVDYr2LuS_djg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Greg Nancarrow
Date:

On Fri, Dec 17, 2021 at 9:41 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> PSA the v47* patch set.
>

I found that even though there are now separately-maintained WHERE clauses per pubaction, there still seem to be problems when applying the old/new row rules for UPDATE.
A simple example of this was previously discussed in [1].
The example is repeated below:

---- Publication
create table tbl1 (a int primary key, b int);
create publication A for table tbl1 where (b<2) with(publish='insert');
create publication B for table tbl1 where (a>1) with(publish='update');

---- Subscription
create table tbl1 (a int primary key, b int);
create subscription sub connection 'dbname=postgres host=localhost port=10000' publication A,B;

---- Publication
insert into tbl1 values (1,1);
update tbl1 set a = 2;

So using the v47 patch-set, I still find that the UPDATE above results in publication of an INSERT of (2,1), rather than an UPDATE of (1,1) to (2,1).
This is according to the 2nd UPDATE rule below, from patch 0003. 

+ * old-row (no match)    new-row (no match)  -> (drop change)
+ * old-row (no match)    new row (match)     -> INSERT
+ * old-row (match)       new-row (no match)  -> DELETE
+ * old-row (match)       new row (match)     -> UPDATE

This is because the old row (1,1) doesn't match the UPDATE filter "(a>1)", but the new row (2,1) does.
This functionality doesn't seem right to me. I don't think it can be assumed that (1,1) was never published (and thus requires an INSERT rather than UPDATE) based on these checks, because in this example, (1,1) was previously published via a different operation - INSERT (and using a different filter too).
I think the fundamental problem here is that these UPDATE rules assume that the old (current) row was previously UPDATEd (and published, or not published, according to the filter applicable to UPDATE), but this is not necessarily the case.
Or am I missing something?

----


Regards,
Greg Nancarrow
Fujitsu Australia

Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Fri, Dec 17, 2021 at 5:46 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

> So using the v47 patch-set, I still find that the UPDATE above results in publication of an INSERT of (2,1), rather
> than an UPDATE of (1,1) to (2,1).
> This is according to the 2nd UPDATE rule below, from patch 0003.
>
> + * old-row (no match)    new-row (no match)  -> (drop change)
> + * old-row (no match)    new row (match)     -> INSERT
> + * old-row (match)       new-row (no match)  -> DELETE
> + * old-row (match)       new row (match)     -> UPDATE
>
> This is because the old row (1,1) doesn't match the UPDATE filter "(a>1)", but the new row (2,1) does.
> This functionality doesn't seem right to me. I don't think it can be assumed that (1,1) was never published (and thus
> requires an INSERT rather than UPDATE) based on these checks, because in this example, (1,1) was previously published
> via a different operation - INSERT (and using a different filter too).
> I think the fundamental problem here is that these UPDATE rules assume that the old (current) row was previously
> UPDATEd (and published, or not published, according to the filter applicable to UPDATE), but this is not necessarily the
> case.
> Or am I missing something?

But it need not be correct in assuming that the old-row was part of a
previous INSERT either (and published, or not published according to
the filter applicable to an INSERT).
For example, change the sequence of inserts and updates prior to the
last update:

truncate tbl1 ;
insert into tbl1 values (1,5); ==> not replicated since insert and ! (b < 2);
update tbl1 set b = 1; ==> not replicated since update and ! (a > 1)
update tbl1 set a = 2; ==> replicated and update converted to insert
since (a > 1)

In this case, the last update "update tbl1 set a = 2; " is updating a
row that was previously updated and not inserted and not replicated to
the subscriber.
How does the replication logic differentiate between these two cases,
and decide if the update was previously published or not?
I think it's futile for the publisher side to try and figure out the
history of published rows. In fact, if this level of logic is required
then it is best implemented on the subscriber side, which then defeats
the purpose of a publication filter.


regards,
Ajin Cherian
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Dec 17, 2021 at 4:11 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> PSA the v47* patch set.
>

Few comments on v47-0002:
=======================
1. The handling to find rowfilter for ancestors in
RelationGetInvalidRowFilterCol seems complex. It seems you are
accumulating non-partition relations as well in toprelid_in_pub. Can
we simplify such that we find the ancestor only for 'pubviaroot'
publications?

2. I think the name RelationGetInvalidRowFilterCol is confusing
because the same function is also used to get publication actions. Can
we name it as GetRelationPublicationInfo() and pass a bool parameter
to indicate whether row_filter info needs to be built. We can get the
invalid_row_filter column as output from that function.

3.
+GetRelationPublicationActions(Relation relation)
{
..
+ if (!relation->rd_pubactions)
+ (void) RelationGetInvalidRowFilterCol(relation);
+
+ return memcpy(pubactions, relation->rd_pubactions,
+   sizeof(PublicationActions));
..
..
}

I think here we can reverse the check such that if the actions are already
set, just do the memcpy and return; otherwise, build the relation's
publication actions info first.
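
A minimal sketch of that reversed check (assuming the surrounding context of
the quoted patch):

    if (relation->rd_pubactions)
        return memcpy(pubactions, relation->rd_pubactions,
                      sizeof(PublicationActions));

    /* Not cached yet: build the info, then copy it out. */
    (void) RelationGetInvalidRowFilterCol(relation);
    return memcpy(pubactions, relation->rd_pubactions,
                  sizeof(PublicationActions));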

4.
invalid_rowfilter_column_walker
{
..

/*
* If pubviaroot is true, we need to convert the column number of
* parent to the column number of child relation first.
*/
if (context->pubviaroot)
{
char *colname = get_attname(context->parentid, attnum, false);
attnum = get_attnum(context->relid, colname);
}

Here, in the comments, you can tell why you need this conversion. Can
we name this function as rowfilter_column_walker()?
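
For example, the comment could spell out the reason along these lines (a
sketch only; the code itself is from the quoted patch):

    /*
     * The row filter is defined on the (published) parent relation, so the
     * Vars in it carry the parent's attribute numbers.  The child partition
     * may have a different column order (e.g. after DROP COLUMN, or when
     * attaching a separately created table), so map the attribute number
     * via the column name before checking it against the child's replica
     * identity columns.
     */
    if (context->pubviaroot)
    {
        char   *colname = get_attname(context->parentid, attnum, false);

        attnum = get_attnum(context->relid, colname);
    }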

5.
+/* For invalid_rowfilter_column_walker. */
+typedef struct {
+ AttrNumber invalid_rfcolnum; /* invalid column number */
+ Bitmapset  *bms_replident; /* bitset of replica identity col indexes */
+ bool pubviaroot; /* true if we are validating the parent
+ * relation's row filter */
+ Oid relid; /* relid of the relation */
+ Oid parentid; /* relid of the parent relation */
+} rf_context;

Normally, we declare structs at the beginning of the file and for the
formatting of struct declarations, see other nearby structs like
RelIdCacheEnt.

6. Can we name IsRowFilterSimpleNode() as IsRowFilterSimpleExpr()?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Dec 17, 2021 at 1:50 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Fri, Dec 17, 2021 at 5:46 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > So using the v47 patch-set, I still find that the UPDATE above results in publication of an INSERT of (2,1), rather
> > than an UPDATE of (1,1) to (2,1).
> > This is according to the 2nd UPDATE rule below, from patch 0003.
> >
> > + * old-row (no match)    new-row (no match)  -> (drop change)
> > + * old-row (no match)    new row (match)     -> INSERT
> > + * old-row (match)       new-row (no match)  -> DELETE
> > + * old-row (match)       new row (match)     -> UPDATE
> >
> > This is because the old row (1,1) doesn't match the UPDATE filter "(a>1)", but the new row (2,1) does.
> > This functionality doesn't seem right to me. I don't think it can be assumed that (1,1) was never published (and
> > thus requires an INSERT rather than UPDATE) based on these checks, because in this example, (1,1) was previously
> > published via a different operation - INSERT (and using a different filter too).
> > I think the fundamental problem here is that these UPDATE rules assume that the old (current) row was previously
> > UPDATEd (and published, or not published, according to the filter applicable to UPDATE), but this is not necessarily the
> > case.
> > Or am I missing something?
>
> But it need not be correct in assuming that the old-row was part of a
> previous INSERT either (and published, or not published according to
> the filter applicable to an INSERT).
> For example, change the sequence of inserts and updates prior to the
> last update:
>
> truncate tbl1 ;
> insert into tbl1 values (1,5); ==> not replicated since insert and ! (b < 2);
> update tbl1 set b = 1; ==> not replicated since update and ! (a > 1)
> update tbl1 set a = 2; ==> replicated and update converted to insert
> since (a > 1)
>
> In this case, the last update "update tbl1 set a = 2; " is updating a
> row that was previously updated and not inserted and not replicated to
> the subscriber.
> How does the replication logic differentiate between these two cases,
> and decide if the update was previously published or not?
> I think it's futile for the publisher side to try and figure out the
> history of published rows.
>

I also think so. One more thing, even if we want we might not be able
to apply the insert filter as the corresponding values may not be
logged.


--
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Fri, Dec 17, 2021 at 7:20 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Fri, Dec 17, 2021 at 5:46 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > So using the v47 patch-set, I still find that the UPDATE above results in publication of an INSERT of (2,1), rather
> > than an UPDATE of (1,1) to (2,1).
> > This is according to the 2nd UPDATE rule below, from patch 0003.
> >
> > + * old-row (no match)    new-row (no match)  -> (drop change)
> > + * old-row (no match)    new row (match)     -> INSERT
> > + * old-row (match)       new-row (no match)  -> DELETE
> > + * old-row (match)       new row (match)     -> UPDATE
> >
> > This is because the old row (1,1) doesn't match the UPDATE filter "(a>1)", but the new row (2,1) does.
> > This functionality doesn't seem right to me. I don't think it can be assumed that (1,1) was never published (and
> > thus requires an INSERT rather than UPDATE) based on these checks, because in this example, (1,1) was previously
> > published via a different operation - INSERT (and using a different filter too).
> > I think the fundamental problem here is that these UPDATE rules assume that the old (current) row was previously
> > UPDATEd (and published, or not published, according to the filter applicable to UPDATE), but this is not necessarily the
> > case.
> > Or am I missing something?
>
> But it need not be correct in assuming that the old-row was part of a
> previous INSERT either (and published, or not published according to
> the filter applicable to an INSERT).
> For example, change the sequence of inserts and updates prior to the
> last update:
>
> truncate tbl1 ;
> insert into tbl1 values (1,5); ==> not replicated since insert and ! (b < 2);
> update tbl1 set b = 1; ==> not replicated since update and ! (a > 1)
> update tbl1 set a = 2; ==> replicated and update converted to insert
> since (a > 1)
>
> In this case, the last update "update tbl1 set a = 2; " is updating a
> row that was previously updated and not inserted and not replicated to
> the subscriber.
> How does the replication logic differentiate between these two cases,
> and decide if the update was previously published or not?
> I think it's futile for the publisher side to try and figure out the
> history of published rows. In fact, if this level of logic is required
> then it is best implemented on the subscriber side, which then defeats
> the purpose of a publication filter.
>

I think it's a concern, for such a basic example with only one row,
getting unpredictable (and even wrong) replication results, depending
upon the order of operations.
Doesn't this problem result from allowing different WHERE clauses for
different pubactions for the same table?
My current thoughts are that this shouldn't be allowed, and also WHERE
clauses for INSERTs should, like UPDATE and DELETE, be restricted to
using only columns covered by the replica identity or primary key.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
PSA the v48* patch set.

Main differences from v47:
1. Addresses some review comments

~~

Details:

v47-0001 (main)
- Modify some regression tests [Vignesh 2/12] #1 (skipped), #4
- Remove redundant slot drop [Greg 15/12] #7
- Restore mode of gram.y file [Alvaro 16/12]

v47-0002 (validation)
- Modify some regression tests [Vignesh 2/12] #3
- Don't allow system columns in filters [Houz 16/12]

v47-0003 (new/old tuple)
- No change

v47-0004 (tab-complete and dump)
- No change

v47-0005 (for all tables)
- No change

------
[Vignesh 2/12]
https://www.postgresql.org/message-id/CALDaNm2bMD%3DwxOzMvfnHQ7LeGTPyZWy_Fu_8G24k7MJ7k1UqHQ%40mail.gmail.com
[Greg 15/12]
https://www.postgresql.org/message-id/CAJcOf-dFo_kTroR2_k1x80TqN%3D-3oZC_2BGYe1O6e5JinrLKYg%40mail.gmail.com
[Alvaro 16/12] https://www.postgresql.org/message-id/202112162011.iiyqqzuzpg4x%40alvherre.pgsql
[Houz 16/12]
https://www.postgresql.org/message-id/OS0PR01MB571694C3C0005B5D425CCB0694779%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Dec 17, 2021 at 5:29 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Dec 17, 2021 at 7:20 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > On Fri, Dec 17, 2021 at 5:46 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > > So using the v47 patch-set, I still find that the UPDATE above results in publication of an INSERT of (2,1),
> > > rather than an UPDATE of (1,1) to (2,1).
> > > This is according to the 2nd UPDATE rule below, from patch 0003.
> > >
> > > + * old-row (no match)    new-row (no match)  -> (drop change)
> > > + * old-row (no match)    new row (match)     -> INSERT
> > > + * old-row (match)       new-row (no match)  -> DELETE
> > > + * old-row (match)       new row (match)     -> UPDATE
> > >
> > > This is because the old row (1,1) doesn't match the UPDATE filter "(a>1)", but the new row (2,1) does.
> > > This functionality doesn't seem right to me. I don't think it can be assumed that (1,1) was never published (and
> > > thus requires an INSERT rather than UPDATE) based on these checks, because in this example, (1,1) was previously
> > > published via a different operation - INSERT (and using a different filter too).
> > > I think the fundamental problem here is that these UPDATE rules assume that the old (current) row was previously
> > > UPDATEd (and published, or not published, according to the filter applicable to UPDATE), but this is not necessarily the
> > > case.
> > > Or am I missing something?
> >
> > But it need not be correct in assuming that the old-row was part of a
> > previous INSERT either (and published, or not published according to
> > the filter applicable to an INSERT).
> > For example, change the sequence of inserts and updates prior to the
> > last update:
> >
> > truncate tbl1 ;
> > insert into tbl1 values (1,5); ==> not replicated since insert and ! (b < 2);
> > update tbl1 set b = 1; ==> not replicated since update and ! (a > 1)
> > update tbl1 set a = 2; ==> replicated and update converted to insert
> > since (a > 1)
> >
> > In this case, the last update "update tbl1 set a = 2; " is updating a
> > row that was previously updated and not inserted and not replicated to
> > the subscriber.
> > How does the replication logic differentiate between these two cases,
> > and decide if the update was previously published or not?
> > I think it's futile for the publisher side to try and figure out the
> > history of published rows. In fact, if this level of logic is required
> > then it is best implemented on the subscriber side, which then defeats
> > the purpose of a publication filter.
> >
>
> I think it's a concern, for such a basic example with only one row,
> getting unpredictable (and even wrong) replication results, depending
> upon the order of operations.
>

I am not sure how we can deduce that. The results are based on current
and new values of row which is what I think we are expecting here.

> Doesn't this problem result from allowing different WHERE clauses for
> different pubactions for the same table?
> My current thoughts are that this shouldn't be allowed, and also WHERE
> clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> using only columns covered by the replica identity or primary key.
>

Hmm, even if we do that one could have removed the insert row filter
by the time we are evaluating the update. So, we will get the same
result. I think the behavior in your example is as we expect as per
the specs defined by the patch and I don't see any problem, in this
case, w.r.t replication results. Let us see what others think on this?



--
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Sat, Dec 18, 2021 at 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 17, 2021 at 5:29 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Fri, Dec 17, 2021 at 7:20 PM Ajin Cherian <itsajin@gmail.com> wrote:
> > >
> > > On Fri, Dec 17, 2021 at 5:46 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> > > > So using the v47 patch-set, I still find that the UPDATE above results in publication of an INSERT of (2,1),
> > > > rather than an UPDATE of (1,1) to (2,1).
> > > > This is according to the 2nd UPDATE rule below, from patch 0003.
> > > >
> > > > + * old-row (no match)    new-row (no match)  -> (drop change)
> > > > + * old-row (no match)    new row (match)     -> INSERT
> > > > + * old-row (match)       new-row (no match)  -> DELETE
> > > > + * old-row (match)       new row (match)     -> UPDATE
> > > >
> > > > This is because the old row (1,1) doesn't match the UPDATE filter "(a>1)", but the new row (2,1) does.
> > > > This functionality doesn't seem right to me. I don't think it can be assumed that (1,1) was never published
> > > > (and thus requires an INSERT rather than UPDATE) based on these checks, because in this example, (1,1) was previously
> > > > published via a different operation - INSERT (and using a different filter too).
> > > > I think the fundamental problem here is that these UPDATE rules assume that the old (current) row was
> > > > previously UPDATEd (and published, or not published, according to the filter applicable to UPDATE), but this is not
> > > > necessarily the case.
> > > > Or am I missing something?
> > >
> > > But it need not be correct in assuming that the old-row was part of a
> > > previous INSERT either (and published, or not published according to
> > > the filter applicable to an INSERT).
> > > For example, change the sequence of inserts and updates prior to the
> > > last update:
> > >
> > > truncate tbl1 ;
> > > insert into tbl1 values (1,5); ==> not replicated since insert and ! (b < 2);
> > > update tbl1 set b = 1; ==> not replicated since update and ! (a > 1)
> > > update tbl1 set a = 2; ==> replicated and update converted to insert
> > > since (a > 1)
> > >
> > > In this case, the last update "update tbl1 set a = 2; " is updating a
> > > row that was previously updated and not inserted and not replicated to
> > > the subscriber.
> > > How does the replication logic differentiate between these two cases,
> > > and decide if the update was previously published or not?
> > > I think it's futile for the publisher side to try and figure out the
> > > history of published rows. In fact, if this level of logic is required
> > > then it is best implemented on the subscriber side, which then defeats
> > > the purpose of a publication filter.
> > >
> >
> > I think it's a concern, for such a basic example with only one row,
> > getting unpredictable (and even wrong) replication results, depending
> > upon the order of operations.
> >
>
> I am not sure how we can deduce that. The results are based on current
> and new values of row which is what I think we are expecting here.
>
> > Doesn't this problem result from allowing different WHERE clauses for
> > different pubactions for the same table?
> > My current thoughts are that this shouldn't be allowed, and also WHERE
> > clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> > using only columns covered by the replica identity or primary key.
> >
>
> Hmm, even if we do that one could have removed the insert row filter
> by the time we are evaluating the update. So, we will get the same
> result. I think the behavior in your example is as we expect as per
> the specs defined by the patch and I don't see any problem, in this
> case, w.r.t replication results. Let us see what others think on this?
>

I think currently there could be a problem with user perceptions. IMO
a user would be mostly interested in predictability and getting
results that are intuitive.

So, even if all strange results can (after careful examination) be
after-the-fact explained away as being "correct" according to a spec,
I don't think that is going to make any difference. e.g. regardless of
correctness, even if it just "appeared" to give unexpected results
then a user may just decide that row-filtering is not worth their
confusion...

Perhaps there is a slightly dumbed-down RF design that can still be
useful, but which can give much more comfort to the user because the
replica will be more like what they were expecting?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Sat, Dec 18, 2021 at 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > I think it's a concern, for such a basic example with only one row,
> > getting unpredictable (and even wrong) replication results, depending
> > upon the order of operations.
> >
>
> I am not sure how we can deduce that. The results are based on current
> and new values of row which is what I think we are expecting here.
>

In the two simple cases presented, the publisher ends up with the same
single row (2,1) in both cases, but in one of the cases the subscriber
ends up with an extra row (1,1) that the publisher doesn't have. So,
in using a "filter", a new row has been published that the publisher
doesn't have. I'm not so sure a user would be expecting that. Not to
mention that if (1,1) is subsequently INSERTed on the publisher side,
it will result in a duplicate key error on the subscriber.
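
Concretely, the divergence being described looks like this (a sketch using
the example tables from upthread):

-- Publisher:                      -- Subscriber:
SELECT * FROM tbl1;                SELECT * FROM tbl1;
--  a | b                          --  a | b
--  2 | 1                          --  1 | 1  (from the replicated INSERT)
--                                 --  2 | 1  (UPDATE applied as INSERT)

INSERT INTO tbl1 VALUES (1,1);     -- passes the insert filter (b < 2), is
                                   -- replicated, and the apply then fails
                                   -- on the subscriber with a duplicate
                                   -- key error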

> > Doesn't this problem result from allowing different WHERE clauses for
> > different pubactions for the same table?
> > My current thoughts are that this shouldn't be allowed, and also WHERE
> > clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> > using only columns covered by the replica identity or primary key.
> >
>
> Hmm, even if we do that one could have removed the insert row filter
> by the time we are evaluating the update. So, we will get the same
> result. I think the behavior in your example is as we expect as per
> the specs defined by the patch and I don't see any problem, in this
> case, w.r.t replication results. Let us see what others think on this?
>

Here I'm talking about the typical use-case of setting the
row-filtering WHERE clause up-front and not changing it thereafter.
I think that dynamically changing filters after INSERT/UPDATE/DELETE
operations is not the typical use-case, and IMHO it's another thing
entirely (could result in all kinds of unpredictable, random results).

Personally I think it would make more sense to:
1) Disallow different WHERE clauses on the same table, for different pubactions.
2) If only INSERTs are being published, allow any column in the WHERE
clause, otherwise (as for UPDATE and DELETE) restrict the referenced
columns to be part of the replica identity or primary key.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:

> -----Original Message-----
> From: Amit Kapila <amit.kapila16@gmail.com>
On Saturday, December 18, 2021 10:33 AM
> On Fri, Dec 17, 2021 at 5:29 PM Greg Nancarrow <gregn4422@gmail.com>
> wrote:
> >
> > On Fri, Dec 17, 2021 at 7:20 PM Ajin Cherian <itsajin@gmail.com> wrote:
> > >
> > > On Fri, Dec 17, 2021 at 5:46 PM Greg Nancarrow <gregn4422@gmail.com>
> wrote:
> > >
> > > > So using the v47 patch-set, I still find that the UPDATE above results in
> publication of an INSERT of (2,1), rather than an UPDATE of (1,1) to (2,1).
> > > > This is according to the 2nd UPDATE rule below, from patch 0003.
> > > >
> > > > + * old-row (no match)    new-row (no match)  -> (drop change)
> > > > + * old-row (no match)    new row (match)     -> INSERT
> > > > + * old-row (match)       new-row (no match)  -> DELETE
> > > > + * old-row (match)       new row (match)     -> UPDATE
> > > >
> > > > This is because the old row (1,1) doesn't match the UPDATE filter "(a>1)",
> but the new row (2,1) does.
> > > > This functionality doesn't seem right to me. I don't think it can be
> assumed that (1,1) was never published (and thus requires an INSERT rather than
> UPDATE) based on these checks, because in this example, (1,1) was previously
> published via a different operation - INSERT (and using a different filter too).
> > > > I think the fundamental problem here is that these UPDATE rules assume
> that the old (current) row was previously UPDATEd (and published, or not
> published, according to the filter applicable to UPDATE), but this is not
> necessarily the case.
> > > > Or am I missing something?
> > >
> > > But it need not be correct in assuming that the old-row was part of
> > > a previous INSERT either (and published, or not published according
> > > to the filter applicable to an INSERT).
> > > For example, change the sequence of inserts and updates prior to the
> > > last update:
> > >
> > > truncate tbl1 ;
> > > insert into tbl1 values (1,5); ==> not replicated since insert and ! (b < 2);
> > > update tbl1 set b = 1; ==> not replicated since update and ! (a > 1)
> > > update tbl1 set a = 2; ==> replicated and update converted to insert
> > > since (a > 1)
> > >
> > > In this case, the last update "update tbl1 set a = 2; " is updating
> > > a row that was previously updated and not inserted and not
> > > replicated to the subscriber.
> > > How does the replication logic differentiate between these two
> > > cases, and decide if the update was previously published or not?
> > > I think it's futile for the publisher side to try and figure out the
> > > history of published rows. In fact, if this level of logic is
> > > required then it is best implemented on the subscriber side, which
> > > then defeats the purpose of a publication filter.
> > >
> >
> > I think it's a concern, for such a basic example with only one row,
> > getting unpredictable (and even wrong) replication results, depending
> > upon the order of operations.
> >
> 
> I am not sure how we can deduce that. The results are based on current and
> new values of row which is what I think we are expecting here.
> 
> > Doesn't this problem result from allowing different WHERE clauses for
> > different pubactions for the same table?
> > My current thoughts are that this shouldn't be allowed, and also WHERE
> > clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> > using only columns covered by the replica identity or primary key.
> >
> 
> Hmm, even if we do that one could have removed the insert row filter by the
> time we are evaluating the update. So, we will get the same result. I think the
> behavior in your example is as we expect as per the specs defined by the patch
> and I don't see any problem, in this case, w.r.t replication results. Let us see
> what others think on this?

I think it might not be hard to predict the current behavior. Users only need to be
aware that:
1) pubactions and row filters on different publications are combined with 'OR'.
2) For UPDATE, we execute the filter for both the OLD and NEW tuples and change
   the operation type accordingly.

For the example mentioned:
create table tbl1 (a int primary key, b int);
create publication A for table tbl1 where (b<2) with(publish='insert');
create publication B for table tbl1 where (a>1) with(publish='update');

If we follow the rule 1) and 2), I feel we are able to predict the following
conditions:
--
WHERE (action = 'insert' AND b < 2) OR (action = 'update' AND a > 1)
--

So, it seems acceptable to me.
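
For instance, tracing the earlier example against that combined condition (a
sketch, assuming the publications defined above):

INSERT INTO tbl1 VALUES (1,1);  -- insert, b < 2            -> replicated
UPDATE tbl1 SET a = 2;          -- update, old (1,1): a > 1 is false,
                                --         new (2,1): a > 1 is true
                                --                          -> sent as INSERT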

Personally, I think the current design could give users more flexibility to
handle some complex scenarios. If users want a simple setup for a publication,
they can also set the same row filter for the same table in different publications.
To avoid confusion, I think we can document these rules clearly.

BTW, judging from IBM's documentation, IBM also supports this kind of complex
condition [1].
[1] https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-log-record-variables

Best regards,
Hou zj

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Fri, Dec 17, 2021 6:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Dec 17, 2021 at 4:11 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > PSA the v47* patch set.
> >
> 
> Few comments on v47-0002:
> =======================
> 1. The handling to find rowfilter for ancestors in
> RelationGetInvalidRowFilterCol seems complex. It seems you are
> accumulating non-partition relations as well in toprelid_in_pub. Can
> we simplify such that we find the ancestor only for 'pubviaroot'
> publications?
> 
> 2. I think the name RelationGetInvalidRowFilterCol is confusing
> because the same function is also used to get publication actions. Can
> we name it as GetRelationPublicationInfo() and pass a bool parameter
> to indicate whether row_filter info needs to be built. We can get the
> invalid_row_filter column as output from that function.
> 
> 3.
> +GetRelationPublicationActions(Relation relation)
> {
> ..
> + if (!relation->rd_pubactions)
> + (void) RelationGetInvalidRowFilterCol(relation);
> +
> + return memcpy(pubactions, relation->rd_pubactions,
> +   sizeof(PublicationActions));
> ..
> ..
> }
> 
> I think here we can reverse the check such that if actions are set
> just do memcpy and return otherwise get the relationpublicationactions
> info.
> 
> 4.
> invalid_rowfilter_column_walker
> {
> ..
> 
> /*
> * If pubviaroot is true, we need to convert the column number of
> * parent to the column number of child relation first.
> */
> if (context->pubviaroot)
> {
> char *colname = get_attname(context->parentid, attnum, false);
> attnum = get_attnum(context->relid, colname);
> }
> 
> Here, in the comments, you can tell why you need this conversion. Can
> we name this function as rowfilter_column_walker()?
> 
> 5.
> +/* For invalid_rowfilter_column_walker. */
> +typedef struct {
> + AttrNumber invalid_rfcolnum; /* invalid column number */
> + Bitmapset  *bms_replident; /* bitset of replica identity col indexes */
> + bool pubviaroot; /* true if we are validating the parent
> + * relation's row filter */
> + Oid relid; /* relid of the relation */
> + Oid parentid; /* relid of the parent relation */
> +} rf_context;
> 
> Normally, we declare structs at the beginning of the file and for the
> formatting of struct declarations, see other nearby structs like
> RelIdCacheEnt.
> 
> 6. Can we name IsRowFilterSimpleNode() as IsRowFilterSimpleExpr()?

Thanks for the comments; I agree with all of them.
Attach the V49 patch set, which addressed all the above comments on the 0002
patch.

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Wednesday, December 8, 2021 2:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Mon, Dec 6, 2021 at 6:04 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Mon, Dec 6, 2021, at 3:35 AM, Dilip Kumar wrote:
> >
> > On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
> > >
> > > PS> I will update the commit message in the next version. I barely changed the
> > > documentation to reflect the current behavior. I probably missed some
> changes
> > > but I will fix in the next version.
> > >
> > > I realized that I forgot to mention a few things about the UPDATE behavior.
> > > Regardless of 0003, we need to define which tuple will be used to evaluate the
> > > row filter for UPDATEs. We already discussed it circa [1]. This current version
> > > chooses *new* tuple. Is it the best choice?
> >
> > But with 0003, we are using both the tuple for evaluating the row
> > filter, so instead of fixing 0001, why we don't just merge 0003 with
> > 0001?  I mean eventually, 0003 is doing what is the agreed behavior,
> > i.e. if just OLD is matching the filter then convert the UPDATE to
> > DELETE OTOH if only new is matching the filter then convert the UPDATE
> > to INSERT.  Do you think that even we merge 0001 and 0003 then also
> > there is an open issue regarding which row to select for the filter?
> >
> > Maybe I was not clear. IIUC we are still discussing 0003 and I would like to
> > propose a different default based on the conclusion I came up. If we merged
> > 0003, that's fine; this change will be useless. If we don't or it is optional,
> > it still has its merit.
> >
> > Do we want to pay the overhead to evaluating both tuple for UPDATEs? I'm still
> > processing if it is worth it. If you think that in general the row filter
> > contains the primary key and it is rare to change it, it will waste cycles
> > evaluating the same expression twice. It seems this behavior could be
> > controlled by a parameter.
> >
> 
> I think the first thing we should do in this regard is to evaluate the
> performance for both cases (when we apply a filter to both tuples vs.
> to one of the tuples). In case the performance difference is
> unacceptable, I think it would be better to still compare both tuples
> as default to avoid data inconsistency issues and have an option to
> allow comparing one of the tuples.
> 

I did some performance tests to see if the 0003 patch has much overhead.
For this I compared applying the first two patches versus applying the first three patches, in four cases:
1) only old rows match the filter.
2) only new rows match the filter.
3) both old rows and new rows match the filter.
4) neither old rows nor new rows match the filter.

The 0003 patch checks both the old and new rows, while without it only either the
old or the new rows are checked. We want to know whether it would take more time
if we check the old rows as well.

I ran the tests in asynchronous mode and compared the SQL execution time. I also
tried some complex filters, to see if the difference could be more obvious.

The result and the script are attached.
I didn't see a big difference between the results with and without the 0003 patch
in any of the cases, so I think the 0003 patch doesn't add much overhead.
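
(For reference, one of the four cases can be set up with a script along these
lines - a sketch only, with illustrative names, not the exact attached script:)

-- Case 1: only the old rows match the filter (a > 100)
CREATE TABLE test_tbl (a int PRIMARY KEY, b int);
CREATE PUBLICATION test_pub FOR TABLE test_tbl WHERE (a > 100);
INSERT INTO test_tbl SELECT i, i FROM generate_series(101, 100000) i;
\timing on
UPDATE test_tbl SET a = a - 100;  -- old rows match, new rows do not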

Regards,
Tang

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Dec 20, 2021 at 6:07 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Sat, Dec 18, 2021 at 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > I think it's a concern, for such a basic example with only one row,
> > > getting unpredictable (and even wrong) replication results, depending
> > > upon the order of operations.
> > >
> >
> > I am not sure how we can deduce that. The results are based on current
> > and new values of row which is what I think we are expecting here.
> >
>
> In the two simple cases presented, the publisher ends up with the same
> single row (2,1) in both cases, but in one of the cases the subscriber
> ends up with an extra row (1,1) that the publisher doesn't have. So,
> in using a "filter", a new row has been published that the publisher
> doesn't have. I'm not so sure a user would be expecting that. Not to
> mention that if (1,1) is subsequently INSERTed on the publisher side,
> it will result in a duplicate key error on the subscriber.
>

Personally, I feel users need to be careful in defining publications
and subscriptions; otherwise, there are various ways "duplicate key
error" kinds of issues can arise. Say, you have different publications
which publish the same table, and then you have different
subscriptions on the subscriber which subscribe to those publications.

> > > Doesn't this problem result from allowing different WHERE clauses for
> > > different pubactions for the same table?
> > > My current thoughts are that this shouldn't be allowed, and also WHERE
> > > clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> > > using only columns covered by the replica identity or primary key.
> > >
> >
> > Hmm, even if we do that one could have removed the insert row filter
> > by the time we are evaluating the update. So, we will get the same
> > result. I think the behavior in your example is as we expect as per
> > the specs defined by the patch and I don't see any problem, in this
> > case, w.r.t replication results. Let us see what others think on this?
> >
>
> Here I'm talking about the typical use-case of setting the
> row-filtering WHERE clause up-front and not changing it thereafter.
> I think that dynamically changing filters after INSERT/UPDATE/DELETE
> operations is not the typical use-case, and IMHO it's another thing
> entirely (could result in all kinds of unpredictable, random results).
>

Yeah, that's what I also wanted to say: users need to
carefully define publications/subscriptions, otherwise even an up-front
definition can lead to unpredictable results, as shared in the
explanation above. I feel Hou-San's latest email [1] explains the
current rules very well and maybe we should document them in some way
to avoid confusion.

> Personally I think it would make more sense to:
> 1) Disallow different WHERE clauses on the same table, for different pubactions.
> 2) If only INSERTs are being published, allow any column in the WHERE
> clause, otherwise (as for UPDATE and DELETE) restrict the referenced
> columns to be part of the replica identity or primary key.
>

We can restrict in some way like you are saying, or we can even
restrict such that we "disallow specifying row filters unless the
pubactions have all the DML operations, and allow the row filter to have
only columns that are part of the replica identity or primary key". I feel it
is better to provide flexibility as the current patch does and
document it to make users aware of the kind of problems that can arise
with the wrong usage.

[1] -
https://www.postgresql.org/message-id/OS0PR01MB57168F4384D50656A4FC2DC5947B9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Mon, Dec 20, 2021 at 12:51 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> I think it might not be hard to predict the current behavior. Users only need to be
> aware that:
> 1) pubactions and row filters on different publications are combined with 'OR'.
> 2) For UPDATE, we execute the filter for both the OLD and NEW tuples and change
>    the operation type accordingly.
>
> For the example mentioned:
> create table tbl1 (a int primary key, b int);
> create publication A for table tbl1 where (b<2) with(publish='insert');
> create publication B for table tbl1 where (a>1) with(publish='update');
>
> If we follow the rule 1) and 2), I feel we are able to predict the following
> conditions:
> --
> WHERE (action = 'insert' AND b < 2) OR (action = 'update' AND a > 1)
> --
>
> So, it seems acceptable to me.
>
> Personally, I think the current design could give users more flexibility to
> handle some complex scenarios. If users want a simple setup for a publication,
> they can also set the same row filter for the same table in different publications.
> To avoid confusion, I think we can document these rules clearly.
>
> BTW, judging from IBM's documentation, IBM also supports this kind of complex
> condition [1].
> [1] https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-log-record-variables

Yes, I agree with this. It's better to give users more flexibility
while warning them about the consequences, rather than restricting
them with constraints.
We could explain this in the documentation so that users can better
predict the effect of having pubaction-specific filters.

regards,
Ajin Cherian
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Dec 2, 2021 at 7:40 PM vignesh C <vignesh21@gmail.com> wrote:
>
...

> Thanks for the updated patch, few comments:
> 1) Both testpub5a and testpub5c publication are same, one of them can be removed
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION testpub5a FOR TABLE testpub_rf_tbl1 WHERE (a > 1)
> WITH (publish="insert");
> +CREATE PUBLICATION testpub5b FOR TABLE testpub_rf_tbl1;
> +CREATE PUBLICATION testpub5c FOR TABLE testpub_rf_tbl1 WHERE (a > 3)
> WITH (publish="insert");
> +RESET client_min_messages;
> +\d+ testpub_rf_tbl1
> +DROP PUBLICATION testpub5a, testpub5b, testpub5c;
>
> testpub5b will be covered in the earlier existing case above:
> ALTER PUBLICATION testpib_ins_trunct ADD TABLE pub_test.testpub_nopk,
> testpub_tbl1;
>
> \d+ pub_test.testpub_nopk
> \d+ testpub_tbl1
>
> I felt test related to testpub5b is also not required

Skipped. Strictly speaking you may be correct to say this code path is
already tested elsewhere. But this test case was meant for \d+ so I
wanted it to be "self-contained" and easy to observe it displaying
tables both with and without filters at the same time.

> 3) testpub7 can be renamed to testpub6 to maintain the continuity
> since the previous testpub6 did not succeed:
> +CREATE OPERATOR =#> (PROCEDURE = testpub_rf_func, LEFTARG = integer,
> RIGHTARG = integer);
> +CREATE PUBLICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27);
> +-- fail - WHERE not allowed in DROP
> +ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl3 WHERE (e < 27);
> +-- fail - cannot ALTER SET table which is a member of a pre-existing schema
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION testpub7 FOR ALL TABLES IN SCHEMA testpub_rf_myschema1;
> +ALTER PUBLICATION testpub7 SET ALL TABLES IN SCHEMA
> testpub_rf_myschema1, TABLE testpub_rf_myschema1.testpub_rf_tb16;
> +RESET client_min_messages;
>

Fixed in v48 [1]

> 4) Did this test intend to include where clause in testpub_rf_tb16, if
> so it can be added:
> +-- fail - cannot ALTER SET table which is a member of a pre-existing schema
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION testpub7 FOR ALL TABLES IN SCHEMA testpub_rf_myschema1;
> +ALTER PUBLICATION testpub7 SET ALL TABLES IN SCHEMA
> testpub_rf_myschema1, TABLE testpub_rf_myschema1.testpub_rf_tb16;
> +RESET client_min_messages;
>

Fixed in v48 [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPuHz1oFM7oaiHeqxMQqd0L70bV_hT7u_mDf3b8As5kwig%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Nov 24, 2021 at 3:22 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
>
> 3) Should we include row filter condition in pg_publication_tables
> view like in describe publication(\dRp+) , since the prqual is not
> easily readable in pg_publication_rel table:
>

How about exposing pubdef (or publicationdef) column via
pg_publication_tables? In this, we will display the publication
definition. This is similar to what we do for indexes via pg_indexes
view:
postgres=# select * from pg_indexes where tablename like '%t1%';
 schemaname | tablename | indexname | tablespace |                              indexdef
------------+-----------+-----------+------------+--------------------------------------------------------------------
 public     | t1        | idx_t1    |            | CREATE INDEX idx_t1 ON public.t1 USING btree (c1) WHERE (c1 < 10)
(1 row)

The one advantage I see with this is that we will avoid adding
additional columns for the other patches like "column filter". Also,
it might be convenient for users. What do you think?
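
In that case the output might look something like this (purely illustrative;
no pubdef column or pg_get_pubdef function exists today):

postgres=# select * from pg_publication_tables where tablename = 't1';
 pubname | schemaname | tablename |                            pubdef
---------+------------+-----------+---------------------------------------------------------------
 pub_t1  | public     | t1        | CREATE PUBLICATION pub_t1 FOR TABLE public.t1 WHERE (c1 < 10)
(1 row)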

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, Dec 20, 2021 at 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 24, 2021 at 3:22 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian <itsajin@gmail.com> wrote:
> > >
> >
> > 3) Should we include row filter condition in pg_publication_tables
> > view like in describe publication(\dRp+) , since the prqual is not
> > easily readable in pg_publication_rel table:
> >
>
> How about exposing pubdef (or publicationdef) column via
> pg_publication_tables? In this, we will display the publication
> definition. This is similar to what we do for indexes via pg_indexes
> view:
> postgres=# select * from pg_indexes where tablename like '%t1%';
>  schemaname | tablename | indexname | tablespace |                              indexdef
> ------------+-----------+-----------+------------+--------------------------------------------------------------------
>  public     | t1        | idx_t1    |            | CREATE INDEX idx_t1 ON public.t1 USING btree (c1) WHERE (c1 < 10)
> (1 row)
>
> The one advantage I see with this is that we will avoid adding
> additional columns for the other patches like "column filter". Also,
> it might be convenient for users. What do you think?
>

I think it is a good idea, particularly since there are already some precedents.

OTOH maybe there is no immediate requirement for this feature because
there are already alternative ways to conveniently display the filters
(e.g. psql \d+ and \dRp+).

Currently, there is no pg_get_pubdef function (analogous to the
index's pg_get_indexdef) so that would need to be written from
scratch.

So I feel this is a good feature, but it could be implemented as an
independent patch in another thread.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Monday, December 20, 2021 11:24 AM tanghy.fnst@fujitsu.com <tanghy.fnst@fujitsu.com>
> 
> On Wednesday, December 8, 2021 2:29 PM Amit Kapila
> <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Dec 6, 2021 at 6:04 PM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Mon, Dec 6, 2021, at 3:35 AM, Dilip Kumar wrote:
> > >
> > > On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira <euler@eulerto.com> wrote:
> > > >
> > > > On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
> > > >
> > > > PS> I will update the commit message in the next version. I barely changed
> the
> > > > documentation to reflect the current behavior. I probably missed some
> > changes
> > > > but I will fix in the next version.
> > > >
> > > > I realized that I forgot to mention a few things about the UPDATE behavior.
> > > > Regardless of 0003, we need to define which tuple will be used to evaluate
> the
> > > > row filter for UPDATEs. We already discussed it circa [1]. This current version
> > > > chooses *new* tuple. Is it the best choice?
> > >
> > > But with 0003, we are using both the tuple for evaluating the row
> > > filter, so instead of fixing 0001, why we don't just merge 0003 with
> > > 0001?  I mean eventually, 0003 is doing what is the agreed behavior,
> > > i.e. if just OLD is matching the filter then convert the UPDATE to
> > > DELETE OTOH if only new is matching the filter then convert the UPDATE
> > > to INSERT.  Do you think that even we merge 0001 and 0003 then also
> > > there is an open issue regarding which row to select for the filter?
> > >
> > > Maybe I was not clear. IIUC we are still discussing 0003 and I would like to
> > > propose a different default based on the conclusion I came up. If we merged
> > > 0003, that's fine; this change will be useless. If we don't or it is optional,
> > > it still has its merit.
> > >
> > > Do we want to pay the overhead to evaluating both tuple for UPDATEs? I'm still
> > > processing if it is worth it. If you think that in general the row filter
> > > contains the primary key and it is rare to change it, it will waste cycles
> > > evaluating the same expression twice. It seems this behavior could be
> > > controlled by a parameter.
> > >
> >
> > I think the first thing we should do in this regard is to evaluate the
> > performance for both cases (when we apply a filter to both tuples vs.
> > to one of the tuples). In case the performance difference is
> > unacceptable, I think it would be better to still compare both tuples
> > as default to avoid data inconsistency issues and have an option to
> > allow comparing one of the tuples.
> >
> 
> I did some performance tests to see if the 0003 patch has much overhead.
> For this I compared applying the first two patches versus applying the first three
> patches, in four cases:
> 1) only old rows match the filter.
> 2) only new rows match the filter.
> 3) both old rows and new rows match the filter.
> 4) neither old rows nor new rows match the filter.
> 
> The 0003 patch checks both the old and new rows, while without it only either the
> old or the new rows are checked. We want to know whether it would take more time
> if we check the old rows as well.
> 
> I ran the tests in asynchronous mode and compared the SQL execution time. I also
> tried some complex filters, to see if the difference could be more obvious.
> 
> The result and the script are attached.
> I didn't see a big difference between the results with and without the 0003 patch
> in any of the cases, so I think the 0003 patch doesn't add much overhead.
> 

In the previous test, I ran it 3 times and took the average value, which may be
affected by performance fluctuations.

So, to make the results more accurate, I tested them more times (10 times) and
took the average value. The result is attached.

In general, I can see the time difference is within 3.5%, which is within a
reasonable performance range, I think.

Regards,
Tang

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Dec 20, 2021 at 8:41 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Thanks for the comments, I agree with all the comments.
> Attach the V49 patch set, which addressed all the above comments on the 0002
> patch.
>

Few comments/suggestions:
======================
1.
+ Oid publish_as_relid = InvalidOid;
+
+ /*
+ * For a partition, if pubviaroot is true, check if any of the
+ * ancestors are published. If so, note down the topmost ancestor
+ * that is published via this publication, the row filter
+ * expression on which will be used to filter the partition's
+ * changes. We could have got the topmost ancestor when collecting
+ * the publication oids, but that will make the code more
+ * complicated.
+ */
+ if (pubform->pubviaroot && relation->rd_rel->relispartition)
+ {
+ if (pubform->puballtables)
+ publish_as_relid = llast_oid(ancestors);
+ else
+ publish_as_relid = GetTopMostAncestorInPublication(pubform->oid,
+    ancestors);
+ }
+
+ if (publish_as_relid == InvalidOid)
+ publish_as_relid = relid;

I think you can initialize publish_as_relid as relid and then later
override it if required. That will save the additional check of
publish_as_relid.
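
A minimal sketch of that shape (assuming the quoted context; OidIsValid is
used instead of comparing against InvalidOid):

    Oid     publish_as_relid = relid;

    if (pubform->pubviaroot && relation->rd_rel->relispartition)
    {
        Oid     ancestor_relid;

        if (pubform->puballtables)
            ancestor_relid = llast_oid(ancestors);
        else
            ancestor_relid = GetTopMostAncestorInPublication(pubform->oid,
                                                             ancestors);

        if (OidIsValid(ancestor_relid))
            publish_as_relid = ancestor_relid;
    }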

2. I think your previous version of the code in GetRelationPublicationActions
was better, as now we have to call memcpy in two places.

3.
+
+ if (list_member_oid(GetRelationPublications(ancestor),
+ puboid) ||
+ list_member_oid(GetSchemaPublications(get_rel_namespace(ancestor)),
+ puboid))
+ {
+ topmost_relid = ancestor;
+ }

I think here we don't need to use braces ({}) as there is just a
single statement in the condition.

4.
+#define IDX_PUBACTION_n 3
+ ExprState    *exprstate[IDX_PUBACTION_n]; /* ExprState array for row filter.
+    One per publication action. */
..
..

I think we can have this define outside the structure. I don't like
this define name, can we name it NUM_ROWFILTER_TYPES or something like
that?
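
i.e., something like (a sketch):

#define NUM_ROWFILTER_TYPES 3       /* one per publication action */

    ExprState  *exprstate[NUM_ROWFILTER_TYPES]; /* ExprState array for row
                                                 * filter, one per publication
                                                 * action */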

I think we can now merge 0001, 0002, and 0005. We are still evaluating
the performance for 0003, so it is better to keep it separate. We can
take the decision to merge it once we are done with our evaluation.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Dec 21, 2021 at 5:58 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Dec 20, 2021, at 12:10 AM, houzj.fnst@fujitsu.com wrote:
>
> Attach the V49 patch set, which addressed all the above comments on the 0002
> patch.
>
> I've been testing the latest versions of this patch set. I'm attaching a new
> patch set based on v49. The suggested fixes are in separate patches after the
> current one so it is easier to integrate them into the related patch. The
> majority of these changes explains some decision to improve readability IMO.
>
> row-filter x row filter. I'm not a native speaker but "row filter" is widely
> used in similar contexts so I suggest to use it. (I didn't adjust the commit
> messages)
>
> An ancient patch use the term coerce but it was changed to cast. Coercion
> implies an implicit conversion [1]. If you look at a few lines above you will
> see that this expression expects an implicit conversion.
>
> I modified the query to obtain the row filter expressions to (i) add the schema
> pg_catalog to some objects and (ii) use NOT EXISTS instead of subquery (it
> reads better IMO).
>
> A detail message requires you to capitalize the first word of sentences and
> includes a period at the end.
>
> It seems all server messages and documentation use the terminology "WHERE
> clause". Let's adopt it instead of "row filter".
>
> I reviewed 0003. It uses TupleTableSlot instead of HeapTuple. I probably missed
> the explanation but it requires more changes (logicalrep_write_tuple and 3 new
> entries into RelationSyncEntry). I replaced this patch with a slightly
> different one (0005 in this patch set) that uses HeapTuple instead. I did
> only simple tests, and it requires more tests. I noticed that this patch does not
> include a test to cover the case where TOASTed values are not included in the
> new tuple. We should probably add one.
>
> I agree with Amit that it is a good idea to merge 0001, 0002, and 0005. I would
> probably merge 0004 because it is just isolated changes.
>
> [1] https://en.wikipedia.org/wiki/Type_conversion
>
>

Thanks for all the suggested fixes.

Next, I plan to post a new v51* patch set, which will:

1. Take your "fixes" patches and, wherever possible, merge them
back into the main patches.
2. Merge the resulting main patches according to Amit's advice.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Tue, Dec 21, 2021 at 5:58 AM Euler Taveira <euler@eulerto.com> wrote:
>
> I reviewed 0003. It uses TupleTableSlot instead of HeapTuple. I probably missed
> the explanation but it requires more changes (logicalrep_write_tuple and 3 new
> entries into RelationSyncEntry). I replaced this patch with a slightly
> different one (0005 in this patch set) that uses HeapTuple instead. I did
> only simple tests, and it requires more tests. I noticed that this patch does not
> include a test to cover the case where TOASTed values are not included in the
> new tuple. We should probably add one.

The reason I changed the code to use virtual tuple slots is to reduce
tuple deforming overhead.
Dilip raised this very valid comment in [1]:

On Tue, Sep 21, 2021 at 4:29 PM Dilip Kumar
<dilipbalaut(at)gmail(dot)com> wrote:
>
>In pgoutput_row_filter_update(), first, we are deforming the tuple in
>local datum, then modifying the tuple, and then reforming the tuple.
>I think we can surely do better here. Currently, you are reforming
>the tuple so that you can store it in the scan slot by calling
>ExecStoreHeapTuple which will be used for expression evaluation.
>Instead of that what you need to do is to deform the tuple using
>tts_values of the scan slot and later call ExecStoreVirtualTuple(), so
>advantages are 1) you don't need to reform the tuple 2) the expression
>evaluation machinery doesn't need to deform again for fetching the
>value of the attribute, instead it can directly get from the value
>from the virtual tuple.

Storing the old tuple/new tuple in a slot and re-using the slot avoids the
overhead of repeatedly deforming the tuple at multiple levels in the code.
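
For reference, the pattern looks roughly like this (a sketch, assuming tuple,
relation and a slot matching the relation's descriptor are in scope):

/* Deform once, directly into the slot's arrays. */
ExecClearTuple(slot);
heap_deform_tuple(tuple, RelationGetDescr(relation),
                  slot->tts_values, slot->tts_isnull);
ExecStoreVirtualTuple(slot);
/* Expression evaluation now reads tts_values without deforming again. */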

regards,
Ajin Cherian
[1] - https://www.postgresql.org/message-id/CAFiTN-vwBjy+eR+iodkO5UVN5cPv_xx1=s8ehzgCRJZA+AztAA@mail.gmail.com



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Dec 21, 2021 at 12:28 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Mon, Dec 20, 2021, at 12:10 AM, houzj.fnst@fujitsu.com wrote:
>
> Attach the V49 patch set, which addressed all the above comments on the 0002
> patch.
>
> I've been testing the latest versions of this patch set. I'm attaching a new
> patch set based on v49. The suggested fixes are in separate patches after the
> current one so it is easier to integrate them into the related patch. The
> majority of these changes explain decisions made to improve readability, IMO.
>
> "row-filter" vs. "row filter": I'm not a native speaker, but "row filter" is
> widely used in similar contexts, so I suggest using it. (I didn't adjust the
> commit messages)
>
> An ancient patch used the term coerce, but it was changed to cast. Coercion
> implies an implicit conversion [1]. If you look at a few lines above you will
> see that this expression expects an implicit conversion.
>
> I modified the query to obtain the row filter expressions to (i) add the schema
> pg_catalog to some objects and (ii) use NOT EXISTS instead of subquery (it
> reads better IMO).
>

Yeah, I think that reads better, but maybe we can check the plans of
both queries once and see if there is any significant difference
between the two.

> A detail message requires you to capitalize the first word of sentences and
> includes a period at the end.
>
> It seems all server messages and documentation use the terminology "WHERE
> clause". Let's adopt it instead of "row filter".
>
> I reviewed 0003. It uses TupleTableSlot instead of HeapTuple. I probably missed
> the explanation but it requires more changes (logicalrep_write_tuple and 3 new
> entries into RelationSyncEntry). I replaced this patch with a slightly
> different one (0005 in this patch set) that uses HeapTuple instead. I did
> only simple tests, and it requires more tests. I noticed that this patch does not
> include a test to cover the case where TOASTed values are not included in the
> new tuple. We should probably add one.
>

Yeah, it would be good to add such a test.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
vignesh C
Date:
On Mon, Dec 20, 2021 at 8:41 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Fri, Dec 17, 2021 6:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Dec 17, 2021 at 4:11 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > PSA the v47* patch set.
> Thanks for the comments, I agree with all the comments.
> Attach the V49 patch set, which addressed all the above comments on the 0002
> patch.

While reviewing the patch, I was testing a scenario where we change
the row filter condition and refresh the publication; in this case we
do not identify the row filter change, and the table data is not synced
with the publisher. In the case of SET TABLE, we sync the data
from the publisher. I'm not sure if this behavior is right or not.
Publisher session(setup publication):
---------------------------------
create table t1(c1 int);
insert into t1 values(11);
insert into t1 values(12);
insert into t1 values(1);
select * from t1;
c1
----
11
12
  1
(3 rows)
create publication pub1 for table t1 where ( c1 > 10);

Subscriber session(setup subscription):
---------------------------------
create table t1(c1 int);
create subscription sub1 connection 'dbname=postgres host=localhost'
publication pub1;
select * from t1;
c1
----
11
12
(2 rows)

Publisher session(alter the row filter condition):
---------------------------------
alter publication pub1 set table t1 where ( c1 < 10);

Subscriber session(Refresh):
---------------------------------
alter subscription sub1 refresh publication ; -- After refresh, c1
with 1 is not fetched
select * from t1;
c1
----
11
12
(2 rows)

Should we do a table sync in this case, should the user handle this
scenario and take care of syncing data from the publisher, or should we
throw an error to avoid confusion? If the existing behavior is fine, we
can document it.

Thoughts?

Regards,
Vignesh



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Dec 21, 2021 at 6:17 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Tue, Dec 21, 2021 at 5:58 AM Euler Taveira <euler@eulerto.com> wrote:
> >
> >In pgoutput_row_filter_update(), first, we are deforming the tuple in
> >local datum, then modifying the tuple, and then reforming the tuple.
> >I think we can surely do better here. Currently, you are reforming
> >the tuple so that you can store it in the scan slot by calling
> >ExecStoreHeapTuple which will be used for expression evaluation.
> >Instead of that what you need to do is to deform the tuple using
> >tts_values of the scan slot and later call ExecStoreVirtualTuple(), so
> >advantages are 1) you don't need to reform the tuple 2) the expression
> >evaluation machinery doesn't need to deform again for fetching the
> >value of the attribute, instead it can directly get from the value
> >from the virtual tuple.
>
> Storing the old tuple/new tuple in a slot and re-using the slot avoids
> the overhead of
> continuous deforming of tuple at multiple levels in the code.
>

Yeah, deforming tuples again can have a significant cost, but what is
the need to maintain tmp_new_tuple in RelationSyncEntry? I think that is
required only in rare cases, so we can probably allocate/deallocate it
when required.

Few other comments:
==================
1.
  TupleTableSlot *scantuple; /* tuple table slot for row filter */
+ TupleTableSlot *new_tuple; /* slot for storing deformed new tuple
during updates */
+ TupleTableSlot *old_tuple; /* slot for storing deformed old tuple
during updates */

I think it is better to name these as scan_slot, new_slot, old_slot to
avoid confusion with tuples.

2.
+++ b/src/backend/replication/logical/proto.c
@@ -19,6 +19,7 @@
 #include "replication/logicalproto.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "executor/executor.h"

The include is in the wrong order. We keep includes in alphabetical order.

3.
@@ -832,6 +847,7 @@ logicalrep_write_tuple(StringInfo out, Relation
rel, HeapTuple tuple, bool binar

  ReleaseSysCache(typtup);
  }
+
 }

Spurious addition.

4.
-logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple,
bool binary)
+logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple,
TupleTableSlot *slot,
+bool binary)

The formatting is quite off. Please run pgindent.

5. If we decide to go with this approach then I feel let's merge the
required comments from Euler's version.



-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Dec 21, 2021 at 10:53 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Mon, Dec 20, 2021 at 8:41 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Fri, Dec 17, 2021 6:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Fri, Dec 17, 2021 at 4:11 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > PSA the v47* patch set.
> > Thanks for the comments, I agree with all the comments.
> > Attach the V49 patch set, which addressed all the above comments on the 0002
> > patch.
>
> While reviewing the patch, I was testing a scenario where we change
> the row filter condition and refresh the publication, in this case we
> do not identify the row filter change and the table data is not synced
> with the publisher. In case of setting the table, we sync the data
> from the publisher.
>

We only sync data if the table is added after the last Refresh or
Create Subscription. Even if we decide to sync the data again due to
a row filter change, it can easily create conflicts with already synced
data. So, this seems like expected behavior, and we can probably document
it.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
The current PG docs text for CREATE PUBLICATION (in the v54-0001
patch) has a part that says

+   A nullable column in the <literal>WHERE</literal> clause could cause the
+   expression to evaluate to false; avoid using columns without not-null
+   constraints in the <literal>WHERE</literal> clause.

I felt that the caution to "avoid using" nullable columns is too
strongly worded. AFAIK nullable columns will work perfectly fine so
long as you take due care of them in the WHERE clause. In fact, it
might be very useful sometimes to filter on nullable columns.

Here is a small test example:

// publisher
test_pub=# create table t1 (id int primary key, msg text null);
test_pub=# create publication p1 for table t1 where (msg != 'three');
// subscriber
test_sub=# create table t1 (id int primary key, msg text null);
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub application_name=sub1' PUBLICATION p1;

// insert some data
test_pub=# insert into t1 values (1, 'one'), (2, 'two'), (3, 'three'),
(4, null), (5, 'five');
test_pub=# select * from t1;
 id |  msg
----+-------
  1 | one
  2 | two
  3 | three
  4 |
  5 | five
(5 rows)

// data at sub
test_sub=# select * from t1;
 id | msg
----+------
  1 | one
  2 | two
  5 | five
(3 rows)

Notice the row 4 with the NULL is also not replicated. But, perhaps we
were expecting it to be replicated (because NULL is not 'three'). To
do this, simply rewrite the WHERE clause to properly account for
nulls.

// truncate both sides
test_pub=# truncate table t1;
test_sub=# truncate table t1;

// alter the WHERE clause
test_pub=# alter publication p1 set table t1 where (msg is null or msg
!= 'three');

// insert data at pub
test_pub=# insert into t1 values (1, 'one'), (2, 'two'), (3, 'three'),
(4, null), (5, 'five');
INSERT 0 5
test_pub=# select * from t1;
 id |  msg
----+-------
  1 | one
  2 | two
  3 | three
  4 |
  5 | five
(5 rows)

// data at sub (note it includes row 4)
test_sub=# select * from t1;
 id | msg
----+------
  1 | one
  2 | two
  4 |
  5 | five
(4 rows)

~~

So, IMO the PG docs wording for this part should be relaxed a bit.

e.g.
BEFORE:
+   A nullable column in the <literal>WHERE</literal> clause could cause the
+   expression to evaluate to false; avoid using columns without not-null
+   constraints in the <literal>WHERE</literal> clause.
AFTER:
+   A nullable column in the <literal>WHERE</literal> clause could cause the
+   expression to evaluate to false. To avoid unexpected results, any possible
+   null values should be accounted for.

Thoughts?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Dec 24, 2021 at 11:04 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> The current PG docs text for CREATE PUBLICATION (in the v54-0001
> patch) has a part that says
>
> +   A nullable column in the <literal>WHERE</literal> clause could cause the
> +   expression to evaluate to false; avoid using columns without not-null
> +   constraints in the <literal>WHERE</literal> clause.
>
> I felt that the caution to "avoid using" nullable columns is too
> strongly worded. AFAIK nullable columns will work perfectly fine so
> long as you take due care of them in the WHERE clause. In fact, it
> might be very useful sometimes to filter on nullable columns.
>
> Here is a small test example:
>
> // publisher
> test_pub=# create table t1 (id int primary key, msg text null);
> test_pub=# create publication p1 for table t1 where (msg != 'three');
> // subscriber
> test_sub=# create table t1 (id int primary key, msg text null);
> test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
> dbname=test_pub application_name=sub1' PUBLICATION p1;
>
> // insert some data
> test_pub=# insert into t1 values (1, 'one'), (2, 'two'), (3, 'three'),
> (4, null), (5, 'five');
> test_pub=# select * from t1;
>  id |  msg
> ----+-------
>   1 | one
>   2 | two
>   3 | three
>   4 |
>   5 | five
> (5 rows)
>
> // data at sub
> test_sub=# select * from t1;
>  id | msg
> ----+------
>   1 | one
>   2 | two
>   5 | five
> (3 rows)
>
> Notice the row 4 with the NULL is also not replicated. But, perhaps we
> were expecting it to be replicated (because NULL is not 'three'). To
> do this, simply rewrite the WHERE clause to properly account for
> nulls.
>
> // truncate both sides
> test_pub=# truncate table t1;
> test_sub=# truncate table t1;
>
> // alter the WHERE clause
> test_pub=# alter publication p1 set table t1 where (msg is null or msg
> != 'three');
>
> // insert data at pub
> test_pub=# insert into t1 values (1, 'one'), (2, 'two'), (3, 'three'),
> (4, null), (5, 'five');
> INSERT 0 5
> test_pub=# select * from t1;
>  id |  msg
> ----+-------
>   1 | one
>   2 | two
>   3 | three
>   4 |
>   5 | five
> (5 rows)
>
> // data at sub (note it includes row 4)
> test_sub=# select * from t1;
>  id | msg
> ----+------
>   1 | one
>   2 | two
>   4 |
>   5 | five
> (4 rows)
>
> ~~
>
> So, IMO the PG docs wording for this part should be relaxed a bit.
>
> e.g.
> BEFORE:
> +   A nullable column in the <literal>WHERE</literal> clause could cause the
> +   expression to evaluate to false; avoid using columns without not-null
> +   constraints in the <literal>WHERE</literal> clause.
> AFTER:
> +   A nullable column in the <literal>WHERE</literal> clause could cause the
> +   expression to evaluate to false. To avoid unexpected results, any possible
> +   null values should be accounted for.
>

Your suggested wording sounds reasonable to me. Euler, others, any thoughts?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Sat, Dec 25, 2021, at 1:20 AM, Amit Kapila wrote:
On Fri, Dec 24, 2021 at 11:04 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> So, IMO the PG docs wording for this part should be relaxed a bit.
>
> e.g.
> BEFORE:
> +   A nullable column in the <literal>WHERE</literal> clause could cause the
> +   expression to evaluate to false; avoid using columns without not-null
> +   constraints in the <literal>WHERE</literal> clause.
> AFTER:
> +   A nullable column in the <literal>WHERE</literal> clause could cause the
> +   expression to evaluate to false. To avoid unexpected results, any possible
> +   null values should be accounted for.
>

Your suggested wording sounds reasonable to me. Euler, others, any thoughts?
+1.


--
Euler Taveira

Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2021-Dec-26, Euler Taveira wrote:

> On Sat, Dec 25, 2021, at 1:20 AM, Amit Kapila wrote:
> > On Fri, Dec 24, 2021 at 11:04 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > So, IMO the PG docs wording for this part should be relaxed a bit.
> > >
> > > e.g.
> > > BEFORE:
> > > +   A nullable column in the <literal>WHERE</literal> clause could cause the
> > > +   expression to evaluate to false; avoid using columns without not-null
> > > +   constraints in the <literal>WHERE</literal> clause.
> > > AFTER:
> > > +   A nullable column in the <literal>WHERE</literal> clause could cause the
> > > +   expression to evaluate to false. To avoid unexpected results, any possible
> > > +   null values should be accounted for.

Is this actually correct?  I think a null value would cause the
expression to evaluate to null, not false; the issue is that the filter
considers a null value as not matching (right?).  Maybe it's better to
spell that out explicitly; both these wordings seem distracting.
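
For instance, in psql (illustrative session), the comparison itself yields
null, which the filter then treats as not matching:

postgres=# SELECT NULL::text != 'three' AS matches;
 matches
---------
 
(1 row)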

You have this elsewhere:

+      If the optional <literal>WHERE</literal> clause is specified, only rows
+      that satisfy the <replaceable class="parameter">expression</replaceable> 
+      will be published. Note that parentheses are required around the 
+      expression. It has no effect on <literal>TRUNCATE</literal> commands.

Maybe this whole thing is clearer if you just say "If the optional WHERE
clause is specified, rows for which the expression returns false or null
will not be published."  With that it should be fairly clear what
happens if you have NULL values in the columns used in the expression,
and you can just delete that phrase you're discussing.

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Sun, Dec 26, 2021, at 1:09 PM, Alvaro Herrera wrote:
On 2021-Dec-26, Euler Taveira wrote:

> On Sat, Dec 25, 2021, at 1:20 AM, Amit Kapila wrote:
> > On Fri, Dec 24, 2021 at 11:04 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > So, IMO the PG docs wording for this part should be relaxed a bit.
> > >
> > > e.g.
> > > BEFORE:
> > > +   A nullable column in the <literal>WHERE</literal> clause could cause the
> > > +   expression to evaluate to false; avoid using columns without not-null
> > > +   constraints in the <literal>WHERE</literal> clause.
> > > AFTER:
> > > +   A nullable column in the <literal>WHERE</literal> clause could cause the
> > > +   expression to evaluate to false. To avoid unexpected results, any possible
> > > +   null values should be accounted for.

Is this actually correct?  I think a null value would cause the
expression to evaluate to null, not false; the issue is that the filter
considers a null value as not matching (right?).  Maybe it's better to
spell that out explicitly; both these wordings seem distracting.
[Reading it again...] I think it is referring to the
pgoutput_row_filter_exec_expr() return. That's not accurate, because the text
is talking about the expression, and the expression returns true, false, or
null. However, the referred function returns only true or false. I agree that
we should explicitly mention that a null return means the row won't be published.
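
For reference, a minimal sketch of such a wrapper (simplified, not the exact
function):

/* Evaluate a row filter; collapse a NULL result to "does not match". */
static bool
row_filter_exec_expr(ExprState *state, ExprContext *econtext)
{
    Datum   ret;
    bool    isnull;

    ret = ExecEvalExprSwitchContext(state, econtext, &isnull);

    if (isnull)
        return false;           /* null result: row is not published */

    return DatumGetBool(ret);
}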

You have this elsewhere:

+      If the optional <literal>WHERE</literal> clause is specified, only rows
+      that satisfy the <replaceable class="parameter">expression</replaceable> 
+      will be published. Note that parentheses are required around the 
+      expression. It has no effect on <literal>TRUNCATE</literal> commands.

Maybe this whole thing is clearer if you just say "If the optional WHERE
clause is specified, rows for which the expression returns false or null
will not be published."  With that it should be fairly clear what
happens if you have NULL values in the columns used in the expression,
and you can just delete that phrase you're discussing.
Your proposal sounds good to me.


--
Euler Taveira

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Thur, Dec 23, 2021 4:28 PM Peter Smith <smithpb2250@gmail.com> wrote:
> Here is the v54* patch set:

Attach the v55 patch set, which adds the following test cases in the 0003 patch.
1. Added a test to cover the case where TOASTed values are not included in the
   new tuple. Suggested by Euler[1].

   Note: this test is temporarily commented out because it would fail without
   applying another bug fix patch from another thread[2], which logs the
   detoasted value in the old tuple. I have verified locally that the test
   passes after applying the bug fix patch[2].

2. Added a test to cover the case that transforms the UPDATE into INSERT.
   Provided by Tang.

[1] https://www.postgresql.org/message-id/flat/6b6cf26d-bf74-4b39-bb07-c067e381d66d%40www.fastmail.com
[2] https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Mon, Dec 27, 2021 9:16 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> On Thur, Dec 23, 2021 4:28 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > Here is the v54* patch set:
> 
> Attach the v55 patch set, which adds the following test cases in the 0003 patch.
Sorry for the typo here; I meant the tests were added in the 0002 patch.

Best regards,
Hou zj

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Mon, Dec 27, 2021 9:19 PM Hou Zhijie <houzj.fnst@fujitsu.com> wrote:
> On Mon, Dec 27, 2021 9:16 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> wrote:
> > On Thur, Dec 23, 2021 4:28 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > Here is the v54* patch set:
> >
> > Attach the v55 patch set, which adds the following test cases in the 0002 patch.

When reviewing the row filter patch, I found a few things that could be improved.
1) We could transform the same row filter expression twice when running
   ALTER PUBLICATION ... SET TABLE WHERE (...), because we invoke
   GetTransformedWhereClause in both AlterPublicationTables() and
   publication_add_relation(). I was thinking it might be better if we only
   transformed the expression once, in AlterPublicationTables().

2) When transforming the expression, we don't set the correct p_sourcetext.
   Since we need to transform several expressions which belong to different
   relations, I think it might be better to pass queryString down to the actual
   transform function and set p_sourcetext to the actual queryString.
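
A rough sketch of idea 2) (the parse-state setup here is illustrative, and
EXPR_KIND_WHERE stands in for whatever expression kind the patch actually uses):

ParseState *pstate = make_parsestate(NULL);

/* Point error positions at the original command text. */
pstate->p_sourcetext = queryString;

whereclause = transformWhereClause(pstate,
                                   copyObject(t->whereClause),
                                   EXPR_KIND_WHERE,
                                   "PUBLICATION WHERE");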

Attach a top-up patch 0004 which makes the above changes.

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Mon, Dec 27, 2021 9:16 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> 
> On Thur, Dec 23, 2021 4:28 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > Here is the v54* patch set:
> 
> Attach the v55 patch set, which adds the following test cases in the 0003 patch.
> 1. Added a test to cover the case where TOASTed values are not included in the
>    new tuple. Suggested by Euler[1].
> 
>    Note: this test is temporarily commented because it would fail without
>    applying another bug fix patch in another thread[2] which log the detoasted
>    value in old value. I have verified locally that the test pass after
>    applying the bug fix patch[2].
> 
> 2. Add a test to cover the case that transform the UPDATE into INSERT. Provided
>    by Tang.
> 

Thanks for updating the patches.

A few comments:

1) v55-0001

-/*
- * Gets the relations based on the publication partition option for a specified
- * relation.
- */
 List *
 GetPubPartitionOptionRelations(List *result, PublicationPartOpt pub_partopt,
                                Oid relid)

Do we need this change?

2) v55-0002
     * Multiple ExprState entries might be used if there are multiple
      * publications for a single table. Different publication actions don't
      * allow multiple expressions to always be combined into one, so there is
-     * one ExprSTate per publication action. Only 3 publication actions are
+     * one ExprState per publication action. Only 3 publication actions are
      * used for row filtering ("insert", "update", "delete"). The exprstate
      * array is indexed by ReorderBufferChangeType.
      */

I think this change can be merged into 0001 patch.

3) v55-0002
+static bool pgoutput_row_filter_update_check(enum ReorderBufferChangeType changetype, Relation relation,
+                                             HeapTuple oldtuple, HeapTuple newtuple,
+                                             RelationSyncEntry *entry, ReorderBufferChangeType *action);

Do we need parameter changetype here? I think it could only be
REORDER_BUFFER_CHANGE_UPDATE.

Regards,
Tang

RE: row filtering for logical replication

From
"wangw.fnst@fujitsu.com"
Date:
On Mon, Dec 28, 2021 9:03 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> Attach a top up patch 0004 which did the above changes.

A few comments about v55-0001 and v55-0002.
v55-0001
1.
There is a typo in the last sentence of the comment for function rowfilter_walker.
   * (b) a user-defined function can be used to access tables which could have
   * unpleasant results because a historic snapshot is used. That's why only
-  * non-immutable built-in functions are allowed in row filter expressions.
+ * immutable built-in functions are allowed in row filter expressions.

2.
There are two if statements at the end of fetch_remote_table_info.
+            if (!isnull)
+                *qual = lappend(*qual, makeString(TextDatumGetCString(rf)));
+
+            ExecClearTuple(slot);
+
+            /* Ignore filters and cleanup as necessary. */
+            if (isnull)
+            {
+                if (*qual)
+                {
+                    list_free_deep(*qual);
+                    *qual = NIL;
+                }
+                break;
+            }
What about using a format like the following:
if (!isnull)
    ...
else
    ...
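
i.e., something along these lines (a sketch that keeps the ExecClearTuple and
break behavior of the original):

if (!isnull)
{
    *qual = lappend(*qual, makeString(TextDatumGetCString(rf)));
    ExecClearTuple(slot);
}
else
{
    /* Ignore filters and clean up as necessary. */
    ExecClearTuple(slot);
    if (*qual)
    {
        list_free_deep(*qual);
        *qual = NIL;
    }
    break;
}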


v55-0002
In function pgoutput_row_filter_init, I found that almost the whole function is
inside the if statement, written like this:
static void
pgoutput_row_filter_init()
{
    Variable declaration and initialization;
    if (!entry->exprstate_valid)
    {
        ......
    }
}
What about changing this if statement like the following:
if (entry->exprstate_valid)
    return;


Regards,
Wang wei

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Wed, Dec 29, 2021 11:16 AM Tang, Haiying <tanghy.fnst@fujitsu.com> wrote:
> On Mon, Dec 27, 2021 9:16 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> wrote:
> >
> > On Thur, Dec 23, 2021 4:28 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > Here is the v54* patch set:
> >
> > Attach the v55 patch set, which adds the following test cases in the 0003 patch.
> > 1. Added a test to cover the case where TOASTed values are not included in the
> >    new tuple. Suggested by Euler[1].
> >
> >    Note: this test is temporarily commented because it would fail without
> >    applying another bug fix patch in another thread[2] which log the
> detoasted
> >    value in old value. I have verified locally that the test pass after
> >    applying the bug fix patch[2].
> >
> > 2. Add a test to cover the case that transform the UPDATE into INSERT.
> Provided
> >    by Tang.
> >
> 
> Thanks for updating the patches.
> 
> A few comments:
> 
> 1) v55-0001
> 
> -/*
> - * Gets the relations based on the publication partition option for a specified
> - * relation.
> - */
>  List *
>  GetPubPartitionOptionRelations(List *result, PublicationPartOpt pub_partopt,
>                                 Oid relid)
> 
> Do we need this change?

Added the comment back.

> 2) v55-0002
>      * Multiple ExprState entries might be used if there are multiple
>       * publications for a single table. Different publication actions don't
>       * allow multiple expressions to always be combined into one, so there is
> -     * one ExprSTate per publication action. Only 3 publication actions are
> +     * one ExprState per publication action. Only 3 publication actions
> +are
>       * used for row filtering ("insert", "update", "delete"). The exprstate
>       * array is indexed by ReorderBufferChangeType.
>       */
> 
> I think this change can be merged into 0001 patch.

Merged.

> 3) v55-0002
> +static bool pgoutput_row_filter_update_check(enum
> ReorderBufferChangeType changetype, Relation relation,
> +
>      HeapTuple oldtuple, HeapTuple newtuple,
> +
>      RelationSyncEntry *entry, ReorderBufferChangeType *action);
> 
> Do we need parameter changetype here? I think it could only be
> REORDER_BUFFER_CHANGE_UPDATE.

I didn't change this, I think it might be better to wait for Ajin's opinion.

Attach the v56 patch set, which addresses the above comments and comments (1, 2) from [1]

[1]
https://www.postgresql.org/message-id/OS3PR01MB62756D18BA0FA969D5255E369E459%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Thur, Dec 30, 2021 9:40 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> Attach the v56 patch set, which addresses the above comments and comments (1, 2)
> from [1]
> 
> [1]
> https://www.postgresql.org/message-id/OS3PR01MB62756D18BA0FA969D5
> 255E369E459%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Rebased the patch set based on recent commit c9105dd.
 
Best regards,
Hou zj

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
Here is the v58* patch set:

Main changes from v57* are
1. A couple of review comments fixed

~~

Review comments (details)
=========================

v58-0001 (main)
- PG docs updated as suggested [Alvaro, Euler 26/12]

v58-0002 (new/old tuple)
- pgoutput_row_filter_init refactored as suggested [Wangw 30/12] #3
- re-ran pgindent

v58-0003 (tab, dump)
- no change

v58-0004 (refactor transformations)
- minor changes to commit message

------
[Alvaro, Euler 26/12]
https://www.postgresql.org/message-id/efac5ea8-d0c6-4c92-aa82-36ea45fd013a%40www.fastmail.com
[Wangw 30/12]
https://www.postgresql.org/message-id/OS3PR01MB62756D18BA0FA969D5255E369E459%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Dec 30, 2021 at 7:57 PM wangw.fnst@fujitsu.com
<wangw.fnst@fujitsu.com> wrote:
>
> On Mon, Dec 28, 2021 9:03 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> > Attach a top up patch 0004 which did the above changes.
>
> A few comments about v55-0001 and v55-0002.
...
> v55-0002
> In function pgoutput_row_filter_init, I found that almost the whole function is
> inside the if statement, written like this:
> static void
> pgoutput_row_filter_init()
> {
>     Variable declaration and initialization;
>     if (!entry->exprstate_valid)
>     {
>         ......
>     }
> }
> What about changing this if statement like the following:
> if (entry->exprstate_valid)
>         return;
>

Modified in v58 [1] as suggested

------
[1] https://www.postgresql.org/message-id/CAHut%2BPvkswkGLqzYo7z9rwOoDeLtUk0PEha8kppNvZts0h22Hw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, Dec 27, 2021 at 9:57 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Sun, Dec 26, 2021, at 1:09 PM, Alvaro Herrera wrote:
>
> On 2021-Dec-26, Euler Taveira wrote:
>
> > On Sat, Dec 25, 2021, at 1:20 AM, Amit Kapila wrote:
> > > On Fri, Dec 24, 2021 at 11:04 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > So, IMO the PG docs wording for this part should be relaxed a bit.
> > > >
> > > > e.g.
> > > > BEFORE:
> > > > +   A nullable column in the <literal>WHERE</literal> clause could cause the
> > > > +   expression to evaluate to false; avoid using columns without not-null
> > > > +   constraints in the <literal>WHERE</literal> clause.
> > > > AFTER:
> > > > +   A nullable column in the <literal>WHERE</literal> clause could cause the
> > > > +   expression to evaluate to false. To avoid unexpected results, any possible
> > > > +   null values should be accounted for.
>
> Is this actually correct?  I think a null value would cause the
> expression to evaluate to null, not false; the issue is that the filter
> considers a null value as not matching (right?).  Maybe it's better to
> spell that out explicitly; both these wordings seem distracting.
>
> [Reading it again...] I think it is referring to the
> pgoutput_row_filter_exec_expr() return. That's not accurate, because the text
> is talking about the expression, and the expression returns true, false, or
> null. However, the referred function returns only true or false. I agree that
> we should explicitly mention that a null return means the row won't be published.
>
> You have this elsewhere:
>
> +      If the optional <literal>WHERE</literal> clause is specified, only rows
> +      that satisfy the <replaceable class="parameter">expression</replaceable>
> +      will be published. Note that parentheses are required around the
> +      expression. It has no effect on <literal>TRUNCATE</literal> commands.
>
> Maybe this whole thing is clearer if you just say "If the optional WHERE
> clause is specified, rows for which the expression returns false or null
> will not be published."  With that it should be fairly clear what
> happens if you have NULL values in the columns used in the expression,
> and you can just delete that phrase you're discussing.
>
> Your proposal sounds good to me.

Modified as suggested in v58 [1].

------
[1] https://www.postgresql.org/message-id/CAHut%2BPvkswkGLqzYo7z9rwOoDeLtUk0PEha8kppNvZts0h22Hw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Fri, Dec 31, 2021 at 12:39 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Wed, Dec 29, 2021 11:16 AM Tang, Haiying <tanghy.fnst@fujitsu.com> wrote:
> > On Mon, Dec 27, 2021 9:16 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> > wrote:
> > >
> > > On Thur, Dec 23, 2021 4:28 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > Here is the v54* patch set:
> > >
> > > Attach the v55 patch set, which adds the following test cases in the 0003 patch.
> > > 1. Added a test to cover the case where TOASTed values are not included in the
> > >    new tuple. Suggested by Euler[1].
> > >
> > >    Note: this test is temporarily commented because it would fail without
> > >    applying another bug fix patch in another thread[2] which log the
> > detoasted
> > >    value in old value. I have verified locally that the test pass after
> > >    applying the bug fix patch[2].
> > >
> > > 2. Add a test to cover the case that transform the UPDATE into INSERT.
> > Provided
> > >    by Tang.
> > >
> >
> > Thanks for updating the patches.
> >
> > A few comments:
...
> > 3) v55-0002
> > +static bool pgoutput_row_filter_update_check(enum ReorderBufferChangeType changetype,
> > +                                             Relation relation,
> > +                                             HeapTuple oldtuple, HeapTuple newtuple,
> > +                                             RelationSyncEntry *entry,
> > +                                             ReorderBufferChangeType *action);
> >
> > Do we need parameter changetype here? I think it could only be
> > REORDER_BUFFER_CHANGE_UPDATE.
>
> I didn't change this, I think it might be better to wait for Ajin's opinion.

I agree with Tang. AFAIK there is no problem removing that redundant
param as suggested. BTW - the Assert within that function is also
incorrect because the only possible value is
REORDER_BUFFER_CHANGE_UPDATE. I will make these fixes in a future
version.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



RE: row filtering for logical replication

From
"wangw.fnst@fujitsu.com"
Date:
On Thu, Jan 4, 2022 at 00:54 PM Peter Smith <smithpb2250@gmail.com> wrote:
> Modified in v58 [1] as suggested
Thanks for updating the patches.
A few comments about v58-0001 and v58-0002.

v58-0001
1.
How about modifying the following loop in copy_table by using for_each_from
instead of foreach?
Like the invocation of for_each_from in function get_rule_expr.
from:
        if (qual != NIL)
        {
            ListCell   *lc;
            bool        first = true;

            appendStringInfoString(&cmd, " WHERE ");
            foreach(lc, qual)
            {
                char       *q = strVal(lfirst(lc));

                if (first)
                    first = false;
                else
                    appendStringInfoString(&cmd, " OR ");
                appendStringInfoString(&cmd, q);
            }
            list_free_deep(qual);
        }
change to:
        if (qual != NIL)
        {
            ListCell   *lc;
            char       *q = strVal(linitial(qual));

            appendStringInfo(&cmd, " WHERE %s", q);
            for_each_from(lc, qual, 1)
            {
                q = strVal(lfirst(lc));
                appendStringInfo(&cmd, " OR %s", q);
            }
            list_free_deep(qual);
        }

2.
I see that the API of get_rel_sync_entry has been modified.
-get_rel_sync_entry(PGOutputData *data, Oid relid)
+get_rel_sync_entry(PGOutputData *data, Relation relation)
It looks like it just moves the invocation of RelationGetRelid from outside into
the function get_rel_sync_entry. I am not sure whether this modification is
necessary for this feature or not.

v58-0002
1.
In function pgoutput_row_filter_init, if no_filter is set, I think we do not
need to add the row filter to the list (rfnodes).
So how about changing the three conditions for adding row filters to rfnodes like this:
-                    if (pub->pubactions.pubinsert)
+                    if (pub->pubactions.pubinsert && !no_filter[idx_ins])
                    {
                        rfnode = stringToNode(TextDatumGetCString(rfdatum));
                        rfnodes[idx_ins] = lappend(rfnodes[idx_ins], rfnode);
                    }

Regards,
Wang wei

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jan 4, 2022 at 12:15 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Dec 31, 2021 at 12:39 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > > 3) v55-0002
> > > +static bool pgoutput_row_filter_update_check(enum ReorderBufferChangeType changetype,
> > > +                                             Relation relation,
> > > +                                             HeapTuple oldtuple, HeapTuple newtuple,
> > > +                                             RelationSyncEntry *entry,
> > > +                                             ReorderBufferChangeType *action);
> > >
> > > Do we need parameter changetype here? I think it could only be
> > > REORDER_BUFFER_CHANGE_UPDATE.
> >
> > I didn't change this, I think it might be better to wait for Ajin's opinion.
>
> I agree with Tang. AFAIK there is no problem removing that redundant
> param as suggested. BTW - the Assert within that function is also
> incorrect because the only possible value is
> REORDER_BUFFER_CHANGE_UPDATE. I will make these fixes in a future
> version.
>

That sounds fine to me too. One more thing is that you don't need to
modify the action in case it remains an update, as the caller has already
set that value. Currently, we are setting it to update in two places
in this function; we can remove both of those and keep the comments
intact for the later update.
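
In other words, something like the following sketch, where old_matched and
new_matched are illustrative names for the filter results on the old and new
tuples, and *action was pre-set to update by the caller:

if (!old_matched && new_matched)
    *action = REORDER_BUFFER_CHANGE_INSERT;     /* UPDATE -> INSERT */
else if (old_matched && !new_matched)
    *action = REORDER_BUFFER_CHANGE_DELETE;     /* UPDATE -> DELETE */
/* else both match: keep the caller's REORDER_BUFFER_CHANGE_UPDATE */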

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
I have reviewed again the source code for v58-0001.

Below are my review comments.

Actually, I intend to fix most of these myself for v59*, so this post
is just for the record.

v58-0001 Review Comments
========================

1. doc/src/sgml/ref/alter_publication.sgml - reword for consistency

+      name to explicitly indicate that descendant tables are included. If the
+      optional <literal>WHERE</literal> clause is specified, rows that do not
+      satisfy the <replaceable class="parameter">expression</replaceable> will
+      not be published. Note that parentheses are required around the

For consistency, it would be better to reword this sentence about the
expression to be more similar to the one in CREATE PUBLICATION, which
now says:

+      If the optional <literal>WHERE</literal> clause is specified, rows for
+      which the <replaceable class="parameter">expression</replaceable> returns
+      false or null will not be published. Note that parentheses are required
+      around the expression. It has no effect on <literal>TRUNCATE</literal>
+      commands.

~~

2. doc/src/sgml/ref/create_subscription.sgml - reword for consistency

@@ -319,6 +324,25 @@ CREATE SUBSCRIPTION <replaceable
class="parameter">subscription_name</replaceabl
    the parameter <literal>create_slot = false</literal>.  This is an
    implementation restriction that might be lifted in a future release.
   </para>
+
+  <para>
+   If any table in the publication has a <literal>WHERE</literal> clause, rows
+   that do not satisfy the <replaceable
class="parameter">expression</replaceable>
+   will not be published. If the subscription has several publications in which

For consistency, it would be better to reword this sentence about the
expression to be more similar to the one in CREATE PUBLICATION, which
now says:

+      If the optional <literal>WHERE</literal> clause is specified, rows for
+      which the <replaceable class="parameter">expression</replaceable> returns
+      false or null will not be published. Note that parentheses are required
+      around the expression. It has no effect on <literal>TRUNCATE</literal>
+      commands.

~~

3. src/backend/catalog/pg_publication.c - whitespace

+rowfilter_walker(Node *node, Relation relation)
+{
+ char    *errdetail_msg = NULL;
+
+ if (node == NULL)
+ return false;
+
+
+ if (IsRowFilterSimpleExpr(node))

Remove the extra blank line.

~~

4. src/backend/executor/execReplication.c - move code

+ bad_rfcolnum = GetRelationPublicationInfo(rel, true);
+
+ /*
+ * It is only safe to execute UPDATE/DELETE when all columns referenced in
+ * the row filters from publications which the relation is in are valid,
+ * which means all referenced columns are part of REPLICA IDENTITY, or the
+ * table do not publish UPDATES or DELETES.
+ */
+ if (AttributeNumberIsValid(bad_rfcolnum))

I felt that the bad_rfcolnum assignment belongs below the large
comment explaining this logic.

~~

5. src/backend/executor/execReplication.c - fix typo

+ /*
+ * It is only safe to execute UPDATE/DELETE when all columns referenced in
+ * the row filters from publications which the relation is in are valid,
+ * which means all referenced columns are part of REPLICA IDENTITY, or the
+ * table do not publish UPDATES or DELETES.
+ */

Typo: "table do not publish" -> "table does not publish"

~~

6. src/backend/replication/pgoutput/pgoutput.c - fix typo

+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+ /* Gather the rfnodes per pubaction of this publiaction. */
+ if (pub->pubactions.pubinsert)

Typo: "publiaction" --> "publication"

~~

7. src/backend/utils/cache/relcache.c - fix comment case

@@ -267,6 +271,19 @@ typedef struct opclasscacheent

 static HTAB *OpClassCache = NULL;

+/*
+ * Information used to validate the columns in the row filter expression. see
+ * rowfilter_column_walker for details.
+ */

Typo: "see" --> "See"

~~

8. src/backend/utils/cache/relcache.c - "row-filter"

For consistency with all other naming change all instances of
"row-filter" to "row filter" in this file.

~~

9. src/backend/utils/cache/relcache.c - fix typo

~~

10. src/backend/utils/cache/relcache.c - comment confused wording?

Function GetRelationPublicationInfo:

+ /*
+ * For a partition, if pubviaroot is true, check if any of the
+ * ancestors are published. If so, note down the topmost ancestor
+ * that is published via this publication, the row filter
+ * expression on which will be used to filter the partition's
+ * changes. We could have got the topmost ancestor when collecting
+ * the publication oids, but that will make the code more
+ * complicated.
+ */

Typo: Probably "on which' --> "of which" ?

~~

11. src/backend/utils/cache/relcache.c - GetRelationPublicationActions

Something seemed slightly fishy with the code doing the memcpy,
because IIUC is possible for the GetRelationPublicationInfo function
to return without setting the relation->rd_pubactions. Is it just
missing an Assert or maybe a comment to say such a scenario is not
possible in this case because the is_publishable_relation was already
tested?

Currently, it just seems a little bit too sneaky.

~~

12. src/include/parser/parse_node.h - This change is unrelated to row-filtering.

@@ -79,7 +79,7 @@ typedef enum ParseExprKind
  EXPR_KIND_CALL_ARGUMENT, /* procedure argument in CALL */
  EXPR_KIND_COPY_WHERE, /* WHERE condition in COPY FROM */
  EXPR_KIND_GENERATED_COLUMN, /* generation expression for a column */
- EXPR_KIND_CYCLE_MARK, /* cycle mark value */
+ EXPR_KIND_CYCLE_MARK /* cycle mark value */
 } ParseExprKind;

This change is unrelated to Row-Filtering so ought to be removed from
this patch. Soon I will post a separate thread to fix this
independently on HEAD.

~~

13. src/include/utils/rel.h - comment typos

@@ -164,6 +164,13 @@ typedef struct RelationData
  PublicationActions *rd_pubactions; /* publication actions */

  /*
+ * true if the columns referenced in row filters from all the publications
+ * the relation is in are part of replica identity, or the publication
+ * actions do not include UPDATE and DELETE.
+ */

Some minor rewording of the comment:

"true" --> "True".
"part of replica identity" --> "part of the replica identity"
"UPDATE and DELETE" --> "UPDATE or DELETE"

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Wed, Jan 5, 2022 at 4:34 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> I have reviewed again the source code for v58-0001.
>
> Below are my review comments.
>
> Actually, I intend to fix most of these myself for v59*, so this post
> is just for the record.
>
> v58-0001 Review Comments
> ========================
>
> ~~
>
> 9. src/backend/utils/cache/relcache.c - fix typo
>

(Oops. The previous post omitted the detail for this comment #9)

- * If we know everything is replicated, there is no point to check for
- * other publications.
+ * If the publication action include UPDATE and DELETE and
+ * validate_rowfilter flag is true, validates that any columns
+ * referenced in the filter expression are part of REPLICA IDENTITY
+ * index.

Typo: "If the publication action include UPDATE and DELETE" --> "If
the publication action includes UPDATE or DELETE"

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Dec 22, 2021 at 5:26 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Dec 20, 2021 at 9:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Dec 20, 2021 at 8:41 AM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > Thanks for the comments, I agree with all the comments.
> > > Attach the V49 patch set, which addressed all the above comments on the 0002
> > > patch.
> > >
> >
> > Few comments/suugestions:
> > ======================
> > 1.
> > + Oid publish_as_relid = InvalidOid;
> > +
> > + /*
> > + * For a partition, if pubviaroot is true, check if any of the
> > + * ancestors are published. If so, note down the topmost ancestor
> > + * that is published via this publication, the row filter
> > + * expression on which will be used to filter the partition's
> > + * changes. We could have got the topmost ancestor when collecting
> > + * the publication oids, but that will make the code more
> > + * complicated.
> > + */
> > + if (pubform->pubviaroot && relation->rd_rel->relispartition)
> > + {
> > + if (pubform->puballtables)
> > + publish_as_relid = llast_oid(ancestors);
> > + else
> > + publish_as_relid = GetTopMostAncestorInPublication(pubform->oid,
> > +    ancestors);
> > + }
> > +
> > + if (publish_as_relid == InvalidOid)
> > + publish_as_relid = relid;
> >
> > I think you can initialize publish_as_relid as relid and then later
> > override it if required. That will save the additional check of
> > publish_as_relid.
> >
>
> Fixed in v51* [1]
>
> > 2. I think your previous version code in GetRelationPublicationActions
> > was better as now we have to call memcpy at two places.
> >
>
> Fixed in v51* [1]
>
> > 3.
> > +
> > + if (list_member_oid(GetRelationPublications(ancestor),
> > + puboid) ||
> > + list_member_oid(GetSchemaPublications(get_rel_namespace(ancestor)),
> > + puboid))
> > + {
> > + topmost_relid = ancestor;
> > + }
> >
> > I think here we don't need to use braces ({}) as there is just a
> > single statement in the condition.
> >
>
> Fixed in v51* [1]
>
> > 4.
> > +#define IDX_PUBACTION_n 3
> > + ExprState    *exprstate[IDX_PUBACTION_n]; /* ExprState array for row filter.
> > +    One per publication action. */
> > ..
> > ..
> >
> > I think we can have this define outside the structure. I don't like
> > this define name, can we name it NUM_ROWFILTER_TYPES or something like
> > that?
> >
>
> Partly fixed in v51* [1], I've changed the #define name but I did not
> move it. The adjacent comment talks about these ExprState caches and
> explains the reason why the number is 3. So if I move the #define then
> half that comment would have to move with it. I thought it is better
> to keep all the related parts grouped together with the one
> explanatory comment, but if you still want the #define moved please
> confirm and I can do it in a future version.
>

Yeah, I would prefer it to be moved. You can move the part of the
comment suggesting three pubactions can be used for row filtering.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
vignesh C
Date:
On Tue, Jan 4, 2022 at 9:58 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here is the v58* patch set:
>
> Main changes from v57* are
> 1. Couple of review comments fixed
>
> ~~
>
> Review comments (details)
> =========================
>
> v58-0001 (main)
> - PG docs updated as suggested [Alvaro, Euler 26/12]
>
> v58-0002 (new/old tuple)
> - pgoutput_row_filter_init refactored as suggested [Wangw 30/12] #3
> - re-ran pgindent
>
> v58-0003 (tab, dump)
> - no change
>
> v58-0004 (refactor transformations)
> - minor changes to commit message

Few comments:
1) We could include namespace names along with the relation name, to make it
clearer to the user in case the user specified tables with the same name
from different schemas:
+       /* Disallow duplicate tables if there are any with row filters. */
+       if (t->whereClause || list_member_oid(relids_with_rf, myrelid))
+               ereport(ERROR,
+                       (errcode(ERRCODE_DUPLICATE_OBJECT),
+                        errmsg("conflicting or redundant WHERE clauses for table \"%s\"",
+                               RelationGetRelationName(rel))));
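
For example, the message could be schema-qualified along these lines (a sketch):

errmsg("conflicting or redundant WHERE clauses for table \"%s.%s\"",
       get_namespace_name(RelationGetNamespace(rel)),
       RelationGetRelationName(rel))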

2) A few includes are not required; I could compile without them:
#include "executor/executor.h" in pgoutput.c,
#include "parser/parse_clause.h",
#include "parser/parse_relation.h" and #include "utils/ruleutils.h" in
relcache.c and
#include "parser/parse_node.h" in pg_publication.h

3) I felt the 0004-Row-Filter-refactor-transformations patch can be merged
into the 0001 patch, since most of the changes are from the 0001 patch, and
the functions which are moved from pg_publication.c to publicationcmds.c
can be handled in the 0001 patch.

4) Should this be posted as a separate patch in a new thread, as it is
not part of row filtering:
--- a/src/include/parser/parse_node.h
+++ b/src/include/parser/parse_node.h
@@ -79,7 +79,7 @@ typedef enum ParseExprKind
        EXPR_KIND_CALL_ARGUMENT,        /* procedure argument in CALL */
        EXPR_KIND_COPY_WHERE,           /* WHERE condition in COPY FROM */
        EXPR_KIND_GENERATED_COLUMN, /* generation expression for a column */
-       EXPR_KIND_CYCLE_MARK,           /* cycle mark value */
+       EXPR_KIND_CYCLE_MARK            /* cycle mark value */
 } ParseExprKind;

5) This message will be logged for each tuple; if there are millions of
records it will get logged millions of times, so we could remove it:
+       /* update requires a new tuple */
+       Assert(newtuple);
+
+       elog(DEBUG3, "table \"%s.%s\" has row filter",
+            get_namespace_name(get_rel_namespace(RelationGetRelid(relation))),
+            get_rel_name(relation->rd_id));

Regards,
Vignesh



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jan 5, 2022 at 11:04 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
>
> 11. src/backend/utils/cache/relcache.c - GetRelationPublicationActions
>
> Something seemed slightly fishy with the code doing the memcpy,
> because IIUC is possible for the GetRelationPublicationInfo function
> to return without setting the relation->rd_pubactions. Is it just
> missing an Assert or maybe a comment to say such a scenario is not
> possible in this case because the is_publishable_relation was already
> tested?
>

I think it would be good to have an Assert for a valid value of
relation->rd_pubactions before doing the memcpy. Alternatively, in
function GetRelationPublicationInfo(), we can have an Assert when
rd_rfcol_valid is true. I think we can add comments atop
GetRelationPublicationInfo() about pubactions.
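
i.e., roughly (a sketch of the suggested Assert, not the exact patch code):

/* Populate the cache if needed, then copy the cached pubactions. */
(void) GetRelationPublicationInfo(relation, false);
Assert(relation->rd_pubactions != NULL);
memcpy(pubactions, relation->rd_pubactions, sizeof(PublicationActions));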

>
> 13. src/include/utils/rel.h - comment typos
>
> @@ -164,6 +164,13 @@ typedef struct RelationData
>   PublicationActions *rd_pubactions; /* publication actions */
>
>   /*
> + * true if the columns referenced in row filters from all the publications
> + * the relation is in are part of replica identity, or the publication
> + * actions do not include UPDATE and DELETE.
> + */
>
> Some minor rewording of the comment:
>
...
> "UPDATE and DELETE" --> "UPDATE or DELETE"
>

The existing comment seems correct to me. Hou-San can confirm it,
as I think this was written by him.

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Wednesday, January 5, 2022 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Wed, Jan 5, 2022 at 11:04 AM Peter Smith <smithpb2250@gmail.com>
> wrote:
> >
> >
> > 11. src/backend/utils/cache/relcache.c - GetRelationPublicationActions
> >
> > Something seemed slightly fishy with the code doing the memcpy,
> > because IIUC it is possible for the GetRelationPublicationInfo function
> > to return without setting the relation->rd_pubactions. Is it just
> > missing an Assert or maybe a comment to say such a scenario is not
> > possible in this case because the is_publishable_relation was already
> > tested?
> >
> 
> I think it would be good to have an Assert for a valid value of
> relation->rd_pubactions before doing memcpy. Alternatively, in
> function, GetRelationPublicationInfo(), we can have an Assert when
> rd_rfcol_valid is true. I think we can add comments atop
> GetRelationPublicationInfo about pubactions.
> 
> >
> > 13. src/include/utils/rel.h - comment typos
> >
> > @@ -164,6 +164,13 @@ typedef struct RelationData
> >   PublicationActions *rd_pubactions; /* publication actions */
> >
> >   /*
> > + * true if the columns referenced in row filters from all the
> > + publications
> > + * the relation is in are part of replica identity, or the
> > + publication
> > + * actions do not include UPDATE and DELETE.
> > + */
> >
> > Some minor rewording of the comment:
> >
> ...
> > "UPDATE and DELETE" --> "UPDATE or DELETE"
> >
> 
> The existing comment seems correct to me. Hou-San can confirm it once as I
> think this is written by him.

I think the code comment is trying to say
"the publication does not include UPDATE and also does not include DELETE".
I am not too sure about the grammar, but I noticed there are some other
places in the code that use "no updates or deletes", so maybe it's fine to
change it to "UPDATE or DELETE".

Best regards,
Hou zj

Re: row filtering for logical replication

From:
Amit Kapila
Date:
On Tue, Dec 28, 2021 at 6:33 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Mon, Dec 27, 2021 9:19 PM Hou Zhijie <houzj.fnst@fujitsu.com> wrote:
> > On Mon, Dec 27, 2021 9:16 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> > wrote:
> > > On Thur, Dec 23, 2021 4:28 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > Here is the v54* patch set:
> > >
> > > Attach the v55 patch set which add the following testcases in 0002 patch.
>
> When reviewing the row filter patch, I found a few things that could be improved.
> 1) We could transform the same row filter expression twice when
>    ALTER PUBLICATION ... SET TABLE WHERE (...). Because we invoke
>    GetTransformedWhereClause in both AlterPublicationTables() and
>    publication_add_relation(). I was thinking it might be better if we only
>    transform the expression once in AlterPublicationTables().
>
> 2) When transforming the expression, we didn’t set the correct p_sourcetext.
>    Since we need to transform several expressions which belong to different
>    relations, I think it might be better to pass queryString down to the actual
>    transform function and set p_sourcetext to the actual queryString.
>

I have tried the following few examples to check the error_position,
and it seems to show the correct position even without your 0004 patch.
postgres=# create publication pub for table t1 where (10);
ERROR:  argument of PUBLICATION WHERE must be type boolean, not type integer
LINE 1: create publication pub for table t1 where (10);
                                                    ^

Also, transformPubWhereClauses() seems to be returning the same list
as it was passed to it. Do we really need to return anything from
transformPubWhereClauses()?

--
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From:
Amit Kapila
Date:
On Wed, Jan 5, 2022 at 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 28, 2021 at 6:33 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Mon, Dec 27, 2021 9:19 PM Hou Zhijie <houzj.fnst@fujitsu.com> wrote:
> > > On Mon, Dec 27, 2021 9:16 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> > > wrote:
> > > > On Thur, Dec 23, 2021 4:28 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > > Here is the v54* patch set:
> > > >
> > > > Attach the v55 patch set which add the following testcases in 0002 patch.
> >
> > When reviewing the row filter patch, I found a few things that could be improved.
> > 1) We could transform the same row filter expression twice when
> >    ALTER PUBLICATION ... SET TABLE WHERE (...). Because we invoke
> >    GetTransformedWhereClause in both AlterPublicationTables() and
> >    publication_add_relation(). I was thinking it might be better if we only
> >    transform the expression once in AlterPublicationTables().
> >
> > 2) When transforming the expression, we didn’t set the correct p_sourcetext.
> >    Since we need to transform several expressions which belong to different
> >    relations, I think it might be better to pass queryString down to the actual
> >    transform function and set p_sourcetext to the actual queryString.
> >
>
> I have tried the following few examples to check the error_position
> and it seems to be showing correct position without your 0004 patch.
> postgres=# create publication pub for table t1 where (10);
> ERROR:  argument of PUBLICATION WHERE must be type boolean, not type integer
> LINE 1: create publication pub for table t1 where (10);
>                                                     ^
>

After reading another related email [1], I now understand why the
error position could vary even though it shows the correct location in
the above example.

> Also, transformPubWhereClauses() seems to be returning the same list
> as it was passed to it. Do we really need to return anything from
> transformPubWhereClauses()?
>

One more point about this function: the patch seems to be doing some
work even when the WHERE clause is not specified, which can be avoided.

Another minor comment:
+static bool pgoutput_row_filter(enum ReorderBufferChangeType changetype,

Do we need to specify the 'enum' type before changetype parameter?

[1] - https://www.postgresql.org/message-id/1513381.1640626456%40sss.pgh.pa.us

--
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From:
Alvaro Herrera
Date:
BTW I think it's not great to commit with the presented split.  We would
have non-trivial short-lived changes for no good reason (0002 in
particular).  I think this whole series should be a single patch, with
the commit message being a fusion of messages explaining in full what
the functional change is, listing all the authors together.  Having a
commit message like in 0001 where all the distinct changes are explained
in separate sections with each section listing its own author, does not
sound very useful or helpful.

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end." (2nd Commandment for C programmers)



Re: row filtering for logical replication

From:
Peter Smith
Date:
FYI - v58 is currently known to be broken due to a recent commit [1].

I plan to post a v59* later today to address this as well as other
recent review comments.

------
[1] https://github.com/postgres/postgres/commit/6ce16088bfed97f982f66a9dc17b8364df289e4d

Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From:
Peter Smith
Date:
On Thu, Jan 6, 2022 at 1:10 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> BTW I think it's not great to commit with the presented split.  We would
> have non-trivial short-lived changes for no good reason (0002 in
> particular).  I think this whole series should be a single patch, with

Yes, we know that eventually these parts will be combined and
committed as a single patch. What you see now is still a
work-in-progress. The current separation has mostly been to help
multiple people collaborate without too much clashing. E.g., the 0002
patch has been kept separate just to make it easier to do performance
testing of that part in isolation.


> the commit message being a fusion of messages explaining in full what
> the functional change is, listing all the authors together.  Having a
> commit message like in 0001 where all the distinct changes are explained
> in separate sections with each section listing its own author, does not
> sound very useful or helpful.
>

Yes, the current v58-0001 commit message is just a combination of
previous historical patch comments as each of them got merged back
into the main patch. This message format was just a quick/easy way to
ensure that no information was accidentally lost along the way. We
understand that prior to the final commit this will all need to be
fused together just like you are suggesting.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From:
Peter Smith
Date:
On Wed, Jan 5, 2022 at 9:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
...

> Another minor comment:
> +static bool pgoutput_row_filter(enum ReorderBufferChangeType changetype,
>
> Do we need to specify the 'enum' type before changetype parameter?
>

That is because there is currently no typedef for the enum
ReorderBufferChangeType.

Of course, it is easy to add a typedef and then this 'enum' is not
needed in the signature, but I wasn't sure if adding a new typedef
strictly belonged as part of this Row-Filter patch or not.
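
For illustration, the change in question is tiny. A sketch (only the
first three enum members are shown, and the pgoutput_row_filter
signature is the one from the 0002 patch):

/* reorderbuffer.h currently declares the enum without a typedef: */
enum ReorderBufferChangeType
{
    REORDER_BUFFER_CHANGE_INSERT,
    REORDER_BUFFER_CHANGE_UPDATE,
    REORDER_BUFFER_CHANGE_DELETE
    /* remaining members elided for this sketch */
};

/* One added line would let signatures drop the 'enum' keyword: */
typedef enum ReorderBufferChangeType ReorderBufferChangeType;

static bool pgoutput_row_filter(ReorderBufferChangeType changetype,
                                EState *estate, Oid relid,
                                HeapTuple oldtuple, HeapTuple newtuple,
                                TupleTableSlot *slot, RelationSyncEntry *entry);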

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From:
Amit Kapila
Date:
On Thu, Jan 6, 2022 at 8:43 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Jan 5, 2022 at 9:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> ...
>
> > Another minor comment:
> > +static bool pgoutput_row_filter(enum ReorderBufferChangeType changetype,
> >
> > Do we need to specify the 'enum' type before changetype parameter?
> >
>
> That is because there is currently no typedef for the enum
> ReorderBufferChangeType.
>

But I see that the 0002 patch is already adding the required typedef.

> Of course, it is easy to add a typedef and then this 'enum' is not
> needed in the signature, but I wasn't sure if adding a new typedef
> strictly belonged as part of this Row-Filter patch or not.
>

I don't see any harm in doing so.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From:
Peter Smith
Date:
On Thu, Jan 6, 2022 at 9:29 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> FYI - v58 is currently known to be broken due to a recent commit [1].
>
> I plan to post a v59* later today to address this as well as other
> recent review comments.
>
> ------
> [1] https://github.com/postgres/postgres/commit/6ce16088bfed97f982f66a9dc17b8364df289e4d
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia.


Here is the v59* patch set:

Main changes from v58* are
0. Rebase to HEAD (needed after recent commit [1])
1. Multiple review comments addressed

~~

Details
=======

v58-0001 (main)
- Fixed some typos for commit message
- Made PG docs wording more consistent [5/1 Peter] #1,#2
- Modified tablesync SQL using Vignesh improvements [22/12 Amit] and
Tang improvements [internal]
- Fixed whitespace [5/1 Peter] #3
- Moved code below comment [5/1 Peter] #4
- Fixed typos/wording in comments [5/1 Peter] #5,#6,#7,#8,#9,#10,#13
- Removed parse_node.h from this patch [5/1 Peter] #12, [5/1 Vignesh] #4
- Used for_each_from macro in tablesync [5/1 Wangw] #1
- Reverted unnecessary signature change of get_rel_sync_entry [5/1 Wangw] #2
- Moved #define outside of struct [5/1 Amit #define]

v58-0002 (new/old tuple)
- Modified signature of pgoutput_row_file_update_check [29/12 Tang] #3
- Removed unnecessary assignments of *action [5/1 Amit *action]

v58-0003 (tab, dump)
- no change

v58-0004 (refactor transformations)
- no change

------
[1] https://github.com/postgres/postgres/commit/6ce16088bfed97f982f66a9dc17b8364df289e4d
[22/12 Amit]
https://www.postgresql.org/message-id/CAA4eK1JNRE1dQR_xQT-2pFFHMTXzb%3DCf68Dw3N_5swvrz0D8tw%40mail.gmail.com
[29/12 Tang]
https://www.postgresql.org/message-id/OS0PR01MB611317903619FE04C42AD1ECFB449%40OS0PR01MB6113.jpnprd01.prod.outlook.com
[5/1 Amit #define]
https://www.postgresql.org/message-id/CAA4eK1K5%3DFZ47va1NjTrSJADCf91%3D251LtvqBxNjt4vtZGjPGw%40mail.gmail.com
[5/1 Amit *action]
https://www.postgresql.org/message-id/CAA4eK1Ktt5GrzM8hHWn9htg_Cfn-7y0VN6zFFyqQM4FxEjc5Rg%40mail.gmail.com
[5/1 Peter]
https://www.postgresql.org/message-id/CAHut%2BPvp_O%2BZQf11kOyhO80YHUQnPQZMDRrm2ce%2BryY36H_TPw%40mail.gmail.com
[5/1 Vignesh]
https://www.postgresql.org/message-id/CALDaNm13yVPH0EcObv4tCHLQfUwjfvPFh8c-nd3Ldg71Y9es7A%40mail.gmail.com
[5/1 Wangw]
https://www.postgresql.org/message-id/OS3PR01MB6275ADE2B0EDED067C136D539E4B9%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From:
"Euler Taveira"
Date:
On Thu, Jan 6, 2022, at 1:18 AM, Amit Kapila wrote:
> On Thu, Jan 6, 2022 at 8:43 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Wed, Jan 5, 2022 at 9:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > ...
> >
> > > Another minor comment:
> > > +static bool pgoutput_row_filter(enum ReorderBufferChangeType changetype,
> > >
> > > Do we need to specify the 'enum' type before changetype parameter?
> > >
> >
> > That is because there is currently no typedef for the enum
> > ReorderBufferChangeType.
> >
>
> But I see that the 0002 patch is already adding the required typedef.
IMO we shouldn't reuse ReorderBufferChangeType. For a long-term solution, it is
fragile. ReorderBufferChangeType has values that do not matter for row filter
and it relies on the fact that REORDER_BUFFER_CHANGE_INSERT,
REORDER_BUFFER_CHANGE_UPDATE and REORDER_BUFFER_CHANGE_DELETE are the first 3
values from the enum, otherwise, it breaks rfnodes and no_filters in
pgoutput_row_filter(). I suggest a separate enum that contains only these 3
values.

enum RowFilterPublishAction {
   PUBLISH_ACTION_INSERT,
   PUBLISH_ACTION_UPDATE,
   PUBLISH_ACTION_DELETE
};


--
Euler Taveira

Re: row filtering for logical replication

From:
Peter Smith
Date:
On Wed, Jan 5, 2022 at 4:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
...

> > > 4.
> > > +#define IDX_PUBACTION_n 3
> > > + ExprState    *exprstate[IDX_PUBACTION_n]; /* ExprState array for row filter.
> > > +    One per publication action. */
> > > ..
> > > ..
> > >
> > > I think we can have this define outside the structure. I don't like
> > > this define name, can we name it NUM_ROWFILTER_TYPES or something like
> > > that?
> > >
> >
> > Partly fixed in v51* [1], I've changed the #define name but I did not
> > move it. The adjacent comment talks about these ExprState caches and
> > explains the reason why the number is 3. So if I move the #define then
> > half that comment would have to move with it. I thought it is better
> > to keep all the related parts grouped together with the one
> > explanatory comment, but if you still want the #define moved please
> > confirm and I can do it in a future version.
> >
>
> Yeah, I would prefer it to be moved. You can move the part of the
> comment suggesting three pubactions can be used for row filtering.
>

Fixed in v59* [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPsiw9fbOUTpCMWirut1ZD5hbWk8_U9tZya4mG-YK%2Bfq8g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From:
Peter Smith
Date:
On Wed, Jan 5, 2022 at 4:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 4, 2022 at 12:15 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Fri, Dec 31, 2021 at 12:39 AM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > > > 3) v55-0002
> > > > +static bool pgoutput_row_filter_update_check(enum
> > > > ReorderBufferChangeType changetype, Relation relation,
> > > > +
> > > >        HeapTuple oldtuple, HeapTuple newtuple,
> > > > +
> > > >        RelationSyncEntry *entry, ReorderBufferChangeType *action);
> > > >
> > > > Do we need parameter changetype here? I think it could only be
> > > > REORDER_BUFFER_CHANGE_UPDATE.
> > >
> > > I didn't change this, I think it might be better to wait for Ajin's opinion.
> >
> > I agree with Tang. AFAIK there is no problem removing that redundant
> > param as suggested. BTW - the Assert within that function is also
> > incorrect because the only possible value is
> > REORDER_BUFFER_CHANGE_UPDATE. I will make these fixes in a future
> > version.
> >
>
> That sounds fine to me too. One more thing is that you don't need to
> modify the action in case it remains update as the caller has already
> set that value. Currently, we are modifying it as update at two places
> in this function, we can remove both of those and keep the comments
> intact for the later update.
>

Fixed in v59* [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPsiw9fbOUTpCMWirut1ZD5hbWk8_U9tZya4mG-YK%2Bfq8g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From:
Peter Smith
Date:
On Wed, Jan 5, 2022 at 4:34 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> I have reviewed again the source code for v58-0001.
>
> Below are my review comments.
>
> Actually, I intend to fix most of these myself for v59*, so this post
> is just for records.
>
> v58-0001 Review Comments
> ========================
>
> 1. doc/src/sgml/ref/alter_publication.sgml - reword for consistency
>
> +      name to explicitly indicate that descendant tables are included. If the
> +      optional <literal>WHERE</literal> clause is specified, rows that do not
> +      satisfy the <replaceable class="parameter">expression</replaceable> will
> +      not be published. Note that parentheses are required around the
>
> For consistency, it would be better to reword this sentence about the
> expression to be more similar to the one in CREATE PUBLICATION, which
> now says:
>
> +      If the optional <literal>WHERE</literal> clause is specified, rows for
> +      which the <replaceable class="parameter">expression</replaceable> returns
> +      false or null will not be published. Note that parentheses are required
> +      around the expression. It has no effect on <literal>TRUNCATE</literal>
> +      commands.
>

Updated in v59* [1]

> ~~
>
> 2. doc/src/sgml/ref/create_subscription.sgml - reword for consistency
>
> @@ -319,6 +324,25 @@ CREATE SUBSCRIPTION <replaceable
> class="parameter">subscription_name</replaceabl
>     the parameter <literal>create_slot = false</literal>.  This is an
>     implementation restriction that might be lifted in a future release.
>    </para>
> +
> +  <para>
> +   If any table in the publication has a <literal>WHERE</literal> clause, rows
> +   that do not satisfy the <replaceable
> class="parameter">expression</replaceable>
> +   will not be published. If the subscription has several publications in which
>
> For consistency, it would be better to reword this sentence about the
> expression to be more similar to the one in CREATE PUBLICATION, which
> now says:
>
> +      If the optional <literal>WHERE</literal> clause is specified, rows for
> +      which the <replaceable class="parameter">expression</replaceable> returns
> +      false or null will not be published. Note that parentheses are required
> +      around the expression. It has no effect on <literal>TRUNCATE</literal>
> +      commands.
>

Updated in v59* [1]

> ~~
>
> 3. src/backend/catalog/pg_publication.c - whitespace
>
> +rowfilter_walker(Node *node, Relation relation)
> +{
> + char    *errdetail_msg = NULL;
> +
> + if (node == NULL)
> + return false;
> +
> +
> + if (IsRowFilterSimpleExpr(node))
>
> Remove the extra blank line.
>

Fixed in v59* [1]

> ~~
>
> 4. src/backend/executor/execReplication.c - move code
>
> + bad_rfcolnum = GetRelationPublicationInfo(rel, true);
> +
> + /*
> + * It is only safe to execute UPDATE/DELETE when all columns referenced in
> + * the row filters from publications which the relation is in are valid,
> + * which means all referenced columns are part of REPLICA IDENTITY, or the
> + * table do not publish UPDATES or DELETES.
> + */
> + if (AttributeNumberIsValid(bad_rfcolnum))
>
> I felt that the bad_rfcolnum assignment belongs below the large
> comment explaining this logic.
>

Fixed in v59* [1]

> ~~
>
> 5. src/backend/executor/execReplication.c - fix typo
>
> + /*
> + * It is only safe to execute UPDATE/DELETE when all columns referenced in
> + * the row filters from publications which the relation is in are valid,
> + * which means all referenced columns are part of REPLICA IDENTITY, or the
> + * table do not publish UPDATES or DELETES.
> + */
>
> Typo: "table do not publish" -> "table does not publish"
>

Fixed in v59* [1]

> ~~
>
> 6. src/backend/replication/pgoutput/pgoutput.c - fix typo
>
> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> + /* Gather the rfnodes per pubaction of this publiaction. */
> + if (pub->pubactions.pubinsert)
>
> Typo: "publiaction" --> "publication"
>

Fixed in v59* [1]

> ~~
>
> 7. src/backend/utils/cache/relcache.c - fix comment case
>
> @@ -267,6 +271,19 @@ typedef struct opclasscacheent
>
>  static HTAB *OpClassCache = NULL;
>
> +/*
> + * Information used to validate the columns in the row filter expression. see
> + * rowfilter_column_walker for details.
> + */
>
> Typo: "see" --> "See"
>

Fixed in v59* [1]

> ~~
>
> 8. src/backend/utils/cache/relcache.c - "row-filter"
>
> For consistency with all other naming change all instances of
> "row-filter" to "row filter" in this file.
>

Fixed in v59* [1]

> ~~
>
> 9. src/backend/utils/cache/relcache.c - fix typo
>

Fixed in v59* [1]

> ~~
>
> 10. src/backend/utils/cache/relcache.c - comment confused wording?
>
> Function GetRelationPublicationInfo:
>
> + /*
> + * For a partition, if pubviaroot is true, check if any of the
> + * ancestors are published. If so, note down the topmost ancestor
> + * that is published via this publication, the row filter
> + * expression on which will be used to filter the partition's
> + * changes. We could have got the topmost ancestor when collecting
> + * the publication oids, but that will make the code more
> + * complicated.
> + */
>
> Typo: Probably "on which' --> "of which" ?
>

Fixed in v59* [1]

> ~~
>
> 11. src/backend/utils/cache/relcache.c - GetRelationPublicationActions
>
> Something seemed slightly fishy with the code doing the memcpy,
> because IIUC it is possible for the GetRelationPublicationInfo function
> to return without setting the relation->rd_pubactions. Is it just
> missing an Assert or maybe a comment to say such a scenario is not
> possible in this case because the is_publishable_relation was already
> tested?
>
> Currently, it just seems a little bit too sneaky.
>

TODO

> ~~
>
> 12. src/include/parser/parse_node.h - This change is unrelated to row-filtering.
>
> @@ -79,7 +79,7 @@ typedef enum ParseExprKind
>   EXPR_KIND_CALL_ARGUMENT, /* procedure argument in CALL */
>   EXPR_KIND_COPY_WHERE, /* WHERE condition in COPY FROM */
>   EXPR_KIND_GENERATED_COLUMN, /* generation expression for a column */
> - EXPR_KIND_CYCLE_MARK, /* cycle mark value */
> + EXPR_KIND_CYCLE_MARK /* cycle mark value */
>  } ParseExprKind;
>
> This change is unrelated to Row-Filtering so ought to be removed from
> this patch. Soon I will post a separate thread to fix this
> independently on HEAD.
>

Fixed in v59* [1].

I started a separate thread for this problem.
See
https://www.postgresql.org/message-id/flat/CAHut%2BPsqr93nng7diTXxtUD636u7ytA%3DMq2duRphs0CBzpfDTA%40mail.gmail.com

> ~~
>
> 13. src/include/utils/rel.h - comment typos
>
> @@ -164,6 +164,13 @@ typedef struct RelationData
>   PublicationActions *rd_pubactions; /* publication actions */
>
>   /*
> + * true if the columns referenced in row filters from all the publications
> + * the relation is in are part of replica identity, or the publication
> + * actions do not include UPDATE and DELETE.
> + */
>
> Some minor rewording of the comment:
>
> "true" --> "True".
> "part of replica identity" --> "part of the replica identity"
> "UPDATE and DELETE" --> "UPDATE or DELETE"
>

Fixed in v59* [1]

------
[1] https://www.postgresql.org/message-id/CAHut%2BPsiw9fbOUTpCMWirut1ZD5hbWk8_U9tZya4mG-YK%2Bfq8g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From:
Peter Smith
Date:
On Wed, Jan 5, 2022 at 5:01 PM vignesh C <vignesh21@gmail.com> wrote:
>
...
> 4) Should this be posted as a separate patch in a new thread, as it is
> not part of row filtering:
> --- a/src/include/parser/parse_node.h
> +++ b/src/include/parser/parse_node.h
> @@ -79,7 +79,7 @@ typedef enum ParseExprKind
>         EXPR_KIND_CALL_ARGUMENT,        /* procedure argument in CALL */
>         EXPR_KIND_COPY_WHERE,           /* WHERE condition in COPY FROM */
>         EXPR_KIND_GENERATED_COLUMN, /* generation expression for a column */
> -       EXPR_KIND_CYCLE_MARK,           /* cycle mark value */
> +       EXPR_KIND_CYCLE_MARK            /* cycle mark value */
>  } ParseExprKind;
>

Fixed in v59* [1]

I started a new thread (with patch) for this one. See
https://www.postgresql.org/message-id/flat/CAHut%2BPsqr93nng7diTXxtUD636u7ytA%3DMq2duRphs0CBzpfDTA%40mail.gmail.com

------
[1] https://www.postgresql.org/message-id/CAHut%2BPsiw9fbOUTpCMWirut1ZD5hbWk8_U9tZya4mG-YK%2Bfq8g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From:
Peter Smith
Date:
On Wed, Jan 5, 2022 at 1:05 PM wangw.fnst@fujitsu.com
<wangw.fnst@fujitsu.com> wrote:
>
> On Thu, Jan 4, 2022 at 00:54 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > Modified in v58 [1] as suggested
> Thanks for updating the patches.
> A few comments about v58-0001 and v58-0002.
>
> v58-0001
> 1.
> How about modifying the following loop in copy_table by using for_each_from
> instead of foreach?
> Like the invocation of for_each_from in function get_rule_expr.
> from:
>                 if (qual != NIL)
>                 {
>                         ListCell   *lc;
>                         bool            first = true;
>
>                         appendStringInfoString(&cmd, " WHERE ");
>                         foreach(lc, qual)
>                         {
>                                 char       *q = strVal(lfirst(lc));
>
>                                 if (first)
>                                         first = false;
>                                 else
>                                         appendStringInfoString(&cmd, " OR ");
>                                 appendStringInfoString(&cmd, q);
>                         }
>                         list_free_deep(qual);
>                 }
> change to:
>                 if (qual != NIL)
>                 {
>                         ListCell   *lc;
>                         char       *q = strVal(linitial(qual));
>
>                         appendStringInfo(&cmd, " WHERE %s", q);
>                         for_each_from(lc, qual, 1)
>                         {
>                                 q = strVal(lfirst(lc));
>                                 appendStringInfo(&cmd, " OR %s", q);
>                         }
>                         list_free_deep(qual);
>                 }
>

Modified as suggested in v59* [1]

> 2.
> I find the API of get_rel_sync_entry is modified.
> -get_rel_sync_entry(PGOutputData *data, Oid relid)
> +get_rel_sync_entry(PGOutputData *data, Relation relation)
> It looks like just moving the invocation of RelationGetRelid from outside into
> function get_rel_sync_entry. I am not sure whether this modification is
> necessary to this feature or not.
>

Fixed in v59* [1]. Removed the unnecessary changes.

> v58-0002
> 1.
> In the function pgoutput_row_filter_init, if no_filter is set, I think we do not
> need to add the row filter to the list (rfnodes).
> So how about changing the three conditions when adding row filters to rfnodes, like this:
> -                                       if (pub->pubactions.pubinsert)
> +                                       if (pub->pubactions.pubinsert && !no_filter[idx_ins])
>                                         {
>                                                 rfnode = stringToNode(TextDatumGetCString(rfdatum));
>                                                 rfnodes[idx_ins] = lappend(rfnodes[idx_ins], rfnode);
>                                         }
>

TODO.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPsiw9fbOUTpCMWirut1ZD5hbWk8_U9tZya4mG-YK%2Bfq8g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From:
"houzj.fnst@fujitsu.com"
Date:
On Thursday, January 6, 2022 8:10 PM Peter Smith <smithpb2250@gmail.com> wrote:
> On Thu, Jan 6, 2022 at 9:29 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > FYI - v58 is currently known to be broken due to a recent commit [1].
> >
> > I plan to post a v59* later today to address this as well as other
> > recent review comments.
> 
> 
> Here is the v59* patch set:

Attaching the v60 patch set.
Note that the 0004 patch has been merged into the 0001 patch.

Details
=======

V60-0001
- Skip the transformation if the WHERE clause is not specified (Amit [1])
- Change the return type of transformPubWhereClauses to "void" (Amit[1])
- Merge 0004 patch to 0001 patch (Vignesh [2])
- Remove unnecessary includes (Vignesh [2])
- Add an Assert for a valid value of relation->rd_pubactions before doing memcpy
  in GetRelationPublicationActions() and add some comments atop
  GetRelationPublicationInfo() (Amit [3])

V60-0002 (new/old tuple)
V60-0003 (tab, dump)
- no change

[1] https://www.postgresql.org/message-id/CAA4eK1Ky0z%3D%2BUznCUHOs--L%3DEs_EMmZ_rxNo8FH73%3D758sahsQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CALDaNm13yVPH0EcObv4tCHLQfUwjfvPFh8c-nd3Ldg71Y9es7A%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAA4eK1JgcNtmurzuTNw%3DFcNoJcODobx-y0FmohVQAce0-iitCA%40mail.gmail.com

Best regards,
Hou zj

Attachments

Re: row filtering for logical replication

From:
Amit Kapila
Date:
On Thu, Jan 6, 2022 at 6:42 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Jan 6, 2022, at 1:18 AM, Amit Kapila wrote:
>
> On Thu, Jan 6, 2022 at 8:43 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Wed, Jan 5, 2022 at 9:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > ...
> >
> > > Another minor comment:
> > > +static bool pgoutput_row_filter(enum ReorderBufferChangeType changetype,
> > >
> > > Do we need to specify the 'enum' type before changetype parameter?
> > >
> >
> > That is because there is currently no typedef for the enum
> > ReorderBufferChangeType.
> >
>
> But I see that the 0002 patch is already adding the required typedef.
>
> IMO we shouldn't reuse ReorderBufferChangeType. For a long-term solution, it is
> fragile. ReorderBufferChangeType has values that do not matter for row filter
> and it relies on the fact that REORDER_BUFFER_CHANGE_INSERT,
> REORDER_BUFFER_CHANGE_UPDATE and REORDER_BUFFER_CHANGE_DELETE are the first 3
> values from the enum, otherwise, it breaks rfnodes and no_filters in
> pgoutput_row_filter().
>

I think you mean to say it will break in pgoutput_row_filter_init(). I
see your point but OTOH, if we do what you are suggesting then don't
we need an additional mapping between ReorderBufferChangeType and
RowFilterPublishAction as row filter and pgoutput_change API need to
use those values.
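
In other words, something like this hypothetical helper (the
RowFilterPublishAction values are from your sketch; the function itself
is only an illustration of the extra mapping I mean):

static enum RowFilterPublishAction
map_changetype_pubaction(enum ReorderBufferChangeType changetype)
{
    switch (changetype)
    {
        case REORDER_BUFFER_CHANGE_INSERT:
            return PUBLISH_ACTION_INSERT;
        case REORDER_BUFFER_CHANGE_UPDATE:
            return PUBLISH_ACTION_UPDATE;
        case REORDER_BUFFER_CHANGE_DELETE:
            return PUBLISH_ACTION_DELETE;
        default:
            elog(ERROR, "unexpected change type %d", (int) changetype);
            return PUBLISH_ACTION_INSERT;   /* keep compiler quiet */
    }
}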

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From:
Peter Smith
Date:
On Wed, Jan 5, 2022 at 1:05 PM wangw.fnst@fujitsu.com
<wangw.fnst@fujitsu.com> wrote:
>
> On Thu, Jan 4, 2022 at 00:54 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > Modified in v58 [1] as suggested
> Thanks for updating the patches.
> A few comments about v58-0001 and v58-0002.
>
> v58-0001
> 1.
> How about modifying the following loop in copy_table by using for_each_from
> instead of foreach?
> Like the invocation of for_each_from in function get_rule_expr.
> from:
>                 if (qual != NIL)
>                 {
>                         ListCell   *lc;
>                         bool            first = true;
>
>                         appendStringInfoString(&cmd, " WHERE ");
>                         foreach(lc, qual)
>                         {
>                                 char       *q = strVal(lfirst(lc));
>
>                                 if (first)
>                                         first = false;
>                                 else
>                                         appendStringInfoString(&cmd, " OR ");
>                                 appendStringInfoString(&cmd, q);
>                         }
>                         list_free_deep(qual);
>                 }
> change to:
>                 if (qual != NIL)
>                 {
>                         ListCell   *lc;
>                         char       *q = strVal(linitial(qual));
>
>                         appendStringInfo(&cmd, " WHERE %s", q);
>                         for_each_from(lc, qual, 1)
>                         {
>                                 q = strVal(lfirst(lc));
>                                 appendStringInfo(&cmd, " OR %s", q);
>                         }
>                         list_free_deep(qual);
>                 }
>
> 2.
> I find the API of get_rel_sync_entry is modified.
> -get_rel_sync_entry(PGOutputData *data, Oid relid)
> +get_rel_sync_entry(PGOutputData *data, Relation relation)
> It looks like just moving the invocation of RelationGetRelid from outside into
> function get_rel_sync_entry. I am not sure whether this modification is
> necessary to this feature or not.
>
> v58-0002
> 1.
> In the function pgoutput_row_filter_init, if no_filter is set, I think we do not
> need to add the row filter to the list (rfnodes).
> So how about changing the three conditions when adding row filters to rfnodes, like this:
> -                                       if (pub->pubactions.pubinsert)
> +                                       if (pub->pubactions.pubinsert && !no_filter[idx_ins])
>                                         {
>                                                 rfnode = stringToNode(TextDatumGetCString(rfdatum));
>                                                 rfnodes[idx_ins] = lappend(rfnodes[idx_ins], rfnode);
>                                         }
>

I think there is no harm done with the current code, because even if
no_filter[xxx] was set, any gathered rfnodes[xxx] will be cleaned up
and ignored later anyway, so this change is not strictly necessary.

OTOH your suggestion could be a tiny bit more efficient in some cases
if there are many publications, so LGTM.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

От
Amit Kapila
Дата:
On Fri, Jan 7, 2022 at 9:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 6, 2022 at 6:42 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > IMO we shouldn't reuse ReorderBufferChangeType. For a long-term solution, it is
> > fragile. ReorderBufferChangeType has values that do not matter for row filter
> > and it relies on the fact that REORDER_BUFFER_CHANGE_INSERT,
> > REORDER_BUFFER_CHANGE_UPDATE and REORDER_BUFFER_CHANGE_DELETE are the first 3
> > values from the enum, otherwise, it breaks rfnodes and no_filters in
> > pgoutput_row_filter().
> >
>
> I think you mean to say it will break in pgoutput_row_filter_init(). I
> see your point but OTOH, if we do what you are suggesting then don't
> we need an additional mapping between ReorderBufferChangeType and
> RowFilterPublishAction as row filter and pgoutput_change API need to
> use those values.
>

Can't we use 0,1,2 as indexes for rfnodes/no_filters based on change
type as they are local variables as that will avoid the fragileness
you are worried about. I am slightly hesitant to introduce new enum
when we are already using reorder buffer change type in pgoutput.c.
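
For example (an illustrative sketch only; the idx_ins/idx_upd/idx_del
constants already used in the patch play exactly this role):

/* Fixed local indexes, decoupled from the enum's numeric values. */
#define idx_ins 0
#define idx_upd 1
#define idx_del 2

List       *rfnodes[3] = {NIL, NIL, NIL};
bool        no_filter[3] = {false, false, false};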

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From:
Peter Smith
Date:
Below are some review comments for the v60 patches.

V60-0001 Review Comments
========================

1. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
unnecessary parens

+ /*
+ * NOTE: Multiple publication row filters have already been combined to a
+ * single exprstate (for this pubaction).
+ */
+ if (entry->exprstate[changetype])
+ {
+ /* Evaluates row filter */
+ result = pgoutput_row_filter_exec_expr(entry->exprstate[changetype], ecxt);
+ }

This is a single statement so it may be better to rearrange by
removing the unnecessary parens and moving the comment.

e.g.

/*
* Evaluate the row filter.
*
* NOTE: Multiple publication row filters have already been combined to a
* single exprstate (for this pubaction).
*/
if (entry->exprstate[changetype])
result = pgoutput_row_filter_exec_expr(entry->exprstate[changetype], ecxt);


v60-0002 Review Comments
========================

1. Commit Message

Some parts of this message could do with some minor re-wording. Here
are some suggestions:

1a.
BEFORE:
Also tuples that have been deformed will be cached in slots to avoid
multiple deforming of tuples.
AFTER:
Also, tuples that are deformed will be cached in slots to avoid unnecessarily
deforming again.

1b.
BEFORE:
However, after the UPDATE the new
tuple doesn't satisfy the row filter then, from the data consistency
perspective, that row should be removed on the subscriber.
AFTER:
However, after the UPDATE the new
tuple doesn't satisfy the row filter, so from a data consistency
perspective, that row should be removed on the subscriber.

1c.
BEFORE:
Keep this row on the subscriber is undesirable because it...
AFTER
Leaving this row on the subscriber is undesirable because it...

1d.
BEFORE:
However, after the UPDATE
the new tuple does satisfies the row filter then, from the data
consistency perspective, that row should inserted on the subscriber.
AFTER:
However, after the UPDATE
the new tuple does satisfy the row filter, so from a data
consistency perspective, that row should be inserted on the subscriber.

1e.
"Subsequent UPDATE or DELETE statements have no effect."

Why won't they have an effect? The first impression is that the newly
updated tuple now matches the filter, so I think this part needs some
more detailed explanation. I saw there are some slightly different
details in the header comment of the
pgoutput_row_filter_update_check function - does it help?
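
For reference, my understanding of the intended semantics is something
like this (table, filter, and values are invented for illustration):

CREATE TABLE t (a int PRIMARY KEY);
CREATE PUBLICATION pub FOR TABLE t WHERE (a > 10);

-- Old row (a = 5) fails the filter; new row (a = 20) satisfies it.
-- The subscriber never received the old row, so the publisher
-- transforms this UPDATE into an INSERT.
UPDATE t SET a = 20 WHERE a = 5;

-- Old row (a = 20) satisfied the filter; new row (a = 5) does not.
-- The publisher transforms this UPDATE into a DELETE so the row is
-- removed on the subscriber.
UPDATE t SET a = 5 WHERE a = 20;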

~~

2. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter decl

+static bool pgoutput_row_filter(enum ReorderBufferChangeType changetype,
+                                EState *relation, Oid relid,
+                                HeapTuple oldtuple, HeapTuple newtuple,
+                                TupleTableSlot *slot, RelationSyncEntry *entry);

The 2nd parameter should be called "EState *estate" (not "EState *relation").

~~

3. src/backend/replication/pgoutput/pgoutput.c -
pgoutput_row_filter_update_check header comment

This function header comment looks very similar to an extract from the
0002 comment message. So any wording improvements made to the commit
message (see review comment #1) should be made here in this comment
too.

~~

4. src/backend/replication/pgoutput/pgoutput.c -
pgoutput_row_filter_update_check inconsistencies, typos, row-filter

4a. The comments here are mixing terms like "oldtuple" / "old tuple" /
"old_tuple", and "newtuple" / "new tuple" / "new_tuple". I feel it
would read better just saying "old tuple" and "new tuple" within the
comments.

4b. Typo: "row-filter" --> "row filter" (for consistency with every
other usage where the hyphen is removed)

~~

5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
unnecessary parens

+ /*
+ * The default behavior for UPDATEs is to use the new tuple for row
+ * filtering. If the UPDATE requires a transformation, the new tuple will
+ * be replaced by the transformed tuple before calling this routine.
+ */
+ if (newtuple || oldtuple)
+ ExecStoreHeapTuple(newtuple ? newtuple : oldtuple,
+                    ecxt->ecxt_scantuple, false);
+ else
+ {
+ ecxt->ecxt_scantuple = slot;
+ }

The else is a single statement so the parentheses are not needed here.

~~

6. src/include/replication/logicalproto.h

+extern void logicalrep_write_update_cached(StringInfo out, TransactionId xid,
+    Relation rel, TupleTableSlot *oldtuple, TupleTableSlot *newtuple,
+    bool binary);

This extern seems unused ???

--------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From:
Amit Kapila
Date:
On Fri, Jan 7, 2022 at 12:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 7, 2022 at 9:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jan 6, 2022 at 6:42 PM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > IMO we shouldn't reuse ReorderBufferChangeType. For a long-term solution, it is
> > > fragile. ReorderBufferChangeType has values that do not matter for row filter
> > > and it relies on the fact that REORDER_BUFFER_CHANGE_INSERT,
> > > REORDER_BUFFER_CHANGE_UPDATE and REORDER_BUFFER_CHANGE_DELETE are the first 3
> > > values from the enum, otherwise, it breaks rfnodes and no_filters in
> > > pgoutput_row_filter().
> > >
> >
> > I think you mean to say it will break in pgoutput_row_filter_init(). I
> > see your point but OTOH, if we do what you are suggesting then don't
> > we need an additional mapping between ReorderBufferChangeType and
> > RowFilterPublishAction as row filter and pgoutput_change API need to
> > use those values.
> >
>
> Can't we use 0,1,2 as indexes for rfnodes/no_filters based on change
> type as they are local variables as that will avoid the fragileness
> you are worried about. I am slightly hesitant to introduce new enum
> when we are already using reorder buffer change type in pgoutput.c.
>

Euler, I have one more question about this patch for you. I see that
in the patch we are calling coerce_to_target_type() in
pgoutput_row_filter_init_expr() but do we really need it? We
already do that via
transformPubWhereClauses->transformWhereClause->coerce_to_boolean
before storing the WHERE clause expression. It is not clear to me why
it is required; we might want to add a comment if it is.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From:
"Euler Taveira"
Date:
On Fri, Jan 7, 2022, at 3:35 AM, Amit Kapila wrote:
> On Fri, Jan 7, 2022 at 9:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jan 6, 2022 at 6:42 PM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > IMO we shouldn't reuse ReorderBufferChangeType. For a long-term solution, it is
> > > fragile. ReorderBufferChangeType has values that do not matter for row filter
> > > and it relies on the fact that REORDER_BUFFER_CHANGE_INSERT,
> > > REORDER_BUFFER_CHANGE_UPDATE and REORDER_BUFFER_CHANGE_DELETE are the first 3
> > > values from the enum, otherwise, it breaks rfnodes and no_filters in
> > > pgoutput_row_filter().
> > >
> >
> > I think you mean to say it will break in pgoutput_row_filter_init(). I
> > see your point but OTOH, if we do what you are suggesting then don't
> > we need an additional mapping between ReorderBufferChangeType and
> > RowFilterPublishAction as row filter and pgoutput_change API need to
> > use those values.
> >
>
> Can't we use 0,1,2 as indexes for rfnodes/no_filters based on change
> type as they are local variables as that will avoid the fragileness
> you are worried about. I am slightly hesitant to introduce new enum
> when we are already using reorder buffer change type in pgoutput.c.
WFM. I used numbers + comments in a previous patch set [1]. I suggested the enum
because each command would be self-explanatory.



--
Euler Taveira

Re: row filtering for logical replication

From:
"Euler Taveira"
Date:
On Fri, Jan 7, 2022, at 6:05 AM, Amit Kapila wrote:
> Euler, I have one more question about this patch for you. I see that
> in the patch we are calling coerce_to_target_type() in
> pgoutput_row_filter_init_expr() but do we really need it? We
> already do that via
> transformPubWhereClauses->transformWhereClause->coerce_to_boolean
> before storing the WHERE clause expression. It is not clear to me why
> it is required; we might want to add a comment if it is.
It is redundant. It seems to be an additional safeguard that should be
removed. Good catch.


--
Euler Taveira

Re: row filtering for logical replication

From:
Amit Kapila
Date:
On Fri, Jan 7, 2022 at 11:20 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Jan 7, 2022, at 6:05 AM, Amit Kapila wrote:
>
> Euler, I have one more question about this patch for you. I see that
> in the patch we are calling coerce_to_target_type() in
> pgoutput_row_filter_init_expr() but do we really need it? We
> already do that via
> transformPubWhereClauses->transformWhereClause->coerce_to_boolean
> before storing the WHERE clause expression. It is not clear to me why
> it is required; we might want to add a comment if it is.
>
> It is redundant. It seems to be an additional safeguard that should be removed.
> Good catch.
>

Thanks for the confirmation. Actually, it was raised by Vignesh in his
email [1].

[1] - https://www.postgresql.org/message-id/CALDaNm1_JVg_hqoGex_FVca_HPF46n9oDDB9dsp1SrPuaVpp-w%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From:
"houzj.fnst@fujitsu.com"
Date:
On Friday, January 7, 2022 11:50 AM Hou, Zhijie wrote:
> Attaching the v60 patch set.
> Note that the 0004 patch has been merged into the 0001 patch.
> 

Attaching the v61 patch set.

Details
=======

V61-0001
- Remove the redundant coerce_to_target_type() in           
  pgoutput_row_filter_init_expr().                       (Vignesh)
- Check no_filter before adding row filter to list(rfnodes). (Wangw [1])

V61-0002
(Peter's comments 1 ~ 6 except 1e from [2])
- remove unnecessary parens in pgoutput_row_filter
- update commit message
- update code comments
- remove unused function declaration

V61-0003
- handle the Tab completion of "WITH(" in
  "create publication pub1 for table t1 where (c1 > 10)":  (Vignesh)

[1] https://www.postgresql.org/message-id/CAHut%2BPtzEjqfzdSvouNPm1E60qzzF%2BDS%3DwcocLLDvPYCpLXB9g%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPvC7XFEJDFpEdaAneNUNv9Lo8O9SjEQyzUsBObrdkwTaw%40mail.gmail.com

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From:
"houzj.fnst@fujitsu.com"
Date:
On Friday, January 7, 2022 3:40 PM Peter Smith <smithpb2250@gmail.com> wrote:
> Below are some review comments for the v60 patches.
> 
> 1e.
> "Subsequent UPDATE or DELETE statements have no effect."
> 
> Why won't they have an effect? The first impression is that the newly updated
> tuple now matches the filter, so I think this part needs some more detailed
> explanation. I saw there are some slightly different details in the header
> comment of the pgoutput_row_filter_update_check function - does it help?

Thanks for the comments! I have addressed all the comments except 1e, which
I will think over and update in the next version.

Best regards,
Hou zj

RE: row filtering for logical replication

From:
"houzj.fnst@fujitsu.com"
Date:
On Wednesday, January 5, 2022 2:01 PM vignesh C <vignesh21@gmail.com> wrote:
> On Tue, Jan 4, 2022 at 9:58 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here is the v58* patch set:
> >
> > Main changes from v57* are
> > 1. Couple of review comments fixed
> >
> > ~~
> >
> > Review comments (details)
> > =========================
> >
> > v58-0001 (main)
> > - PG docs updated as suggested [Alvaro, Euler 26/12]
> >
> > v58-0002 (new/old tuple)
> > - pgoutput_row_filter_init refactored as suggested [Wangw 30/12] #3
> > - re-ran pgindent
> >
> > v58-0003 (tab, dump)
> > - no change
> >
> > v58-0004 (refactor transformations)
> > - minor changes to commit message
> 
> Few comments:

Thanks for the comments!

> 1) We could include namespace names along with the relation to make it more
> clear to the user if the user had specified tables having same table names from
> different schemas:

Since most of the error messages in publicationcmds.c and pg_publication.c
don't include namespace names along with the relation name,
I am not sure it is necessary to add this, so I didn't change it in the patch.

> 5) This message will be logged for each tuple; if there are millions of records it
> will get logged millions of times, so we could remove it:
> +       /* update requires a new tuple */
> +       Assert(newtuple);
> +
> +       elog(DEBUG3, "table \"%s.%s\" has row filter",
> +                get_namespace_name(get_rel_namespace(RelationGetRelid(relation))),
> +                get_rel_name(relation->rd_id));

Since the message is logged only at DEBUG3 and could be useful for
debugging purposes, I didn't remove it in the new version of the patch.

Best regards,
Hou zj


Re: row filtering for logical replication

From:
Amit Kapila
Date:
On Mon, Jan 10, 2022 at 8:41 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Attaching the v61 patch set.
>

Few comments:
==============
1.
pgoutput_row_filter()
{
..
+
+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+ rfnode = n_filters > 1 ? makeBoolExpr(OR_EXPR, rfnodes[idx], -1) :
+          linitial(rfnodes[idx]);
+ entry->exprstate[idx] = pgoutput_row_filter_init_expr(rfnode);
+ MemoryContextSwitchTo(oldctx);
..
}

rel_sync_cache_relation_cb()
{
..
+ if (entry->exprstate[idx] != NULL)
+ {
+ pfree(entry->exprstate[idx]);
+ entry->exprstate[idx] = NULL;
+ }
..
}

I think this can leak memory as just freeing 'exprstate' is not
sufficient. It contains other allocated memory as well like for
'steps'. Apart from that we might allocate other memory as well for
generating expression state. I think it would be better if we can have
another memory context (say cache_expr_cxt) in RelationSyncEntry and
allocate it the first time we need it and then reset it instead of
doing pfree of 'exprstate'. Also, we can free this new context in
pgoutput_shutdown before destroying RelationSyncCache. (A rough sketch
of this idea follows after comment #6 below.)

2. If we do the above, we can use this new context at all other places
in the patch where it is using CacheMemoryContext.

3.
@@ -1365,6 +1785,7 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
 {
  HASH_SEQ_STATUS status;
  RelationSyncEntry *entry;
+ MemoryContext oldctx;

  /*
  * We can get here if the plugin was used in SQL interface as the
@@ -1374,6 +1795,8 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
  if (RelationSyncCache == NULL)
  return;

+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+
  /*
  * There is no way to find which entry in our cache the hash belongs to so
  * mark the whole cache as invalid.
@@ -1392,6 +1815,8 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
  entry->pubactions.pubdelete = false;
  entry->pubactions.pubtruncate = false;
  }
+
+ MemoryContextSwitchTo(oldctx);
 }

Is there a reason for the above change?

4.
+#define SET_NO_FILTER_FOR_CURRENT_PUBACTIONS \
+ if (pub->pubactions.pubinsert) \
+ no_filter[idx_ins] = true; \
+ if (pub->pubactions.pubupdate) \
+ no_filter[idx_upd] = true; \
+ if (pub->pubactions.pubdelete) \
+ no_filter[idx_del] = true

I don't see the need for this macro and it makes code less readable. I
think we can instead move this code to a function to avoid duplicate
code.

5.
Multiple publications might have multiple row filters for
+ * this relation. Since row filter usage depends on the DML operation,
+ * there are multiple lists (one for each operation) which row filters
+ * will be appended.

There seems to be a typo in the above sentence.
/which row filters/to which row filters

6.
+ /*
+ * Find if there are any row filters for this relation. If there are,
+ * then prepare the necessary ExprState and cache it in
+ * entry->exprstate.
+ *
+ * NOTE: All publication-table mappings must be checked.
+ *
+ * NOTE: If the relation is a partition and pubviaroot is true, use
+ * the row filter of the topmost partitioned table instead of the row
+ * filter of its own partition.
+ *
+ * NOTE: Multiple publications might have multiple row filters for
+ * this relation. Since row filter usage depends on the DML operation,
+ * there are multiple lists (one for each operation) which row filters
+ * will be appended.
+ *
+ * NOTE: FOR ALL TABLES implies "don't use row filter expression" so
+ * it takes precedence.
+ *
+ * NOTE: ALL TABLES IN SCHEMA implies "don't use row filter
+ * expression" if the schema is the same as the table schema.
+ */
+ foreach(lc, data->publications)

Let's not add NOTE for each of these points but instead expand the
first sentence as "Find if there are any row filters for this
relation. If there are, then prepare the necessary ExprState and cache
it in entry->exprstate. To build an expression state, we need to
ensure the following:"
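
As mentioned in comment #1, here is a rough sketch of the memory
context idea (the context name and the exact placement are only
suggestions):

/* In pgoutput_row_filter_init(), create the context on first use: */
if (entry->cache_expr_cxt == NULL)
    entry->cache_expr_cxt = AllocSetContextCreate(CacheMemoryContext,
                                                  "Row filter expressions",
                                                  ALLOCSET_DEFAULT_SIZES);

oldctx = MemoryContextSwitchTo(entry->cache_expr_cxt);
rfnode = n_filters > 1 ? makeBoolExpr(OR_EXPR, rfnodes[idx], -1) :
         linitial(rfnodes[idx]);
entry->exprstate[idx] = pgoutput_row_filter_init_expr(rfnode);
MemoryContextSwitchTo(oldctx);

/* In rel_sync_cache_relation_cb(), reset it instead of doing pfree(): */
if (entry->cache_expr_cxt != NULL)
    MemoryContextReset(entry->cache_expr_cxt);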

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From:
Peter Smith
Date:
On Thu, Sep 23, 2021 at 10:33 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> I finally had time to take a closer look at the patch again, so here's
> some review comments. The thread is moving fast, so chances are some of
> the comments are obsolete or were already raised in the past.
>
>
...

> 10) WHERE expression vs. data type
>
> Seem ATExecAlterColumnType might need some changes, because changing a
> data type for column referenced by the expression triggers this:
>
>    test=# alter table t alter COLUMN c type text;
>    ERROR:  unexpected object depending on column: publication of
>            table t in publication p
>
>

I reproduced this same error message using the following steps.

[postgres@CentOS7-x64 ~]$ psql -d test_pub
psql (15devel)
Type "help" for help.

test_pub=# create table t1(a text primary key);
CREATE TABLE
test_pub=# create publication p1 for table t1 where (a = '123');
CREATE PUBLICATION
test_pub=# \d+ t1
                                          Table "public.t1"
 Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | text |           | not null |         | extended |             |              |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (a)
Publications:
    "p1" WHERE (a = '123'::text)
Access method: heap

test_pub=# alter table t1 alter column a type varchar;
2022-01-10 08:39:52.106 AEDT [2066] ERROR:  unexpected object
depending on column: publication of table t1 in publication p1
2022-01-10 08:39:52.106 AEDT [2066] STATEMENT:  alter table t1 alter
column a type varchar;
ERROR:  unexpected object depending on column: publication of table t1
in publication p1
test_pub=#

~~

But the message looks OK. What exactly was your expectation for this
review comment?

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From:
Peter Smith
Date:
The current documentation updates (e.g. in the v61 patch) for the
Row-Filter are good, but they are mostly about syntax changes and
accompanying notes for the new WHERE clause etc. There are also notes
for subscription tablesync behaviour etc.

But these new docs feel a bit like scattered fragments - there is
nowhere that gives an overview of this feature.

IMO there should be some overview for the whole Row-Filtering feature.
The overview text would be similar to the text of the 0001/0002 commit
messages, and it would summarise how everything works, describe the
UPDATE transformations (which currently seem not to be documented anywhere
in the PG docs?), and maybe include a few useful filtering examples.

e.g. Perhaps there should be an entirely new page (section 31 ?)
devoted just to "Logical Replication Filtering" - with subsections for
"Row-Filtering" and "Col-Filtering".

Thoughts?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From:
"houzj.fnst@fujitsu.com"
Date:
On Mon, Jan 10, 2022 2:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Jan 10, 2022 at 8:41 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Attaching the v61 patch set.
> >
> 
> Few comments:
> ==============
> 1.
> pgoutput_row_filter()
> {
> ..
> +
> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> + rfnode = n_filters > 1 ? makeBoolExpr(OR_EXPR, rfnodes[idx], -1) :
> +          linitial(rfnodes[idx]);
> + entry->exprstate[idx] = pgoutput_row_filter_init_expr(rfnode);
> + MemoryContextSwitchTo(oldctx);
> ..
> }
> 
> rel_sync_cache_relation_cb()
> {
> ..
> + if (entry->exprstate[idx] != NULL)
> + {
> + pfree(entry->exprstate[idx]);
> + entry->exprstate[idx] = NULL;
> + }
> ..
> }
> 
> I think this can leak memory as just freeing 'exprstate' is not
> sufficient. It contains other allocated memory as well like for
> 'steps'. Apart from that we might allocate other memory as well for
> generating expression state. I think it would be better if we can have
> another memory context (say cache_expr_cxt) in RelationSyncEntry and
> allocate it the first time we need it and then reset it instead of
> doing pfree of 'exprstate'. Also, we can free this new context in
> pgoutput_shutdown before destroying RelationSyncCache.
> 2. If we do the above, we can use this new context at all other places
> in the patch where it is using CacheMemoryContext.

Changed.

> 3.
> @@ -1365,6 +1785,7 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
>  {
>   HASH_SEQ_STATUS status;
>   RelationSyncEntry *entry;
> + MemoryContext oldctx;
> 
>   /*
>   * We can get here if the plugin was used in SQL interface as the
> @@ -1374,6 +1795,8 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
>   if (RelationSyncCache == NULL)
>   return;
> 
> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> +
>   /*
>   * There is no way to find which entry in our cache the hash belongs to so
>   * mark the whole cache as invalid.
> @@ -1392,6 +1815,8 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
>   entry->pubactions.pubdelete = false;
>   entry->pubactions.pubtruncate = false;
>   }
> +
> + MemoryContextSwitchTo(oldctx);
>  }
> 
> Is there a reason for the above change?

Reverted this change.

> 4.
> +#define SET_NO_FILTER_FOR_CURRENT_PUBACTIONS \
> + if (pub->pubactions.pubinsert) \
> + no_filter[idx_ins] = true; \
> + if (pub->pubactions.pubupdate) \
> + no_filter[idx_upd] = true; \
> + if (pub->pubactions.pubdelete) \
> + no_filter[idx_del] = true
> 
> I don't see the need for this macro and it makes code less readable. I
> think we can instead move this code to a function to avoid duplicate
> code.

I slightly refactored the code in this function to avoid duplication.

> 5.
> Multiple publications might have multiple row filters for
> + * this relation. Since row filter usage depends on the DML operation,
> + * there are multiple lists (one for each operation) which row filters
> + * will be appended.
> 
> There seems to be a typo in the above sentence.
> /which row filters/to which row filters

Changed.

> 6.
> + /*
> + * Find if there are any row filters for this relation. If there are,
> + * then prepare the necessary ExprState and cache it in
> + * entry->exprstate.
> + *
> + * NOTE: All publication-table mappings must be checked.
> + *
> + * NOTE: If the relation is a partition and pubviaroot is true, use
> + * the row filter of the topmost partitioned table instead of the row
> + * filter of its own partition.
> + *
> + * NOTE: Multiple publications might have multiple row filters for
> + * this relation. Since row filter usage depends on the DML operation,
> + * there are multiple lists (one for each operation) which row filters
> + * will be appended.
> + *
> + * NOTE: FOR ALL TABLES implies "don't use row filter expression" so
> + * it takes precedence.
> + *
> + * NOTE: ALL TABLES IN SCHEMA implies "don't use row filter
> + * expression" if the schema is the same as the table schema.
> + */
> + foreach(lc, data->publications)
> 
> Let's not add NOTE for each of these points but instead expand the
> first sentence as "Find if there are any row filters for this
> relation. If there are, then prepare the necessary ExprState and cache
> it in entry->exprstate. To build an expression state, we need to
> ensure the following:"

Changed.

Attach the v62 patch set which addresses the above comments and slightly
adjusts the commit message in the 0002 patch.

Best regards,
Hou zj

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jan 11, 2022 at 6:52 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> e.g. Perhaps there should be an entirely new page (section 31 ?)
> devoted just to "Logical Replication Filtering" - with subsections for
> "Row-Filtering" and "Col-Filtering".
>

+1. I think we need to be careful to avoid any duplicate updates in
docs, other than that I think this will be helpful.

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Tuesday, January 11, 2022 10:16 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> 
> Attach the v62 patch set which address the above comments and slightly
> adjust the commit message in 0002 patch.
> 

I noticed a possible problem with the Row-Filter tablesync SQL, related
to partitioned tables.

If a parent table is published with publish_via_partition_root off, its child
table should be treated as having no row filter when combining the row filters
with OR. But with the current SQL, this publication is ignored.

For example:
create table parent (a int) partition by range (a);
create table child partition of parent default;
create publication puba for table parent with (publish_via_partition_root=false);
create publication pubb for table child where(a>10);

Using current SQL in patch:
(table child oid is 16387)
SELECT DISTINCT pg_get_expr(prqual, prrelid) FROM pg_publication p
INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid)
WHERE pr.prrelid = 16387 AND p.pubname IN ( 'puba', 'pubb' )
AND NOT (select bool_or(puballtables)
FROM pg_publication
WHERE pubname in ( 'puba', 'pubb' ))
AND NOT EXISTS (SELECT 1
FROM pg_publication_namespace pn, pg_class c, pg_publication p
WHERE c.oid = 16387 AND c.relnamespace = pn.pnnspid AND p.oid = pn.pnpubid AND p.pubname IN ( 'puba', 'pubb' ));
pg_get_expr
-------------
 (a > 10)
(1 row)


I think there should be no filter in this case, because "puba" publishes
table child without a row filter. Thoughts?
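
(As a sanity check, pg_get_publication_tables() already expands "puba" to its
leaf partitions because publish_via_partition_root is false - a sketch:)

SELECT relid::regclass FROM pg_get_publication_tables('puba');
 relid
-------
 child
(1 row)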

To fix this problem, we could use the pg_get_publication_tables function in the
tablesync SQL to determine which publications the table belongs to. How about
the following SQL? It would return NULL for "puba".

SELECT DISTINCT pg_get_expr(pr.prqual, pr.prrelid)
FROM pg_publication p
LEFT OUTER JOIN pg_publication_rel pr
    ON (p.oid = pr.prpubid AND pr.prrelid = 16387),
LATERAL pg_get_publication_tables(p.pubname) GPT
WHERE GPT.relid = 16387 AND p.pubname IN ( 'puba', 'pubb' );
 pg_get_expr
-------------
 (a > 10)

(2 rows)

Regards,
Tang

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jan 11, 2022 at 1:32 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> On Tuesday, January 11, 2022 10:16 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> >
> > Attach the v62 patch set which address the above comments and slightly
> > adjust the commit message in 0002 patch.
> >
>
> I saw a possible problem about Row-Filter tablesync SQL, which is related
> to partition table.
>
> If a parent table is published with publish_via_partition_root off, its child
> table should be taken as no row filter when combining the row filters with OR.
> But when using the current SQL, this publication is ignored.
>
> For example:
> create table parent (a int) partition by range (a);
> create table child partition of parent default;
> create publication puba for table parent with (publish_via_partition_root=false);
> create publication pubb for table child where(a>10);
>
> Using current SQL in patch:
> (table child oid is 16387)
> SELECT DISTINCT pg_get_expr(prqual, prrelid) FROM pg_publication p
> INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid)
> WHERE pr.prrelid = 16387 AND p.pubname IN ( 'puba', 'pubb' )
> AND NOT (select bool_or(puballtables)
> FROM pg_publication
> WHERE pubname in ( 'puba', 'pubb' ))
> AND NOT EXISTS (SELECT 1
> FROM pg_publication_namespace pn, pg_class c, pg_publication p
> WHERE c.oid = 16387 AND c.relnamespace = pn.pnnspid AND p.oid = pn.pnpubid AND p.pubname IN ( 'puba', 'pubb' ));
> pg_get_expr
> -------------
>  (a > 10)
> (1 row)
>
>
> I think there should be no filter in this case, because "puba" publish table child
> without row filter. Thoughts?
>

I also think so.

> To fix this problem, we could use pg_get_publication_tables function in
> tablesync SQL to filter which publications the table belongs to. How about the
> following SQL, it would return NULL for "puba".
>
> SELECT DISTINCT pg_get_expr(pr.prqual, pr.prrelid)
> FROM pg_publication p
> LEFT OUTER JOIN pg_publication_rel pr
>     ON (p.oid = pr.prpubid AND pr.prrelid = 16387),
> LATERAL pg_get_publication_tables(p.pubname) GPT
> WHERE GPT.relid = 16387 AND p.pubname IN ( 'puba', 'pubb' );
>  pg_get_expr
> -------------
>  (a > 10)
>
> (2 rows)
>

One advantage of this query is that it seems to have simplified the
original query by removing NOT conditions. I haven't tested this yet
but logically it appears correct to me.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
I just looked at 0002 because of Justin Pryzby's comment in the column
filtering thread, and realized that the pgoutput row filtering has a
very strange API, which receives both heap tuples and slots; and we seem
to convert to and from slots in seemingly unprincipled ways.  I don't
think this is going to fly.  I think it's OK for the initial entry into
pgoutput to be HeapTuple (but only because that's what
ReorderBufferTupleBuf has), but it should be converted to a slot right when
it enters pgoutput, and then used as a slot throughout.

I think this is mostly sensible in 0001 (which was evidently developed
earlier), but 0002 makes a nonsensical change to the API, with poor
results.

(This is one of the reasons I've been saying that these patches should
be squashed together -- so that we can see that the overall API
transformations we're making are sensible.)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: row filtering for logical replication

From
Peter Smith
Date:
Here are my review comments for v62-0001

~~~

1. src/backend/catalog/pg_publication.c - GetTopMostAncestorInPublication

@@ -276,17 +276,46 @@ GetPubPartitionOptionRelations(List *result,
PublicationPartOpt pub_partopt,
 }

 /*
+ * Check if any of the ancestors are published in the publication. If so,
+ * return the relid of the topmost ancestor that is published via this
+ * publication, otherwise InvalidOid.
+ */

The GetTopMostAncestorInPublication function header comment seems to
be saying the same thing twice. I think it can be simplified

Suggested function comment:

"Return the relid of the topmost ancestor that is published via this
publication, otherwise return InvalidOid."

~~~

2. src/backend/commands/publicationcmds.c - AlterPublicationTables

- /* Calculate which relations to drop. */
+ /*
+ * In order to recreate the relation list for the publication, look
+ * for existing relations that need not be dropped.
+ */

Suggested minor rewording of comment:

"... look for existing relations that do not need to be dropped."

~~~

3. src/backend/commands/publicationcmds.c - AlterPublicationTables

+
+ /*
+ * Look if any of the new set of relations match with the
+ * existing relations in the publication. Additionally, if the
+ * relation has an associated where-clause, check the
+ * where-clauses also match. Drop the rest.
+ */
  if (RelationGetRelid(newpubrel->relation) == oldrelid)
Suggested minor rewording of comment:

"Look if any..." --> "Check if any..."

~~~

4. src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

+ /*
+ * It is only safe to execute UPDATE/DELETE when all columns referenced in
+ * the row filters from publications which the relation is in are valid,
+ * which means all referenced columns are part of REPLICA IDENTITY, or the
+ * table does not publish UPDATES or DELETES.
+ */

Suggested minor rewording of comment:

"... in are valid, which means all ..." --> "... in are valid - i.e.
when all ..."

~~~

5. src/backend/parser/gram.y - ColId OptWhereClause

+ /*
+ * The OptWhereClause must be stored here but it is
+ * valid only for tables. If the ColId was mistakenly
+ * not a table this will be detected later in
+ * preprocess_pubobj_list() and an error is thrown.
+ */

Suggested minor rewording of comment:

"... and an error is thrown." --> "... and an error will be thrown."
~~~

6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter

+ * ALL TABLES IN SCHEMA implies "don't use row filter expression" if
+ * the schema is the same as the table schema.
+ */
+ foreach(lc, data->publications)
+ {
+ Publication *pub = lfirst(lc);
+ HeapTuple rftuple = NULL;
+ Datum rfdatum = 0;
+ bool rfisnull;
+ bool pub_no_filter = false;
+ List    *schemarelids = NIL;

Not all of these variables need to be declared at the top of the loop
like this. Consider moving some of them (e.g. rfisnull, schemarelids)
lower down to declare only in the scope that uses them.

~~~

7. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter

+ if (pub->pubactions.pubinsert)
+ no_filter[idx_ins] = true;
+ if (pub->pubactions.pubupdate)
+ no_filter[idx_upd] = true;
+ if (pub->pubactions.pubdelete)
+ no_filter[idx_del] = true;

This code can be simplified I think. e.g.

no_filter[idx_ins] |= pub->pubactions.pubinsert;
no_filter[idx_upd] |= pub->pubactions.pubupdate;
no_filter[idx_del] |= pub->pubactions.pubdelete;

~~~

8. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry

@@ -1245,9 +1650,6 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
  entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate;
  }

- if (entry->pubactions.pubinsert && entry->pubactions.pubupdate &&
- entry->pubactions.pubdelete && entry->pubactions.pubtruncate)
- break;
  }

I was not sure why that code was removed. Is it deliberate/correct?

~~~

9. src/backend/utils/cache/relcache.c - rowfilter_column_walker

@@ -5521,39 +5535,98 @@ RelationGetExclusionInfo(Relation indexRelation,
  MemoryContextSwitchTo(oldcxt);
 }

+
+
 /*
- * Get publication actions for the given relation.
+ * Check if any columns used in the row filter WHERE clause are not part of
+ * REPLICA IDENTITY and save the invalid column number in
+ * rf_context::invalid_rfcolnum.
  */

There is an extra blank line before the function comment.

~~~

10. src/backend/utils/cache/relcache.c - GetRelationPublicationInfo

+ if (HeapTupleIsValid(rftuple))
+ {
+ Datum rfdatum;
+ bool rfisnull;
+ Node    *rfnode;
+
+ context.pubviaroot = pubform->pubviaroot;
+ context.parentid = publish_as_relid;
+ context.relid = relid;
+
+ rfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, rftuple,
+   Anum_pg_publication_rel_prqual,
+   &rfisnull);
+
+ if (!rfisnull)
+ {
+ rfnode = stringToNode(TextDatumGetCString(rfdatum));
+ rfcol_valid = !rowfilter_column_walker(rfnode, &context);
+ invalid_rfcolnum = context.invalid_rfcolnum;
+ pfree(rfnode);
+ }
+
+ ReleaseSysCache(rftuple);
+ }

Those 3 assignments to the context.pubviaroot/parentid/relid can be
moved to be inside the if (!rfisnull) block, because IIUC they don't
get used otherwise. Or, maybe better to just leave as-is; I am not
sure what is best. Please consider.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
Here are some review comments for v62-0002

~~~

1. src/backend/replication/pgoutput/pgoutput.c -
pgoutput_row_filter_update_check

+ * If the change is to be replicated this function returns true, else false.
+ *
+ * Examples: Let's say the old tuple satisfies the row filter but the new tuple
+ * doesn't. Since the old tuple satisfies, the initial table synchronization
+ * copied this row (or another method was used to guarantee that there is data
+ * consistency).  However, after the UPDATE the new tuple doesn't satisfy the

The word "Examples:" should be on a line by itself; not merged with
the 1st example "Let's say...".

~~~

2. src/backend/replication/pgoutput/pgoutput.c -
pgoutput_row_filter_update_check

+ /*
+ * For updates, both the new tuple and old tuple needs to be checked
+ * against the row filter. The new tuple might not have all the replica
+ * identity columns, in which case it needs to be copied over from the old
+ * tuple.
+ */

Typo: "needs to be checked" --> "need to be checked"

~~~

3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init

Missing blank line before a couple of block comments, here:

bool pub_no_filter = false;
List    *schemarelids = NIL;
/*
* If the publication is FOR ALL TABLES then it is treated the
* same as if this table has no row filters (even if for other
* publications it does).
*/
if (pub->alltables)
pub_no_filter = true;
/*
* If the publication is FOR ALL TABLES IN SCHEMA and it overlaps
* with the current relation in the same schema then this is also
* treated same as if this table has no row filters (even if for
* other publications it does).
*/
else

~~~

4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init

This function was refactored out of the code from pgoutput_row_filter
in the v62-0001 patch. So probably there are multiple comments from my
earlier v62-0001 review [1] of that pgoutput_row_filter function that
now also apply to this pgoutput_row_filter_init function.

~~~

5. src/tools/pgindent/typedefs.list - ReorderBufferChangeType

Actually, the typedef for ReorderBufferChangeType was added in
v62-0001, so this typedef change should've been done in patch 0001 and
it can be removed from patch 0002.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPucFM3Bt-gaTT7Pr-Y_x%2BR0y%3DL7uqbhjPMUsSPhdLhRpA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jan 12, 2022 at 3:00 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> I just looked at 0002 because of Justin Pryzby's comment in the column
> filtering thread, and realized that the pgoutput row filtering has a
> very strange API, which receives both heap tuples and slots; and we seem
> to convert to and from slots in seemingly unprincipled ways.  I don't
> think this is going to fly.  I think it's OK for the initial entry into
> pgoutput to be HeapTuple (but only because that's what
> ReorderBufferTupleBuf has), but it should be converted a slot right when
> it enters pgoutput, and then used as a slot throughout.
>

Another thing that we can improve about 0002 is to unify the APIs
for row filtering for update and insert/delete. I find having separate
APIs a bit awkward.

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Wed, Jan 12, 2022 5:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jan 12, 2022 at 3:00 AM Alvaro Herrera <alvherre@alvh.no-ip.org>
> wrote:
> >
> > I just looked at 0002 because of Justin Pryzby's comment in the column
> > filtering thread, and realized that the pgoutput row filtering has a
> > very strange API, which receives both heap tuples and slots; and we
> > seem to convert to and from slots in seemingly unprincipled ways.  I
> > don't think this is going to fly.  I think it's OK for the initial
> > entry into pgoutput to be HeapTuple (but only because that's what
> > ReorderBufferTupleBuf has), but it should be converted a slot right
> > when it enters pgoutput, and then used as a slot throughout.
> >
> 
> One another thing that we can improve about 0002 is to unify the APIs for row
> filtering for update and insert/delete. I find having separate APIs a bit awkward.

Thanks for the comments.

Attach the v63 patch set which includes the following changes.

Based on Alvaro and Amit's suggestions:
- merged 0001 and 0002 into one patch.
- did some initial refactorings for the interface of row_filter functions.
  For now, it receives only slots.
- unify the APIs for row filtering for update and insert/delete

And addressed some comments received earlier.
- update some comments and some cosmetic changes.   (Peter)
- Add a new enum RowFilterPubAction used as the index of the filter expression  (Euler, Amit)
  array.
- Fix a bug where, when transforming UPDATE to INSERT, the patch didn't pass the (Euler)
  transformed tuple to logicalrep_write_insert.

Best regards,
Hou zj

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
Thanks for posting the merged v63.

Here are my review comments for the v63-0001 changes.

~~~

1. src/backend/replication/logical/proto.c - logicalrep_write_tuple

  TupleDesc desc;
- Datum values[MaxTupleAttributeNumber];
- bool isnull[MaxTupleAttributeNumber];
+ Datum    *values;
+ bool    *isnull;
  int i;
  uint16 nliveatts = 0;

Those separate declarations for values / isnull are not strictly
needed anymore, so those vars could be deleted. IIRC those were only
added before when there were both slots and tuples. OTOH, maybe you
prefer to keep it this way just for code readability?

~~~

2. src/backend/replication/pgoutput/pgoutput.c - typedef

+typedef enum RowFilterPubAction
+{
+ PUBACTION_INSERT,
+ PUBACTION_UPDATE,
+ PUBACTION_DELETE,
+ NUM_ROWFILTER_PUBACTIONS  /* must be last */
+} RowFilterPubAction;

This typedef is not currently used by any of the code.

So I think choices are:

- Option 1: remove the typedef, because nobody is using it.

- Option 2: keep the typedef, but use it! e.g. everywhere there is an
exprstate array index variable probably it should be declared as a
'RowFilterPubAction idx' instead of just 'int idx'.

I prefer option 2, but YMMV.

~~~

3. src/backend/replication/pgoutput/pgoutput.c - map_changetype_pubaction

After this recent v63 refactoring and merging of some APIs it seems
that the map_changetype_pubaction is now ONLY used by
pgoutput_row_filter function. So this can now be a static member of
pgoutput_row_filter function instead of being declared at file scope.

~~~

4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter comments

+
+/*
+ * Change is checked against the row filter, if any.
+ *
+ * If it returns true, the change is replicated, otherwise, it is not.
+ *
+ * FOR INSERT: evaluates the row filter for new tuple.
+ * FOR DELETE: evaluates the row filter for old tuple.
+ * For UPDATE: evaluates the row filter for old and new tuple. If both
+ * evaluations are true, it sends the UPDATE. If both evaluations are false, it
+ * doesn't send the UPDATE. If only one of the tuples matches the row filter
+ * expression, there is a data consistency issue. Fixing this issue requires a
+ * transformation.
+ *
+ * Transformations:
+ * Updates are transformed to inserts and deletes based on the
+ * old tuple and new tuple. The new action is updated in the
+ * action parameter. If not updated, action remains as update.
+ *
+ * Case 1: old-row (no match)    new-row (no match)  -> (drop change)
+ * Case 2: old-row (no match)    new row (match)     -> INSERT
+ * Case 3: old-row (match)       new-row (no match)  -> DELETE
+ * Case 4: old-row (match)       new row (match)     -> UPDATE
+ *
+ * If the change is to be replicated this function returns true, else false.
+ *
+ * Examples:

The function header comment says the same thing 2x about the return values.

The 1st text "If it returns true, the change is replicated, otherwise,
it is not." should be replaced by the better wording of the 2nd text
("If the change is to be replicated this function returns true, else
false."). Then, that 2nd text can be removed (from where it is later
in this same comment).
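
As an aside, a minimal SQL sketch of those four cases (hypothetical names,
assuming the patched behaviour; the filter column is the replica identity):

CREATE TABLE t (a int PRIMARY KEY);
CREATE PUBLICATION p FOR TABLE t WHERE (a > 10);

INSERT INTO t VALUES (5);          -- not replicated, fails the filter
UPDATE t SET a = 15 WHERE a = 5;   -- Case 2: replicated as INSERT
UPDATE t SET a = 20 WHERE a = 15;  -- Case 4: replicated as UPDATE
UPDATE t SET a = 5 WHERE a = 20;   -- Case 3: replicated as DELETE
UPDATE t SET a = 7 WHERE a = 5;    -- Case 1: change dropped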

~~~

5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter

+ ExprContext    *ecxt;
+ int filter_index = map_changetype_pubaction[*action];
+
+ /* *action is already assigned default by caller */
+ Assert(*action == REORDER_BUFFER_CHANGE_INSERT ||
+    *action == REORDER_BUFFER_CHANGE_UPDATE ||
+    *action == REORDER_BUFFER_CHANGE_DELETE);
+

Accessing the map_changetype_pubaction array should be done *after* the Assert.

~~~

6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter

Actually, instead of assigning filter_index and then referring to
entry->exprstate[filter_index] in multiple places, now the code might
be neater if we simply assign a local variable “filter_exprstate” like
below and use that instead.

ExprState *filter_exprstate;
...
filter_exprstate = entry->exprstate[map_changetype_pubaction[*action]];

Please consider what way you think is best.

~~~

7. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter

+ /*
+ * For the following occasions where there is only one tuple, we can
+ * evaluates the row filter for the not null tuple and return.
+ *
+ * For INSERT: we only have new tuple.
+ *
+ * For UPDATE: if no old tuple, it means none of the replica identity
+ * columns changed and this would reduce to a simple update. we only need
+ * to evaluate the row filter for new tuple.
+ *
+ * FOR DELETE: we only have old tuple.
+ */

There are several things not quite right with that comment:
a. I thought now it should refer to "slots" instead of "tuples"
b. Some of the upper/lowercase is wonky
c. Maybe it reads better without the ":"

Suggested replacement comment:

/*
* For the following occasions where there is only one slot, we can
* evaluate the row filter for the not-null slot and return.
*
* For INSERT we only have the new slot.
*
* For UPDATE if no old slot, it means none of the replica identity
* columns changed and this would reduce to a simple update. We only need
* to evaluate the row filter for the new slot.
*
* For DELETE we only have the old slot.
*/

~~~

8. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter

+ if (!new_slot || !old_slot)
+ {
+ ecxt->ecxt_scantuple = new_slot ? new_slot : old_slot;
+ result = pgoutput_row_filter_exec_expr(entry->exprstate[filter_index],
+    ecxt);
+
+ FreeExecutorState(estate);
+ PopActiveSnapshot();
+
+ return result;
+ }
+
+ tmp_new_slot = new_slot;
+ slot_getallattrs(new_slot);
+ slot_getallattrs(old_slot);

I think after this "if" condition then the INSERT, DELETE and simple
UPDATE are already handled. So, the remainder of the code is for
deciding what update transformation is needed etc.

I think there should be some block comment somewhere here to make that
more obvious.

~~~

9. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter

Many of the comments in this function are still referring to old/new
"tuple". Now that all the params are slots instead of tuples maybe now
all the comments should also refer to "slots" instead of "tuples".
Please search all the comments - e.g. including all the "Case 1:" ...
"Case 4:" comments.

~~~

10. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init

+ int idx_ins = PUBACTION_INSERT;
+ int idx_upd = PUBACTION_UPDATE;
+ int idx_del = PUBACTION_DELETE;

These variables are unnecessary now... They previously were added only
as short synonyms because the other enum names were too verbose (e.g.
REORDER_BUFFER_CHANGE_INSERT) but now that we have shorter enum names
like PUBACTION_INSERT we can just use those names directly

~~~

11. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init

I felt that the code would seem more natural if the
pgoutput_row_filter_init function came *before* the
pgoutput_row_filter function in the source code.

~~~

12. src/backend/replication/pgoutput/pgoutput.c - pgoutput_change

@@ -634,6 +1176,9 @@ pgoutput_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
  RelationSyncEntry *relentry;
  TransactionId xid = InvalidTransactionId;
  Relation ancestor = NULL;
+ ReorderBufferChangeType modified_action = change->action;
+ TupleTableSlot *old_slot = NULL;
+ TupleTableSlot *new_slot = NULL;

It seemed a bit misleading to me to call this variable
'modified_action' since mostly it is not modified at all.

IMO it is better just to call this as 'action' but then add a comment
(above the "switch (modified_action)") to say the previous call to
pgoutput_row_filter may have transformed it to a different action.

~~~

13. src/tools/pgindent/typedefs.list - RowFilterPubAction

If you choose to keep the typedef for RowFilterPubAction (ref to
comment #1) then it should also be added to the typedefs.list.


------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Jan 13, 2022 at 12:19 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> 7. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
>
> + /*
> + * For the following occasions where there is only one tuple, we can
> + * evaluates the row filter for the not null tuple and return.
> + *
> + * For INSERT: we only have new tuple.
> + *
> + * For UPDATE: if no old tuple, it means none of the replica identity
> + * columns changed and this would reduce to a simple update. we only need
> + * to evaluate the row filter for new tuple.
> + *
> + * FOR DELETE: we only have old tuple.
> + */
>
> There are several things not quite right with that comment:
> a. I thought now it should refer to "slots" instead of "tuples"
>

I feel tuple still makes sense as it makes the comments/code easy to understand.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
>  /*
> + * Only 3 publication actions are used for row filtering ("insert", "update",
> + * "delete"). See RelationSyncEntry.exprstate[].
> + */
> +typedef enum RowFilterPubAction
> +{
> +    PUBACTION_INSERT,
> +    PUBACTION_UPDATE,
> +    PUBACTION_DELETE,
> +    NUM_ROWFILTER_PUBACTIONS  /* must be last */
> +} RowFilterPubAction;

Please do not add NUM_ROWFILTER_PUBACTIONS as an enum value.  It's a bit
of a lie and confuses things, because your enum now has 4 possible
values, not 3.  I suggest to
#define NUM_ROWFILTER_PUBACTIONS (PUBACTION_DELETE+1)
instead.

> +    int            idx_ins = PUBACTION_INSERT;
> +    int            idx_upd = PUBACTION_UPDATE;
> +    int            idx_del = PUBACTION_DELETE;

I don't understand the purpose of these variables; can't you just use
the constants?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Hay que recordar que la existencia en el cosmos, y particularmente la
elaboración de civilizaciones dentro de él no son, por desgracia,
nada idílicas" (Ijon Tichy)



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jan 12, 2022 at 7:19 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Wed, Jan 12, 2022 5:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Attach the v63 patch set which include the following changes.
>

Few comments:
=============
1.
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>prqual</structfield> <type>pg_node_tree</type>
+      </para>
+      <para>Expression tree (in <function>nodeToString()</function>
+      representation) for the relation's qualifying condition</para></entry>
+     </row>

Let's slightly modify this as: "Expression tree (in
<function>nodeToString()</function> representation) for the relation's
qualifying condition. Null if there is no qualifying condition."
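
(For reference, the stored expression could then be inspected with something
like the following - a sketch against the patched catalog:)

SELECT prrelid::regclass, pg_get_expr(prqual, prrelid) AS row_filter
  FROM pg_publication_rel
 WHERE prqual IS NOT NULL;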

2.
+   A <literal>WHERE</literal> clause allows simple expressions. The simple
+   expression cannot contain any aggregate or window functions, non-immutable
+   functions, user-defined types, operators or functions.

This part in the docs should be updated to say something similar to
what we have in the commit message for this part or maybe additionally
in some way we can say which other forms of expressions are not
allowed.
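
For example, the docs could contrast an accepted and a rejected expression
(a sketch, assuming the v63 restrictions):

CREATE TABLE t (a int PRIMARY KEY, b text);
-- OK: only simple operators and immutable built-in functions
CREATE PUBLICATION p_ok FOR TABLE t WHERE (a > 10 AND length(b) < 100);
-- rejected: now() is a non-immutable built-in function
CREATE PUBLICATION p_bad FOR TABLE t WHERE (b = now()::text);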

3.
+   for which the <replaceable
class="parameter">expression</replaceable> returns
+   false or null will not be published.
+   If the subscription has several publications in which
+   the same table has been published with different <literal>WHERE</literal>

In the above text line spacing appears a bit odd to me. There doesn't
seem to be a need for extra space after line-2 and line-3 in
above-quoted text.

4.
/*
+ * Return the relid of the topmost ancestor that is published via this

We normally seem to use Returns in similar places.

5.
+ * The simple expression contains the following restrictions:
+ * - User-defined operators are not allowed;
+ * - User-defined functions are not allowed;
+ * - User-defined types are not allowed;
+ * - Non-immutable built-in functions are not allowed;
+ * - System columns are not allowed.

Why are system columns not allowed in the above context?

6.
+static void
+transformPubWhereClauses(List *tables, const char *queryString)

To keep the function naming similar to other nearby functions, it is
better to name this as TransformPubWhereClauses.

7. In AlterPublicationTables(), won't it better if we
transformPubWhereClauses() after
CheckObjSchemaNotAlreadyInPublication() to avoid extra processing in
case of errors.

8.
+ /*
+ * Check if the relation is member of the existing schema in the
+ * publication or member of the schema list specified.
+ */
  CheckObjSchemaNotAlreadyInPublication(rels, schemaidlist,
    PUBLICATIONOBJ_TABLE);

I don't see the above comment addition has anything to do with this
patch. Can we remove it?

9.
 CheckCmdReplicaIdentity(Relation rel, CmdType cmd)
 {
  PublicationActions *pubactions;
+ AttrNumber bad_rfcolnum;

  /* We only need to do checks for UPDATE and DELETE. */
  if (cmd != CMD_UPDATE && cmd != CMD_DELETE)
  return;

+ if (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL)
+ return;
+
+ /*
+ * It is only safe to execute UPDATE/DELETE when all columns referenced in
+ * the row filters from publications which the relation is in are valid -
+ * i.e. when all referenced columns are part of REPLICA IDENTITY, or the
+ * table does not publish UPDATES or DELETES.
+ */
+ bad_rfcolnum = GetRelationPublicationInfo(rel, true);

Can we name this variable as invalid_rf_column?
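
For context, a minimal sketch of the user-visible behaviour this check
enforces (hypothetical names, assuming the patched checks):

CREATE TABLE t (a int PRIMARY KEY, b int);
CREATE PUBLICATION p FOR TABLE t WHERE (b > 0); -- b is not in the replica identity
INSERT INTO t VALUES (1, 1); -- OK, INSERT is not restricted
UPDATE t SET b = 2;          -- should fail: the filter references b

(With REPLICA IDENTITY FULL the UPDATE is allowed, hence the early return.)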

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, January 13, 2022 6:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jan 12, 2022 at 7:19 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Wed, Jan 12, 2022 5:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Attach the v63 patch set which include the following changes.
> >

Thanks for the comments !

> Few comments:
> =============
> 1.
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +      <structfield>prqual</structfield> <type>pg_node_tree</type>
> +      </para>
> +      <para>Expression tree (in <function>nodeToString()</function>
> +      representation) for the relation's qualifying condition</para></entry>
> +     </row>
> 
> Let's slightly modify this as: "Expression tree (in
> <function>nodeToString()</function> representation) for the relation's
> qualifying condition. Null if there is no qualifying condition."

Changed.

> 2.
> +   A <literal>WHERE</literal> clause allows simple expressions. The simple
> +   expression cannot contain any aggregate or window functions,
> non-immutable
> +   functions, user-defined types, operators or functions.
> 
> This part in the docs should be updated to say something similar to what we
> have in the commit message for this part or maybe additionally in some way we
> can say which other forms of expressions are not allowed.

Temporarily used the description from the commit message.

> 3.
> +   for which the <replaceable
> class="parameter">expression</replaceable> returns
> +   false or null will not be published.
> +   If the subscription has several publications in which
> +   the same table has been published with different
> + <literal>WHERE</literal>
> 
> In the above text line spacing appears a bit odd to me. There doesn't seem to be
> a need for extra space after line-2 and line-3 in above-quoted text.

I adjusted these text lines.

> 4.
> /*
> + * Return the relid of the topmost ancestor that is published via this
> 
> We normally seem to use Returns in similar places.

Changed

> 
> 6.
> +static void
> +transformPubWhereClauses(List *tables, const char *queryString)
> 
> To keep the function naming similar to other nearby functions, it is better to
> name this as TransformPubWhereClauses.

Changed.

> 7. In AlterPublicationTables(), won't it better if we
> transformPubWhereClauses() after
> CheckObjSchemaNotAlreadyInPublication() to avoid extra processing in case of
> errors.

Changed.

> 8.
> + /*
> + * Check if the relation is member of the existing schema in the
> + * publication or member of the schema list specified.
> + */
>   CheckObjSchemaNotAlreadyInPublication(rels, schemaidlist,
>     PUBLICATIONOBJ_TABLE);
> 
> I don't see the above comment addition has anything to do with this patch. Can
> we remove it?

Removed.

> 9.
>  CheckCmdReplicaIdentity(Relation rel, CmdType cmd)  {
>   PublicationActions *pubactions;
> + AttrNumber bad_rfcolnum;
> 
>   /* We only need to do checks for UPDATE and DELETE. */
>   if (cmd != CMD_UPDATE && cmd != CMD_DELETE)
>   return;
> 
> + if (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL) return;
> +
> + /*
> + * It is only safe to execute UPDATE/DELETE when all columns referenced
> + in
> + * the row filters from publications which the relation is in are valid
> + -
> + * i.e. when all referenced columns are part of REPLICA IDENTITY, or
> + the
> + * table does not publish UPDATES or DELETES.
> + */
> + bad_rfcolnum = GetRelationPublicationInfo(rel, true);
> 
> Can we name this variable as invalid_rf_column?
Changed.

Attach the V64 patch set which addresses Alvaro's, Amit's and Peter's comments.

The new version of the patch also includes some other changes:
- Fix a table sync bug[1] by using the SQL suggested by Tang[1]
- Adjust the row filter initialization code related to FOR ALL TABLES IN SCHEMA
  to make sure it gets the correct row filter (see the sketch below).
- Update the documents.
- Rebased the patch based on recent commit 025b92
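
For the FOR ALL TABLES IN SCHEMA adjustment, the case being covered is roughly
this (hypothetical names):

CREATE SCHEMA s;
CREATE TABLE s.t (a int PRIMARY KEY);
CREATE PUBLICATION p1 FOR TABLE s.t WHERE (a > 10);
CREATE PUBLICATION p2 FOR ALL TABLES IN SCHEMA s;

A subscription to both p1 and p2 should replicate s.t without any row filter,
because p2 publishes the table unconditionally.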

[1]
https://www.postgresql.org/message-id/OS0PR01MB6113BB510435B16E9F0B2A59FB519%40OS0PR01MB6113.jpnprd01.prod.outlook.com

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Thu, Jan 13, 2022 5:22 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >  /*
> > + * Only 3 publication actions are used for row filtering ("insert",
> > +"update",
> > + * "delete"). See RelationSyncEntry.exprstate[].
> > + */
> > +typedef enum RowFilterPubAction
> > +{
> > +    PUBACTION_INSERT,
> > +    PUBACTION_UPDATE,
> > +    PUBACTION_DELETE,
> > +    NUM_ROWFILTER_PUBACTIONS  /* must be last */ }
> RowFilterPubAction;
> 
> Please do not add NUM_ROWFILTER_PUBACTIONS as an enum value.  It's a bit
> of a lie and confuses things, because your enum now has 4 possible values, not
> 3.  I suggest to #define NUM_ROWFILTER_PUBACTIONS
> (PUBACTION_DELETE+1) instead.
> 
> > +    int            idx_ins = PUBACTION_INSERT;
> > +    int            idx_upd = PUBACTION_UPDATE;
> > +    int            idx_del = PUBACTION_DELETE;
> 
> I don't understand the purpose of these variables; can't you just use the
> constants?

Thanks for the comments !
Changed the code as suggested.

Best regards,
Hou zj

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, January 13, 2022 2:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
> Thanks for posting the merged v63.
> 
> Here are my review comments for the v63-0001 changes.
> 
> ~~~

Thanks for the comments!

> 1. src/backend/replication/logical/proto.c - logicalrep_write_tuple
> 
>   TupleDesc desc;
> - Datum values[MaxTupleAttributeNumber];
> - bool isnull[MaxTupleAttributeNumber];
> + Datum    *values;
> + bool    *isnull;
>   int i;
>   uint16 nliveatts = 0;
> 
> Those separate declarations for values / isnull are not strictly
> needed anymore, so those vars could be deleted. IIRC those were only
> added before when there were both slots and tuples. OTOH, maybe you
> prefer to keep it this way just for code readability?

Yes, I prefer the current style for code readability.

> 
> 2. src/backend/replication/pgoutput/pgoutput.c - typedef
> 
> +typedef enum RowFilterPubAction
> +{
> + PUBACTION_INSERT,
> + PUBACTION_UPDATE,
> + PUBACTION_DELETE,
> + NUM_ROWFILTER_PUBACTIONS  /* must be last */
> +} RowFilterPubAction;
> 
> This typedef is not currently used by any of the code.
> 
> So I think choices are:
> 
> - Option 1: remove the typedef, because nobody is using it.
> 
> - Option 2: keep the typedef, but use it! e.g. everywhere there is an
> exprstate array index variable probably it should be declared as a
> 'RowFilterPubAction idx' instead of just 'int idx'.

Thanks, I used the option 1.

> 
> 3. src/backend/replication/pgoutput/pgoutput.c - map_changetype_pubaction
> 
> After this recent v63 refactoring and merging of some APIs it seems
> that the map_changetype_pubaction is now ONLY used by
> pgoutput_row_filter function. So this can now be a static member of
> pgoutput_row_filter function instead of being declared at file scope.
> 

Changed.

> 
> 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> comments
> The function header comment says the same thing 2x about the return values.
> 

Changed.

> 
> ~~~
> 
> 5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> 
> + ExprContext    *ecxt;
> + int filter_index = map_changetype_pubaction[*action];
> +
> + /* *action is already assigned default by caller */
> + Assert(*action == REORDER_BUFFER_CHANGE_INSERT ||
> +    *action == REORDER_BUFFER_CHANGE_UPDATE ||
> +    *action == REORDER_BUFFER_CHANGE_DELETE);
> +
> 
> Accessing the map_changetype_pubaction array should be done *after* the
> Assert.
> 
> ~~~

Changed.

> 
> 6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> 
> ExprState *filter_exprstate;
> ...
> filter_exprstate = entry->exprstate[map_changetype_pubaction[*action]];
> 
> Please consider what way you think is best.

Changed as suggested.

> 
> 7. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> There are several things not quite right with that comment:
> a. I thought now it should refer to "slots" instead of "tuples"
> 
> Suggested replacement comment:

Changed, but I prefer "tuple", which is easier to understand.

> ~~~
> 
> 8. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> 
> + if (!new_slot || !old_slot)
> + {
> + ecxt->ecxt_scantuple = new_slot ? new_slot : old_slot;
> + result = pgoutput_row_filter_exec_expr(entry->exprstate[filter_index],
> +    ecxt);
> +
> + FreeExecutorState(estate);
> + PopActiveSnapshot();
> +
> + return result;
> + }
> +
> + tmp_new_slot = new_slot;
> + slot_getallattrs(new_slot);
> + slot_getallattrs(old_slot);
> 
> I think after this "if" condition then the INSERT, DELETE and simple
> UPDATE are already handled. So, the remainder of the code is for
> deciding what update transformation is needed etc.
> 
> I think there should be some block comment somewhere here to make that
> more obvious.

Changed.
> ~~
> 
> 9. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> 
> Many of the comments in this function are still referring to old/new
> "tuple". Now that all the params are slots instead of tuples maybe now
> all the comments should also refer to "slots" instead of "tuples".
> Please search all the comments - e.g. including all the "Case 1:" ...
> "Case 4:" comments.

I also think tuple still makes sense, so I didn’t change this.

> 
> 10. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init
> 
> + int idx_ins = PUBACTION_INSERT;
> + int idx_upd = PUBACTION_UPDATE;
> + int idx_del = PUBACTION_DELETE;
> 
> These variables are unnecessary now... They previously were added only
> as short synonyms because the other enum names were too verbose (e.g.
> REORDER_BUFFER_CHANGE_INSERT) but now that we have shorter enum
> names
> like PUBACTION_INSERT we can just use those names directly
> 
Changed.

> 
> 11. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init
> 
> I felt that the code would seem more natural if the
> pgoutput_row_filter_init function came *before* the
> pgoutput_row_filter function in the source code.
> 
Changed.

> 
> 12. src/backend/replication/pgoutput/pgoutput.c - pgoutput_change
> 
> @@ -634,6 +1176,9 @@ pgoutput_change(LogicalDecodingContext *ctx,
> ReorderBufferTXN *txn,
>   RelationSyncEntry *relentry;
>   TransactionId xid = InvalidTransactionId;
>   Relation ancestor = NULL;
> + ReorderBufferChangeType modified_action = change->action;
> + TupleTableSlot *old_slot = NULL;
> + TupleTableSlot *new_slot = NULL;
> 
> It seemed a bit misleading to me to call this variable
> 'modified_action' since mostly it is not modified at all.
> 
> IMO it is better just to call this as 'action' but then add a comment
> (above the "switch (modified_action)") to say the previous call to
> pgoutput_row_filter may have transformed it to a different action.
> 
Changed.

I have included these changes in the v64 patch set.

Best regards,
Hou zj

Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Jan 13, 2022 at 5:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Thanks for posting the merged v63.
>
> Here are my review comments for the v63-0001 changes.
>
...
> ~~~
>
> 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter comments
>
> +
> +/*
> + * Change is checked against the row filter, if any.
> + *
> + * If it returns true, the change is replicated, otherwise, it is not.
> + *
> + * FOR INSERT: evaluates the row filter for new tuple.
> + * FOR DELETE: evaluates the row filter for old tuple.
> + * For UPDATE: evaluates the row filter for old and new tuple. If both
> + * evaluations are true, it sends the UPDATE. If both evaluations are false, it
> + * doesn't send the UPDATE. If only one of the tuples matches the row filter
> + * expression, there is a data consistency issue. Fixing this issue requires a
> + * transformation.
> + *
> + * Transformations:
> + * Updates are transformed to inserts and deletes based on the
> + * old tuple and new tuple. The new action is updated in the
> + * action parameter. If not updated, action remains as update.
> + *
> + * Case 1: old-row (no match)    new-row (no match)  -> (drop change)
> + * Case 2: old-row (no match)    new row (match)     -> INSERT
> + * Case 3: old-row (match)       new-row (no match)  -> DELETE
> + * Case 4: old-row (match)       new row (match)     -> UPDATE
> + *
> + * If the change is to be replicated this function returns true, else false.
> + *
> + * Examples:
>
> The function header comment says the same thing 2x about the return values.
>
> The 1st text "If it returns true, the change is replicated, otherwise,
> it is not." should be replaced by the better wording of the 2nd text
> ("If the change is to be replicated this function returns true, else
> false."). Then, that 2nd text can be removed (from where it is later
> in this same comment).

Hi Hou-san, thanks for all the v64 updates!

I think the above comment was only partly fixed.

The v64-0001 comment still says:
+ * If it returns true, the change is replicated, otherwise, it is not.

I thought the 2nd text is better:
"If the change is to be replicated this function returns true, else false."

But maybe it is best to rearrange the whole thing like:
"Returns true if the change is to be replicated, else false."

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Jan 14, 2022 at 5:48 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Jan 13, 2022 at 5:49 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter comments
> >
> > +
> > +/*
> > + * Change is checked against the row filter, if any.
> > + *
> > + * If it returns true, the change is replicated, otherwise, it is not.
> > + *
> > + * FOR INSERT: evaluates the row filter for new tuple.
> > + * FOR DELETE: evaluates the row filter for old tuple.
> > + * For UPDATE: evaluates the row filter for old and new tuple. If both
> > + * evaluations are true, it sends the UPDATE. If both evaluations are false, it
> > + * doesn't send the UPDATE. If only one of the tuples matches the row filter
> > + * expression, there is a data consistency issue. Fixing this issue requires a
> > + * transformation.
> > + *
> > + * Transformations:
> > + * Updates are transformed to inserts and deletes based on the
> > + * old tuple and new tuple. The new action is updated in the
> > + * action parameter. If not updated, action remains as update.
> > + *
> > + * Case 1: old-row (no match)    new-row (no match)  -> (drop change)
> > + * Case 2: old-row (no match)    new row (match)     -> INSERT
> > + * Case 3: old-row (match)       new-row (no match)  -> DELETE
> > + * Case 4: old-row (match)       new row (match)     -> UPDATE
> > + *
> > + * If the change is to be replicated this function returns true, else false.
> > + *
> > + * Examples:
> >
> > The function header comment says the same thing 2x about the return values.
> >
> > The 1st text "If it returns true, the change is replicated, otherwise,
> > it is not." should be replaced by the better wording of the 2nd text
> > ("If the change is to be replicated this function returns true, else
> > false."). Then, that 2nd text can be removed (from where it is later
> > in this same comment).
>
> Hi Hou-san, thanks for all the v64 updates!
>
> I think the above comment was only partly fixed.
>
> The v64-0001 comment still says:
> + * If it returns true, the change is replicated, otherwise, it is not.
>
...
...
>
> But maybe it is best to rearrange the whole thing like:
> "Returns true if the change is to be replicated, else false."
>

+1 to change as per this suggestion.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
Here are my review comments for v64-0001 (review of updates since v63-0001)

~~~

1. doc/src/sgml/ref/create_publication.sgml - typo?

+   The <literal>WHERE</literal> clause allows simple expressions that
don't have
+   user-defined functions, operators, non-immutable built-in functions.
+  </para>
+

I think there is a missing "or" after that Oxford comma.

e.g.
BEFORE
"... operators, non-immutable built-in functions."
AFTER
"... operators, or non-immutable built-in functions."

~~

2. commit message - typo

You said that the above text (review comment 1) came from the 0001
commit message, so please make the same fix to the commit message.

~~

3. src/backend/replication/logical/tablesync.c - redundant trailing ";"

+ /* Check for row filters. */
+ resetStringInfo(&cmd);
+ appendStringInfo(&cmd,
+ "SELECT DISTINCT pg_get_expr(pr.prqual, pr.prrelid)"
+ "  FROM pg_publication p"
+ "  LEFT OUTER JOIN pg_publication_rel pr"
+ "       ON (p.oid = pr.prpubid AND pr.prrelid = %u),"
+ "  LATERAL pg_get_publication_tables(p.pubname) GPT"
+ " WHERE GPT.relid = %u"
+ "   AND p.pubname IN ( %s );",
+ lrel->remoteid,
+ lrel->remoteid,
+ pub_names.data);

I think that trailing ";" of the SQL is not needed, and nearby SQL
execution code does not include one so maybe better to remove it for
consistency.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Jan 13, 2022 at 6:46 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Attach the V64 patch set which addressed Alvaro, Amit and Peter's comments.
>

Few more comments:
===================
1.
"SELECT DISTINCT pg_get_expr(pr.prqual, pr.prrelid)"
+ "  FROM pg_publication p"
+ "  LEFT OUTER JOIN pg_publication_rel pr"
+ "       ON (p.oid = pr.prpubid AND pr.prrelid = %u),"
+ "  LATERAL pg_get_publication_tables(p.pubname) GPT"
+ " WHERE GPT.relid = %u"
+ "   AND p.pubname IN ( %s );",

Use all aliases either in CAPS or in lower case. Seeing the nearby
code, it is better to use lower case for aliases.
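
i.e. something like this (only the alias case changes, plus dropping the
trailing semicolon that Peter mentioned earlier):

"SELECT DISTINCT pg_get_expr(pr.prqual, pr.prrelid)"
"  FROM pg_publication p"
"  LEFT OUTER JOIN pg_publication_rel pr"
"       ON (p.oid = pr.prpubid AND pr.prrelid = %u),"
"  LATERAL pg_get_publication_tables(p.pubname) gpt"
" WHERE gpt.relid = %u"
"   AND p.pubname IN ( %s )",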

2.
-
+extern Oid GetTopMostAncestorInPublication(Oid puboid, List *ancestors);

It seems like a spurious line removal. I think you should declare it
immediately after GetPubPartitionOptionRelations() to match the order
of functions as they are in pg_publication.c

3.
+ * It is only safe to execute UPDATE/DELETE when all columns referenced in
+ * the row filters from publications which the relation is in are valid -
+ * i.e. when all referenced columns are part of REPLICA IDENTITY, or the

There is no need for a comma after REPLICA IDENTITY.

4.
+ /*
+ * Find what are the cols that are part of the REPLICA IDENTITY.

Let's change this comment as: "Remember columns that are part of the
REPLICA IDENTITY."

5. The function name rowfilter_column_walker sounds too generic for
its purpose. Can we rename it contain_invalid_rfcolumn_walker() and
move it to publicationcmds.c? Also, can we try to rearrange the code
in GetRelationPublicationInfo() such that row filter validation
related code is moved to a new function contain_invalid_rfcolumn()
which will internally call contain_invalid_rfcolumn_walker()? These new
functions can also be defined in publicationcmds.c.

6.
+ *
+ * If the cached validation result is true, we assume that the cached
+ * publication actions are also valid.
+ */
+AttrNumber
+GetRelationPublicationInfo(Relation relation, bool validate_rowfilter)

Instead of having the above comment, can we have an Assert for valid
relation->rd_pubactions when we are returning in the function due to
rd_rfcol_valid? Then, you can add a comment (publication actions must
be valid) before Assert.

7. I think we should have a function check_simple_rowfilter_expr()
which internally should call rowfilter_walker. See
check_nested_generated/check_nested_generated_walker. If you agree
with this, we can probably change the name of row_filter function to
check_simple_rowfilter_expr_walker().

8.
+ if (pubobj->pubtable && pubobj->pubtable->whereClause)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("WHERE clause for schema not allowed"),

Would it be better to write the above message as: "WHERE clause not
allowed for schema"?

9.
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -15,6 +15,7 @@
 #include "access/sysattr.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_type.h"
+#include "executor/executor.h"

Do we really need this include now? Please check includes in other
files as well and remove if anything is not required.

10.
 /*
- * Get information about remote relation in similar fashion the RELATION
- * message provides during replication.
+ * Get information about a remote relation, in a similar fashion to how the
+ * RELATION message provides information during replication.

Why does this part of the comment need to be changed?

11.
/*
  * For non-tables, we need to do COPY (SELECT ...), but we can't just
- * do SELECT * because we need to not copy generated columns.
+ * do SELECT * because we need to not copy generated columns.

I think the comment here should say: "For non-tables and tables with row
filters, we need to do...."

Apart from the above, I have modified a few comments which you can
find in the attached patch v64-0002-Modify-comments. Kindly review
those and, if you are okay with them, merge them into the main
patch.

-- 
With Regards,
Amit Kapila.

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Friday, January 14, 2022 7:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Jan 13, 2022 at 6:46 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Attach the V64 patch set which addressed Alvaro, Amit and Peter's comments.
> >
> 
> Few more comments:
> ===================
> 1.
> "SELECT DISTINCT pg_get_expr(pr.prqual, pr.prrelid)"
> + "  FROM pg_publication p"
> + "  LEFT OUTER JOIN pg_publication_rel pr"
> + "       ON (p.oid = pr.prpubid AND pr.prrelid = %u),"
> + "  LATERAL pg_get_publication_tables(p.pubname) GPT"
> + " WHERE GPT.relid = %u"
> + "   AND p.pubname IN ( %s );",
> 
> Use all aliases either in CAPS or in lower case. Seeing the nearby
> code, it is better to use lower case for aliases.
> 
> 2.
> -
> +extern Oid GetTopMostAncestorInPublication(Oid puboid, List *ancestors);
> 
> It seems like a spurious line removal. I think you should declare it
> immediately after GetPubPartitionOptionRelations() to match the order
> of functions as they are in pg_publication.c
> 
> 3.
> + * It is only safe to execute UPDATE/DELETE when all columns referenced in
> + * the row filters from publications which the relation is in are valid -
> + * i.e. when all referenced columns are part of REPLICA IDENTITY, or the
> 
> There is no need for a comma after REPLICA IDENTITY.
> 
> 4.
> + /*
> + * Find what are the cols that are part of the REPLICA IDENTITY.
> 
> Let's change this comment as: "Remember columns that are part of the
> REPLICA IDENTITY."
> 
> 5. The function name rowfilter_column_walker sounds goo generic for
> its purpose. Can we rename it contain_invalid_rfcolumn_walker() and
> move it to publicationcmds.c? Also, can we try to rearrange the code
> in GetRelationPublicationInfo() such that row filter validation
> related code is moved to a new function contain_invalid_rfcolumn()
> which will internally call contain_invalid_rfcolumn_walker(). This new
> functions can also be defined in publicationcmds.c.
> 
> 6.
> + *
> + * If the cached validation result is true, we assume that the cached
> + * publication actions are also valid.
> + */
> +AttrNumber
> +GetRelationPublicationInfo(Relation relation, bool validate_rowfilter)
> 
> Instead of having the above comment, can we have an Assert for valid
> relation->rd_pubactions when we are returning in the function due to
> rd_rfcol_valid. Then, you can add a comment (publication actions must
> be valid) before Assert.
> 
> 7. I think we should have a function check_simple_rowfilter_expr()
> which internally should call rowfilter_walker. See
> check_nested_generated/check_nested_generated_walker. If you agree
> with this, we can probably change the name of row_filter function to
> check_simple_rowfilter_expr_walker().
> 
> 8.
> + if (pubobj->pubtable && pubobj->pubtable->whereClause)
> + ereport(ERROR,
> + errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("WHERE clause for schema not allowed"),
> 
> Will it be better to write the above message as: "WHERE clause not
> allowed for schema"?
> 
> 9.
> --- a/src/backend/replication/logical/proto.c
> +++ b/src/backend/replication/logical/proto.c
> @@ -15,6 +15,7 @@
>  #include "access/sysattr.h"
>  #include "catalog/pg_namespace.h"
>  #include "catalog/pg_type.h"
> +#include "executor/executor.h"
> 
> Do we really need this include now? Please check includes in other
> files as well and remove if anything is not required.
> 
> 10.
>  /*
> - * Get information about remote relation in similar fashion the RELATION
> - * message provides during replication.
> + * Get information about a remote relation, in a similar fashion to how the
> + * RELATION message provides information during replication.
> 
> Why this part of the comment needs to be changed?
> 
> 11.
> /*
>   * For non-tables, we need to do COPY (SELECT ...), but we can't just
> - * do SELECT * because we need to not copy generated columns.
> + * do SELECT * because we need to not copy generated columns.
> 
> I think the comment here should say: "For non-tables and tables with row
> filters, we need to do...."
> 
> Apart from the above, I have modified a few comments which you can
> find in the attached patch v64-0002-Modify-comments. Kindly review
> those, and if you are okay with them, merge them into the main
> patch.

Thanks for the comments.
Attach the V65 patch set which addressed the above comments and Peter's comments[1].
I also fixed some typos and removed some unused code.

[1] https://www.postgresql.org/message-id/CAHut%2BPvDKLrkT_nmPXd1cKfi7Cq8dVR7HGEKOyjrMwe65FdZ7Q%40mail.gmail.com

Best regards,
Hou zj



Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Dec 2, 2021 at 7:40 PM vignesh C <vignesh21@gmail.com> wrote:
>
...
>
> 2) testpub5 and testpub_syntax2 are similar, one of them can be removed:
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION testpub5 FOR TABLE testpub_rf_tbl1,
> testpub_rf_tbl2 WHERE (c <> 'test' AND d < 5);
> +RESET client_min_messages;
> +\dRp+ testpub5
>
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION testpub_syntax2 FOR TABLE testpub_rf_tbl1,
> testpub_rf_myschema.testpub_rf_tbl5 WHERE (h < 999);
> +RESET client_min_messages;
> +\dRp+ testpub_syntax2
> +DROP PUBLICATION testpub_syntax2;
>

To re-confirm my original motivation for adding the syntax2 test, I
coded some temporary logging into the different PublicationObjSpec
cases. After I re-ran the regression tests, here are some extracts
from the postmaster.log:

(for testpub5)
2022-01-14 13:06:32.149 AEDT client backend[21853]
pg_regress/publication LOG:  !!> TABLE relation_expr OptWhereClause
2022-01-14 13:06:32.149 AEDT client backend[21853]
pg_regress/publication STATEMENT:  CREATE PUBLICATION testpub5 FOR
TABLE testpub_rf_tbl1, testpub_rf_tbl2 WHERE (c <> 'test' AND d < 5)
WITH (publish = 'insert');
2022-01-14 13:06:32.149 AEDT client backend[21853]
pg_regress/publication LOG:  !!> ColId OptWhereClause
2022-01-14 13:06:32.149 AEDT client backend[21853]
pg_regress/publication STATEMENT:  CREATE PUBLICATION testpub5 FOR
TABLE testpub_rf_tbl1, testpub_rf_tbl2 WHERE (c <> 'test' AND d < 5)
WITH (publish = 'insert');

(for syntax2)
2022-01-14 13:06:32.186 AEDT client backend[21853]
pg_regress/publication LOG:  !!> TABLE relation_expr OptWhereClause
2022-01-14 13:06:32.186 AEDT client backend[21853]
pg_regress/publication STATEMENT:  CREATE PUBLICATION testpub_syntax2
FOR TABLE testpub_rf_tbl1, testpub_rf_schema1.testpub_rf_tbl5 WHERE (h
< 999) WITH (publish = 'insert');
2022-01-14 13:06:32.186 AEDT client backend[21853]
pg_regress/publication LOG:  !!> ColId indirection OptWhereClause
2022-01-14 13:06:32.186 AEDT client backend[21853]
pg_regress/publication STATEMENT:  CREATE PUBLICATION testpub_syntax2
FOR TABLE testpub_rf_tbl1, testpub_rf_schema1.testpub_rf_tbl5 WHERE (h
< 999) WITH (publish = 'insert');

From those logs you can see that, although the SQL statements look
similar, they actually take different PublicationObjSpec execution paths
in gram.y: i.e. "ColId OptWhereClause" versus "ColId indirection
OptWhereClause".

~~

So this review comment can be skipped. Both tests should be retained.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
Here are some review comments for v65-0001 (review of updates since v64-0001)

~~~

1. src/include/commands/publicationcmds.h - rename func

+extern bool contain_invalid_rfcolumn(Oid pubid, Relation relation,
+ List *ancestors,
+ AttrNumber *invalid_rfcolumn);

I thought that function should be called "contains_..." instead of
"contain_...".

~~~

2. src/backend/commands/publicationcmds.c - rename funcs

Suggested renaming (same as above #1).

"contain_invalid_rfcolumn_walker" --> "contains_invalid_rfcolumn_walker"
"contain_invalid_rfcolumn" --> "contains_invalid_rfcolumn"

Also, update it in the comment for rf_context:
+/*
+ * Information used to validate the columns in the row filter expression. See
+ * contain_invalid_rfcolumn_walker for details.
+ */

~~~

3. src/backend/commands/publicationcmds.c - bms

+ if (!rfisnull)
+ {
+ rf_context context = {0};
+ Node    *rfnode;
+ Bitmapset    *bms = NULL;
+
+ context.pubviaroot = pub->pubviaroot;
+ context.parentid = publish_as_relid;
+ context.relid = relid;
+
+ /*
+ * Remember columns that are part of the REPLICA IDENTITY. Note that
+ * REPLICA IDENTITY DEFAULT means primary key or nothing.
+ */
+ if (relation->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT)
+ bms = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_PRIMARY_KEY);
+ else if (relation->rd_rel->relreplident == REPLICA_IDENTITY_INDEX)
+ bms = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_IDENTITY_KEY);
+
+ context.bms_replident = bms;

There seems no need for a separate 'bms' variable here. Why not just
assign directly to context.bms_replident like the code used to do?

~~~

4. src/backend/utils/cache/relcache.c - typo?

  /*
- * If we know everything is replicated, there is no point to check for
- * other publications.
+ * Check, if all columns referenced in the filter expression are part
+ * of the REPLICA IDENTITY index or not.
+ *
+ * If we already found the column in row filter which is not part of
+ * REPLICA IDENTITY index, skip the validation.
  */

Shouldn't that comment say "already found a column" instead of
"already found the column"?

~~~

5. src/backend/replication/pgoutput/pgoutput.c - map member

@@ -129,7 +169,7 @@ typedef struct RelationSyncEntry
  * same as 'relid' or if unnecessary due to partition and the ancestor
  * having identical TupleDesc.
  */
- TupleConversionMap *map;
+ AttrMap *map;
 } RelationSyncEntry;

I wondered if you should also rename this member to something more
meaningful like 'attrmap' instead of just 'map'.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Sat, Jan 15, 2022 at 5:30 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Attach the V65 patch set which addressed the above comments and Peter's comments[1].
> I also fixed some typos and removed some unused code.
>

I have several minor comments for the v65-0001 patch:

doc/src/sgml/ref/alter_subscription.sgml

(1)
Suggest minor doc change:

BEFORE:
+          Previously-subscribed tables are not copied, even if the table's row
+          filter <literal>WHERE</literal> clause had been modified.
AFTER:
+          Previously-subscribed tables are not copied, even if a table's row
+          filter <literal>WHERE</literal> clause had been modified.


src/backend/catalog/pg_publication.c

(2) GetTopMostAncestorInPublication
Is there a reason why there is no "break" after finding a
topmost_relid? Why keep searching and potentially overwrite a
previously-found topmost_relid? If it's intentional, I think that a
comment should be added to explain it.


src/backend/commands/publicationcmds.c

(3) Grammar

BEFORE:
+ * Returns true, if any of the columns used in row filter WHERE clause is not
AFTER:
+ * Returns true, if any of the columns used in the row filter WHERE
clause are not


(4) contain_invalid_rfcolumn_walker
Wouldn't this be better named "contains_invalid_rfcolumn_walker"?
(and references to the functions be updated accordingly)


src/backend/executor/execReplication.c

(5) Comment is difficult to read
Add commas to make the comment easier to read:

BEFORE:
+ * It is only safe to execute UPDATE/DELETE when all columns referenced in
+ * the row filters from publications which the relation is in are valid -
AFTER:
+ * It is only safe to execute UPDATE/DELETE when all columns, referenced in
+ * the row filters from publications which the relation is in, are valid -


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Sat, Jan 15, 2022 at 12:00 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Friday, January 14, 2022 7:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Jan 13, 2022 at 6:46 PM houzj.fnst@fujitsu.com
> >
> > 9.
> > --- a/src/backend/replication/logical/proto.c
> > +++ b/src/backend/replication/logical/proto.c
> > @@ -15,6 +15,7 @@
> >  #include "access/sysattr.h"
> >  #include "catalog/pg_namespace.h"
> >  #include "catalog/pg_type.h"
> > +#include "executor/executor.h"
> >
> > Do we really need this include now? Please check includes in other
> > files as well and remove if anything is not required.
> >
...
....
>
> Thanks for the comments.
> Attach the V65 patch set which addressed the above comments and Peter's comments[1].

The above comment (#9) doesn't seem to be addressed. Also, please
check other includes as well. I find the include below also unnecessary.

--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
...
...
+#include "nodes/nodeFuncs.h"

Some other comments:
==================
1.
/*
+ * If we know everything is replicated and some columns are not part
+ * of replica identity, there is no point to check for other
+ * publications.
+ */
+ if (pubactions.pubinsert && pubactions.pubupdate &&
+ pubactions.pubdelete && pubactions.pubtruncate &&
+ (!validate_rowfilter || !rfcol_valid))
break;

Why do we need to continue for other publications after we find there
is an invalid column in row_filter?

2.
* For initial synchronization, row filtering can be ignored in 2 cases:
+ *
+ * 1) one of the subscribed publications has puballtables set to true
+ *
+ * 2) one of the subscribed publications is declared as ALL TABLES IN
+ * SCHEMA that includes this relation

Isn't there one more case (when one of the publications has a table
without any filter) where row filtering can be ignored? I see that point
being mentioned later but it makes things unclear. I have tried to
make things clear in the attached.
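
For instance, in a hypothetical setup like the following (table,
publication, and connection string invented for illustration), pub2
publishes t with no row filter, so the initial synchronization of t
should copy all rows despite pub1's filter:

CREATE TABLE t (a int PRIMARY KEY);
CREATE PUBLICATION pub1 FOR TABLE t WHERE (a > 10);
CREATE PUBLICATION pub2 FOR TABLE t;

-- on the subscriber: subscribing to both publications means there is
-- no effective row filter for t
CREATE SUBSCRIPTION s CONNECTION 'dbname=src' PUBLICATION pub1, pub2;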

Apart from the above, I have made a few other cosmetic changes atop
v65-0001*.patch. Kindly review and merge into the main patch if you
are okay with these changes.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Jan 17, 2022 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Some other comments:
> ==================
>

Few more comments:
==================
1.
+pgoutput_row_filter_init_expr(Node *rfnode)
+{
+ ExprState  *exprstate;
+ Expr    *expr;
+
+ /*
+ * This is the same code as ExecPrepareExpr() but that is not used because
+ * we have no EState to pass it.

Isn't it better to say "This is the same code as ExecPrepareExpr() but
that is not used because we want to cache the expression"? I feel that,
if we wanted, we could allocate an EState as the patch is doing in
pgoutput_row_filter(), no?
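
For reference, here is a minimal sketch of that initialization without
an EState (the function name is invented; rfnode is the row filter's
expression tree):

#include "postgres.h"
#include "executor/executor.h"   /* ExecInitExpr */
#include "optimizer/optimizer.h" /* expression_planner */

static ExprState *
prepare_row_filter(Node *rfnode)
{
    Expr       *expr;

    /* run the expression through the planner (constant folding etc.) */
    expr = expression_planner((Expr *) rfnode);

    /* initialize for execution; no parent PlanState, hence NULL */
    return ExecInitExpr(expr, NULL);
}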

2.
+ elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
+ DatumGetBool(ret) ? "true" : "false",
+ isnull ? "true" : "false");
+
+ if (isnull)
+ return false;

Shouldn't the isnull condition's result in the elog be reversed?

3.
+ /*
+ * If the publication is FOR ALL TABLES IN SCHEMA and it overlaps
+ * with the current relation in the same schema then this is also
+ * treated same as if this table has no row filters (even if for
+ * other publications it does).
+ */
+ else if (list_member_oid(schemaPubids, pub->oid))
+ pub_no_filter = true;

The code will look better if you move the comments inside the else if
block. There are other places near this comment where we can follow the
same style.

4.
+ * Multiple ExprState entries might be used if there are multiple
+ * publications for a single table. Different publication actions don't
+ * allow multiple expressions to always be combined into one, so there is
+ * one ExprState per publication action. The exprstate array is indexed by
+ * ReorderBufferChangeType.
+ */
+ bool exprstate_valid;
+
+ /* ExprState array for row filter. One per publication action. */
+ ExprState  *exprstate[NUM_ROWFILTER_PUBACTIONS];

It is not clear from the comments here or at other places why we
need an array of row filter expressions. Can you please add comments
to explain that? IIRC, it is primarily because we
don't want to add the restriction that the publish operation 'insert'
should also honor the RI columns restriction. If there are other reasons
then let's add those to the comments as well.
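
For illustration, a sketch of how such a cache would be consulted at
send time (names taken from the quoted hunks; the surrounding function
and variables are assumed):

    ExprState  *filter;

    /* pick the ExprState prepared for this publish action */
    filter = entry->exprstate[map_changetype_pubaction[*action]];

    /* no filter cached for this pubaction: publish the row as-is */
    if (filter == NULL)
        return true;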

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Jan 17, 2022 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Apart from the above, I have made a few other cosmetic changes atop
> v65-0001*.patch. Kindly review and merge into the main patch if you
> are okay with these changes.
>

Sorry, forgot to attach a top-up patch, sending it now.

-- 
With Regards,
Amit Kapila.

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Mon, Jan 17, 2022 7:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Mon, Jan 17, 2022 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Some other comments:
> > ==================
> >
> 
> Few more comments:
> ==================
> 1.
> +pgoutput_row_filter_init_expr(Node *rfnode) {  ExprState  *exprstate;
> + Expr    *expr;
> +
> + /*
> + * This is the same code as ExecPrepareExpr() but that is not used
> + because
> + * we have no EState to pass it.
> 
> Isn't it better to say "This is the same code as ExecPrepareExpr() but that is not
> used because we want to cache the expression"? I feel that, if we wanted, we
> could allocate an EState as the patch is doing in pgoutput_row_filter(), no?

Changed as suggested.

> 2.
> + elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
> + DatumGetBool(ret) ? "true" : "false",
> + isnull ? "true" : "false");
> +
> + if (isnull)
> + return false;
> 
> Shouldn't the isnull condition's result in the elog be reversed?

Changed.

> 3.
> + /*
> + * If the publication is FOR ALL TABLES IN SCHEMA and it overlaps
> + * with the current relation in the same schema then this is also
> + * treated same as if this table has no row filters (even if for
> + * other publications it does).
> + */
> + else if (list_member_oid(schemaPubids, pub->oid)) pub_no_filter =
> + true;
> 
> The code will look better if you move the comments inside the else if block. There
> are other places near this comment where we can follow the same style.

Changed.

> 4.
> + * Multiple ExprState entries might be used if there are multiple
> + * publications for a single table. Different publication actions don't
> + * allow multiple expressions to always be combined into one, so there
> + is
> + * one ExprState per publication action. The exprstate array is indexed
> + by
> + * ReorderBufferChangeType.
> + */
> + bool exprstate_valid;
> +
> + /* ExprState array for row filter. One per publication action. */
> + ExprState  *exprstate[NUM_ROWFILTER_PUBACTIONS];
> 
> It is not clear from the comments here or at other places why we need an array
> of row filter expressions. Can you please add comments to explain that?
> IIRC, it is primarily due to the reason that we don't want to add the restriction
> that the publish operation 'insert'
> should also honor RI columns restriction. If there are other reasons then let's add
> those to comments as well.
I will think this over and update it in the next version.

Attach the V66 patch set which addressed Amit, Peter and Greg's comments.

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Mon, Jan 17, 2022 12:34 PM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> Here are some review comments for v65-0001 (review of updates since
> v64-0001)

Thanks for the comments!

> ~~~
> 
> 1. src/include/commands/publicationcmds.h - rename func
> 
> +extern bool contain_invalid_rfcolumn(Oid pubid, Relation relation,
> +List *ancestors,  AttrNumber *invalid_rfcolumn);
> 
> I thought that function should be called "contains_..." instead of "contain_...".
> 
> ~~~
> 
> 2. src/backend/commands/publicationcmds.c - rename funcs
> 
> Suggested renaming (same as above #1).
> 
> "contain_invalid_rfcolumn_walker" --> "contains_invalid_rfcolumn_walker"
> "contain_invalid_rfcolumn" --> "contains_invalid_rfcolumn"
> 
> Also, update it in the comment for rf_context:
> +/*
> + * Information used to validate the columns in the row filter
> +expression. See
> + * contain_invalid_rfcolumn_walker for details.
> + */

I am not sure about the name, because many existing
functions are named contain_xxx_xxx
(for example, contain_mutable_functions).

> 
> 3. src/backend/commands/publicationcmds.c - bms
> 
> + if (!rfisnull)
> + {
> + rf_context context = {0};
> + Node    *rfnode;
> + Bitmapset    *bms = NULL;
> +
> + context.pubviaroot = pub->pubviaroot;
> + context.parentid = publish_as_relid;
> + context.relid = relid;
> +
> + /*
> + * Remember columns that are part of the REPLICA IDENTITY. Note that
> + * REPLICA IDENTITY DEFAULT means primary key or nothing.
> + */
> + if (relation->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT) bms =
> + RelationGetIndexAttrBitmap(relation,
> + INDEX_ATTR_BITMAP_PRIMARY_KEY);
> + else if (relation->rd_rel->relreplident == REPLICA_IDENTITY_INDEX) bms
> + = RelationGetIndexAttrBitmap(relation,
> + INDEX_ATTR_BITMAP_IDENTITY_KEY);
> +
> + context.bms_replident = bms;
> 
> There seems no need for a separate 'bms' variable here. Why not just assign
> directly to context.bms_replident like the code used to do?

Because assigning directly made the code exceed 80 columns, I
personally think using a shorter variable makes it look better.

> 
> 4. src/backend/utils/cache/relcache.c - typo?
> 
>   /*
> - * If we know everything is replicated, there is no point to check for
> - * other publications.
> + * Check, if all columns referenced in the filter expression are part
> + * of the REPLICA IDENTITY index or not.
> + *
> + * If we already found the column in row filter which is not part of
> + * REPLICA IDENTITY index, skip the validation.
>   */
> 
> Shouldn't that comment say "already found a column" instead of "already found
> the column"?

Adjusted the comments here. 

> 
> 5. src/backend/replication/pgoutput/pgoutput.c - map member
> 
> @@ -129,7 +169,7 @@ typedef struct RelationSyncEntry
>   * same as 'relid' or if unnecessary due to partition and the ancestor
>   * having identical TupleDesc.
>   */
> - TupleConversionMap *map;
> + AttrMap *map;
>  } RelationSyncEntry;
> 
> I wondered if you should also rename this member to something more
> meaningful like 'attrmap' instead of just 'map'.
Changed.

Best regards,
Hou zj

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Mon, Jan 17, 2022 12:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> 
> On Sat, Jan 15, 2022 at 5:30 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Attach the V65 patch set which addressed the above comments and Peter's
> comments[1].
> > I also fixed some typos and removed some unused code.
> >
> 
> I have several minor comments for the v65-0001 patch:

Thanks for the comments!

> doc/src/sgml/ref/alter_subscription.sgml
> 
> (1)
> Suggest minor doc change:
> 
> BEFORE:
> +          Previously-subscribed tables are not copied, even if the table's row
> +          filter <literal>WHERE</literal> clause had been modified.
> AFTER:
> +          Previously-subscribed tables are not copied, even if a table's row
> +          filter <literal>WHERE</literal> clause had been modified.
> 
> 
> src/backend/catalog/pg_publication.c

Changed.

> (2) GetTopMostAncestorInPublication
> Is there a reason why there is no "break" after finding a
> topmost_relid? Why keep searching and potentially overwrite a
> previously-found topmost_relid? If it's intentional, I think that a
> comment should be added to explain it.

The code was moved from get_rel_sync_entry, and was trying to get the
last oid in the ancestor list which is published by the publication. Do you
have some suggestions for the comment?

> 
> src/backend/commands/publicationcmds.c
> 
> (3) Grammar
> 
> BEFORE:
> + * Returns true, if any of the columns used in row filter WHERE clause is not
> AFTER:
> + * Returns true, if any of the columns used in the row filter WHERE
> clause are not

Changed.

> 
> (4) contain_invalid_rfcolumn_walker
> Wouldn't this be better named "contains_invalid_rfcolumn_walker"?
> (and references to the functions be updated accordingly)

I am not sure about the name, because many existing
functions are named contain_xxx_xxx
(for example, contain_mutable_functions).

> 
> src/backend/executor/execReplication.c
> 
> (5) Comment is difficult to read
> Add commas to make the comment easier to read:
> 
> BEFORE:
> + * It is only safe to execute UPDATE/DELETE when all columns referenced in
> + * the row filters from publications which the relation is in are valid -
> AFTER:
> + * It is only safe to execute UPDATE/DELETE when all columns, referenced in
> + * the row filters from publications which the relation is in, are valid -
>
Changed.

Best regards,
Hou zj

Re: row filtering for logical replication

From
Peter Smith
Date:
Here are some review comments for v66-0001 (review of updates since v65-0001)

~~~

1. src/backend/catalog/pg_publication.c - GetTopMostAncestorInPublication

@@ -276,17 +276,45 @@ GetPubPartitionOptionRelations(List *result,
PublicationPartOpt pub_partopt,
 }

 /*
+ * Returns the relid of the topmost ancestor that is published via this
+ * publication if any, otherwise return InvalidOid.
+ */

Suggestion:
"otherwise return InvalidOid." --> "otherwise returns InvalidOid."

~~~

2. src/backend/commands/publicationcmds.c - contain_invalid_rfcolumn_walker

@@ -235,6 +254,337 @@ CheckObjSchemaNotAlreadyInPublication(List
*rels, List *schemaidlist,
 }

 /*
+ * Returns true, if any of the columns used in the row filter WHERE clause are
+ * not part of REPLICA IDENTITY, false, otherwise.

Suggestion:
", false, otherwise" --> ", otherwise returns false."

~~~

3. src/backend/replication/logical/tablesync.c - fetch_remote_table_info

+ * We do need to copy the row even if it matches one of the publications,
+ * so, we later combine all the quals with OR.

Suggestion:

BEFORE
* We do need to copy the row even if it matches one of the publications,
* so, we later combine all the quals with OR.
AFTER
* We need to copy the row even if it matches just one of the publications,
* so, we later combine all the quals with OR.
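
For example, if table t is in two publications with the filters
(a > 10) and (b = 'x') respectively, the initial-sync query would
hypothetically take this shape:

COPY (SELECT a, b FROM public.t WHERE (a > 10) OR (b = 'x')) TO STDOUT;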

~~~

4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_exec_expr

+ ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
+
+ elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
+ DatumGetBool(ret) ? "true" : "false",
+ isnull ? "false" : "true");
+
+ if (isnull)
+ return false;
+
+ return DatumGetBool(ret);

That change to the logging looks incorrect - the "(isnull: %s)" value
is backwards now.

I guess maybe the intent was to change it to something like below:

elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
isnull ? "false" : DatumGetBool(ret) ? "true" : "false",
isnull ? "true" : "false");

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Jan 17, 2022 at 9:00 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Mon, Jan 17, 2022 12:34 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here are some review comments for v65-0001 (review of updates since
> > v64-0001)
>
> Thanks for the comments!
>
> > ~~~
> >
> > 1. src/include/commands/publicationcmds.h - rename func
> >
> > +extern bool contain_invalid_rfcolumn(Oid pubid, Relation relation,
> > +List *ancestors,  AttrNumber *invalid_rfcolumn);
> >
> > I thought that function should be called "contains_..." instead of "contain_...".
> >
> > ~~~
> >
> > 2. src/backend/commands/publicationcmds.c - rename funcs
> >
> > Suggested renaming (same as above #1).
> >
> > "contain_invalid_rfcolumn_walker" --> "contains_invalid_rfcolumn_walker"
> > "contain_invalid_rfcolumn" --> "contains_invalid_rfcolumn"
> >
> > Also, update it in the comment for rf_context:
> > +/*
> > + * Information used to validate the columns in the row filter
> > +expression. See
> > + * contain_invalid_rfcolumn_walker for details.
> > + */
>
> I am not sure about the name, because many existing
> functions are named contain_xxx_xxx
> (for example, contain_mutable_functions).
>

I also see many similar functions whose names start with contain_*, like
contain_var_clause, contain_agg_clause, contain_window_function, etc.
So, it is probably okay to retain the name as it is in the patch.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Tue, Jan 18, 2022 at 2:31 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> > (2) GetTopMostAncestorInPublication
> > Is there a reason why there is no "break" after finding a
> > topmost_relid? Why keep searching and potentially overwrite a
> > previously-found topmost_relid? If it's intentional, I think that a
> > comment should be added to explain it.
>
> The code was moved from get_rel_sync_entry, and was trying to get the
> last oid in the ancestor list which is published by the publication. Do you
> have some suggestions for the comment ?
>

Maybe the existing comment should be updated to just spell it out like that:

/*
 * Find the "topmost" ancestor that is in this publication, by getting the
 * last Oid in the ancestors list which is published by the publication.
 */


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jan 18, 2022 at 8:41 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Jan 18, 2022 at 2:31 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > > (2) GetTopMostAncestorInPublication
> > > Is there a reason why there is no "break" after finding a
> > > topmost_relid? Why keep searching and potentially overwrite a
> > > previously-found topmost_relid? If it's intentional, I think that a
> > > comment should be added to explain it.
> >
> > The code was moved from get_rel_sync_entry, and was trying to get the
> > last oid in the ancestor list which is published by the publication. Do you
> > have some suggestions for the comment ?
> >
>
> Maybe the existing comment should be updated to just spell it out like that:
>
> /*
>  * Find the "topmost" ancestor that is in this publication, by getting the
>  * last Oid in the ancestors list which is published by the publication.
>  */
>

I am not sure that is helpful w.r.t. what Peter is looking for, as that
is saying what the code is doing whereas he wants to know why it is so. I
think one can understand this by looking at get_partition_ancestors
which will return the top-most ancestor as the last element. I feel
either we can say see get_partition_ancestors or maybe explain how the
ancestors are stored in this list.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Tue, Jan 18, 2022 at 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 18, 2022 at 8:41 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Tue, Jan 18, 2022 at 2:31 AM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > > (2) GetTopMostAncestorInPublication
> > > > Is there a reason why there is no "break" after finding a
> > > > topmost_relid? Why keep searching and potentially overwrite a
> > > > previously-found topmost_relid? If it's intentional, I think that a
> > > > comment should be added to explain it.
> > >
> > > The code was moved from get_rel_sync_entry, and was trying to get the
> > > last oid in the ancestor list which is published by the publication. Do you
> > > have some suggestions for the comment ?
> > >
> >
> > Maybe the existing comment should be updated to just spell it out like that:
> >
> > /*
> >  * Find the "topmost" ancestor that is in this publication, by getting the
> >  * last Oid in the ancestors list which is published by the publication.
> >  */
> >
>
> I am not sure that is helpful w.r.t. what Peter is looking for, as that
> is saying what the code is doing whereas he wants to know why it is so. I
> think one can understand this by looking at get_partition_ancestors
> which will return the top-most ancestor as the last element. I feel
> either we can say see get_partition_ancestors or maybe explain how the
> ancestors are stored in this list.
>

(note: I asked the original question about why there is no "break", not Peter)
Maybe instead, an additional comment could be added to the
GetTopMostAncestorInPublication function to say "Note that the
ancestors list is ordered such that the topmost ancestor is at the end
of the list". Unfortunately the get_partition_ancestors function
currently doesn't explicitly say that the topmost ancestor is
returned at the end of the list (I guess you could conclude it by
looking at the get_partition_ancestors_worker code which it calls).
Also, this leads me to wonder if searching the ancestors list
backwards might be better here, breaking at the first match (see the
sketch below). Perhaps there is only a small gain in doing that ...
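
A sketch of that variant, assuming the publication's relation OIDs are
already at hand as a list (the helper name and signature are invented
for illustration):

static Oid
get_topmost_published_ancestor(List *ancestors, List *pubrelids)
{
    /* the topmost ancestor is last, so scan backwards and stop early */
    for (int i = list_length(ancestors) - 1; i >= 0; i--)
    {
        Oid ancestor = list_nth_oid(ancestors, i);

        if (list_member_oid(pubrelids, ancestor))
            return ancestor;
    }

    return InvalidOid;
}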

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Jan 17, 2022 at 8:58 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Attach the V66 patch set which addressed Amit, Peter and Greg's comments.
>

Thanks, some more comments, and suggestions:

1.
/*
+ * If no record in publication, check if the table is the partition
+ * of a published partitioned table. If so, the table has no row
+ * filter.
+ */
+ else if (!pub->pubviaroot)
+ {
+ List    *schemarelids;
+ List    *relids;
+
+ schemarelids = GetAllSchemaPublicationRelations(pub->oid,
+ PUBLICATION_PART_LEAF);
+ relids = GetPublicationRelations(pub->oid,
+ PUBLICATION_PART_LEAF);
+
+ if (list_member_oid(schemarelids, entry->publish_as_relid) ||
+ list_member_oid(relids, entry->publish_as_relid))
+ pub_no_filter = true;
+
+ list_free(schemarelids);
+ list_free(relids);
+
+ if (!pub_no_filter)
+ continue;
+ }

As far as I understand, this handling is required only for partition
tables but it seems to be invoked for non-partition tables as well.
Please move the comment inside the else if block and expand a bit more to
say why it is necessary to not directly set pub_no_filter here. Note
that I think this can be improved (avoid cache lookups) if we maintain
a list of pubids in relsyncentry but I am not sure that is required
because this is a rare case and needs to be done only one time.

2.
 static HTAB *OpClassCache = NULL;

-
 /* non-export function prototypes */

Spurious line removal. I have added it back in the attached top-up patch.

Apart from the above, I have made some modifications to other comments.

-- 
With Regards,
Amit Kapila.

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jan 18, 2022 at 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jan 17, 2022 at 8:58 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
>
> Spurious line removal. I have added it back in the attached top-up patch.
>
> Apart from the above, I have made some modifications to other comments.
>

Sorry, attached the wrong patch earlier.

-- 
With Regards,
Amit Kapila.

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Tues, Jan 18, 2022 8:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Mon, Jan 17, 2022 at 8:58 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Attach the V66 patch set which addressed Amit, Peter and Greg's comments.
> >
> 
> Thanks, some more comments, and suggestions:
> 
> 1.
> /*
> + * If no record in publication, check if the table is the partition
> + * of a published partitioned table. If so, the table has no row
> + * filter.
> + */
> + else if (!pub->pubviaroot)
> + {
> + List    *schemarelids;
> + List    *relids;
> +
> + schemarelids = GetAllSchemaPublicationRelations(pub->oid,
> + PUBLICATION_PART_LEAF);
> + relids = GetPublicationRelations(pub->oid,
> + PUBLICATION_PART_LEAF);
> +
> + if (list_member_oid(schemarelids, entry->publish_as_relid) ||
> + list_member_oid(relids, entry->publish_as_relid))
> + pub_no_filter = true;
> +
> + list_free(schemarelids);
> + list_free(relids);
> +
> + if (!pub_no_filter)
> + continue;
> + }
> 
> As far as I understand, this handling is required only for partition
> tables but it seems to be invoked for non-partition tables as well.
> Please move the comment inside the else if block and expand a bit more to
> say why it is necessary to not directly set pub_no_filter here.

Changed.

> Note,
> that I think this can be improved (avoid cache lookups) if we maintain
> a list of pubids in relsyncentry but I am not sure that is required
> because this is a rare case and needs to be done only one time.

I will do some research about this.

> 2.
>  static HTAB *OpClassCache = NULL;
> 
> -
>  /* non-export function prototypes */
> 
> Spurious line removal. I have added it back in the attached top-up patch.
> 
> Apart from the above, I have made some modifications to other comments.

Thanks for the changes and comments.

Attach the V67 patch set, which addresses the above comments.

The new version of the patch also includes:
- Some code comments update suggested by Peter [1] and Greg [2]
- Move the initialization of the cached slot into a separate function because we now
  use the cached slot even if there is no filter.
- Remove an unused parameter in pgoutput_row_filter_init.
- Improve the memory context initialization of row filter.
- Fix some tab-complete bugs (fix provided by Peter)

[1] https://www.postgresql.org/message-id/CAHut%2BPtPVqXVsqBHU3wTppU_cK5xuS7TkqT1XJLJmn%2BTpt905w%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAJcOf-eWhCtdKXc9_5JASJ1sU0nGOSp%2B2nzLk01O2%3DZy7v1ApQ%40mail.gmail.com

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Tues, Jan 18, 2022 9:27 AM Peter Smith <smithpb2250@gmail.com> wrote:
> Here are some review comments for v66-0001 (review of updates since
> v65-0001)

Thanks for the comments!

> ~~~
> 
> 1. src/backend/catalog/pg_publication.c - GetTopMostAncestorInPublication
> 
> @@ -276,17 +276,45 @@ GetPubPartitionOptionRelations(List *result,
> PublicationPartOpt pub_partopt,  }
> 
>  /*
> + * Returns the relid of the topmost ancestor that is published via this
> + * publication if any, otherwise return InvalidOid.
> + */
> 
> Suggestion:
> "otherwise return InvalidOid." --> "otherwise returns InvalidOid."
> 

Changed.

> 
> 2. src/backend/commands/publicationcmds.c -
> contain_invalid_rfcolumn_walker
> 
> @@ -235,6 +254,337 @@ CheckObjSchemaNotAlreadyInPublication(List
> *rels, List *schemaidlist,
>  }
> 
>  /*
> + * Returns true, if any of the columns used in the row filter WHERE
> + clause are
> + * not part of REPLICA IDENTITY, false, otherwise.
> 
> Suggestion:
> ", false, otherwise" --> ", otherwise returns false."
> 

Changed.

> 
> 3. src/backend/replication/logical/tablesync.c - fetch_remote_table_info
> 
> + * We do need to copy the row even if it matches one of the
> + publications,
> + * so, we later combine all the quals with OR.
> 
> Suggestion:
> 
> BEFORE
> * We do need to copy the row even if it matches one of the publications,
> * so, we later combine all the quals with OR.
> AFTER
> * We need to copy the row even if it matches just one of the publications,
> * so, we later combine all the quals with OR.
> 

Changed.

> 
> 4. src/backend/replication/pgoutput/pgoutput.c -
> pgoutput_row_filter_exec_expr
> 
> + ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
> +
> + elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
> + DatumGetBool(ret) ? "true" : "false",
> + isnull ? "false" : "true");
> +
> + if (isnull)
> + return false;
> +
> + return DatumGetBool(ret);
> 
> That change to the logging looks incorrect - the "(isnull: %s)" value is
> backwards now.
> 
> I guess maybe the intent was to change it something like below:
> 
> elog(DEBUG3, "row filter evaluates to %s (isnull: %s)", isnull ? "false" :
> DatumGetBool(ret) ? "true" : "false", isnull ? "true" : "false");

I misread the previous comments.
I think the original log is correct and I have reverted this change.

Best regards,
Hou zj

Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Wed, Jan 19, 2022 at 1:15 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Attach the V67 patch set, which addresses the above comments.
>

I noticed a problem in one of the errdetail messages
added by the patch:

(1) check_simple_rowfilter_expr_walker()
Non-immutable built-in functions are NOT allowed in expressions (i.e.
WHERE clauses).
Therefore, the error message should say that "Expressions only allow
... immutable built-in functions":
The following change is required:

BEFORE:
+ errdetail("Expressions only allow columns, constants, built-in
operators, built-in data types and non-immutable built-in functions.")
AFTER:
+ errdetail("Expressions only allow columns, constants, built-in
operators, built-in data types and immutable built-in functions.")
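
With the corrected wording, a hypothetical table tbl(a int, b text)
should behave roughly like this:

-- allowed: columns, constants, built-in operators, immutable built-ins
CREATE PUBLICATION p_ok FOR TABLE tbl WHERE (lower(b) = 'x' AND a > 10);

-- rejected: random() is a built-in function, but it is not immutable
CREATE PUBLICATION p_bad FOR TABLE tbl WHERE (random() > 0.5);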


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jan 19, 2022 at 7:45 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Tues, Jan 18, 2022 8:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Attach the V67 patch set, which addresses the above comments.
>

Some more comments and suggestions:
=================================
1. Can we do slot initialization in maybe_send_schema() instead of
introducing a new flag for it?

2.
+ * For updates if no old tuple, it means none of the replica identity
+ * columns changed and this would reduce to a simple update. We only need
+ * to evaluate the row filter for the new tuple.

Is it possible with the current version of the patch? I am asking
because, for updates, we now allow only RI columns in row filter, so
do we need to evaluate the row filter in this case? I think ideally,
we don't need to evaluate the row filter in this case as for updates
only replica identity columns are allowed but users can use constant
expressions in the row filter. So, we need to evaluate the row filter
in this case as well. Is my understanding correct?

3. + /* If no filter found, clean up the memory and return */
+ if (!has_filter)
+ {
+ if (entry->cache_expr_cxt != NULL)
+ MemoryContextDelete(entry->cache_expr_cxt);

I think this clean-up needs to be performed when we set
exprstate_valid to false. I have changed accordingly in the attached
patch.

Apart from the above, I have made quite a few changes in the code
comments in the attached top-up patch; kindly review those and merge
them into the main patch if you are okay with them.

-- 
With Regards,
Amit Kapila.

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Jan 18, 2022 at 10:23 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Jan 18, 2022 at 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jan 18, 2022 at 8:41 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> > > On Tue, Jan 18, 2022 at 2:31 AM houzj.fnst@fujitsu.com
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > > (2) GetTopMostAncestorInPublication
> > > > > Is there a reason why there is no "break" after finding a
> > > > > topmost_relid? Why keep searching and potentially overwrite a
> > > > > previously-found topmost_relid? If it's intentional, I think that a
> > > > > comment should be added to explain it.
> > > >
> > > > The code was moved from get_rel_sync_entry, and was trying to get the
> > > > last oid in the ancestor list which is published by the publication. Do you
> > > > have some suggestions for the comment ?
> > > >
> > >
> > > Maybe the existing comment should be updated to just spell it out like that:
> > >
> > > /*
> > >  * Find the "topmost" ancestor that is in this publication, by getting the
> > >  * last Oid in the ancestors list which is published by the publication.
> > >  */
> > >
> >
> > I am not sure that is helpful w.r.t what Peter is looking for as that
> > is saying what code is doing and he wants to know why it is so? I
> > think one can understand this by looking at get_partition_ancestors
> > which will return the top-most ancestor as the last element. I feel
> > either we can say see get_partition_ancestors or maybe explain how the
> > ancestors are stored in this list.
> >
>
> (note: I asked the original question about why there is no "break", not Peter)
>

Okay.

> Maybe instead, an additional comment could be added to the
> GetTopMostAncestorInPublication function to say "Note that the
> ancestors list is ordered such that the topmost ancestor is at the end
> of the list".
>

I am fine with this and I see that Hou-San already used this in the
latest version of the patch.

> Unfortunately the get_partition_ancestors function
> currently doesn't explicitly say that the topmost ancestors are
> returned at the end of the list (I guess you could conclude it by then
> looking at get_partition_ancestors_worker code which it calls).
> Also, this leads me to wonder if searching the ancestors list
> backwards might be better here, and break at the first match?
>

I am not sure of the gains from doing that and anyway, that is a
separate topic of discussion as it is existing code.

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Wednesday, January 19, 2022 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Wed, Jan 19, 2022 at 7:45 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Tues, Jan 18, 2022 8:35 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > Attach the V67 patch set, which addresses the above comments.
> >
> 
> Some more comments and suggestions:
> =================================
> 1. Can we do slot initialization in maybe_send_schema() instead of
> introducing a new flag for it?
> 
> 2.
> + * For updates if no old tuple, it means none of the replica identity
> + * columns changed and this would reduce to a simple update. We only need
> + * to evaluate the row filter for the new tuple.
> 
> Is it possible with the current version of the patch? I am asking
> because, for updates, we now allow only RI columns in row filter, so
> do we need to evaluate the row filter in this case? I think ideally,
> we don't need to evaluate the row filter in this case as for updates
> only replica identity columns are allowed but users can use constant
> expressions in the row filter. So, we need to evaluate the row filter
> in this case as well. Is my understanding correct?
> 
> 3. + /* If no filter found, clean up the memory and return */
> + if (!has_filter)
> + {
> + if (entry->cache_expr_cxt != NULL)
> + MemoryContextDelete(entry->cache_expr_cxt);
> 
> I think this clean-up needs to be performed when we set
> exprstate_valid to false. I have changed accordingly in the attached
> patch.
> 
> Apart from the above, I have made quite a few changes in the code
> comments in the attached top-up patch; kindly review those and merge
> them into the main patch if you are okay with them.

Thanks for the comments and changes.
Attach the V68 patch set, which addresses the above comments and changes.
This version of the patch also fixes the error message mentioned by Greg[1].

[1] https://www.postgresql.org/message-id/CAJcOf-f9DBXMvutsxW_DBLu7bepKP1e4BGw4bwiC%2BzwsK4Q0Wg%40mail.gmail.com

Best regards,
Hou zj

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
Here are some review comments for v68-0001.

~~~

1. Commit message

"When a publication is defined or modified, rows that don't satisfy an
optional WHERE clause
will be filtered out."

That wording seems strange to me - it sounds like the filtering takes
place at the point of creating/altering.

Suggest rewording to something like:
"When a publication is defined or modified, an optional WHERE clause
can be specified. Rows that don't
satisfy this WHERE clause will be filtered out."

~~~

2. Commit message

"The WHERE clause allows simple expressions that don't have
user-defined functions, operators..."

Suggest adding the word ONLY:
"The WHERE clause only allows simple expressions that don't have
user-defined functions, operators..."

~~~

3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init

+ /* If no filter found, clean up the memory and return */
+ if (!has_filter)
+ {
+ if (entry->cache_expr_cxt != NULL)
+ MemoryContextDelete(entry->cache_expr_cxt);
+
+ entry->exprstate_valid = true;
+ return;
+ }

IMO this should be refactored to have if/else, so the function has
just a single point of return and a single point where the
exprstate_valid is set. e.g.

if (!has_filter)
{
/* If no filter found, clean up the memory and return */
...
}
else
{
/* Create or reset the memory context for row filters */
...
/*
* Now all the filters for all pubactions are known. Combine them when
* their pubactions are same.
 ...
}

entry->exprstate_valid = true;

~~~

4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter comment

+ /*
+ * We need this map  to avoid relying on changes in ReorderBufferChangeType
+ * enum.
+ */
+ static int map_changetype_pubaction[] = {
+ [REORDER_BUFFER_CHANGE_INSERT] = PUBACTION_INSERT,
+ [REORDER_BUFFER_CHANGE_UPDATE] = PUBACTION_UPDATE,
+ [REORDER_BUFFER_CHANGE_DELETE] = PUBACTION_DELETE
+ };

Suggest rewording the comment and removing the double-spacing:

BEFORE:
"We need this map  to avoid relying on changes in ReorderBufferChangeType enum."

AFTER:
"We need this map to avoid relying on ReorderBufferChangeType enums
having specific values."

~~~

5. DEBUG level 3

I found there are 3 debug logs in this patch and they all have DEBUG3 level.

IMO it is probably OK as-is, but just as a comparison I noticed that the
most detailed logging for the logical replication worker.c was DEBUG2.
Perhaps the row-filter patch should be using DEBUG2 also?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Jan 20, 2022 at 7:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some review comments for v68-0001.
>
>
> 3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init
>
> + /* If no filter found, clean up the memory and return */
> + if (!has_filter)
> + {
> + if (entry->cache_expr_cxt != NULL)
> + MemoryContextDelete(entry->cache_expr_cxt);
> +
> + entry->exprstate_valid = true;
> + return;
> + }
>
> IMO this should be refactored to have if/else, so the function has
> just a single point of return and a single point where the
> exprstate_valid is set. e.g.
>
> if (!has_filter)
> {
> /* If no filter found, clean up the memory and return */
> ...
> }
> else
> {
> /* Create or reset the memory context for row filters */
> ...
> /*
> * Now all the filters for all pubactions are known. Combine them when
> * their pubactions are same.
>  ...
> }
>
> entry->exprstate_valid = true;
>

This part of the code was changed in v68, which makes the current code
more suitable, as we don't need to deal with the memory context in the
if part. I am not sure if it is good to add the else block here, but I
think that is just a matter of personal preference.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Jan 20, 2022 at 7:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> 5. DEBUG level 3
>
> I found there are 3 debug logs in this patch and they all have DEBUG3 level.
>
> IMO it is probably OK as-is,
>

+1.

> but just as a comparison I noticed that the
> most detailed logging for the logical replication worker.c was DEBUG2.
> Perhaps the row-filter patch should be using DEBUG2 also?
>

OTOH, the other related files like reorderbuffer.c and snapbuild.c
are using DEBUG3 for the detailed messages. So, I think it is probably
okay to retain the logs as is.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Jan 20, 2022 at 2:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 20, 2022 at 7:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here are some review comments for v68-0001.
> >
> >
> > 3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init
> >
> > + /* If no filter found, clean up the memory and return */
> > + if (!has_filter)
> > + {
> > + if (entry->cache_expr_cxt != NULL)
> > + MemoryContextDelete(entry->cache_expr_cxt);
> > +
> > + entry->exprstate_valid = true;
> > + return;
> > + }
> >
> > IMO this should be refactored to have if/else, so the function has
> > just a single point of return and a single point where the
> > exprstate_valid is set. e.g.
> >
> > if (!has_filter)
> > {
> > /* If no filter found, clean up the memory and return */
> > ...
> > }
> > else
> > {
> > /* Create or reset the memory context for row filters */
> > ...
> > /*
> > * Now all the filters for all pubactions are known. Combine them when
> > * their pubactions are same.
> >  ...
> > }
> >
> > entry->exprstate_valid = true;
> >
>
> This part of the code is changed in v68 which makes the current code
> more suitable as we don't need to deal with memory context in if part.
> I am not sure if it is good to add the else block here but I think
> that is just a matter of personal preference.
>

Sorry, my mistake - I quoted the v67 source instead of the v68 source.

There is no else needed now at all. My suggestion becomes, e.g.:

if (has_filter)
{
    // deal with mem ctx ...
    // combine filters ...
}
entry->exprstate_valid = true;
return;

------
Kind Regards,
Peter Smith
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Jan 20, 2022 at 6:42 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Attach the V68 patch set which addressed the above comments and changes.
>

Few comments and suggestions:
==========================
1.
/*
+ * For updates, if both the new tuple and old tuple are not null, then both
+ * of them need to be checked against the row filter.
+ */
+ tmp_new_slot = new_slot;
+ slot_getallattrs(new_slot);
+ slot_getallattrs(old_slot);
+

Isn't it better to add an assert like
Assert(map_changetype_pubaction[*action] == PUBACTION_UPDATE); before
the above code? I have tried to change this part of the code in the
attached top-up patch.

2.
+ /*
+ * For updates, if both the new tuple and old tuple are not null, then both
+ * of them need to be checked against the row filter.
+ */
+ tmp_new_slot = new_slot;
+ slot_getallattrs(new_slot);
+ slot_getallattrs(old_slot);
+
+ /*
+ * The new tuple might not have all the replica identity columns, in which
+ * case it needs to be copied over from the old tuple.
+ */
+ for (i = 0; i < desc->natts; i++)
+ {
+ Form_pg_attribute att = TupleDescAttr(desc, i);
+
+ /*
+ * if the column in the new tuple or old tuple is null, nothing to do
+ */
+ if (tmp_new_slot->tts_isnull[i] || old_slot->tts_isnull[i])
+ continue;
+
+ /*
+ * Unchanged toasted replica identity columns are only detoasted in the
+ * old tuple, copy this over to the new tuple.
+ */
+ if (att->attlen == -1 &&
+ VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i]) &&
+ !VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))
+ {
+ if (tmp_new_slot == new_slot)
+ {
+ tmp_new_slot = MakeSingleTupleTableSlot(desc, &TTSOpsVirtual);
+ ExecClearTuple(tmp_new_slot);
+ ExecCopySlot(tmp_new_slot, new_slot);
+ }
+
+ tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
+ tmp_new_slot->tts_isnull[i] = old_slot->tts_isnull[i];
+ }
+ }


What is the need to assign new_slot to tmp_new_slot at the beginning
of this part of the code? Can't we do this when we find some
attribute that needs to be copied from the old tuple? (A sketch of that
variant follows.)
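
A sketch of that variant (the helper name is invented; the slots and
descriptor are as in the quoted hunk, and the usual pgoutput.c headers
are assumed):

static TupleTableSlot *
fill_missing_from_old(TupleDesc desc, TupleTableSlot *new_slot,
                      TupleTableSlot *old_slot)
{
    TupleTableSlot *tmp_new_slot = NULL;

    slot_getallattrs(new_slot);
    slot_getallattrs(old_slot);

    for (int i = 0; i < desc->natts; i++)
    {
        Form_pg_attribute att = TupleDescAttr(desc, i);

        /* nothing to do if the column is null on either side */
        if (new_slot->tts_isnull[i] || old_slot->tts_isnull[i])
            continue;

        /* unchanged toasted column: the detoasted value is in the old tuple */
        if (att->attlen == -1 &&
            VARATT_IS_EXTERNAL_ONDISK(new_slot->tts_values[i]) &&
            !VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))
        {
            /* materialize the copy only when the first such column is found */
            if (tmp_new_slot == NULL)
            {
                tmp_new_slot = MakeSingleTupleTableSlot(desc, &TTSOpsVirtual);
                ExecClearTuple(tmp_new_slot);
                ExecCopySlot(tmp_new_slot, new_slot);
            }

            tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
            tmp_new_slot->tts_isnull[i] = old_slot->tts_isnull[i];
        }
    }

    return tmp_new_slot ? tmp_new_slot : new_slot;
}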

The other part which is not clear to me by looking at this code and
comments is how we ensure that we cover all cases where the new
tuple doesn't have values?

Attached top-up patch with some minor changes. Kindly review.

-- 
With Regards,
Amit Kapila.

Attachments

RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Thu, Jan 20, 2022 9:13 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> Attach the V68 patch set which addressed the above comments and changes.
> The version patch also fix the error message mentioned by Greg[1]
> 

I saw a problem with this patch, related to the Replica Identity check.

For example:
-- publisher --
create table tbl (a int);
create publication pub for table tbl where (a>10) with (publish='delete');
insert into tbl values (1);
update tbl set a=a+1;

postgres=# update tbl set a=a+1;
ERROR:  cannot update table "tbl"
DETAIL:  Column "a" used in the publication WHERE expression is not part of the replica identity.

I think it shouldn't report the error because the publication doesn't publish UPDATEs.
Thoughts?

Regards,
Tang

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Jan 20, 2022 at 5:03 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> On Thu, Jan 20, 2022 9:13 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> > Attach the V68 patch set which addressed the above comments and changes.
> > The version patch also fix the error message mentioned by Greg[1]
> >
>
> I saw a problem with this patch, related to the Replica Identity check.
>
> For example:
> -- publisher --
> create table tbl (a int);
> create publication pub for table tbl where (a>10) with (publish='delete');
> insert into tbl values (1);
> update tbl set a=a+1;
>
> postgres=# update tbl set a=a+1;
> ERROR:  cannot update table "tbl"
> DETAIL:  Column "a" used in the publication WHERE expression is not part of the replica identity.
>
> I think it shouldn't report the error because the publication doesn't publish UPDATEs.
>

Right, I also don't see any reason why an error should be thrown in
this case. The problem here is that the patch doesn't have any
correspondence between the pubaction and RI column validation for a
particular publication. I think we need to do that and cache that
information, unless the publication publishes both updates and deletes,
in which case it is okay to directly return the invalid column in the
row filter as we are doing now.
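
For illustration, a sketch of what such cached state could look like
(the struct and helper names are invented):

typedef struct RowFilterValidity
{
    /* row filters use only RI columns, for pubs that publish UPDATE */
    bool        rf_valid_for_update;
    /* likewise, for publications that publish DELETE */
    bool        rf_valid_for_delete;
} RowFilterValidity;

/*
 * Complain about a non-RI column in a row filter only when the command
 * being executed is one that some publication actually publishes.
 */
static bool
row_filter_allows_cmd(CmdType cmd, const RowFilterValidity *v)
{
    if (cmd == CMD_UPDATE)
        return v->rf_valid_for_update;
    if (cmd == CMD_DELETE)
        return v->rf_valid_for_delete;

    return true;                /* inserts need no RI check */
}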

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
I was skimming this and the changes in CheckCmdReplicaIdentity caught my
attention.  "Is this code running at the publisher side or the subscriber
side?" I wondered -- because the new error messages being added look
intended to be thrown at the publisher side; but the existing error
messages appear intended for the subscriber side.  Apparently there is
one caller at the publisher side (CheckValidResultRel) and three callers
at the subscriber side.  I'm not fully convinced that this is a problem,
but I think it's not great to have it that way.  Maybe it's okay with
the current coding, but after this patch adds these new errors it is
definitely weird.  Maybe it should be split in two routines, and document
more explicitly which one is for which side.
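
A sketch of that split (the names are invented; both routines would
share the common replica identity lookup internally):

/* publisher side: called from CheckValidResultRel() before DML */
extern void CheckCmdReplicaIdentityPublisher(Relation rel, CmdType cmd);

/* subscriber side: called from the apply worker before applying changes */
extern void CheckCmdReplicaIdentitySubscriber(Relation rel, CmdType cmd);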

And while wondering about that, I stumbled upon
GetRelationPublicationActions(), which has a very weird API that it
always returns a palloc'ed block -- but without saying so.  And
therefore, its only caller leaks that memory.  Maybe not critical, but
it looks ugly.  I mean, if we're always going to do a memcpy, why not
use a caller-supplied stack-allocated memory?  Sounds like it'd be
simpler.
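
A sketch of the caller-supplied variant (keeping the existing function
name; the cache-refresh logic is elided):

void
GetRelationPublicationActions(Relation relation, PublicationActions *actions)
{
    /* ... build/refresh relation->rd_pubactions as today ... */
    memcpy(actions, relation->rd_pubactions, sizeof(PublicationActions));
}

/* the caller fills a stack variable; there is nothing to pfree */
PublicationActions pubactions;

GetRelationPublicationActions(rel, &pubactions);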

And the actual reason I was looking at this code, is that I had stumbled
upon the new GetRelationPublicationInfo() function, which has an even
weirder API:

>  * Get the publication information for the given relation.
>  *
>  * Traverse all the publications which the relation is in to get the
>  * publication actions and validate the row filter expressions for such
>  * publications if any. We consider the row filter expression as invalid if it
>  * references any column which is not part of REPLICA IDENTITY.
>  *
>  * To avoid fetching the publication information, we cache the publication
>  * actions and row filter validation information.
>  *
>  * Returns the column number of an invalid column referenced in a row filter
>  * expression if any, InvalidAttrNumber otherwise.
>  */
> AttrNumber
> GetRelationPublicationInfo(Relation relation, bool validate_rowfilter)

"Returns *an* invalid column referenced in a RF if any"?  That sounds
very strange.  And exactly what info is it getting, given that there is
no actual returned info?  Maybe this was meant to be "validate RF
expressions" and return, perhaps, a bitmapset of all invalid columns
referenced?  (What is an invalid column in the first place?)


In many function comments you see things like "Check, if foo is bar" or
"Returns true, if blah".  These commas there needs to be removed.

Thanks

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Jan 20, 2022 at 6:43 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> And the actual reason I was looking at this code, is that I had stumbled
> upon the new GetRelationPublicationInfo() function, which has an even
> weirder API:
>
> >  * Get the publication information for the given relation.
> >  *
> >  * Traverse all the publications which the relation is in to get the
> >  * publication actions and validate the row filter expressions for such
> >  * publications if any. We consider the row filter expression as invalid if it
> >  * references any column which is not part of REPLICA IDENTITY.
> >  *
> >  * To avoid fetching the publication information, we cache the publication
> >  * actions and row filter validation information.
> >  *
> >  * Returns the column number of an invalid column referenced in a row filter
> >  * expression if any, InvalidAttrNumber otherwise.
> >  */
> > AttrNumber
> > GetRelationPublicationInfo(Relation relation, bool validate_rowfilter)
>
> "Returns *an* invalid column referenced in a RF if any"?  That sounds
> very strange.  And exactly what info is it getting, given that there is
> no actual returned info?
>

It returns an invalid column referenced in an RF, if any; otherwise,
it helps to form the pubactions, which are anyway required at a later
point in the caller. The idea is that when we are already traversing
the publications we should store/gather as much info as possible. I
think the API name is probably misleading; maybe we should name it
something like ValidateAndFetchPubInfo, ValidateAndRememberPubInfo, or
something along these lines?

>  Maybe this was meant to be "validate RF
> expressions" and return, perhaps, a bitmapset of all invalid columns
> referenced?
>

Currently, we stop as soon as we find the first invalid column.

>  (What is an invalid column in the first place?)
>

A column that is referenced in the row filter but is not part of the
Replica Identity.
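
A minimal example, using the error already reported earlier in this
thread:

CREATE TABLE t (a int PRIMARY KEY, b int);
-- The replica identity defaults to the primary key (a), so a row
-- filter referencing b makes b an "invalid" column in this sense:
CREATE PUBLICATION p FOR TABLE t WHERE (b > 0);
UPDATE t SET b = 1;
ERROR:  cannot update table "t"
DETAIL:  Column "b" used in the publication WHERE expression is not
part of the replica identity.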

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2022-Jan-20, Amit Kapila wrote:

> It returns an invalid column referenced in an RF if any but if not
> then it helps to form pubactions which is anyway required at a later
> point in the caller. The idea is that when we are already traversing
> publications we should store/gather as much info as possible.

I think this design isn't quite awesome.

> I think probably the API name is misleading, maybe we should name it
> something like ValidateAndFetchPubInfo, ValidateAndRememberPubInfo, or
> something along these lines?

Maybe RelationBuildReplicationPublicationDesc or just
RelationBuildPublicationDesc are good names for a routine that fills in
the publication aspect of the relcache entry, as a parallel to
RelationBuildPartitionDesc.

> >  Maybe this was meant to be "validate RF
> > expressions" and return, perhaps, a bitmapset of all invalid columns
> > referenced?
> 
> Currently, we stop as soon as we find the first invalid column.

That seems quite strange.  (And above you say "gather as much info as
possible", so why stop at the first one?)

> >  (What is an invalid column in the first place?)
> 
> A column that is referenced in the row filter but is not part of
> Replica Identity.

I do wonder how these invalid columns reach the table definition in
the first place.  Shouldn't these be detected at DDL time and prohibited
from getting into the definition?

... so if I do
  ADD TABLE foobar WHERE (col_not_in_replident = 42)
then I should get an error immediately, rather than be forced to
construct a relcache entry with "invalid" data in it.  Likewise if I
change the replica identity to one that causes one of these to be
invalid.  Isn't this the same approach we discussed for column
filtering?

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
Voy a acabar con todos los humanos / con los humanos yo acabaré
voy a acabar con todos (bis) / con todos los humanos acabaré ¡acabaré! (Bender)



Re: row filtering for logical replication

From
Greg Nancarrow
Date:

On Fri, Jan 21, 2022 at 12:13 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> And while wondering about that, I stumbled upon
> GetRelationPublicationActions(), which has a very weird API that it
> always returns a palloc'ed block -- but without saying so.  And
> therefore, its only caller leaks that memory.  Maybe not critical, but
> it looks ugly.  I mean, if we're always going to do a memcpy, why not
> use a caller-supplied stack-allocated memory?  Sounds like it'd be
> simpler.
>

+1
This issue exists on HEAD (i.e. was not introduced by the row
filtering patch) and was already discussed on another thread ([1]) on
which I posted a patch to correct the issue along the same lines that
you're suggesting.

[1]   https://postgr.es/m/CAJcOf-d0%3DvQx1Pzbf%2BLVarywejJFS5W%2BM6uR%2B2d0oeEJ2VQ%2BEw%40mail.gmail.com


Regards,
Greg Nancarrow
Fujitsu Australia

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Jan 20, 2022 at 7:56 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > >  Maybe this was meant to be "validate RF
> > > expressions" and return, perhaps, a bitmapset of all invalid columns
> > > referenced?
> >
> > Currently, we stop as soon as we find the first invalid column.
>
> That seems quite strange.  (And above you say "gather as much info as
> possible", so why stop at the first one?)
>

Because that is an error case, there doesn't seem to be any benefit
in proceeding further. However, we can build all the required
information by processing all publications (aka gather all the
information) and then give an error later, if that idea appeals to you
more.

> > >  (What is an invalid column in the first place?)
> >
> > A column that is referenced in the row filter but is not part of
> > Replica Identity.
>
> I do wonder how do these invalid columns reach the table definition in
> the first place.  Shouldn't these be detected at DDL time and prohibited
> from getting into the definition?
>

As mentioned by Peter E [1], there are two ways to deal with this: (a)
The current approach is that the user can set the replica identity
freely, and we decide later based on that what we can replicate (e.g.,
no updates). If we follow the same approach for this patch, we don't
restrict what columns are part of the row filter, but we check what
actions we can replicate based on the row filter. This is what is
currently followed in the patch. (b) Add restrictions during DDL which
is not as straightforward as it looks.

For approach (b), we need to restrict quite a few DDLs like DROP
INDEX/DROP PRIMARY/ALTER REPLICA IDENTITY/ATTACH PARTITION/CREATE
TABLE PARTITION OF/ALTER PUBLICATION SET(publish='update')/ALTER
PUBLICATION SET(publish_via_root), etc.

We need to deal with partitioned table cases because newly added
partitions automatically become part of a publication if any of their
ancestor tables is part of the publication. Now consider the case
where the user needs to use CREATE TABLE PARTITION OF. The problem is
that the user cannot specify the Replica Identity using an index when
creating the table, so we can't validate it, and that will lead to
errors during replication if the parent table is published with a row
filter.

[1] - https://www.postgresql.org/message-id/2d6c8b74-bdef-767b-bdb6-29705985ed9c%40enterprisedb.com

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, January 20, 2022 9:14 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> 
> I was skimming this and the changes in CheckCmdReplicaIdentity caught my
> attention.  "Is this code running at the publisher side or the subscriber side?" I
> wondered -- because the new error messages being added look intended to
> be thrown at the publisher side; but the existing error messages appear
> intended for the subscriber side.  Apparently there is one caller at the
> publisher side (CheckValidResultRel) and three callers at the subscriber side.
> I'm not fully convinced that this is a problem, but I think it's not great to have it
> that way.  Maybe it's okay with the current coding, but after this patch adds
> this new errors it is definitely weird.  Maybe it should split in two routines, and
> document more explicitly which is one is for which side.

I think the existing CheckCmdReplicaIdentity is intended to run at the
publisher side. Although CheckCmdReplicaIdentity is invoked in
ExecSimpleRelationInsert, which is at the subscriber side, I think
that's because the subscriber side could be a publisher as well, which
also needs to check the RI.

So, the new error messages in the patch are consistent with the
existing error messages (all for the publisher side).

Best regards,
Hou zj

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Thur, Jan 20, 2022 10:26 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> 
> On 2022-Jan-20, Amit Kapila wrote:
> 
> > It returns an invalid column referenced in an RF if any but if not
> > then it helps to form pubactions which is anyway required at a later
> > point in the caller. The idea is that when we are already traversing
> > publications we should store/gather as much info as possible.
> 
> I think this design isn't quite awesome.
> 
> > I think probably the API name is misleading, maybe we should name it
> > something like ValidateAndFetchPubInfo, ValidateAndRememberPubInfo, or
> > something along these lines?
> 
> Maybe RelationBuildReplicationPublicationDesc or just
> RelationBuildPublicationDesc are good names for a routine that fill in
> the publication aspect of the relcache entry, as a parallel to
> RelationBuildPartitionDesc.
> 
> > >  Maybe this was meant to be "validate RF
> > > expressions" and return, perhaps, a bitmapset of all invalid columns
> > > referenced?
> >
> > Currently, we stop as soon as we find the first invalid column.
> 
> That seems quite strange.  (And above you say "gather as much info as
> possible", so why stop at the first one?)
> 
> > >  (What is an invalid column in the first place?)
> >
> > A column that is referenced in the row filter but is not part of
> > Replica Identity.
> 
> I do wonder how do these invalid columns reach the table definition in
> the first place.  Shouldn't these be detected at DDL time and prohibited
> from getting into the definition?

Personally, I'm a little hesitant to put the check at the DDL level,
because adding checks at DDLs like ATTACH PARTITION/CREATE PARTITION
OF ([1] explains why we need to check these DDLs) looks a bit
restrictive, and users might also complain about that. Putting the
check in CheckCmdReplicaIdentity seems more acceptable because it is
consistent with the existing behavior, which has drawn few complaints
from users AFAIK.

[1] https://www.postgresql.org/message-id/CAA4eK1%2Bm45Xyzx7AUY9TyFnB6CZ7_%2B_uooPb7WHSpp7UE%3DYmKg%40mail.gmail.com

Best regards,
Hou zj

Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Thu, Jan 20, 2022 at 12:12 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Attach the V68 patch set which addressed the above comments and changes.
> The version patch also fix the error message mentioned by Greg[1]
>

Some review comments for the v68 patch, mostly nitpicking:

(1) Commit message
Minor suggested updates:

BEFORE:
Allow specifying row filter for logical replication of tables.
AFTER:
Allow specifying row filters for logical replication of tables.

BEFORE:
If you choose to do the initial table synchronization, only data that
satisfies the row filters is pulled by the subscriber.
AFTER:
If you choose to do the initial table synchronization, only data that
satisfies the row filters is copied to the subscriber.


src/backend/executor/execReplication.c

(2)

BEFORE:
+ * table does not publish UPDATES or DELETES.
AFTER:
+ * table does not publish UPDATEs or DELETEs.


src/backend/replication/pgoutput/pgoutput.c

(3) pgoutput_row_filter_exec_expr
pgoutput_row_filter_exec_expr() returns false if "isnull" is true,
otherwise (if "isnull" is false) returns the value of "ret"
(true/false).
So the following elog needs to be changed (Peter Smith previously
pointed this out, but it didn't get completely changed):

BEFORE:
+ elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
+ DatumGetBool(ret) ? "true" : "false",
+ isnull ? "true" : "false");
AFTER:
+ elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
+ isnull ? "false" : DatumGetBool(ret) ? "true" : "false",
+ isnull ? "true" : "false");


(4) pgoutput_row_filter_init

BEFORE:
+  * we don't know yet if there is/isn't any row filters for this relation.
AFTER:
+  * we don't know yet if there are/aren't any row filters for this relation.

BEFORE:
+  * necessary at all. This avoids us to consume memory and spend CPU cycles
+  * when we don't need to.
AFTER:
+  * necessary at all. So this allows us to avoid unnecessary memory
+  * consumption and CPU cycles.

(5) pgoutput_row_filter

BEFORE:
+ * evaluates the row filter for that tuple and return.
AFTER:
+ * evaluate the row filter for that tuple and return.


Regards,
Greg Nancarrow
Fujitsu Australia



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, January 20, 2022 8:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Thu, Jan 20, 2022 at 5:03 PM tanghy.fnst@fujitsu.com
> <tanghy.fnst@fujitsu.com> wrote:
> >
> > On Thu, Jan 20, 2022 9:13 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > > Attach the V68 patch set which addressed the above comments and
> changes.
> > > The version patch also fix the error message mentioned by Greg[1]
> > >
> >
> > I saw a problem about this patch, which is related to Replica Identity check.
> >
> > For example:
> > -- publisher --
> > create table tbl (a int);
> > create publication pub for table tbl where (a>10) with (publish='delete');
> > insert into tbl values (1);
> > update tbl set a=a+1;
> >
> > postgres=# update tbl set a=a+1;
> > ERROR:  cannot update table "tbl"
> > DETAIL:  Column "a" used in the publication WHERE expression is not part of
> the replica identity.
> >
> > I think it shouldn't report the error because the publication didn't publish
> UPDATES.
> >
> 
> Right, I also don't see any reason why an error should be thrown in
> this case. The problem here is that the patch doesn't have any
> correspondence between the pubaction and RI column validation for a
> particular publication. I think we need to do that and cache that
> information unless the publication publishes both updates and deletes
> in which case it is okay to directly return invalid column in row
> filter as we are doing now.

Attached is the v69 patch set, which fixes this.
The new version also addresses comments from Peter [1] and Amit [2].
I also added some test cases for partitioned tables in 027_row_filter.pl.

Note that the comments from Alvaro [3] haven't been addressed yet
because the discussion is still going on; I will address those
comments soon.

[1] https://www.postgresql.org/message-id/CAHut%2BPtUiaYaihtw6_SmqbwEBXtw6ryc7F%3DVEQkK%3D7HW18dGVg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1JzKYBC5Aos9QncZ%2BJksMLmZjpCcDmBJZQ1qC74AYggNg%40mail.gmail.com
[3] https://www.postgresql.org/message-id/202201201313.zaceiqi4qb6h%40alvherre.pgsql

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Thur, Jan 20, 2022 7:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Thu, Jan 20, 2022 at 6:42 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Attach the V68 patch set which addressed the above comments and changes.
> >
> 
> Few comments and suggestions:
> ==========================
> 1.
> /*
> + * For updates, if both the new tuple and old tuple are not null, then both
> + * of them need to be checked against the row filter.
> + */
> + tmp_new_slot = new_slot;
> + slot_getallattrs(new_slot);
> + slot_getallattrs(old_slot);
> +
> 
> Isn't it better to add assert like
> Assert(map_changetype_pubaction[*action] == PUBACTION_UPDATE); before
> the above code? I have tried to change this part of the code in the
> attached top-up patch.
> 
> 2.
> + /*
> + * For updates, if both the new tuple and old tuple are not null, then both
> + * of them need to be checked against the row filter.
> + */
> + tmp_new_slot = new_slot;
> + slot_getallattrs(new_slot);
> + slot_getallattrs(old_slot);
> +
> + /*
> + * The new tuple might not have all the replica identity columns, in which
> + * case it needs to be copied over from the old tuple.
> + */
> + for (i = 0; i < desc->natts; i++)
> + {
> + Form_pg_attribute att = TupleDescAttr(desc, i);
> +
> + /*
> + * if the column in the new tuple or old tuple is null, nothing to do
> + */
> + if (tmp_new_slot->tts_isnull[i] || old_slot->tts_isnull[i])
> + continue;
> +
> + /*
> + * Unchanged toasted replica identity columns are only detoasted in the
> + * old tuple, copy this over to the new tuple.
> + */
> + if (att->attlen == -1 &&
> + VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i]) &&
> + !VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))
> + {
> + if (tmp_new_slot == new_slot)
> + {
> + tmp_new_slot = MakeSingleTupleTableSlot(desc, &TTSOpsVirtual);
> + ExecClearTuple(tmp_new_slot);
> + ExecCopySlot(tmp_new_slot, new_slot);
> + }
> +
> + tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
> + tmp_new_slot->tts_isnull[i] = old_slot->tts_isnull[i];
> + }
> + }
> 
> 
> What is the need to assign new_slot to tmp_new_slot at the beginning
> of this part of the code? Can't we do this when we found some
> attribute that needs to be copied from the old tuple?

Thanks for the comments. Changed.

> The other part which is not clear to me by looking at this code and
> comments is how do we ensure that we cover all cases where the new
> tuple doesn't have values?

I will do some research about this and respond soon.

Best regards,
Hou zj

Re: row filtering for logical replication

From
Dilip Kumar
Date:
On Thu, Jan 20, 2022 at 4:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
+ /*
+ * Unchanged toasted replica identity columns are only detoasted in the
+ * old tuple, copy this over to the new tuple.
+ */
+ if (att->attlen == -1 &&
+ VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i]) &&
+ !VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))
+ {
+ if (tmp_new_slot == new_slot)
+ {
+ tmp_new_slot = MakeSingleTupleTableSlot(desc, &TTSOpsVirtual);
+ ExecClearTuple(tmp_new_slot);
+ ExecCopySlot(tmp_new_slot, new_slot);
+ }
+
+ tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
+ tmp_new_slot->tts_isnull[i] = old_slot->tts_isnull[i];
+ }
+ }


What is the need to assign new_slot to tmp_new_slot at the beginning
of this part of the code? Can't we do this when we found some
attribute that needs to be copied from the old tuple?

The other part which is not clear to me by looking at this code and
comments is how do we ensure that we cover all cases where the new
tuple doesn't have values?

IMHO, the only case we are trying to handle is when the toasted
attribute is not modified in the new tuple.  And if we look at the
update WAL, the new tuple is written to the WAL exactly as it is
inserted into the heap page.  That means if it is external it can only
be in VARATT_IS_EXTERNAL_ONDISK format, so I don't think we need to
worry about any intermediate format that we use for in-memory tuples.
Sometimes in the reorder buffer we do use the INDIRECT format as well,
which internally can store the ON DISK format, but we don't need to
worry about that case either, because that is only true when we have
the complete toast tuple as part of the WAL and we recreate the tuple
in memory in the reorder buffer; so even if it can be ON DISK format
inside INDIRECT format, we have the complete tuple.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2022-Jan-21, houzj.fnst@fujitsu.com wrote:

> Personally, I'm a little hesitant to put the check at DDL level, because
> adding check at DDLs like ATTACH PARTITION/CREATE PARTITION OF ( [1]
> explained why we need to check these DDLs) looks a bit restrictive and
> user might also complain about that. Put the check in
> CheckCmdReplicaIdentity seems more acceptable because it is consistent
> with the existing behavior which has few complaints from users AFAIK.

I think logical replication is currently so limited that there are
very few people who can put it to real use.  So I suggest we should
not take the small number of complaints about the current behavior as
very meaningful, because it just means that not a lot of people are
using logical replication in the first place.  But once these new
functionalities are introduced, it will start to become actually
useful, and that is when users will exercise it and notice weird
behavior.

If ATTACH PARTITION or CREATE TABLE .. PARTITION OF don't let you
specify replica identity, I suspect it's because both partitioning and
logical replication were developed in parallel, and neither gave too
much thought to the other.  So these syntax corners went unnoticed.

I suspect that a better way to attack this problem is to let ALTER TABLE
... ATTACH PARTITION and CREATE TABLE .. PARTITION OF specify a replica
identity as necessary.
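
For example (purely hypothetical syntax, only to illustrate the idea;
nothing like this exists today):

CREATE TABLE child PARTITION OF parent
    FOR VALUES FROM (0) TO (250)
    REPLICA IDENTITY USING INDEX child_b_idx;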

My suggestion is to avoid painting us into a corner from which it will
be impossible to get out later.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"La espina, desde que nace, ya pincha" (Proverbio africano)



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Jan 21, 2022 at 8:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> If ATTACH PARTITION or CREATE TABLE .. PARTITION OF don't let you
> specify replica identity, I suspect it's because both partitioning and
> logical replication were developed in parallel, and neither gave too
> much thought to the other.
>

I think the reason the CREATE TABLE .. syntax doesn't have a way to
specify RI is that we need to have an index for the RI. Consider the
example below:
----
CREATE TABLE parent (a int primary key, b int not null, c varchar)
PARTITION BY RANGE(a);
CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (250);
CREATE UNIQUE INDEX b_index on child(b);
ALTER TABLE child REPLICA IDENTITY using INDEX b_index;
----

In this, the parent table's replica identity is the primary
key (default) and the child table's replica identity is the b_index.
I think if we want we can come up with some syntax to combine these
steps and allow specifying the replica identity during the second step
(Create ... Partition), but I am not sure we have a convincing reason
for this feature per se.

>
> I suspect that a better way to attack this problem is to let ALTER TABLE
> ... ATTACH PARTITION and CREATE TABLE .. PARTITION OF specify a replica
> identity as necessary.
>
> My suggestion is to avoid painting us into a corner from which it will
> be impossible to get out later.
>

Apart from the above reason, here we are just following the current
model of how update/delete behaves w.r.t. RI. Now, I think in the
future we can also think of lifting some of the restrictions related
to RI for filters if we find a good way to get column values that are
not in the WAL. We have discussed this previously in this thread and
thought it sensible to have an RI restriction for updates/deletes, as
the patch does for the first version.

I am not against inventing some new syntax for the row/column filter
patches, but there doesn't seem to be a very convincing reason for it,
and there is a good chance that we won't be able to accomplish that
for the current version.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2022-Jan-22, Amit Kapila wrote:

> CREATE TABLE parent (a int primary key, b int not null, c varchar)
> PARTITION BY RANGE(a);
> CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (250);
> CREATE UNIQUE INDEX b_index on child(b);
> ALTER TABLE child REPLICA IDENTITY using INDEX b_index;
> 
> In this, the parent table's replica identity is the primary
> key(default) and the child table's replica identity is the b_index.

Why is the partition's replica identity different from its parent's?
Does that even make sense?

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"La verdad no siempre es bonita, pero el hambre de ella sí"



Re: row filtering for logical replication

From
Peter Smith
Date:
FYI - I noticed the cfbot is reporting a failed test case [1] for the
latest v69 patch set.

[21:09:32.183] # Failed test 'check replicated inserts on subscriber'
[21:09:32.183] # at t/025_rep_changes_for_schema.pl line 202.
[21:09:32.183] # got: '21|1|2139062143'
[21:09:32.183] # expected: '21|1|21'
[21:09:32.183] # Looks like you failed 1 test of 13.
[21:09:32.183] [21:08:49] t/025_rep_changes_for_schema.pl ....

------
[1] https://cirrus-ci.com/task/6280873841524736?logs=test_world#L3970

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Sat, Jan 22, 2022 at 8:45 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2022-Jan-22, Amit Kapila wrote:
>
> > CREATE TABLE parent (a int primary key, b int not null, c varchar)
> > PARTITION BY RANGE(a);
> > CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (250);
> > CREATE UNIQUE INDEX b_index on child(b);
> > ALTER TABLE child REPLICA IDENTITY using INDEX b_index;
> >
> > In this, the parent table's replica identity is the primary
> > key(default) and the child table's replica identity is the b_index.
>
> Why is the partition's replica identity different from its parent's?
> Does that even make sense?
>

The parent's RI doesn't matter, as we always use the child's RI, so
one may decide not to have an RI for the parent. Also, when
replicating, the user might have set up a partitioned table on the
publisher side and non-partitioned tables on the subscriber side, in
which case there could also be different RI keys on the child tables.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Jan 21, 2022 at 2:56 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Thu, Jan 20, 2022 at 12:12 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
>
> (3) pgoutput_row_filter_exec_expr
> pgoutput_row_filter_exec_expr() returns false if "isnull" is true,
> otherwise (if "isnull" is false) returns the value of "ret"
> (true/false).
> So the following elog needs to be changed (Peter Smith previously
> pointed this out, but it didn't get completely changed):
>
> BEFORE:
> + elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
> + DatumGetBool(ret) ? "true" : "false",
> + isnull ? "true" : "false");
> AFTER:
> + elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
> + isnull ? "false" : DatumGetBool(ret) ? "true" : "false",
> + isnull ? "true" : "false");
>

Do you see any problem with the current one? I find it easy to
understand.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Mon, Jan 24, 2022 at 2:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > (3) pgoutput_row_filter_exec_expr
> > pgoutput_row_filter_exec_expr() returns false if "isnull" is true,
> > otherwise (if "isnull" is false) returns the value of "ret"
> > (true/false).
> > So the following elog needs to be changed (Peter Smith previously
> > pointed this out, but it didn't get completely changed):
> >
> > BEFORE:
> > + elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
> > + DatumGetBool(ret) ? "true" : "false",
> > + isnull ? "true" : "false");
> > AFTER:
> > + elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
> > + isnull ? "false" : DatumGetBool(ret) ? "true" : "false",
> > + isnull ? "true" : "false");
> >
>
> Do you see any problem with the current? I find the current one easy
> to understand.
>

Yes, I see a problem. The logging doesn't match what the function code
actually returns when "isnull" is true.
When "isnull" is true, the function always returns false, not the
value of "ret".
For the current logging code to be correct, and match the function
return value, we should be able to change:

  if (isnull)
    return false;

to:

  if (isnull)
    return ret;

But regression tests fail when that code change is made (indicating
that there are cases where "isnull" is true but "ret" is true, so the
function would return true instead of false).
So the current logging code is NOT correct, and needs to be updated as
I indicated.
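
To make the control flow concrete, here is a simplified sketch of
pgoutput_row_filter_exec_expr as described in the snippets quoted in
this thread (the expression-evaluation call is my assumption):

ret = ExecEvalExprSwitchContext(state, econtext, &isnull);

/* the elog(DEBUG3, ...) reporting the result goes here */

if (isnull)
    return false;            /* always false when isnull is true */

return DatumGetBool(ret);    /* otherwise the evaluated result */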


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Mon, Jan 24, 2022 at 8:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> FYI - I noticed the cfbot is reporting a failed test case [1] for the
> latest v69 patch set.
>
> [21:09:32.183] # Failed test 'check replicated inserts on subscriber'
> [21:09:32.183] # at t/025_rep_changes_for_schema.pl line 202.
> [21:09:32.183] # got: '21|1|2139062143'
> [21:09:32.183] # expected: '21|1|21'
> [21:09:32.183] # Looks like you failed 1 test of 13.
> [21:09:32.183] [21:08:49] t/025_rep_changes_for_schema.pl ....
>
> ------
> [1] https://cirrus-ci.com/task/6280873841524736?logs=test_world#L3970
>

2139062143 is 0x7F7F7F7F, so it looks like a value from uninitialized
memory (debug build) has been copied into the column, or something
similar involving uninitialized memory.
The problem is occurring on FreeBSD.
I tried using similar build flags as that test environment, but
couldn't reproduce the issue.
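
(The hex equivalence is easy to check in psql:

SELECT to_hex(2139062143);  -- 7f7f7f7f

and 0x7F is the byte pattern PostgreSQL's debug builds use to clobber
freed memory under CLOBBER_FREED_MEMORY.)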


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Fri, Jan 21, 2022 at 2:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 20, 2022 at 7:56 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > > >  Maybe this was meant to be "validate RF
> > > > expressions" and return, perhaps, a bitmapset of all invalid columns
> > > > referenced?
> > >
> > > Currently, we stop as soon as we find the first invalid column.
> >
> > That seems quite strange.  (And above you say "gather as much info as
> > possible", so why stop at the first one?)
> >
>
> Because that is an error case, so, there doesn't seem to be any
> benefit in proceeding further. However, we can build all the required
> information by processing all publications (aka gather all
> information) and then later give an error if that idea appeals to you
> more.
>
> > > >  (What is an invalid column in the first place?)
> > >
> > > A column that is referenced in the row filter but is not part of
> > > Replica Identity.
> >
> > I do wonder how do these invalid columns reach the table definition in
> > the first place.  Shouldn't these be detected at DDL time and prohibited
> > from getting into the definition?
> >
>
> As mentioned by Peter E [1], there are two ways to deal with this: (a)
> The current approach is that the user can set the replica identity
> freely, and we decide later based on that what we can replicate (e.g.,
> no updates). If we follow the same approach for this patch, we don't
> restrict what columns are part of the row filter, but we check what
> actions we can replicate based on the row filter. This is what is
> currently followed in the patch. (b) Add restrictions during DDL which
> is not as straightforward as it looks.

FYI - I also wanted to highlight that doing the replica identity
validation at update/delete time is not only following the "current
approach", as mentioned above, but this is also consistent with the
*documented* behaviour in PG docs (See [1] since PG v10),

<QUOTE>
If a table without a replica identity is added to a publication that
replicates UPDATE or DELETE operations then subsequent UPDATE or
DELETE operations will cause an error on the publisher.
</QUOTE>

Specifically,

It does *not* say that the RI validation error will happen when a
table is added to the publication at CREATE/ALTER PUBLICATION time.

It says that *subsequent* "UPDATE or DELETE operations will cause an error".

~~

The point is that it is one thing to decide to change something that
was never officially documented, but to change already *documented*
behaviour is much more radical and has the potential to upset some
users.

------
[1] https://www.postgresql.org/docs/devel/logical-replication-publication.

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Mon, Jan 24, 2022 at 4:53 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Jan 21, 2022 at 2:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jan 20, 2022 at 7:56 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >
> > > > >  Maybe this was meant to be "validate RF
> > > > > expressions" and return, perhaps, a bitmapset of all invalid columns
> > > > > referenced?
> > > >
> > > > Currently, we stop as soon as we find the first invalid column.
> > >
> > > That seems quite strange.  (And above you say "gather as much info as
> > > possible", so why stop at the first one?)
> > >
> >
> > Because that is an error case, so, there doesn't seem to be any
> > benefit in proceeding further. However, we can build all the required
> > information by processing all publications (aka gather all
> > information) and then later give an error if that idea appeals to you
> > more.
> >
> > > > >  (What is an invalid column in the first place?)
> > > >
> > > > A column that is referenced in the row filter but is not part of
> > > > Replica Identity.
> > >
> > > I do wonder how do these invalid columns reach the table definition in
> > > the first place.  Shouldn't these be detected at DDL time and prohibited
> > > from getting into the definition?
> > >
> >
> > As mentioned by Peter E [1], there are two ways to deal with this: (a)
> > The current approach is that the user can set the replica identity
> > freely, and we decide later based on that what we can replicate (e.g.,
> > no updates). If we follow the same approach for this patch, we don't
> > restrict what columns are part of the row filter, but we check what
> > actions we can replicate based on the row filter. This is what is
> > currently followed in the patch. (b) Add restrictions during DDL which
> > is not as straightforward as it looks.
>
> FYI - I also wanted to highlight that doing the replica identity
> validation at update/delete time is not only following the "current
> approach", as mentioned above, but this is also consistent with the
> *documented* behaviour in PG docs (See [1] since PG v10),
>
> <QUOTE>
> If a table without a replica identity is added to a publication that
> replicates UPDATE or DELETE operations then subsequent UPDATE or
> DELETE operations will cause an error on the publisher.
> </QUOTE>
>
> Specifically,
>
> It does *not* say that the RI validation error will happen when a
> table is added to the publication at CREATE/ALTER PUBLICATION time.
>
> It says that *subsequent* "UPDATE or DELETE operations will cause an error".
>
> ~~
>
> The point is that it is one thing to decide to change something that
> was never officially documented, but to change already *documented*
> behaviour is much more radical and has the potential to upset some
> users.
>
> ------

(Sorry, fixed the broken link of the previous post)

[1] https://www.postgresql.org/docs/current/logical-replication-publication.html

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Jan 24, 2022 at 10:29 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Jan 24, 2022 at 2:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > (3) pgoutput_row_filter_exec_expr
> > > pgoutput_row_filter_exec_expr() returns false if "isnull" is true,
> > > otherwise (if "isnull" is false) returns the value of "ret"
> > > (true/false).
> > > So the following elog needs to be changed (Peter Smith previously
> > > pointed this out, but it didn't get completely changed):
> > >
> > > BEFORE:
> > > + elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
> > > + DatumGetBool(ret) ? "true" : "false",
> > > + isnull ? "true" : "false");
> > > AFTER:
> > > + elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
> > > + isnull ? "false" : DatumGetBool(ret) ? "true" : "false",
> > > + isnull ? "true" : "false");
> > >
> >
> > Do you see any problem with the current? I find the current one easy
> > to understand.
> >
>
> Yes, I see a problem.
>

I tried inserting a NULL value in a column having a row filter, and
the result it shows is:

 LOG:  row filter evaluates to false (isnull: true)

This is what is expected.

>
> But regression tests fail when that code change is made (indicating
> that there are cases when "isnull" is true but the function returns
> true instead of false).
>

But that is not what I am seeing in the logs with a test case where
the row filter column has NULL values. Could you please try that and
see what is printed in the LOG?

You can change the code to make the elevel as LOG to get the results
easily. The test case I tried is as follows:
Node-1:
postgres=# create table t1(c1 int, c2 int);
CREATE TABLE
postgres=# create publication pub for table t1 WHERE (c1 > 10);
CREATE PUBLICATION

Node-2:
postgres=# create table t1(c1 int, c2 int);
CREATE TABLE
postgres=# create subscription sub connection 'dbname=postgres' publication pub;
NOTICE:  created replication slot "sub" on publisher
CREATE SUBSCRIPTION

After this, on the publisher node, I see the LOG as "LOG:  row filter
evaluates to false (isnull: true)". I have verified that in the code
as well (in slot_deform_heap_tuple) we set the value to 0 for isnull,
which matches the above observation.


-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Mon, Jan 24, 2022 at 5:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jan 24, 2022 at 10:29 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Mon, Jan 24, 2022 at 2:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > > (3) pgoutput_row_filter_exec_expr
> > > > pgoutput_row_filter_exec_expr() returns false if "isnull" is true,
> > > > otherwise (if "isnull" is false) returns the value of "ret"
> > > > (true/false).
> > > > So the following elog needs to be changed (Peter Smith previously
> > > > pointed this out, but it didn't get completely changed):
> > > >
> > > > BEFORE:
> > > > + elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
> > > > + DatumGetBool(ret) ? "true" : "false",
> > > > + isnull ? "true" : "false");
> > > > AFTER:
> > > > + elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
> > > > + isnull ? "false" : DatumGetBool(ret) ? "true" : "false",
> > > > + isnull ? "true" : "false");
> > > >
> > >
> > > Do you see any problem with the current? I find the current one easy
> > > to understand.
> > >
> >
> > Yes, I see a problem.
> >
>
> I tried by inserting NULL value in a column having row filter and the
> result it shows is:
>
>  LOG:  row filter evaluates to false (isnull: true)
>
> This is what is expected.
>
> >
> > But regression tests fail when that code change is made (indicating
> > that there are cases when "isnull" is true but the function returns
> > true instead of false).
> >
>
> But that is not what I am seeing in Logs with a test case where the
> row filter column has NULL values. Could you please try that see what
> is printed in LOG?
>
> You can change the code to make the elevel as LOG to get the results
> easily. The test case I tried is as follows:
> Node-1:
> postgres=# create table t1(c1 int, c2 int);
> CREATE TABLE
> postgres=# create publication pub for table t1 WHERE (c1 > 10);
> CREATE PUBLICATION
>
> Node-2:
> postgres=# create table t1(c1 int, c2 int);
> CREATE TABLE
> postgres=# create subscription sub connection 'dbname=postgres' publication pub;
> NOTICE:  created replication slot "sub" on publisher
> CREATE SUBSCRIPTION
>
> After this on publisher-node, I see the LOG as "LOG:  row filter
> evaluates to false (isnull: true)". I have verified that in the code
> as well (in slot_deform_heap_tuple), we set the value as 0 for isnull
> which matches above observation.
>

There are obviously multiple code paths under which a column can end up as NULL.
Doing one NULL-column test case, and finding here that
"DatumGetBool(ret)" is "false" when "isnull" is true, doesn't prove it
will be like that for ALL possible cases.
As I pointed out, the function is meant to always return false when
"isnull" is true, so if the current logging code is correct (always
logging "DatumGetBool(ret)" as the function return value), then to
match the code to the current logging, we should be able to return
"DatumGetBool(ret)" if "isnull" is true, instead of returning false as
it currently does.
But as I said, when I try that I get a test failure (make
check-world), proving that there is a case where "DatumGetBool(ret)"
is true when "isnull" is true, and thus showing that the current
logging is not correct because in that case the current log output
would show the return value is true, which won't match the actual
function return value of false.
(I also added some extra logging for this isnull==true test failure
case and found that ret==1)

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
Thanks for all the patches!

Here are my review comments for v69-0001

~~~

1. src/backend/executor/execReplication.c  CheckCmdReplicaIdentity
call to RelationBuildPublicationDesc

+ /*
+ * It is only safe to execute UPDATE/DELETE when all columns, referenced in
+ * the row filters from publications which the relation is in, are valid -
+ * i.e. when all referenced columns are part of REPLICA IDENTITY or the
+ * table does not publish UPDATES or DELETES.
+ */
+ pubdesc = RelationBuildPublicationDesc(rel);

This code is leaky because it never frees the palloc'ed memory for the pubdesc.

IMO change RelationBuildPublicationDesc to pass in the
PublicationDesc* from the call stack; then we can eliminate the palloc
and the risk of leaks.

~~~

2. src/include/utils/relcache.h - RelationBuildPublicationDesc

+struct PublicationDesc;
+extern struct PublicationDesc *RelationBuildPublicationDesc(Relation relation);

(Same as the previous comment #1). Suggest changing the function
signature to return void and passing the PublicationDesc* from the
stack instead of palloc'ing it within the function.

~~~

3. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc

+RelationBuildPublicationDesc(Relation relation)
 {
  List    *puboids;
  ListCell   *lc;
  MemoryContext oldcxt;
  Oid schemaid;
- PublicationActions *pubactions = palloc0(sizeof(PublicationActions));
+ List    *ancestors = NIL;
+ Oid relid = RelationGetRelid(relation);
+ AttrNumber invalid_rfcolnum = InvalidAttrNumber;
+ PublicationDesc *pubdesc = palloc0(sizeof(PublicationDesc));
+ PublicationActions *pubactions = &pubdesc->pubactions;
+
+ pubdesc->rf_valid_for_update = true;
+ pubdesc->rf_valid_for_delete = true;

IMO it would be better to change the "sense" of those variables.
e.g.

"rf_valid_for_update" --> "rf_invalid_for_update"
"rf_valid_for_delete" --> "rf_invalid_for_delete"

That way they have the same 'sense' as the AttrNumbers so it all reads
better to me.

Also, it means no special assignment is needed because the palloc0
will set them correctly.

~~~

4. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc

- if (relation->rd_pubactions)
+ if (relation->rd_pubdesc)
  {
- pfree(relation->rd_pubactions);
- relation->rd_pubactions = NULL;
+ pfree(relation->rd_pubdesc);
+ relation->rd_pubdesc = NULL;
  }

What is the purpose of this code? Can't it all just be removed?
e.g. Can't you Assert that relation->rd_pubdesc is NULL at this point?

(if it was not-null the function would have returned immediately from the top)

~~~

5. src/include/catalog/pg_publication.h - typedef struct PublicationDesc

+typedef struct PublicationDesc
+{
+ /*
+ * true if the columns referenced in row filters which are used for UPDATE
+ * or DELETE are part of the replica identity, or the publication actions
+ * do not include UPDATE or DELETE.
+ */
+ bool rf_valid_for_update;
+ bool rf_valid_for_delete;
+
+ AttrNumber invalid_rfcol_update;
+ AttrNumber invalid_rfcol_delete;
+
+ PublicationActions pubactions;
+} PublicationDesc;
+

I did not really see any point for the pairs of booleans and
AttrNumbers. AFAIK both of them share exactly the same validation
logic, so I think you can get by using fewer members here.

e.g. (here I also reversed the sense of the bool flag, as per my suggestion #3)

typedef struct PublicationDesc
{
 /*
 * true if the columns referenced in row filters which are used for UPDATE
 * or DELETE are part of the replica identity, or the publication actions
 * do not include UPDATE or DELETE.
 */
 bool rf_invalid_for_upd_del;
 AttrNumber invalid_rfcol_upd_del;

 PublicationActions pubactions;
} PublicationDesc;

~~~

6. src/tools/pgindent/typedefs.list

Missing the new typedef PublicationDesc

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Jan 24, 2022 at 1:19 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Jan 24, 2022 at 5:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > But that is not what I am seeing in Logs with a test case where the
> > row filter column has NULL values. Could you please try that see what
> > is printed in LOG?
> >
> > You can change the code to make the elevel as LOG to get the results
> > easily. The test case I tried is as follows:
> > Node-1:
> > postgres=# create table t1(c1 int, c2 int);
> > CREATE TABLE
> > postgres=# create publication pub for table t1 WHERE (c1 > 10);
> > CREATE PUBLICATION
> >
> > Node-2:
> > postgres=# create table t1(c1 int, c2 int);
> > CREATE TABLE
> > postgres=# create subscription sub connection 'dbname=postgres' publication pub;
> > NOTICE:  created replication slot "sub" on publisher
> > CREATE SUBSCRIPTION
> >
> > After this on publisher-node, I see the LOG as "LOG:  row filter
> > evaluates to false (isnull: true)". I have verified that in the code
> > as well (in slot_deform_heap_tuple), we set the value as 0 for isnull
> > which matches above observation.
> >
>
> There are obviously multiple code paths under which a column can end up as NULL.
> Doing one NULL-column test case, and finding here that
> "DatumGetBool(ret)" is "false" when "isnull" is true, doesn't prove it
> will be like that for ALL possible cases.
>

Sure, I just wanted to see the particular test which leads to failure
so that I or others can know (or debug) why in some cases it behaves
differently. Anyway, for others, the below test can show the results:

CREATE TABLE tab_rowfilter_1 (a int primary key, b text);
alter table tab_rowfilter_1 replica identity full ;
INSERT INTO tab_rowfilter_1 (a, b) VALUES (1600, 'test 1600');
CREATE PUBLICATION pub FOR TABLE tab_rowfilter_1 WHERE (a > 1000 AND b
<> 'filtered');

UPDATE tab_rowfilter_1 SET b = NULL WHERE a = 1600;

So, we can change this DEBUG log.

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Monday, January 24, 2022 4:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> Thanks for all the patches!
> 
> Here are my review comments for v69-0001

Thanks for the comments!

> ~~~
> 
> 1. src/backend/executor/execReplication.c  CheckCmdReplicaIdentity call to
> RelationBuildPublicationDesc
> 
> + /*
> + * It is only safe to execute UPDATE/DELETE when all columns,
> + referenced in
> + * the row filters from publications which the relation is in, are
> + valid -
> + * i.e. when all referenced columns are part of REPLICA IDENTITY or the
> + * table does not publish UPDATES or DELETES.
> + */
> + pubdesc = RelationBuildPublicationDesc(rel);
> 
> This code is leaky because never frees the palloc-ed memory for the pubdesc.
> 
> IMO change the RelationBuildPublicationDesc to pass in the
> PublicationDesc* from the call stack then can eliminate the palloc and risk of
> leaks.
> 
> ~~~
> 
> 2. src/include/utils/relcache.h - RelationBuildPublicationDesc
> 
> +struct PublicationDesc;
> +extern struct PublicationDesc *RelationBuildPublicationDesc(Relation
> +relation);
> 
> (Same as the previous comment #1). Suggest to change the function signature
> to be void and pass the PublicationDesc* from stack instead of palloc-ing it
> within the function

I agree with these changes, and Greg has posted a separate patch [1]
to change them. I think it might be better to change these after that
separate patch gets committed, because some discussions are still
going on in that thread.

[1] https://postgr.es/m/CAJcOf-d0%3DvQx1Pzbf%2BLVarywejJFS5W%2BM6uR%2B2d0oeEJ2VQ%2BEw%40mail.gmail.com

> 
> 3. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc
> 
> +RelationBuildPublicationDesc(Relation relation)
>  {
>   List    *puboids;
>   ListCell   *lc;
>   MemoryContext oldcxt;
>   Oid schemaid;
> - PublicationActions *pubactions = palloc0(sizeof(PublicationActions));
> + List    *ancestors = NIL;
> + Oid relid = RelationGetRelid(relation); AttrNumber invalid_rfcolnum =
> + InvalidAttrNumber; PublicationDesc *pubdesc =
> + palloc0(sizeof(PublicationDesc)); PublicationActions *pubactions =
> + &pubdesc->pubactions;
> +
> + pubdesc->rf_valid_for_update = true;
> + pubdesc->rf_valid_for_delete = true;
> 
> IMO it wold be better to change the "sense" of those variables.
> e.g.
> 
> "rf_valid_for_update" --> "rf_invalid_for_update"
> "rf_valid_for_delete" --> "rf_invalid_for_delete"
> 
> That way they have the same 'sense' as the AttrNumbers so it all reads better to
> me.
> 
> Also, it means no special assignment is needed because the palloc0 will set
> them correctly

Since Alvaro also had some comments about the cached things and the discussion
is still going on, I will note down this comment and change it later.

> 4. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc
> 
> - if (relation->rd_pubactions)
> + if (relation->rd_pubdesc)
>   {
> - pfree(relation->rd_pubactions);
> - relation->rd_pubactions = NULL;
> + pfree(relation->rd_pubdesc);
> + relation->rd_pubdesc = NULL;
>   }
> 
> What is the purpose of this code? Can't it all just be removed?
> e.g. Can't you Assert that relation->rd_pubdesc is NULL at this point?
> 
> (if it was not-null the function would have returned immediately from the top)

I think it might be better to change this as a separate patch.

> 5. src/include/catalog/pg_publication.h - typedef struct PublicationDesc
> 
> +typedef struct PublicationDesc
> +{
> + /*
> + * true if the columns referenced in row filters which are used for
> +UPDATE
> + * or DELETE are part of the replica identity, or the publication
> +actions
> + * do not include UPDATE or DELETE.
> + */
> + bool rf_valid_for_update;
> + bool rf_valid_for_delete;
> +
> + AttrNumber invalid_rfcol_update;
> + AttrNumber invalid_rfcol_delete;
> +
> + PublicationActions pubactions;
> +} PublicationDesc;
> +
> 
> I did not see any point really for the pairs of booleans and AttNumbers.
> AFAIK both of them shared exactly the same validation logic so I think you can
> get by using fewer members here.

The pairs of booleans are intended to fix the problem [2] reported earlier.
[2]
https://www.postgresql.org/message-id/OS0PR01MB611367BB85115707CDB2F40CFB5A9%40OS0PR01MB6113.jpnprd01.prod.outlook.com
> 
> 6. src/tools/pgindent/typedefs.list
> 
> Missing the new typedef PublicationDesc

Added.

Attached is the V70 patch set, which fixes the above comments and Greg's comments [3].

[3] https://www.postgresql.org/message-id/CAJcOf-eUnXPSDR1smg9VFktr6OY5%3D8zAsCX-rqctBdfgoEavDA%40mail.gmail.com

Best regards,
Hou zj

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
A couple more comments for the v69-0001 TAP tests.

~~~

1. src/test/subscription/t/027_row_filter.pl

+# The subscription of the ALL TABLES IN SCHEMA publication means
there should be
+# no filtering on the tablesync COPY, so all expect all 5 will be present.
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(x)
FROM schema_rf_x.tab_rf_x");
+is($result, qq(5), 'check initial data copy from table tab_rf_x
should not be filtered');
+
+# Similarly, normal filtering after the initial phase will also have
not effect.
+# Expected:
+#     tab_rf_x                       :  5 initial rows + 2 new rows = 7 rows
+#     tab_rf_partition               :  1 initial row  + 1 new row  = 2 rows
+$node_publisher->safe_psql('postgres', "INSERT INTO
schema_rf_x.tab_rf_x (x) VALUES (-99), (99)");
+$node_publisher->safe_psql('postgres', "INSERT INTO
schema_rf_x.tab_rf_partitioned (x) VALUES (5), (25)");
+$node_publisher->wait_for_catchup($appname);
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(x)
FROM schema_rf_x.tab_rf_x");
+is($result, qq(7), 'check table tab_rf_x should not be filtered');
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM
public.tab_rf_partition");
+is($result, qq(20
+25), 'check table tab_rf_partition should be filtered');

That comment ("Similarly, normal filtering after the initial phase
will also have not effect.") seems no good:
- it is too vague for the tab_rf_x tablesync
- it seems completely wrong for the tab_rf_partition table (because
that filter is working fine)

I'm not sure exactly what the comment should say, but possibly
something like this (??):

BEFORE:
Similarly, normal filtering after the initial phase will also have not effect.
AFTER:
Similarly, the table filter for tab_rf_x (after the initial phase) has
no effect when combined with the ALL TABLES IN SCHEMA. Meanwhile, the
filter for the tab_rf_partition does work because that partition
belongs to a different schema (and publish_via_partition_root =
false).

~~~

2. src/test/subscription/t/027_row_filter.pl

Here is a 2nd place with the same broken comment:

+# The subscription of the FOR ALL TABLES publication means there should be no
+# filtering on the tablesync COPY, so all expect all 5 will be present.
+my $result = $node_subscriber->safe_psql('postgres', "SELECT count(x)
FROM tab_rf_x");
+is($result, qq(5), 'check initial data copy from table tab_rf_x
should not be filtered');
+
+# Similarly, normal filtering after the initial phase will also have
not effect.
+# Expected: 5 initial rows + 2 new rows = 7 rows
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_rf_x (x)
VALUES (-99), (99)");
+$node_publisher->wait_for_catchup($appname);
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(x)
FROM tab_rf_x");
+is($result, qq(7), 'check table tab_rf_x should not be filtered');

Here I also think the comment maybe should just say something like:

BEFORE:
Similarly, normal filtering after the initial phase will also have not effect.
AFTER:
Similarly, the table filter for tab_rf_x (after the initial phase) has
no effect when combined with the ALL TABLES IN SCHEMA.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Monday, January 24, 2022 5:36 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> FYI - I noticed the cfbot is reporting a failed test case [1] for the latest v69 patch
> set.
> 
> [21:09:32.183] # Failed test 'check replicated inserts on subscriber'
> [21:09:32.183] # at t/025_rep_changes_for_schema.pl line 202.
> [21:09:32.183] # got: '21|1|2139062143'
> [21:09:32.183] # expected: '21|1|21'
> [21:09:32.183] # Looks like you failed 1 test of 13.
> [21:09:32.183] [21:08:49] t/025_rep_changes_for_schema.pl ....

The test passed for the latest v70 patch set. I will keep an eye on the cfbot,
and if the error happens again in the future, I will continue to investigate
this.

Best regards,
Hou zj

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, January 25, 2022 1:55 PM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> A couple more comments for the v69-0001 TAP tests.
> 

Thanks for the comments!

> 
> 1. src/test/subscription/t/027_row_filter.pl
> 
> +# The subscription of the ALL TABLES IN SCHEMA publication means
> there should be
> +# no filtering on the tablesync COPY, so all expect all 5 will be present.
> +$result = $node_subscriber->safe_psql('postgres', "SELECT count(x)
> FROM schema_rf_x.tab_rf_x");
> +is($result, qq(5), 'check initial data copy from table tab_rf_x
> should not be filtered');
> +
> +# Similarly, normal filtering after the initial phase will also have
> not effect.
> +# Expected:
> +#     tab_rf_x                       :  5 initial rows + 2 new rows = 7 rows
> +#     tab_rf_partition               :  1 initial row  + 1 new row  = 2 rows
> +$node_publisher->safe_psql('postgres', "INSERT INTO
> schema_rf_x.tab_rf_x (x) VALUES (-99), (99)");
> +$node_publisher->safe_psql('postgres', "INSERT INTO
> schema_rf_x.tab_rf_partitioned (x) VALUES (5), (25)");
> +$node_publisher->wait_for_catchup($appname);
> +$result = $node_subscriber->safe_psql('postgres', "SELECT count(x)
> FROM schema_rf_x.tab_rf_x");
> +is($result, qq(7), 'check table tab_rf_x should not be filtered');
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM
> public.tab_rf_partition");
> +is($result, qq(20
> +25), 'check table tab_rf_partition should be filtered');
> 
> That comment ("Similarly, normal filtering after the initial phase will also have
> not effect.") seems no good:
> - it is too vague for the tab_rf_x tablesync
> - it seems completely wrong for the tab_rf_partition table (because that filter is
> working fine)
> 
> I'm not sure exactly what the comment should say, but possibly something like
> this (??):
> 
> BEFORE:
> Similarly, normal filtering after the initial phase will also have not effect.
> AFTER:
> Similarly, the table filter for tab_rf_x (after the initial phase) has no effect when
> combined with the ALL TABLES IN SCHEMA. Meanwhile, the filter for the
> tab_rf_partition does work because that partition belongs to a different
> schema (and publish_via_partition_root = false).
> 

Thanks, I think your change looks good. Changed.

> 
> 2. src/test/subscription/t/027_row_filter.pl
> 
> Here is a 2nd place with the same broken comment:
> 
> +# The subscription of the FOR ALL TABLES publication means there should
> +be no # filtering on the tablesync COPY, so all expect all 5 will be present.
> +my $result = $node_subscriber->safe_psql('postgres', "SELECT count(x)
> FROM tab_rf_x");
> +is($result, qq(5), 'check initial data copy from table tab_rf_x
> should not be filtered');
> +
> +# Similarly, normal filtering after the initial phase will also have
> not effect.
> +# Expected: 5 initial rows + 2 new rows = 7 rows
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab_rf_x (x)
> VALUES (-99), (99)");
> +$node_publisher->wait_for_catchup($appname);
> +$result = $node_subscriber->safe_psql('postgres', "SELECT count(x)
> FROM tab_rf_x");
> +is($result, qq(7), 'check table tab_rf_x should not be filtered');
> 
> Here I also think the comment maybe should just say something like:
> 
> BEFORE:
> Similarly, normal filtering after the initial phase will also have not effect.
> AFTER:
> Similarly, the table filter for tab_rf_x (after the initial phase) has no effect when
> combined with the ALL TABLES IN SCHEMA.

Changed.

Attached is the V71 patch set, which addresses the above comments.
The patch also includes these changes:
- Changed the function RelationBuildPublicationDesc's signature to be void and
  pass the PublicationDesc* from stack instead of palloc-ing it. [1]
- Removed the Push/Pop ActiveSnapshot related code. IIRC, those calls are
  needed when we execute functions that run SQL (via the SPI functions) to
  access the database. I think we don't need the ActiveSnapshot for now, as we
  only support built-in immutable functions in the row filter, which should
  only use the argument values passed to them (see the sketch below).
- Adjusted some comments in pgoutput.c.
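
As a hedged sketch of the row filter function restriction mentioned above
(hypothetical table, column, and function names):

-- accepted: a built-in immutable function, computed purely from its arguments
CREATE PUBLICATION pub_ok FOR TABLE t WHERE (abs(a) < 10);
-- rejected: random() is built-in but not immutable
CREATE PUBLICATION pub_bad1 FOR TABLE t WHERE (random() < 0.5);
-- rejected: user-defined functions are not allowed in a row filter
CREATE PUBLICATION pub_bad2 FOR TABLE t WHERE (my_check(a));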

[1] https://www.postgresql.org/message-id/CAHut%2BPsRTtXoYQiRqxwvyrcmkDMm-kR4GkvD9-nAqNrk4A3aCQ%40mail.gmail.com

Best regards,
Hou zj


Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Monday, January 24, 2022 4:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> Thanks for all the patches!
> 
> Here are my review comments for v69-0001
> 
> ~~~
> 
> 1. src/backend/executor/execReplication.c  CheckCmdReplicaIdentity call to
> RelationBuildPublicationDesc
> 
> + /*
> + * It is only safe to execute UPDATE/DELETE when all columns,
> + referenced in
> + * the row filters from publications which the relation is in, are
> + valid -
> + * i.e. when all referenced columns are part of REPLICA IDENTITY or the
> + * table does not publish UPDATES or DELETES.
> + */
> + pubdesc = RelationBuildPublicationDesc(rel);
> 
> This code is leaky because it never frees the palloc-ed memory for the pubdesc.
> 
> IMO change the RelationBuildPublicationDesc to pass in the
> PublicationDesc* from the call stack then can eliminate the palloc and risk of
> leaks.
> 
> ~~~
> 
> 2. src/include/utils/relcache.h - RelationBuildPublicationDesc
> 
> +struct PublicationDesc;
> +extern struct PublicationDesc *RelationBuildPublicationDesc(Relation
> +relation);
> 
> (Same as the previous comment #1). Suggest to change the function signature
> to be void and pass the PublicationDesc* from stack instead of palloc-ing it
> within the function

Changed in V71.

> 
> 3. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc
> 
> +RelationBuildPublicationDesc(Relation relation)
>  {
>   List    *puboids;
>   ListCell   *lc;
>   MemoryContext oldcxt;
>   Oid schemaid;
> - PublicationActions *pubactions = palloc0(sizeof(PublicationActions));
> + List    *ancestors = NIL;
> + Oid relid = RelationGetRelid(relation); AttrNumber invalid_rfcolnum =
> + InvalidAttrNumber; PublicationDesc *pubdesc =
> + palloc0(sizeof(PublicationDesc)); PublicationActions *pubactions =
> + &pubdesc->pubactions;
> +
> + pubdesc->rf_valid_for_update = true;
> + pubdesc->rf_valid_for_delete = true;
> 
> > > IMO it would be better to change the "sense" of those variables.
> e.g.
> 
> "rf_valid_for_update" --> "rf_invalid_for_update"
> "rf_valid_for_delete" --> "rf_invalid_for_delete"
> 
> That way they have the same 'sense' as the AttrNumbers so it all reads better to
> me.
> 
> Also, it means no special assignment is needed because the palloc0 will set
> them correctly

Thinking about it again, I am not sure it's better to have an invalid_... flag.
It seems more natural to have a valid_... flag.

Best regards,
Hou zj

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Wednesday, January 26, 2022 9:43 AM I wrote:
> On Tuesday, January 25, 2022 1:55 PM Peter Smith <smithpb2250@gmail.com>
> wrote:
> 
> Changed.
> 
> Attached is the V71 patch set, which addresses the above comments.
> The patch also includes the changes:
> - Changed the function RelationBuildPublicationDesc's signature to be void
> and
>   pass the PublicationDesc* from stack instead of palloc-ing it. [1]
> - Removed the Push/Pop ActiveSnapshot related code. IIRC, these functions
> are
>   needed when we execute functions which will execute SQL(via SPI functions)
> to
>   access the database. I think we don't need the ActiveSnapshot for now as we
>   only support built-in immutable in the row filter which should only use the
>   argument values passed to the function.
> - Adjusted some comments in pgoutput.c.

There was a mistake in the posted patch: it didn't initialize the parameter in
RelationBuildPublicationDesc, sorry for that. Attached is the corrected patch.

Best regards,
Hou zj

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Jan 26, 2022 at 8:37 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Monday, January 24, 2022 4:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> >
> > 3. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc
> >
> > +RelationBuildPublicationDesc(Relation relation)
> >  {
> >   List    *puboids;
> >   ListCell   *lc;
> >   MemoryContext oldcxt;
> >   Oid schemaid;
> > - PublicationActions *pubactions = palloc0(sizeof(PublicationActions));
> > + List    *ancestors = NIL;
> > + Oid relid = RelationGetRelid(relation); AttrNumber invalid_rfcolnum =
> > + InvalidAttrNumber; PublicationDesc *pubdesc =
> > + palloc0(sizeof(PublicationDesc)); PublicationActions *pubactions =
> > + &pubdesc->pubactions;
> > +
> > + pubdesc->rf_valid_for_update = true;
> > + pubdesc->rf_valid_for_delete = true;
> >
> > > IMO it would be better to change the "sense" of those variables.
> > e.g.
> >
> > "rf_valid_for_update" --> "rf_invalid_for_update"
> > "rf_valid_for_delete" --> "rf_invalid_for_delete"
> >
> > That way they have the same 'sense' as the AttrNumbers so it all reads better to
> > me.
> >
> > Also, it means no special assignment is needed because the palloc0 will set
> > them correctly
>
> Thinking about it again, I am not sure it's better to have an invalid_... flag.
> It seems more natural to have a valid_... flag.
>

Can't we do without these valid_ flags? AFAICS, if we check for
"invalid_" attributes, it should serve our purpose because those can
have some attribute number only when the row filter contains some
column that is not part of RI. A few possible optimizations in
RelationBuildPublicationDesc:

a. It calls contain_invalid_rfcolumn with pubid and then does cache
lookup to again find a publication which its only caller has access
to, so can't we pass the same?
b. In RelationBuildPublicationDesc(), we call
GetRelationPublications() to get the list of publications and then
process those publications. I think if none of the publications has
row filter and the relation has replica identity then we don't need to
build the descriptor at all. If we do this optimization inside
RelationBuildPublicationDesc, we may want to rename function as
CheckAndBuildRelationPublicationDesc or something like that?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Jan 25, 2022 at 2:18 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Monday, January 24, 2022 4:38 PM Peter Smith <smithpb2250@gmail.com>
> >
...
> > 5. src/include/catalog/pg_publication.h - typedef struct PublicationDesc
> >
> > +typedef struct PublicationDesc
> > +{
> > + /*
> > + * true if the columns referenced in row filters which are used for
> > +UPDATE
> > + * or DELETE are part of the replica identity, or the publication
> > +actions
> > + * do not include UPDATE or DELETE.
> > + */
> > + bool rf_valid_for_update;
> > + bool rf_valid_for_delete;
> > +
> > + AttrNumber invalid_rfcol_update;
> > + AttrNumber invalid_rfcol_delete;
> > +
> > + PublicationActions pubactions;
> > +} PublicationDesc;
> > +
> >
> > I did not see any point really for the pairs of booleans and AttNumbers.
> > AFAIK both of them shared exactly the same validation logic so I think you can
> > get by using fewer members here.
>
> the pairs of booleans are intended to fix the problem[2] reported earlier.
> [2]
https://www.postgresql.org/message-id/OS0PR01MB611367BB85115707CDB2F40CFB5A9%40OS0PR01MB6113.jpnprd01.prod.outlook.com
> >

OK. Thanks for the info.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Wed, Jan 26, 2022 at 2:08 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> There was a mistake in the posted patch: it didn't initialize the parameter in
> RelationBuildPublicationDesc, sorry for that. Attached is the corrected patch.
>

A few comments for the v71-0001 patch:

doc/src/sgml/catalogs.sgml

(1)

+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>prqual</structfield> <type>pg_node_tree</type>
+      </para>
+      <para>Expression tree (in <function>nodeToString()</function>
+      representation) for the relation's qualifying condition. Null if
+      there is no qualifying condition.</para></entry>
+     </row>

"qualifying condition" sounds a bit vague here.
Wouldn't it be better to say "publication qualifying condition"?


src/backend/commands/publicationcmds.c

(2) check_simple_rowfilter_expr_walker

In the function header:
(i) "etc" should be "etc."
(ii)
Is

+ * - (Var Op Const) Bool (Var Op Const)

   meant to be:

+ * - (Var Op Const) Logical-Op (Var Op Const)

?

It's not clear what "Bool" means here.

(3) check_simple_rowfilter_expr_walker
We should say "Built-in functions" instead of "System-functions":

+   * User-defined functions are not allowed. System-functions that are

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Wed, Jan 26, 2022 at 2:08 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> There was a mistake in the posted patch: it didn't initialize the parameter in
> RelationBuildPublicationDesc, sorry for that. Attached is the corrected patch.
>

I have some additional doc update suggestions for the v71-0001 patch:


(1) Patch commit comment

BEFORE:
row filter evaluates to NULL, it returns false. The WHERE clause only
AFTER:
row filter evaluates to NULL, it is regarded as "false". The WHERE clause only


doc/src/sgml/catalogs.sgml

(2) ALTER PUBLICATION

BEFORE:
+      <replaceable class="parameter">expression</replaceable> returns
false or null will
AFTER:
+      <replaceable class="parameter">expression</replaceable>
evaluates to false or null will


doc/src/sgml/ref/alter_subscription.sgml

(3) ALTER SUBSCRIPTION

BEFORE:
+          filter <literal>WHERE</literal> clause had been modified.
AFTER:
+          filter <literal>WHERE</literal> clause has since been modified.


doc/src/sgml/ref/create_publication.sgml

(4) CREATE PUBLICATION

BEFORE:
+      which the <replaceable class="parameter">expression</replaceable> returns
+      false or null will not be published. Note that parentheses are required
AFTER:
+      which the <replaceable
class="parameter">expression</replaceable> evaluates
+      to false or null will not be published. Note that parentheses
are required


doc/src/sgml/ref/create_subscription.sgml

(5) CREATE SUBSCRIPTION

BEFORE:
+   returns false or null will not be published. If the subscription has several
AFTER:
+   evaluates to false or null will not be published. If the
subscription has several
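
Regarding (1), a small hedged example of the behaviour being documented
(hypothetical table and filter):

CREATE PUBLICATION p FOR TABLE t WHERE (a > 10);
-- For a row with a = NULL, (a > 10) evaluates to NULL, which is regarded as
-- "false", so the row is not replicated.
INSERT INTO t (a) VALUES (NULL);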


Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Jan 27, 2022 at 9:40 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Jan 26, 2022 at 2:08 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > There was a mistake in the posted patch: it didn't initialize the parameter in
> > RelationBuildPublicationDesc, sorry for that. Attached is the corrected patch.
> >
>
> A few comments for the v71-0001 patch:
...
> (2) check_simple_rowfilter_expr_walker
>
> In the function header:
> (i) "etc" should be "etc."
> (ii)
> Is
>
> + * - (Var Op Const) Bool (Var Op Const)
>
>    meant to be:
>
> + * - (Var Op Const) Logical-Op (Var Op Const)
>
> ?
>
> It's not clear what "Bool" means here.

The comment is only intended as a generic example of the kinds of
acceptable expression format.

The names used in the comment are roughly equivalent to the Node* tag names.

This particular example is for an expression with AND/OR/NOT, which is
handled by a BoolExpr.

There is no such animal as LogicalOp, so rather than changing it as you
suggested, I feel that if this comment is going to change it would be
better to change it to "boolop" (because the BoolExpr struct has a
boolop member). e.g.

BEFORE
+ * - (Var Op Const) Bool (Var Op Const)
AFTER
+ * - (Var Op Const) boolop (Var Op Const)
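
In SQL terms, a hedged sketch of the expression shapes the comment is
describing (hypothetical table and columns):

-- (Var Op Const)
CREATE PUBLICATION p1 FOR TABLE t WHERE (a > 5);
-- (Var Op Const) boolop (Var Op Const)
CREATE PUBLICATION p2 FOR TABLE t WHERE (a > 5 AND b <> 'x');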

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:

On Thu, Jan 27, 2022 at 4:59 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Jan 27, 2022 at 9:40 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Wed, Jan 26, 2022 at 2:08 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > There was a mistake in the posted patch: it didn't initialize the parameter in
> > > RelationBuildPublicationDesc, sorry for that. Attached is the corrected patch.
> > >
> >
> > A few comments for the v71-0001 patch:
> ...
> > (2) check_simple_rowfilter_expr_walker
> >
> > In the function header:
> > (i) "etc" should be "etc."
> > (ii)
> > Is
> >
> > + * - (Var Op Const) Bool (Var Op Const)
> >
> >    meant to be:
> >
> > + * - (Var Op Const) Logical-Op (Var Op Const)
> >
> > ?
> >
> > It's not clear what "Bool" means here.
>
> The comment is only intended as a generic example of the kinds of
> acceptable expression format.
>
> The names used in the comment are roughly equivalent to the Node* tag names.
>
> This particular example is for an expression with AND/OR/NOT, which is
> handled by a BoolExpr.
>
> There is no such animal as LogicalOp, so rather than changing it as you
> suggested, I feel that if this comment is going to change it would be
> better to change it to "boolop" (because the BoolExpr struct has a
> boolop member). e.g.
>
> BEFORE
> + * - (Var Op Const) Bool (Var Op Const)
> AFTER
> + * - (Var Op Const) boolop (Var Op Const)
>

My use of "LogicalOp" was just indicating that the use of "Bool" in that line was probably meant to mean "Logical Operator", and these are documented in "9.1 Logical Operators" here: https://www.postgresql.org/docs/14/functions-logical.html
(PostgreSQL docs don't refer to AND/OR etc. as boolean operators)

Perhaps, to make it clear, the change for the example compound expression could simply be:

+ * - (Var Op Const) AND/OR (Var Op Const)
 
or at least say something like "    - where boolop is AND/OR".

Regards,
Greg Nancarrow
Fujitsu Australia

Re: row filtering for logical replication

From
Peter Smith
Date:
Here are some review comments for v71-0001

~~~

1. Commit Message - database

"...that don't satisfy this WHERE clause will be filtered out. This allows a
database or set of tables to be partially replicated. The row filter is
per table. A new row filter can be added simply by specifying a WHERE..."

I don't know what extra information is conveyed by saying "a
database". Isn't it sufficient to just say "This allows a set of
tables to be partially replicated." ?

~~~

2. Commit message - OR'ed

The row filters are applied before publishing the changes. If the
subscription has several publications in which the same table has been
published with different filters, those expressions get OR'ed together so
that rows satisfying any of the expressions will be replicated.

Shouldn't that say:
"with different filters," --> "with different filters (for the same
publish operation),"
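
For example, a hedged sketch of the OR'ed behaviour (hypothetical names):

CREATE PUBLICATION p1 FOR TABLE t WHERE (a < 10);
CREATE PUBLICATION p2 FOR TABLE t WHERE (a > 100);
-- A subscription to both publications replicates exactly the rows of t that
-- satisfy (a < 10) OR (a > 100).
CREATE SUBSCRIPTION s CONNECTION '...' PUBLICATION p1, p2;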

~~~

3. Commit message - typo

This means all the other filters become redundant if (a) one of the
publications have no filter at all, (b) one of the publications was
created using FOR ALL TABLES, (c) one of the publications was created
using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema.

Typo:
"have no filter" --> "has no filter"

~~~

4. Commit message - psql \d+

"Psql commands \dRp+ and \d+ will display any row filters."

Actually, just "\d" (without +) will also display row filters. You do
not need to say "\d+"

~~~

5. src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

+ RelationBuildPublicationDesc(rel, &pubdesc);
+ if (!pubdesc.rf_valid_for_update && cmd == CMD_UPDATE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot update table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("Column \"%s\" used in the publication WHERE expression is
not part of the replica identity.",
+    get_attname(RelationGetRelid(rel),
+    pubdesc.invalid_rfcol_update,
+    false))));
+ else if (!pubdesc.rf_valid_for_delete && cmd == CMD_DELETE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot delete from table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("Column \"%s\" used in the publication WHERE expression is
not part of the replica identity.",
+    get_attname(RelationGetRelid(rel),
+    pubdesc.invalid_rfcol_delete,
+    false))));


IMO those conditions should be reversed because (a) it's more optimal
to test the other way around, and (b) for consistency with other code
in this function.

BEFORE
+ if (!pubdesc.rf_valid_for_update && cmd == CMD_UPDATE)
...
+ else if (!pubdesc.rf_valid_for_delete && cmd == CMD_DELETE)
AFTER
+ if (cmd == CMD_UPDATE && !pubdesc.rf_valid_for_update)
...
+ else if (cmd == CMD_DELETE && !pubdesc.rf_valid_for_delete)
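
For context, a hedged sketch of what makes these errors fire (hypothetical
names):

CREATE TABLE t (pk int PRIMARY KEY, val int);
CREATE PUBLICATION p FOR TABLE t WHERE (val > 0);
INSERT INTO t VALUES (1, 1);       -- unaffected: the check applies only to
                                   -- UPDATE/DELETE
UPDATE t SET val = 2 WHERE pk = 1; -- should fail: "val" is not part of the
                                   -- replica identity (here, the primary key)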

~~~

6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter

+ /*
+ * Unchanged toasted replica identity columns are only logged in the
+ * old tuple, copy this over to the new tuple. The changed (or WAL
+ * Logged) toast values are always assembled in memory and set as
+ * VARTAG_INDIRECT. See ReorderBufferToastReplace.
+ */

Something seems not quite right with the comma in that first sentence.
Maybe a period is better?

BEFORE
Unchanged toasted replica identity columns are only logged in the old
tuple, copy this over to the new tuple.
AFTER
Unchanged toasted replica identity columns are only logged in the old
tuple. Copy this over to the new tuple.

~~~

7. src/test/subscription/t/028_row_filter.pl - COPYRIGHT

This TAP file should have a copyright comment that is consistent with
all the other TAP files.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Wednesday, January 26, 2022 6:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Wed, Jan 26, 2022 at 8:37 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Monday, January 24, 2022 4:38 PM Peter Smith
> <smithpb2250@gmail.com> wrote:
> > >
> > >
> > > 3. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc
> > >
> > > +RelationBuildPublicationDesc(Relation relation)
> > >  {
> > >   List    *puboids;
> > >   ListCell   *lc;
> > >   MemoryContext oldcxt;
> > >   Oid schemaid;
> > > - PublicationActions *pubactions = palloc0(sizeof(PublicationActions));
> > > + List    *ancestors = NIL;
> > > + Oid relid = RelationGetRelid(relation); AttrNumber invalid_rfcolnum =
> > > + InvalidAttrNumber; PublicationDesc *pubdesc =
> > > + palloc0(sizeof(PublicationDesc)); PublicationActions *pubactions =
> > > + &pubdesc->pubactions;
> > > +
> > > + pubdesc->rf_valid_for_update = true;
> > > + pubdesc->rf_valid_for_delete = true;
> > >
> > > IMO it would be better to change the "sense" of those variables.
> > > e.g.
> > >
> > > "rf_valid_for_update" --> "rf_invalid_for_update"
> > > "rf_valid_for_delete" --> "rf_invalid_for_delete"
> > >
> > > That way they have the same 'sense' as the AttrNumbers so it all reads better
> to
> > > me.
> > >
> > > Also, it means no special assignment is needed because the palloc0 will set
> > > them correctly
> >
> > Thinking about it again, I am not sure it's better to have an invalid_... flag.
> > It seems more natural to have a valid_... flag.
> >

Thanks for the comments !

> Can't we do without these valid_ flags? AFAICS, if we check for
> "invalid_" attributes, it should serve our purpose because those can
> have some attribute number only when the row filter contains some
> column that is not part of RI. A few possible optimizations in
> RelationBuildPublicationDesc:

I slightly refactored the logic here.

> a. It calls contain_invalid_rfcolumn with pubid and then does cache
> lookup to again find a publication which its only caller has access
> to, so can't we pass the same?

Adjusted the code here.

> b. In RelationBuildPublicationDesc(), we call
> GetRelationPublications() to get the list of publications and then
> process those publications. I think if none of the publications has
> row filter and the relation has replica identity then we don't need to
> build the descriptor at all. If we do this optimization inside
> RelationBuildPublicationDesc, we may want to rename function as
> CheckAndBuildRelationPublicationDesc or something like that?

After thinking more about this and considering Alvaro's comments, I made some
changes to the RelationBuildPublicationDesc function to try to make it more
natural.

- Make the function always collect the complete information instead of
  returning immediately when it finds an invalid row filter.

  The reason for this change is that some (third-party) extensions might only
  care about the cached publication actions; this approach makes sure they can
  still get the complete publication actions as usual. Besides, this is also
  consistent with the other existing cache management functions (like
  RelationGetIndexAttrBitmap ...) which always build the complete information
  even if the user only wants part of it.

- Only cache the rf_valid_for_[update|delete] flags in PublicationDesc
  instead of the invalid row filter column.

  It seems a bit unnatural to me to store an invalid thing in the relcache.
  Note that the patch no longer reports the column number in the error
  message. If we later decide that the exact column number or publication is
  useful, I think it would be better to add a separate simple function
  (get_invalid_...) to report the exact column or publication instead of
  reusing the cache management function.

Also address Peter's comments[1] and Greg's comments[2] [3]

[1] https://www.postgresql.org/message-id/CAHut%2BPsG1G80AoSYka7m1x05vHjKZAzKeVyK4b6CAm2-sTkadg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAJcOf-c7XrtsWSGppb96-eQxPbtg%2BAfssAtTXNYbT8QuhdyOYA%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAJcOf-f0kc%2B4xGEgkvqNLkbJxMf8Ff0E9gTO2biHDoSJnxyziA%40mail.gmail.com

Attached is the V72 patch set with the above changes.

Best regards,
Hou zj

Attachments

Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Fri, Jan 28, 2022 at 2:26 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Attached is the V72 patch set with the above changes.
>

Thanks for updating the patch set.
One thing I noticed, in the patch commit comment it says:

    Psql commands \dRp+ and \d will display any row filters.

However, "\d" by itself doesn't show any row filter information, so I
think it should say:

    Psql commands "\dRp+" and "\d <table-name>" will display any row filters.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
I just pushed a change to tab-complete because of a comment in the
column-list patch series.  I checked and your v72-0002 does not
conflict, but it doesn't fully work either; AFAICT you'll have to change
it so that the WHERE clause appears in the COMPLETE_WITH(",") line I
just added.  As far as I tested it, with that change the completion
works fine.


Unrelated to these two patches:

Frankly I would prefer that these completions offer a ";" in addition to
the "," and "WHERE".  But we have no precedent for doing that (offering
to end the command) anywhere in the completion rules, so I think it
would be a larger change that would merit more discussion.

And while we're talking of larger changes, I would love it if other
commands such as DROP TABLE offered a "," completion after a table name,
so that a command can be tab-completed to drop multiple tables.  (Same
with other commands that process multiple comma-separated objects, of
course.)

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"On the other flipper, one wrong move and we're Fatal Exceptions"
(T.U.X.: Term Unit X  - http://www.thelinuxreview.com/TUX/)



Re: row filtering for logical replication

From
Andres Freund
Date:
Hi,

Are there any recent performance evaluations of the overhead of row filters? I
think it'd be good to get some numbers comparing:

1) $workload with master
2) $workload with patch, but no row filters
3) $workload with patch, row filter matching everything
4) $workload with patch, row filter matching few rows

For workload I think it'd be worth testing:
a) bulk COPY/INSERT into one table
b) Many transactions doing small modifications to one table
c) Many transactions targeting many different tables
d) Interspersed DDL + small changes to a table
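
For instance, workload (a) combined with case (4) might be sketched like this
(hypothetical table t(a int) and filter):

CREATE PUBLICATION p FOR TABLE t WHERE (a % 100 = 0);  -- matches few rows
INSERT INTO t SELECT generate_series(1, 1000000);      -- bulk load on the publisher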


> +/*
> + * Initialize for row filter expression execution.
> + */
> +static ExprState *
> +pgoutput_row_filter_init_expr(Node *rfnode)
> +{
> +    ExprState  *exprstate;
> +    Expr       *expr;
> +
> +    /*
> +     * This is the same code as ExecPrepareExpr() but that is not used because
> +     * we want to cache the expression. There should probably be another
> +     * function in the executor to handle the execution outside a normal Plan
> +     * tree context.
> +     */
> +    expr = expression_planner((Expr *) rfnode);
> +    exprstate = ExecInitExpr(expr, NULL);
> +
> +    return exprstate;
> +}

In what memory context does this run? Are we taking care to deal with leaks?
I'm pretty sure the planner relies on cleanup via memory contexts.


> +    memset(entry->exprstate, 0, sizeof(entry->exprstate));
> +
> +    schemaId = get_rel_namespace(entry->publish_as_relid);
> +    schemaPubids = GetSchemaPublications(schemaId);

Isn't this stuff that we've already queried before? If we re-fetch a lot of
information it's not clear to me that it's actually a good idea to defer
building the row filter.


> +    am_partition = get_rel_relispartition(entry->publish_as_relid);

All this stuff likely can cause some memory "leakage" if you run it in a
long-lived memory context.


> +    /*
> +     * Find if there are any row filters for this relation. If there are,
> +     * then prepare the necessary ExprState and cache it in
> +     * entry->exprstate. To build an expression state, we need to ensure
> +     * the following:
> +     *
> +     * All publication-table mappings must be checked.
> +     *
> +     * If the relation is a partition and pubviaroot is true, use the row
> +     * filter of the topmost partitioned table instead of the row filter of
> +     * its own partition.
> +     *
> +     * Multiple publications might have multiple row filters for this
> +     * relation. Since row filter usage depends on the DML operation, there
> +     * are multiple lists (one for each operation) to which row filters
> +     * will be appended.
> +     *
> +     * FOR ALL TABLES implies "don't use row filter expression" so it takes
> +     * precedence.
> +     *
> +     * ALL TABLES IN SCHEMA implies "don't use row filter expression" if
> +     * the schema is the same as the table schema.
> +     */
> +    foreach(lc, data->publications)
> +    {
> +        Publication *pub = lfirst(lc);
> +        HeapTuple    rftuple = NULL;
> +        Datum        rfdatum = 0;
> +        bool        pub_no_filter = false;
> +
> +        if (pub->alltables)
> +        {
> +            /*
> +             * If the publication is FOR ALL TABLES then it is treated the
> +             * same as if this table has no row filters (even if for other
> +             * publications it does).
> +             */
> +            pub_no_filter = true;
> +        }
> +        else if (list_member_oid(schemaPubids, pub->oid))
> +        {
> +            /*
> +             * If the publication is FOR ALL TABLES IN SCHEMA and it overlaps
> +             * with the current relation in the same schema then this is also
> +             * treated same as if this table has no row filters (even if for
> +             * other publications it does).
> +             */
> +            pub_no_filter = true;

Isn't this basically O(schemas * publications)?




> +    if (has_filter)
> +    {
> +        /* Create or reset the memory context for row filters */
> +        if (entry->cache_expr_cxt == NULL)
> +            entry->cache_expr_cxt = AllocSetContextCreate(CacheMemoryContext,
> +                                                          "Row filter expressions",
> +                                                          ALLOCSET_DEFAULT_SIZES);
> +        else
> +            MemoryContextReset(entry->cache_expr_cxt);

I see this started before this patch, but I don't think it's a great idea that
pgoutput does a bunch of stuff in CacheMemoryContext. That makes it
unnecessarily hard to debug leaks.

Seems like all this should live somewhere below ctx->context, allocated in
pgoutput_startup()?

Consider what happens in a long-lived replication connection, where
occasionally there's a transient error causing streaming to stop. At that
point you'll just lose all knowledge of entry->cache_expr_cxt, no?


> +
> +/* Initialize the slots for storing new and old tuples */
> +static void
> +init_tuple_slot(Relation relation, RelationSyncEntry *entry)
> +{
> +    MemoryContext    oldctx;
> +    TupleDesc        oldtupdesc;
> +    TupleDesc        newtupdesc;
> +
> +    oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> +
> +    /*
> +     * Create tuple table slots. Create a copy of the TupleDesc as it needs to
> +     * live as long as the cache remains.
> +     */
> +    oldtupdesc = CreateTupleDescCopy(RelationGetDescr(relation));
> +    newtupdesc = CreateTupleDescCopy(RelationGetDescr(relation));
> +
> +    entry->old_slot = MakeSingleTupleTableSlot(oldtupdesc, &TTSOpsHeapTuple);
> +    entry->new_slot = MakeSingleTupleTableSlot(newtupdesc, &TTSOpsHeapTuple);
> +
> +    MemoryContextSwitchTo(oldctx);
> +}

This *definitely* shouldn't be allocated in CacheMemoryContext. It's one thing
to have a named context below CacheMemoryContext, that's still somewhat
identifiable. But allocating directly in CacheMemoryContext is almost always a
bad idea.

What is supposed to clean any of this up in case of error?


I guess I'll start a separate thread about memory handling in pgoutput :/


> +    /*
> +     * We need this map to avoid relying on ReorderBufferChangeType enums
> +     * having specific values.
> +     */
> +    static int map_changetype_pubaction[] = {
> +        [REORDER_BUFFER_CHANGE_INSERT] = PUBACTION_INSERT,
> +        [REORDER_BUFFER_CHANGE_UPDATE] = PUBACTION_UPDATE,
> +        [REORDER_BUFFER_CHANGE_DELETE] = PUBACTION_DELETE
> +    };

Why is this "static"? Function-local statics only really make sense for
variables that are changed and should survive between calls to a function.


> +    Assert(*action == REORDER_BUFFER_CHANGE_INSERT ||
> +           *action == REORDER_BUFFER_CHANGE_UPDATE ||
> +           *action == REORDER_BUFFER_CHANGE_DELETE);
> +
> +    Assert(new_slot || old_slot);
> +
> +    /* Get the corresponding row filter */
> +    filter_exprstate = entry->exprstate[map_changetype_pubaction[*action]];
> +
> +    /* Bail out if there is no row filter */
> +    if (!filter_exprstate)
> +        return true;
> +
> +    elog(DEBUG3, "table \"%s.%s\" has row filter",
> +         get_namespace_name(RelationGetNamespace(relation)),
> +         RelationGetRelationName(relation));
> +
> +    estate = create_estate_for_relation(relation);
> +    ecxt = GetPerTupleExprContext(estate);

So we do this for each filtered row? That's a *lot* of
overhead. CreateExecutorState() creates its own memory context, allocates an
EState, then GetPerTupleExprContext() allocates an ExprContext, which then
creates another memory context.

I don't really see any need to allocate this over-and-over?

>          case REORDER_BUFFER_CHANGE_INSERT:
>              {
> -                HeapTuple    tuple = &change->data.tp.newtuple->tuple;
> +                /*
> +                 * Schema should be sent before the logic that replaces the
> +                 * relation because it also sends the ancestor's relation.
> +                 */
> +                maybe_send_schema(ctx, change, relation, relentry);
> +
> +                new_slot = relentry->new_slot;
> +
> +                ExecClearTuple(new_slot);
> +                ExecStoreHeapTuple(&change->data.tp.newtuple->tuple,
> +                                   new_slot, false);

Why? This isn't free, and you're doing it unconditionally. I'd bet this alone
is a noticeable slowdown over the current state.


Greetings,

Andres Freund



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2022-Jan-28, Andres Freund wrote:

> > +    foreach(lc, data->publications)
> > +    {
> > +        Publication *pub = lfirst(lc);

...

> Isn't this basically O(schemas * publications)?

Yeah, there are various places in the logical replication code that seem
pretty careless about this kind of thing -- most of it seems to assume
that there are going to be few publications, so it just looks things up
over and over with abandon, and I saw at least one place where it looped
up an inheritance hierarchy for partitioning doing indexscans at each
level(*).  I think a lot more thought is going to be required to fix
these things in a thorough manner -- a log.repl.-specific caching
mechanism, I imagine.

(*) Before 025b920a3d45, psql was forced to seqscan pg_publication_rel
for one of the describe.c queries, and nobody seems to have noticed.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
And a voice from the chaos spoke to me and said
"Smile and be happy, it could be worse".
And I smiled. And I was happy.
And it got worse.



Re: row filtering for logical replication

From
Peter Smith
Date:
PSA v73*.

(A rebase was needed due to recent changes in tab-complete.c.
Otherwise, v73* is the same as v72*).

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Monday, January 31, 2022 8:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> PSA v73*.
> 
> (A rebase was needed due to recent changes in tab-complete.c.
> Otherwise, v73* is the same as v72*).

Thanks for the rebase.
Attached is the V74 patch set, which makes the following changes:

v74-0000
-----
This patch is borrowed from[1] to fix the cfbot failure[2].

The reason for the cfbot failure is that when rel_sync_cache_relation_cb
invalidates an entry, it immediately frees the cached data (including the
slot), even though that data might still be in use. In the failed test case,
an invalidation message was received in logicalrep_write_tuple while invoking
"SearchSysCache1(TYPEOID, ...)", which freed the slot memory. The freed slot
values were then used for sending, which could cause the unexpected result.

The pending patch [1] fixes this problem by moving the memory-freeing code
from rel_sync_cache_relation_cb to get_rel_sync_entry. Until that patch is
committed, it is attached here to keep the cfbot happy.

[1] https://www.postgresql.org/message-id/CAA4eK1JACZTJqu_pzTu_2Nf-zGAsupqyfk6KBqHe9puVZGQfvw%40mail.gmail.com
[2] https://cirrus-ci.com/task/5450648090050560?logs=test_world#L3975

v74-0001
-----
- Cache the estate in RelationSyncEntry               (Andres [3])
- Move the row filter init code to get_rel_sync_entry (Andres [3])
- Remove the static label of map_changetype_pubaction (Andres [3])
- Allocate memory for newly added cached stuff under
  a separate memory context which is below ctx->context (Andres [3])
- a commit message change.                            (Greg [4])

v74-0002
-----
- Add the WHERE clause in the COMPLETE_WITH(",") line. (Alvaro [5])

[3] https://www.postgresql.org/message-id/20220129003110.6ndrrpanem5sb4ee%40alap3.anarazel.de
[4] https://www.postgresql.org/message-id/CAJcOf-d3zBMtpNwRuu23O%3DWeUz9FWBrTxeqtXUV_vyL103aW5A%40mail.gmail.com
[5] https://www.postgresql.org/message-id/202201281351.clzyf4cs6vzb%40alvherre.pgsql

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Saturday, January 29, 2022 8:31 AM Andres Freund <andres@anarazel.de> wrote:
> 
> Hi,
> 
> Are there any recent performance evaluations of the overhead of row filters? I
> think it'd be good to get some numbers comparing:

Thanks for looking at the patch! Will test it.

> 1) $workload with master
> 2) $workload with patch, but no row filters
> 3) $workload with patch, row filter matching everything
> 4) $workload with patch, row filter matching few rows
> 
> For workload I think it'd be worth testing:
> a) bulk COPY/INSERT into one table
> b) Many transactions doing small modifications to one table
> c) Many transactions targeting many different tables
> d) Interspersed DDL + small changes to a table
> > +/*
> > + * Initialize for row filter expression execution.
> > + */
> > +static ExprState *
> > +pgoutput_row_filter_init_expr(Node *rfnode)
> > +{
> > +    ExprState  *exprstate;
> > +    Expr       *expr;
> > +
> > +    /*
> > +     * This is the same code as ExecPrepareExpr() but that is not used because
> > +     * we want to cache the expression. There should probably be another
> > +     * function in the executor to handle the execution outside a normal Plan
> > +     * tree context.
> > +     */
> > +    expr = expression_planner((Expr *) rfnode);
> > +    exprstate = ExecInitExpr(expr, NULL);
> > +
> > +    return exprstate;
> > +}
> 
> In what memory context does this run? Are we taking care to deal with leaks?
> I'm pretty sure the planner relies on cleanup via memory contexts.

It was running under entry->cache_expr_cxt.

> > +    memset(entry->exprstate, 0, sizeof(entry->exprstate));
> > +
> > +    schemaId = get_rel_namespace(entry->publish_as_relid);
> > +    schemaPubids = GetSchemaPublications(schemaId);
> 
> Isn't this stuff that we've already queried before? If we re-fetch a lot of
> information it's not clear to me that it's actually a good idea to defer building
> the row filter.
> 
> 
> > +    am_partition = get_rel_relispartition(entry->publish_as_relid);
> 
> All this stuff likely can cause some memory "leakage" if you run it in a long-lived
> memory context.
> 
> 
> > +    /*
> > +     * Find if there are any row filters for this relation. If there are,
> > +     * then prepare the necessary ExprState and cache it in
> > +     * entry->exprstate. To build an expression state, we need to ensure
> > +     * the following:
...
> > +     *
> > +     * ALL TABLES IN SCHEMA implies "don't use row filter expression" if
> > +     * the schema is the same as the table schema.
> > +     */
> > +    foreach(lc, data->publications)
...
> > +        else if (list_member_oid(schemaPubids, pub->oid))
> > +        {
> > +            /*
> > +             * If the publication is FOR ALL TABLES IN SCHEMA and it overlaps
> > +             * with the current relation in the same schema then this is also
> > +             * treated same as if this table has no row filters (even if for
> > +             * other publications it does).
> > +             */
> > +            pub_no_filter = true;
> 
> Isn't this basically O(schemas * publications)?

Moved the row filter initialization code to get_rel_sync_entry.

> 
> > +    if (has_filter)
> > +    {
> > +        /* Create or reset the memory context for row filters */
> > +        if (entry->cache_expr_cxt == NULL)
> > +            entry->cache_expr_cxt = AllocSetContextCreate(CacheMemoryContext,
> > +                                                          "Row filter expressions",
> > +                                                          ALLOCSET_DEFAULT_SIZES);
> > +        else
> > +            MemoryContextReset(entry->cache_expr_cxt);
> 
> I see this started before this patch, but I don't think it's a great idea that
> pgoutput does a bunch of stuff in CacheMemoryContext. That makes it
> unnecessarily hard to debug leaks.
> 
> Seems like all this should live somewhere below ctx->context, allocated in
> pgoutput_startup()?
> 
> Consider what happens in a long-lived replication connection, where
> occasionally there's a transient error causing streaming to stop. At that point
> you'll just lose all knowledge of entry->cache_expr_cxt, no?
> 
> 
> > +
> > +/* Initialize the slots for storing new and old tuples */
> > +static void
> > +init_tuple_slot(Relation relation, RelationSyncEntry *entry)
> > +{
> > +    MemoryContext    oldctx;
> > +    TupleDesc        oldtupdesc;
> > +    TupleDesc        newtupdesc;
> > +
> > +    oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > +
> > +    /*
> > +     * Create tuple table slots. Create a copy of the TupleDesc as it needs to
> > +     * live as long as the cache remains.
> > +     */
> > +    oldtupdesc = CreateTupleDescCopy(RelationGetDescr(relation));
> > +    newtupdesc = CreateTupleDescCopy(RelationGetDescr(relation));
> > +
> > +    entry->old_slot = MakeSingleTupleTableSlot(oldtupdesc, &TTSOpsHeapTuple);
> > +    entry->new_slot = MakeSingleTupleTableSlot(newtupdesc, &TTSOpsHeapTuple);
> > +
> > +    MemoryContextSwitchTo(oldctx);
> > +}
> 
> This *definitely* shouldn't be allocated in CacheMemoryContext. It's one thing
> to have a named context below CacheMemoryContext, that's still somewhat
> identifiable. But allocating directly in CacheMemoryContext is almost always a
> bad idea.
> 
> What is supposed to clean any of this up in case of error?
> 
> 
> I guess I'll start a separate thread about memory handling in pgoutput :/

Thanks for the comments.
I added a separate memory context below ctx->context and, for now, allocate
all of the newly added data under that separate context.

It seems you mean the existing data should also be put into a separate memory
context like this; do you think we can do that as a separate patch, or should
the change be included in the row filter patch?

> > +    /*
> > +     * We need this map to avoid relying on ReorderBufferChangeType enums
> > +     * having specific values.
> > +     */
> > +    static int map_changetype_pubaction[] = {
> > +        [REORDER_BUFFER_CHANGE_INSERT] = PUBACTION_INSERT,
> > +        [REORDER_BUFFER_CHANGE_UPDATE] = PUBACTION_UPDATE,
> > +        [REORDER_BUFFER_CHANGE_DELETE] = PUBACTION_DELETE
> > +    };
> 
> Why is this "static"? Function-local statics only really make sense for variables
> that are changed and should survive between calls to a function.

Removed the "static" label.

> > +    Assert(*action == REORDER_BUFFER_CHANGE_INSERT ||
> > +           *action == REORDER_BUFFER_CHANGE_UPDATE ||
> > +           *action == REORDER_BUFFER_CHANGE_DELETE);
> > +
> > +    Assert(new_slot || old_slot);
> > +
> > +    /* Get the corresponding row filter */
> > +    filter_exprstate = entry->exprstate[map_changetype_pubaction[*action]];
> > +
> > +    /* Bail out if there is no row filter */
> > +    if (!filter_exprstate)
> > +        return true;
> > +
> > +    elog(DEBUG3, "table \"%s.%s\" has row filter",
> > +         get_namespace_name(RelationGetNamespace(relation)),
> > +         RelationGetRelationName(relation));
> > +
> > +    estate = create_estate_for_relation(relation);
> > +    ecxt = GetPerTupleExprContext(estate);
> 
> So we do this for each filtered row? That's a *lot* of overhead.
> CreateExecutorState() creates its own memory context, allocates an EState,
> then GetPerTupleExprContext() allocates an ExprContext, which then creates
> another memory context.

Cached the estate in the new version.

> I don't really see any need to allocate this over-and-over?
> 
> >          case REORDER_BUFFER_CHANGE_INSERT:
> >              {
> > -                HeapTuple    tuple = &change->data.tp.newtuple->tuple;
> > +                /*
> > +                 * Schema should be sent before the logic that replaces the
> > +                 * relation because it also sends the ancestor's relation.
> > +                 */
> > +                maybe_send_schema(ctx, change, relation, relentry);
> > +
> > +                new_slot = relentry->new_slot;
> > +
> > +                ExecClearTuple(new_slot);
> > +                ExecStoreHeapTuple(&change->data.tp.newtuple->tuple,
> > +                                   new_slot, false);
> 
> Why? This isn't free, and you're doing it unconditionally. I'd bet this alone is
> a noticeable slowdown over the current state.

It was intended to avoid deforming the tuple twice: once during row filter
execution, and a second time in logicalrep_write_tuple. But I will test the
performance impact of this and improve it if needed.

Best regards,
Hou zj


Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Mon, Jan 31, 2022 at 1:12 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> > > +   /*
> > > +    * We need this map to avoid relying on ReorderBufferChangeType
> > enums
> > > +    * having specific values.
> > > +    */
> > > +   static int map_changetype_pubaction[] = {
> > > +           [REORDER_BUFFER_CHANGE_INSERT] = PUBACTION_INSERT,
> > > +           [REORDER_BUFFER_CHANGE_UPDATE] = PUBACTION_UPDATE,
> > > +           [REORDER_BUFFER_CHANGE_DELETE] = PUBACTION_DELETE
> > > +   };
> >
> > Why is this "static"? Function-local statics only really make sense for variables
> > that are changed and should survive between calls to a function.
>
> Removed the "static" label.
>

This array was only ever meant to be read-only, and visible only to
that function.
IMO removing "static" makes things worse because now that array gets
initialized on each call to the function, which is unnecessary.
I think it should just be: "static const int map_changetype_pubaction[] = ..."

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Mon, Jan 31, 2022 at 12:57 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Attached is the V74 patch set, which makes the following changes:
>

Hi,

I tested psql and pg_dump after application of this patch, from the
following perspectives:
- "\dRp+" and "\d <table-name>" (added by the patch, for PostgreSQL
15) show row filters associated with publications and specified
tables, respectively.
- psql is able to connect to the same or older server version
- pg_dump is able to dump from the same or older server version
- dumps can be loaded into newer server versions than that of pg_dump
- PostgreSQL v9 doesn't support publications
- Only PostgreSQL v15 supports row filters (via the patch)

So specifically I tested the following versions (built from the stable
branch): 9.2, 9.6, 10, 11, 12, 13, 14 and 15 and used the following
publication definitions:

create table test1(i int primary key);
create table test2(i int primary key, j text);
create schema myschema;
create table myschema.test3(i int primary key, j text, k text);
create publication pub1 for all tables;
create publication pub2 for table test1 [ where (i > 100); ]
create publication pub3 for table test1 [ where (i > 50), test2 where
(i > 100), myschema.test3 where (i > 200) ] with (publish = 'insert,
update');

(note that for v9, only the above tables and schemas can be defined, as
publications are not supported, and the row filter "where" clauses can only
be defined on v15)

I tested:
- v15 psql connecting to same and older versions, and using "\dRp+"
and "\d <table-name>" commands
- v15 pg_dump, dumping the above definitions from the same or older
server versions
- Loading dumps from older or same (v15) server version into a v15 server.

I did not detect any issues.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Jan 31, 2022 at 7:27 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Monday, January 31, 2022 8:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > PSA v73*.
> >
> > (A rebase was needed due to recent changes in tab-complete.c.
> > Otherwise, v73* is the same as v72*).
>
> Thanks for the rebase.
> Attached is the V74 patch set, which makes the following changes:
>

Few comments:
=============
1.
/* Create or reset the memory context for row filters */
+ entry->cache_expr_cxt = AllocSetContextCreate(cachectx,
+   "Row filter expressions",
+   ALLOCSET_DEFAULT_SIZES);
+
In the new code, we are no longer resetting it here, so we can
probably remove "or reset" from the above comment.

2. You have changed some of the interfaces to pass a memory context.
Isn't it better to pass "PGOutputData *" and then use the required
memory context? That would keep the interfaces consistent, and we do
something similar in ExecPrepareExpr.

3.
+
+/*
+ * Initialize the row filter, the first time.
+ */
+static void
+pgoutput_row_filter_init(MemoryContext cachectx, List *publications,
+ RelationSyncEntry *entry)

In the above comment, "the first time" doesn't seem to fit well after
your changes because that is now taken care of by the caller.


-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Jan 31, 2022 at 1:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jan 31, 2022 at 7:27 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Monday, January 31, 2022 8:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > PSA v73*.
> > >
> > > (A rebase was needed due to recent changes in tab-complete.c.
> > > Otherwise, v73* is the same as v72*).
> >
> > Thanks for the rebase.
> > Attached is the V74 patch set, which makes the following changes:
> >
>
> Few comments:
> =============
>

Few more minor comments:
1.
+ if (relentry->attrmap)
+ {
+ TupleDesc tupdesc  = RelationGetDescr(relation);
+ TupleTableSlot *tmp_slot = MakeTupleTableSlot(tupdesc,
+   &TTSOpsVirtual);
+
+ new_slot = execute_attr_map_slot(relentry->attrmap,
+ new_slot,
+ tmp_slot);

I think we don't need these additional variables tupdesc and tmp_slot.
You can directly use MakeTupleTableSlot instead of tmp_slot, which
will make this and nearby code look better.

2.
+ if (pubrinfo->pubrelqual)
+ appendPQExpBuffer(query, " WHERE (%s)", pubrinfo->pubrelqual);
+ appendPQExpBufferStr(query, ";\n");

Do we really need the additional '()' for the row filter expression here? See
the output below from pg_dump:

ALTER PUBLICATION pub1 ADD TABLE ONLY public.t1 WHERE ((c1 < 100));

3.
+ /* row filter (if any) */
+ if (pset.sversion >= 150000)
+ {
+ if (!PQgetisnull(result, i, 1))
+ appendPQExpBuffer(&buf, " WHERE %s", PQgetvalue(result, i, 1));
+ }

I don't think we need this version check if, while forming the query, we use
NULL as the second column in the corresponding query for v < 150000.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Sat, Jan 29, 2022 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> Are there any recent performance evaluations of the overhead of row filters? I
> think it'd be good to get some numbers comparing:
>
> 1) $workload with master
> 2) $workload with patch, but no row filters
> 3) $workload with patch, row filter matching everything
> 4) $workload with patch, row filter matching few rows
>
> For workload I think it'd be worth testing:
> a) bulk COPY/INSERT into one table
> b) Many transactions doing small modifications to one table
> c) Many transactions targeting many different tables
> d) Interspersed DDL + small changes to a table
>

I have gathered performance data for the workload case (a):

HEAD 46743.75
v74 no filters 46929.15
v74 allow 100% 46926.09
v74 allow 75% 40617.74
v74 allow 50% 35744.17
v74 allow 25% 29468.93
v74 allow 0% 22540.58

PSA.

This was tested using patch v74 and synchronous pub/sub. There are 1M
INSERTS for publications using differing amounts of row filtering (or
none).

Observations:
- The row-filter overhead seems insignificant (e.g. compare no filter
and 100% allowed versus HEAD).
- The elapsed time decreases linearly as there is less data getting replicated.

I will post the results for other workload kinds (b, c, d) when I have them.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

Attachments

Re: row filtering for logical replication

From
Andres Freund
Date:
On 2022-01-31 14:12:38 +1100, Greg Nancarrow wrote:
> This array was only ever meant to be read-only, and visible only to
> that function.
> IMO removing "static" makes things worse because now that array gets
> initialized on each call to the function, which is unnecessary.
> I think it should just be: "static const int map_changetype_pubaction[] = ..."

Yes, static const is good. static alone, not so much.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Feb 1, 2022 at 12:07 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Sat, Jan 29, 2022 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > Are there any recent performance evaluations of the overhead of row filters? I
> > think it'd be good to get some numbers comparing:
> >
> > 1) $workload with master
> > 2) $workload with patch, but no row filters
> > 3) $workload with patch, row filter matching everything
> > 4) $workload with patch, row filter matching few rows
> >
> > For workload I think it'd be worth testing:
> > a) bulk COPY/INSERT into one table
> > b) Many transactions doing small modifications to one table
> > c) Many transactions targeting many different tables
> > d) Interspersed DDL + small changes to a table
> >
>
> I have gathered performance data for the workload case (a):
>
> HEAD 46743.75
> v74 no filters 46929.15
> v74 allow 100% 46926.09
> v74 allow 75% 40617.74
> v74 allow 50% 35744.17
> v74 allow 25% 29468.93
> v74 allow 0% 22540.58
>
> PSA.
>
> This was tested using patch v74 and synchronous pub/sub. There are 1M
> INSERTS for publications using differing amounts of row filtering (or
> none).
>
> Observations:
> - The row-filter overhead seems insignificant (e.g. compare no filter
> and 100% allowed versus HEAD).
> - The elapsed time decreases linearly as less data gets replicated.
>

FYI - attached are the test steps I used in case anyone wants to try
to reproduce these results.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Sat, Jan 29, 2022 at 6:01 AM Andres Freund <andres@anarazel.de> wrote:
>
>
> > +     if (has_filter)
> > +     {
> > +             /* Create or reset the memory context for row filters */
> > +             if (entry->cache_expr_cxt == NULL)
> > +                     entry->cache_expr_cxt = AllocSetContextCreate(CacheMemoryContext,
> > +
"Rowfilter expressions",
 
> > +
ALLOCSET_DEFAULT_SIZES);
> > +             else
> > +                     MemoryContextReset(entry->cache_expr_cxt);
>
> I see this started before this patch, but I don't think it's a great idea that
> pgoutput does a bunch of stuff in CacheMemoryContext. That makes it
> unnecessarily hard to debug leaks.
>
> Seems like all this should live somewhere below ctx->context, allocated in
> pgoutput_startup()?
>

Agreed.

> Consider what happens in a long-lived replication connection, where
> occasionally there's a transient error causing streaming to stop. At that
> point you'll just lose all knowledge of entry->cache_expr_cxt, no?
>

I think we will lose that knowledge because the WALSender exits on ERROR,
but that would be true even if we allocate it in the newly allocated
context. Am I missing something?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Mon, Jan 31, 2022 at 12:57 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Attach the V74 patch set which did the following changes:
>

In the v74-0001 patch, I noticed the following code in get_rel_sync_entry():

+ /*
+ * Tuple slots cleanups. (Will be rebuilt later if needed).
+ */
+ oldctx = MemoryContextSwitchTo(data->cachectx);
+
+ if (entry->old_slot)
+ ExecDropSingleTupleTableSlot(entry->old_slot);
+ if (entry->new_slot)
+ ExecDropSingleTupleTableSlot(entry->new_slot);
+
+ entry->old_slot = NULL;
+ entry->new_slot = NULL;
+
+ MemoryContextSwitchTo(oldctx);

I don't believe the calls to MemoryContextSwitchTo() are required
here, because within the context switch it's just freeing memory, not
allocating it.
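
That is, the cleanup could presumably be reduced to just (sketch):

/* Dropping the slots only frees memory; no context switch is needed. */
if (entry->old_slot)
    ExecDropSingleTupleTableSlot(entry->old_slot);
if (entry->new_slot)
    ExecDropSingleTupleTableSlot(entry->new_slot);

entry->old_slot = NULL;
entry->new_slot = NULL;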

Regards,
Greg Nancarrow
Fujitsu Australia



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Monday, January 31, 2022 9:02 PM Amit Kapila <amit.kapila16@gmail.com>
> 
> On Mon, Jan 31, 2022 at 1:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Jan 31, 2022 at 7:27 AM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > On Monday, January 31, 2022 8:53 AM Peter Smith
> <smithpb2250@gmail.com> wrote:
> > > >
> > > > PSA v73*.
> > > >
> > > > (A rebase was needed due to recent changes in tab-complete.c.
> > > > Otherwise, v73* is the same as v72*).
> > >
> > > Thanks for the rebase.
> > > Attach the V74 patch set which did the following changes:
> > >
> >
> > Few comments:
> > =============
> >
> 
> Few more minor comments:
> 1.
> + if (relentry->attrmap)
> + {
> + TupleDesc tupdesc  = RelationGetDescr(relation); TupleTableSlot
> + *tmp_slot = MakeTupleTableSlot(tupdesc,
> +   &TTSOpsVirtual);
> +
> + new_slot = execute_attr_map_slot(relentry->attrmap,
> + new_slot,
> + tmp_slot);
> 
> I think we don't need these additional variables tupdesc and tmp_slot.
> You can directly use MakeTupleTableSlot instead of tmp_slot, which will make
> this and nearby code look better.

Changed.

> 2.
> + if (pubrinfo->pubrelqual)
> + appendPQExpBuffer(query, " WHERE (%s)", pubrinfo->pubrelqual);
> + appendPQExpBufferStr(query, ";\n");
> 
> Do we really need the additional '()' for the row filter expression here? See the below
> output from pg_dump:
> 
> ALTER PUBLICATION pub1 ADD TABLE ONLY public.t1 WHERE ((c1 < 100));

I will investigate this and change this later if needed.

> 3.
> + /* row filter (if any) */
> + if (pset.sversion >= 150000)
> + {
> + if (!PQgetisnull(result, i, 1))
> + appendPQExpBuffer(&buf, " WHERE %s", PQgetvalue(result, i, 1)); }
> 
> I don't think we need this version check if, while forming the query, we use NULL as
> the second column in the corresponding query for v < 150000.

Changed.

Attach the V75 patch set which address the above, Amit's[1] and Greg's[2][3] comments.

The new version patch also includes the following changes:

- run pgindent
- adjust some comments
- remove some unnecessary ExecClearTuple
- slightly improve the row-filter toast case by removing some unnecessary
  memory allocation and directly returning the modified new slot instead of
  copying it again.

[1] https://www.postgresql.org/message-id/CAA4eK1LjyiPkwOki3n%2BQfORmBQLUvsvBfifhZMh%2BquAJTuRU_w%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAJcOf-fR_BKHNuz7AXCWuk40ESVOr%3DDkXf3evbNNi4M4V_5agQ%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAJcOf-fR_BKHNuz7AXCWuk40ESVOr%3DDkXf3evbNNi4M4V_5agQ%40mail.gmail.com

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
> On Saturday, January 29, 2022 8:31 AM Andres Freund <andres@anarazel.de>
> wrote:
> >
> > Hi,
> >
> > Are there any recent performance evaluations of the overhead of row
> > filters? I think it'd be good to get some numbers comparing:
> 
> Thanks for looking at the patch! Will test it.
> 
> > >          case REORDER_BUFFER_CHANGE_INSERT:
> > >              {
> > > -                HeapTuple    tuple = &change->data.tp.newtuple->tuple;
> > > +                /*
> > > +                 * Schema should be sent before the logic that replaces the
> > > +                 * relation because it also sends the ancestor's relation.
> > > +                 */
> > > +                maybe_send_schema(ctx, change, relation, relentry);
> > > +
> > > +                new_slot = relentry->new_slot;
> > > +
> > > +                ExecClearTuple(new_slot);
> > > +                ExecStoreHeapTuple(&change->data.tp.newtuple->tuple,
> > > +                                   new_slot, false);
> >
> > Why? This isn't free, and you're doing it unconditionally. I'd bet this alone is
> > noticeable slowdown over the current state.
>
> It was intended to avoid deforming the tuple twice, once in row filter execution
> and a second time in logicalrep_write_tuple. But I will test the performance
> impact of this and improve it if needed.

I removed the unnecessary ExecClearTuple here. The ExecStoreHeapTuple
here doesn't allocate or free any memory, and it doesn't seem to have a
noticeable impact per the perf result[1]. And we need it to avoid deforming
the tuple twice. So, it looks acceptable to me.

[1] 0.01%     0.00%  postgres  pgoutput.so         [.] ExecStoreHeapTuple@plt

Best regards,
Hou zj

Re: row filtering for logical replication

From
Greg Nancarrow
Date:
On Tue, Feb 1, 2022 at 2:45 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Attach the V75 patch set which address the above, Amit's[1] and Greg's[2][3] comments.
>

In the v74-0001 patch (and now in the v75-001 patch) a change was made
in the GetTopMostAncestorInPublication() function, to get the relation
and schema publications lists (for the ancestor Oid) up-front:

+ List    *apubids = GetRelationPublications(ancestor);
+ List    *aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
+
+ if (list_member_oid(apubids, puboid) ||
+    list_member_oid(aschemaPubids, puboid))
+       topmost_relid = ancestor;

However, this seems less efficient in the case where a
match is found in the first list searched, since then there
is no need to create the second list at all.
Instead of this, how about something like this:

List    *apubids = GetRelationPublications(ancestor);
List    *aschemaPubids = NULL;

if (list_member_oid(apubids, puboid) ||
   list_member_oid(aschemaPubids =
GetSchemaPublications(get_rel_namespace(ancestor)), puboid))
      topmost_relid = ancestor;

or, if that is considered a bit ugly due to the assignment within the
function parameters, alternatively:

List    *apubids = GetRelationPublications(ancestor);
List    *aschemaPubids = NULL;

if (list_member_oid(apubids, puboid))
   topmost_relid = ancestor;
else
{
   aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
   if (list_member_oid(aschemaPubids, puboid))
      topmost_relid = ancestor;
}

Regards,
Greg Nancarrow
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Feb 1, 2022 at 9:15 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Monday, January 31, 2022 9:02 PM Amit Kapila <amit.kapila16@gmail.com>
> >
>
> > 3.
> > + /* row filter (if any) */
> > + if (pset.sversion >= 150000)
> > + {
> > + if (!PQgetisnull(result, i, 1))
> > + appendPQExpBuffer(&buf, " WHERE %s", PQgetvalue(result, i, 1)); }
> >
> > I don't think we need this version check if, while forming the query, we use NULL as
> > the second column in the corresponding query for v < 150000.
>
> Changed.
>

But, I don't see a corresponding change in the else part of the query:
else
{
printfPQExpBuffer(&buf,
  "SELECT pubname\n"
  "FROM pg_catalog.pg_publication p\n"
  "JOIN pg_catalog.pg_publication_rel pr ON p.oid = pr.prpubid\n"
  "WHERE pr.prrelid = '%s'\n"
  "UNION ALL\n"
  "SELECT pubname\n"
  "FROM pg_catalog.pg_publication p\n"
  "WHERE p.puballtables AND pg_catalog.pg_relation_is_publishable('%s')\n"
  "ORDER BY 1;",
  oid, oid);
}

Don't we need to do that to keep it working with previous versions?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Greg Nancarrow
Date:

On Tue, Feb 1, 2022 at 2:45 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
>
> On Monday, January 31, 2022 9:02 PM Amit Kapila <amit.kapila16@gmail.com>
> >
> > 2.
> > + if (pubrinfo->pubrelqual)
> > + appendPQExpBuffer(query, " WHERE (%s)", pubrinfo->pubrelqual);
> > + appendPQExpBufferStr(query, ";\n");
> >
> > Do we really need the additional '()' for the row filter expression here? See the below
> > output from pg_dump:
> >
> > ALTER PUBLICATION pub1 ADD TABLE ONLY public.t1 WHERE ((c1 < 100));
>
> I will investigate this and change this later if needed.
>

I don't think we can make this change (i.e. remove the additional parentheses), because then a "WHERE (TRUE)" row filter would result in invalid pg_dump output:

e.g.   ALTER PUBLICATION pub1 ADD TABLE ONLY public.test1 WHERE TRUE;

(since currently, parentheses are required around the publication WHERE expression)

See also the earlier commit which specifically added these parentheses and catered for the WHERE TRUE case.

Regards,
Greg Nancarrow
Fujitsu Australia

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Feb 1, 2022 at 9:15 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Monday, January 31, 2022 9:02 PM Amit Kapila <amit.kapila16@gmail.com>
> >

Review Comments:
===============
1.
+ else if (IsA(node, OpExpr))
+ {
+ /* OK, except user-defined operators are not allowed. */
+ if (((OpExpr *) node)->opno >= FirstNormalObjectId)
+ errdetail_msg = _("User-defined operators are not allowed.");
+ }

Is it sufficient to check only the allowed operators for OpExpr? Don't
we need to check opfuncid to ensure that the corresponding function is
immutable? Also, what about opresulttype, opcollid, and inputcollid? I
think we don't want to allow user-defined types or collations but as
we are restricting the opexp to use a built-in operator, those should
not be present in such an expression. If that is the case, then I
think we can add a comment for the same.

2. Can we handle RelabelType node in
check_simple_rowfilter_expr_walker()? I think you need to check
resulttype and collation id to ensure that they are not user-defined.
There doesn't seem to be a need to check resulttypmod as that refers
to pg_attribute.atttypmod and that can't have anything unsafe. This
helps us to handle cases like the following which currently gives an
error:
create table t1(c1 int, c2 varchar(100));
create publication pub1 for table t1 where (c2 < 'john');

3. Similar to above, don't we need to consider disallowing
non-built-in collation of Var type nodes? Now, as we are only
supporting built-in types this might not be required. So, probably a
comment would suffice.
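
To make comments 1-3 concrete, the kind of extra validation being suggested
might look roughly like this (an illustrative sketch only, not code from the
patch):

else if (IsA(node, OpExpr))
{
    OpExpr     *opexpr = (OpExpr *) node;

    /* Only built-in operators... */
    if (opexpr->opno >= FirstNormalObjectId)
        errdetail_msg = _("User-defined operators are not allowed.");
    /* ...whose underlying function must also be immutable. */
    else if (func_volatile(opexpr->opfuncid) != PROVOLATILE_IMMUTABLE)
        errdetail_msg = _("Mutable functions are not allowed.");
}
else if (IsA(node, RelabelType))
{
    RelabelType *relabel = (RelabelType *) node;

    /* Allow the relabel, but only to a built-in type and collation. */
    if (relabel->resulttype >= FirstNormalObjectId)
        errdetail_msg = _("User-defined types are not allowed.");
    else if (relabel->resultcollid >= FirstNormalObjectId)
        errdetail_msg = _("User-defined collations are not allowed.");
}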

4.
A minor nitpick in tab-complete:
postgres=# Alter PUBLICATION pub1 ADD TABLE t2 WHERE ( c2 > 10)
,        WHERE (

After the Where clause, it should not allow adding WHERE. This doesn't
happen for CREATE PUBLICATION case.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Feb 1, 2022 at 4:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Review Comments:
> ===============
> 1.
> + else if (IsA(node, OpExpr))
> + {
> + /* OK, except user-defined operators are not allowed. */
> + if (((OpExpr *) node)->opno >= FirstNormalObjectId)
> + errdetail_msg = _("User-defined operators are not allowed.");
> + }
>
> Is it sufficient to check only the allowed operators for OpExpr? Don't
> we need to check opfuncid to ensure that the corresponding function is
> immutable? Also, what about opresulttype, opcollid, and inputcollid? I
> think we don't want to allow user-defined types or collations but as
> we are restricting the opexp to use a built-in operator, those should
> not be present in such an expression. If that is the case, then I
> think we can add a comment for the same.
>

Today, I was looking at a few other nodes supported by the patch and I
have similar questions for those as well. As an example, the patch
allows T_ScalarArrayOpExpr and the node is as follows:

typedef struct ScalarArrayOpExpr
{
Expr xpr;
Oid opno; /* PG_OPERATOR OID of the operator */
Oid opfuncid; /* PG_PROC OID of comparison function */
Oid hashfuncid; /* PG_PROC OID of hash func or InvalidOid */
Oid negfuncid; /* PG_PROC OID of negator of opfuncid function
* or InvalidOid.  See above */
bool useOr; /* true for ANY, false for ALL */
Oid inputcollid; /* OID of collation that operator should use */
List    *args; /* the scalar and array operands */
int location; /* token location, or -1 if unknown */
} ScalarArrayOpExpr;

Don't we need to check pg_proc OIDs like hashfuncid to ensure that it
is immutable like the patch is doing for FuncExpr? Similarly for
ArrayExpr node, don't we need to check the array_collid to see if it
contains user-defined collation? I think some of these might be okay
to permit but then it is better to have some comments to explain.
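
As a sketch of what those extra checks (plus the comment being requested)
could look like -- illustrative only:

else if (IsA(node, ScalarArrayOpExpr))
{
    ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) node;

    /*
     * Check opno/opfuncid like a plain OpExpr; hashfuncid and negfuncid
     * would need the same treatment unless we can argue they are only
     * ever set to built-in support functions (the open question above).
     */
    if (saop->opno >= FirstNormalObjectId ||
        saop->opfuncid >= FirstNormalObjectId ||
        (OidIsValid(saop->hashfuncid) &&
         saop->hashfuncid >= FirstNormalObjectId))
        errdetail_msg = _("User-defined operators are not allowed.");
}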

--
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Ajin Cherian
Date:
Hi Peter,

I just tried scenario b that Andres suggested:

For scenario b, I did some testing with row-filter-patch v74 and
various levels of filtering, from 0% of rows replicated up to 100%.
The times are in seconds; I did 5 runs each.

Results:

RUN  HEAD       v74 0%     v74 25%    v74 50%    v74 75%    v74 100%
1    17.26178   12.573736  12.869635  13.742167  17.977112  17.75814
2    17.522473  12.919554  12.640879  14.202737  14.515481  16.961836
3    17.124001  12.640879  12.706631  14.220245  15.686613  17.219355
4    17.24122   12.602345  12.674566  14.219423  15.564312  17.432765
5    17.25352   12.610657  12.689842  14.210725  15.613708  17.403821

As can be seen, the performance on HEAD is similar to what the
patch achieves with all rows (100%) replicated. The performance
improves linearly with more rows filtered.

The test scenario used was:

1. On publisher and subscriber:
CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

2. On publisher: (based on which scenario is being tested)
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

3. On the subscriber:
CREATE SUBSCRIPTION sync_sub CONNECTION 'host=127.0.0.1 port=5432
dbname=postgres application_name=sync_sub' PUBLICATION pub_1;

4. now modify the postgresql.conf on the publisher side
synchronous_standby_names = 'sync_sub' and restart.

5. The test case:

DO
$do$
BEGIN
FOR i IN 1..1000001 BY 10 LOOP
INSERT INTO test VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test SET value = 'FOO' WHERE key = i;
IF I % 1000 = 0 THEN
COMMIT;
END IF;
END LOOP;
END
$do$;


regards,
Ajin Cherian
Fujitsu Australia

On Tue, Feb 1, 2022 at 12:07 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Sat, Jan 29, 2022 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > Are there any recent performance evaluations of the overhead of row filters? I
> > think it'd be good to get some numbers comparing:
> >
> > 1) $workload with master
> > 2) $workload with patch, but no row filters
> > 3) $workload with patch, row filter matching everything
> > 4) $workload with patch, row filter matching few rows
> >
> > For workload I think it'd be worth testing:
> > a) bulk COPY/INSERT into one table
> > b) Many transactions doing small modifications to one table
> > c) Many transactions targeting many different tables
> > d) Interspersed DDL + small changes to a table
> >
>
> I have gathered performance data for the workload case (a):
>
> HEAD 46743.75
> v74 no filters 46929.15
> v74 allow 100% 46926.09
> v74 allow 75% 40617.74
> v74 allow 50% 35744.17
> v74 allow 25% 29468.93
> v74 allow 0% 22540.58
>
> PSA.
>
> This was tested using patch v74 and synchronous pub/sub. There are 1M
> INSERTS for publications using differing amounts of row filtering (or
> none).
>
> Observations:
> - There seems insignificant row-filter overheads (e.g. viz no filter
> and 100% allowed versus HEAD).
> - The elapsed time decreases linearly as there is less data getting replicated.
>
> I will post the results for other workload kinds (b, c, d) when I have them.
>
> ------
> Kind Regards,
> Peter Smith.
> Fujitsu Australia.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Wed, Feb 2, 2022 at 8:16 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Hi Peter,
>
> I just tried scenario b that Andres suggested:
>
> For scenario b, I did some testing with row-filter-patch v74 and
> various levels of filtering, from 0% of rows replicated up to 100%.
> The times are in seconds; I did 5 runs each.
>
> Results:
>
> RUN  HEAD       v74 0%     v74 25%    v74 50%    v74 75%    v74 100%
> 1    17.26178   12.573736  12.869635  13.742167  17.977112  17.75814
> 2    17.522473  12.919554  12.640879  14.202737  14.515481  16.961836
> 3    17.124001  12.640879  12.706631  14.220245  15.686613  17.219355
> 4    17.24122   12.602345  12.674566  14.219423  15.564312  17.432765
> 5    17.25352   12.610657  12.689842  14.210725  15.613708  17.403821
>
> As can be seen, the performance on HEAD is similar to what the
> patch achieves with all rows (100%) replicated. The performance
> improves linearly with more rows filtered.
>
> The test scenario used was:
>
> 1. On publisher and subscriber:
> CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));
>
> 2. On publisher: (based on which scenario is being tested)
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed
>
> 3. On the subscriber:
> CREATE SUBSCRIPTION sync_sub CONNECTION 'host=127.0.0.1 port=5432
> dbname=postgres application_name=sync_sub' PUBLICATION pub_1;
>
> 4. now modify the postgresql.conf on the publisher side
> synchronous_standby_names = 'sync_sub' and restart.
>
> 5. The test case:
>
> DO
> $do$
> BEGIN
> FOR i IN 1..1000001 BY 10 LOOP
> INSERT INTO test VALUES(i,'BAH', row_to_json(row(i)));
> UPDATE test SET value = 'FOO' WHERE key = i;
> IF I % 1000 = 0 THEN
> COMMIT;
> END IF;
> END LOOP;
> END
> $do$;
>
>

Thanks!

I have put your results as a bar chart same as for the previous workload case:

HEAD 17.25
v74 no filters NA
v74 allow 100% 17.35
v74 allow 75% 15.62
v74 allow 50% 14.21
v74 allow 25% 12.69
v74 allow 0% 12.62

PSA.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Sat, Jan 29, 2022 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> Are there any recent performance evaluations of the overhead of row filters? I
> think it'd be good to get some numbers comparing:
>
> 1) $workload with master
> 2) $workload with patch, but no row filters
> 3) $workload with patch, row filter matching everything
> 4) $workload with patch, row filter matching few rows
>
> For workload I think it'd be worth testing:
> a) bulk COPY/INSERT into one table
> b) Many transactions doing small modifications to one table
> c) Many transactions targeting many different tables
> d) Interspersed DDL + small changes to a table
>

Here are performance data results for the workload case (c):

HEAD 105.75
v74 no filters 105.86
v74 allow 100% 104.94
v74 allow 75% 97.12
v74 allow 50% 78.92
v74 allow 25% 69.71
v74 allow 0% 59.70

This was tested using patch v74 and synchronous pub/sub.
There are 100K INSERTS/UPDATES over 5 tables (all published)
The PUBLICATIONS use differing amounts of row filtering (or none).

Observations:
- We see pretty much the same pattern as for workloads "a" and "b"
- The row-filter overhead seems insignificant (e.g. compare no filter
and 100% allowed versus HEAD).
- The elapsed time decreases as a higher percentage of the data is
filtered out (i.e. as less data is replicated).

PSA workload "c" test files for details.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, February 1, 2022 7:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Tue, Feb 1, 2022 at 9:15 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> wrote:
> >
> > On Monday, January 31, 2022 9:02 PM Amit Kapila
> > <amit.kapila16@gmail.com>
> > >
> 
> Review Comments:
> ===============
> 1.
> + else if (IsA(node, OpExpr))
> + {
> + /* OK, except user-defined operators are not allowed. */ if (((OpExpr
> + *) node)->opno >= FirstNormalObjectId) errdetail_msg = _("User-defined
> + operators are not allowed."); }
> 
> Is it sufficient to check only the allowed operators for OpExpr? Don't we need to
> check opfuncid to ensure that the corresponding function is immutable? Also,
> what about opresulttype, opcollid, and inputcollid? I think we don't want to allow
> user-defined types or collations but as we are restricting the opexp to use a
> built-in operator, those should not be present in such an expression. If that is the
> case, then I think we can add a comment for the same.
> 
> 2. Can we handle RelabelType node in
> check_simple_rowfilter_expr_walker()? I think you need to check resulttype and
> collation id to ensure that they are not user-defined.
> There doesn't seem to be a need to check resulttypmod as that refers to
> pg_attribute.atttypmod and that can't have anything unsafe. This helps us to
> handle cases like the following which currently gives an
> error:
> create table t1(c1 int, c2 varchar(100)); create publication pub1 for table t1 where
> (c2 < 'john');
> 
> 3. Similar to above, don't we need to consider disallowing non-built-in collation
> of Var type nodes? Now, as we are only supporting built-in types this might not
> be required. So, probably a comment would suffice.

I adjusted the code in check_simple_rowfilter_expr_walker to
handle the collation/type/function.

> 4.
> A minor nitpick in tab-complete:
> postgres=# Alter PUBLICATION pub1 ADD TABLE t2 WHERE ( c2 > 10)
> ,        WHERE (
> 
> After the Where clause, it should not allow adding WHERE. This doesn't happen
> for CREATE PUBLICATION case.

I will look into this and change it soon.

Attach the V76 patch set which addressed above comments and comments from[1][2].

[1] https://www.postgresql.org/message-id/CAA4eK1L6hLRxFVphDO8mwuguc9kVdMu-DT2Dw2GXHwvprLoxrw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1L6hLRxFVphDO8mwuguc9kVdMu-DT2Dw2GXHwvprLoxrw%40mail.gmail.com

Best regards,
Hou zj

Attachments

Re: row filtering for logical replication

From
Ajin Cherian
Date:
On Sat, Jan 29, 2022 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> Are there any recent performance evaluations of the overhead of row filters? I
> think it'd be good to get some numbers comparing:
>
> 1) $workload with master
> 2) $workload with patch, but no row filters
> 3) $workload with patch, row filter matching everything
> 4) $workload with patch, row filter matching few rows
>
> For workload I think it'd be worth testing:
> a) bulk COPY/INSERT into one table
> b) Many transactions doing small modifications to one table
> c) Many transactions targeting many different tables
> d) Interspersed DDL + small changes to a table
>

Here's the performance data results for scenario d:

HEAD   "with patch no row filter" "with patch 0%" "row-filter-patch
25%" "row-filter-patch v74 50%" "row-filter-patch 75%"
"row-filter-patch v74 100%"
1 65.397639 64.414034 5.919732 20.012096 36.35911 49.412548 64.508842
2 65.641783 65.255775 5.715082 20.157575 36.957403 51.355821 65.708444
3 65.096526 64.795163 6.146072 21.130709 37.679346 49.568513 66.602145
4 65.173569 64.644448 5.787197 20.784607 34.465133 55.397313 63.545337
5 65.791092 66.000412 5.642696 20.258802 36.493626 52.873252 63.511428

The performance is similar to the other scenarios.
The script used is below:

CREATE TABLE test (key int, value text, value1 text, data jsonb,
PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

DO
$do$
BEGIN
FOR i IN 1..1000001 BY 4000 LOOP
Alter table test alter column value1 TYPE varchar(30);
INSERT INTO test VALUES(i,'BAH', row_to_json(row(i)));
Alter table test ALTER COLUMN value1 TYPE text;
UPDATE test SET value = 'FOO' WHERE key = i;
COMMIT;
END LOOP;
END
$do$;

regards,
Ajin Cherian
Fujitsu Australia



Re: row filtering for logical replication

From
Andres Freund
Date:
Hi,

On 2022-02-01 13:31:36 +1100, Peter Smith wrote:
> TEST STEPS - Workload case a
> 
> 1. Run initdb pub and sub and start both postgres instances (use the nosync postgresql.conf)
> 
> 2. Run psql for both instances and create tables
> CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));
> 
> 3. create the PUBLISHER on pub instance (e.g. choose from below depending on filter)
> CREATE PUBLICATION pub_1 FOR TABLE test;                        -- 100% (no filter)
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0);        -- 100% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000);    -- 75% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000);    -- 50% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000);    -- 25% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000);    -- 0% allowed
> 
> 4. create the SUBSCRIBER on sub instance
> CREATE SUBSCRIPTION sync_sub CONNECTION 'host=127.0.0.1 port=5432 dbname=postgres application_name=sync_sub' PUBLICATION pub_1;
> 
> 5. On pub side modify the postgresql.conf on the publisher side and restart
> \q to quit psql
> edit synchronous_standby_names = 'sync_sub' 
> restart the pub instance
> 
> 6. Run psql (pub side) and perform the test run.
> \timing
> INSERT INTO test SELECT i, i::text, row_to_json(row(i)) FROM generate_series(1,1000001)i;
> select count(*) from test;
> TRUNCATE test;
> select count(*) from test;
> repeat 6 for each test run.

I think using syncrep as the mechanism for benchmarking the decoding
side makes the picture less clear than it could be - you're measuring a lot of
things other than the decoding. E.g. the overhead of applying those changes. I
think it'd be more accurate to do something like:

/* create publications, table, etc */

-- create a slot from before the changes
SELECT pg_create_logical_replication_slot('origin', 'pgoutput');

/* the changes you're going to measure */

-- save end LSN
SELECT pg_current_wal_lsn();

-- create a slot for pg_recvlogical to consume
SELECT * FROM pg_copy_logical_replication_slot('origin', 'consume');

-- benchmark, endpos is from pg_current_wal_lsn() above
time pg_recvlogical -S consume --endpos 0/2413A720 --start -o proto_version=3 -o publication_names=pub_1 -f /dev/null -d postgres

-- clean up
SELECT pg_drop_replication_slot('consume');

Then repeat this with the different publications and compare the time taken
for the pg_recvlogical. That way the WAL is exactly the same, there is no
overhead of actually doing anything with the data on the other side, etc.

Greetings,

Andres Freund



Re: row filtering for logical replication

From
Peter Smith
Date:
On Fri, Feb 4, 2022 at 2:26 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Sat, Jan 29, 2022 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > Are there any recent performance evaluations of the overhead of row filters? I
> > think it'd be good to get some numbers comparing:
> >
> > 1) $workload with master
> > 2) $workload with patch, but no row filters
> > 3) $workload with patch, row filter matching everything
> > 4) $workload with patch, row filter matching few rows
> >
> > For workload I think it'd be worth testing:
> > a) bulk COPY/INSERT into one table
> > b) Many transactions doing small modifications to one table
> > c) Many transactions targeting many different tables
> > d) Interspersed DDL + small changes to a table
> >
>
> Here's the performance data results for scenario d:
>
> RUN  HEAD       v74 no filter  v74 0%    v74 25%    v74 50%    v74 75%    v74 100%
> 1    65.397639  64.414034      5.919732  20.012096  36.35911   49.412548  64.508842
> 2    65.641783  65.255775      5.715082  20.157575  36.957403  51.355821  65.708444
> 3    65.096526  64.795163      6.146072  21.130709  37.679346  49.568513  66.602145
> 4    65.173569  64.644448      5.787197  20.784607  34.465133  55.397313  63.545337
> 5    65.791092  66.000412      5.642696  20.258802  36.493626  52.873252  63.511428
>
> The performance is similar to the other scenarios.
> The script used is below:
>
> CREATE TABLE test (key int, value text, value1 text, data jsonb,
> PRIMARY KEY(key, value));
>
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed
>
> DO
> $do$
> BEGIN
> FOR i IN 1..1000001 BY 4000 LOOP
> Alter table test alter column value1 TYPE varchar(30);
> INSERT INTO test VALUES(i,'BAH', row_to_json(row(i)));
> Alter table test ALTER COLUMN value1 TYPE text;
> UPDATE test SET value = 'FOO' WHERE key = i;
> COMMIT;
> END LOOP;
> END
> $do$;
>

Just for completeness, I have shown Ajin's workload "d" test results
as a bar chart same as for the previous perf test posts:

HEAD 65.40
v74 no filters 64.90
v74 allow 100% 64.59
v74 allow 75% 51.27
v74 allow 50% 35.97
v74 allow 25% 20.40
v74 allow 0% 5.78

PSA.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, February 3, 2022 11:11 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> On Tuesday, February 1, 2022 7:22 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > On Tue, Feb 1, 2022 at 9:15 AM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com>
> > wrote:
> > >
> > > On Monday, January 31, 2022 9:02 PM Amit Kapila
> > > <amit.kapila16@gmail.com>
> > > >
> >
> > Review Comments:
> > ===============
> > 1.
> > + else if (IsA(node, OpExpr))
> > + {
> > + /* OK, except user-defined operators are not allowed. */ if
> > + (((OpExpr
> > + *) node)->opno >= FirstNormalObjectId) errdetail_msg =
> > + _("User-defined operators are not allowed."); }
> >
> > Is it sufficient to check only the allowed operators for OpExpr? Don't
> > we need to check opfuncid to ensure that the corresponding function is
> > immutable? Also, what about opresulttype, opcollid, and inputcollid? I
> > think we don't want to allow user-defined types or collations but as
> > we are restricting the opexp to use a built-in operator, those should
> > not be present in such an expression. If that is the case, then I think we can
> add a comment for the same.
> >
> > 2. Can we handle RelabelType node in
> > check_simple_rowfilter_expr_walker()? I think you need to check
> > resulttype and collation id to ensure that they are not user-defined.
> > There doesn't seem to be a need to check resulttypmod as that refers
> > to pg_attribute.atttypmod and that can't have anything unsafe. This
> > helps us to handle cases like the following which currently gives an
> > error:
> > create table t1(c1 int, c2 varchar(100)); create publication pub1 for
> > table t1 where
> > (c2 < 'john');
> >
> > 3. Similar to above, don't we need to consider disallowing
> > non-built-in collation of Var type nodes? Now, as we are only
> > supporting built-in types this might not be required. So, probably a
> comment would suffice.
> 
> I adjusted the code in check_simple_rowfilter_expr_walker to handle the
> collation/type/function.
> 
> > 4.
> > A minor nitpick in tab-complete:
> > postgres=# Alter PUBLICATION pub1 ADD TABLE t2 WHERE ( c2 > 10)
> > ,        WHERE (
> >
> > After the Where clause, it should not allow adding WHERE. This doesn't
> > happen for CREATE PUBLICATION case.
> 
> I will look into this and change it soon.

Since the v76-0000-clean-up-pgoutput-cache-invalidation.patch has been
committed, I am attaching a new version patch set to make the cfbot happy. It
also addresses the above comments related to tab-complete in the 0002 patch.

Best regards,
Hou zj





Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
> Are there any recent performance evaluations of the overhead of row filters? I
> think it'd be good to get some numbers comparing:
>
> 1) $workload with master
> 2) $workload with patch, but no row filters
> 3) $workload with patch, row filter matching everything
> 4) $workload with patch, row filter matching few rows
>
> For workload I think it'd be worth testing:
> a) bulk COPY/INSERT into one table
> b) Many transactions doing small modifications to one table
> c) Many transactions targeting many different tables
> d) Interspersed DDL + small changes to a table
>

We have collected the performance data results for the workloads "a",
"b", "c" (will do case "d" later).

This time the tests were re-run now using pg_recvlogical and steps as
Andres suggested [1].

Note - "Allow 100%" is included as a test case, but in practice, a
user is unlikely to deliberately use a filter that allows everything
to pass through it.

PSA the bar charts of the results. All other details are below.

~~~~~

RESULTS - workload "a"
======================
HEAD        18.40
No Filters  18.86
Allow 100%  17.96
Allow 75%   16.39
Allow 50%   14.60
Allow 25%   11.23
Allow 0%    9.41


RESULTS - workload "b"
======================
HEAD        2.30
No Filters  1.96
Allow 100%  1.99
Allow 75%   1.65
Allow 50%   1.35
Allow 25%   1.17
Allow 0%    0.84


RESULTS - workload "c"
======================
HEAD        20.40
No Filters  19.85
Allow 100%  20.94
Allow 75%   17.26
Allow 50%   16.13
Allow 25%   13.32
Allow 0%    10.33


RESULTS - workload "d"
======================
(later)

~~~~~~

Details - workload "a"
=======================

CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test;
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

INSERT INTO test SELECT i, i::text, row_to_json(row(i)) FROM
generate_series(1,1000001)i;


Details - workload "b"
======================

CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test;
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

DO
$do$
BEGIN
FOR i IN 0..1000001 BY 10 LOOP
INSERT INTO test VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test SET value = 'FOO' WHERE key = i;
IF I % 1000 = 0 THEN
COMMIT;
END IF;
END LOOP;
END
$do$;


Details - workload "c"
======================

CREATE TABLE test1 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test2 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test3 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test4 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test5 (key int, value text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test1, test2, test3, test4, test5;
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 0), test2 WHERE
(key > 0), test3 WHERE (key > 0), test4 WHERE (key > 0), test5 WHERE
(key > 0);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 250000), test2
WHERE (key > 250000), test3 WHERE (key > 250000), test4 WHERE (key >
250000), test5 WHERE (key > 250000);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 500000), test2
WHERE (key > 500000), test3 WHERE (key > 500000), test4 WHERE (key >
500000), test5 WHERE (key > 500000);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 750000), test2
WHERE (key > 750000), test3 WHERE (key > 750000), test4 WHERE (key >
750000), test5 WHERE (key > 750000);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 1000000), test2
WHERE (key > 1000000), test3 WHERE (key > 1000000), test4 WHERE (key >
1000000), test5 WHERE (key > 1000000);

DO
$do$
BEGIN
FOR i IN 0..1000001 BY 10 LOOP
-- test1
INSERT INTO test1 VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test1 SET value = 'FOO' WHERE key = i;
-- test2
INSERT INTO test2 VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test2 SET value = 'FOO' WHERE key = i;
-- test3
INSERT INTO test3 VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test3 SET value = 'FOO' WHERE key = i;
-- test4
INSERT INTO test4 VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test4 SET value = 'FOO' WHERE key = i;
-- test5
INSERT INTO test5 VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test5 SET value = 'FOO' WHERE key = i;

IF I % 1000 = 0 THEN
-- raise notice 'commit: %', i;
COMMIT;
END IF;
END LOOP;
END
$do$;

Details - workload "d"
======================
(later)

------
[1] https://www.postgresql.org/message-id/20220203182922.344fhhqzjp2ah6yp%40alap3.anarazel.de

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Fri, Feb 4, 2022 at 2:58 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Thursday, February 3, 2022 11:11 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
>
> Since the v76-0000-clean-up-pgoutput-cache-invalidation.patch has been
> committed, attach a new version patch set to make the cfbot happy. Also
> addressed the above comments related to tab-complete in 0002 patch.
>

I don't like some of the error message changes in this new version. For example:

v75:
+CREATE FUNCTION testpub_rf_func1(integer, integer) RETURNS boolean AS
$$ SELECT hashint4($1) > $2 $$ LANGUAGE SQL;
+CREATE OPERATOR =#> (PROCEDURE = testpub_rf_func1, LEFTARG = integer,
RIGHTARG = integer);
+CREATE PUBLICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27);
+ERROR:  invalid publication WHERE expression for relation "testpub_rf_tbl3"
+DETAIL:  User-defined operators are not allowed.

v77
+CREATE FUNCTION testpub_rf_func1(integer, integer) RETURNS boolean AS
$$ SELECT hashint4($1) > $2 $$ LANGUAGE SQL;
+CREATE OPERATOR =#> (PROCEDURE = testpub_rf_func1, LEFTARG = integer,
RIGHTARG = integer);
+CREATE PUBLICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27);
+ERROR:  invalid publication WHERE expression
+LINE 1: ...ICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27);
+                                                             ^
+DETAIL:  User-defined or mutable functions are not allowed

I think the detailed message from v75, "DETAIL:  User-defined operators
are not allowed.", will be easier for users to understand. I have made
some code changes and refactoring to restore the previous behavior
without removing the additional checks you added in v77. I have
made a few changes to comments and error messages. Attached is a
top-up patch on your v77 patch series. I suggest we can combine the
0001 and 0002 patches as well.

-- 
With Regards,
Amit Kapila.

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Sat, Feb 5, 2022 7:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Fri, Feb 4, 2022 at 2:58 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> wrote:
> >
> > On Thursday, February 3, 2022 11:11 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com>
> >
> > Since the v76-0000-clean-up-pgoutput-cache-invalidation.patch has been
> > committed, attach a new version patch set to make the cfbot happy.
> > Also addressed the above comments related to tab-complete in 0002 patch.
> >
> 
> I don't like some of the error message changes in this new version. For example:
> 
> v75:
> +CREATE FUNCTION testpub_rf_func1(integer, integer) RETURNS boolean AS
> $$ SELECT hashint4($1) > $2 $$ LANGUAGE SQL;
> +CREATE OPERATOR =#> (PROCEDURE = testpub_rf_func1, LEFTARG = integer,
> RIGHTARG = integer);
> +CREATE PUBLICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27);
> +ERROR:  invalid publication WHERE expression for relation "testpub_rf_tbl3"
> +DETAIL:  User-defined operators are not allowed.
> 
> v77
> +CREATE FUNCTION testpub_rf_func1(integer, integer) RETURNS boolean AS
> $$ SELECT hashint4($1) > $2 $$ LANGUAGE SQL;
> +CREATE OPERATOR =#> (PROCEDURE = testpub_rf_func1, LEFTARG = integer,
> RIGHTARG = integer);
> +CREATE PUBLICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27);
> +ERROR:  invalid publication WHERE expression LINE 1: ...ICATION
> +testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27);
> +                                                             ^
> +DETAIL:  User-defined or mutable functions are not allowed
> 
> I think the detailed message from v75, "DETAIL:  User-defined operators are not
> allowed.", will be easier for users to understand. I have made some code changes
> and refactoring to restore the previous behavior without removing the
> additional checks you added in v77. I have made a few changes to
> comments and error messages. Attached is a top-up patch on your v77 patch
> series. I suggest we can combine the
> 0001 and 0002 patches as well.

Thanks for the comments.
Your changes look good to me.

Attached is the V78 patch, which addresses the above changes and merges 0001 and
0002.

Best regards,
Hou zj



Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
Hi - I did a review of the v77 patches merged with Amit's v77 diff patch [1].

(Maybe this is equivalent to reviewing v78)

Below are my review comments:

======

1. doc/src/sgml/ref/create_publication.sgml - CREATE PUBLICATION

+   The <literal>WHERE</literal> clause allows simple expressions that
don't have
+   user-defined functions, operators, collations, non-immutable built-in
+   functions, or references to system columns.
+  </para>

That seems slightly ambiguous for operators and collations. It's only
the USER-DEFINED ones we don't support.

Perhaps it should be worded like:

"allows simple expressions that don't have user-defined
functions/operators/collations, non-immutable built-in functions..."

or like

"allows simple expressions that don't have user-defined functions,
user-defined operators, user-defined collations, non-immutable
built-in functions..."

~~~

2. src/backend/catalog/pg_publication.c - GetTopMostAncestorInPublication

+Oid
+GetTopMostAncestorInPublication(Oid puboid, List *ancestors)
+{
+ ListCell   *lc;
+ Oid topmost_relid = InvalidOid;
+
+ /*
+ * Find the "topmost" ancestor that is in this publication.
+ */
+ foreach(lc, ancestors)
+ {
+ Oid ancestor = lfirst_oid(lc);
+ List    *apubids = GetRelationPublications(ancestor);
+ List    *aschemaPubids = NIL;
+
+ if (list_member_oid(apubids, puboid))
+ topmost_relid = ancestor;
+ else
+ {
+ aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
+ if (list_member_oid(aschemaPubids, puboid))
+ topmost_relid = ancestor;
+ }
+
+ list_free(apubids);
+ list_free(aschemaPubids);
+ }
+
+ return topmost_relid;
+}

Wouldn't it be better for the aschemaPubids to be declared and freed
inside the else block?

e.g.

else
{
List *aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));

if (list_member_oid(aschemaPubids, puboid))
topmost_relid = ancestor;

list_free(aschemaPubids);
}

~~~

3. src/backend/commands/publicationcmds.c - contain_invalid_rfcolumn

+ if (pubviaroot && relation->rd_rel->relispartition)
+ {
+ publish_as_relid = GetTopMostAncestorInPublication(pubid, ancestors);
+
+ if (publish_as_relid == InvalidOid)
+ publish_as_relid = relid;
+ }

Consider using the macro code for the InvalidOid check. e.g.

if (!OidIsValid(publish_as_relid)
publish_as_relid = relid;

~~~

4. src/backend/commands/publicationcmds.c - IsRowFilterSimpleExpr (Tests)

+ switch (nodeTag(node))
+ {
+ case T_ArrayExpr:
+ case T_BooleanTest:
+ case T_BoolExpr:
+ case T_CaseExpr:
+ case T_CaseTestExpr:
+ case T_CoalesceExpr:
+ case T_CollateExpr:
+ case T_Const:
+ case T_FuncExpr:
+ case T_MinMaxExpr:
+ case T_NullTest:
+ case T_RelabelType:
+ case T_XmlExpr:
+ return true;
+ default:
+ return false;
+ }

I think there are several missing regression tests.

4a. There is a new message that says "User-defined collations are not
allowed." but I never saw any test case for it.

4b. There is also the RelabelType which seems to have no test case.
Amit previously provided [2] some SQL which would give an unexpected
error, so I guess that should be a new regression test case. e.g.
create table t1(c1 int, c2 varchar(100));
create publication pub1 for table t1 where (c2 < 'john');

~~~

5. src/backend/commands/publicationcmds.c - IsRowFilterSimpleExpr (Simple?)

+/*
+ * Is this a simple Node permitted within a row filter expression?
+ */
+static bool
+IsRowFilterSimpleExpr(Node *node)
+{

A lot has changed in this area recently and I feel that there is
something not quite 100% right with the naming and/or logic in this
expression validation. IMO there are several functions that seem to
depend too much on each other in special ways...

IIUC the "walker" logic now seems to be something like this:
a) Check for special cases of the supported nodes
b) Then check for supported (simple?) nodes (i.e.
IsRowFilterSimpleExpr is now acting as a "catch-all" after the special
case checks)
c) Then check for some unsupported node embedded within a supported
node (i.e. call expr_allowed_in_node)
d) If any of a,b,c was bad then give an error.

To achieve that logic the T_FuncExpr was added to the
"IsRowFilterSimpleExpr". Meanwhile, other nodes like
T_ScalarArrayOpExpr and T_NullIfExpr now are removed from
IsRowFilterSimpleExpr - I don't quite know why these got removed but
perhaps there is implicit knowledge that those node kinds were already
checked by the "walker" before the IsRowFilterSimpleExpr function ever
gets called.

So, although I trust that everything is working OK,  I don't think
IsRowFilterSimpleExpr is really just about simple nodes anymore. It is
harder to see why some supported nodes are in there, and some
supported nodes are not. It seems tightly entwined with the logic of
check_simple_rowfilter_expr_walker; i.e. there seem to be assumptions
about exactly when it will be called and what was checked before and
what will be checked after calling it.

IMO probably all the nodes we are supporting should be in the
IsRowFilterSimpleExpr just for completeness (e.g. put T_NullIfExpr and
T_ScalarArrayOpExpr back in there...), and maybe the function should
be renamed (IsRowFilterAllowedNode?), and probably there need to be
more comments describing the validation logic (e.g. the a/b/c/d logic
I mentioned above).

~~~

6. src/backend/commands/publicationcmds.c - IsRowFilterSimpleExpr (T_List)

(From Amit's patch)

@@ -395,6 +397,7 @@ IsRowFilterSimpleExpr(Node *node)
  case T_NullTest:
  case T_RelabelType:
  case T_XmlExpr:
+ case T_List:
  return true;
  default:
  return false;


The case T_List should be moved to keep the cases alphabetical, the same as all
the other cases.

~~~

7. src/backend/commands/publicationcmds.c -
contain_mutable_or_ud_functions_checker

+/* check_functions_in_node callback */
+static bool
+contain_mutable_or_ud_functions_checker(Oid func_id, void *context)

"ud" seems a strange name. Maybe better to name this function
"contain_mutable_or_user_functions_checker" ?

~~~

8. src/backend/commands/publicationcmds.c - expr_allowed_in_node (comment)

(From Amit's patch)

@@ -410,6 +413,37 @@ contain_mutable_or_ud_functions_checker(Oid
func_id, void *context)
 }

 /*
+ * Check, if the node contains any unallowed object in node. See
+ * check_simple_rowfilter_expr_walker.
+ *
+ * Returns the error detail meesage in errdetail_msg for unallowed expressions.
+ */
+static bool
+expr_allowed_in_node(Node *node, ParseState *pstate, char **errdetail_msg)

Remove the comma: "Check, if ..." --> "Check if ..."
Typo: "meesage" --> "message"

~~~

9. src/backend/commands/publicationcmds.c - expr_allowed_in_node (else)

(From Amit's patch)

+ if (exprType(node) >= FirstNormalObjectId)
+ *errdetail_msg = _("User-defined types are not allowed.");
+ if (check_functions_in_node(node, contain_mutable_or_ud_functions_checker,
+ (void*) pstate))
+ *errdetail_msg = _("User-defined or built-in mutable functions are
not allowed.");
+ else if (exprCollation(node) >= FirstNormalObjectId)
+ *errdetail_msg = _("User-defined collations are not allowed.");
+ else if (exprInputCollation(node) >= FirstNormalObjectId)
+ *errdetail_msg = _("User-defined collations are not allowed.");

Is that correct - isn't there a missing "else" on the 2nd "if"?
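
Presumably the intended chain is something like (sketch):

if (exprType(node) >= FirstNormalObjectId)
    *errdetail_msg = _("User-defined types are not allowed.");
else if (check_functions_in_node(node, contain_mutable_or_ud_functions_checker,
                                 (void *) pstate))
    *errdetail_msg = _("User-defined or built-in mutable functions are not allowed.");
else if (exprCollation(node) >= FirstNormalObjectId ||
         exprInputCollation(node) >= FirstNormalObjectId)
    *errdetail_msg = _("User-defined collations are not allowed.");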

~~~

10. src/backend/commands/publicationcmds.c - expr_allowed_in_node (bool)

(From Amit's patch)

+static bool
+expr_allowed_in_node(Node *node, ParseState *pstate, char **errdetail_msg)

Why is this a boolean function? It can never return false (??)

~~~

11. src/backend/commands/publicationcmds.c -
check_simple_rowfilter_expr_walker (else)

(From Amit's patch)

@@ -500,12 +519,18 @@ check_simple_rowfilter_expr_walker(Node *node,
ParseState *pstate)
  }
  }
  }
- else if (!IsRowFilterSimpleExpr(node))
+ else if (IsRowFilterSimpleExpr(node))
+ {
+ }
+ else
  {
  elog(DEBUG3, "row filter contains an unexpected expression
component: %s", nodeToString(node));
  errdetail_msg = _("Expressions only allow columns, constants,
built-in operators, built-in data types, built-in collations and
immutable built-in functions.");
  }

Why introduce a new code block that does nothing?

~~~

12. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry

+ /*
+ * Initialize the row filter after getting the final publish_as_relid
+ * as we only evaluate the row filter of the relation which we publish
+ * change as.
+ */
+ pgoutput_row_filter_init(data, active_publications, entry);

The comment "which we publish change as" seems strangely worded.

Perhaps it should be:
"... only evaluate the row filter of the relation which being published."

~~~

13. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc (release)

+ /*
+ * Check if all columns referenced in the filter expression are part of
+ * the REPLICA IDENTITY index or not.
+ *
+ * If the publication is FOR ALL TABLES then it means the table has no
+ * row filters and we can skip the validation.
+ */
+ if (!pubform->puballtables &&
+ (pubform->pubupdate || pubform->pubdelete) &&
+ contain_invalid_rfcolumn(pubid, relation, ancestors,
+ pubform->pubviaroot))
+ {
+ if (pubform->pubupdate)
+ pubdesc->rf_valid_for_update = false;
+ if (pubform->pubdelete)
+ pubdesc->rf_valid_for_delete = false;
+ }

  ReleaseSysCache(tup);

This change has the effect of moving the
"ReleaseSysCache(tup);" much lower in the code, but I think there is
no point moving it for the Row Filter patch, so it should be left
where it was before.

~~~

14. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc
(if refactor)

- if (pubactions->pubinsert && pubactions->pubupdate &&
- pubactions->pubdelete && pubactions->pubtruncate)
+ if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
+ pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
+ !pubdesc->rf_valid_for_update && !pubdesc->rf_valid_for_delete)
  break;

I felt that the "rf_valid_for_update" and "rf_valid_for_delete" should
be checked first in that if condition. It is probably more optimal to
move them because then it can bail out early. All those other
pubaction flags are more likely to be true most of the time (because
that is the default case).
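
That is, something like (sketch):

/*
 * rf_valid_for_update/delete are usually true, so their negations are
 * usually false and the && chain bails out on the first test instead of
 * evaluating the four (mostly-true) pubaction flags first.
 */
if (!pubdesc->rf_valid_for_update && !pubdesc->rf_valid_for_delete &&
    pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
    pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate)
    break;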

~~~

15. src/bin/psql/describe.c - SQL format

@@ -2898,12 +2902,12 @@ describeOneTableDetails(const char *schemaname,
  else
  {
  printfPQExpBuffer(&buf,
-   "SELECT pubname\n"
+   "SELECT pubname, NULL\n"
    "FROM pg_catalog.pg_publication p\n"
    "JOIN pg_catalog.pg_publication_rel pr ON p.oid = pr.prpubid\n"
    "WHERE pr.prrelid = '%s'\n"
    "UNION ALL\n"
-   "SELECT pubname\n"
+   "SELECT pubname, NULL\n"
    "FROM pg_catalog.pg_publication p\n"

I thought it may be better to reformat to put the NULL columns on a
different line for consistent format with the other SQL just above
this one. e.g.

  printfPQExpBuffer(&buf,
    "SELECT pubname\n"
+   " , NULL\n"
...

------
[1] https://www.postgresql.org/message-id/CAA4eK1LApUf%3DagS86KMstoosEBD74GD6%2BPPYGF419kwLw6fvrw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1KDtwUcuFHOJ4zCCTEY4%2B_-X3fKTjn%3DkyaZwBeeqRF-oA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Feb 7, 2022 at 1:21 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> 5. src/backend/commands/publicationcmds.c - IsRowFilterSimpleExpr (Simple?)
>
> +/*
> + * Is this a simple Node permitted within a row filter expression?
> + */
> +static bool
> +IsRowFilterSimpleExpr(Node *node)
> +{
>
> A lot has changed in this area recently and I feel that there is
> something not quite 100% right with the naming and/or logic in this
> expression validation. IMO there are several functions that seem to
> depend too much on each other in special ways...
>
> IIUC the "walker" logic now seems to be something like this:
> a) Check for special cases of the supported nodes
> b) Then check for supported (simple?) nodes (i.e.
> IsRowFilterSimpleExpr is now acting as a "catch-all" after the special
> case checks)
> c) Then check for some unsupported node embedded within a supported
> node (i.e. call expr_allowed_in_node)
> d) If any of a,b,c was bad then give an error.
>
> To achieve that logic the T_FuncExpr was added to the
> "IsRowFilterSimpleExpr". Meanwhile, other nodes like
> T_ScalarArrayOpExpr and T_NullIfExpr now are removed from
> IsRowFilterSimpleExpr - I don't quite know why these got removed
>

They are removed because those nodes need some special checks based on
which errors could be raised whereas other nodes don't need such
checks.

> but
> perhaps there is implicit knowledge that those node kinds were already
> checked by the "walker" before the IsRowFilterSimpleExpr function ever
> gets called.
>
> So, although I trust that everything is working OK,  I don't think
> IsRowFilterSimpleExpr is really just about simple nodes anymore. It is
> harder to see why some supported nodes are in there, and some
> supported nodes are not. It seems tightly entwined with the logic of
> check_simple_rowfilter_expr_walker; i.e. there seem to be assumptions
> about exactly when it will be called and what was checked before and
> what will be checked after calling it.
>
> IMO probably all the nodes we are supporting should be in the
> IsRowFilterSimpleExpr just for completeness (e.g. put T_NullIfExpr and
> T_ScalarArrayOpExpr back in there...), and maybe the function should
> be renamed (IsRowFilterAllowedNode?),
>

I am not sure if that is a good idea because then, instead of
true/false, we would need to get an error message as well. But I think
we can move all the nodes handled in IsRowFilterSimpleExpr back to
check_simple_rowfilter_expr_walker() and change the handling to
switch..case.

One more thing in this context: in the ScalarArrayOpExpr handling, we
are not checking a few parameters like hashfuncid. Can we please add a
comment explaining why some parameters are checked and others are not?
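Something along these lines, perhaps (a sketch; the rationale stated in the
comment is an assumption on my part, not taken from the patch):

case T_ScalarArrayOpExpr:
    /* OK, except user-defined operators are not allowed. */
    if (((ScalarArrayOpExpr *) node)->opno >= FirstNormalObjectId)
        errdetail_msg = _("User-defined operators are not allowed.");
    /*
     * Assumed rationale: opfuncid, hashfuncid and negfuncid are all
     * derived from opno and filled in after parse analysis, so
     * validating opno here is sufficient.
     */
    break;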

>
> ~~~
>
> 6. src/backend/commands/publicationcmds.c - IsRowFilterSimpleExpr (T_List)
>
> (From Amit's patch)
>
> @@ -395,6 +397,7 @@ IsRowFilterSimpleExpr(Node *node)
>   case T_NullTest:
>   case T_RelabelType:
>   case T_XmlExpr:
> + case T_List:
>   return true;
>   default:
>   return false;
>
>
> The case T_List should be moved to be alphabetical the same as all the
> other cases.
>

Hmm, I added it based on the order in which it is defined in nodes.h;
T_List is defined after T_XmlExpr there. I don't see these handled in
alphabetical order in other places, such as check_functions_in_node().
I think the nodes that need the same handling should be kept together,
in the order they are defined in nodes.h, and all other nodes should
likewise follow the nodes.h order. That way we will be consistent.

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Monday, February 7, 2022 3:51 PM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> Hi - I did a review of the v77 patches merged with Amit's v77 diff patch [1].
> 
> (Maybe this is equivalent to reviewing v78)
> 
> Below are my review comments:

Thanks for the comments!

> ======
> 
> 1. doc/src/sgml/ref/create_publication.sgml - CREATE PUBLICATION
> 
> +   The <literal>WHERE</literal> clause allows simple expressions that
> don't have
> +   user-defined functions, operators, collations, non-immutable built-in
> +   functions, or references to system columns.
> +  </para>
> 
> That seems slightly ambiguous for operators and collations. It's only
> the USER-DEFINED ones we don't support.
> 
> Perhaps it should be worded like:
> 
> "allows simple expressions that don't have user-defined functions,
> user-defined operators, user-defined collations, non-immutable
> built-in functions..."

Changed.
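
For illustration, roughly what that rule accepts and rejects (hypothetical
table and function names; error text approximate):

CREATE TABLE t (a int PRIMARY KEY, b text);

-- ok: columns, constants and built-in operators only
CREATE PUBLICATION pub_ok FOR TABLE t WHERE (a > 5 AND b IS NOT NULL);

-- fail: user-defined function, even an immutable one
CREATE FUNCTION f(int) RETURNS bool AS 'SELECT $1 > 0' LANGUAGE sql IMMUTABLE;
CREATE PUBLICATION pub_bad FOR TABLE t WHERE (f(a));
-- ERROR:  invalid publication WHERE expression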

> 
> 2. src/backend/catalog/pg_publication.c - GetTopMostAncestorInPublication
> 
> +Oid
> +GetTopMostAncestorInPublication(Oid puboid, List *ancestors)
> +{
> + ListCell   *lc;
> + Oid topmost_relid = InvalidOid;
> +
> + /*
> + * Find the "topmost" ancestor that is in this publication.
> + */
> + foreach(lc, ancestors)
> + {
> + Oid ancestor = lfirst_oid(lc);
> + List    *apubids = GetRelationPublications(ancestor);
> + List    *aschemaPubids = NIL;
> +
> + if (list_member_oid(apubids, puboid))
> + topmost_relid = ancestor;
> + else
> + {
> + aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
> + if (list_member_oid(aschemaPubids, puboid))
> + topmost_relid = ancestor;
> + }
> +
> + list_free(apubids);
> + list_free(aschemaPubids);
> + }
> +
> + return topmost_relid;
> +}
> 
> Wouldn't it be better for the aschemaPubids to be declared and freed
> inside the else block?

I personally think the current code is clean and the code was borrowed from
Greg's comment[1]. So, I didn't change this.

[1] https://www.postgresql.org/message-id/CAJcOf-c2%2BWbjeP7NhwgcAEtsn9KdDnhrsowheafbZ9%2BQU9C8SQ%40mail.gmail.com

> 
> 3. src/backend/commands/publicationcmds.c - contain_invalid_rfcolumn
> 
> + if (pubviaroot && relation->rd_rel->relispartition)
> + {
> + publish_as_relid = GetTopMostAncestorInPublication(pubid, ancestors);
> +
> + if (publish_as_relid == InvalidOid)
> + publish_as_relid = relid;
> + }
> 
> Consider using the macro code for the InvalidOid check. e.g.
> 
> if (!OidIsValid(publish_as_relid)
> publish_as_relid = relid;
> 

Changed.

> 
> 4. src/backend/commands/publicationcmds.c - IsRowFilterSimpleExpr (Tests)
> 
> + switch (nodeTag(node))
> + {
> + case T_ArrayExpr:
> + case T_BooleanTest:
> + case T_BoolExpr:
> + case T_CaseExpr:
> + case T_CaseTestExpr:
> + case T_CoalesceExpr:
> + case T_CollateExpr:
> + case T_Const:
> + case T_FuncExpr:
> + case T_MinMaxExpr:
> + case T_NullTest:
> + case T_RelabelType:
> + case T_XmlExpr:
> + return true;
> + default:
> + return false;
> + }
> 
> I think there are several missing regression tests.
> 
> 4a. There is a new message that says "User-defined collations are not
> allowed." but I never saw any test case for it.
> 
> 4b. There is also the RelabelType which seems to have no test case.
> Amit previously provided [2] some SQL which would give an unexpected
> error, so I guess that should be a new regression test case. e.g.
> create table t1(c1 int, c2 varchar(100));
> create publication pub1 for table t1 where (c2 < 'john');

I added some tests to cover these nodes.

> 
> 5. src/backend/commands/publicationcmds.c - IsRowFilterSimpleExpr
> (Simple?)
> 
> +/*
> + * Is this a simple Node permitted within a row filter expression?
> + */
> +static bool
> +IsRowFilterSimpleExpr(Node *node)
> +{
> 
> A lot has changed in this area recently and I feel that there is
> something not quite 100% right with the naming and/or logic in this
> expression validation. IMO there are several functions that seem to
> depend too much on each other in special ways...
> 
> IIUC the "walker" logic now seems to be something like this:
> a) Check for special cases of the supported nodes
> b) Then check for supported (simple?) nodes (i.e.
> IsRowFilterSimpleExpr is now acting as a "catch-all" after the special
> case checks)
> c) Then check for some unsupported node embedded within a supported
> node (i.e. call expr_allowed_in_node)
> d) If any of a,b,c was bad then give an error.
> 
> To achieve that logic the T_FuncExpr was added to the
> "IsRowFilterSimpleExpr". Meanwhile, other nodes like
> T_ScalarArrayOpExpr and T_NullIfExpr now are removed from
> IsRowFilterSimpleExpr - I don't quite know why these got removed but
> perhaps there is implicit knowledge that those node kinds were already
> checked by the "walker" before the IsRowFilterSimpleExpr function ever
> gets called.
> 
> So, although I trust that everything is working OK,  I don't think
> IsRowFilterSimpleExpr is really just about simple nodes anymore. It is
> harder to see why some supported nodes are in there, and some
> supported nodes are not. It seems tightly entwined with the logic of
> check_simple_rowfilter_expr_walker; i.e. there seem to be assumptions
> about exactly when it will be called and what was checked before and
> what will be checked after calling it.
> 
> IMO probably all the nodes we are supporting should be in the
> IsRowFilterSimpleExpr just for completeness (e.g. put T_NullIfExpr and
> T_ScalarArrayOpExpr back in there...), and maybe the function should
> be renamed (IsRowFilterAllowedNode?), and probably there need to be
> more comments describing the validation logic (e.g. the a/b/c/d logic
> I mentioned above).

I adjusted this code by moving all the nodes handled in
IsRowFilterSimpleExpr back to check_simple_rowfilter_expr_walker() and
changing the handling to switch..case.

> 
> 6. src/backend/commands/publicationcmds.c - IsRowFilterSimpleExpr (T_List)
> 
> (From Amit's patch)
> 
> @@ -395,6 +397,7 @@ IsRowFilterSimpleExpr(Node *node)
>   case T_NullTest:
>   case T_RelabelType:
>   case T_XmlExpr:
> + case T_List:
>   return true;
>   default:
>   return false;
> 
> 
> The case T_List should be moved to be alphabetical the same as all the
> other cases.

I reordered these referring to the order as they are defined in nodes.h.

> 
> 7. src/backend/commands/publicationcmds.c -
> contain_mutable_or_ud_functions_checker
> 
> +/* check_functions_in_node callback */
> +static bool
> +contain_mutable_or_ud_functions_checker(Oid func_id, void *context)
> 
> "ud" seems a strange name. Maybe better to name this function
> "contain_mutable_or_user_functions_checker" ?
> 

Changed.

> 
> 8. src/backend/commands/publicationcmds.c - expr_allowed_in_node
> (comment)
> 
> (From Amit's patch)
> 
> @@ -410,6 +413,37 @@ contain_mutable_or_ud_functions_checker(Oid
> func_id, void *context)
>  }
> 
>  /*
> + * Check, if the node contains any unallowed object in node. See
> + * check_simple_rowfilter_expr_walker.
> + *
> + * Returns the error detail meesage in errdetail_msg for unallowed
> expressions.
> + */
> +static bool
> +expr_allowed_in_node(Node *node, ParseState *pstate, char
> **errdetail_msg)
> 
> Remove the comma: "Check, if ..." --> "Check if ..."
> Typo: "meesage" --> "message"
> 

Changed.

> 
> 9. src/backend/commands/publicationcmds.c - expr_allowed_in_node (else)
> 
> (From Amit's patch)
> 
> + if (exprType(node) >= FirstNormalObjectId)
> + *errdetail_msg = _("User-defined types are not allowed.");
> + if (check_functions_in_node(node,
> contain_mutable_or_ud_functions_checker,
> + (void*) pstate))
> + *errdetail_msg = _("User-defined or built-in mutable functions are
> not allowed.");
> + else if (exprCollation(node) >= FirstNormalObjectId)
> + *errdetail_msg = _("User-defined collations are not allowed.");
> + else if (exprInputCollation(node) >= FirstNormalObjectId)
> + *errdetail_msg = _("User-defined collations are not allowed.");
> 
> Is that correct - isn't there a missing "else" on the 2nd "if"?
> 

Changed.

> 
> 10. src/backend/commands/publicationcmds.c - expr_allowed_in_node (bool)
> 
> (From Amit's patch)
> 
> +static bool
> +expr_allowed_in_node(Node *node, ParseState *pstate, char
> **errdetail_msg)
> 
> Why is this a boolean function? It can never return false (??)
> 

Changed.

> 
> 11. src/backend/commands/publicationcmds.c -
> check_simple_rowfilter_expr_walker (else)
> 
> (From Amit's patch)
> 
> @@ -500,12 +519,18 @@ check_simple_rowfilter_expr_walker(Node *node,
> ParseState *pstate)
>   }
>   }
>   }
> - else if (!IsRowFilterSimpleExpr(node))
> + else if (IsRowFilterSimpleExpr(node))
> + {
> + }
> + else
>   {
>   elog(DEBUG3, "row filter contains an unexpected expression
> component: %s", nodeToString(node));
>   errdetail_msg = _("Expressions only allow columns, constants,
> built-in operators, built-in data types, built-in collations and
> immutable built-in functions.");
>   }
> 
> Why introduce a new code block that does nothing?
> 

Changed it to switch ... case, which doesn't have this problem.

> 
> 12. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry
> 
> + /*
> + * Initialize the row filter after getting the final publish_as_relid
> + * as we only evaluate the row filter of the relation which we publish
> + * change as.
> + */
> + pgoutput_row_filter_init(data, active_publications, entry);
> 
> The comment "which we publish change as" seems strangely worded.
> 
> Perhaps it should be:
> "... only evaluate the row filter of the relation which being published."

Changed.

> 
> 13. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc
> (release)
> 
> + /*
> + * Check if all columns referenced in the filter expression are part of
> + * the REPLICA IDENTITY index or not.
> + *
> + * If the publication is FOR ALL TABLES then it means the table has no
> + * row filters and we can skip the validation.
> + */
> + if (!pubform->puballtables &&
> + (pubform->pubupdate || pubform->pubdelete) &&
> + contain_invalid_rfcolumn(pubid, relation, ancestors,
> + pubform->pubviaroot))
> + {
> + if (pubform->pubupdate)
> + pubdesc->rf_valid_for_update = false;
> + if (pubform->pubdelete)
> + pubdesc->rf_valid_for_delete = false;
> + }
> 
>   ReleaseSysCache(tup);
> 
> This change has the effect of moving the location of the
> "ReleaseSysCache(tup);" to much lower in the code but I think there is
> no point to move it for the Row Filter patch, so it should be left
> where it was before.

The newly added code here refers to 'pubform', which comes from 'tup',
so I think we should release the tuple only after this code.

> 
> 14. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc
> (if refactor)
> 
> - if (pubactions->pubinsert && pubactions->pubupdate &&
> - pubactions->pubdelete && pubactions->pubtruncate)
> + if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate
> &&
> + pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate
> &&
> + !pubdesc->rf_valid_for_update && !pubdesc->rf_valid_for_delete)
>   break;
> 
> I felt that the "rf_valid_for_update" and "rf_valid_for_delete" should
> be checked first in that if condition. It is probably more optimal to
> move them because then it can bail out early. All those other
> pubaction flags are more likely to be true most of the time (because
> that is the default case).

I don't have a strong opinion on this; I feel it's fine to keep the newly
added check at the end, as it doesn't have a notable performance impact.

> 
> 15. src/bin/psql/describe.c - SQL format
> 
> @@ -2898,12 +2902,12 @@ describeOneTableDetails(const char
> *schemaname,
>   else
>   {
>   printfPQExpBuffer(&buf,
> -   "SELECT pubname\n"
> +   "SELECT pubname, NULL\n"
>     "FROM pg_catalog.pg_publication p\n"
>     "JOIN pg_catalog.pg_publication_rel pr ON p.oid = pr.prpubid\n"
>     "WHERE pr.prrelid = '%s'\n"
>     "UNION ALL\n"
> -   "SELECT pubname\n"
> +   "SELECT pubname, NULL\n"
>     "FROM pg_catalog.pg_publication p\n"
> 
> I thought it may be better to reformat to put the NULL columns on a
> different line for consistent format with the other SQL just above
> this one. e.g.
> 
>   printfPQExpBuffer(&buf,
>     "SELECT pubname\n"
> +   " , NULL\n"

Changed.

Attached is the V79 patch set, which addresses the above comments and
adjusts some comments related to the expression checks.

Best regards,
Hou zj

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Feb 8, 2022 at 8:01 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> >
> > 12. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry
> >
> > + /*
> > + * Initialize the row filter after getting the final publish_as_relid
> > + * as we only evaluate the row filter of the relation which we publish
> > + * change as.
> > + */
> > + pgoutput_row_filter_init(data, active_publications, entry);
> >
> > The comment "which we publish change as" seems strangely worded.
> >
> > Perhaps it should be:
> > "... only evaluate the row filter of the relation which being published."
>
> Changed.
>

I don't know if this change is an improvement. If you want to change
it, then I don't think 'which' makes sense in the following part of the
comment: "...relation which being published."

Few other comments:
====================
1. Can we save sending schema change messages if the row filter
doesn't match by moving maybe_send_schema after row filter checks?
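
(A sketch of that reordering; the exact signatures of pgoutput_row_filter
and maybe_send_schema in the patch are assumed here:)

/* Evaluate the row filter first ... */
if (!pgoutput_row_filter(relation, old_slot, &new_slot, relentry))
    return;

/* ... and send the schema only for rows that will actually be sent. */
maybe_send_schema(ctx, change, relation, relentry);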

2.
commit message/docs:
"The WHERE clause
only allows simple expressions that don't have user-defined functions,
user-defined operators, user-defined collations, non-immutable built-in
functions, or references to system columns."

"user-defined types" is missing in this sentence.

3.
+ /*
+ * For all the supported nodes, check the functions and collations used in
+ * the nodes.
+ */

Again 'types' is missing in this comment.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
I did a review of the v79 patch. Below are my review comments:

======

1. doc/src/sgml/ref/create_publication.sgml - CREATE PUBLICATION

The commit message for v79-0001 says:
<quote>
If your publication contains a partitioned table, the publication parameter
publish_via_partition_root determines if it uses the partition row filter (if
the parameter is false, the default) or the root partitioned table row filter.
</quote>

I think that the same information should also be mentioned in the PG
DOCS for CREATE PUBLICATION note about the WHERE clause.

~~~

2. src/backend/commands/publicationcmds.c -
contain_mutable_or_ud_functions_checker

+/* check_functions_in_node callback */
+static bool
+contain_mutable_or_user_functions_checker(Oid func_id, void *context)
+{
+ return (func_volatile(func_id) != PROVOLATILE_IMMUTABLE ||
+ func_id >= FirstNormalObjectId);
+}

I was wondering why the checking for user functions and mutable
functions is combined in one function like this. IMO it might be better
to have 2 "checker" callback functions instead of just one  - then the
error messages can be split too so that only the relevant one is
displayed to the user.

BEFORE
contain_mutable_or_user_functions_checker --> "User-defined or
built-in mutable functions are not allowed."

AFTER
contain_user_functions_checker --> "User-defined functions are not allowed."
contain_mutable_function_checker --> "Built-in mutable functions are
not allowed."

~~~

3. src/backend/commands/publicationcmds.c - check_simple_rowfilter_expr_walker

+ case T_Const:
+ case T_FuncExpr:
+ case T_BoolExpr:
+ case T_RelabelType:
+ case T_CollateExpr:
+ case T_CaseExpr:
+ case T_CaseTestExpr:
+ case T_ArrayExpr:
+ case T_CoalesceExpr:
+ case T_MinMaxExpr:
+ case T_XmlExpr:
+ case T_NullTest:
+ case T_BooleanTest:
+ case T_List:
+ break;

Perhaps a comment should be added here simply saying "OK, supported"
just to make it more obvious?

~~~

4. src/test/regress/sql/publication.sql - test comment

+-- fail - user-defined types disallowed

For consistency with the nearby comments it would be better to reword this one:
 "fail - user-defined types are not allowed"

~~~

5. src/test/regress/sql/publication.sql - test for \d

+-- test \d+ (now it displays filter information)
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION testpub_dplus_rf_yes FOR TABLE testpub_rf_tbl1
WHERE (a > 1) WITH (publish = 'insert');
+CREATE PUBLICATION testpub_dplus_rf_no FOR TABLE testpub_rf_tbl1;
+RESET client_min_messages;
+\d+ testpub_rf_tbl1

Actually, the \d (without "+") will also display filters, but I don't
think that has been tested anywhere. So I suggest updating the comment
and adding one more test.

AFTER
-- test \d+ <tablename> and \d <tablename> (now these display filter
information)
...
\d+ testpub_rf_tbl1
\d testpub_rf_tbl1

~~~

6. src/test/regress/sql/publication.sql - tests for partitioned table

+-- Tests for partitioned table
+ALTER PUBLICATION testpub6 SET TABLE rf_tbl_abcd_part_pk WHERE (a > 99);
+ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0);
+-- ok - partition does not have row filter
+UPDATE rf_tbl_abcd_part_pk SET a = 1;
+ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=1);
+-- ok - "a" is a OK col
+UPDATE rf_tbl_abcd_part_pk SET a = 1;
+ALTER PUBLICATION testpub6 SET TABLE rf_tbl_abcd_part_pk WHERE (b > 99);
+ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0);
+-- ok - partition does not have row filter
+UPDATE rf_tbl_abcd_part_pk SET a = 1;
+ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=1);
+-- fail - "b" is not in REPLICA IDENTITY INDEX
+UPDATE rf_tbl_abcd_part_pk SET a = 1;

Those comments and the way the code is arranged did not make it very
clear to me what exactly these tests are doing.

I think all the changes to the publish_via_partition_root belong BELOW
those test comments, don't they?
Also, the same comment "-- ok - partition does not have row filter"
appears twice, so that can be made clearer too.

e.g. IIUC it should be changed to something a bit like this (Note - I
did not change the SQL, I only moved it a bit and changed the
comments):

AFTER (??)
-- Tests for partitioned table
ALTER PUBLICATION testpub6 SET TABLE rf_tbl_abcd_part_pk WHERE (a > 99);

-- ok - PUBLISH_VIA_PARTITION_ROOT is false
-- Here the partition does not have a row filter
-- Col "a" is in replica identity.
ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0);
UPDATE rf_tbl_abcd_part_pk SET a = 1;

-- ok - PUBLISH_VIA_PARTITION_ROOT is true
-- Here the partition does not have a row filter, so the root filter
will be used.
-- Col "a" is in replica identity.
ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=1);
UPDATE rf_tbl_abcd_part_pk SET a = 1;

-- Now change the root filter to use a column "b" (which is not in the
replica identity)
ALTER PUBLICATION testpub6 SET TABLE rf_tbl_abcd_part_pk WHERE (b > 99);

-- ok - PUBLISH_VIA_PARTITION_ROOT is false
-- Here the partition does not have a row filter
-- Col "a" is in replica identity.
ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0);
UPDATE rf_tbl_abcd_part_pk SET a = 1;

-- fail - PUBLISH_VIA_PARTITION_ROOT is true
-- Here the root filter will be used, but the "b" referenced in the
root filter is not in the replica identity.
ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=1);
UPDATE rf_tbl_abcd_part_pk SET a = 1;

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Feb 9, 2022 at 7:07 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> 2. src/backend/commands/publicationcmds.c -
> contain_mutable_or_ud_functions_checker
>
> +/* check_functions_in_node callback */
> +static bool
> +contain_mutable_or_user_functions_checker(Oid func_id, void *context)
> +{
> + return (func_volatile(func_id) != PROVOLATILE_IMMUTABLE ||
> + func_id >= FirstNormalObjectId);
> +}
>
> I was wondering why is the checking for user function and mutable
> functions combined in one function like this.  IMO it might be better
> to have 2 "checker" callback functions instead of just one  - then the
> error messages can be split too so that only the relevant one is
> displayed to the user.
>

For that, we need to invoke the checker function multiple times for a
node and/or expression. So, I am not sure it is worth it.


-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
> Are there any recent performance evaluations of the overhead of row filters? I
> think it'd be good to get some numbers comparing:
>
> 1) $workload with master
> 2) $workload with patch, but no row filters
> 3) $workload with patch, row filter matching everything
> 4) $workload with patch, row filter matching few rows
>
> For workload I think it'd be worth testing:
> a) bulk COPY/INSERT into one table
> b) Many transactions doing small modifications to one table
> c) Many transactions targetting many different tables
> d) Interspersed DDL + small changes to a table
>

We have collected the performance results for all the different
workloads [*].

The test strategy is now using pg_recvlogical with steps as Andres
suggested [1].
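
(A sketch of that kind of measurement - slot and publication names are
assumed; the flags are standard pg_recvlogical options:)

# create the slot up front, then run the workload
pg_recvlogical -d postgres --slot=test_slot --create-slot -P pgoutput

# afterwards, time how long decoding the whole workload takes
time pg_recvlogical -d postgres --slot=test_slot --start \
    -o proto_version=1 -o publication_names=pub_1 \
    --endpos="$(psql -Atc 'SELECT pg_current_wal_lsn()')" -f /dev/null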

Note - "Allow 0%" and "Allow 100%" are included as tests cases, but in
practice, a user is unlikely to deliberately use a filter that allows
nothing to pass through it, or allows everything to pass through it.

PSA the bar charts of the results. All other details are below.

~~~~~

RESULTS - workload "a" (v76)
======================
HEAD        18.40
No Filters  18.86
Allow 100%  17.96
Allow 75%   16.39
Allow 50%   14.60
Allow 25%   11.23
Allow 0%    9.41

Observations for "a":
- Using row filters has minimal overhead in the worst case (compare
HEAD versus "Allow 100%")
- As a larger % of the data is filtered out (less is replicated), the times decrease

RESULTS - workload "b" (v76)
======================
HEAD        2.30
No Filters  1.96
Allow 100%  1.99
Allow 75%   1.65
Allow 50%   1.35
Allow 25%   1.17
Allow 0%    0.84

Observations for "b":
- Using row filters has minimal overhead in the worst case (compare
HEAD versus "Allow 100%")
- As a larger % of the data is filtered out (less is replicated), the times decrease


RESULTS - workload "c" (v76)
======================
HEAD        20.40
No Filters  19.85
Allow 100%  20.94
Allow 75%   17.26
Allow 50%   16.13
Allow 25%   13.32
Allow 0%    10.33

Observations for "c":
- Using row filters has minimal overhead in the worst case (compare
HEAD versus "Allow 100%")
- As a larger % of the data is filtered out (less is replicated), the times decrease

RESULTS - workload "d" (v80)
======================
HEAD        6.81
No Filters  6.85
Allow 100%  7.61
Allow 75%   7.80
Allow 50%   6.46
Allow 25%   6.35
Allow 0%    6.46

Observations for "d":
- As a larger % of the data is filtered out (less is replicated), the times
drop below HEAD, but not by much.
- Improvements due to row filtering are less noticeable (e.g. HEAD
versus "Allow 0%") for this workload; we attribute this to the fact
that for this script fewer rows are replicated in the first place, so
we are only comparing 1000 x INSERT/UPDATE against 0 x INSERT/UPDATE.

~~~~~~

Details - workload "a"
=======================

CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test;
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

INSERT INTO test SELECT i, i::text, row_to_json(row(i)) FROM
generate_series(1,1000001)i;


Details - workload "b"
======================

CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test;
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

DO
$do$
BEGIN
FOR i IN 0..1000001 BY 10 LOOP
INSERT INTO test VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test SET value = 'FOO' WHERE key = i;
IF I % 1000 = 0 THEN
COMMIT;
END IF;
END LOOP;
END
$do$;


Details - workload "c"
======================

CREATE TABLE test1 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test2 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test3 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test4 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test5 (key int, value text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test1, test2, test3, test4, test5;
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 0), test2 WHERE
(key > 0), test3 WHERE (key > 0), test4 WHERE (key > 0), test5 WHERE
(key > 0);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 250000), test2
WHERE (key > 250000), test3 WHERE (key > 250000), test4 WHERE (key >
250000), test5 WHERE (key > 250000);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 500000), test2
WHERE (key > 500000), test3 WHERE (key > 500000), test4 WHERE (key >
500000), test5 WHERE (key > 500000);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 750000), test2
WHERE (key > 750000), test3 WHERE (key > 750000), test4 WHERE (key >
750000), test5 WHERE (key > 750000);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 1000000), test2
WHERE (key > 1000000), test3 WHERE (key > 1000000), test4 WHERE (key >
1000000), test5 WHERE (key > 1000000);

DO
$do$
BEGIN
FOR i IN 0..1000001 BY 10 LOOP
-- test1
INSERT INTO test1 VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test1 SET value = 'FOO' WHERE key = i;
-- test2
INSERT INTO test2 VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test2 SET value = 'FOO' WHERE key = i;
-- test3
INSERT INTO test3 VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test3 SET value = 'FOO' WHERE key = i;
-- test4
INSERT INTO test4 VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test4 SET value = 'FOO' WHERE key = i;
-- test5
INSERT INTO test5 VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test5 SET value = 'FOO' WHERE key = i;

IF I % 1000 = 0 THEN
-- raise notice 'commit: %', i;
COMMIT;
END IF;
END LOOP;
END
$do$;

Details - workload "d"
======================

CREATE TABLE test (key int, value text, value1 text, data jsonb, PRIMARY KEY(key, value));

CREATE PUBLICATION pub_1 FOR TABLE test;
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

DO
$do$
BEGIN
FOR i IN 0..1000000 BY 1000 LOOP
ALTER TABLE test ALTER COLUMN value1 TYPE varchar(30);
INSERT INTO test VALUES(i,'BAH','BAH', row_to_json(row(i)));
ALTER TABLE test ALTER COLUMN value1 TYPE text;
UPDATE test SET value = 'FOO' WHERE key = i;
IF I % 10000 = 0 THEN
COMMIT;
END IF;
END LOOP;
END
$do$;

------
[*] This post repeats some results already sent for workloads
"a","b","c"; this is so the complete set is now all here in one place.
[1] https://www.postgresql.org/message-id/20220203182922.344fhhqzjp2ah6yp%40alap3.anarazel.de

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Jan 25, 2022 at 2:18 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
...

> > 4. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc
> >
> > - if (relation->rd_pubactions)
> > + if (relation->rd_pubdesc)
> >   {
> > - pfree(relation->rd_pubactions);
> > - relation->rd_pubactions = NULL;
> > + pfree(relation->rd_pubdesc);
> > + relation->rd_pubdesc = NULL;
> >   }
> >
> > What is the purpose of this code? Can't it all just be removed?
> > e.g. Can't you Assert that relation->rd_pubdesc is NULL at this point?
> >
> > (if it was not-null the function would have returned immediately from the top)
>
> I think it might be better to change this as a separate patch.

OK. I have made a separate thread [1] for discussing this one.

------
[1] https://www.postgresql.org/message-id/flat/1524753.1644453267%40sss.pgh.pa.us#1c40bbc4126daaf75b927a021526654a

Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Wednesday, February 9, 2022 9:37 AM Peter Smith <smithpb2250@gmail.com>
> 
> I did a review of the v79 patch. Below are my review comments:
> 

Thanks for the comments!

> 
> 1. doc/src/sgml/ref/create_publication.sgml - CREATE PUBLICATION
> 
> The commit message for v79-0001 says:
> <quote>
> If your publication contains a partitioned table, the publication parameter
> publish_via_partition_root determines if it uses the partition row filter (if
> the parameter is false, the default) or the root partitioned table row filter.
> </quote>
> 
> I think that the same information should also be mentioned in the PG
> DOCS for CREATE PUBLICATION note about the WHERE clause.
> 

Added this to the document.

> 
> 2. src/backend/commands/publicationcmds.c -
> contain_mutable_or_ud_functions_checker
> 
> +/* check_functions_in_node callback */
> +static bool
> +contain_mutable_or_user_functions_checker(Oid func_id, void *context)
> +{
> + return (func_volatile(func_id) != PROVOLATILE_IMMUTABLE ||
> + func_id >= FirstNormalObjectId);
> +}
> 
> I was wondering why is the checking for user function and mutable
> functions combined in one function like this.  IMO it might be better
> to have 2 "checker" callback functions instead of just one  - then the
> error messages can be split too so that only the relevant one is
> displayed to the user.
> 
> BEFORE
> contain_mutable_or_user_functions_checker --> "User-defined or
> built-in mutable functions are not allowed."
> 
> AFTER
> contain_user_functions_checker --> "User-defined functions are not allowed."
> contain_mutable_function_checker --> "Built-in mutable functions are
> not allowed."

As Amit mentioned, I didn’t change this.

> 
> 3. src/backend/commands/publicationcmds.c -
> check_simple_rowfilter_expr_walker
> 
> + case T_Const:
> + case T_FuncExpr:
> + case T_BoolExpr:
> + case T_RelabelType:
> + case T_CollateExpr:
> + case T_CaseExpr:
> + case T_CaseTestExpr:
> + case T_ArrayExpr:
> + case T_CoalesceExpr:
> + case T_MinMaxExpr:
> + case T_XmlExpr:
> + case T_NullTest:
> + case T_BooleanTest:
> + case T_List:
> + break;
> 
> Perhaps a comment should be added here simply saying "OK, supported"
> just to make it more obvious?

Added.

> 
> 4. src/test/regress/sql/publication.sql - test comment
> 
> +-- fail - user-defined types disallowed
> 
> For consistency with the nearby comments it would be better to reword this
> one:
>  "fail - user-defined types are not allowed"

Changed.

> 
> 5. src/test/regress/sql/publication.sql - test for \d
> 
> +-- test \d+ (now it displays filter information)
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION testpub_dplus_rf_yes FOR TABLE testpub_rf_tbl1
> WHERE (a > 1) WITH (publish = 'insert');
> +CREATE PUBLICATION testpub_dplus_rf_no FOR TABLE testpub_rf_tbl1;
> +RESET client_min_messages;
> +\d+ testpub_rf_tbl1
> 
> Actually, the \d (without "+") will also display filters but I don't
> think that has been tested anywhere. So suggest updating the comment
> and adding one more test
> 
> AFTER
> -- test \d+ <tablename> and \d <tablename> (now these display filter
> information)
> ...
> \d+ testpub_rf_tbl1
> \d testpub_rf_tbl1

Changed.

> 6. src/test/regress/sql/publication.sql - tests for partitioned table
> 
> +-- Tests for partitioned table
> +ALTER PUBLICATION testpub6 SET TABLE rf_tbl_abcd_part_pk WHERE (a >
> 99);
> +ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0);
> +-- ok - partition does not have row filter
> +UPDATE rf_tbl_abcd_part_pk SET a = 1;
> +ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=1);
> +-- ok - "a" is a OK col
> +UPDATE rf_tbl_abcd_part_pk SET a = 1;
> +ALTER PUBLICATION testpub6 SET TABLE rf_tbl_abcd_part_pk WHERE (b >
> 99);
> +ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0);
> +-- ok - partition does not have row filter
> +UPDATE rf_tbl_abcd_part_pk SET a = 1;
> +ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=1);
> +-- fail - "b" is not in REPLICA IDENTITY INDEX
> +UPDATE rf_tbl_abcd_part_pk SET a = 1;
> 
> Those comments and the way the code is arranged did not make it very
> clear to me what exactly these tests are doing.
> 
> I think all the changes to the publish_via_partition_root belong BELOW
> those test comments don't they?
> Also the same comment "-- ok - partition does not have row filter"
> appears 2 times so that can be made more clear too.
> 
> e.g. IIUC it should be changed to something a bit like this (Note - I
> did not change the SQL, I only moved it a bit and changed the
> comments):
> 

I think it might be better to put "-- ok" and "-- fail" before the DML, as we
are testing the RI invalidation of DML here. But I added some comments
here to make it clearer.

Attached is the V80 patch, which addresses the above comments and the
comments from Amit [1].

I also adjusted some code comments in the patch and fixed the following
problems with inherited tables:

- When the subscriber does the initial copy with a row filter, it uses "COPY
  (SELECT ..) TO ..". If the target table is an inherited parent table, the
  SELECT command would copy data from both the parent and its children, while
  we only need to copy the parent table's data. So, added an "ONLY" in this
  case to fix it (see the sketch after this list).
- We didn't check for a duplicate WHERE clause when specifying both an
  inherited parent and its child table with row filters in CREATE
  PUBLICATION/ALTER PUBLICATION. When adding a parent table we also add all
  its children to the list, so we need to check here whether the user already
  specified the child with a row filter and report an error if so.
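
A minimal sketch of the difference, with hypothetical table names:

CREATE TABLE parent (a int);
CREATE TABLE child () INHERITS (parent);
INSERT INTO parent VALUES (1);
INSERT INTO child VALUES (2);

-- without ONLY, the initial sync would also pull the child's rows
COPY (SELECT a FROM parent WHERE a > 0) TO STDOUT;       -- returns 1 and 2
-- with the fix, tablesync copies just the named table
COPY (SELECT a FROM ONLY parent WHERE a > 0) TO STDOUT;  -- returns only 1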

Besides, I added support for the RowExpr node in row filters and added some test cases.

[1] https://www.postgresql.org/message-id/CAA4eK1JkXwu-dvOqEojnKUEZr2dXTLwz_QkQ5uJbmjiHs%3Dg0KQ%40mail.gmail.com

Best regards,
Hou zj

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Feb 10, 2022 at 9:29 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
>
> Attach the V80 patch which addressed the above comments and
> comments from Amit[1].
>

Thanks for the new version. Few minor/cosmetic comments:

1. Can we slightly change the below comment:
Before:
+ * To avoid fetching the publication information, we cache the publication
+ * actions and row filter validation information.

After:
To avoid fetching the publication information repeatedly, we cache the
publication actions and row filter validation information.

2.
+ /*
+ * For ordinary tables, make sure we don't copy data from child
+ * that inherits the named table.
+ */
+ if (lrel.relkind == RELKIND_RELATION)
+ appendStringInfoString(&cmd, " ONLY ");

I think we should mention the reason why we are doing so. So how about
something like: "For regular tables, make sure we don't copy data from
a child that inherits the named table as those will be copied
separately."

3.
Can we change the below comment?

Before:
+ /*
+ * Initialize the tuple slot, map and row filter that are only used
+ * when publishing inserts, updates or deletes.
+ */

After:
Initialize the tuple slot, map, and row filter. These are only used
when publishing inserts, updates, or deletes.

4.
+CREATE TABLE testpub_rf_tbl1 (a integer, b text);
+CREATE TABLE testpub_rf_tbl2 (c text, d integer);

Here, you can add a comment saying "-- Test row filters" or something
along those lines.

5.
+-- test \d+ (now it displays filter information)
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION testpub_dplus_rf_yes FOR TABLE testpub_rf_tbl1
WHERE (a > 1) WITH (publish = 'insert');
+CREATE PUBLICATION testpub_dplus_rf_no FOR TABLE testpub_rf_tbl1;
+RESET client_min_messages;
+\d+ testpub_rf_tbl1
+                              Table "public.testpub_rf_tbl1"
+ Column |  Type   | Collation | Nullable | Default | Storage  | Stats
target | Description
+--------+---------+-----------+----------+---------+----------+--------------+-------------
+ a      | integer |           |          |         | plain    |              |
+ b      | text    |           |          |         | extended |              |
+Publications:
+    "testpub_dplus_rf_no"
+    "testpub_dplus_rf_yes" WHERE (a > 1)

I think here \d is sufficient to show row filters? I think it is
better to use table names such as testpub_rf_yes or testpub_rf_no in
this test.

6.
+# Similarly, the table filter for tab_rf_x (after the initial phase) has no
+# effect when combined with the ALL TABLES IN SCHEMA.
+# Expected: 5 initial rows + 2 new rows = 7 rows
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_rf_x (x)
VALUES (-99), (99)");
+$node_publisher->wait_for_catchup($appname);
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(x)
FROM tab_rf_x");
+is($result, qq(7), 'check table tab_rf_x should not be filtered');

I think the comment here should say "ALL TABLES." instead of "ALL
TABLES IN SCHEMA." as no publication before this test is created
with the "ALL TABLES IN SCHEMA" clause.

7.
+# The subscription of the ALL TABLES IN SCHEMA publication means
there should be
+# no filtering on the tablesync COPY, so all expect all 5 will be present.

It doesn't make sense to use 'all' twice in the above comment; the
first one can be removed.

8.
+
+# setup structure on publisher
+$node_publisher->safe_psql('postgres',

I think it will be good if we can add some generic comments explaining
the purpose of the tests following this. We can add "# Tests FOR TABLE
with row filter publications" before the current comment.

9. For the newly added test for tab_rowfilter_inherited, the patch has
a test case only for initial sync; can we add a test for replication
after the initial sync as well?

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, February 10, 2022 6:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Thu, Feb 10, 2022 at 9:29 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> >
> > Attach the V80 patch which addressed the above comments and
> > comments from Amit[1].
> >
> 
> Thanks for the new version. Few minor/cosmetic comments:

Thanks for the comments!

> 1. Can we slightly change the below comment:
> Before:
> + * To avoid fetching the publication information, we cache the publication
> + * actions and row filter validation information.
> 
> After:
> To avoid fetching the publication information repeatedly, we cache the
> publication actions and row filter validation information.

Changed.

> 2.
> + /*
> + * For ordinary tables, make sure we don't copy data from child
> + * that inherits the named table.
> + */
> + if (lrel.relkind == RELKIND_RELATION)
> + appendStringInfoString(&cmd, " ONLY ");
> 
> I think we should mention the reason why we are doing so. So how about
> something like: "For regular tables, make sure we don't copy data from
> a child that inherits the named table as those will be copied
> separately."

Changed.

> 3.
> Can we change the below comment?
> 
> Before:
> + /*
> + * Initialize the tuple slot, map and row filter that are only used
> + * when publishing inserts, updates or deletes.
> + */
> 
> After:
> Initialize the tuple slot, map, and row filter. These are only used
> when publishing inserts, updates, or deletes.

Changed.

> 4.
> +CREATE TABLE testpub_rf_tbl1 (a integer, b text);
> +CREATE TABLE testpub_rf_tbl2 (c text, d integer);
> 
> Here, you can add a comment saying: "-- Test row filters" or something
> on those lines.

Changed.

> 5.
> +-- test \d+ (now it displays filter information)
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION testpub_dplus_rf_yes FOR TABLE testpub_rf_tbl1
> WHERE (a > 1) WITH (publish = 'insert');
> +CREATE PUBLICATION testpub_dplus_rf_no FOR TABLE testpub_rf_tbl1;
> +RESET client_min_messages;
> +\d+ testpub_rf_tbl1
> +                              Table "public.testpub_rf_tbl1"
> + Column |  Type   | Collation | Nullable | Default | Storage  | Stats
> target | Description
> +--------+---------+-----------+----------+---------+----------+------------
> --+-------------
> + a      | integer |           |          |         | plain    |              |
> + b      | text    |           |          |         | extended |              |
> +Publications:
> +    "testpub_dplus_rf_no"
> +    "testpub_dplus_rf_yes" WHERE (a > 1)
> 
> I think here \d is sufficient to show row filters? I think it is
> better to use table names such as testpub_rf_yes or testpub_rf_no in
> this test.

Changed.

> 6.
> +# Similarly, the table filter for tab_rf_x (after the initial phase) has no
> +# effect when combined with the ALL TABLES IN SCHEMA.
> +# Expected: 5 initial rows + 2 new rows = 7 rows
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab_rf_x (x)
> VALUES (-99), (99)");
> +$node_publisher->wait_for_catchup($appname);
> +$result = $node_subscriber->safe_psql('postgres', "SELECT count(x)
> FROM tab_rf_x");
> +is($result, qq(7), 'check table tab_rf_x should not be filtered');
> 
> I think the comment here should say "ALL TABLES." instead of "ALL
> TABLES IN SCHEMA." as there is no publication before this test which
> is created with "ALL TABLES IN SCHEMA" clause.

Changed.

> 7.
> +# The subscription of the ALL TABLES IN SCHEMA publication means
> there should be
> +# no filtering on the tablesync COPY, so all expect all 5 will be present.
> 
> It doesn't make sense to use 'all' twice in the above comment, the
> first one can be removed.

Changed.

> 8.
> +
> +# setup structure on publisher
> +$node_publisher->safe_psql('postgres',
> 
> I think it will be good if we can add some generic comments explaining
> the purpose of the tests following this. We can add "# Tests FOR TABLE
> with row filter publications" before the current comment.

Added.

> 9. For the newly added test for tab_rowfilter_inherited, the patch has
> a test case only for initial sync, can we add a test for replication
> after initial sync for the same?

Added.

Attached is the v81 patch, which addresses the above comments.

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From
"tanghy.fnst@fujitsu.com"
Date:
On Saturday, January 29, 2022 9:31 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
> 
> Are there any recent performance evaluations of the overhead of row filters?
> I
> think it'd be good to get some numbers comparing:
> 
> 1) $workload with master
> 2) $workload with patch, but no row filters
> 3) $workload with patch, row filter matching everything
> 4) $workload with patch, row filter matching few rows
> 
> For workload I think it'd be worth testing:
> a) bulk COPY/INSERT into one table
> b) Many transactions doing small modifications to one table
> c) Many transactions targetting many different tables
> d) Interspersed DDL + small changes to a table

I did the performance test for this patch in two ways:
(1) using pg_recvlogical
(2) using synchronous pub/sub

The results are below; the bar charts and the details are also attached.

Note that the results of the performance test using pg_recvlogical are based
on v80, and the ones using synchronous pub/sub are based on v81. (I think v80
should have the same performance as v81, because v81 only fixes some
test-related code compared with v80.)

(1) Using pg_recvlogical

RESULTS - workload "a"
-----------------------------
HEAD    4.350
No Filters    4.413
Allow 100%    4.463
Allow 75%    4.079
Allow 50%    3.765
Allow 25%    3.415
Allow 0%    3.104

RESULTS - workload "b"
-----------------------------
HEAD    0.568
No Filters    0.569
Allow 100%    0.590
Allow 75%    0.510
Allow 50%    0.441
Allow 25%    0.370
Allow 0%    0.302

RESULTS - workload "c"
-----------------------------
HEAD    2.752
No Filters    2.812
Allow 100%    2.846
Allow 75%    2.506
Allow 50%    2.147
Allow 25%    1.806
Allow 0%    1.448

RESULTS - workload "d"
-----------------------------
HEAD    5.612
No Filters    5.645
Allow 100%    5.696
Allow 75%    5.648
Allow 50%    5.532
Allow 25%    5.379
Allow 0%    5.196


Summary of tests:
(a) As more data is filtered out, less time is spent.
(b) In the case where no rows are filtered (worst case), there is an overhead
of 1-4%. This should be okay, as normally nobody will set up filters that
don't filter any rows.
(c) There is a slight difference (0-2%) between HEAD and the No filter case,
but some of that could be attributed to run-to-run variation, because in some
runs the no-filter patch took less time and in other runs HEAD took less
time.


(2) Using synchronous pub/sub

RESULTS - workload "a"
-----------------------------
HEAD    9.671
No Filters    9.727
Allow 100%    10.336
Allow 75%    8.544
Allow 50%    7.598
Allow 25%    5.988
Allow 0%    4.542

RESULTS - workload "b"
-----------------------------
HEAD    53.869
No Filters    53.531
Allow 100%    52.679
Allow 75%    39.782
Allow 50%    26.563
Allow 25%    13.506
Allow 0%    0.296

RESULTS - workload "c"
-----------------------------
HEAD    52.378
No Filters    52.432
Allow 100%    51.974
Allow 75%    39.452
Allow 50%    26.604
Allow 25%    13.944
Allow 0%    1.194

RESULTS - workload "d"
-----------------------------
HEAD    57.457
No Filters    57.385
Allow 100%    57.608
Allow 75%    43.575
Allow 50%    29.689
Allow 25%    15.786
Allow 0%    2.879


Summary of tests:
(a) As more data is filtered out, less time is spent.
(b) In the case where no rows are filtered (worst case), there is an overhead
in scenario a (bulk INSERT). This should be okay, as normally nobody will set
up filters that don't filter any rows. In the other scenarios (many small
modifications to one table, targeting many different tables, and interspersed
DDL plus small changes to a table), there is almost no overhead.
(c) There is almost no time difference between HEAD and No filter.


Regards,
Tang

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Friday, February 11, 2022 5:02 PM houzj.fnst@fujitsu.com wrote:
> 
> Attach the v81 patch which addressed above comments.

Attached is the v82 patch, rebased on a recent commit.

The new version of the patch also includes the following changes:

- Disallow specifying a row filter for a partitioned table if pubviaroot is
  false. Since only the partition's row filter would be used in that case,
  disallow it to avoid confusion.
- Some minor/cosmetic changes to comments, code, and test cases.
- Un-comment the test cases for unchanged toast keys.
- Run pgindent and pgperltidy.

Best regards,
Hou zj

Attachments

RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Monday, February 14, 2022 8:56 PM houzj.fnst@fujitsu.com wrote:
> 
> On Friday, February 11, 2022 5:02 PM houzj.fnst@fujitsu.com wrote:
> >
> > Attach the v81 patch which addressed above comments.
> 
> Attach the v82 patch which was rebased based on recent commit.
> 
> The new version patch also includes the following changes:
> 
> - disallow specifying row filter for partitioned table if pubviaroot is false.
>   Since only the partition's row filter would be used if pubviaroot is false,
>   so disallow this case to avoid confusion.
> - some minor/cosmetic changes on comments, codes and testcases.
> - un-comment the testcases for unchanged toast key.
> - run pgindent and pgperltidy

Rebased the patch on recent commit cfc7191d.
Also fixed some typos.

Best regards,
Hou zj

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Feb 15, 2022 at 7:57 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Monday, February 14, 2022 8:56 PM houzj.fnst@fujitsu.com wrote:
> >
> > On Friday, February 11, 2022 5:02 PM houzj.fnst@fujitsu.com wrote:
> > >
> > > Attach the v81 patch which addressed above comments.
> >
> > Attach the v82 patch which was rebased based on recent commit.
> >
> > The new version patch also includes the following changes:
> >
> > - disallow specifying row filter for partitioned table if pubviaroot is false.
> >   Since only the partition's row filter would be used if pubviaroot is false,
> >   so disallow this case to avoid confusion.

I have slightly modified the error messages and checks for this
change. Additionally, I changed a few comments and adapted the test case
for the changes in commit 549ec201d6132b7c7ee11ee90a4e02119259ba5b.

The patch looks good to me. I am planning to commit this later this
week (on Friday) unless there are any major comments.

-- 
With Regards,
Amit Kapila.

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Feb 15, 2022 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Feb 15, 2022 at 7:57 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
>
> I have slightly modified the error messages and checks for this
> change. Additionally, I changed a few comments and adapt the test case
> for changes in commit 549ec201d6132b7c7ee11ee90a4e02119259ba5b.
>
>

Attached is the version with a few changes: (a) made the WHERE
expression in docs/code comments consistent; (b) changed one of
the error messages a bit; (c) used ObjectIdGetDatum instead of a bare
oid in one of the SearchSysCacheCopy1 calls.

> The patch looks good to me. I am planning to commit this later this
> week (on Friday) unless there are any major comments.
>

As there is a new version, I would like to wait for a few more days
before committing. I am planning to commit this early next week (by
Tuesday) unless others or I see any more things that can be improved.

I would like to mention the replica identity handling of the patch.
Right now (on HEAD) we are not checking the replica identity
combination at DDL time; it is checked at execution time in
CheckCmdReplicaIdentity(). This patch follows the same scheme and
gives an error at the time of update/delete if the table publishes
update/delete and the publication(s) has a row filter that contains
non-replica-identity columns. We had earlier thought of handling it at
DDL time but that won't follow the existing scheme and has a lot of
complications, as explained in emails [1][2]. Do let me know if you see
any problem here.
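
(For illustration, the execution-time behaviour - hypothetical names and
approximate error text:)

CREATE TABLE t (a int PRIMARY KEY, b int);
CREATE PUBLICATION p FOR TABLE t WHERE (b > 0) WITH (publish = 'update');
-- accepted at DDL time; the check fires only when the operation runs:
UPDATE t SET b = 1;
-- ERROR:  cannot update table "t"
-- DETAIL:  Column used in the publication WHERE expression is not part
--          of the replica identity.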


[1] - https://www.postgresql.org/message-id/CAA4eK1+m45Xyzx7AUY9TyFnB6CZ7_+_uooPb7WHSpp7UE=YmKg@mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1+1DMkCip9SB3B0_u0Q6fGf-D3vgqQodkLfur0qkL482g@mail.gmail.com

-- 
With Regards,
Amit Kapila.

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Feb 17, 2022 at 5:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
...
> As there is a new version, I would like to wait for a few more days
> before committing. I am planning to commit this early next week (by
> Tuesday) unless others or I see any more things that can be improved.

I have no more review comments.

This Row Filter patch v85 LGTM.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Thu, Feb 17, 2022, at 3:36 AM, Amit Kapila wrote:
> As there is a new version, I would like to wait for a few more days
> before committing. I am planning to commit this early next week (by
> Tuesday) unless others or I see any more things that can be improved.
Amit, I don't have additional comments or suggestions. Let's move on. Next
topic. :-)

> I would like to mention the replica identity handling of the patch.
> Right now (on HEAD) we are not checking the replica identity
> combination at DDL time; it is checked at execution time in
> CheckCmdReplicaIdentity(). This patch follows the same scheme and
> gives an error at the time of update/delete if the table publishes
> update/delete and the publication(s) has a row filter that contains
> non-replica-identity columns. We had earlier thought of handling it at
> DDL time but that won't follow the existing scheme and has a lot of
> complications, as explained in emails [1][2]. Do let me know if you see
> any problem here.
IMO it is not an issue that this patch needs to solve. The conclusion of the
discussion about checking the RI at DDL time vs execution time is that:

* the current patch just follows the same pattern used in the current logical
  replication implementation;
* it is easier to check at execution time (a central point) than to cover the
  many combinations of DDL commands;
* a check at DDL time might eventually break if new subcommands are added;
* the execution-time check does not carry the maintenance burden imposed by
  new DDL subcommands;
* we might change the RI check to execute at DDL time if the current
  implementation imposes a significant penalty on certain workloads.

Again, it is material for another patch.

Thanks for taking care of a feature that has been discussed for 4 years [1].



--
Euler Taveira

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Feb 22, 2022 at 4:47 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Feb 17, 2022, at 3:36 AM, Amit Kapila wrote:
>
> As there is a new version, I would like to wait for a few more days
> before committing. I am planning to commit this early next week (by
> Tuesday) unless others or I see any more things that can be improved.
>
> Amit, I don't have additional comments or suggestions. Let's move on. Next
> topic. :-)
>

Pushed!

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"Shinoda, Noriyoshi (PN Japan FSIP)"
Date:
Hi,
Thank you for developing this great feature.
If multiple tables are specified when creating a PUBLICATION,
is the WHERE clause condition supposed to apply to only one table?
I attached the operation log below.

--- operation log ---
postgres=> CREATE TABLE data1(c1 INT PRIMARY KEY, c2 VARCHAR(10));
CREATE TABLE
postgres=> CREATE TABLE data2(c1 INT PRIMARY KEY, c2 VARCHAR(10));
CREATE TABLE
postgres=> CREATE PUBLICATION pub1 FOR TABLE data1,data2 WHERE (c1 < 1000);
CREATE PUBLICATION
postgres=> \d data1
                      Table "public.data1"
 Column |         Type          | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
 c1     | integer               |           | not null |
 c2     | character varying(10) |           |          |
Indexes:
    "data1_pkey" PRIMARY KEY, btree (c1)
Publications:
    "pub1"

postgres=> \d data2
                      Table "public.data2"
 Column |         Type          | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
 c1     | integer               |           | not null |
 c2     | character varying(10) |           |          |
Indexes:
    "data2_pkey" PRIMARY KEY, btree (c1)
Publications:
    "pub1" WHERE (c1 < 1000)

postgres=> SELECT prrelid, prqual FROM pg_publication_rel;
 prrelid |                                       prqual
---------+-----------------------------------------------------------------------------
   16408 |
   16413 | {OPEXPR :opno 97 :opfuncid 66 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location 53} {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 58 :constvalue 4 [ -24 3 0 0 0 0 0 0 ]}) :location 56}
(2 rows)

Regards,
Noriyoshi Shinoda

-----Original Message-----
From: Amit Kapila <amit.kapila16@gmail.com>
Sent: Wednesday, February 23, 2022 11:06 AM
To: Euler Taveira <euler@eulerto.com>
Cc: houzj.fnst@fujitsu.com; Peter Smith <smithpb2250@gmail.com>;
Alvaro Herrera <alvherre@alvh.no-ip.org>; Greg Nancarrow <gregn4422@gmail.com>;
vignesh C <vignesh21@gmail.com>; Ajin Cherian <itsajin@gmail.com>;
tanghy.fnst@fujitsu.com; Dilip Kumar <dilipbalaut@gmail.com>;
Rahila Syed <rahilasyed90@gmail.com>; Peter Eisentraut <peter.eisentraut@enterprisedb.com>;
Önder Kalacı <onderkalaci@gmail.com>; japin <japinli@hotmail.com>;
Michael Paquier <michael@paquier.xyz>; David Steele <david@pgmasters.net>;
Craig Ringer <craig@2ndquadrant.com>; Amit Langote <amitlangote09@gmail.com>;
PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: row filtering for logical replication

On Tue, Feb 22, 2022 at 4:47 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Thu, Feb 17, 2022, at 3:36 AM, Amit Kapila wrote:
>
> As there is a new version, I would like to wait for a few more days 
> before committing. I am planning to commit this early next week (by
> Tuesday) unless others or I see any more things that can be improved.
>
> Amit, I don't have additional comments or suggestions. Let's move on. 
> Next topic. :-)
>

Pushed!

--
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Feb 24, 2022 at 7:43 AM Shinoda, Noriyoshi (PN Japan FSIP)
<noriyoshi.shinoda@hpe.com> wrote:
>
> Hi,
> Thank you for developing this great feature.
> If multiple tables are specified when creating a PUBLICATION,
> is the WHERE clause condition supposed to apply to only one table?
>

You can give it for multiple tables. See below as an example:

> --- operation log ---
> postgres=> CREATE TABLE data1(c1 INT PRIMARY KEY, c2 VARCHAR(10));
> CREATE TABLE
> postgres=> CREATE TABLE data2(c1 INT PRIMARY KEY, c2 VARCHAR(10));
> CREATE TABLE
> postgres=> CREATE PUBLICATION pub1 FOR TABLE data1,data2 WHERE (c1 < 1000);
> CREATE PUBLICATION

postgres=# CREATE PUBLICATION pub_data_1 FOR TABLE data1 WHERE (c1 > 10), data2 WHERE (c1 < 1000);
CREATE PUBLICATION

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
I noticed that there was a build-farm failure on the machine 'komodoensis' [1]

#   Failed test 'check replicated rows to tab_rowfilter_toast'
#   at t/028_row_filter.pl line 687.
#          got: ''
#     expected: 't|1'
# Looks like you failed 1 test of 20.
[18:21:24] t/028_row_filter.pl ................
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/20 subtests

That failure looks intermittent: the history shows the same machine has
already passed this test case multiple times.

When I investigated the test case, I noticed a missing "catchup" step
($node_publisher->wait_for_catchup($appname);), so if replication happens
too slowly the expected row might not yet be present on the subscriber
side.

I will post a patch to fix this shortly.

------
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis&dt=2022-02-23%2016%3A12%3A03

Kind Regards,
Peter Smith.
Fujitsu Australia.



RE: row filtering for logical replication

From
"Shinoda, Noriyoshi (PN Japan FSIP)"
Date:
> You can give it for multiple tables. See below as an example:

Thank you very much. I understood.

Regards,
Noriyoshi Shinoda
-----Original Message-----
From: Amit Kapila <amit.kapila16@gmail.com>
Sent: Thursday, February 24, 2022 11:25 AM
To: Shinoda, Noriyoshi (PN Japan FSIP) <noriyoshi.shinoda@hpe.com>
Cc: Euler Taveira <euler@eulerto.com>; houzj.fnst@fujitsu.com;
Peter Smith <smithpb2250@gmail.com>; Alvaro Herrera <alvherre@alvh.no-ip.org>;
Greg Nancarrow <gregn4422@gmail.com>; vignesh C <vignesh21@gmail.com>;
Ajin Cherian <itsajin@gmail.com>; tanghy.fnst@fujitsu.com;
Dilip Kumar <dilipbalaut@gmail.com>; Rahila Syed <rahilasyed90@gmail.com>;
Peter Eisentraut <peter.eisentraut@enterprisedb.com>; Önder Kalacı <onderkalaci@gmail.com>;
japin <japinli@hotmail.com>; Michael Paquier <michael@paquier.xyz>;
David Steele <david@pgmasters.net>; Craig Ringer <craig@2ndquadrant.com>;
Amit Langote <amitlangote09@gmail.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: row filtering for logical replication

On Thu, Feb 24, 2022 at 7:43 AM Shinoda, Noriyoshi (PN Japan FSIP) <noriyoshi.shinoda@hpe.com> wrote:
>
> Hi,
> Thank you for developing this great feature.
> If multiple tables are specified when creating a PUBLICATION, is the
> WHERE clause condition supposed to apply to only one table?
>

You can give it for multiple tables. See below as an example:

> --- operation log ---
> postgres=> CREATE TABLE data1(c1 INT PRIMARY KEY, c2 VARCHAR(10));
> CREATE TABLE
> postgres=> CREATE TABLE data2(c1 INT PRIMARY KEY, c2 VARCHAR(10));
> CREATE TABLE
> postgres=> CREATE PUBLICATION pub1 FOR TABLE data1,data2 WHERE (c1 < 1000);
> CREATE PUBLICATION

postgres=# CREATE PUBLICATION pub_data_1 FOR TABLE data1 WHERE (c1 > 10), data2 WHERE (c1 < 1000);
CREATE PUBLICATION

--
With Regards,
Amit Kapila.

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Feb 24, 2022 at 7:57 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> I noticed that there was a build-farm failure on the machine 'komodoensis' [1]
>
> #   Failed test 'check replicated rows to tab_rowfilter_toast'
> #   at t/028_row_filter.pl line 687.
> #          got: ''
> #     expected: 't|1'
> # Looks like you failed 1 test of 20.
> [18:21:24] t/028_row_filter.pl ................
> Dubious, test returned 1 (wstat 256, 0x100)
> Failed 1/20 subtests
>
> That failure looks intermittent: the history shows the same machine has
> already passed this test case multiple times.
>
> When I investigated the test case, I noticed a missing "catchup" step
> ($node_publisher->wait_for_catchup($appname);), so if replication happens
> too slowly the expected row might not yet be present on the subscriber
> side.
>

Your analysis seems correct to me, and it is evident from the result as
well. Reviewing the test, it seems other similar places already have the
catchup, but it is missing after this update test.

> I will post a patch to fix this shortly.
>

Thanks.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
On Thu, Feb 24, 2022 at 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Feb 24, 2022 at 7:57 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > I noticed that there was a build-farm failure on the machine 'komodoensis' [1]
> >
> > #   Failed test 'check replicated rows to tab_rowfilter_toast'
> > #   at t/028_row_filter.pl line 687.
> > #          got: ''
> > #     expected: 't|1'
> > # Looks like you failed 1 test of 20.
> > [18:21:24] t/028_row_filter.pl ................
> > Dubious, test returned 1 (wstat 256, 0x100)
> > Failed 1/20 subtests
> >
> > That failure looks intermittent: the history shows the same machine has
> > already passed this test case multiple times.
> >
> > When I investigated the test case, I noticed a missing "catchup" step
> > ($node_publisher->wait_for_catchup($appname);), so if replication happens
> > too slowly the expected row might not yet be present on the subscriber
> > side.
> >
>
> Your analysis seems correct to me, and it is evident from the result as
> well. Reviewing the test, it seems other similar places already have the
> catchup, but it is missing after this update test.
>
> > I will post a patch to fix this shortly.
> >
>
> Thanks.
>

PSA a patch to fix the observed [1] build-farm failure.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPv%3De9Qd1TSYo8Og6x6Abfz3b9_htwinLp4ENPgV45DACQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments

Re: row filtering for logical replication

From
Tomas Vondra
Date:
Hi,

While working on the column filtering patch, which touches about the
same places, I noticed two minor gaps in testing:

1) The regression tests do perform multiple ALTER PUBLICATION commands,
tweaking the row filter. But there are no checks the row filter was
actually modified / stored in the catalog. It might be just thrown away
and no one would notice.

2) There are no pg_dump tests.


So attached are two trivial patches, addressing this. The first one adds
a couple of \dRp and \d commands, to show what the catalogs contain. The
second one adds a simple pg_dump test.
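
For reference, pg_dump emits the row filter as part of the
publication-membership command, so the test would check for something
like this (a sketch; the exact formatting may differ):

ALTER PUBLICATION testpub5 ADD TABLE ONLY public.testpub_rf_tbl3 WHERE ((e > 300) AND (e < 500));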


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments

Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Wed, Mar 2, 2022, at 8:45 AM, Tomas Vondra wrote:
> While working on the column filtering patch, which touches about the
> same places, I noticed two minor gaps in testing:
>
> 1) The regression tests do perform multiple ALTER PUBLICATION commands,
> tweaking the row filter. But there are no checks the row filter was
> actually modified / stored in the catalog. It might be just thrown away
> and no one would notice.
The test that row filter was modified is available in a previous section. The
one that you modified (0001) is testing the supported objects.

153 ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000 AND e < 2000);
154 \dRp+ testpub5
155 ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2;
156 \dRp+ testpub5
157 -- remove testpub_rf_tbl1 and add testpub_rf_tbl3 again (another WHERE expression)
158 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300 AND e < 500);
159 \dRp+ testpub5

IIRC this test was written before adding the row filter information into the
psql. We could add \d+ testpub_rf_tbl3 before and after the modification.

> 2) There are no pg_dump tests.

WFM.


--
Euler Taveira

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Mar 2, 2022 at 5:42 PM Euler Taveira <euler@eulerto.com> wrote:
>
> On Wed, Mar 2, 2022, at 8:45 AM, Tomas Vondra wrote:
>
> While working on the column filtering patch, which touches about the
> same places, I noticed two minor gaps in testing:
>
> 1) The regression tests do perform multiple ALTER PUBLICATION commands,
> tweaking the row filter. But there are no checks the row filter was
> actually modified / stored in the catalog. It might be just thrown away
> and no one would notice.
>
> The test that row filter was modified is available in a previous section. The
> one that you modified (0001) is testing the supported objects.
>

Right. But if Tomas thinks it is good to add for these ones as well
then I don't mind.

> 153 ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000 AND e < 2000);
> 154 \dRp+ testpub5
> 155 ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2;
> 156 \dRp+ testpub5
> 157 -- remove testpub_rf_tbl1 and add testpub_rf_tbl3 again (another WHERE expression)
> 158 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300 AND e < 500);
> 159 \dRp+ testpub5
>
> IIRC this test was written before adding the row filter information into the
> psql. We could add \d+ testpub_rf_tbl3 before and after the modification.
>


Agreed. We can use \d instead of \d+ as row filter is available with \d.
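
For example, something like this (a sketch; the column portion of the \d
output is omitted):

postgres=# \d testpub_rf_tbl3
...
Publications:
    "testpub5" WHERE ((e > 300) AND (e < 500))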

> 2) There are no pg_dump tests.
>
> WFM.
>

This is a miss. I feel we can add a few more.

-- 
With Regards,
Amit Kapila.



RE: row filtering for logical replication

From
"shiy.fnst@fujitsu.com"
Date:
On Thu, Mar 3, 2022 10:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Wed, Mar 2, 2022 at 5:42 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Wed, Mar 2, 2022, at 8:45 AM, Tomas Vondra wrote:
> >
> > While working on the column filtering patch, which touches about the
> > same places, I noticed two minor gaps in testing:
> >
> > 1) The regression tests do perform multiple ALTER PUBLICATION commands,
> > tweaking the row filter. But there are no checks the row filter was
> > actually modified / stored in the catalog. It might be just thrown away
> > and no one would notice.
> >
> > The test that row filter was modified is available in a previous section. The
> > one that you modified (0001) is testing the supported objects.
> >
> 
> Right. But if Tomas thinks it is good to add for these ones as well
> then I don't mind.
> 
> > 153 ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000 AND e < 2000);
> > 154 \dRp+ testpub5
> > 155 ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2;
> > 156 \dRp+ testpub5
> > 157 -- remove testpub_rf_tbl1 and add testpub_rf_tbl3 again (another WHERE expression)
> > 158 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300 AND e < 500);
> > 159 \dRp+ testpub5
> >
> > IIRC this test was written before adding the row filter information into the
> > psql. We could add \d+ testpub_rf_tbl3 before and after the modification.
> >
> 
> 
> Agreed. We can use \d instead of \d+ as row filter is available with \d.
> 
> > 2) There are no pg_dump tests.
> >
> > WFM.
> >
> 
> This is a miss. I feel we can add a few more.
> 

Agree that we can add some tests, attach the patch which fixes these two points.

Regards,
Shi yu 

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Thu, Mar 3, 2022 at 11:18 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Thu, Mar 3, 2022 10:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Mar 2, 2022 at 5:42 PM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Wed, Mar 2, 2022, at 8:45 AM, Tomas Vondra wrote:
> > >
> > > While working on the column filtering patch, which touches about the
> > > same places, I noticed two minor gaps in testing:
> > >
> > > 1) The regression tests do perform multiple ALTER PUBLICATION commands,
> > > tweaking the row filter. But there are no checks the row filter was
> > > actually modified / stored in the catalog. It might be just thrown away
> > > and no one would notice.
> > >
> > > The test that row filter was modified is available in a previous section. The
> > > one that you modified (0001) is testing the supported objects.
> > >
> >
> > Right. But if Tomas thinks it is good to add for these ones as well
> > then I don't mind.
> >
> > > 153 ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000 AND e < 2000);
> > > 154 \dRp+ testpub5
> > > 155 ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2;
> > > 156 \dRp+ testpub5
> > > 157 -- remove testpub_rf_tbl1 and add testpub_rf_tbl3 again (another WHERE expression)
> > > 158 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300 AND e < 500);
> > > 159 \dRp+ testpub5
> > >
> > > IIRC this test was written before adding the row filter information into the
> > > psql. We could add \d+ testpub_rf_tbl3 before and after the modification.
> > >
> >
> >
> > Agreed. We can use \d instead of \d+ as row filter is available with \d.
> >
> > > 2) There are no pg_dump tests.
> > >
> > > WFM.
> > >
> >
> > This is a miss. I feel we can add a few more.
> >
>
> Agree that we can add some tests, attach the patch which fixes these two points.
>

LGTM. I'll push this tomorrow unless Tomas or Euler feels otherwise.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
"Euler Taveira"
Date:
On Thu, Mar 3, 2022, at 7:47 AM, Amit Kapila wrote:
> LGTM. I'll push this tomorrow unless Tomas or Euler feels otherwise.

Sounds good to me.


--
Euler Taveira

Re: row filtering for logical replication

From
Tomas Vondra
Date:
On 3/3/22 21:07, Euler Taveira wrote:
> On Thu, Mar 3, 2022, at 7:47 AM, Amit Kapila wrote:
>> LGTM. I'll push this tomorrow unless Tomas or Euler feels otherwise.
> Sounds good to me.
> 

+1

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Mon, Mar 7, 2022 at 12:50 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 3/3/22 21:07, Euler Taveira wrote:
> > On Thu, Mar 3, 2022, at 7:47 AM, Amit Kapila wrote:
> >> LGTM. I'll push this tomorrow unless Tomas or Euler feels otherwise.
> > Sounds good to me.
> >
>
> +1
>

Thanks, Pushed
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ceb57afd3ce177e897cb4c5b44aa683fc0036782).

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Peter Smith
Date:
FYI, I was playing with row filters and partitions recently, and while
doing something a bit unusual I received a cache leak warning.

Below are the steps to reproduce it:


test_pub=# CREATE TABLE parent(a int primary key) PARTITION BY RANGE(a);
CREATE TABLE

test_pub=# CREATE TABLE child PARTITION OF parent DEFAULT;
CREATE TABLE

test_pub=# CREATE PUBLICATION p4 FOR TABLE parent WHERE (a < 5), child
WHERE (a >= 5) WITH (publish_via_partition_root=true);
CREATE PUBLICATION

test_pub=# ALTER PUBLICATION p4 SET TABLE parent, child WHERE (a >= 5);
ALTER PUBLICATION

test_pub=# ALTER PUBLICATION p4 SET (publish_via_partition_root = false);
2022-04-11 17:37:58.426 AEST [28152] WARNING:  cache reference leak:
cache pg_publication_rel (49), tuple 0/12 has count 1
WARNING:  cache reference leak: cache pg_publication_rel (49), tuple
0/12 has count 1
ALTER PUBLICATION

------
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: row filtering for logical replication

From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, April 12, 2022 8:40 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> FYI, I was playing with row filters and partitions recently, and while doing
> something a bit unusual I received a cache leak warning.
> 
> Below are the steps to reproduce it:
> 
> 
> test_pub=# CREATE TABLE parent(a int primary key) PARTITION BY RANGE(a);
> CREATE TABLE
> 
> test_pub=# CREATE TABLE child PARTITION OF parent DEFAULT; CREATE TABLE
> 
> test_pub=# CREATE PUBLICATION p4 FOR TABLE parent WHERE (a < 5), child
> WHERE (a >= 5) WITH (publish_via_partition_root=true);
> CREATE PUBLICATION
> 
> test_pub=# ALTER PUBLICATION p4 SET TABLE parent, child WHERE (a >= 5);
> ALTER PUBLICATION
> 
> test_pub=# ALTER PUBLICATION p4 SET (publish_via_partition_root = false);
> 2022-04-11 17:37:58.426 AEST [28152] WARNING:  cache reference leak:
> cache pg_publication_rel (49), tuple 0/12 has count 1
> WARNING:  cache reference leak: cache pg_publication_rel (49), tuple
> 0/12 has count 1
> ALTER PUBLICATION

Thanks for reporting.

I think the reason is that we didn't invoke ReleaseSysCache when rftuple is
valid and no filter exists. We need to release the tuple whenever the
rftuple is valid. Attached is a patch which fixes this.

Best regards,
Hou zj

Attachments

Re: row filtering for logical replication

From
Peter Smith
Date:
On Tue, Apr 12, 2022 at 11:31 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Tuesday, April 12, 2022 8:40 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > FYI, I was playing with row filters and partitions recently, and while doing
> > something a bit unusual I received a cache leak warning.
> >
> > Below are the steps to reproduce it:
> >
> >
> > test_pub=# CREATE TABLE parent(a int primary key) PARTITION BY RANGE(a);
> > CREATE TABLE
> >
> > test_pub=# CREATE TABLE child PARTITION OF parent DEFAULT; CREATE TABLE
> >
> > test_pub=# CREATE PUBLICATION p4 FOR TABLE parent WHERE (a < 5), child
> > WHERE (a >= 5) WITH (publish_via_partition_root=true);
> > CREATE PUBLICATION
> >
> > test_pub=# ALTER PUBLICATION p4 SET TABLE parent, child WHERE (a >= 5);
> > ALTER PUBLICATION
> >
> > test_pub=# ALTER PUBLICATION p4 SET (publish_via_partition_root = false);
> > 2022-04-11 17:37:58.426 AEST [28152] WARNING:  cache reference leak:
> > cache pg_publication_rel (49), tuple 0/12 has count 1
> > WARNING:  cache reference leak: cache pg_publication_rel (49), tuple
> > 0/12 has count 1
> > ALTER PUBLICATION
>
> Thanks for reporting.
>
> I think the reason is that we didn't invoke ReleaseSysCache when rftuple is
> valid and no filter exists. We need to release the tuple whenever the
> rftuple is valid. Attached is a patch which fixes this.
>

Thanks! Your patch applied cleanly, and the reported problem now seems
fixed.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
I understand that this is a minimal fix, and for that it seems OK, but I
think the surrounding style is rather baroque.  This code can be made
simpler.  Here's my take on it.  I think it's also faster: we avoid
looking up pg_publication_rel entries for rels that aren't partitioned
tables.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Cuando mañana llegue pelearemos segun lo que mañana exija" (Mowgli)

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Apr 12, 2022 at 2:35 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> I understand that this is a minimal fix, and for that it seems OK, but I
> think the surrounding style is rather baroque.  This code can be made
> simpler.  Here's my take on it.
>

We don't have a lock on the relation, so if it gets dropped
concurrently, it won't behave sanely. For example, get_rel_name() will
return NULL which seems incorrect to me.

>  I think it's also faster: we avoid
> looking up pg_publication_rel entries for rels that aren't partitioned
> tables.
>

I am not sure about this as well because you will instead do a RELOID
cache lookup even when there is no row filter or column list.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Apr 12, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 12, 2022 at 2:35 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > I understand that this is a minimal fix, and for that it seems OK, but I
> > think the surrounding style is rather baroque.  This code can be made
> > simpler.  Here's my take on it.
> >
>
> We don't have a lock on the relation, so if it gets dropped
> concurrently, it won't behave sanely. For example, get_rel_name() will
> return NULL which seems incorrect to me.
>

It seems to me that we have a similar coding pattern in ExecGrant_Relation().

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2022-Apr-12, Amit Kapila wrote:

> On Tue, Apr 12, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> > We don't have a lock on the relation, so if it gets dropped
> > concurrently, it won't behave sanely. For example, get_rel_name() will
> > return NULL which seems incorrect to me.

Oh, oops ... a trap for the unwary?  Anyway, yes, we can disregard the
entry when get_rel_name returns null.  Amended patch attached.

> > I am not sure about this as well because you will instead do a RELOID
> > cache lookup even when there is no row filter or column list.

I guess my assumption is that the pg_class cache is typically more
populated than other relcaches, but that's unsubstantiated.  I'm not
sure if we have any way to tell which one is the more common case.
Anyway, let's do it the way you already had it.

> It seems to me that we have a similar coding pattern in ExecGrant_Relation().

Not sure what you mean?  In that function, when the syscache lookup
returns NULL, an error is raised.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El número de instalaciones de UNIX se ha elevado a 10,
y se espera que este número aumente" (UPM, 1972)

Attachments

Re: row filtering for logical replication

From
Alvaro Herrera
Date:
Sorry, I think I neglected to "git add" some late changes.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Attachments

Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Apr 12, 2022 at 5:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Sorry, I think I neglected to "git add" some late changes.
>

+ if (has_rowfilter)
+     ereport(ERROR,
+             (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+              errmsg("cannot set parameter \"%s\" to false for publication \"%s\"",
+                     "publish_via_partition_root",
+                     stmt->pubname),
+              errdetail("The publication contains a WHERE clause for partitioned table \"%s\" which is not allowed when \"%s\" is false.",
+                        get_rel_name(relid),
+                        "publish_via_partition_root")));

It still has the same problem. The table can be dropped just before
this message and the get_rel_name will return NULL and we don't expect
that.

Also, is there a reason that you haven't kept the test case added by Hou-San?

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Apr 12, 2022 at 5:01 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2022-Apr-12, Amit Kapila wrote:
>
> > On Tue, Apr 12, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > We don't have a lock on the relation, so if it gets dropped
> > > concurrently, it won't behave sanely. For example, get_rel_name() will
> > > return NULL which seems incorrect to me.
>
> Oh, oops ... a trap for the unwary?  Anyway, yes, we can disregard the
> entry when get_rel_name returns null.  Amended patch attached.
>
> > > I am not sure about this as well because you will instead do a RELOID
> > > cache lookup even when there is no row filter or column list.
>
> I guess my assumption is that the pg_class cache is typically more
> populated than other relcaches, but that's unsubstantiated.  I'm not
> sure if we have any way to tell which one is the more common case.
> Anyway, let's do it the way you already had it.
>
> > It seems to me that we have a similar coding pattern in ExecGrant_Relation().
>
> Not sure what you mean?
>

I mean that it fetches the tuple from the RELOID cache and then
performs relkind and other checks similar to what we are doing. I
think it could also have used get_rel_relkind(), but that probably
wasn't done because it doesn't have a lock on the relation.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2022-Apr-12, Amit Kapila wrote:

> It still has the same problem. The table can be dropped just before
> this message and the get_rel_name will return NULL and we don't expect
> that.

Ugh, I forgot to change the errmsg() parts to use the new variable,
apologies.  Fixed.

> Also, is there a reason that you haven't kept the test case added by Hou-San?

None.  I put it back here.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Attachments

Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2022-Apr-12, Amit Kapila wrote:

> I mean that it fetches the tuple from the RELOID cache and then
> performs relkind and other checks similar to what we are doing. I
> think it could also have used get_rel_relkind(), but that probably
> wasn't done because it doesn't have a lock on the relation.

Ah, but that one uses a lot more fields from the pg_class tuple in the
non-error path.  We only need relkind, up until we know the error is to
be thrown.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Tue, Apr 12, 2022 at 6:16 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2022-Apr-12, Amit Kapila wrote:
>
> > It still has the same problem. The table can be dropped just before
> > this message and the get_rel_name will return NULL and we don't expect
> > that.
>
> Ugh, I forgot to change the errmsg() parts to use the new variable,
> apologies.  Fixed.
>

Thanks, this will work and fix the issue. I think this looks better
than the current code, however, I am not sure if the handling for the
concurrently dropped tables is better (both get_rel_relkind() and
get_rel_name() can fail due to those reasons). I understand this won't
fail because of the protection you have in the patch, so feel free to
go ahead with this if you like this style better.

-- 
With Regards,
Amit Kapila.



Re: row filtering for logical replication

From
Alvaro Herrera
Date:
On 2022-Apr-13, Amit Kapila wrote:

> Thanks, this will work and fix the issue. I think this looks better
> than the current code, 

Thanks for looking!  Pushed.

> however, I am not sure if the handling for the
> concurrently dropped tables is better (both get_rel_relkind() and
> get_rel_name() can fail due to those reasons). I understand this won't
> fail because of the protection you have in the patch,

Well, the point is that these routines return NULL if the relation
cannot be found in the cache, so just doing "continue" (without raising
any error) if any of those happens is sufficient for correct behavior.

BTW I just noticed that AlterPublicationOptions acquires only
AccessShareLock on the publication object.  I think this is too lax ...
what if two of them run concurrently? (say to specify different
published actions)  Do they overwrite each other's update?  I think it'd
be better to acquire ShareUpdateExclusive to ensure only one is running
at a time.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"This is a foot just waiting to be shot"                (Andrew Dunstan)



Re: row filtering for logical replication

From
Amit Kapila
Date:
On Wed, Apr 13, 2022 at 10:01 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> BTW I just noticed that AlterPublicationOptions acquires only
> AccessShareLock on the publication object.  I think this is too lax ...
> what if two of them run concurrently? (say to specify different
> published actions)  Do they overwrite each other's update?
>

No, they won't overwrite each other. Firstly, the AccessShareLock on the
publication object is not related to concurrent changes of the
publication object. Those are protected by the normal row-update rules
(the second transaction will wait until the first one finishes). See
the example below:

Session-1
postgres=# Begin;
BEGIN
postgres=*# Alter publication pub1 set (publish = 'insert');
ALTER PUBLICATION

Session-2:
postgres=# Begin;
BEGIN
postgres=*# Alter publication pub1 set (publish = 'update');

The Alter in Session-2 will wait till we end the transaction in Session-1.


-- 
With Regards,
Amit Kapila.