RE: row filtering for logical replication

Поиск
Список
Период
Сортировка
От houzj.fnst@fujitsu.com
Тема RE: row filtering for logical replication
Дата
Msg-id OS0PR01MB5716C06C065651C2DA2C4D8994669@OS0PR01MB5716.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: row filtering for logical replication  (Peter Smith <smithpb2250@gmail.com>)
Ответы Re: row filtering for logical replication  (Peter Smith <smithpb2250@gmail.com>)
Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Sun, Nov 28, 2021 3:18 PM Peter Smith <smithpb2250@gmail.com> wrote:
> On Fri, Nov 26, 2021 at 1:16 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> >
> ...
> > Based on this direction, I tried to write a top up POC patch(0005) which I'd
> > like to share.
> >
> > The top up patch mainly did the following things.
> >
> > * Move the row filter columns invalidation to CheckCmdReplicaIdentity, so
> > that the invalidation is executed only when actual UPDATE or DELETE executed on
> > the published relation. It's consistent with the existing check about replica
> > identity.
> >
> > * Cache the results of the validation for row filter columns in relcache to
> > reduce the cost of the validation. It's safe because every operation that
> > change the row filter and replica identity will invalidate the relcache.
> >
> > Also attach the v42 patch set to keep cfbot happy.
> 
> Hi Hou-san.
> 
> Thanks for providing your "top-up" 0005 patch!
> 
> I suppose the goal will be to later merge this top-up with the current
> 0002 validation patch, but in the meantime here are my review comments
> for 0005.

Thanks for the review and many valuable comments !

> 8) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity
> 
> But which are the bad filter columns?
> 
> Previously the Row Filter column validation gave errors for the
> invalid filter column, but in this top-up patch there is no indication
> which column or which filter or which publication was the bad one -
> only that "something" bad was detected. IMO this might make it very
> difficult for the user to know enough about the cause of the problem
> to be able to fix the offending filter.

If we want to report the invalid filter column, I can see two possibilities.

1) Instead of a bool flag, we cache a AttrNumber flag which indicates the
   invalid column number(0 means all valid). We can report it in the error
   message.

2) Everytime we decide to report an error, we traverse all the publications to
   find the invalid column again and report it.

What do you think ?

> 13). src/backend/utils/cache/relcache.c - function RelationGetPublicationInfo
> +/*
> + * Get publication information for the given relation.
> + */
> +struct PublicationInfo *
> +RelationGetPublicationInfo(Relation relation)
> +{
> + List    *puboids;
> + ListCell    *lc;
> + MemoryContext oldcxt;
> + Oid schemaid;
> + Bitmapset    *bms_replident = NULL;
> + PublicationInfo *pubinfo = palloc0(sizeof(PublicationInfo));
> +
> + pubinfo->rfcol_valid_for_replid = true;
> 
> It is not entirely clear to me why this function is always pallocing
> the PublicationInfo and then returning a copy of what is stored in the
> relation->rd_pubinfo. This then puts a burden on the callers (like the
> GetRelationPublicationActions etc) to make sure to free that memory.
> Why can't we just return the relation->rd_pubinfo directly And avoid
> all the extra palloc/memcpy/free?

Normally, I think only the cache management function should change the data in
relcache.  Return relation->xx directly might have a risk that user could
change the data in relcache. So, the management function usually return a copy
of cache data so that user is free to change it without affecting the real
cache data.

16) Tests... CREATE PUBLICATION succeeds

> I have not yet reviewed any of the 0005 tests, but there was some big
> behaviour difference that I noticed.
> 
> I think now with the 0005 top-up patch the replica identify validation
> is deferred to when UPDATE/DELETE is executed. I don’t know if this
> will be very user friendly. It means now sometimes you can
> successfully CREATE a PUBLICATION even though it will fail as soon as
> you try to use it.

I am not sure, the initial idea here is to make the check of replica identity
consistent.

Currently, if user create a publication which publish "update" but the relation
in the publication didn't mark as replica identity, then user can create the
publication successfully. but the later UPDATE will report an error.

Best regards,
Hou zj

В списке pgsql-hackers по дате отправления:

Предыдущее
От: vignesh C
Дата:
Сообщение: Re: enhance pg_log_backend_memory_contexts() to log memory contexts of auxiliary processes
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [PATCH] ALTER tab completion