RE: row filtering for logical replication

Поиск
Список
Период
Сортировка
От houzj.fnst@fujitsu.com
Тема RE: row filtering for logical replication
Дата
Msg-id OS0PR01MB57169DC3E8D7B7E588CD64DC94229@OS0PR01MB5716.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: row filtering for logical replication  (Greg Nancarrow <gregn4422@gmail.com>)
Re: row filtering for logical replication  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Re: row filtering for logical replication  (Andres Freund <andres@anarazel.de>)
Re: row filtering for logical replication  (Peter Smith <smithpb2250@gmail.com>)
Список pgsql-hackers
On Wednesday, January 26, 2022 6:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Wed, Jan 26, 2022 at 8:37 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Monday, January 24, 2022 4:38 PM Peter Smith
> <smithpb2250@gmail.com> wrote:
> > >
> > >
> > > 3. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc
> > >
> > > +RelationBuildPublicationDesc(Relation relation)
> > >  {
> > >   List    *puboids;
> > >   ListCell   *lc;
> > >   MemoryContext oldcxt;
> > >   Oid schemaid;
> > > - PublicationActions *pubactions = palloc0(sizeof(PublicationActions));
> > > + List    *ancestors = NIL;
> > > + Oid relid = RelationGetRelid(relation); AttrNumber invalid_rfcolnum =
> > > + InvalidAttrNumber; PublicationDesc *pubdesc =
> > > + palloc0(sizeof(PublicationDesc)); PublicationActions *pubactions =
> > > + &pubdesc->pubactions;
> > > +
> > > + pubdesc->rf_valid_for_update = true;
> > > + pubdesc->rf_valid_for_delete = true;
> > >
> > > IMO it wold be better to change the "sense" of those variables.
> > > e.g.
> > >
> > > "rf_valid_for_update" --> "rf_invalid_for_update"
> > > "rf_valid_for_delete" --> "rf_invalid_for_delete"
> > >
> > > That way they have the same 'sense' as the AttrNumbers so it all reads better
> to
> > > me.
> > >
> > > Also, it means no special assignment is needed because the palloc0 will set
> > > them correctly
> >
> > Think again, I am not sure it's better to have an invalid_... flag.
> > It seems more natural to have a valid_... flag.
> >

Thanks for the comments !

> Can't we do without these valid_ flags? AFAICS, if we check for
> "invalid_" attributes, it should serve our purpose because those can
> have some attribute number only when the row filter contains some
> column that is not part of RI. A few possible optimizations in
> RelationBuildPublicationDesc:

I slightly refactored the logic here.

> a. It calls contain_invalid_rfcolumn with pubid and then does cache
> lookup to again find a publication which its only caller has access
> to, so can't we pass the same?

Adjusted the code here.

> b. In RelationBuildPublicationDesc(), we call
> GetRelationPublications() to get the list of publications and then
> process those publications. I think if none of the publications has
> row filter and the relation has replica identity then we don't need to
> build the descriptor at all. If we do this optimization inside
> RelationBuildPublicationDesc, we may want to rename function as
> CheckAndBuildRelationPublicationDesc or something like that?

After thinking more on this and considering Alvaro's comments. I did some
changes for the RelationBuildPublicationDesc function to try to
make it more natural.

- Make the function always collect the complete information instead of
  returning immediately when find invalid rowfilter.

  The reason for this change is: some extensions(3rd-part) might only care
  about the cached publication actions, this approach can make sure they can
  still get complete pulication actions as usual. Besides, this is also
  consistent with the other existing cache management functions(like
  RelationGetIndexAttrBitmap ...) which will always build complete information
  even if user only want part of it.

- Only cache the flag rf_valid_for_[update|delete] flag in PublicationDesc
  instead of the invalid rowfilter column.

  Because it's a bit unnatural to me to store an invalid thing in relcache. Note
  that now the patch doesn't report the column number in the error message. If we
  later decide that the accurate column number or publication is useful, I
  think it might be better to add a separate simple function(get_invalid_...)
  to report the accurate column or publication instead of reusing the cache
  management function.

Also address Peter's comments[1] and Greg's comments[2] [3]

[1] https://www.postgresql.org/message-id/CAHut%2BPsG1G80AoSYka7m1x05vHjKZAzKeVyK4b6CAm2-sTkadg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAJcOf-c7XrtsWSGppb96-eQxPbtg%2BAfssAtTXNYbT8QuhdyOYA%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAJcOf-f0kc%2B4xGEgkvqNLkbJxMf8Ff0E9gTO2biHDoSJnxyziA%40mail.gmail.com

Attach the V72 patch set which did the above changes.

Best regards,
Hou zj

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: Re: Is there a way (except from server logs) to know the kind of on-going/last checkpoint?
Следующее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Is there a way (except from server logs) to know the kind of on-going/last checkpoint?