Re: Column Filtering in Logical Replication

Поиск
Список
Период
Сортировка
От Rahila Syed
Тема Re: Column Filtering in Logical Replication
Дата
Msg-id CAH2L28stHzf6Yg3KxBEJ1JEr8C6xM4oSZfysdUObauZ00ArdWQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Column Filtering in Logical Replication  (Rahila Syed <rahilasyed90@gmail.com>)
Ответы Re: Column Filtering in Logical Replication
Список pgsql-hackers
Hi,

 
>  
> Currently, this capability is not included in the patch. If the table on
> the subscriber
> server has lesser attributes than that on the publisher server, it
> throws an error at the 
> time of CREATE SUBSCRIPTION.
>

That's a bit surprising, to be honest. I do understand the patch simply
treats the filtered columns as "unchanged" because that's the simplest
way to filter the *data* of the columns. But if someone told me we can
"filter columns" I'd expect this to work without the columns on the
subscriber.

OK, I will look into adding this. 

This has been added in the attached patch. Now, instead of 
treating the filtered columns as unchanged and sending a byte
with that information, unfiltered columns are not sent to the subscriber 
server at all. This along with saving the network bandwidth, allows 
the logical replication to even work between tables with different numbers of 
columns i.e with the table on subscriber server containing only the filtered 
columns. Currently, replica identity columns are replicated irrespective of 
the presence of the column filters, hence the table on the subscriber side  must 
contain the replica identity columns.

The patch adds a new parse node PublicationTable, but doesn't add
copyfuncs.c, equalfuncs.c, readfuncs.c, outfuncs.c support for it.

Thanks, added this. 
 
Looking at OpenTableList(), I think you forgot to update the comment --
it says "open relations specified by a RangeVar list", 

Thank you for the review, Modified this.

To nitpick, I find "Bitmapset *att_list" a bit annoying, because it's
not really a list ;-)

Changed this.

 It's not super clear to me that strlist_to_textarray() and related
processing will behave sanely when the column names contain weird
characters such as commas or quotes, or just when used with uppercase
column names.  Maybe it's worth having tests that try to break such
cases.
 
Added a few test cases for this. 

In AlterPublicationTables() I was confused by some code that seemed
commented a bit too verbosely 
 
Modified this as per the suggestion.

: REPLICA IDENTITY columns are always replicated
: irrespective of column names specification.

... for which you don't have any tests 

I have added these tests.

Having said that, I'm not sure I agree with this design decision; what I
think this is doing is hiding from the user the fact that they are
publishing columns that they don't want to publish.  I think as a user I
would rather get an error in that case:

  ERROR:  invalid column list in published set
  DETAIL:  The set of published commands does not include all the replica identity columns.

or something like that.  Avoid possible nasty surprises of security-
leaking nature.

Ok, Thank you for your opinion. I agree that giving an explicit error in this case will be safer.  
I will include this, in case there are no counter views.

Thank you for your review comments. Please find attached the rebased and updated patch. 
 

Thank you,
Rahila Syed 

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Another regexp performance improvement: skip useless paren-captures