Re: row filtering for logical replication

Поиск
Список
Период
Сортировка
От Stephen Frost
Тема Re: row filtering for logical replication
Дата
Msg-id 20181214153836.GU3415@tamriel.snowman.net
обсуждение исходный текст
Ответ на Re: row filtering for logical replication  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Ответы Re: row filtering for logical replication  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Список pgsql-hackers
Greetings,

* Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
> On 23/11/2018 03:02, Stephen Frost wrote:
> > * Euler Taveira (euler@timbira.com.br) wrote:
> >> 2018-02-28 21:54 GMT-03:00 Craig Ringer <craig@2ndquadrant.com>:
> >>> Good idea. I haven't read this yet, but one thing to make sure you've
> >>> handled is limiting the clause to referencing only the current tuple and the
> >>> catalogs. user-catalog tables are OK, too, anything that is
> >>> RelationIsAccessibleInLogicalDecoding().
> >>>
> >>> This means only immutable functions may be invoked, since a stable or
> >>> volatile function might attempt to access a table. And views must be
> >>> prohibited or recursively checked. (We have tree walkers that would help
> >>> with this).
> >>>
> >>> It might be worth looking at the current logic for CHECK expressions, since
> >>> the requirements are similar. In my opinion you could safely not bother with
> >>> allowing access to user catalog tables in the filter expressions and limit
> >>> them strictly to immutable functions and the tuple its self.
> >>
> >> IIRC implementation is similar to RLS expressions. I'll check all of
> >> these rules.
> >
> > Given the similarity to RLS and the nearby discussion about allowing
> > non-superusers to create subscriptions, and probably publications later,
> > I wonder if we shouldn't be somehow associating this with RLS policies
> > instead of having the publication filtering be entirely independent..
>
> I do see the appeal here, if you consider logical replication to be a
> streaming select it probably applies well.
>
> But given that this is happening inside output plugin which does not
> have full executor setup and has catalog-only snapshot I am not sure how
> feasible it is to try to merge these two things. As per my previous
> email it's possible that we'll have to be stricter about what we allow
> in expressions here.

I can certainly understand the concern about trying to combine the
implementation of this with that of RLS; perhaps that isn't a good fit
due to the additional constraints put on logical decoding.

That said, I still think it might make sense to consider these filters
for logical decoding to be policies and, ideally, to allow users to use
the same policy for both.

In the end, the idea of having to build a single large and complex
'create publication' command which has a bunch of tables, each with
their own filter clauses, just strikes me as pretty painful.

> The other issue with merging this is that the use-case for filtering out
> the data in logical replication is not necessarily about security, but
> often about sending only relevant data. So it makes sense to have filter
> on publication without RLS enabled on table and if we'd force that, we'd
> limit usefulness of this feature.

I definitely have a serious problem if we are going to say that you
can't use this filtering for security-sensitive cases.

> We definitely want to eventually create subscriptions as non-superuser
> but that has zero effect on this as everything here is happening on
> different server than where subscription lives (we already allow
> creation of publications with just CREATE privilege on database and
> ownership of the table).

What I wasn't clear about above was the idea that we might allow a user
other than the table owner to publish a given table, but that such a
publication should certanily only be allowed to include the rows which
that user has access to- as regulated by RLS.  If the RLS policy is too
complex to allow that then I would think we'd simply throw an error at
the create publication time and the would-be publisher would need to
figure that out with the table owner.

I'll admit that this might seem like a stretch, but what happens today?
Today, people write cronjobs to try to sync between tables with FDWs and
you don't need to own a table to use it as the target of a foreign
table.

I do think that we'll need to have some additional privileges around who
is allowed to create publications, I'm not entirely thrilled with that
being combined with the ability to create schemas; the two seem quite
different to me.

* Euler Taveira (euler@timbira.com.br) wrote:
> Em sex, 23 de nov de 2018 às 11:40, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> escreveu:
> > But given that this is happening inside output plugin which does not
> > have full executor setup and has catalog-only snapshot I am not sure how
> > feasible it is to try to merge these two things. As per my previous
> > email it's possible that we'll have to be stricter about what we allow
> > in expressions here.
>
> This feature should be as simple as possible. I don't want to
> introduce a huge overhead just for filtering some data. Data sharding
> generally uses simple expressions.

RLS often uses simple filters too.

> > The other issue with merging this is that the use-case for filtering out
> > the data in logical replication is not necessarily about security, but
> > often about sending only relevant data. So it makes sense to have filter
> > on publication without RLS enabled on table and if we'd force that, we'd
> > limit usefulness of this feature.
>
> Use the same infrastructure as RLS could be a good idea but use RLS
> for row filtering is not. RLS is complex.

Right, this was along the lines I was thinking of- using the
infrastructure and the policy system, in particular.

Thanks!

Stephen

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: explain plans with information about (modified) gucs
Следующее
От: Tom Lane
Дата:
Сообщение: Re: alternative to PG_CATCH