Обсуждение: [RFC] Interface of Row Level Security

Поиск
Список
Период
Сортировка

[RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
Let me have a discussion to get preferable interface for row-level security.

My planned feature will perform to append additional conditions to WHERE
clause implicitly, to restrict tuples being visible for the current user.
For example, when row-level policy "uname = getpgusername()" is configured
on the table T1, the following query:   select * from T1 where X > 20;
should be rewritten to:   select * from T1 where (X > 20) AND (uname = getpgusername());
on somewhere in the query processing stage prior to optimizer.


I checked the way to set up row-level policy at Oracle. Its document seems to me
users specify a function for row-level policy. http://docs.oracle.com/cd/B28359_01/network.111/b28531/vpd.htm#i1008294

I had a short talk with Robert about this topic, and had an impression
the policy
should be given as a formula of where-clause instead of sql function, for query
optimization purpose.
However, I missed a simple sql function can be inlined with simplify_function().
So, unless the security policy isn't enough simple, it is harmless to
optimization.

Example)
 postgres=# CREATE TABLE t1 (x int, y int, uname text); CREATE TABLE postgres=# CREATE FUNCTION sel_pol_t1 (text)
RETURNSbool                    LANGUAGE sql AS 'SELECT $1 = getpgusername()'; CREATE FUNCTION postgres=# EXPLAIN SELECT
*FROM t1 WHERE (x > 20) AND sel_pol_t1(uname);                          QUERY PLAN
------------------------------------------------------------ Seq Scan on t1  (cost=0.00..33.20 rows=2 width=40)
Filter:((x > 20) AND (uname = (getpgusername())::text)) (2 rows)
 

A simple SQL function sel_pol_t1() is inlined to the where-clause,
thus if an index
would be configured to uname, index-scan should be an option.

So, I'd like to chose simpler implementation with the following interface.
 ALTER TABLE <tblname> ADD SECURITY POLICY func(<colname>,...)     [FOR SELECT|UPDATE|DELETE]; ALTER TABLE <tblname>
DROPSECURITY POLICY func(<colname>,...);     [FOR SELECT|UPDATE|DELETE]; ALTER TABLE <tblname> DROP SECURITY POLICY
ALL;

This interface allows to assign multiple functions on a particular table.
Then, these functions shall be assigned on where clause of the tables
to be scanned on. If available, optimizer will inline the functions for further
optimization.

Any comments please.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Tom Lane
Дата:
Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
> Let me have a discussion to get preferable interface for row-level security.
> My planned feature will perform to append additional conditions to WHERE
> clause implicitly, to restrict tuples being visible for the current user.
> For example, when row-level policy "uname = getpgusername()" is configured
> on the table T1, the following query:
>     select * from T1 where X > 20;
> should be rewritten to:
>     select * from T1 where (X > 20) AND (uname = getpgusername());

Hm.  Simple and fairly noninvasive, but ... would this not be subject to
the same sorts of information-leak hazards that were addressed in the
"security views" feature?  That is, I see no guarantee that the RLS
condition will be evaluated before any conditions supplied by the user.
So it seems easy to get information out of rows the RLS policy is
supposed to prevent access to.  It would be far more secure to just
use a security view to apply the RLS condition.

Also, if the point here is to provide security for tables not views,
it seems like you really need to have (at least a design for) RLS
security on insert/update/delete operations.  Just adding the same
filter condition might be adequate for deletes, but I don't think it
works at all for inserts.  And for updates, what prevents the user from
updating columns he shouldn't, or updating them to key values he
shouldn't be able to use?
        regards, tom lane


Re: [RFC] Interface of Row Level Security

От
Alastair Turner
Дата:
On Wed, May 23, 2012 at 5:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
>> Let me have a discussion to get preferable interface for row-level security.
>> My planned feature will perform to append additional conditions to WHERE
>> clause implicitly, to restrict tuples being visible for the current user.
>> For example, when row-level policy "uname = getpgusername()" is configured
>> on the table T1, the following query:
>>     select * from T1 where X > 20;
>> should be rewritten to:
>>     select * from T1 where (X > 20) AND (uname = getpgusername());
>
> Hm.  Simple and fairly noninvasive, but ... would this not be subject to
> the same sorts of information-leak hazards that were addressed in the
> "security views" feature?  That is, I see no guarantee that the RLS
> condition will be evaluated before any conditions supplied by the user.
> So it seems easy to get information out of rows the RLS policy is
> supposed to prevent access to.  It would be far more secure to just
> use a security view to apply the RLS condition.

Since adding a condition to the where clause is a relatively simple
operation (compared to the full potential scope of a view) could the
RLS rewrite of the query create a CTE with the additional condition[s]
rather than adding condition[s] to the user-supplied query? This would
provide the forced ordering of the evaluating the conditions, thereby
avoiding many of the potential points of leakage.

Bell.


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/23 Tom Lane <tgl@sss.pgh.pa.us>:
> Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
>> Let me have a discussion to get preferable interface for row-level security.
>> My planned feature will perform to append additional conditions to WHERE
>> clause implicitly, to restrict tuples being visible for the current user.
>> For example, when row-level policy "uname = getpgusername()" is configured
>> on the table T1, the following query:
>>     select * from T1 where X > 20;
>> should be rewritten to:
>>     select * from T1 where (X > 20) AND (uname = getpgusername());
>
> Hm.  Simple and fairly noninvasive, but ... would this not be subject to
> the same sorts of information-leak hazards that were addressed in the
> "security views" feature?  That is, I see no guarantee that the RLS
> condition will be evaluated before any conditions supplied by the user.
> So it seems easy to get information out of rows the RLS policy is
> supposed to prevent access to.  It would be far more secure to just
> use a security view to apply the RLS condition.
>
I wanted to have discussion to handle this problem.

Unlike leaky-view problem, we don't need to worry about unexpected
qualifier distribution on either side of join, because a scan on table
never contains any join. Thus, all we need to care about is order of
qualifiers chained on a particular scan node; being reordered by
the cost to invoke functions.

How about an idea to track FuncExpr come from the security policy
and enforce "0" on its cost? Regular functions never reach zero
cost, so the security policy must be re-ordered to the head.

> Also, if the point here is to provide security for tables not views,
> it seems like you really need to have (at least a design for) RLS
> security on insert/update/delete operations.  Just adding the same
> filter condition might be adequate for deletes, but I don't think it
> works at all for inserts.  And for updates, what prevents the user from
> updating columns he shouldn't, or updating them to key values he
> shouldn't be able to use?
>
If we also apply the security policy to newer version of tuples on
update and insert, one idea is to inject a before-row-(update|insert)
trigger to check whether it satisfies the security policy.
For same reason, the trigger should be executed at the end of
trigger chain.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/23 Alastair Turner <bell@ctrlf5.co.za>:
> On Wed, May 23, 2012 at 5:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
>>> Let me have a discussion to get preferable interface for row-level security.
>>> My planned feature will perform to append additional conditions to WHERE
>>> clause implicitly, to restrict tuples being visible for the current user.
>>> For example, when row-level policy "uname = getpgusername()" is configured
>>> on the table T1, the following query:
>>>     select * from T1 where X > 20;
>>> should be rewritten to:
>>>     select * from T1 where (X > 20) AND (uname = getpgusername());
>>
>> Hm.  Simple and fairly noninvasive, but ... would this not be subject to
>> the same sorts of information-leak hazards that were addressed in the
>> "security views" feature?  That is, I see no guarantee that the RLS
>> condition will be evaluated before any conditions supplied by the user.
>> So it seems easy to get information out of rows the RLS policy is
>> supposed to prevent access to.  It would be far more secure to just
>> use a security view to apply the RLS condition.
>
> Since adding a condition to the where clause is a relatively simple
> operation (compared to the full potential scope of a view) could the
> RLS rewrite of the query create a CTE with the additional condition[s]
> rather than adding condition[s] to the user-supplied query? This would
> provide the forced ordering of the evaluating the conditions, thereby
> avoiding many of the potential points of leakage.
>
An interesting idea. However, I cannot imagine how does it works on
update or delete statement.

For select statement, it will get better performance to rewrite reference
to a particular table by a subquery with security_barrier flag than CTE,
because it allows to push down leakproof functions.

Could you tell me your idea for more details?
An example will help me understand well.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Robert Haas
Дата:
On Wed, May 23, 2012 at 3:45 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> I wanted to have discussion to handle this problem.
>
> Unlike leaky-view problem, we don't need to worry about unexpected
> qualifier distribution on either side of join, because a scan on table
> never contains any join. Thus, all we need to care about is order of
> qualifiers chained on a particular scan node; being reordered by
> the cost to invoke functions.
>
> How about an idea to track FuncExpr come from the security policy
> and enforce "0" on its cost? Regular functions never reach zero
> cost, so the security policy must be re-ordered to the head.

Hmm.  That would disregard the relative costs of multiple qualifiers
all of which were injected by the security policy, which I suspect is
not a good idea.  Furthermore, I think that we should not assume that
there is no join involved.  I would expect a fairly popular RLS qual
to be something of the form "WHERE NOT EXISTS (SELECT 1 FROM hide_me
WHERE hide_me.pk = thistab.pk)".  Perhaps when we see that RLS
applies, we should replace the reference to the original table with a
subquery RTE that has the security_barrier flag set - essentially
treating a table with RLS as if it were a security view.

Also, suppose that Bob applies an RLS policy to a table, and, later,
Alice selects from the table.  How do we keep Bob from usurping
Alice's privileges?  If we insist that Bob's RLS policy function runs
as Bob, then it defeats inlining; but if it runs as Alice, then Bob
can steal Alice's credentials.  One idea is to apply the security
policy only if Alice's access to the table is granted by Bob.  That
way, if Alice is (for example) the superuser, she's immune to RLS.
But that doesn't seem to completely solve the problem, because Alice
might merely be some other relatively unprivileged user and we still
don't want Bob to be able to walk off with her access.

Another idea is to set things up so that the RLS policy function isn't
applied to each row directly; instead, it's invoked once per query and
*returns* a WHERE clause.  This would be a lot more powerful than the
proposed design, because now the table owner can write a function that
imposes quals on some people but not others, which seems very useful.

>> Also, if the point here is to provide security for tables not views,
>> it seems like you really need to have (at least a design for) RLS
>> security on insert/update/delete operations.  Just adding the same
>> filter condition might be adequate for deletes, but I don't think it
>> works at all for inserts.  And for updates, what prevents the user from
>> updating columns he shouldn't, or updating them to key values he
>> shouldn't be able to use?
>>
> If we also apply the security policy to newer version of tuples on
> update and insert, one idea is to inject a before-row-(update|insert)
> trigger to check whether it satisfies the security policy.
> For same reason, the trigger should be executed at the end of
> trigger chain.

It's not clear to me that there is any need for built-in server
functionality here.  If the table owner wants to enforce some sort of
policy regarding INSERT or UPDATE or DELETE, they can already do that
today just by attaching a trigger to the table.  And they can enforce
whatever policy they like that way.  Before designing any new
mechanism, what's wrong with the existing one?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/23 Robert Haas <robertmhaas@gmail.com>:
> On Wed, May 23, 2012 at 3:45 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> I wanted to have discussion to handle this problem.
>>
>> Unlike leaky-view problem, we don't need to worry about unexpected
>> qualifier distribution on either side of join, because a scan on table
>> never contains any join. Thus, all we need to care about is order of
>> qualifiers chained on a particular scan node; being reordered by
>> the cost to invoke functions.
>>
>> How about an idea to track FuncExpr come from the security policy
>> and enforce "0" on its cost? Regular functions never reach zero
>> cost, so the security policy must be re-ordered to the head.
>
> Hmm.  That would disregard the relative costs of multiple qualifiers
> all of which were injected by the security policy, which I suspect is
> not a good idea.
>  Furthermore, I think that we should not assume that
> there is no join involved.  I would expect a fairly popular RLS qual
> to be something of the form "WHERE NOT EXISTS (SELECT 1 FROM hide_me
> WHERE hide_me.pk = thistab.pk)".
>
Please ignore the approach to track cost value of qualifiers.
I believe it does not work well without something fundamental updates.

> Perhaps when we see that RLS
> applies, we should replace the reference to the original table with a
> subquery RTE that has the security_barrier flag set - essentially
> treating a table with RLS as if it were a security view.
>
I become to think it is a better approach than tracking origin of each
qualifiers. One problem is case handling on update or delete statement.

It may be possible to rewrite the update / delete query as follows:

From: UPDATE tbl SET X = X + 1 WHERE f_leak(Y)
To: UPDATE tbl SET X = X + 1 WHERE ctid = (     SELECT * FROM (         SELECT ctid FROM tbl WHERE uname =
getpgusername() <== (*) 
should have security-barrier     ) AS tbl_subqry WHERE f_leak(Y) );

Expanded sub-queries will have security-barrier flag, so it enforces
the "uname = getpgusername()" being checked earlier than f_leak(Y).
We may need to measure the performance impact due to the reform.

> Also, suppose that Bob applies an RLS policy to a table, and, later,
> Alice selects from the table.  How do we keep Bob from usurping
> Alice's privileges?  If we insist that Bob's RLS policy function runs
> as Bob, then it defeats inlining; but if it runs as Alice, then Bob
> can steal Alice's credentials.  One idea is to apply the security
> policy only if Alice's access to the table is granted by Bob.  That
> way, if Alice is (for example) the superuser, she's immune to RLS.
> But that doesn't seem to completely solve the problem, because Alice
> might merely be some other relatively unprivileged user and we still
> don't want Bob to be able to walk off with her access.
>
I think, this situation is similar to a case when we reference a view
without privileges to underlying tables. If Bob set up a view with
something "tricky" function, it allows Bob to reference credentials
of users who reference the view.
More or less, it might be a problem when a user try to invoke
a user defined function declared by others.
(Thus, sepgsql policy does not allow users to invoke a function
declared by another one in different domain; without DBA's checks.)

I think it is a good idea not to apply RLS when current user has
superuser privilege from perspective of security model consistency,
but it is inconsistent to check privileges underlying tables.

> Another idea is to set things up so that the RLS policy function isn't
> applied to each row directly; instead, it's invoked once per query and
> *returns* a WHERE clause.  This would be a lot more powerful than the
> proposed design, because now the table owner can write a function that
> imposes quals on some people but not others, which seems very useful.
>
Sorry, I don't favor this idea. Even if table owner set up a function to
generate additional qualifiers, it also has no guarantee the qualifiers
are invoked prior to user-given one.
It seems to me this approach will have same problem...

>>> Also, if the point here is to provide security for tables not views,
>>> it seems like you really need to have (at least a design for) RLS
>>> security on insert/update/delete operations.  Just adding the same
>>> filter condition might be adequate for deletes, but I don't think it
>>> works at all for inserts.  And for updates, what prevents the user from
>>> updating columns he shouldn't, or updating them to key values he
>>> shouldn't be able to use?
>>>
>> If we also apply the security policy to newer version of tuples on
>> update and insert, one idea is to inject a before-row-(update|insert)
>> trigger to check whether it satisfies the security policy.
>> For same reason, the trigger should be executed at the end of
>> trigger chain.
>
> It's not clear to me that there is any need for built-in server
> functionality here.  If the table owner wants to enforce some sort of
> policy regarding INSERT or UPDATE or DELETE, they can already do that
> today just by attaching a trigger to the table.  And they can enforce
> whatever policy they like that way.  Before designing any new
> mechanism, what's wrong with the existing one?
>
Yes, we don't need any new invent to check the value of new tuples.
But it should be done after all the user-defined triggers. Existing
trigger does not have a mechanism to enforce order to be invoked.
So, what I really implement is a mechanism to inject some pseudo
triggers "at tail of the Trigger array".

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On May23, 2012, at 22:12 , Robert Haas wrote:
> Also, suppose that Bob applies an RLS policy to a table, and, later,
> Alice selects from the table.  How do we keep Bob from usurping
> Alice's privileges?

That problem isn't restricted to RLW, though. Bob could just as well
have booby-trapped any other SQL object, e.g. could have added
bobs_malicious_function() to a view Alice selects from, or attached
it as a trigger to a table Alice inserts to.

It would be nice if there was (optional) protection against these
kinds of attacks, but it's not really something that RLS should be
burdened with.

BTW, I've wondered in the past how tight our protection against some
trying to take over the postgres role during pg_dump is. On the surface,
it seems that we're safe because pg_dump doesn't invoke user-defined
functions except output functions (which require superuser privileges
to modify). But that's not exactly a solid line of defense...

>> If we also apply the security policy to newer version of tuples on
>> update and insert, one idea is to inject a before-row-(update|insert)
>> trigger to check whether it satisfies the security policy.
>> For same reason, the trigger should be executed at the end of
>> trigger chain.
> 
> It's not clear to me that there is any need for built-in server
> functionality here.  If the table owner wants to enforce some sort of
> policy regarding INSERT or UPDATE or DELETE, they can already do that
> today just by attaching a trigger to the table.  And they can enforce
> whatever policy they like that way.  Before designing any new
> mechanism, what's wrong with the existing one?

Yeah, applying the security policy to the new row (for UPDATES
and INSERTS) seems weird - the policy determines what you can see,
not what you can store, which might be two different things.

But the security policy should still apply to the old rows, i.e.
you shouldn't be after to UPDATE or DELETE rows you cannot see, no?

best regards,
Florian Pflug


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/24 Florian Pflug <fgp@phlo.org>:
>>> If we also apply the security policy to newer version of tuples on
>>> update and insert, one idea is to inject a before-row-(update|insert)
>>> trigger to check whether it satisfies the security policy.
>>> For same reason, the trigger should be executed at the end of
>>> trigger chain.
>>
>> It's not clear to me that there is any need for built-in server
>> functionality here.  If the table owner wants to enforce some sort of
>> policy regarding INSERT or UPDATE or DELETE, they can already do that
>> today just by attaching a trigger to the table.  And they can enforce
>> whatever policy they like that way.  Before designing any new
>> mechanism, what's wrong with the existing one?
>
> Yeah, applying the security policy to the new row (for UPDATES
> and INSERTS) seems weird - the policy determines what you can see,
> not what you can store, which might be two different things.
>
> But the security policy should still apply to the old rows, i.e.
> you shouldn't be after to UPDATE or DELETE rows you cannot see, no?
>
The case of INSERT / DELETE are simple; All we need to apply is
checks on either new or old tuples.

In case of UPDATE, we need to check on the old tuple whether use can
see, and on the new tuple whether use can store them.
Indeed, these are different checks, however, it seems like a black hole
if the new tuple is allowed to write but no reader privileges.
I expect most use cases choose same policy on reader timing and
writer times at UPDATE statement.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On May24, 2012, at 12:43 , Kohei KaiGai wrote:
> The case of INSERT / DELETE are simple; All we need to apply is
> checks on either new or old tuples.
>
> In case of UPDATE, we need to check on the old tuple whether use can
> see, and on the new tuple whether use can store them.
> Indeed, these are different checks, however, it seems like a black hole
> if the new tuple is allowed to write but no reader privileges.
> I expect most use cases choose same policy on reader timing and
> writer times at UPDATE statement.

I don't think preventing block holes is sensible here - it might,
in fact, be *just* what the user wants.

Imagine a messaging system. A reasonable RLS policy would be to allow
a user to see messages addressed to him. Yet you wouldn't want to prevent
her from creating messages to other people - cause what good is a messaging
system that only allows you to send messages to yourself. What you
probably *would* want to do, though, is to check that she did put herself in
as the sender when she creates a message. And you'd probably wanna forbit
updates entirely. So you'd have
 - A RLS policy that checks current_user = ANY(recipients) - An ON INSERT trigger which checks current_user = sender -
AnON UPDATE trigger which errors out 

If RLS policy applies to INSERTEed rows also, how would you do that?

Another example, although in the realm of filesystem permissions, is Mac OS X.
Per default, every user has a "Drop Box" folder, which anybody can write to, yet
only the owner can read. This allows you to easily transfer files from one
user to another without allowing a third party to read it.

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/24 Florian Pflug <fgp@phlo.org>:
> On May24, 2012, at 12:43 , Kohei KaiGai wrote:
>> The case of INSERT / DELETE are simple; All we need to apply is
>> checks on either new or old tuples.
>>
>> In case of UPDATE, we need to check on the old tuple whether use can
>> see, and on the new tuple whether use can store them.
>> Indeed, these are different checks, however, it seems like a black hole
>> if the new tuple is allowed to write but no reader privileges.
>> I expect most use cases choose same policy on reader timing and
>> writer times at UPDATE statement.
>
> I don't think preventing block holes is sensible here - it might,
> in fact, be *just* what the user wants.
>
> Imagine a messaging system. A reasonable RLS policy would be to allow
> a user to see messages addressed to him. Yet you wouldn't want to prevent
> her from creating messages to other people - cause what good is a messaging
> system that only allows you to send messages to yourself. What you
> probably *would* want to do, though, is to check that she did put herself in
> as the sender when she creates a message. And you'd probably wanna forbit
> updates entirely. So you'd have
>
>  - A RLS policy that checks current_user = ANY(recipients)
>  - An ON INSERT trigger which checks current_user = sender
>  - An ON UPDATE trigger which errors out
>
> If RLS policy applies to INSERTEed rows also, how would you do that?
>
> Another example, although in the realm of filesystem permissions, is Mac OS X.
> Per default, every user has a "Drop Box" folder, which anybody can write to, yet
> only the owner can read. This allows you to easily transfer files from one
> user to another without allowing a third party to read it.
>
Indeed, you are right. We have no special reason why to enforce same rules
on both of reader and writer stage on UPDATE statement.

So, the proposed interface might be revised as follows: ALTER TABLE <tblname> ADD SECURITY POLICY
<func_name>(<args>,...) [FOR SELECT |                                                        INSERT |
                                    [BEFORE|AFTER] UPDATE |
DELETE];

In case of INSERT or AFTER UPDATE, I assume the check shall be applied
on the tail of before-row triggers.

(*) I don't check whether it conflicts syntax or not yet.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On May24, 2012, at 16:19 , Kohei KaiGai wrote:
> So, the proposed interface might be revised as follows:
>  ALTER TABLE <tblname> ADD SECURITY POLICY
>      <func_name>(<args>, ...) [FOR SELECT |
>                                                         INSERT |
>                                                         [BEFORE|AFTER] UPDATE |
>                                                         DELETE];
>
> In case of INSERT or AFTER UPDATE, I assume the check shall be applied
> on the tail of before-row triggers.

I'd go with just "SELECT, UPDATE, DELETE", and leave the INSERT and BEFORE
UPDATE case to regular triggers, for two reasons

First, it's conceptually much simpler, since the policy always just adds
an implicit WHERE clause, period. This of course assumes that DELETE and
(BEFORE) UPDATE simply skips rows for which the policy function returns false,
instead of reporting 'permission denied' or something. But that's the most
reasonable behaviour anyway, I think, because otherwise you'd make batch
UPDATEs and DELETEs pretty much unusable, 'cause there'd always be the risk
of tripping over some invisible row and getting and error.

And second, it avoids mimicking functionality that is already provided
by an existing feature, namely triggers.

People will have to deal with the trigger ordering issue, but that's nothing
new, and I bet most people have a system in place for that. I usually prefix
my trigger names with 'a_' to 'z_', for example, to make the ordering explicit.

Another issue, BTW, are FOREIGN KEY constraints. Should you be allowed to
created references to rows which are invisible to you, or should FOREIGN KEY
constraints be exempt from security policies? I'd say they shouldn't be, i.e.
the policy WHERE clause should be added to constraint checking queries like
usual. But maybe I'm missing some reason why that'd be undesirable…

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/24 Florian Pflug <fgp@phlo.org>:
> On May24, 2012, at 16:19 , Kohei KaiGai wrote:
>> So, the proposed interface might be revised as follows:
>>  ALTER TABLE <tblname> ADD SECURITY POLICY
>>      <func_name>(<args>, ...) [FOR SELECT |
>>                                                         INSERT |
>>                                                         [BEFORE|AFTER] UPDATE |
>>                                                         DELETE];
>>
>> In case of INSERT or AFTER UPDATE, I assume the check shall be applied
>> on the tail of before-row triggers.
>
> I'd go with just "SELECT, UPDATE, DELETE", and leave the INSERT and BEFORE
> UPDATE case to regular triggers, for two reasons
>
> First, it's conceptually much simpler, since the policy always just adds
> an implicit WHERE clause, period. This of course assumes that DELETE and
> (BEFORE) UPDATE simply skips rows for which the policy function returns false,
> instead of reporting 'permission denied' or something. But that's the most
> reasonable behaviour anyway, I think, because otherwise you'd make batch
> UPDATEs and DELETEs pretty much unusable, 'cause there'd always be the risk
> of tripping over some invisible row and getting and error.
>
I definitely agree with starting a new feature from simple implementation.

Although I'm inclined to the approach to replace references to tables with
security policy by sub-queries with security barrier flag, instead of adding
qualifiers of where clause to avoid the leaky-view scenario, it will make its
implementation mush simpler.

> Another issue, BTW, are FOREIGN KEY constraints. Should you be allowed to
> created references to rows which are invisible to you, or should FOREIGN KEY
> constraints be exempt from security policies? I'd say they shouldn't be, i.e.
> the policy WHERE clause should be added to constraint checking queries like
> usual. But maybe I'm missing some reason why that'd be undesirable…
>
I agree. The row level security policy should not be applied during FK checks
(or other internal stuff; to be harmless). At the previous discussion, it was
issued that iteration of FK/PK proving enables malicious one to estimate
existence of invisible tuple and its key value, although they cannot see the
actual values. It is well documented limitation, thus, user should not use row-
level security (or should not use natural key) if they cannot accept
this limitation.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
I'd like to summarize the current design being discussed.

syntax: ALTER TABLE <tblname> WITH ROW LEVEL SECURITY     ( <condition clause> ) [FOR (SELECT | UPDATE | DELETE)];
ALTERTABLE <tblname> WITHOUT ROW LEVEL SECURITY; 

I tried to patch the parser/gram.y, but here was syntax conflicts
on ADD / DROP sub-command.
And, I noticed "ROW LEVEL SECURITY" allows to implement
without adding new keyword, unlike "SECURITY POLICY".

As we discussed, it causes a problem with approach to append
additional qualifiers to where clause implicitly, because it does
not solve the matter corresponding to the order to execute
qualifiers. So, I'm inclined to the approach to replace reference
to tables with security policy by sub-queries with security barrier
flag.
For example, if "tbl" has security policy, this query shall be rewritten
internally, as follows:

original) SELECT * FROM tbl WHERE X > 20 AND f_leak(Y);

rewritten) SELECT * FROM (   SELECT * FROM tbl WHERE uname = getpgusername() ) AS tbl_subqry WHERE X > 20 AND
f_leak(Y);

The sub-query shall have security-barrier flag, so f_leak() is never
pushed down but X > 20 will be pushed down because of leakproof
attribute of the function.

It is a bit complex at UPDATE or DELETE statement, but
what I try to do is same.
 original)   UPDATE tbl SET X = X + 1 WHERE X > 20 AND f_leak(Y);
 rewritten)   UPDATE tbl SET X = X + 1 WHERE ctid = (       SELECT ctid FROM (           SELECT ctid, * FROM uname =
getpgusername()      ) AS tbl_subqry WHERE X > 20 AND f_leak(Y)   ); 

That guarantees the security policy ("uname = getpgusername()")
is evaluated prior to user given conditions.

One thing still I'm thinking is whether the security policy should
be provided as a function or a clause. Enough simple sql function
is inlined at simplify_function(), so here is no meaningful difference.
I was afraid of code complexity, but all we should do is to append
configured clause on the where clause of sub-query inside.

|  ALTER TABLE <tblname> WITH ROW LEVEL SECURITY
|      ( <condition clause> ) [FOR (SELECT | UPDATE | DELETE)];

So, I tried to put "condition clause" instead of a function, right now.


Regarding to FK constraints, I don't think it is a situation to
apply row-level security policy towards internal queries.
So, I plan to disable during FK checks.


One other issue we didn't have discussed is table inheritance.
In case when a table TBLP has a child table TBLC and only
TBLC has its security policy, what security policy should be
applied when we run "SELECT * FROM TBLP".
My preference is, the security policy is only applied to scan on
TBLC, not TBLP. It is not desirable behavior that visible tuples
are different from way to reference a certain table.
In addition, if and when TBLP and TBLC have their own policy
individually, what is a desirable behavior?
I think, the security policy of both TBLP and TBLC should be
applied on TBLC; in other words, it applies the security policy
of all the parent tables to scan on child table.

Any comments please. Thanks,

2012/5/24 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> 2012/5/24 Florian Pflug <fgp@phlo.org>:
>> On May24, 2012, at 16:19 , Kohei KaiGai wrote:
>>> So, the proposed interface might be revised as follows:
>>>  ALTER TABLE <tblname> ADD SECURITY POLICY
>>>      <func_name>(<args>, ...) [FOR SELECT |
>>>                                                         INSERT |
>>>                                                         [BEFORE|AFTER] UPDATE |
>>>                                                         DELETE];
>>>
>>> In case of INSERT or AFTER UPDATE, I assume the check shall be applied
>>> on the tail of before-row triggers.
>>
>> I'd go with just "SELECT, UPDATE, DELETE", and leave the INSERT and BEFORE
>> UPDATE case to regular triggers, for two reasons
>>
>> First, it's conceptually much simpler, since the policy always just adds
>> an implicit WHERE clause, period. This of course assumes that DELETE and
>> (BEFORE) UPDATE simply skips rows for which the policy function returns false,
>> instead of reporting 'permission denied' or something. But that's the most
>> reasonable behaviour anyway, I think, because otherwise you'd make batch
>> UPDATEs and DELETEs pretty much unusable, 'cause there'd always be the risk
>> of tripping over some invisible row and getting and error.
>>
> I definitely agree with starting a new feature from simple implementation.
>
> Although I'm inclined to the approach to replace references to tables with
> security policy by sub-queries with security barrier flag, instead of adding
> qualifiers of where clause to avoid the leaky-view scenario, it will make its
> implementation mush simpler.
>
>> Another issue, BTW, are FOREIGN KEY constraints. Should you be allowed to
>> created references to rows which are invisible to you, or should FOREIGN KEY
>> constraints be exempt from security policies? I'd say they shouldn't be, i.e.
>> the policy WHERE clause should be added to constraint checking queries like
>> usual. But maybe I'm missing some reason why that'd be undesirable…
>>
> I agree. The row level security policy should not be applied during FK checks
> (or other internal stuff; to be harmless). At the previous discussion, it was
> issued that iteration of FK/PK proving enables malicious one to estimate
> existence of invisible tuple and its key value, although they cannot see the
> actual values. It is well documented limitation, thus, user should not use row-
> level security (or should not use natural key) if they cannot accept
> this limitation.
>
> Thanks,
> --
> KaiGai Kohei <kaigai@kaigai.gr.jp>



--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Robert Haas
Дата:
On Thu, May 24, 2012 at 6:11 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> Perhaps when we see that RLS
>> applies, we should replace the reference to the original table with a
>> subquery RTE that has the security_barrier flag set - essentially
>> treating a table with RLS as if it were a security view.
>>
> I become to think it is a better approach than tracking origin of each
> qualifiers. One problem is case handling on update or delete statement.
>
> It may be possible to rewrite the update / delete query as follows:
>
> From:
>  UPDATE tbl SET X = X + 1 WHERE f_leak(Y)
> To:
>  UPDATE tbl SET X = X + 1 WHERE ctid = (
>      SELECT * FROM (
>          SELECT ctid FROM tbl WHERE uname = getpgusername()  <== (*)
> should have security-barrier
>      ) AS tbl_subqry WHERE f_leak(Y)
>  );
>
> Expanded sub-queries will have security-barrier flag, so it enforces
> the "uname = getpgusername()" being checked earlier than f_leak(Y).
> We may need to measure the performance impact due to the reform.

The problem with this is that it introduces an extra instance of tbl
into the query - there are now two rather than one.  UPDATE .. FROM is
supposed to be a way to avoid this, but it's insufficiently general to
handle all the cases (e.g. UPDATE a LEFT JOIN b can't be written using
the existing syntax).  Anyway we want to avoid inserting self-joins
for performance reasons if at all possible.  It should be easy to do
that in the case of SELECT; UPDATE and DELETE may need a bit more
work.

> I think, this situation is similar to a case when we reference a view
> without privileges to underlying tables. If Bob set up a view with
> something "tricky" function, it allows Bob to reference credentials
> of users who reference the view.
> More or less, it might be a problem when a user try to invoke
> a user defined function declared by others.
> (Thus, sepgsql policy does not allow users to invoke a function
> declared by another one in different domain; without DBA's checks.)

This is true, but there are still some new threat models.  For
example, currently, pg_dump isn't going to run any user-defined code
just because you do SELECT * FROM table, but that will change with
this patch.  Note that pg_dump need not actually select from views,
only tables.

> I think it is a good idea not to apply RLS when current user has
> superuser privilege from perspective of security model consistency,
> but it is inconsistent to check privileges underlying tables.

Seems like a somewhat random wart, if it's just an exception for
superusers.  I think we need to do better than that.  For example, at
my last company, sales reps A and B were permitted to see all
customers of the company, but sales reps C, D, E, F, G, H, I, and J
were permitted to see only their own accounts.  Those sorts of
policies need to be easy to implement.

>> Another idea is to set things up so that the RLS policy function isn't
>> applied to each row directly; instead, it's invoked once per query and
>> *returns* a WHERE clause.  This would be a lot more powerful than the
>> proposed design, because now the table owner can write a function that
>> imposes quals on some people but not others, which seems very useful.
>>
> Sorry, I don't favor this idea. Even if table owner set up a function to
> generate additional qualifiers, it also has no guarantee the qualifiers
> are invoked prior to user-given one.
> It seems to me this approach will have same problem...

It's not intended to solve the qual-ordering problem, just to allow
additional policy flexibility.

>> It's not clear to me that there is any need for built-in server
>> functionality here.  If the table owner wants to enforce some sort of
>> policy regarding INSERT or UPDATE or DELETE, they can already do that
>> today just by attaching a trigger to the table.  And they can enforce
>> whatever policy they like that way.  Before designing any new
>> mechanism, what's wrong with the existing one?
>>
> Yes, we don't need any new invent to check the value of new tuples.
> But it should be done after all the user-defined triggers. Existing
> trigger does not have a mechanism to enforce order to be invoked.
> So, what I really implement is a mechanism to inject some pseudo
> triggers "at tail of the Trigger array".

Start the trigger names with the letter "z".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [RFC] Interface of Row Level Security

От
Robert Haas
Дата:
On Thu, May 24, 2012 at 6:20 AM, Florian Pflug <fgp@phlo.org> wrote:
> But the security policy should still apply to the old rows, i.e.
> you shouldn't be after to UPDATE or DELETE rows you cannot see, no?

Agreed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [RFC] Interface of Row Level Security

От
Robert Haas
Дата:
On Thu, May 24, 2012 at 12:00 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> Another issue, BTW, are FOREIGN KEY constraints. Should you be allowed to
>> created references to rows which are invisible to you, or should FOREIGN KEY
>> constraints be exempt from security policies? I'd say they shouldn't be, i.e.
>> the policy WHERE clause should be added to constraint checking queries like
>> usual. But maybe I'm missing some reason why that'd be undesirable…
>>
> I agree. The row level security policy should not be applied during FK checks
> (or other internal stuff; to be harmless). At the previous discussion, it was
> issued that iteration of FK/PK proving enables malicious one to estimate
> existence of invisible tuple and its key value, although they cannot see the
> actual values. It is well documented limitation, thus, user should not use row-
> level security (or should not use natural key) if they cannot accept
> this limitation.

You say "I agree", but it seems to me that you and Florian are in fact
taking opposite positions.

FWIW, I'm inclined to think that you should NOT be able to create a
row that references an invisible row.  You might end up with that
situation anyway, because we don't know what the semantics of the
security policy are: rows might become visible or invisible after the
fact, and we can't police that.  But I think that if you take the
opposite position that the select queries inside fkey triggers ought
to be exempt from security policy, then you need to build some new
mechanism to make that happen, which seems like extra work for no
benefit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On May24, 2012, at 18:42 , Kohei KaiGai wrote:
> I'd like to summarize the current design being discussed.
>
> syntax:
>  ALTER TABLE <tblname> WITH ROW LEVEL SECURITY
>      ( <condition clause> ) [FOR (SELECT | UPDATE | DELETE)];
>  ALTER TABLE <tblname> WITHOUT ROW LEVEL SECURITY;
>
> I tried to patch the parser/gram.y, but here was syntax conflicts
> on ADD / DROP sub-command.
> And, I noticed "ROW LEVEL SECURITY" allows to implement
> without adding new keyword, unlike "SECURITY POLICY".

Let the bike-shedding begin ;-)

ALTER TABLE … WITH sounds a bit weird. What about
 ALTER TABLE <tblname> SET ROW POLICY <condition> FOR { SELECT | UPDATE | DELETE } ALTER TABLE <tblname> RESET ROW
POLICY

> As we discussed, it causes a problem with approach to append
> additional qualifiers to where clause implicitly, because it does
> not solve the matter corresponding to the order to execute
> qualifiers. So, I'm inclined to the approach to replace reference
> to tables with security policy by sub-queries with security barrier
> flag.

Since the security barrier flag carries a potentially hefty performance
penalty, I think it should be optional. Application which don't allow
SQL-level access to the database might still benefit from row-level security,
because it saves them from having to manually add the WHERE clause to every
statement, or having to wrap all their tables with views. Yet without direct
SQL-level access, the security barrier thing isn't really necessary, so
it'd be nice if they wouldn't have to pay for it. How about
 ALTER TABLE … SET ROW POLICY … WITH (security_barrier)

> One thing still I'm thinking is whether the security policy should
> be provided as a function or a clause. Enough simple sql function
> is inlined at simplify_function(), so here is no meaningful difference.
> I was afraid of code complexity, but all we should do is to append
> configured clause on the where clause of sub-query inside.
>
> |  ALTER TABLE <tblname> WITH ROW LEVEL SECURITY
> |      ( <condition clause> ) [FOR (SELECT | UPDATE | DELETE)];
>
> So, I tried to put "condition clause" instead of a function, right now.

A single function seems much easier implementation-wise, since you wouldn't
need to store an arbitrary expression in the catalog, just an oid. It also
delegates the dependency tracking problem to the function. It also simplies
the grammar, because the "FOR … " clause cannot be mistaken to belong to
the condition clause.

> One other issue we didn't have discussed is table inheritance.
> In case when a table TBLP has a child table TBLC and only
> TBLC has its security policy, what security policy should be
> applied when we run "SELECT * FROM TBLP".
> My preference is, the security policy is only applied to scan on
> TBLC, not TBLP.

Agreed.

> It is not desirable behavior that visible tuples
> are different from way to reference a certain table.
> In addition, if and when TBLP and TBLC have their own policy
> individually, what is a desirable behavior?
> I think, the security policy of both TBLP and TBLC should be
> applied on TBLC; in other words, it applies the security policy
> of all the parent tables to scan on child table.

I think security policies should only apply to the table they're
declared for, not their child tables. Mostly because that is how
triggers operate, and security policies and triggers will often
be used together, so having their semantics regarding inheritance
be the same seems to be the least surprising option.

Also, if policies are inherited, how would you define a policy
which applies only to the parent, not to the child?

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On May24, 2012, at 19:25 , Robert Haas wrote:
> FWIW, I'm inclined to think that you should NOT be able to create a
> row that references an invisible row.  You might end up with that
> situation anyway, because we don't know what the semantics of the
> security policy are: rows might become visible or invisible after the
> fact, and we can't police that.

Right. I just realized, however, that there's another case which wasn't
considered yet, which is how to handle the initial check during
ALTER TABEL ADD CONSTRAINT. I'm thinking that it's fine to only consider
visible rows in the parent table there too, but we should be checking
all rows in the child table. The easiest way would be to restrict
ALTER TABLE ADD CONSTRAINT to the table owner for tables with RLS
(it seems that currently the REFERENCES privilege is sufficient), and
make the table owner exempt from RLS on that table. The latter means you'd
need at least two roles to use RLS, but anyone security-conscious enough
to use RLS will probably not user the same role for DDL and DML operations
anyway...

> But I think that if you take the
> opposite position that the select queries inside fkey triggers ought
> to be exempt from security policy, then you need to build some new
> mechanism to make that happen, which seems like extra work for no
> benefit.

Hm, interesting angle. Continuing this thought, without any extra work,
UNIQUE and EXCLUSION constraints *will* be enforced regardless of row
visibility, because their implementation isn't SPI-based but instead
detects conflicts while inserting tuples into the index.

For being so obviously inconsistent in its treatment of UNIQUE and
EXCLUSION constraints vs. FK constraints, this feels surprisingly
right. So, to prevent design by accident, here's an attempt to explain
that divergence.

For UNIQUE and EXCLUSION constraints, the most conservative assumption
possible is that all rows are visible, since that leads to the most
rejections. With that assumption, no matter what the actual policy is,
the data returned by a query will always satisfy the constraint. Plus,
the constraint is still sensible because it neither rejects nor allows
all rows. So that conservative assumption is the one we make, i.e. we
ignore RLS visibility when checking those kinds of constraints.

For FK constraints, OTOH, the most conservative assumption is that
no rows are visible. But that is meaningless, since it will simply reject
all possible rows. Having thus no chance of enforcing the constraint
ourselves under all possible policies, the best we can do is to at least
make it possible for the constraint to work correctly for as many policies
as possible. Now, if we go with KaiGai's suggestion of skipping RLS
while checking FK constraints, the only policy that the constraint will
work correctly for is one which doesn't actually hide any parent rows.
Whereas if we apply RLS checks while checking FK constraints, all policies
which behave consistently for parent and child rows (i.e. don't hide the
former but show the latter) will work correctly. We thus go with the
second option, since the class of working policies is larger.

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/24 Robert Haas <robertmhaas@gmail.com>:
> On Thu, May 24, 2012 at 6:11 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>>> Perhaps when we see that RLS
>>> applies, we should replace the reference to the original table with a
>>> subquery RTE that has the security_barrier flag set - essentially
>>> treating a table with RLS as if it were a security view.
>>>
>> I become to think it is a better approach than tracking origin of each
>> qualifiers. One problem is case handling on update or delete statement.
>>
>> It may be possible to rewrite the update / delete query as follows:
>>
>> From:
>>  UPDATE tbl SET X = X + 1 WHERE f_leak(Y)
>> To:
>>  UPDATE tbl SET X = X + 1 WHERE ctid = (
>>      SELECT * FROM (
>>          SELECT ctid FROM tbl WHERE uname = getpgusername()  <== (*)
>> should have security-barrier
>>      ) AS tbl_subqry WHERE f_leak(Y)
>>  );
>>
>> Expanded sub-queries will have security-barrier flag, so it enforces
>> the "uname = getpgusername()" being checked earlier than f_leak(Y).
>> We may need to measure the performance impact due to the reform.
>
> The problem with this is that it introduces an extra instance of tbl
> into the query - there are now two rather than one.  UPDATE .. FROM is
> supposed to be a way to avoid this, but it's insufficiently general to
> handle all the cases (e.g. UPDATE a LEFT JOIN b can't be written using
> the existing syntax).  Anyway we want to avoid inserting self-joins
> for performance reasons if at all possible.  It should be easy to do
> that in the case of SELECT; UPDATE and DELETE may need a bit more
> work.
>
I'll try to investigate a way to solve the matter without twice scan.
Right now, I have no reasonable ideas. Please give us suggestion
if you have something...

>> I think, this situation is similar to a case when we reference a view
>> without privileges to underlying tables. If Bob set up a view with
>> something "tricky" function, it allows Bob to reference credentials
>> of users who reference the view.
>> More or less, it might be a problem when a user try to invoke
>> a user defined function declared by others.
>> (Thus, sepgsql policy does not allow users to invoke a function
>> declared by another one in different domain; without DBA's checks.)
>
> This is true, but there are still some new threat models.  For
> example, currently, pg_dump isn't going to run any user-defined code
> just because you do SELECT * FROM table, but that will change with
> this patch.  Note that pg_dump need not actually select from views,
> only tables.
>
>> I think it is a good idea not to apply RLS when current user has
>> superuser privilege from perspective of security model consistency,
>> but it is inconsistent to check privileges underlying tables.
>
> Seems like a somewhat random wart, if it's just an exception for
> superusers.  I think we need to do better than that.  For example, at
> my last company, sales reps A and B were permitted to see all
> customers of the company, but sales reps C, D, E, F, G, H, I, and J
> were permitted to see only their own accounts.  Those sorts of
> policies need to be easy to implement.
>
Probably, if "sales_rep" column records its responsible repo, its
security policy is able to be described as: (my_sales_rep() in ('A', 'B') OR sales_rep = my_sales_rep())

Indeed, the design to check underlying table seems to me like
column-level privileges towards table-level privileges, since it
is checked only when user does not have requires privileges
on whole of the table.
However, I have no idea to modify ExecCheckRTEPerms()
regarding to RLS. If we assume RLS is applied when user has
no privileges on tables, the current ExecCheckRTEPerms()
always raises an error towards unprivileged users, prior to
execution of queries.
Isn't it preferable behavior to allow unprivileged users to
reference a table (or columns) when it has RLS policy?

I think, table and column level privilege should be checked
individually, in addition to row-level security policy.

>>> Another idea is to set things up so that the RLS policy function isn't
>>> applied to each row directly; instead, it's invoked once per query and
>>> *returns* a WHERE clause.  This would be a lot more powerful than the
>>> proposed design, because now the table owner can write a function that
>>> imposes quals on some people but not others, which seems very useful.
>>>
>> Sorry, I don't favor this idea. Even if table owner set up a function to
>> generate additional qualifiers, it also has no guarantee the qualifiers
>> are invoked prior to user-given one.
>> It seems to me this approach will have same problem...
>
> It's not intended to solve the qual-ordering problem, just to allow
> additional policy flexibility.
>
At the beginning, I thought it takes complex code to parse
where-clause being provided as security policy, so it is the
reason why I was inclined to give a function, instead of a clause.
But I noticed we already have similar code at CreateTrigger()
to handle it.
Does it give policy flexibility?

>>> It's not clear to me that there is any need for built-in server
>>> functionality here.  If the table owner wants to enforce some sort of
>>> policy regarding INSERT or UPDATE or DELETE, they can already do that
>>> today just by attaching a trigger to the table.  And they can enforce
>>> whatever policy they like that way.  Before designing any new
>>> mechanism, what's wrong with the existing one?
>>>
>> Yes, we don't need any new invent to check the value of new tuples.
>> But it should be done after all the user-defined triggers. Existing
>> trigger does not have a mechanism to enforce order to be invoked.
>> So, what I really implement is a mechanism to inject some pseudo
>> triggers "at tail of the Trigger array".
>
> Start the trigger names with the letter "z".
>
OK, I'd like to postpone this future from the 1st stage.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/24 Robert Haas <robertmhaas@gmail.com>:
> On Thu, May 24, 2012 at 12:00 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>>> Another issue, BTW, are FOREIGN KEY constraints. Should you be allowed to
>>> created references to rows which are invisible to you, or should FOREIGN KEY
>>> constraints be exempt from security policies? I'd say they shouldn't be, i.e.
>>> the policy WHERE clause should be added to constraint checking queries like
>>> usual. But maybe I'm missing some reason why that'd be undesirable…
>>>
>> I agree. The row level security policy should not be applied during FK checks
>> (or other internal stuff; to be harmless). At the previous discussion, it was
>> issued that iteration of FK/PK proving enables malicious one to estimate
>> existence of invisible tuple and its key value, although they cannot see the
>> actual values. It is well documented limitation, thus, user should not use row-
>> level security (or should not use natural key) if they cannot accept
>> this limitation.
>
> You say "I agree", but it seems to me that you and Florian are in fact
> taking opposite positions.
>
Sorry, I misread what he described.

> FWIW, I'm inclined to think that you should NOT be able to create a
> row that references an invisible row.  You might end up with that
> situation anyway, because we don't know what the semantics of the
> security policy are: rows might become visible or invisible after the
> fact, and we can't police that.  But I think that if you take the
> opposite position that the select queries inside fkey triggers ought
> to be exempt from security policy, then you need to build some new
> mechanism to make that happen, which seems like extra work for no
> benefit.
>
I think it is fair enough for RI_FKey_check_ins and RI_FKey_check_upd;
RLS policy inside these trigger function will exhibit to create a row that
references invisible row.

However, it should not be applied on triggers being set on PK tables,
because it allows to modify or delete primary-key being referenced by
invisible foreign-key from the viewpoint of operators.
I think, it makes sense to have exceptional cases; to make sure
foreign-key constraint at the baseline.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/25 Florian Pflug <fgp@phlo.org>:
> On May24, 2012, at 19:25 , Robert Haas wrote:
>> FWIW, I'm inclined to think that you should NOT be able to create a
>> row that references an invisible row.  You might end up with that
>> situation anyway, because we don't know what the semantics of the
>> security policy are: rows might become visible or invisible after the
>> fact, and we can't police that.
>
> Right. I just realized, however, that there's another case which wasn't
> considered yet, which is how to handle the initial check during
> ALTER TABEL ADD CONSTRAINT. I'm thinking that it's fine to only consider
> visible rows in the parent table there too, but we should be checking
> all rows in the child table. The easiest way would be to restrict
> ALTER TABLE ADD CONSTRAINT to the table owner for tables with RLS
> (it seems that currently the REFERENCES privilege is sufficient), and
> make the table owner exempt from RLS on that table. The latter means you'd
> need at least two roles to use RLS, but anyone security-conscious enough
> to use RLS will probably not user the same role for DDL and DML operations
> anyway...
>
I think, the RLS policy should perform as a view with qualifiers.
It means database constraints have to be kept from the viewpoint of
any clients, so, any orphan foreign-keys or any rows violating check
constraint should not exist.

In case when a user insert a row with foreign-key, it is an exceptional
case. Even if he cannot insert a foreign-key that references invisible
primary-key, it never breaks database constraints from the viewpoint
of other folks.

My standpoint deals with database constraints as first class customer
that should be always kept in the lowest level, even though a part of
results would be filtered out due to RLS.
Isn't it a simple enough criteria?

>> But I think that if you take the
>> opposite position that the select queries inside fkey triggers ought
>> to be exempt from security policy, then you need to build some new
>> mechanism to make that happen, which seems like extra work for no
>> benefit.
>
> Hm, interesting angle. Continuing this thought, without any extra work,
> UNIQUE and EXCLUSION constraints *will* be enforced regardless of row
> visibility, because their implementation isn't SPI-based but instead
> detects conflicts while inserting tuples into the index.
>
> For being so obviously inconsistent in its treatment of UNIQUE and
> EXCLUSION constraints vs. FK constraints, this feels surprisingly
> right. So, to prevent design by accident, here's an attempt to explain
> that divergence.
>
> For UNIQUE and EXCLUSION constraints, the most conservative assumption
> possible is that all rows are visible, since that leads to the most
> rejections. With that assumption, no matter what the actual policy is,
> the data returned by a query will always satisfy the constraint. Plus,
> the constraint is still sensible because it neither rejects nor allows
> all rows. So that conservative assumption is the one we make, i.e. we
> ignore RLS visibility when checking those kinds of constraints.
>
> For FK constraints, OTOH, the most conservative assumption is that
> no rows are visible. But that is meaningless, since it will simply reject
> all possible rows. Having thus no chance of enforcing the constraint
> ourselves under all possible policies, the best we can do is to at least
> make it possible for the constraint to work correctly for as many policies
> as possible. Now, if we go with KaiGai's suggestion of skipping RLS
> while checking FK constraints, the only policy that the constraint will
> work correctly for is one which doesn't actually hide any parent rows.
> Whereas if we apply RLS checks while checking FK constraints, all policies
> which behave consistently for parent and child rows (i.e. don't hide the
> former but show the latter) will work correctly. We thus go with the
> second option, since the class of working policies is larger.
>
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Alastair Turner
Дата:
Excerpts from Kohei KaiGai <kaigai@kaigai.gr.jp> wrote on Fri, May 25,
2012 at 11:08 PM:
> If we assume RLS is applied when user has
> no privileges on tables, the current ExecCheckRTEPerms()
> always raises an error towards unprivileged users, prior to
> execution of queries.
> Isn't it preferable behavior to allow unprivileged users to
> reference a table (or columns) when it has RLS policy?
>
Rather than assuming the the RLS checks will be applied when there are
no privileges it would make sense to regard RLS as a limitation on the
scope of a particular privilege. This makes RLS a property of a
particular grant of a privilege rather than of the table. Viewed this
way it is possible to control which users are subject to restrictions
imposed by the RLS function, which will avoid RLS overhead for
operations which don't require it while allowing checks for those that
do, provide a mechanism exempting object owners from RLS checks and
make it possible to avoid pg_dump calling user defined code.

This suggests an alternative declaration syntax, to use the salesman example:

GRANT SELECT ON TABLE client_detail TO super_sales_group;
GRANT SELECT TABLE ON client_detail TO limited_sales_group WITH
(QUALIFICATION FUNCTION sales_view_check);

and since more easily usable security features will be used more
effectively, a possible future inlining of the condition:

GRANT SELECT ON TABLE client_detail TO limited_sales_group WHERE
sales_rep = my_sales_rep();

Regards,

Alastair.


Re: [RFC] Interface of Row Level Security

От
Noah Misch
Дата:
On Thu, May 24, 2012 at 07:31:37PM +0200, Florian Pflug wrote:
> On May24, 2012, at 18:42 , Kohei KaiGai wrote:
> > As we discussed, it causes a problem with approach to append
> > additional qualifiers to where clause implicitly, because it does
> > not solve the matter corresponding to the order to execute
> > qualifiers. So, I'm inclined to the approach to replace reference
> > to tables with security policy by sub-queries with security barrier
> > flag.
> 
> Since the security barrier flag carries a potentially hefty performance
> penalty, I think it should be optional. Application which don't allow
> SQL-level access to the database might still benefit from row-level security,
> because it saves them from having to manually add the WHERE clause to every
> statement, or having to wrap all their tables with views. Yet without direct
> SQL-level access, the security barrier thing isn't really necessary, so
> it'd be nice if they wouldn't have to pay for it. How about
> 
>   ALTER TABLE ? SET ROW POLICY ? WITH (security_barrier)

The conventional case for a RLS facility is to wholly implement a security
policy, so security_barrier should be the default.  Using the same facility to
implement a security policy in cooperation with a trusted query generator is
the variant case.

Backward compatibility concerns limited our options when retrofitting the
security_barrier treatment for views, but I'd rather not add a knob completely
disabling it in the context of a brand new feature.  A better avenue is to
enhance our facilities for identifying safe query fragments.  For example,
ALTER FUNCTION ... LEAKPROOF is superuser-only.  Adding a way for a table
owner to similarly trust functions for the purpose of his own tables would
help close the gap that motivates such an all-or-nothing knob.

Thanks,
nm


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On May28, 2012, at 02:46 , Noah Misch wrote:
> On Thu, May 24, 2012 at 07:31:37PM +0200, Florian Pflug wrote:
>> Since the security barrier flag carries a potentially hefty performance
>> penalty, I think it should be optional. Application which don't allow
>> SQL-level access to the database might still benefit from row-level security,
>> because it saves them from having to manually add the WHERE clause to every
>> statement, or having to wrap all their tables with views. Yet without direct
>> SQL-level access, the security barrier thing isn't really necessary, so
>> it'd be nice if they wouldn't have to pay for it. How about
>>
>>  ALTER TABLE ? SET ROW POLICY ? WITH (security_barrier)
>
> Backward compatibility concerns limited our options when retrofitting the
> security_barrier treatment for views, but I'd rather not add a knob completely
> disabling it in the context of a brand new feature.  A better avenue is to
> enhance our facilities for identifying safe query fragments.  For example,
> ALTER FUNCTION ... LEAKPROOF is superuser-only.  Adding a way for a table
> owner to similarly trust functions for the purpose of his own tables would
> help close the gap that motivates such an all-or-nothing knob.

Hm, I'm not sure a per-function flag is really the solution here. Neither,
however, is a per-RLS flag as your arguments made me realize. There really are
three orthogonal concepts here, all of which allow security barriers to be
ignored, namely
 A) Trusting the use not to exploit leaks, i.e. what you call a trusted query    generator
 B) There being no leaks, i.e. all functions being LEAKPROOF
 C) Not caring about leaks, i.e. the security_barrier flag

Concept B is handled adequately by the LEAKPROOF flag. Concept C is handled
by the security_barrier flag. However, as you pointed out, it's a bit of a
dubious concept and only really necessary for backwards compatibility because
it reflects pre-9.2 behaviour.

Concept A is what I was aiming for. Per the above, a per-RLS flag is clearly the
wrong tool for the job, so consider my suggestion withdrawn. What we actually
want, I think, is a per-role flag which marks a role as "leak trusted". Queries
issued by such a role would behave as if all functions are LEAKPROOF, since even
if there is a leak, the role is trusted not to exploit it.

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On May25, 2012, at 23:32 , Kohei KaiGai wrote:
> However, it should not be applied on triggers being set on PK tables,
> because it allows to modify or delete primary-key being referenced by
> invisible foreign-key from the viewpoint of operators.
> I think, it makes sense to have exceptional cases; to make sure
> foreign-key constraint at the baseline.

Good point. If we simply ignore RLS policies when executing parent-side
FK constraint triggers, however, we'll happily UPDATE or DELETE invisible
rows should the constraint be defined as CASCADE or SET NULL. Which is,
um, not exactly desirable…

FK triggers already deal with a related issue, namely a child row being
invisible to a REPEATABLE READ transaction which attempts to update the
parent row. We handle this case by always executing queries in
parent-side FK constraint triggers with a newly created READ COMMITTED
snapshot, but verifying that all matching rows are also visible under
the transaction's actual REPEATABLE READ snapshot. If some are not, we
throw a serialization error.

FK constraints in the presence of RLS should do the same, I think. When
executing a parent-side constraint trigger we should *not* apply the
RLS policy as a filter, but instead check that all returned rows are
indeed visible according to the RLS policy. If some are not we throw,
say, a "policy enforcement error".

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/27 Alastair Turner <bell@ctrlf5.co.za>:
> Excerpts from Kohei KaiGai <kaigai@kaigai.gr.jp> wrote on Fri, May 25,
> 2012 at 11:08 PM:
>> If we assume RLS is applied when user has
>> no privileges on tables, the current ExecCheckRTEPerms()
>> always raises an error towards unprivileged users, prior to
>> execution of queries.
>> Isn't it preferable behavior to allow unprivileged users to
>> reference a table (or columns) when it has RLS policy?
>>
> Rather than assuming the the RLS checks will be applied when there are
> no privileges it would make sense to regard RLS as a limitation on the
> scope of a particular privilege. This makes RLS a property of a
> particular grant of a privilege rather than of the table. Viewed this
> way it is possible to control which users are subject to restrictions
> imposed by the RLS function, which will avoid RLS overhead for
> operations which don't require it while allowing checks for those that
> do, provide a mechanism exempting object owners from RLS checks and
> make it possible to avoid pg_dump calling user defined code.
>
> This suggests an alternative declaration syntax, to use the salesman example:
>
> GRANT SELECT ON TABLE client_detail TO super_sales_group;
> GRANT SELECT TABLE ON client_detail TO limited_sales_group WITH
> (QUALIFICATION FUNCTION sales_view_check);
>
> and since more easily usable security features will be used more
> effectively, a possible future inlining of the condition:
>
> GRANT SELECT ON TABLE client_detail TO limited_sales_group WHERE
> sales_rep = my_sales_rep();
>
It seems to me an interesting proposition, because I didn't thought such kind of
statement to provide row-level policies.

We have a common issue for the idea that check privileges of underlying tables;
when we should check the privileges and make a decision whether RLS policy is
appended, or not.
Due to query optimization reason, the RLS policy should be appended prior to
the query optimization. At least, we want to utilize RLS policy for index scans,
rather than sequential scan.

On the other hand, all the permission checks are currently applied at executor
stage, not planner stage. Table / column-level privileges are also applied at
head of the executor. It is headache, if we go ahead with an idea to integrate
RLS and existing privilege checks.
One exception is simplify_function() that tries to expand enough simple SQL
functions if current user has ACL_EXECUTE privilege at planner stage.

One other issue is whether we should allow any users with enough privileges
to bypass RLS, or only superusers. I'm still not sure how the existing checks
perform with RLS.
If and when Alice has SELECT permission on column X and Y of TBL with RLS,
but her query tries to reference X,Y and Z. In this case, existing privilege
mechanism shall raise an error, but the criteria with underlying table allows to
run this query. It seems to me it is a straightforward criteria to
limit superusers
to bypass RLS...

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/28 Florian Pflug <fgp@phlo.org>:
> On May28, 2012, at 02:46 , Noah Misch wrote:
>> On Thu, May 24, 2012 at 07:31:37PM +0200, Florian Pflug wrote:
>>> Since the security barrier flag carries a potentially hefty performance
>>> penalty, I think it should be optional. Application which don't allow
>>> SQL-level access to the database might still benefit from row-level security,
>>> because it saves them from having to manually add the WHERE clause to every
>>> statement, or having to wrap all their tables with views. Yet without direct
>>> SQL-level access, the security barrier thing isn't really necessary, so
>>> it'd be nice if they wouldn't have to pay for it. How about
>>>
>>>  ALTER TABLE ? SET ROW POLICY ? WITH (security_barrier)
>>
>> Backward compatibility concerns limited our options when retrofitting the
>> security_barrier treatment for views, but I'd rather not add a knob completely
>> disabling it in the context of a brand new feature.  A better avenue is to
>> enhance our facilities for identifying safe query fragments.  For example,
>> ALTER FUNCTION ... LEAKPROOF is superuser-only.  Adding a way for a table
>> owner to similarly trust functions for the purpose of his own tables would
>> help close the gap that motivates such an all-or-nothing knob.
>
> Hm, I'm not sure a per-function flag is really the solution here. Neither,
> however, is a per-RLS flag as your arguments made me realize. There really are
> three orthogonal concepts here, all of which allow security barriers to be
> ignored, namely
>
>  A) Trusting the use not to exploit leaks, i.e. what you call a trusted query
>     generator
>
>  B) There being no leaks, i.e. all functions being LEAKPROOF
>
>  C) Not caring about leaks, i.e. the security_barrier flag
>
> Concept B is handled adequately by the LEAKPROOF flag. Concept C is handled
> by the security_barrier flag. However, as you pointed out, it's a bit of a
> dubious concept and only really necessary for backwards compatibility because
> it reflects pre-9.2 behaviour.
>
> Concept A is what I was aiming for. Per the above, a per-RLS flag is clearly the
> wrong tool for the job, so consider my suggestion withdrawn. What we actually
> want, I think, is a per-role flag which marks a role as "leak trusted". Queries
> issued by such a role would behave as if all functions are LEAKPROOF, since even
> if there is a leak, the role is trusted not to exploit it.
>
It seems to me we are confusing about security-barrier and leakproof flag.
The purpose of leakproof flag is to ensure the function is safe to execute,
so it is allowed to pushed down the function into a sub-query with security-
barrier. It works to decide whether the user given qualifier is safe to
push-down. The RLS policy itself is placed within a sub-query from the
beginning, thus not needed them to be leakproof function.

Existing view with security-barrier flag cannot prevent information leaks,
even when the owner set leaky function on its qualifiers. However, it is
owner's responsibility. In a similar fashion, I don't think RLS works fine
in case when owner set leaky function as its policy by himself.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On May29, 2012, at 13:59 , Kohei KaiGai wrote:
> 2012/5/28 Florian Pflug <fgp@phlo.org>:
>> Concept B is handled adequately by the LEAKPROOF flag. Concept C is handled
>> by the security_barrier flag. However, as you pointed out, it's a bit of a
>> dubious concept and only really necessary for backwards compatibility because
>> it reflects pre-9.2 behaviour.
>>
>> Concept A is what I was aiming for. Per the above, a per-RLS flag is clearly the
>> wrong tool for the job, so consider my suggestion withdrawn. What we actually
>> want, I think, is a per-role flag which marks a role as "leak trusted". Queries
>> issued by such a role would behave as if all functions are LEAKPROOF, since even
>> if there is a leak, the role is trusted not to exploit it.
>>
> It seems to me we are confusing about security-barrier and leakproof flag.
> The purpose of leakproof flag is to ensure the function is safe to execute,
> so it is allowed to pushed down the function into a sub-query with security-
> barrier. It works to decide whether the user given qualifier is safe to
> push-down. The RLS policy itself is placed within a sub-query from the
> beginning, thus not needed them to be leakproof function.

My suggestion (which I then withdrew) was for that per-policy flag to decide
whether the sub-query which applies the RLS policy acts as a security barrier
or not. Thus, with the flag set, only leakproof predicates would haven been
allowed to be pushed down into that subquery, whereas without the flag every
predicate would haven been a candidate for being pushed down. The flag was
never intended to mark the RLS policy itself as leakproof or leaky - that,
as you said, makes little sense.

My motivation for suggesting that flag was to prevent people who want RLS,
yet aren't concerned about leaks, from having to pay the performance penalty
associated with not pushing down predicates.

Noah's comments, however, made me realize that whether one cares about
potential leaks is usually not a per-table property, but rather a property
of the user executing the query. Some users (like the middle-ware that sits
on top of your database) you might trust to not exploit leaks, while wanting
the tightest security possible for others. Which made me suggest a per-role
flag which essentially overrides the security barrier stuff. Explaining that
behaviour as "behave as if all functions are LEAKPROOF" might haven been a
tad confusion, though. Maybe a better explanation is "behave as if no
sub-query has the security barrier flag set", or even "don't let security
concerns prevent predicate push-down".

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On May29, 2012, at 13:37 , Kohei KaiGai wrote:
> 2012/5/27 Alastair Turner <bell@ctrlf5.co.za>:
>> Excerpts from Kohei KaiGai <kaigai@kaigai.gr.jp> wrote on Fri, May 25,
>> 2012 at 11:08 PM:
>>> If we assume RLS is applied when user has
>>> no privileges on tables, the current ExecCheckRTEPerms()
>>> always raises an error towards unprivileged users, prior to
>>> execution of queries.
>>> Isn't it preferable behavior to allow unprivileged users to
>>> reference a table (or columns) when it has RLS policy?

I don't think so. Per-table and per-column permission checks should still
apply independent from any RLS policy. You can always grant SELECT access
to PUBLIC if want to rely solely on the RLS policy...

>> Rather than assuming the the RLS checks will be applied when there are
>> no privileges it would make sense to regard RLS as a limitation on the
>> scope of a particular privilege. This makes RLS a property of a
>> particular grant of a privilege rather than of the table. Viewed this
>> way it is possible to control which users are subject to restrictions
>> imposed by the RLS function, which will avoid RLS overhead for
>> operations which don't require it while allowing checks for those that
>> do, provide a mechanism exempting object owners from RLS checks and
>> make it possible to avoid pg_dump calling user defined code.
>>
>> This suggests an alternative declaration syntax, to use the salesman example:
>>
>> GRANT SELECT ON TABLE client_detail TO super_sales_group;
>> GRANT SELECT TABLE ON client_detail TO limited_sales_group WITH
>> (QUALIFICATION FUNCTION sales_view_check);
>>
>> and since more easily usable security features will be used more
>> effectively, a possible future inlining of the condition:
>>
>> GRANT SELECT ON TABLE client_detail TO limited_sales_group WHERE
>> sales_rep = my_sales_rep();
>>
> It seems to me an interesting proposition, because I didn't thought such kind of
> statement to provide row-level policies.

Note, though, that due to role inheritance, multiple such QUALIFICATION FUNCTIONS
could be "in scope" for a single table. In that case, I guess you'd have to
OR them together, i.e. hide a row only if all of them return false.

> We have a common issue for the idea that check privileges of underlying tables;
> when we should check the privileges and make a decision whether RLS policy is
> appended, or not.
> Due to query optimization reason, the RLS policy should be appended prior to
> the query optimization. At least, we want to utilize RLS policy for index scans,
> rather than sequential scan.

Agreed. With the current design, the RLS policy has to be applied before planning,
not during execution.

> One other issue is whether we should allow any users with enough privileges
> to bypass RLS, or only superusers. I'm still not sure how the existing checks
> perform with RLS.

Every user who has the power to disable the RLS policy should also at least be
able to circumvent it temporarily, I think. This includes superusers, the table
owner (since he's presumably allowed to do ALTER TABLE .. RESET ROW POLICY),
and maybe the database owner.

> If and when Alice has SELECT permission on column X and Y of TBL with RLS,
> but her query tries to reference X,Y and Z. In this case, existing privilege
> mechanism shall raise an error, but the criteria with underlying table allows to
> run this query. It seems to me it is a straightforward criteria to
> limit superusers
> to bypass RLS…

I'm not sure I understand. Do you mean the Alice references only columns X
and Y in her query, but the RLS policy adds the reference to column Z?

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/29 Florian Pflug <fgp@phlo.org>:
> On May29, 2012, at 13:59 , Kohei KaiGai wrote:
>> 2012/5/28 Florian Pflug <fgp@phlo.org>:
>>> Concept B is handled adequately by the LEAKPROOF flag. Concept C is handled
>>> by the security_barrier flag. However, as you pointed out, it's a bit of a
>>> dubious concept and only really necessary for backwards compatibility because
>>> it reflects pre-9.2 behaviour.
>>>
>>> Concept A is what I was aiming for. Per the above, a per-RLS flag is clearly the
>>> wrong tool for the job, so consider my suggestion withdrawn. What we actually
>>> want, I think, is a per-role flag which marks a role as "leak trusted". Queries
>>> issued by such a role would behave as if all functions are LEAKPROOF, since even
>>> if there is a leak, the role is trusted not to exploit it.
>>>
>> It seems to me we are confusing about security-barrier and leakproof flag.
>> The purpose of leakproof flag is to ensure the function is safe to execute,
>> so it is allowed to pushed down the function into a sub-query with security-
>> barrier. It works to decide whether the user given qualifier is safe to
>> push-down. The RLS policy itself is placed within a sub-query from the
>> beginning, thus not needed them to be leakproof function.
>
> My suggestion (which I then withdrew) was for that per-policy flag to decide
> whether the sub-query which applies the RLS policy acts as a security barrier
> or not. Thus, with the flag set, only leakproof predicates would haven been
> allowed to be pushed down into that subquery, whereas without the flag every
> predicate would haven been a candidate for being pushed down. The flag was
> never intended to mark the RLS policy itself as leakproof or leaky - that,
> as you said, makes little sense.
>
> My motivation for suggesting that flag was to prevent people who want RLS,
> yet aren't concerned about leaks, from having to pay the performance penalty
> associated with not pushing down predicates.
>
I think it is a reasonable selection. For example, it make sense in case when
users obviously don't have privilege to create a function and don't care about
estimation of invisible values using iteration of proving.
The owner is the only person who can determine whether it is harmless, or not.

> Noah's comments, however, made me realize that whether one cares about
> potential leaks is usually not a per-table property, but rather a property
> of the user executing the query. Some users (like the middle-ware that sits
> on top of your database) you might trust to not exploit leaks, while wanting
> the tightest security possible for others. Which made me suggest a per-role
> flag which essentially overrides the security barrier stuff. Explaining that
> behaviour as "behave as if all functions are LEAKPROOF" might haven been a
> tad confusion, though. Maybe a better explanation is "behave as if no
> sub-query has the security barrier flag set", or even "don't let security
> concerns prevent predicate push-down".
>
Hmm... It might make sense to allow table-owner to set up suitable grade
between security and performance. However, isn't it a feature to be
discussed in the 2nd commit-fest? I think we can construct this type of
adjustment on the basis of minimum functionality.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On May29, 2012, at 16:13 , Kohei KaiGai wrote:
> 2012/5/29 Florian Pflug <fgp@phlo.org>:
>> My motivation for suggesting that flag was to prevent people who want RLS,
>> yet aren't concerned about leaks, from having to pay the performance penalty
>> associated with not pushing down predicates.
>>
> I think it is a reasonable selection. For example, it make sense in case when
> users obviously don't have privilege to create a function and don't care about
> estimation of invisible values using iteration of proving.
> The owner is the only person who can determine whether it is harmless, or not.

Nice to know that we're on the same page here.

>> Noah's comments, however, made me realize that whether one cares about
>> potential leaks is usually not a per-table property, but rather a property
>> of the user executing the query. Some users (like the middle-ware that sits
>> on top of your database) you might trust to not exploit leaks, while wanting
>> the tightest security possible for others. Which made me suggest a per-role
>> flag which essentially overrides the security barrier stuff. Explaining that
>> behaviour as "behave as if all functions are LEAKPROOF" might haven been a
>> tad confusion, though. Maybe a better explanation is "behave as if no
>> sub-query has the security barrier flag set", or even "don't let security
>> concerns prevent predicate push-down".
>>
> Hmm... It might make sense to allow table-owner to set up suitable grade
> between security and performance. However, isn't it a feature to be
> discussed in the 2nd commit-fest? I think we can construct this type of
> adjustment on the basis of minimum functionality.

Agreed. If the flag is per-role, not per-policy, the feature is orthogonal
to the whole RLS feature. So yeah, let's postpone it to a later date.

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/29 Florian Pflug <fgp@phlo.org>:
> On May29, 2012, at 13:37 , Kohei KaiGai wrote:
>> 2012/5/27 Alastair Turner <bell@ctrlf5.co.za>:
>>> Excerpts from Kohei KaiGai <kaigai@kaigai.gr.jp> wrote on Fri, May 25,
>>> 2012 at 11:08 PM:
>>>> If we assume RLS is applied when user has
>>>> no privileges on tables, the current ExecCheckRTEPerms()
>>>> always raises an error towards unprivileged users, prior to
>>>> execution of queries.
>>>> Isn't it preferable behavior to allow unprivileged users to
>>>> reference a table (or columns) when it has RLS policy?
>
> I don't think so. Per-table and per-column permission checks should still
> apply independent from any RLS policy. You can always grant SELECT access
> to PUBLIC if want to rely solely on the RLS policy...
>
Frankly, I have same opinion. Existing per-table and column permission
checks should perform independently from RLS. What I wanted to explain
is the combination of them makes confusion for us.

>>> Rather than assuming the the RLS checks will be applied when there are
>>> no privileges it would make sense to regard RLS as a limitation on the
>>> scope of a particular privilege. This makes RLS a property of a
>>> particular grant of a privilege rather than of the table. Viewed this
>>> way it is possible to control which users are subject to restrictions
>>> imposed by the RLS function, which will avoid RLS overhead for
>>> operations which don't require it while allowing checks for those that
>>> do, provide a mechanism exempting object owners from RLS checks and
>>> make it possible to avoid pg_dump calling user defined code.
>>>
>>> This suggests an alternative declaration syntax, to use the salesman example:
>>>
>>> GRANT SELECT ON TABLE client_detail TO super_sales_group;
>>> GRANT SELECT TABLE ON client_detail TO limited_sales_group WITH
>>> (QUALIFICATION FUNCTION sales_view_check);
>>>
>>> and since more easily usable security features will be used more
>>> effectively, a possible future inlining of the condition:
>>>
>>> GRANT SELECT ON TABLE client_detail TO limited_sales_group WHERE
>>> sales_rep = my_sales_rep();
>>>
>> It seems to me an interesting proposition, because I didn't thought such kind of
>> statement to provide row-level policies.
>
> Note, though, that due to role inheritance, multiple such QUALIFICATION FUNCTIONS
> could be "in scope" for a single table. In that case, I guess you'd have to
> OR them together, i.e. hide a row only if all of them return false.
>
Nice catch! Indeed, it makes RLS policy to be applied complex, and hard to apply
index-scan.

>> We have a common issue for the idea that check privileges of underlying tables;
>> when we should check the privileges and make a decision whether RLS policy is
>> appended, or not.
>> Due to query optimization reason, the RLS policy should be appended prior to
>> the query optimization. At least, we want to utilize RLS policy for index scans,
>> rather than sequential scan.
>
> Agreed. With the current design, the RLS policy has to be applied before planning,
> not during execution.
>
>> One other issue is whether we should allow any users with enough privileges
>> to bypass RLS, or only superusers. I'm still not sure how the existing checks
>> perform with RLS.
>
> Every user who has the power to disable the RLS policy should also at least be
> able to circumvent it temporarily, I think. This includes superusers, the table
> owner (since he's presumably allowed to do ALTER TABLE .. RESET ROW POLICY),
> and maybe the database owner.
>
I'm not inclined this criteria, because existing privilege system
allows to restrict
owner's behavior, even though it can be invalidated by owner itself.
In other words, the privilege mechanism performs according to the current
setting unless it is not disabled by owner itself.
On the other hand, superuser is allowed to "ignore" the security setting.
In RLS situation, it is corresponding to run the query without RLS policy.

>> If and when Alice has SELECT permission on column X and Y of TBL with RLS,
>> but her query tries to reference X,Y and Z. In this case, existing privilege
>> mechanism shall raise an error, but the criteria with underlying table allows to
>> run this query. It seems to me it is a straightforward criteria to
>> limit superusers
>> to bypass RLS…
>
> I'm not sure I understand. Do you mean the Alice references only columns X
> and Y in her query, but the RLS policy adds the reference to column Z?
>
No, in this scenario, Alice tries to reference X,Y and Z, then it
should be violated.
I wanted to show is; If RLS performs only when current user has no table/column
level privilege, it makes strange behavior.
Here is no matter if RLS works independently from per table / column
permissions.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Robert Haas
Дата:
On Tue, May 29, 2012 at 9:12 AM, Florian Pflug <fgp@phlo.org> wrote:
> On May29, 2012, at 13:37 , Kohei KaiGai wrote:
>> 2012/5/27 Alastair Turner <bell@ctrlf5.co.za>:
>>> Excerpts from Kohei KaiGai <kaigai@kaigai.gr.jp> wrote on Fri, May 25,
>>> 2012 at 11:08 PM:
>>>> If we assume RLS is applied when user has
>>>> no privileges on tables, the current ExecCheckRTEPerms()
>>>> always raises an error towards unprivileged users, prior to
>>>> execution of queries.
>>>> Isn't it preferable behavior to allow unprivileged users to
>>>> reference a table (or columns) when it has RLS policy?
>
> I don't think so. Per-table and per-column permission checks should still
> apply independent from any RLS policy. You can always grant SELECT access
> to PUBLIC if want to rely solely on the RLS policy...

I strongly agree with this.  Permissions checks should be applied
first, so that we can boot anyone who has no business being there at
all as early in the process as possible.  As a second step, if you
have a right to the query the table, row-level security should
intervene to control which rows you can see.

One idea might be to have a grantable permission that permits the RLS
policy to be bypassed.  So, if a user has only SELECT permission, they
can select from the table, but the RLS policy will apply.  If they
have both SELECT and RLSBYPASS (probably not what we really want to
call it) permission, then they can select from the table and the RLS
policy will be skipped.  This means that superusers automatically skip
all RLS policies (which seems right) and table owners skip them by
default (but could revoke their own privileges) and other people can
skip them if the table owner (or the superuser) grants them the
appropriate privilege on the table involved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On May29, 2012, at 16:34 , Robert Haas wrote:
> One idea might be to have a grantable permission that permits the RLS
> policy to be bypassed.  So, if a user has only SELECT permission, they
> can select from the table, but the RLS policy will apply.  If they
> have both SELECT and RLSBYPASS (probably not what we really want to
> call it) permission, then they can select from the table and the RLS
> policy will be skipped.  This means that superusers automatically skip
> all RLS policies (which seems right) and table owners skip them by
> default (but could revoke their own privileges) and other people can
> skip them if the table owner (or the superuser) grants them the
> appropriate privilege on the table involved.

I like it. Seems to support all use-cases I can come up with, and extends
existing privilege semantics in a natural way.

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/29 Robert Haas <robertmhaas@gmail.com>:
> One idea might be to have a grantable permission that permits the RLS
> policy to be bypassed.  So, if a user has only SELECT permission, they
> can select from the table, but the RLS policy will apply.  If they
> have both SELECT and RLSBYPASS (probably not what we really want to
> call it) permission, then they can select from the table and the RLS
> policy will be skipped.  This means that superusers automatically skip
> all RLS policies (which seems right) and table owners skip them by
> default (but could revoke their own privileges) and other people can
> skip them if the table owner (or the superuser) grants them the
> appropriate privilege on the table involved.
>
Isn't it unavailable to describe using RLS policy?
In case when 'alice' and 'bob' should bypass RLS policy on a certain table,
we will be able to describe it as follows:   (current_user in ('alice', 'bob') OR rls_policy_this_table(X, Y, Z))

I have one concern the "current_user in (...)" is not wiped out at the planner
stage, although its evaluation result is obvious prior to execution.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Robert Haas
Дата:
On Fri, May 25, 2012 at 5:08 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>>> I think it is a good idea not to apply RLS when current user has
>>> superuser privilege from perspective of security model consistency,
>>> but it is inconsistent to check privileges underlying tables.
>>
>> Seems like a somewhat random wart, if it's just an exception for
>> superusers.  I think we need to do better than that.  For example, at
>> my last company, sales reps A and B were permitted to see all
>> customers of the company, but sales reps C, D, E, F, G, H, I, and J
>> were permitted to see only their own accounts.  Those sorts of
>> policies need to be easy to implement.
>>
> Probably, if "sales_rep" column records its responsible repo, its
> security policy is able to be described as:
>  (my_sales_rep() in ('A', 'B') OR sales_rep = my_sales_rep())

Yes, but that's a pain to optimize.  When A or B tries to select from
the table, the query optimizer has to realize that my_sales_rep() is
stable, inline it, do constant simplification and throw away the
entire OR clause.  Note that this won't work today, because we only
constant-fold immutable functions, not stable ones.  Then, since there
are no remaining security quals, we have to realize that we actually
don't need the security_barrier subquery RTE at all, and optimize that
away as well.  Maybe we can make all of that work, and maybe we should
make all of it work, but it's fairly complex.  The advantage of having
the function return the qual rather than contain the qual is that all
of that goes away.  The function can choose to return nothing (no RLS
for this user) or it can choose to return something (which will likely
be simpler than what it would have needed to return out of the chute).One disadvantage is that we have to parse the
returnedqual instead 
of just sucking in a node-tree.

Anyway, I don't feel super-strongly about this particular idea, so if
I'm the only one who likes it, fine, but that having been said, I
think users are going to want a *declarative* way to control which
policies are applied to which users.  Suppose Bob is a sales rep who
is only allowed to see his own customers, but then one day, we decide
we trust Bob after all, so we want to let him see everything.  We
could go back and update the IN (...) list in the security policy
function, but that's an ugly and unscalable nuisance, especially if
we've got 10,000 users.  It's much nicer to be able to just grant bob
a permission using some kind of, well, GRANT command.  That's what
we're doing, after all.  Alastair's proposal of making the security
policy a property of the GRANT is one way of tackling that, and the
RLSBYPASS permission I proposed elsewhere is another.  Something along
these lines seems likely to improve performance (by replacing a query
optimization problem with a syscache lookup) as well as ease-of-use.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [RFC] Interface of Row Level Security

От
Robert Haas
Дата:
On Tue, May 29, 2012 at 10:57 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> 2012/5/29 Robert Haas <robertmhaas@gmail.com>:
>> One idea might be to have a grantable permission that permits the RLS
>> policy to be bypassed.  So, if a user has only SELECT permission, they
>> can select from the table, but the RLS policy will apply.  If they
>> have both SELECT and RLSBYPASS (probably not what we really want to
>> call it) permission, then they can select from the table and the RLS
>> policy will be skipped.  This means that superusers automatically skip
>> all RLS policies (which seems right) and table owners skip them by
>> default (but could revoke their own privileges) and other people can
>> skip them if the table owner (or the superuser) grants them the
>> appropriate privilege on the table involved.
>>
> Isn't it unavailable to describe using RLS policy?
> In case when 'alice' and 'bob' should bypass RLS policy on a certain table,
> we will be able to describe it as follows:
>    (current_user in ('alice', 'bob') OR rls_policy_this_table(X, Y, Z))
>
> I have one concern the "current_user in (...)" is not wiped out at the planner
> stage, although its evaluation result is obvious prior to execution.

Yes, that's one problem with doing it that way.  The fact that the
superuser is not guaranteed-excluded is another; that can of course be
fixed by adding a special-case hack for superusers, but IMHO this is
more elegant.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On May29, 2012, at 16:57 , Kohei KaiGai wrote:
> 2012/5/29 Robert Haas <robertmhaas@gmail.com>:
>> One idea might be to have a grantable permission that permits the RLS
>> policy to be bypassed.  So, if a user has only SELECT permission, they
>> can select from the table, but the RLS policy will apply.  If they
>> have both SELECT and RLSBYPASS (probably not what we really want to
>> call it) permission, then they can select from the table and the RLS
>> policy will be skipped.  This means that superusers automatically skip
>> all RLS policies (which seems right) and table owners skip them by
>> default (but could revoke their own privileges) and other people can
>> skip them if the table owner (or the superuser) grants them the
>> appropriate privilege on the table involved.

> Isn't it unavailable to describe using RLS policy?
> In case when 'alice' and 'bob' should bypass RLS policy on a certain table,
> we will be able to describe it as follows:
>    (current_user in ('alice', 'bob') OR rls_policy_this_table(X, Y, Z))
>
> I have one concern the "current_user in (...)" is not wiped out at the planner
> stage, although its evaluation result is obvious prior to execution.

I think you'd simply pretend the RLS policy isn't there (i.e., don't rewrite
the table reference into a sub-query) if the user has the RLSBYPASS privilege,
not add another clause which selectively neuters the RLS policy.

Wait a moment.. now I get it.

You're saying that Robert's RLSBYPASS privilege is redundant, since one can
always add a "current_user IN ..." clause to the RLS policy function. That might
be the case theoretically, but I still believe that Robert's suggestion has a
lot of benefits. First, due to role inheritance, a simply clause like the above
isn't sufficient. Second, it doesn't make superusers and table owner bypass
RLS by default. Which I think is a good idea (bypassing by default, that is),
because it prevents broken backups (for those using pg_dump) and data exports.
It doesn't cost anything in terms of security, because those users could disable
the RLS policy away if they are so inclined.

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/29 Robert Haas <robertmhaas@gmail.com>:
> On Fri, May 25, 2012 at 5:08 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>>>> I think it is a good idea not to apply RLS when current user has
>>>> superuser privilege from perspective of security model consistency,
>>>> but it is inconsistent to check privileges underlying tables.
>>>
>>> Seems like a somewhat random wart, if it's just an exception for
>>> superusers.  I think we need to do better than that.  For example, at
>>> my last company, sales reps A and B were permitted to see all
>>> customers of the company, but sales reps C, D, E, F, G, H, I, and J
>>> were permitted to see only their own accounts.  Those sorts of
>>> policies need to be easy to implement.
>>>
>> Probably, if "sales_rep" column records its responsible repo, its
>> security policy is able to be described as:
>>  (my_sales_rep() in ('A', 'B') OR sales_rep = my_sales_rep())
>
> Yes, but that's a pain to optimize.  When A or B tries to select from
> the table, the query optimizer has to realize that my_sales_rep() is
> stable, inline it, do constant simplification and throw away the
> entire OR clause.  Note that this won't work today, because we only
> constant-fold immutable functions, not stable ones.  Then, since there
> are no remaining security quals, we have to realize that we actually
> don't need the security_barrier subquery RTE at all, and optimize that
> away as well.  Maybe we can make all of that work, and maybe we should
> make all of it work, but it's fairly complex.  The advantage of having
> the function return the qual rather than contain the qual is that all
> of that goes away.  The function can choose to return nothing (no RLS
> for this user) or it can choose to return something (which will likely
> be simpler than what it would have needed to return out of the chute).
>  One disadvantage is that we have to parse the returned qual instead
> of just sucking in a node-tree.
>
> Anyway, I don't feel super-strongly about this particular idea, so if
> I'm the only one who likes it, fine, but that having been said, I
> think users are going to want a *declarative* way to control which
> policies are applied to which users.  Suppose Bob is a sales rep who
> is only allowed to see his own customers, but then one day, we decide
> we trust Bob after all, so we want to let him see everything.  We
> could go back and update the IN (...) list in the security policy
> function, but that's an ugly and unscalable nuisance, especially if
> we've got 10,000 users.  It's much nicer to be able to just grant bob
> a permission using some kind of, well, GRANT command.  That's what
> we're doing, after all.  Alastair's proposal of making the security
> policy a property of the GRANT is one way of tackling that, and the
> RLSBYPASS permission I proposed elsewhere is another.  Something along
> these lines seems likely to improve performance (by replacing a query
> optimization problem with a syscache lookup) as well as ease-of-use.
>
My preference is RLSBYPASS permission rather than the approach
with functions that return policy clause at run-time, because it needs
to invalidate prepared statement at random timing.
In case of this function approach, the RLS policy shall be generated
on planner stage, and we cannot have any assumption to the criteria
of RLS policy. A function might generate RLS policy regarding to the
current user id. Yes, it is straightforward. The prepared statement
should be invalidate whenever current user-id got switched.
However, someone may define a function that generate RLS policy
depending on the value of "client_min_messages" for example.
Do we need to invalidate prepared statement whenever GUC get
updated? I think it is overkill. We cannot predicate all the criteria
user want to control the RLS policy using the functions.
So, RLSBYPASS permission is more simple way to limit number of
situations to invalidate prepared statements.


If we would have an "ideal optimizer", I'd still like the optimizer to
wipe out redundant clauses transparently, rather than RLSBYPASS
permissions, because it just controls all-or-nothing stuff.
For example, if tuples are categorized to unclassified, classified or
secret, and RLS policy is configured as: ((current_user IN ('alice', 'bob') AND X IN ('unclassified',
'classified')) OR (X IN 'unclassified)),
superuser can see all the tuples, and alice and bob can see
up to classified tuples.
Is it really hard to wipe out redundant condition at planner stage?
If current_user is obviously 'kaigai', it seems to me the left-side of
this clause can be wiped out at the planner stage.
Do I consider the issue too simple?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Yeb Havinga
Дата:
On 2012-05-30 21:26, Kohei KaiGai wrote:
> If we would have an "ideal optimizer", I'd still like the optimizer to
> wipe out redundant clauses transparently, rather than RLSBYPASS
> permissions, because it just controls all-or-nothing stuff.
> For example, if tuples are categorized to unclassified, classified or
> secret, and RLS policy is configured as:
>    ((current_user IN ('alice', 'bob') AND X IN ('unclassified',
> 'classified')) OR (X IN 'unclassified)),
> superuser can see all the tuples, and alice and bob can see
> up to classified tuples.
> Is it really hard to wipe out redundant condition at planner stage?
> If current_user is obviously 'kaigai', it seems to me the left-side of
> this clause can be wiped out at the planner stage.

The query's RLS policy would be simpler if the RLS policy function that 
returns the WHERE clause would take the user as argument, so its result 
does not contain user conditionals.

IF (current_user IN ('alice', 'bob')
THEN  RETURN X IN ('unclassified', 'classified'))
ELSE  RETURN X IN ('unclassified')
END IF;


regards,
Yeb




Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/31 Yeb Havinga <yebhavinga@gmail.com>:
> On 2012-05-30 21:26, Kohei KaiGai wrote:
>>
>> If we would have an "ideal optimizer", I'd still like the optimizer to
>> wipe out redundant clauses transparently, rather than RLSBYPASS
>> permissions, because it just controls all-or-nothing stuff.
>> For example, if tuples are categorized to unclassified, classified or
>> secret, and RLS policy is configured as:
>>   ((current_user IN ('alice', 'bob') AND X IN ('unclassified',
>> 'classified')) OR (X IN 'unclassified)),
>> superuser can see all the tuples, and alice and bob can see
>> up to classified tuples.
>> Is it really hard to wipe out redundant condition at planner stage?
>> If current_user is obviously 'kaigai', it seems to me the left-side of
>> this clause can be wiped out at the planner stage.
>
>
> The query's RLS policy would be simpler if the RLS policy function that
> returns the WHERE clause would take the user as argument, so its result does
> not contain user conditionals.
>
> IF (current_user IN ('alice', 'bob')
> THEN
>  RETURN X IN ('unclassified', 'classified'))
> ELSE
>  RETURN X IN ('unclassified')
> END IF;
>
Yes, it is a happy case. But the point I'm concern about is, the conditions
to branch cases are not limited to current user-id.
The RLS policy shall be appended at planner stage, so prepared statement
needs to be invalidated whenever its prerequisites are changed.
For example, someone may assign a function that returns RLS policy
depending on whether the current hour is even-number of odd-number.
It implies that we cannot predicate all the cases to invalidate prepared
statement with RLS policy, because of too much flexibility.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Robert Haas
Дата:
On Wed, May 30, 2012 at 3:26 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> My preference is RLSBYPASS permission rather than the approach
> with functions that return policy clause at run-time, because it needs
> to invalidate prepared statement at random timing.
> In case of this function approach, the RLS policy shall be generated
> on planner stage, and we cannot have any assumption to the criteria
> of RLS policy. A function might generate RLS policy regarding to the
> current user id. Yes, it is straightforward. The prepared statement
> should be invalidate whenever current user-id got switched.
> However, someone may define a function that generate RLS policy
> depending on the value of "client_min_messages" for example.
> Do we need to invalidate prepared statement whenever GUC get
> updated? I think it is overkill. We cannot predicate all the criteria
> user want to control the RLS policy using the functions.
> So, RLSBYPASS permission is more simple way to limit number of
> situations to invalidate prepared statements.

That's a good point.

> If we would have an "ideal optimizer", I'd still like the optimizer to
> wipe out redundant clauses transparently, rather than RLSBYPASS
> permissions, because it just controls all-or-nothing stuff.
> For example, if tuples are categorized to unclassified, classified or
> secret, and RLS policy is configured as:
>  ((current_user IN ('alice', 'bob') AND X IN ('unclassified',
> 'classified')) OR (X IN 'unclassified)),
> superuser can see all the tuples, and alice and bob can see
> up to classified tuples.
> Is it really hard to wipe out redundant condition at planner stage?
> If current_user is obviously 'kaigai', it seems to me the left-side of
> this clause can be wiped out at the planner stage.
> Do I consider the issue too simple?

Yes.  :-)

There are two problems.  First, if using the extended query protocol
(e.g. pgbench -M prepared) you can prepare a statement just once and
then execute it multiple times.  In this case, stable-functions cannot
be constant-folded at plan time, because they are only guaranteed to
remain constant for a *single* execution of the query, not for all
executions of the query.  So any optimization in this area would have
to be limited to cases where the simple query protocol is used.  I
think that might still be worth doing, but it's a significant
limitation, to be sure.  Second, at present, there is no guarantee
that the snapshot used for planning the query is the same as the
snapshot used for executing the query, though commit
d573e239f03506920938bf0be56c868d9c3416da made that happen in some
common cases.  If we were to do constant-folding of stable functions
using the planner snapshot, it would represent a behavior change from
previous releases.  I am not clear whether that has any real-world
consequences that we should be worried about.  It seems to me that the
path of least resistance might be to refactor the portal stuff so that
we can provide a uniform guarantee that, when using the simple query
protocol, the planner and executor snapshots will be the same ... but
I might be wrong.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/31 Robert Haas <robertmhaas@gmail.com>:
>> If we would have an "ideal optimizer", I'd still like the optimizer to
>> wipe out redundant clauses transparently, rather than RLSBYPASS
>> permissions, because it just controls all-or-nothing stuff.
>> For example, if tuples are categorized to unclassified, classified or
>> secret, and RLS policy is configured as:
>>  ((current_user IN ('alice', 'bob') AND X IN ('unclassified',
>> 'classified')) OR (X IN 'unclassified)),
>> superuser can see all the tuples, and alice and bob can see
>> up to classified tuples.
>> Is it really hard to wipe out redundant condition at planner stage?
>> If current_user is obviously 'kaigai', it seems to me the left-side of
>> this clause can be wiped out at the planner stage.
>> Do I consider the issue too simple?
>
> Yes.  :-)
>
> There are two problems.  First, if using the extended query protocol
> (e.g. pgbench -M prepared) you can prepare a statement just once and
> then execute it multiple times.  In this case, stable-functions cannot
> be constant-folded at plan time, because they are only guaranteed to
> remain constant for a *single* execution of the query, not for all
> executions of the query.  So any optimization in this area would have
> to be limited to cases where the simple query protocol is used.  I
> think that might still be worth doing, but it's a significant
> limitation, to be sure.  Second, at present, there is no guarantee
> that the snapshot used for planning the query is the same as the
> snapshot used for executing the query, though commit
> d573e239f03506920938bf0be56c868d9c3416da made that happen in some
> common cases.  If we were to do constant-folding of stable functions
> using the planner snapshot, it would represent a behavior change from
> previous releases.  I am not clear whether that has any real-world
> consequences that we should be worried about.  It seems to me that the
> path of least resistance might be to refactor the portal stuff so that
> we can provide a uniform guarantee that, when using the simple query
> protocol, the planner and executor snapshots will be the same ... but
> I might be wrong.
>
It may be an option to separate the case into two; a situation to execute
the given query immediately just after optimization and never reused,
and others.
Even though the second situation, it may give us better query execution
plan, if we try to reconstruct query plan just before executor with
assumption that expects immutable / stable function can be replaced
by constant value prior to execution.
In other words, this idea tries to query optimization again on EXECUTE
statement against to its nature, to replace immutable / stable functions
by constant value, and to generate wiser execute plan.
At least, it may make sense to have a flag on prepared statement to
indicate whether it has possible better plan with this re-construction.

Then, if so, we will be able to push the stuff corresponding to
RLSBYPASS into the query optimization, and works transparently
for users.

Isn't it feasible to implement?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/5/31 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> 2012/5/31 Robert Haas <robertmhaas@gmail.com>:
>>> If we would have an "ideal optimizer", I'd still like the optimizer to
>>> wipe out redundant clauses transparently, rather than RLSBYPASS
>>> permissions, because it just controls all-or-nothing stuff.
>>> For example, if tuples are categorized to unclassified, classified or
>>> secret, and RLS policy is configured as:
>>>  ((current_user IN ('alice', 'bob') AND X IN ('unclassified',
>>> 'classified')) OR (X IN 'unclassified)),
>>> superuser can see all the tuples, and alice and bob can see
>>> up to classified tuples.
>>> Is it really hard to wipe out redundant condition at planner stage?
>>> If current_user is obviously 'kaigai', it seems to me the left-side of
>>> this clause can be wiped out at the planner stage.
>>> Do I consider the issue too simple?
>>
>> Yes.  :-)
>>
>> There are two problems.  First, if using the extended query protocol
>> (e.g. pgbench -M prepared) you can prepare a statement just once and
>> then execute it multiple times.  In this case, stable-functions cannot
>> be constant-folded at plan time, because they are only guaranteed to
>> remain constant for a *single* execution of the query, not for all
>> executions of the query.  So any optimization in this area would have
>> to be limited to cases where the simple query protocol is used.  I
>> think that might still be worth doing, but it's a significant
>> limitation, to be sure.  Second, at present, there is no guarantee
>> that the snapshot used for planning the query is the same as the
>> snapshot used for executing the query, though commit
>> d573e239f03506920938bf0be56c868d9c3416da made that happen in some
>> common cases.  If we were to do constant-folding of stable functions
>> using the planner snapshot, it would represent a behavior change from
>> previous releases.  I am not clear whether that has any real-world
>> consequences that we should be worried about.  It seems to me that the
>> path of least resistance might be to refactor the portal stuff so that
>> we can provide a uniform guarantee that, when using the simple query
>> protocol, the planner and executor snapshots will be the same ... but
>> I might be wrong.
>>
> It may be an option to separate the case into two; a situation to execute
> the given query immediately just after optimization and never reused,
> and others.
> Even though the second situation, it may give us better query execution
> plan, if we try to reconstruct query plan just before executor with
> assumption that expects immutable / stable function can be replaced
> by constant value prior to execution.
> In other words, this idea tries to query optimization again on EXECUTE
> statement against to its nature, to replace immutable / stable functions
> by constant value, and to generate wiser execute plan.
> At least, it may make sense to have a flag on prepared statement to
> indicate whether it has possible better plan with this re-construction.
>
> Then, if so, we will be able to push the stuff corresponding to
> RLSBYPASS into the query optimization, and works transparently
> for users.
>
> Isn't it feasible to implement?
>
If we could replace a particular term that consists of constant values
and stable / immutable functions only by parameter references,
it may enable to handle the term as if a constant value, but actual
calculation is delayed to executor stage.

For example, according to this idea, PREPARE p1(int) AS SELECT * FROM tbl WHERE     current_user in ('alice','bob') AND
X> $1; 
shall be internally rewritten to, PREPARE p1(int) AS SELECT * FROM tbl WHERE     $2 AND X>$1;

then, $2 is implicitly calculated just before execution of this prepared
statement. The snapshot to be used for this calculation is same with
executor's one. It seems to me it is a feasible idea with less invasive
implementation to existing planner.

Does it make sense to describe exceptional condition using regular
clause, instead of special permission?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Robert Haas
Дата:
On Thu, May 31, 2012 at 3:52 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> It may be an option to separate the case into two; a situation to execute
> the given query immediately just after optimization and never reused,
> and others.

Yes.  Simon suggested exactly this a while back, and I agree with him
that we ought to have it.

> Even though the second situation, it may give us better query execution
> plan, if we try to reconstruct query plan just before executor with
> assumption that expects immutable / stable function can be replaced
> by constant value prior to execution.
> In other words, this idea tries to query optimization again on EXECUTE
> statement against to its nature, to replace immutable / stable functions
> by constant value, and to generate wiser execute plan.
> At least, it may make sense to have a flag on prepared statement to
> indicate whether it has possible better plan with this re-construction.

This sounds complex and inefficient to me.

> Then, if so, we will be able to push the stuff corresponding to
> RLSBYPASS into the query optimization, and works transparently
> for users.

You're still going to need a way to make sure that the cluster can be
dumped properly.  RLSBYPASS accomplishes that; your scheme doesn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/6/1 Robert Haas <robertmhaas@gmail.com>:
> On Thu, May 31, 2012 at 3:52 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> It may be an option to separate the case into two; a situation to execute
>> the given query immediately just after optimization and never reused,
>> and others.
>
> Yes.  Simon suggested exactly this a while back, and I agree with him
> that we ought to have it.
>
It is good for me, also.

>> Then, if so, we will be able to push the stuff corresponding to
>> RLSBYPASS into the query optimization, and works transparently
>> for users.
>
> You're still going to need a way to make sure that the cluster can be
> dumped properly.  RLSBYPASS accomplishes that; your scheme doesn't.
>
If something like "has_superuser_privilege() OR" is automatically
appended to user given clauses, it makes sure whole of the database
cluster is dumped.
That also means any permission checks are delayed to executor stage
(except for the case on simple query protocol, I mentioned above),
thus it simplifies the condition to invalidate prepared statement.

One problem I noticed last night around RLSBYPASS approach is:
it can take much number of invalidation whenever current user-id is
switched. Not only SET AUTHORIZATION, but SECURITY DEFINER
function's invocation also. I'm not sure whether this invalidation
storm is reasonable level, or not.

Is it unavailable to handle this type of implicit superuser checks
with existing EXECUTE statement?
I tried to run EXPLAIN with similar case.

postgres=# PREPARE p1(int, bool) AS SELECT * FROM tbl WHERE y > $1 OR $2;
PREPARE
postgres=# EXPLAIN EXECUTE p1(10, current_user in ('kaigai','rhaas'));                      QUERY PLAN
--------------------------------------------------------Seq Scan on tbl  (cost=0.00..21.60 rows=1160 width=40)
(1 row

However,

postgres=# PREPARE p2(int) AS SELECT * FROM tbl                  WHERE y > $1 OR current_user in ('kaigai','rhaas');
PREPARE
postgres=# EXPLAIN EXECUTE p2(10);                                QUERY PLAN
-----------------------------------------------------------------------------Seq Scan on tbl  (cost=0.00..30.30
rows=394width=40)  Filter: ((y > 10) OR ("current_user"() = ANY ('{kaigai,rhaas}'::name[]))) 
(2 rows)

Please assume the second condition something like superuser
checks in addition to the main security policy.
It implies an idea to replace a certain portion of clause that
consists of only constant values and stable / immutable functions
by shadow parameter to be calculated at executor stage, makes
sense to wipe out RLS policy for superusers. In addition, it requires
no new invalidation mechanism to prepared statements.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Robert Haas
Дата:
On Sat, Jun 2, 2012 at 12:58 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> 2012/6/1 Robert Haas <robertmhaas@gmail.com>:
>> On Thu, May 31, 2012 at 3:52 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>>> It may be an option to separate the case into two; a situation to execute
>>> the given query immediately just after optimization and never reused,
>>> and others.
>>
>> Yes.  Simon suggested exactly this a while back, and I agree with him
>> that we ought to have it.
>>
> It is good for me, also.
>
>>> Then, if so, we will be able to push the stuff corresponding to
>>> RLSBYPASS into the query optimization, and works transparently
>>> for users.
>>
>> You're still going to need a way to make sure that the cluster can be
>> dumped properly.  RLSBYPASS accomplishes that; your scheme doesn't.
>>
> If something like "has_superuser_privilege() OR" is automatically
> appended to user given clauses, it makes sure whole of the database
> cluster is dumped.
> That also means any permission checks are delayed to executor stage
> (except for the case on simple query protocol, I mentioned above),
> thus it simplifies the condition to invalidate prepared statement.
>
> One problem I noticed last night around RLSBYPASS approach is:
> it can take much number of invalidation whenever current user-id is
> switched. Not only SET AUTHORIZATION, but SECURITY DEFINER
> function's invocation also. I'm not sure whether this invalidation
> storm is reasonable level, or not.
>
> Is it unavailable to handle this type of implicit superuser checks
> with existing EXECUTE statement?
> I tried to run EXPLAIN with similar case.
>
> postgres=# PREPARE p1(int, bool) AS SELECT * FROM tbl WHERE y > $1 OR $2;
> PREPARE
> postgres=# EXPLAIN EXECUTE p1(10, current_user in ('kaigai','rhaas'));
>                       QUERY PLAN
> --------------------------------------------------------
>  Seq Scan on tbl  (cost=0.00..21.60 rows=1160 width=40)
> (1 row
>
> However,
>
> postgres=# PREPARE p2(int) AS SELECT * FROM tbl
>                   WHERE y > $1 OR current_user in ('kaigai','rhaas');
> PREPARE
> postgres=# EXPLAIN EXECUTE p2(10);
>                                 QUERY PLAN
> -----------------------------------------------------------------------------
>  Seq Scan on tbl  (cost=0.00..30.30 rows=394 width=40)
>   Filter: ((y > 10) OR ("current_user"() = ANY ('{kaigai,rhaas}'::name[])))
> (2 rows)
>
> Please assume the second condition something like superuser
> checks in addition to the main security policy.
> It implies an idea to replace a certain portion of clause that
> consists of only constant values and stable / immutable functions
> by shadow parameter to be calculated at executor stage, makes
> sense to wipe out RLS policy for superusers. In addition, it requires
> no new invalidation mechanism to prepared statements.

I'm not sure how best to handle the invalidation issue... but I still
think that relying on the query planner to use theorem-proving to
simplify RLS conditions is going to be problematic.  You can certainly
try it ad see how it comes out...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/6/4 Robert Haas <robertmhaas@gmail.com>:
> On Sat, Jun 2, 2012 at 12:58 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> 2012/6/1 Robert Haas <robertmhaas@gmail.com>:
>>> On Thu, May 31, 2012 at 3:52 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>>>> It may be an option to separate the case into two; a situation to execute
>>>> the given query immediately just after optimization and never reused,
>>>> and others.
>>>
>>> Yes.  Simon suggested exactly this a while back, and I agree with him
>>> that we ought to have it.
>>>
>> It is good for me, also.
>>
>>>> Then, if so, we will be able to push the stuff corresponding to
>>>> RLSBYPASS into the query optimization, and works transparently
>>>> for users.
>>>
>>> You're still going to need a way to make sure that the cluster can be
>>> dumped properly.  RLSBYPASS accomplishes that; your scheme doesn't.
>>>
>> If something like "has_superuser_privilege() OR" is automatically
>> appended to user given clauses, it makes sure whole of the database
>> cluster is dumped.
>> That also means any permission checks are delayed to executor stage
>> (except for the case on simple query protocol, I mentioned above),
>> thus it simplifies the condition to invalidate prepared statement.
>>
>> One problem I noticed last night around RLSBYPASS approach is:
>> it can take much number of invalidation whenever current user-id is
>> switched. Not only SET AUTHORIZATION, but SECURITY DEFINER
>> function's invocation also. I'm not sure whether this invalidation
>> storm is reasonable level, or not.
>>
>> Is it unavailable to handle this type of implicit superuser checks
>> with existing EXECUTE statement?
>> I tried to run EXPLAIN with similar case.
>>
>> postgres=# PREPARE p1(int, bool) AS SELECT * FROM tbl WHERE y > $1 OR $2;
>> PREPARE
>> postgres=# EXPLAIN EXECUTE p1(10, current_user in ('kaigai','rhaas'));
>>                       QUERY PLAN
>> --------------------------------------------------------
>>  Seq Scan on tbl  (cost=0.00..21.60 rows=1160 width=40)
>> (1 row
>>
>> However,
>>
>> postgres=# PREPARE p2(int) AS SELECT * FROM tbl
>>                   WHERE y > $1 OR current_user in ('kaigai','rhaas');
>> PREPARE
>> postgres=# EXPLAIN EXECUTE p2(10);
>>                                 QUERY PLAN
>> -----------------------------------------------------------------------------
>>  Seq Scan on tbl  (cost=0.00..30.30 rows=394 width=40)
>>   Filter: ((y > 10) OR ("current_user"() = ANY ('{kaigai,rhaas}'::name[])))
>> (2 rows)
>>
>> Please assume the second condition something like superuser
>> checks in addition to the main security policy.
>> It implies an idea to replace a certain portion of clause that
>> consists of only constant values and stable / immutable functions
>> by shadow parameter to be calculated at executor stage, makes
>> sense to wipe out RLS policy for superusers. In addition, it requires
>> no new invalidation mechanism to prepared statements.
>
> I'm not sure how best to handle the invalidation issue... but I still
> think that relying on the query planner to use theorem-proving to
> simplify RLS conditions is going to be problematic.  You can certainly
> try it ad see how it comes out...
>
I think, the enhancement of planner can be handled independently
from RLS policy stuff, and we should avoid to contain many complex
stuff into one functionality.
At least, it will work correctly with idea to append implicit condition
("OR has_superuser_privilege()") without any invalidation mechanism,
although its performance penalty is not negligible; without planner-
enhancement.
How about your opinion?

I'm worry about future maintenance issues, once we have
RLSBYPASS permission or something user visible...

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On Jun4, 2012, at 17:38 , Kohei KaiGai wrote:
> I'm worry about future maintenance issues, once we have
> RLSBYPASS permission or something user visible…

I fear that without a generic way to disable RLS regardless which
RLS policy function is in effect, we're creating a huge maintenance
issue for DBAs. In a lot of shops, the DBA is responsible for a large
number of databases, each potentially using a completely different
approach to RLS and hence a completely different policy function.

Without something like RLSBYPASS, the DBA needs to have intimate
knowledge about the different RLS policies to e.g. guarantee that his
backups aren't missing crucial information, or that the replication
system indeed replicates all rows.

With RLSBYPASS, all he needs to do is grant one privilege to his
replication or backup user. The rest can be left to the development
or support team for a specific application.

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/6/4 Florian Pflug <fgp@phlo.org>:
> On Jun4, 2012, at 17:38 , Kohei KaiGai wrote:
>> I'm worry about future maintenance issues, once we have
>> RLSBYPASS permission or something user visible…
>
> I fear that without a generic way to disable RLS regardless which
> RLS policy function is in effect, we're creating a huge maintenance
> issue for DBAs. In a lot of shops, the DBA is responsible for a large
> number of databases, each potentially using a completely different
> approach to RLS and hence a completely different policy function.
>
Here is two problems around RLSBYPASS. The first is we have
no idea to handle invalidation of prepared-statement when current
user is switched, right now. The second is we can have another
way to describe same RLS policy without PG original enhancement
towards permission mechanism...

> Without something like RLSBYPASS, the DBA needs to have intimate
> knowledge about the different RLS policies to e.g. guarantee that his
> backups aren't missing crucial information, or that the replication
> system indeed replicates all rows.
>
> With RLSBYPASS, all he needs to do is grant one privilege to his
> replication or backup user. The rest can be left to the development
> or support team for a specific application.
>
It seems to me you can define a function which implements site-
specific security requirement (E.g "backup should not be prevented
by RLS policy"), then include it as a part of RLS policy
(or implicitly added by extensions, like sepgsql tries to do).

These are the reason why I hesitate to go ahead with RLSBYPASS
permission.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Tom Lane
Дата:
Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
> Here is two problems around RLSBYPASS. The first is we have
> no idea to handle invalidation of prepared-statement when current
> user is switched, right now.

How is that specifically the fault of RLSBYPASS?  *Any* of the schemes
you're proposing for inlined RLS checks will have problems with userID
switching.

My guess is we'd have to treat the effective userID as part of the
plancache lookup key to make it safe to inline anything related to RLS.
        regards, tom lane


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/6/4 Tom Lane <tgl@sss.pgh.pa.us>:
> Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
>> Here is two problems around RLSBYPASS. The first is we have
>> no idea to handle invalidation of prepared-statement when current
>> user is switched, right now.
>
> How is that specifically the fault of RLSBYPASS?  *Any* of the schemes
> you're proposing for inlined RLS checks will have problems with userID
> switching.
>
Really? I don't find out a scenario that cause a problem with user-id
switching in case when RLS policy is *unconditionally* appended then
evaluated on executor stage. I'd like to see the scenario.

> My guess is we'd have to treat the effective userID as part of the
> plancache lookup key to make it safe to inline anything related to RLS.
>
It might be a solution, if we append individual RLS policy at the
planner stage, depending on user-id.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On Jun4, 2012, at 18:38 , Kohei KaiGai wrote:
> 2012/6/4 Florian Pflug <fgp@phlo.org>:
>> Without something like RLSBYPASS, the DBA needs to have intimate
>> knowledge about the different RLS policies to e.g. guarantee that his
>> backups aren't missing crucial information, or that the replication
>> system indeed replicates all rows.
>> 
>> With RLSBYPASS, all he needs to do is grant one privilege to his
>> replication or backup user. The rest can be left to the development
>> or support team for a specific application.

> It seems to me you can define a function which implements site-
> specific security requirement (E.g "backup should not be prevented
> by RLS policy"), then include it as a part of RLS policy
> (or implicitly added by extensions, like sepgsql tries to do).

Sure. But that requires each and every application which uses RLS
to provide support for special backup (or replication, or whatever)
privileges. And it requires the DBA to know how to assign these
privileges to certain roles for each any every application in question.
For shops which uses a lot of different applications, all with their
own RLS policy, that can quickly get out of hand.

Plus, a bug in one of these RLS policies has the potential to render
backups incomplete.

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Tom Lane
Дата:
Florian Pflug <fgp@phlo.org> writes:
> On Jun4, 2012, at 18:38 , Kohei KaiGai wrote:
>> 2012/6/4 Florian Pflug <fgp@phlo.org>:
>>> Without something like RLSBYPASS, the DBA needs to have intimate
>>> knowledge about the different RLS policies to e.g. guarantee that his
>>> backups aren't missing crucial information, or that the replication
>>> system indeed replicates all rows.

>> It seems to me you can define a function which implements site-
>> specific security requirement (E.g "backup should not be prevented
>> by RLS policy"), then include it as a part of RLS policy
>> (or implicitly added by extensions, like sepgsql tries to do).

> Sure. But that requires each and every application which uses RLS
> to provide support for special backup (or replication, or whatever)
> privileges. And it requires the DBA to know how to assign these
> privileges to certain roles for each any every application in question.
> For shops which uses a lot of different applications, all with their
> own RLS policy, that can quickly get out of hand.

> Plus, a bug in one of these RLS policies has the potential to render
> backups incomplete.

I agree with Florian here: if there is no way to disable RLS for
backups, the database will be un-administratable.  RLSBYPASS is not
necessarily the only or best way to provide such an override, but we
have to have something that is simple, foolproof, and *not* dependent
on the details of any one RLS policy.

I suspect that KaiGai-san's objection basically comes down to not
wanting to have what amounts to a backdoor in RLS policies.  However,
what Florian is saying is that you have to have a backdoor anyway,
unless you'd like to be at risk of losing data because it wasn't
backed up.  You can either have one well-understood, well-documented,
well-tested backdoor, or you can have an ad-hoc backdoor in each RLS
policy.  Nobody can think that the latter approach is preferable.
        regards, tom lane


Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/6/5 Tom Lane <tgl@sss.pgh.pa.us>:
> Florian Pflug <fgp@phlo.org> writes:
>> On Jun4, 2012, at 18:38 , Kohei KaiGai wrote:
>>> 2012/6/4 Florian Pflug <fgp@phlo.org>:
>>>> Without something like RLSBYPASS, the DBA needs to have intimate
>>>> knowledge about the different RLS policies to e.g. guarantee that his
>>>> backups aren't missing crucial information, or that the replication
>>>> system indeed replicates all rows.
>
>>> It seems to me you can define a function which implements site-
>>> specific security requirement (E.g "backup should not be prevented
>>> by RLS policy"), then include it as a part of RLS policy
>>> (or implicitly added by extensions, like sepgsql tries to do).
>
>> Sure. But that requires each and every application which uses RLS
>> to provide support for special backup (or replication, or whatever)
>> privileges. And it requires the DBA to know how to assign these
>> privileges to certain roles for each any every application in question.
>> For shops which uses a lot of different applications, all with their
>> own RLS policy, that can quickly get out of hand.
>
>> Plus, a bug in one of these RLS policies has the potential to render
>> backups incomplete.
>
> I agree with Florian here: if there is no way to disable RLS for
> backups, the database will be un-administratable.  RLSBYPASS is not
> necessarily the only or best way to provide such an override, but we
> have to have something that is simple, foolproof, and *not* dependent
> on the details of any one RLS policy.
>
> I suspect that KaiGai-san's objection basically comes down to not
> wanting to have what amounts to a backdoor in RLS policies.  However,
> what Florian is saying is that you have to have a backdoor anyway,
> unless you'd like to be at risk of losing data because it wasn't
> backed up.  You can either have one well-understood, well-documented,
> well-tested backdoor, or you can have an ad-hoc backdoor in each RLS
> policy.  Nobody can think that the latter approach is preferable.
>
At least, database superusers shall bypass the RLS policy; it is a well
understandable behavior and an approach to minimize the back-door;
and allows to get complete database backup.

It is easy to add a special privilege mechanism to bypass RLS policy
later, however, not easy in opposite side. It seems to me a reasonable
start-up to allow only superusers to bypass RLS policy.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On Jun5, 2012, at 10:22 , Kohei KaiGai wrote:
> 2012/6/5 Tom Lane <tgl@sss.pgh.pa.us>:
>> I suspect that KaiGai-san's objection basically comes down to not
>> wanting to have what amounts to a backdoor in RLS policies.  However,
>> what Florian is saying is that you have to have a backdoor anyway,
>> unless you'd like to be at risk of losing data because it wasn't
>> backed up.  You can either have one well-understood, well-documented,
>> well-tested backdoor, or you can have an ad-hoc backdoor in each RLS
>> policy.  Nobody can think that the latter approach is preferable.

> At least, database superusers shall bypass the RLS policy; it is a well
> understandable behavior and an approach to minimize the back-door;
> and allows to get complete database backup.

I don't think we want to force people to run stuff with superuser
privileges just for the sake of bypassing a RLS policy. On the whole,
that reduces the overall security, not adds to it.

> It is easy to add a special privilege mechanism to bypass RLS policy
> later, however, not easy in opposite side. It seems to me a reasonable
> start-up to allow only superusers to bypass RLS policy.

What's to be gained by that? Once there's *any* way to bypass a RLS
policy, you'll have to deal with the plan invalidation issues you
mentioned anyway. ISTM that complexity-wide, you don't save much by not
having RLSBYPASS (or something similar), but feature-wise you lose at
lot...

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/6/5 Florian Pflug <fgp@phlo.org>:
> On Jun5, 2012, at 10:22 , Kohei KaiGai wrote:
>> 2012/6/5 Tom Lane <tgl@sss.pgh.pa.us>:
>>> I suspect that KaiGai-san's objection basically comes down to not
>>> wanting to have what amounts to a backdoor in RLS policies.  However,
>>> what Florian is saying is that you have to have a backdoor anyway,
>>> unless you'd like to be at risk of losing data because it wasn't
>>> backed up.  You can either have one well-understood, well-documented,
>>> well-tested backdoor, or you can have an ad-hoc backdoor in each RLS
>>> policy.  Nobody can think that the latter approach is preferable.
>
>> At least, database superusers shall bypass the RLS policy; it is a well
>> understandable behavior and an approach to minimize the back-door;
>> and allows to get complete database backup.
>
> I don't think we want to force people to run stuff with superuser
> privileges just for the sake of bypassing a RLS policy. On the whole,
> that reduces the overall security, not adds to it.
>
I can understand your opinion. But I'm still uncertain whether the new
permission is the best idea we can offer...

>> It is easy to add a special privilege mechanism to bypass RLS policy
>> later, however, not easy in opposite side. It seems to me a reasonable
>> start-up to allow only superusers to bypass RLS policy.
>
> What's to be gained by that? Once there's *any* way to bypass a RLS
> policy, you'll have to deal with the plan invalidation issues you
> mentioned anyway. ISTM that complexity-wide, you don't save much by not
> having RLSBYPASS (or something similar), but feature-wise you lose at
> lot...
>
All we need to change is selection of the function to be appended
automatically. In case when superusers are allowed to bypass RLS
policy, "OR has_superuser_privilege()" shall be appended to the
user given clause, however, "OR has_table_privilege()" shall be
appended instead in case of RLSBYPASS permission.
(Note that has_table_privilege() always true for superusers.)

I think it does not require to add a mechanism to invalidate
prepared-statement, because all the checks are applied on
executor stage. And these functions can be stable functions,
so I believe some enhancements at planner will correctly wipe
out prior to query execution at the next step.
Conditional RLS policy in planner stage seems to me problematic
so I don't want to include this feature on the first version.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On Jun5, 2012, at 11:43 , Kohei KaiGai wrote:
> 2012/6/5 Florian Pflug <fgp@phlo.org>:
>> What's to be gained by that? Once there's *any* way to bypass a RLS
>> policy, you'll have to deal with the plan invalidation issues you
>> mentioned anyway. ISTM that complexity-wide, you don't save much by not
>> having RLSBYPASS (or something similar), but feature-wise you lose at
>> lot…

> All we need to change is selection of the function to be appended
> automatically. In case when superusers are allowed to bypass RLS
> policy, "OR has_superuser_privilege()" shall be appended to the
> user given clause, however, "OR has_table_privilege()" shall be
> appended instead in case of RLSBYPASS permission.
> (Note that has_table_privilege() always true for superusers.)
>
> I think it does not require to add a mechanism to invalidate
> prepared-statement, because all the checks are applied on
> executor stage. And these functions can be stable functions,
> so I believe some enhancements at planner will correctly wipe
> out prior to query execution at the next step.

That won't work, since the current role can change any time mid-query,
at least if you're using cursors. Here's an example

First, setup a table
> CREATE TABLE data AS SELECT id FROM generate_series(1,1000000) AS id;
> GRANT SELECT ON data TO PUBLIC;

Then create a cursor which doesn't materialize its result
> BEGIN;
> DECLARE mycursor NO SCROLL CURSOR WITHOUT HOLD FOR SELECT id, current_user FROM data;
> SET ROLE=anybody;
> FETCH mycursor;id | current_user
----+-------------- 1 | anybody
(1 row)
> SET role=somebody;
> FETCH mycursor;id | current_user
----+-------------- 2 | somebody
(1 row)
> ROLLBACK;

And now, for comparison, a cursor which does materialize its result
> BEGIN;
> DECLARE mycursor NO SCROLL CURSOR WITHOUT HOLD FOR SELECT id, current_user FROM data ORDER BY id;
> SET ROLE=anybody;
> FETCH mycursor;id | current_user
----+-------------- 1 | anybody
(1 row)

> SET role=somebody;
> FETCH mycursor;id | current_user
----+-------------- 2 | anybody
(1 row)

Imagine that the current_user function call was actually part of the
RLS policy. Then the example above shows that *which* role we check
access against (the one active at the first fetch, or currently active
one) depends on which plan the optimizer happens to pick. This, IMHO,
is not acceptable for a security feature like RLS.

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/6/5 Florian Pflug <fgp@phlo.org>:
> On Jun5, 2012, at 11:43 , Kohei KaiGai wrote:
>> 2012/6/5 Florian Pflug <fgp@phlo.org>:
>>> What's to be gained by that? Once there's *any* way to bypass a RLS
>>> policy, you'll have to deal with the plan invalidation issues you
>>> mentioned anyway. ISTM that complexity-wide, you don't save much by not
>>> having RLSBYPASS (or something similar), but feature-wise you lose at
>>> lot…
>
>> All we need to change is selection of the function to be appended
>> automatically. In case when superusers are allowed to bypass RLS
>> policy, "OR has_superuser_privilege()" shall be appended to the
>> user given clause, however, "OR has_table_privilege()" shall be
>> appended instead in case of RLSBYPASS permission.
>> (Note that has_table_privilege() always true for superusers.)
>>
>> I think it does not require to add a mechanism to invalidate
>> prepared-statement, because all the checks are applied on
>> executor stage. And these functions can be stable functions,
>> so I believe some enhancements at planner will correctly wipe
>> out prior to query execution at the next step.
>
> That won't work, since the current role can change any time mid-query,
> at least if you're using cursors. Here's an example
>
> First, setup a table
>> CREATE TABLE data AS SELECT id FROM generate_series(1,1000000) AS id;
>> GRANT SELECT ON data TO PUBLIC;
>
> Then create a cursor which doesn't materialize its result
>> BEGIN;
>> DECLARE mycursor NO SCROLL CURSOR WITHOUT HOLD FOR
>  SELECT id, current_user FROM data;
>> SET ROLE=anybody;
>> FETCH mycursor;
>  id | current_user
> ----+--------------
>  1 | anybody
> (1 row)
>> SET role=somebody;
>> FETCH mycursor;
>  id | current_user
> ----+--------------
>  2 | somebody
> (1 row)
>> ROLLBACK;
>
> And now, for comparison, a cursor which does materialize its result
>> BEGIN;
>> DECLARE mycursor NO SCROLL CURSOR WITHOUT HOLD FOR
>  SELECT id, current_user FROM data ORDER BY id;
>> SET ROLE=anybody;
>> FETCH mycursor;
>  id | current_user
> ----+--------------
>  1 | anybody
> (1 row)
>
>> SET role=somebody;
>> FETCH mycursor;
>  id | current_user
> ----+--------------
>  2 | anybody
> (1 row)
>
> Imagine that the current_user function call was actually part of the
> RLS policy. Then the example above shows that *which* role we check
> access against (the one active at the first fetch, or currently active
> one) depends on which plan the optimizer happens to pick. This, IMHO,
> is not acceptable for a security feature like RLS.
>
Which is your expected behavior in this example?

In case when the query is internally materialized, the current_user function
is *already* executed prior to switch user-id, thus, the result set internally
stored in was already filtered out using older user-id.

It seems to me, caution of the problem is current_user is performing out
of the snapshot controls. According to the definition of stable function,
its result should not be changed during a particular scan.
At least, it is not a fundamental problem of RLS implementation, although
it needs to take an enhancement something like effective user-id per
snapshot basis.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On Jun5, 2012, at 15:07 , Kohei KaiGai wrote:
> 2012/6/5 Florian Pflug <fgp@phlo.org>:
>> On Jun5, 2012, at 11:43 , Kohei KaiGai wrote:
>>> I think it does not require to add a mechanism to invalidate
>>> prepared-statement, because all the checks are applied on
>>> executor stage. And these functions can be stable functions,
>>> so I believe some enhancements at planner will correctly wipe
>>> out prior to query execution at the next step.

>> That won't work, since the current role can change any time mid-query,
>> at least if you're using cursors. Here's an example

<snipped example>

>> Imagine that the current_user function call was actually part of the
>> RLS policy. Then the example above shows that *which* role we check
>> access against (the one active at the first fetch, or currently active
>> one) depends on which plan the optimizer happens to pick. This, IMHO,
>> is not acceptable for a security feature like RLS.

> Which is your expected behavior in this example?

I can live with any behaviour, as long as it doesn't depends on details
of the query plan. My vote would be for always using the role which was
active at statement creation time (i.e. at PREPARE/DECLARE time) for
RLS purposes though. That way, we could treat the role as being IMMUTABLE
I think.

> In case when the query is internally materialized, the current_user function
> is *already* executed prior to switch user-id, thus, the result set internally
> stored in was already filtered out using older user-id.

Exactly. But whether or not the query, some parts of it, or even only some
parts of the policy function are materialized is outside the control of the
user and may change at any time.

> It seems to me, caution of the problem is current_user is performing out
> of the snapshot controls. According to the definition of stable function,
> its result should not be changed during a particular scan.

Yeah, well, ISTM that our definition of STABLE doesn't really take functions
whose return value depends on GUCs into account.

> At least, it is not a fundamental problem of RLS implementation, although
> it needs to take an enhancement something like effective user-id per
> snapshot basis.

Right. We need something like "statement_role" which returns the role the
statement is to be executed as (However we define that, though as I said
above my vote is for it being the role which created the statement). To make
that work, however, the plan cache needs to store that role, since the
statement might be re-planned after its initial creation, but before it is
executed. At which point the amount of complexity saved by not implementing
RLSBYPASS is very nearly zero.

To reiterate, my point is that we won't get away with zero planner changes
even without RLSBYPASS. In fact, we'll be paying *most* of the complexity
costs of RLSBYPASS whether we actually add that feature or not. Which makes
not adding it look like a pretty bad deal to me.

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
2012/6/5 Florian Pflug <fgp@phlo.org>:
> On Jun5, 2012, at 15:07 , Kohei KaiGai wrote:
>> 2012/6/5 Florian Pflug <fgp@phlo.org>:
>>> On Jun5, 2012, at 11:43 , Kohei KaiGai wrote:
>>>> I think it does not require to add a mechanism to invalidate
>>>> prepared-statement, because all the checks are applied on
>>>> executor stage. And these functions can be stable functions,
>>>> so I believe some enhancements at planner will correctly wipe
>>>> out prior to query execution at the next step.
>
>>> That won't work, since the current role can change any time mid-query,
>>> at least if you're using cursors. Here's an example
>
> <snipped example>
>
>>> Imagine that the current_user function call was actually part of the
>>> RLS policy. Then the example above shows that *which* role we check
>>> access against (the one active at the first fetch, or currently active
>>> one) depends on which plan the optimizer happens to pick. This, IMHO,
>>> is not acceptable for a security feature like RLS.
>
>> Which is your expected behavior in this example?
>
> I can live with any behaviour, as long as it doesn't depends on details
> of the query plan. My vote would be for always using the role which was
> active at statement creation time (i.e. at PREPARE/DECLARE time) for
> RLS purposes though. That way, we could treat the role as being IMMUTABLE
> I think.
>
I have same opinion at the point where the "active role" should be consistent,
but a bit different in details. I think, it should be a timing at
beginning of execution,
not plan creation time, like as existing permission checks are applied
on InitPlan()
or its callee.
So, I believe EXECUTE is more right place to check it rather than PREPARE.
Although FETCH command actually handles query execution, do we ought to
consider DECLARE do both of query plan and execution from the perspective
of model?

>> It seems to me, caution of the problem is current_user is performing out
>> of the snapshot controls. According to the definition of stable function,
>> its result should not be changed during a particular scan.
>
> Yeah, well, ISTM that our definition of STABLE doesn't really take functions
> whose return value depends on GUCs into account.
>
Probably, has_XXX_privilege() should work with reliable data source being
safe from user-id switching. But it is a tough work to enhance whole of the
GUC stuff for snapshot aware...

>> At least, it is not a fundamental problem of RLS implementation, although
>> it needs to take an enhancement something like effective user-id per
>> snapshot basis.
>
> Right. We need something like "statement_role" which returns the role the
> statement is to be executed as (However we define that, though as I said
> above my vote is for it being the role which created the statement). To make
> that work, however, the plan cache needs to store that role, since the
> statement might be re-planned after its initial creation, but before it is
> executed. At which point the amount of complexity saved by not implementing
> RLSBYPASS is very nearly zero.
>
I'm not sure why the plan cache needs to be stored with a particular user-id.
As long as we check permissions at executor stage, all we need to do is
implementation of a function to check permission based on a particular
user-id at the timing of executor initialization.

And, I think it can be a separated efforts from the first version of RLS.

> To reiterate, my point is that we won't get away with zero planner changes
> even without RLSBYPASS. In fact, we'll be paying *most* of the complexity
> costs of RLSBYPASS whether we actually add that feature or not. Which makes
> not adding it look like a pretty bad deal to me.
>
It seems to me, 95% of our opinion is common, except for a few detailed stuff.
I never dislike the idea of RLSBYPASS permission, but I'd like to investigate
this idea with more time, until v9.3 being closed.
At least, nobody opposed to allow superusers to bypass RLS policy.
So, it can be a minimum startup point, isn't it? RLS patch will takes hundreds
of lines due to syntax support or query rewriting stuff. It is always good idea
to focus on the core functionality at the starting point.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [RFC] Interface of Row Level Security

От
Florian Pflug
Дата:
On Jun5, 2012, at 22:33 , Kohei KaiGai wrote:
> 2012/6/5 Florian Pflug <fgp@phlo.org>:
>> I can live with any behaviour, as long as it doesn't depends on details
>> of the query plan. My vote would be for always using the role which was
>> active at statement creation time (i.e. at PREPARE/DECLARE time) for
>> RLS purposes though. That way, we could treat the role as being IMMUTABLE
>> I think.

> I have same opinion at the point where the "active role" should be consistent,
> but a bit different in details. I think, it should be a timing at
> beginning of execution,
> not plan creation time, like as existing permission checks are applied
> on InitPlan()
> or its callee.

From a consistency POV, I'd agree. But binding the active role after
planning means giving up a lot of query optimization opportunities. It
will prevent the role from participating in constant folding, which
translates to potentially enormous loss of performance. Take the following
policy function as an example
 (role = 'some_role' AND expensive_function(row)) OR (role = 'other_role' AND row.field = 'some_value')

and assume that there's an index on 'field'. Now, if role is known to be
'other_row', the planner can eliminate the expensive_function() from the
plan, and satisfy row.field = 'some_value' with an index scan. Without
that knowledge, you'll probably get a sequential scan.

I do value consistency, but in this case the benefit of not being
consistent with other privilege checks outweighs the drawbacks I think.

>>> It seems to me, caution of the problem is current_user is performing out
>>> of the snapshot controls. According to the definition of stable function,
>>> its result should not be changed during a particular scan.
>>
>> Yeah, well, ISTM that our definition of STABLE doesn't really take functions
>> whose return value depends on GUCs into account.

> Probably, has_XXX_privilege() should work with reliable data source being
> safe from user-id switching. But it is a tough work to enhance whole of the
> GUC stuff for snapshot aware…

We don't need that for every GUC, though. I'd say its sufficient to somehow
provide the policy function with a stable value of the active role (whatever
"active" ends up meaning). If we do that, we can simply document that making
the policy function depend on other GUCs is a seriously bad idea.

We *do* need that stable "active role" value though, since we cannot very
well advise against using current_role in the policy function without providing
an alternativ…

>>> At least, it is not a fundamental problem of RLS implementation, although
>>> it needs to take an enhancement something like effective user-id per
>>> snapshot basis.
>>
>> Right. We need something like "statement_role" which returns the role the
>> statement is to be executed as (However we define that, though as I said
>> above my vote is for it being the role which created the statement). To make
>> that work, however, the plan cache needs to store that role, since the
>> statement might be re-planned after its initial creation, but before it is
>> executed. At which point the amount of complexity saved by not implementing
>> RLSBYPASS is very nearly zero.

> I'm not sure why the plan cache needs to be stored with a particular user-id.
> As long as we check permissions at executor stage, all we need to do is
> implementation of a function to check permission based on a particular
> user-id at the timing of executor initialization.

Hm, yeah, if you defer everything to execution time, that's true.

>> To reiterate, my point is that we won't get away with zero planner changes
>> even without RLSBYPASS. In fact, we'll be paying *most* of the complexity
>> costs of RLSBYPASS whether we actually add that feature or not. Which makes
>> not adding it look like a pretty bad deal to me.

> It seems to me, 95% of our opinion is common, except for a few detailed stuff.
> I never dislike the idea of RLSBYPASS permission, but I'd like to investigate
> this idea with more time, until v9.3 being closed.
> At least, nobody opposed to allow superusers to bypass RLS policy.
> So, it can be a minimum startup point, isn't it? RLS patch will takes hundreds
> of lines due to syntax support or query rewriting stuff. It is always good idea
> to focus on the core functionality at the starting point.

Absolutely. But at the same time, it's important to check that the design allows
the additional features to be added later easily. In the case of RLS, I'm
worried that decreeing that it's the role at execution time, not planning
time, that counts, we're painting ourselves into a corner. I view RLSBYPASS
as a good sanity check for that. Another one is that GRANT-with-filter-function
idea. Both seem to fall into place quite naturally if handled while planning -
you simply have to add decide which additional WHERE clause to add, if any.
Without any negative performance effects from policies which don't apply to
the current role, if the optimizer does it's job. Getting the same level of
performance when policies are added unconditionally seems quite impossible.
You might be able to make up for some losses by caching STABLE values (which
amounts to another constant folding pass), but you won't ever be able to make up
for a lost index scan opportunity or other optimizations which change the structure
of the whole query plan.

best regards,
Florian Pflug



Re: [RFC] Interface of Row Level Security

От
Kohei KaiGai
Дата:
Sorry for my late reply.

2012/6/6 Florian Pflug <fgp@phlo.org>:
> On Jun5, 2012, at 22:33 , Kohei KaiGai wrote:
>> 2012/6/5 Florian Pflug <fgp@phlo.org>:
>>> I can live with any behaviour, as long as it doesn't depends on details
>>> of the query plan. My vote would be for always using the role which was
>>> active at statement creation time (i.e. at PREPARE/DECLARE time) for
>>> RLS purposes though. That way, we could treat the role as being IMMUTABLE
>>> I think.
>
>> I have same opinion at the point where the "active role" should be consistent,
>> but a bit different in details. I think, it should be a timing at
>> beginning of execution,
>> not plan creation time, like as existing permission checks are applied
>> on InitPlan()
>> or its callee.
>
> From a consistency POV, I'd agree. But binding the active role after
> planning means giving up a lot of query optimization opportunities. It
> will prevent the role from participating in constant folding, which
> translates to potentially enormous loss of performance. Take the following
> policy function as an example
>
>  (role = 'some_role' AND expensive_function(row)) OR
>  (role = 'other_role' AND row.field = 'some_value')
>
> and assume that there's an index on 'field'. Now, if role is known to be
> 'other_row', the planner can eliminate the expensive_function() from the
> plan, and satisfy row.field = 'some_value' with an index scan. Without
> that knowledge, you'll probably get a sequential scan.
>
> I do value consistency, but in this case the benefit of not being
> consistent with other privilege checks outweighs the drawbacks I think.
>
I never deny some possibility to have special optimization using some
information to be determined at planner stage, in the near future.
However, I wonder the first version of this patch should include all
the desirable features at once.

>>>> It seems to me, caution of the problem is current_user is performing out
>>>> of the snapshot controls. According to the definition of stable function,
>>>> its result should not be changed during a particular scan.
>>>
>>> Yeah, well, ISTM that our definition of STABLE doesn't really take functions
>>> whose return value depends on GUCs into account.
>
>> Probably, has_XXX_privilege() should work with reliable data source being
>> safe from user-id switching. But it is a tough work to enhance whole of the
>> GUC stuff for snapshot aware…
>
> We don't need that for every GUC, though. I'd say its sufficient to somehow
> provide the policy function with a stable value of the active role (whatever
> "active" ends up meaning). If we do that, we can simply document that making
> the policy function depend on other GUCs is a seriously bad idea.
>
> We *do* need that stable "active role" value though, since we cannot very
> well advise against using current_role in the policy function without providing
> an alternativ…
>
I have no preference about implementation detail of this. As long as we can
have the actually stable user-id until v9.3 release, it is sufficient for me.

>>> To reiterate, my point is that we won't get away with zero planner changes
>>> even without RLSBYPASS. In fact, we'll be paying *most* of the complexity
>>> costs of RLSBYPASS whether we actually add that feature or not. Which makes
>>> not adding it look like a pretty bad deal to me.
>
>> It seems to me, 95% of our opinion is common, except for a few detailed stuff.
>> I never dislike the idea of RLSBYPASS permission, but I'd like to investigate
>> this idea with more time, until v9.3 being closed.
>> At least, nobody opposed to allow superusers to bypass RLS policy.
>> So, it can be a minimum startup point, isn't it? RLS patch will takes hundreds
>> of lines due to syntax support or query rewriting stuff. It is always good idea
>> to focus on the core functionality at the starting point.
>
> Absolutely. But at the same time, it's important to check that the design allows
> the additional features to be added later easily. In the case of RLS, I'm
> worried that decreeing that it's the role at execution time, not planning
> time, that counts, we're painting ourselves into a corner. I view RLSBYPASS
> as a good sanity check for that. Another one is that GRANT-with-filter-function
> idea. Both seem to fall into place quite naturally if handled while planning -
> you simply have to add decide which additional WHERE clause to add, if any.
> Without any negative performance effects from policies which don't apply to
> the current role, if the optimizer does it's job. Getting the same level of
> performance when policies are added unconditionally seems quite impossible.
> You might be able to make up for some losses by caching STABLE values (which
> amounts to another constant folding pass), but you won't ever be able to make up
> for a lost index scan opportunity or other optimizations which change the structure
> of the whole query plan.
>
If we have a mechanism to check permissions at planner stage, it should perform
as if it works at executor stage, as existing permission mechanism doing.
Thus, here is no inconsistency even if the first version assumes RLS policy is
evaluated at executor stage.
Please note that I never deny your idea, it probably allows to generate better
query plan, but all my point is that I'd like to develop such kind of
features at 2nd
steps.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>