Обсуждение: INSERT ... ON CONFLICT UPDATE and RLS
The patch that implements INSERT ... ON CONFLICT UPDATE has support and tests for per-column privileges (which are not relevant to the IGNORE variant, AFAICT). However, RLS support is another thing entirely. It has not been properly thought out, and unlike per-column privileges requires careful consideration, as the correct behavior isn't obvious. I've documented the current problems with RLS here: https://wiki.postgresql.org/wiki/UPSERT#RLS It's not clear whether or not the auxiliary UPDATE within an INSERT... ON CONFLICT UPDATE statement should have security quals appended. Stephen seemed to think that that might not be the best solution [1]. I am not sure. I'd like to learn what other people think. What is the best way of integrating RLS with ON CONFLICT UPDATE? What behavior is most consistent with the guarantees of RLS? In particular, should the implementation append security quals to the auxiliary UPDATE, or fail sooner? [1] http://www.postgresql.org/message-id/20141121205926.GK28859@tamriel.snowman.net -- Peter Geoghegan
On Mon, Jan 5, 2015 at 12:54 PM, Peter Geoghegan <pg@heroku.com> wrote: > The patch that implements INSERT ... ON CONFLICT UPDATE has support > and tests for per-column privileges (which are not relevant to the > IGNORE variant, AFAICT). However, RLS support is another thing > entirely. It has not been properly thought out, and unlike per-column > privileges requires careful consideration, as the correct behavior > isn't obvious. > > I've documented the current problems with RLS here: > > https://wiki.postgresql.org/wiki/UPSERT#RLS > > It's not clear whether or not the auxiliary UPDATE within an INSERT... > ON CONFLICT UPDATE statement should have security quals appended. > Stephen seemed to think that that might not be the best solution [1]. > I am not sure. I'd like to learn what other people think. > > What is the best way of integrating RLS with ON CONFLICT UPDATE? What > behavior is most consistent with the guarantees of RLS? In particular, > should the implementation append security quals to the auxiliary > UPDATE, or fail sooner? I think the INSERT .. ON CONFLICT UPDATE shouldn't be able to attempt an update unless the UPDATE policies of the table are such that a regular UPDATE would find the affected row. The post-image of the row needs to satisfy any UPDATE CHECK OPTION. If the INSERT fails due to a conflict with an unseen row, and the UPDATE can't find that row either due to RLS, then it should probably error out; the alternative is to silently do nothing, but that feels wrong. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jan 6, 2015 at 9:39 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I think the INSERT .. ON CONFLICT UPDATE shouldn't be able to attempt > an update unless the UPDATE policies of the table are such that a > regular UPDATE would find the affected row. The post-image of the row > needs to satisfy any UPDATE CHECK OPTION. If the INSERT fails due to > a conflict with an unseen row, and the UPDATE can't find that row > either due to RLS, then it should probably error out; the alternative > is to silently do nothing, but that feels wrong. I can certainly see the argument for that behavior. I'm inclined to agree that this is better. With th existing implementation, UPDATE check options are effective at preventing updates. However, the implementation is not effective at preventing row locking from finding a row that the user would not otherwise be able to find (with a conventional UPDATE). So I guess what I have to do now is figure out a way of also having the "... FOR UPDATE .... USING ( )" qual be enforced after row locking in respect of the row locked (locked, but not yet used to generate a post-image) in the target table. If it isn't satisfied, throw an error that doesn't leak anything about the target row, rather than silently not affecting the row. So, as you say, a divergence from what regular RLS UPDATEs do that happens to make more sense here. -- Peter Geoghegan
Another issue that I see is that there is only one resultRelation between the INSERT and UPDATE. That means that without some additional special measure, both UPDATE and INSERT "WITH CHECK ( ... ) " options are applied regardless of whether the INSERT path was taken, or the UPDATE path. Maybe that's actually sensible (note that even this doesn't happen right now, since the auxiliary UPDATE is currently minimally processed by the rewriter). What do people think about that? It seems like it might be less surprising to have that fail earlier when there are separate policies for INSERT and UPDATE -- for UPSERT, all policies are applied regardless of which path is taken. The task of making the security qual enforced like a check option before we go to update a locked row (at the start of the UPDATE path, for any UPDATE policy with a USING security qual) looks complicated. I'd appreciate a little help on the implementation details. Apparently Oracle does not offer "fine-grained access control" with MERGE, which I believe means they just don't support this kind of thing at all. Obviously I'd rather avoid that here, but the correct semantics are not obvious. ON CONFLICT UPDATE could almost justify making CREATE POLICY FOR INSERT accept a USING expression, since that's really where the row to UPDATE comes from. -- Peter Geoghegan
On 6 January 2015 at 20:44, Peter Geoghegan <pg@heroku.com> wrote: > Another issue that I see is that there is only one resultRelation > between the INSERT and UPDATE. That means that without some additional > special measure, both UPDATE and INSERT "WITH CHECK ( ... ) " options > are applied regardless of whether the INSERT path was taken, or the > UPDATE path. Maybe that's actually sensible (note that even this > doesn't happen right now, since the auxiliary UPDATE is currently > minimally processed by the rewriter). What do people think about that? > It seems like it might be less surprising to have that fail earlier > when there are separate policies for INSERT and UPDATE -- for UPSERT, > all policies are applied regardless of which path is taken. > I think the policies applied should depend on the path taken, so if it does an INSERT, then only the INSERT CHECK policy should be applied (after the insert), but if it ends up doing an UPDATE, I would expect the UPDATE USING policy to be applied (before the update) and the UPDATE CHECK policy to be applied (after the update). I would not expect the INSERT CHECK policy to be applied on the UPDATE path. As to whether the UPDATE USING policy should cause an error to be thrown if it is not satisfied, my inclination would be to not error, and use the command tag to report that no rows were updated, since that is what would happen with a regular UPDATE. So overall INSERT .. ON CONFLICT UPDATE would be consistent with either an INSERT or an UPDATE, depending on whether the row existed beforehand, which is easier to explain than having some special UPSERT semantics. Regards, Dean
On Wed, Jan 7, 2015 at 4:04 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > I think the policies applied should depend on the path taken, so if it > does an INSERT, then only the INSERT CHECK policy should be applied > (after the insert), but if it ends up doing an UPDATE, I would expect > the UPDATE USING policy to be applied (before the update) and the > UPDATE CHECK policy to be applied (after the update). I would not > expect the INSERT CHECK policy to be applied on the UPDATE path. I agree. > As to whether the UPDATE USING policy should cause an error to be > thrown if it is not satisfied, my inclination would be to not error, > and use the command tag to report that no rows were updated, since > that is what would happen with a regular UPDATE. My inclination would be to error, but I'd be OK with your proposal. > So overall INSERT .. ON CONFLICT UPDATE would be consistent with > either an INSERT or an UPDATE, depending on whether the row existed > beforehand, which is easier to explain than having some special UPSERT > semantics. Yeah. We won't escape the question so easily where triggers are concerned, but at least for RLS it seems like it should be possible to avoid confusing, one-off semantics. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > On 6 January 2015 at 20:44, Peter Geoghegan <pg@heroku.com> wrote: > > Another issue that I see is that there is only one resultRelation > > between the INSERT and UPDATE. That means that without some additional > > special measure, both UPDATE and INSERT "WITH CHECK ( ... ) " options > > are applied regardless of whether the INSERT path was taken, or the > > UPDATE path. Maybe that's actually sensible (note that even this > > doesn't happen right now, since the auxiliary UPDATE is currently > > minimally processed by the rewriter). What do people think about that? > > It seems like it might be less surprising to have that fail earlier > > when there are separate policies for INSERT and UPDATE -- for UPSERT, > > all policies are applied regardless of which path is taken. > > I think the policies applied should depend on the path taken, so if it > does an INSERT, then only the INSERT CHECK policy should be applied > (after the insert), but if it ends up doing an UPDATE, I would expect > the UPDATE USING policy to be applied (before the update) and the > UPDATE CHECK policy to be applied (after the update). I would not > expect the INSERT CHECK policy to be applied on the UPDATE path. You're making a distinction where I'm not convinced there should be one. While I understand that, internally, there are two paths (INSERT vs. UPDATE), it's still only one command for the user and their goal is simply "update if this exists, insert if it doesn't." I also don't like the idea that the policy to be applied depends on if it ends up being an INSERT or an UPDATE. I liked Peter's suggestion that the row must pass both WITH CHECK conditions- at least that's consistent and understandable. > As to whether the UPDATE USING policy should cause an error to be > thrown if it is not satisfied, my inclination would be to not error, > and use the command tag to report that no rows were updated, since > that is what would happen with a regular UPDATE. Yes, for a regular UPDATE that's what would happen- but this isn't a regular UPDATE, it's an UPSERT. Consider how users handle this now- they have routines which continually try to do one or the other until either the INSERT or the UPDATE is successful. I've never seen such a function, properly written, that says "try to INSERT, if that fails, try to UPDATE and if that doesn't update anything, then simply exit." What is the use-case for that? > So overall INSERT .. ON CONFLICT UPDATE would be consistent with > either an INSERT or an UPDATE, depending on whether the row existed > beforehand, which is easier to explain than having some special UPSERT > semantics. Any semantics which result in a no-op would be surprising for users of UPSERT. I like the idea of failing earlier- if you try to INSERT .. ON CONFLICT UPDATE a row which you're not allowed to INSERT, erroring strikes me as the right result. If you try to INSERT .. ON CONFLICT UPDATE a row which you're not allowed to UPDATE, I'd also think an error would be the right answer, even if you don't end up doing an actual UPDATE- you ran a command asking for an UPDATE to happen under a certain condition (the row already exists) and the policy states that you're not allowed to do such an UPDATE. As for the UPDATE's USING clause, if you're not allowed to view the records to be updated (which evidently exists because the INSERT failed), I'd handle that the same way we handle the case that a SELECT policy might not allow a user to see rows that they can attempt to INSERT- they'll get an error back in that case too. In the end, these are edge cases as policies are generally self consistent and so I think we have some leeway, but I don't think we have the right to turn an INSERT .. ON CONFLICT UPDATE into a no-op. That would be like allowing an INSERT to be a no-op. Thanks, Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > On Wed, Jan 7, 2015 at 4:04 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > I think the policies applied should depend on the path taken, so if it > > does an INSERT, then only the INSERT CHECK policy should be applied > > (after the insert), but if it ends up doing an UPDATE, I would expect > > the UPDATE USING policy to be applied (before the update) and the > > UPDATE CHECK policy to be applied (after the update). I would not > > expect the INSERT CHECK policy to be applied on the UPDATE path. > > I agree. I can certainly understand the appeal of this approach, but I don't think it makes sense. Consider what happens later on down the road, after the code has been written and deployed using INSERT .. ON CONFLICT UPDATE where 99% of the time only one path or the other is taken. Then the other path is taken and suddenly the exact same command and row ends up returning errors. Additional testing should have been done to check if that happens, of course, but I really don't like the idea that the exact same command, with the exact same policies, would succeed or fail, due to policies, based on the data in the database. That's not to say that, generally, speaking, the data in the database shouldn't impact policy failures, but in this case, the policies might not even reference any other tables in the database. > > As to whether the UPDATE USING policy should cause an error to be > > thrown if it is not satisfied, my inclination would be to not error, > > and use the command tag to report that no rows were updated, since > > that is what would happen with a regular UPDATE. > > My inclination would be to error, but I'd be OK with your proposal. I'm pretty strongly in favor of erroring. :) > > So overall INSERT .. ON CONFLICT UPDATE would be consistent with > > either an INSERT or an UPDATE, depending on whether the row existed > > beforehand, which is easier to explain than having some special UPSERT > > semantics. > > Yeah. We won't escape the question so easily where triggers are > concerned, but at least for RLS it seems like it should be possible to > avoid confusing, one-off semantics. Triggers are another question.. One approach that I don't recall seeing (and I'm not convinced that it's a good idea either, but thought it deserved mention) is the idea of having an UPSERT-specific set of policies (and perhaps an UPSERT-specific trigger...?). That gets pretty grotty when you consider existing systems which have INSERT and UPDATE triggers already, but UPSERT is a pretty big deal and a big change and for users who are just starting to use it, requiring that they write triggers and policies specifically to address that case might be acceptable. That doesn't address the cases where users have direct SQL access and might be able to then bypass existing triggers, but we might be able to work out an answer to that case.. Other databases have this capability and have triggers and at least one ends up firing both INSERT and UPDATE triggers, with many complaints from users about how that ends up making the performance suck. Perhaps we could use that as a fallback but support the explicit single trigger option too.. Just some thoughts, apologies if it's already been convered in depth previously. Thanks! Stephen
On Wed, Jan 7, 2015 at 3:01 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> On Wed, Jan 7, 2015 at 4:04 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> > I think the policies applied should depend on the path taken, so if it >> > does an INSERT, then only the INSERT CHECK policy should be applied >> > (after the insert), but if it ends up doing an UPDATE, I would expect >> > the UPDATE USING policy to be applied (before the update) and the >> > UPDATE CHECK policy to be applied (after the update). I would not >> > expect the INSERT CHECK policy to be applied on the UPDATE path. >> >> I agree. > > I can certainly understand the appeal of this approach, but I don't > think it makes sense. Consider what happens later on down the road, > after the code has been written and deployed using INSERT .. ON CONFLICT > UPDATE where 99% of the time only one path or the other is taken. Then > the other path is taken and suddenly the exact same command and row ends > up returning errors. I'd say: that's life. If you don't test your policies, they might not work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Wed, Jan 7, 2015 at 3:01 PM, Stephen Frost <sfrost@snowman.net> wrote: > > * Robert Haas (robertmhaas@gmail.com) wrote: > >> On Wed, Jan 7, 2015 at 4:04 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > >> > I think the policies applied should depend on the path taken, so if it > >> > does an INSERT, then only the INSERT CHECK policy should be applied > >> > (after the insert), but if it ends up doing an UPDATE, I would expect > >> > the UPDATE USING policy to be applied (before the update) and the > >> > UPDATE CHECK policy to be applied (after the update). I would not > >> > expect the INSERT CHECK policy to be applied on the UPDATE path. > >> > >> I agree. > > > > I can certainly understand the appeal of this approach, but I don't > > think it makes sense. Consider what happens later on down the road, > > after the code has been written and deployed using INSERT .. ON CONFLICT > > UPDATE where 99% of the time only one path or the other is taken. Then > > the other path is taken and suddenly the exact same command and row ends > > up returning errors. > > I'd say: that's life. If you don't test your policies, they might not work. I agree with that up to a point- this case feels more like a POLA violation though rather than a run-of-the-mill "you need to test all that." I'm a bit worried this might lead to cases where users can't use UPSERT because they don't know which set of policies will end up getting applied, although the above approach is strictly a subset of the approach I'm advocating, but at least with the 'throw it all together' case, you know exactly what you're getting with UPSERT. Thanks, Stephen
On Wed, Jan 7, 2015 at 12:01 PM, Stephen Frost <sfrost@snowman.net> wrote: > Other databases have this capability and have triggers and at least one > ends up firing both INSERT and UPDATE triggers, with many complaints > from users about how that ends up making the performance suck. Perhaps > we could use that as a fallback but support the explicit single trigger > option too.. Just some thoughts, apologies if it's already been > convered in depth previously. I would like to expose whether or not statement-level triggers are being called in the context of an INSERT ... ON CONFLICT UPDATE, FWIW. -- Peter Geoghegan
On Wed, Jan 7, 2015 at 12:26 PM, Stephen Frost <sfrost@snowman.net> wrote: > I agree with that up to a point- this case feels more like a POLA > violation though rather than a run-of-the-mill "you need to test all > that." I'm a bit worried this might lead to cases where users can't use > UPSERT because they don't know which set of policies will end up getting > applied, although the above approach is strictly a subset of the > approach I'm advocating, but at least with the 'throw it all together' > case, you know exactly what you're getting with UPSERT. I think that it makes sense to bunch the policies together, because INSERT ... ON CONFLICT UPDATE is mostly an INSERT - informally speaking, the "UPDATE part" is just enough to resolve a conflict - this is ultimately just a new clause of INSERT. Besides, I think that making the permissions as restrictive as possible is the least surprising behavior. It's clearly the most secure behavior. I think that we collectively lean towards actively throwing an error as the preferred behavior - I think that's good, because the intent of the user to UPDATE some particular tuple that they're not allowed to UPDATE is clear and unambiguous. As long as we're doing that, clearly writing a query that will fail based on the "wrong" path being taken is wrong-headed. Now, you can construct an UPSERT case that a "bunch together policies" approach doesn't serve well. I could be mistaken, but I think that case is likely to be more than a little contrived. We're now going to be testing the TARGET.* tuple before going to update, having failed to insert (and after row locking the TARGET.* tuple) - the target tuple is really how the update finds the existing row to update, so it should be tested, without necessarily being blamed directly (certainly, we cannot leak the TARGET.* tuple values, so maybe we should try blaming the EXCLUDED.* tuple first for error messaging reasons). If it fails because of the insert check after row locking but before update, well, it was liable to anyway, when the insert path was taken. Conversely, if it failed after actually inserting where that would not happen with a vanilla INSERT (due to a bunched-together update "USING()" security barrier qual or explicit user-supplied CHECK OPTION), well, you'd have gotten the same thing if you took the UPDATE path anyway. That just leaves the regular ExecUpdate() call of ExecWithCheckOptions(). When that is called with "bunched together" policies, you're now testing the final tuple. So it might be a bit confusing that the policy of final tuples originating from INSERTs is also enforced here, for the final tuple of an "UPDATE". But in order for even that to be the case, you'd have to have a final tuple that was substantially different from the one originally proposed for insertion that made the difference between a qual passing and not passing. How likely is that in realistic use cases? And besides, isn't the policy for INSERTs still quite relevant even here? We're talking about a final tuple that originated from an INSERT statement, after all. -- Peter Geoghegan
On Wed, Jan 07, 2015 at 03:01:20PM -0500, Stephen Frost wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: > > On Wed, Jan 7, 2015 at 4:04 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > > I think the policies applied should depend on the path taken, so if it > > > does an INSERT, then only the INSERT CHECK policy should be applied > > > (after the insert), but if it ends up doing an UPDATE, I would expect > > > the UPDATE USING policy to be applied (before the update) and the > > > UPDATE CHECK policy to be applied (after the update). I would not > > > expect the INSERT CHECK policy to be applied on the UPDATE path. > > > > I agree. > > I can certainly understand the appeal of this approach, but I don't > think it makes sense. Consider what happens later on down the road, > after the code has been written and deployed using INSERT .. ON CONFLICT > UPDATE where 99% of the time only one path or the other is taken. Then > the other path is taken and suddenly the exact same command and row ends > up returning errors. Additional testing should have been done to check > if that happens, of course, but I really don't like the idea that the > exact same command, with the exact same policies, would succeed or fail, > due to policies, based on the data in the database. There's precedent. Unique constraints, for example. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, Jan 7, 2015 at 5:16 PM, David Fetter <david@fetter.org> wrote: > There's precedent. Unique constraints, for example. I don't see that as any kind of precedent. -- Peter Geoghegan
On Wed, Jan 07, 2015 at 05:18:58PM -0800, Peter Geoghegan wrote: > On Wed, Jan 7, 2015 at 5:16 PM, David Fetter <david@fetter.org> wrote: > > There's precedent. Unique constraints, for example. > > I don't see that as any kind of precedent. In the part you sliced off, Stephen described a situation where the contents of a database either do or don't cause a query to violate a constraint. Uniqueness constraints are one example of this. CREATE TABLE foo(id INTEGER PRIMARY KEY); INSERT INTO foo(id) VALUES(1); /* Works the first time */ INSERT INTO foo(id) VALUES(1); /* Fails the next time */ Same database, same constraints, different outcome. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, Jan 7, 2015 at 5:33 PM, David Fetter <david@fetter.org> wrote: > Same database, same constraints, different outcome. You're missing the point, I think. UPSERT means that the user doesn't care about whether or not a given tuple proposed for insertion was handled with an insert or an update. If you have distinct INSERT and UPDATE policies (not that I think that many people will), and if they differ in such a way as to make the difference between some actual INSERT ... ON CONFLICT UPDATE in the app throwing an error or not throwing an error depending only on whether the UPDATE (or INSERT) path was taken, then that combination (the combination of that UPSERT command, that UPDATE policy and that INSERT policy) is wrong-headed. It's not based on some particular set of data in the database, but any and all possible sets. Stephen's objection isn't that the update operation throws an error; it's that the insert doesn't. I'm with Stephen; ISTM that selectively enforcing RLS like that is a foot gun. For column level privileges, you wouldn't expect to only get an error about not having the relevant update permissions at runtime, when the update path happens to be taken. And so it is for RLS. -- Peter Geoghegan
Peter, * Peter Geoghegan (pg@heroku.com) wrote: > For column level privileges, you wouldn't expect to only get an error > about not having the relevant update permissions at runtime, when the > update path happens to be taken. And so it is for RLS. Right, that's the precedent we should be considering. Column-level privileges is a great example- you need both insert and update privileges for the columns involved for the command to succeed. It shouldn't depend on which path actually ends up being taken. Thanks! Stephen
On 9 January 2015 at 00:49, Stephen Frost <sfrost@snowman.net> wrote: > Peter, > > * Peter Geoghegan (pg@heroku.com) wrote: >> For column level privileges, you wouldn't expect to only get an error >> about not having the relevant update permissions at runtime, when the >> update path happens to be taken. And so it is for RLS. > > Right, that's the precedent we should be considering. Column-level > privileges is a great example- you need both insert and update > privileges for the columns involved for the command to succeed. It > shouldn't depend on which path actually ends up being taken. > It's not the same as column-level privileges though, because RLS policies depend on the new/existing data, not just the table metadata. So for example an "UPDATE ... WHERE false" can fail column-level privilege checks even though it isn't updating anything, but it cannot fail a RLS policy check. I was trying to think up an example where you might actually have different INSERT and UPDATE policies, and the best I can think of is some sort of mod_count column where you have an INSERT CHECK (mod_count = 0) and an UPDATE CHECK (mod_count > 0). In that case, checking both policies would make an UPSERT impossible, whereas if you think of it as doing either an INSERT or an UPDATE, as the syntax suggests, it becomes possible. (I'm assuming here from your description that you intend for the policy expressions to be AND'ed together, so it might end up being an AND of 2 OR expressions.) Regards, Dean
On Fri, Jan 9, 2015 at 12:19 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > I was trying to think up an example where you might actually have > different INSERT and UPDATE policies, and the best I can think of is > some sort of mod_count column where you have an INSERT CHECK > (mod_count = 0) and an UPDATE CHECK (mod_count > 0). In that case, > checking both policies would make an UPSERT impossible, whereas if you > think of it as doing either an INSERT or an UPDATE, as the syntax > suggests, it becomes possible. Why does this user want to do this upsert? If they're upserting, then the inserted row could only reasonably have a value of (mod_count = 0). If updating, then they must have a constant value for the update path (a value that's greater than 0, naturally - say 2), which doesn't make any sense in the context of an upsert's auxiliary update - what happened to the 0 value? Sorry, but I don't think your example makes sense - I can't see what would motivate anyone to write a query like that with those RLS policies in place. It sounds like you're talking about an insert and a separate update that may or may not affect the same row, and not an upsert. Then those policies make sense, but in practice they render the upsert you describe contradictory. FWIW, I'm not suggesting that there couldn't possibly be a use case that doesn't do well with this convention where we enforce RLS deepening on the path taken. The cases are just very marginal, as I think your difficulty in coming up with a convincing counter-argument shows. I happen to think what Stephen and I favor ("bunching together" USING() barrier quals and check options from INSERT and UPDATE policies) is likely to be the best alternative available on balance. More generally, you could point out that I'm actually testing different tuples at different points in query processing under that regime (e.g. the post-insert tuple, or the before-update conflicting, existing tuple from the target, or the post update tuple) and so things could fail when the update path is taken despite the fact that they didn't fail when the insert path was taken. That's technically true, of course, but with idiomatic usage it isn't true, and that's what I care about. Does anyone have another counter-example of a practical upsert statement that cannot be used with certain RLS policies due to the fact that we chose to "bunch together" INSERT and UPDATE RLS policies? -- Peter Geoghegan
On 9 January 2015 at 08:49, Peter Geoghegan <pg@heroku.com> wrote: > On Fri, Jan 9, 2015 at 12:19 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> I was trying to think up an example where you might actually have >> different INSERT and UPDATE policies, and the best I can think of is >> some sort of mod_count column where you have an INSERT CHECK >> (mod_count = 0) and an UPDATE CHECK (mod_count > 0). In that case, >> checking both policies would make an UPSERT impossible, whereas if you >> think of it as doing either an INSERT or an UPDATE, as the syntax >> suggests, it becomes possible. > > Why does this user want to do this upsert? If they're upserting, then > the inserted row could only reasonably have a value of (mod_count = > 0). If updating, then they must have a constant value for the update > path (a value that's greater than 0, naturally - say 2), which doesn't > make any sense in the context of an upsert's auxiliary update - what > happened to the 0 value? Sorry, but I don't think your example makes > sense - I can't see what would motivate anyone to write a query like > that with those RLS policies in place. It sounds like you're talking > about an insert and a separate update that may or may not affect the > same row, and not an upsert. Then those policies make sense, but in > practice they render the upsert you describe contradictory. > Whoa, hang on. I think you're being a bit quick to dismiss that example. Why shouldn't I want an upsert where the majority of the table columns follow the usual "make it so" pattern of an upsert, but there is also this kind of audit column to be maintained? Then I would write something like INSERT INTO tbl (<some values>, 0) ON CONFLICT UPDATE SET <same values>, mod_count=mod_count+1; The root of the problem is the way that you're proposing to combine the RLS policies (using AND), which runs contrary to the way RLS policies are usually combined (using OR), which is why this kind of example fails -- RLS policies in general aren't intended to all be true simultaneously. Regards, Dean
Dean, Peter, * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > On 9 January 2015 at 08:49, Peter Geoghegan <pg@heroku.com> wrote: > > On Fri, Jan 9, 2015 at 12:19 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > >> I was trying to think up an example where you might actually have > >> different INSERT and UPDATE policies, and the best I can think of is > >> some sort of mod_count column where you have an INSERT CHECK > >> (mod_count = 0) and an UPDATE CHECK (mod_count > 0). In that case, > >> checking both policies would make an UPSERT impossible, whereas if you > >> think of it as doing either an INSERT or an UPDATE, as the syntax > >> suggests, it becomes possible. > > > > Why does this user want to do this upsert? If they're upserting, then > > the inserted row could only reasonably have a value of (mod_count = > > 0). If updating, then they must have a constant value for the update > > path (a value that's greater than 0, naturally - say 2), which doesn't > > make any sense in the context of an upsert's auxiliary update - what > > happened to the 0 value? Sorry, but I don't think your example makes > > sense - I can't see what would motivate anyone to write a query like > > that with those RLS policies in place. It sounds like you're talking > > about an insert and a separate update that may or may not affect the > > same row, and not an upsert. Then those policies make sense, but in > > practice they render the upsert you describe contradictory. > > > > Whoa, hang on. I think you're being a bit quick to dismiss that > example. Why shouldn't I want an upsert where the majority of the > table columns follow the usual "make it so" pattern of an upsert, but > there is also this kind of audit column to be maintained? Then I would > write something like > > INSERT INTO tbl (<some values>, 0) > ON CONFLICT UPDATE SET <same values>, mod_count=mod_count+1; That's a good point- it doesn't make sense to compare the INSERT values against the UPDATE policy as the result of the UPDATE could be a completely different tuple, one we can't know the contents of until we actually find a conflicting tuple. We can't simply combine the policies and run them at the same time, but I'm not sure that then means we should let an INSERT .. ON CONFLICT UPDATE pass in values to the INSERT which are not permitted by any INSERT policy on the relation. Further, forgetting about RLS, what happens if a user doesn't have INSERT rights on the mod_count column at all? We still allow the above to execute if the row already exists? That doesn't strike me as a good idea. In this case, I do think it makes sense to consider RLS and our regular permissions system as they are both 'OR' cases. I might have UPDATE rights for a particular column through multiple different roles for a given relation, but if none of them give me INSERT rights for that column, then I'd expect to get an error if I try to do an INSERT into that column. > The root of the problem is the way that you're proposing to combine > the RLS policies (using AND), which runs contrary to the way RLS > policies are usually combined (using OR), which is why this kind of > example fails -- RLS policies in general aren't intended to all be > true simultaneously. I agree that RLS policies, in general, are not intended to all be true simultaneously. I don't think it then follows that an INSERT .. ON CONFLICT UPDATE should be allowed if, for example, there are no INSERT policies on an RLS-enabled relation, because the call happens to end up doing an UPDATE. To try and outline another possible use-case, consider that you have tuples coming in from two independent sets of users, orange and blue. All users are allowed to INSERT, but the 'color' column must equal the user's. For UPDATEs, the blue users can see both orange and blue records while orange users can only see orange records. Rows resulting from an UPDATE can be orange or blue for blue users, but must be orange for orange users. Further, in this environment, attempts by orange users to insert blue records are flagged to be audited for review because orange users should never have access to nor be handling blue data. With the approach you're outlining, a command like: INSERT INTO data VALUES (... , 'blue') ON CONFLICT UPDATE set ... = ...; run by an orange user wouldn't error if it happened to result in an UPDATE (presuming, of course, that the UPDATE didn't try to change the existing color), but from a security perspective, it's clearly an error for an orange user to be attempting to insert blue data. I don't see this as impacting the more general case of how RLS policies are applied. An individual blue user might also be a member of some orange-related role and it wouldn't make any sense for that user to then be only able to see orange records (were we to AND RLS policies together). Where this leaves me, at least, is feeling like we should always apply the INSERT WITH CHECK policy, then if there is a conflict, check the UPDATE USING policy and throw an error if the row isn't visible but otherwise perform the UPDATE and then check the UPDATE WITH CHECK policy. I see your point that this runs counter to the 'mod_count' example use-case and could cause problems for users using RLS with such a strategy. For my part, I expect users of RLS to expect errors in such a case rather than allowing it, but it's certainly a judgement call. The only reasonable way that I can see to support both sides would be to allow UPSERT to be a top-level policy definition in its own right and let users specify exactly what is allowed in the UPSERT case (possibly requiring three different expressions to represent the initial INSERT, what the UPDATE can see, and what the UPDATE results in..). I tend to doubt it would be worth it unless we end up supporting UPSERT-specific triggers and permissions.. Thanks, Stephen
On Fri, Jan 9, 2015 at 2:22 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > Whoa, hang on. I think you're being a bit quick to dismiss that > example. Why shouldn't I want an upsert where the majority of the > table columns follow the usual "make it so" pattern of an upsert, but > there is also this kind of audit column to be maintained? Then I would > write something like > > INSERT INTO tbl (<some values>, 0) > ON CONFLICT UPDATE SET <same values>, mod_count=mod_count+1; > > The root of the problem is the way that you're proposing to combine > the RLS policies (using AND), which runs contrary to the way RLS > policies are usually combined (using OR), which is why this kind of > example fails -- RLS policies in general aren't intended to all be > true simultaneously. In case I wasn't clear, I'm proposing that we AND together the already OR'd together UPDATE and INSERT quals. -- Peter Geoghegan
Peter, * Peter Geoghegan (pg@heroku.com) wrote: > On Fri, Jan 9, 2015 at 2:22 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > Whoa, hang on. I think you're being a bit quick to dismiss that > > example. Why shouldn't I want an upsert where the majority of the > > table columns follow the usual "make it so" pattern of an upsert, but > > there is also this kind of audit column to be maintained? Then I would > > write something like > > > > INSERT INTO tbl (<some values>, 0) > > ON CONFLICT UPDATE SET <same values>, mod_count=mod_count+1; > > > > The root of the problem is the way that you're proposing to combine > > the RLS policies (using AND), which runs contrary to the way RLS > > policies are usually combined (using OR), which is why this kind of > > example fails -- RLS policies in general aren't intended to all be > > true simultaneously. > > In case I wasn't clear, I'm proposing that we AND together the already > OR'd together UPDATE and INSERT quals. I'm not sure how that would work exactly though, since the tuple the UPDATE results in might be different from what the INSERT has, as Dean pointed out. The INSERT tuple might even pass the UPDATE policy where the resulting tuple from the actual UPDATE part doesn't. Thanks! Stephen
On Fri, Jan 9, 2015 at 12:26 PM, Stephen Frost <sfrost@snowman.net> wrote: > Where this leaves me, at least, is feeling like we should always apply > the INSERT WITH CHECK policy, then if there is a conflict, check the > UPDATE USING policy and throw an error if the row isn't visible but > otherwise perform the UPDATE and then check the UPDATE WITH CHECK > policy. I see your point that this runs counter to the 'mod_count' > example use-case and could cause problems for users using RLS with such > a strategy. For my part, I expect users of RLS to expect errors in such > a case rather than allowing it, but it's certainly a judgement call. I mostly agree, but I don't know that I fully agree. Specifically, I also think we should check the update policy even when we only insert, on the theory that if we did go to update, the would-be inserted row would be a proxy for what we'd check then (the existing, target relation's tuple). What do you think of that? I certainly agree that the correct behavior here is at least a bit subjective. We cannot exactly generalize from other areas of the code, nor can we look for a precedent set by other systems (AFAICT there is none). > The only reasonable way that I can see to support both sides would be to > allow UPSERT to be a top-level policy definition in its own right and > let users specify exactly what is allowed in the UPSERT case (possibly > requiring three different expressions to represent the initial INSERT, > what the UPDATE can see, and what the UPDATE results in..). I tend to > doubt it would be worth it unless we end up supporting UPSERT-specific > triggers and permissions.. Agreed. That would technically make everyone happy, in that it defers to the user, but is unlikely to be worth it. -- Peter Geoghegan
On Fri, Jan 9, 2015 at 12:53 PM, Stephen Frost <sfrost@snowman.net> wrote: > I'm not sure how that would work exactly though, since the tuple the > UPDATE results in might be different from what the INSERT has, as Dean > pointed out. The INSERT tuple might even pass the UPDATE policy where > the resulting tuple from the actual UPDATE part doesn't. I'm not suggesting that Dean's example is totally without merit (his revised explanation made it clearer what he meant). I just think, on balance, that enforcing all INSERT + UPDATE policies at various points is the best behavior. Part of the reason for that is that if you ask people how they think it works, you'll get as many answers as the number of people you ask. My proposal (which may or may not be the same as yours, but if not is only slightly more restrictive) is unlikely to affect many users negatively, and is relatively easy to explain and reason about. There are workarounds for Dean's case, too. -- Peter Geoghegan
* Peter Geoghegan (pg@heroku.com) wrote: > On Fri, Jan 9, 2015 at 12:26 PM, Stephen Frost <sfrost@snowman.net> wrote: > > Where this leaves me, at least, is feeling like we should always apply > > the INSERT WITH CHECK policy, then if there is a conflict, check the > > UPDATE USING policy and throw an error if the row isn't visible but > > otherwise perform the UPDATE and then check the UPDATE WITH CHECK > > policy. I see your point that this runs counter to the 'mod_count' > > example use-case and could cause problems for users using RLS with such > > a strategy. For my part, I expect users of RLS to expect errors in such > > a case rather than allowing it, but it's certainly a judgement call. > > I mostly agree, but I don't know that I fully agree. Specifically, I > also think we should check the update policy even when we only insert, > on the theory that if we did go to update, the would-be inserted row > would be a proxy for what we'd check then (the existing, target > relation's tuple). What do you think of that? To flip it around a bit, I don't think we can avoid checking the *resulting* tuple from the UPDATE against the UPDATE policy. Therefore, I'm not sure that I see the point in checking the INSERT tuple against the UPDATE policy. I also don't like the idea that a completely legitimate command (one which allows the to-be-inserted row via the INSERT policy AND allows the tuple-post-update via the UPDATE policy) to throw an error because the to-be-inserted row violated the UPDATE policy. That doesn't make sense to me. Thanks! Stephen
On Fri, Jan 9, 2015 at 1:09 PM, Stephen Frost <sfrost@snowman.net> wrote: > To flip it around a bit, I don't think we can avoid checking the > *resulting* tuple from the UPDATE against the UPDATE policy. We can avoid it - by not updating. What I'm suggesting is that an enforcement occurs that is more or less equivalent to the enforcement that occurs when the UPDATE path is taken, without it actually being taken. I accept that that isn't perfect, but it has its advantages. > Therefore, > I'm not sure that I see the point in checking the INSERT tuple against > the UPDATE policy. I guess it wouldn't be hard to modify the struct WithCheckOption to store the cmd (e.g. RowSecurityPolicy-style ACL_*_CHR permissions), usable only in this context. That way, when ExecWithCheckOptions() is called from the INSERT, we can tell it to only enforce INSERT-applicable policy cmds (in particular, not UPDATE-only-applicable policy cmds that happen to end up associated with the same ResultRelation). Whereas, at the end of ExecUpdate(), that final ExecWithCheckOptions() call (call 3 of 3 when the UPDATE path is taken), INSERT-based policies can still be enforced against the final tuple (as can USING() quals that would ordinarily not be checked at that point either). A given ResultRelation's policies wouldn't necessarily always need to be enforced within ExecWithCheckOptions(), which is a change. Does that make sense to you? Note that I've already taught ExecWithCheckOptions() to not leak the existing, target tuple when the UPDATE path is taken (the tuple that can be referenced in the UPDATE using the TARGET.* alias), while still performing enforcement against it. I can teach it this too. -- Peter Geoghegan
* Peter Geoghegan (pg@heroku.com) wrote: > On Fri, Jan 9, 2015 at 1:09 PM, Stephen Frost <sfrost@snowman.net> wrote: > > Therefore, > > I'm not sure that I see the point in checking the INSERT tuple against > > the UPDATE policy. > > I guess it wouldn't be hard to modify the struct WithCheckOption to > store the cmd (e.g. RowSecurityPolicy-style ACL_*_CHR permissions), > usable only in this context. That way, when ExecWithCheckOptions() is > called from the INSERT, we can tell it to only enforce > INSERT-applicable policy cmds (in particular, not > UPDATE-only-applicable policy cmds that happen to end up associated > with the same ResultRelation). Whereas, at the end of ExecUpdate(), > that final ExecWithCheckOptions() call (call 3 of 3 when the UPDATE > path is taken), INSERT-based policies can still be enforced against > the final tuple (as can USING() quals that would ordinarily not be > checked at that point either). A given ResultRelation's policies > wouldn't necessarily always need to be enforced within > ExecWithCheckOptions(), which is a change. Does that make sense to > you? That sounds reasonable on first blush. I'll note that I've not looked deeply at the UPSERT patch, but if the above means that the INSERT policy is always checked against the INSERT tuple and the UPDATE policy is always checked against the tuple-resulting-from-an-UPDATE, and that tuples which are not visible due to the UPDATE policy throw an error in that case, then it's working as I'd expect. This does mean that the UPDATE policy won't be checked when the INSERT path is used but that seems appropriate to me, as we can't check a policy against a tuple that never exists (the result of the update applied to an existing tuple). > Note that I've already taught ExecWithCheckOptions() to not leak the > existing, target tuple when the UPDATE path is taken (the tuple that > can be referenced in the UPDATE using the TARGET.* alias), while still > performing enforcement against it. I can teach it this too. Ok. Thanks! Stephen
On 9 January 2015 at 20:26, Stephen Frost <sfrost@snowman.net> wrote: > Where this leaves me, at least, is feeling like we should always apply > the INSERT WITH CHECK policy, then if there is a conflict, check the > UPDATE USING policy and throw an error if the row isn't visible but > otherwise perform the UPDATE and then check the UPDATE WITH CHECK > policy. Yeah, I can see the logic in that, but I'm starting to question the final "then" in the above, i.e., the order in which the checks are done. Currently we're applying RLS CHECKs after the INSERT or UPDATE, like WITH CHECK OPTIONs on views. The SQL spec says that WITH CHECK OPTIONs on views have to be applied after the INSERT/UPDATE on the base relation, but we're free to do something different for RLS CHECKs if that makes more sense. If we want RLS to be more like column-level privilege checking, then it does make sense to do the checks sooner, so perhaps we should be checking the RLS policies before the INSERT/UPDATE, like CHECK constraints. Doing that, it makes sense to first check INSERT CHECK policies, potentially throwing an error before any PK conflict is detected, so an error would result even in the INSERT .. ON CONFLICT IGNORE case. That way INSERT CHECK policies would always be applied regardless of the path taken, as you suggest. I would still say that the RLS UPDATE USING/CHECK policies should only be applied if the UPDATE path is taken (before the UPDATE is applied), and they need to be applied to the (old/new) update tuples only. It's a bit like there is a WHERE <record already exists> clause attached to the UPDATE. The RLS UPDATE policies should only be applied to the rows affected by the UPDATE. > I see your point that this runs counter to the 'mod_count' > example use-case and could cause problems for users using RLS with such > a strategy. For my part, I expect users of RLS to expect errors in such > a case rather than allowing it, but it's certainly a judgement call. > Actually I don't think this is a problem for the mod_count example. This isn't the same as AND'ing together the INSERT and UPDATE quals, because each set of policies is being applied to a different tuple. There are 3 tuples in play here:- 1). The insert tuple, which has INSERT CHECK polcies applied to it 2). The existing (old) update tuple, which has UPDATE USING policies applied to it 3). The (new) update tuple, which has UPDATE CHECK policies applied to it NOTE: If we do change RLS CHECKs to be executed prior to modifying any data, that's potentially a change that could be made independently of the UPSERT patch. We should also probably then stop referring to them as WITH CHECK OPTIONs in the docs and in error messages because they're not the same thing, and the code might be neater if they had their own data-structure rather than overloading WithCheckOption. Regards, Dean
Dean, * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > On 9 January 2015 at 20:26, Stephen Frost <sfrost@snowman.net> wrote: > > Where this leaves me, at least, is feeling like we should always apply > > the INSERT WITH CHECK policy, then if there is a conflict, check the > > UPDATE USING policy and throw an error if the row isn't visible but > > otherwise perform the UPDATE and then check the UPDATE WITH CHECK > > policy. > > Yeah, I can see the logic in that, but I'm starting to question the > final "then" in the above, i.e., the order in which the checks are > done. > > Currently we're applying RLS CHECKs after the INSERT or UPDATE, like > WITH CHECK OPTIONs on views. The SQL spec says that WITH CHECK OPTIONs > on views have to be applied after the INSERT/UPDATE on the base > relation, but we're free to do something different for RLS CHECKs if > that makes more sense. If we want RLS to be more like column-level > privilege checking, then it does make sense to do the checks sooner, > so perhaps we should be checking the RLS policies before the > INSERT/UPDATE, like CHECK constraints. I've been wondering about the placement of the checks also. I keep meaning to go back to the SQL spec and review the WITH CHECK requirement as it feels a bit odd to me, but then again, it is the spec. As you say, we're not obligated to follow it for RLS in any case. > Doing that, it makes sense to first check INSERT CHECK policies, > potentially throwing an error before any PK conflict is detected, so > an error would result even in the INSERT .. ON CONFLICT IGNORE case. > That way INSERT CHECK policies would always be applied regardless of > the path taken, as you suggest. Right. > I would still say that the RLS UPDATE USING/CHECK policies should only > be applied if the UPDATE path is taken (before the UPDATE is applied), > and they need to be applied to the (old/new) update tuples only. It's > a bit like there is a WHERE <record already exists> clause attached to > the UPDATE. The RLS UPDATE policies should only be applied to the rows > affected by the UPDATE. Agreed. I don't see how we could possibly do otherwise. The only tuple we have at the outset is the to-be-INSERT'd tuple. We won't find the conflicting tuple until we try and even then that conflicting tuple might be different from the to-be-INSERT'd tuple, so we can't check that until we've discovered the conflict exists. Further, we won't have the tuple to check for the UPDATE's CHECK policy until all of the expressions (if any) are run on the UPDATE side. > > I see your point that this runs counter to the 'mod_count' > > example use-case and could cause problems for users using RLS with such > > a strategy. For my part, I expect users of RLS to expect errors in such > > a case rather than allowing it, but it's certainly a judgement call. > > Actually I don't think this is a problem for the mod_count example. > This isn't the same as AND'ing together the INSERT and UPDATE quals, > because each set of policies is being applied to a different tuple. > There are 3 tuples in play here:- > > 1). The insert tuple, which has INSERT CHECK polcies applied to it > 2). The existing (old) update tuple, which has UPDATE USING policies > applied to it > 3). The (new) update tuple, which has UPDATE CHECK policies applied to it Right. > NOTE: If we do change RLS CHECKs to be executed prior to modifying any > data, that's potentially a change that could be made independently of > the UPSERT patch. We should also probably then stop referring to them > as WITH CHECK OPTIONs in the docs and in error messages because > they're not the same thing, and the code might be neater if they had > their own data-structure rather than overloading WithCheckOption. I agree that's an independent change. I'm not sure if it would end up being neater or not offhand but I can see some advantages to breaking it up from WithCheck as it might be simpler for users to understand (particularly those users who are not already familiar with WithCheck). Were you thinking about working up a patch for such a change? If not, I'll see about finding time to do it, unless someone else wants to volunteer. :) Thanks! Stephen
On 10 January 2015 at 15:12, Stephen Frost <sfrost@snowman.net> wrote: >> NOTE: If we do change RLS CHECKs to be executed prior to modifying any >> data, that's potentially a change that could be made independently of >> the UPSERT patch. We should also probably then stop referring to them >> as WITH CHECK OPTIONs in the docs and in error messages because >> they're not the same thing, and the code might be neater if they had >> their own data-structure rather than overloading WithCheckOption. > > I agree that's an independent change. I'm not sure if it would end up > being neater or not offhand but I can see some advantages to breaking it > up from WithCheck as it might be simpler for users to understand > (particularly those users who are not already familiar with WithCheck). A new RlsCheck struct wouldn't need the cascade option that WithCheckOption has, and it's not nice the way viewname is being abused. For UPSERT it will need a field indicating the command type, as Peter pointed out, so I think it's different enough to warrant it's own struct, even if we weren't changing the firing time. > Were you thinking about working up a patch for such a change? OK, I'll take a look at it. I started reading the existing code that expands RLS policies and spotted a couple of bugs that should be fixed first:- 1). In prepend_row_security_policies(), defaultDeny should be initialised to false at the top. 2). In fireRIRrules(), activeRIRs isn't being reset properly after recursing, which will cause a self-join to fail. Regards, Dean
On 10 January 2015 at 21:38, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > OK, I'll take a look at it. I started reading the existing code that > expands RLS policies and spotted a couple of bugs that should be fixed > first:- > > 1). In prepend_row_security_policies(), defaultDeny should be > initialised to false at the top. > > 2). In fireRIRrules(), activeRIRs isn't being reset properly after > recursing, which will cause a self-join to fail. > So as I starting looking into these bugs, which looked fairly trivial, the test case I came up with revealed another more subtle bug with RLS, using a more obscure query -- given an update of the form UPDATE t1 ... FROM t2 ..., if t1 is part of an inheritance hierarchy and both t1 and t2 have RLS enabled, the inheritance planner was doing the wrong thing. There is code in there to copy subquery RTEs into each child plan, which needed to do the same for non-target RTEs with security barrier quals, which haven't yet been turned into subqueries. Similarly the rowmark preprocessing code needed to be taught about (non-target) RTEs with security barrier quals to handle this kind of UPDATE with a FROM clause. The attached patch fixes those issues. This bug can't happen with updatable security barrier views, since non-target security barrier views are expanded in the rewriter, so technically this doesn't need bach-patching to 9.4. However, I think the planner changes should probably be back-patched anyway, to keep the code in the 2 branches in sync, and more maintainable. Also it feels like the planner ought to be able to handle any legal query thrown at it, even if it looks like the 9.4 rewriter couldn't generate such a query. Regards, Dean
Вложения
Dean, * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > On 10 January 2015 at 21:38, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > OK, I'll take a look at it. I started reading the existing code that > > expands RLS policies and spotted a couple of bugs that should be fixed > > first:- > > > > 1). In prepend_row_security_policies(), defaultDeny should be > > initialised to false at the top. > > > > 2). In fireRIRrules(), activeRIRs isn't being reset properly after > > recursing, which will cause a self-join to fail. > > So as I starting looking into these bugs, which looked fairly trivial, > the test case I came up with revealed another more subtle bug with > RLS, using a more obscure query -- given an update of the form UPDATE > t1 ... FROM t2 ..., if t1 is part of an inheritance hierarchy and both > t1 and t2 have RLS enabled, the inheritance planner was doing the > wrong thing. There is code in there to copy subquery RTEs into each > child plan, which needed to do the same for non-target RTEs with > security barrier quals, which haven't yet been turned into subqueries. > Similarly the rowmark preprocessing code needed to be taught about > (non-target) RTEs with security barrier quals to handle this kind of > UPDATE with a FROM clause. Interesting, thanks for the work! I had been suspicious that there was an issue with the recursion handling. > The attached patch fixes those issues. Excellent. Will take a look. > This bug can't happen with updatable security barrier views, since > non-target security barrier views are expanded in the rewriter, so > technically this doesn't need bach-patching to 9.4. However, I think > the planner changes should probably be back-patched anyway, to keep > the code in the 2 branches in sync, and more maintainable. Also it > feels like the planner ought to be able to handle any legal query > thrown at it, even if it looks like the 9.4 rewriter couldn't generate > such a query. I'm not anxious to back-patch if there's no issue in existing branches, but I understand your point about making sure it can handle legal queries even if we don't believe current 9.4 can't generate them. We don't tend to back-patch just to keep things in sync as that could possibly have unintended side-effects. Looking at the regression tests a bit though, I notice that this removes the lower-level LockRows.. There had been much discussion about that last spring which I believe you were a part of..? I specifically recall discussing it with Craig, at least. Thanks! Stephen
On 12 January 2015 at 14:24, Stephen Frost <sfrost@snowman.net> wrote: > Looking at the regression tests a bit though, I notice that this removes > the lower-level LockRows.. There had been much discussion about that > last spring which I believe you were a part of..? I specifically recall > discussing it with Craig, at least. > Ah, yes you're right. Looking back over that discussion it shouldn't be removing those lower-level LockRows. I was a bit aggressive with my change to the rowmark preprocessing -- the first loop applies to requested locks, like SELECT .. FOR UPDATE, so it shouldn't be messing with that, presecurity.c handles that fine. It's only the second loop that needs to be taught about RTEs with security quals that will become subqueries. Here's an updated patch, that passes with the original regression test results. Regards, Dean
Вложения
On 12 January 2015 at 14:24, Stephen Frost <sfrost@snowman.net> wrote: > Interesting, thanks for the work! I had been suspicious that there was > an issue with the recursion handling. > So continuing to review the RLS code, I spotted the following in prepend_row_security_policies(): /* * We may end up getting called multiple times for the same RTE, so check * to make sure we aren't doing double-work. */ if (rte->securityQuals != NIL) return false; which looked suspicious. I assume that's just a hang-up from an earlier attempt to prevent infinite recursion in RLS expansion, but actually it's wrong because in the case of an UPDATE to a security barrier view on top of a table with RLS enabled, the view's securityQuals will get added to the RTE first, and that shouldn't prevent the underlying table's RLS securityQuals from being added. AFAICT, it should be safe to just remove the above code. I can't see how prepend_row_security_policies() could end up getting called more than once for the same RTE. Also, I'm thinking that it would be better to refactor things a bit and have prepend_row_security_policies() just return the new securityQuals and withCheckOptions to add. Then fireRIRrules() would only have to recurse into the new quals being added, not the already-processed quals. Regards, Dean
On 14 January 2015 at 08:43, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 12 January 2015 at 14:24, Stephen Frost <sfrost@snowman.net> wrote: >> Interesting, thanks for the work! I had been suspicious that there was >> an issue with the recursion handling. >> > > So continuing to review the RLS code, I spotted the following in > prepend_row_security_policies(): > > /* > * We may end up getting called multiple times for the same RTE, so check > * to make sure we aren't doing double-work. > */ > if (rte->securityQuals != NIL) > return false; > > which looked suspicious. I assume that's just a hang-up from an > earlier attempt to prevent infinite recursion in RLS expansion, but > actually it's wrong because in the case of an UPDATE to a security > barrier view on top of a table with RLS enabled, the view's > securityQuals will get added to the RTE first, and that shouldn't > prevent the underlying table's RLS securityQuals from being added. > > AFAICT, it should be safe to just remove the above code. I can't see > how prepend_row_security_policies() could end up getting called more > than once for the same RTE. > Turns out it wasn't as simple as that. prepend_row_security_policies() really could get called multiple times for the same RTE, because the call to query_tree_walker() at the end of fireRIRrules() would descend into the just-added quals again. The simplest fix seems to be to process RLS in a separate loop at the end, so that it can have it's own infinite recursion detection, which is different from that needed for pre-existing security quals and with check options from security barrier views. This approach simplifies things a bit, and ensures that we only try to expand RLS once for each RTE. > Also, I'm thinking that it would be better to refactor things a bit > and have prepend_row_security_policies() just return the new > securityQuals and withCheckOptions to add. Then fireRIRrules() would > only have to recurse into the new quals being added, not the > already-processed quals. > Turns out that refactoring actually became necessary in order to fix this bug, but I think it makes things cleaner and more efficient. Here's an updated patch with a new test for this bug. I've been developing the fixes for these RLS issues as one big patch, but I suppose it would be easy to split up, if that's preferred. Regards, Dean
Вложения
* Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > Turns out it wasn't as simple as that. prepend_row_security_policies() > really could get called multiple times for the same RTE, because the > call to query_tree_walker() at the end of fireRIRrules() would descend > into the just-added quals again. The simplest fix seems to be to > process RLS in a separate loop at the end, so that it can have it's > own infinite recursion detection, which is different from that needed > for pre-existing security quals and with check options from security > barrier views. This approach simplifies things a bit, and ensures that > we only try to expand RLS once for each RTE. Right, I specifically recall having prepend_row_security_policies() getting called multiple times for the same RTE. I like this approach of using a separate loop though and it strikes me that it lends more weight to the argument that we're better off with these as independent considerations. > > Also, I'm thinking that it would be better to refactor things a bit > > and have prepend_row_security_policies() just return the new > > securityQuals and withCheckOptions to add. Then fireRIRrules() would > > only have to recurse into the new quals being added, not the > > already-processed quals. Hmm, good point. > Turns out that refactoring actually became necessary in order to fix > this bug, but I think it makes things cleaner and more efficient. Sounds good, I'll take a look. > Here's an updated patch with a new test for this bug. I've been > developing the fixes for these RLS issues as one big patch, but I > suppose it would be easy to split up, if that's preferred. I'm alright with it as-is for now. Thanks! Stephen
On 10 January 2015 at 15:12, Stephen Frost <sfrost@snowman.net> wrote: > * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: >> Currently we're applying RLS CHECKs after the INSERT or UPDATE, like >> WITH CHECK OPTIONs on views. The SQL spec says that WITH CHECK OPTIONs >> on views have to be applied after the INSERT/UPDATE on the base >> relation, but we're free to do something different for RLS CHECKs if >> that makes more sense. If we want RLS to be more like column-level >> privilege checking, then it does make sense to do the checks sooner, >> so perhaps we should be checking the RLS policies before the >> INSERT/UPDATE, like CHECK constraints. > > Were you thinking about working up a patch for such a change? If not, > I'll see about finding time to do it, unless someone else wants to > volunteer. :) > Attached is a patch to make RLS checks run before attempting to insert/update any data rather than afterwards. In the end I decided not to create a new structure for RLS checks because most of the code that handles them treats them the same as WCOs. Instead, I just added a new 'kind' enum field to the existing structure and renamed/reworded things a bit. The patch also changes the error message for a RLS check violation, to make the cause of the error clearer. One thing I'm not sure about is what sqlstate code to use for this error, but I don't think that using WITH_CHECK_OPTION_VIOLATION is appropriate, because that seems to be specifically intended for views. Regards, Dean
Вложения
Dean, * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > Attached is a patch to make RLS checks run before attempting to > insert/update any data rather than afterwards. Excellent, many thanks! > In the end I decided not to create a new structure for RLS checks > because most of the code that handles them treats them the same as > WCOs. Instead, I just added a new 'kind' enum field to the existing > structure and renamed/reworded things a bit. Makes sense to me. I like being able to easily differentiate the two from each other now while also not needing to duplicate a bunch of code. I also like the reworded error messages. I am wondering if we should, perhaps, rename ri_WithCheckOptions or ExecWithCheckOptions() (or even more..) to indicate that they're used for both view-based WITH CHECK options and for RLS. We'll need to update this patch once the fixes for the WITH CHECK leaks go in, of course. > The patch also changes the error message for a RLS check violation, to > make the cause of the error clearer. One thing I'm not sure about is > what sqlstate code to use for this error, but I don't think that using > WITH_CHECK_OPTION_VIOLATION is appropriate, because that seems to be > specifically intended for views. Hmm, agreed. Any thoughts on what to use? We could fall back on something like ERRCODE_INSUFFICIENT_PRIVILEGE if necessary, I suppose. Thanks! Stephen
Dean, * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > Here's an updated patch with a new test for this bug. I've been > developing the fixes for these RLS issues as one big patch, but I > suppose it would be easy to split up, if that's preferred. Thanks for working on all of this! I've brought this up to date with master and addressed the little bit of bitrot. I'm still reviewing it but I'm generally happy with the approach and would certainly welcome any additional thoughts from you or feedback from others. Thanks! Stephen
Вложения
Dean, * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > Attached is a patch to make RLS checks run before attempting to > insert/update any data rather than afterwards. Excellent, this I really like and it's a pretty straight-forward change. I wonder if there are some documentation updates which need to be done for this also? I'm planning to look as I vauguely recall mentioning the ordering of operations somewhere along the way. I also addressed the bitrot from the column-priv leak patch. Would be great to have you take a look at the latest and let me know if you have any further comments or suggestions. I'm definitely looking forward to getting these changes in soon. Thanks! Stephen
Вложения
On 26 February 2015 at 05:41, Stephen Frost <sfrost@snowman.net> wrote: > Dean, > > * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: >> Here's an updated patch with a new test for this bug. I've been >> developing the fixes for these RLS issues as one big patch, but I >> suppose it would be easy to split up, if that's preferred. > > Thanks for working on all of this! > > I've brought this up to date with master and addressed the little bit > of bitrot. I'm still reviewing it but I'm generally happy with the > approach and would certainly welcome any additional thoughts from you or > feedback from others. > Thanks for doing that. I had been meaning to update it myself, but kept getting distracted by my day job. I think this should probably be committed as 2 separate patches, because there are really 2 separate issues being fixed here, and the commit comment only mentions the second one: 1). The planner changes and the first new regression test (update with from clause self join). This is a fix to the inheritance planner code, which wasn't properly handling the case of a query containing both target and non-target relations with RLS and inheritance. 2). The rewriter changes and the other regression tests (S.b. view on top of Row-level security). This is more than just a code tidy-up -- there's a real bug there. If you run those tests against HEAD, the update against the s.b. view fails to apply the RLS policy of the underlying table because the old recursion detection code thinks it has already handled that RTE. I should get more time to look at this over the weekend. Regards, Dean
On 26 February 2015 at 05:43, Stephen Frost <sfrost@snowman.net> wrote: > Dean, > > * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: >> Attached is a patch to make RLS checks run before attempting to >> insert/update any data rather than afterwards. > > Excellent, this I really like and it's a pretty straight-forward change. > I wonder if there are some documentation updates which need to be done > for this also? I'm planning to look as I vauguely recall mentioning the > ordering of operations somewhere along the way. > > I also addressed the bitrot from the column-priv leak patch. Would be > great to have you take a look at the latest and let me know if you have > any further comments or suggestions. I'm definitely looking forward to > getting these changes in soon. > Thanks for the update. I don't recall any mention of ordering in the docs, but if there isn't anything there it makes sense to add something. I haven't thought about this patch in a while, but I still like the look of it. It's definitely neater to do these checks earlier and it's good to be able to get RLS-specific errors too. I'll take a look at the latest updates. Regards, Dean
On 26 February 2015 at 09:27, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > I think this should probably be committed as 2 separate patches... On closer inspection, the 2 parts are interrelated since the new self-join test that tests the inheritance planner changes also requires the rewriter changes, otherwise it falls over with an infinite recursion error in the rewriter. I suppose the rewriter changes could be committed first with the self-join without inheritance test to test the fix to the infinite recursion problem, but I think it's probably easier to just treat this as a single set of related bug fixes. Attached is an updated patch. There are no more code changes, but I've added a few more regression tests. Regards, Dean
Вложения
On 26 February 2015 at 09:50, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 26 February 2015 at 05:43, Stephen Frost <sfrost@snowman.net> wrote: >> I wonder if there are some documentation updates which need to be done >> for this also? I'm planning to look as I vauguely recall mentioning the >> ordering of operations somewhere along the way. >> I couldn't find any mention of the timing of the check in the existing documentation, although it does vaguely imply that the check is done before inserting any new data. There is an existing paragraph describing the timing of USING conditions, so I added a new paragraph immediately after that to explain when CHECK expressions are enforced, since that seemed like the best place for it. >> I also addressed the bitrot from the column-priv leak patch. Would be >> great to have you take a look at the latest and let me know if you have >> any further comments or suggestions. I looked it over again, and I'm happy with these updates, except there was a missing "break" in the switch statement in ExecWithCheckOptions(). Here's an updated patch. Regards, Dean
Вложения
* Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > On 26 February 2015 at 09:50, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On 26 February 2015 at 05:43, Stephen Frost <sfrost@snowman.net> wrote: > >> I wonder if there are some documentation updates which need to be done > >> for this also? I'm planning to look as I vauguely recall mentioning the > >> ordering of operations somewhere along the way. > > I couldn't find any mention of the timing of the check in the existing > documentation, although it does vaguely imply that the check is done > before inserting any new data. There is an existing paragraph > describing the timing of USING conditions, so I added a new paragraph > immediately after that to explain when CHECK expressions are enforced, > since that seemed like the best place for it. Ah, ok, I must have been thinking about the USING discussion. Thanks for adding the paragraph about the CHECK timing. > >> I also addressed the bitrot from the column-priv leak patch. Would be > >> great to have you take a look at the latest and let me know if you have > >> any further comments or suggestions. > > I looked it over again, and I'm happy with these updates, except there > was a missing "break" in the switch statement in > ExecWithCheckOptions(). Oh, thanks for catching that! > Here's an updated patch. Excellent, will review. Thanks again! Stephen