Discussion: Improve behavior of concurrent TRUNCATE
Hi all,

This is a second thread I am spawning from the previous thread "Canceling
authentication due to timeout aka Denial of Service Attack":
https://www.postgresql.org/message-id/152512087100.19803.12733865831237526317%40wrigleys.postgresql.org

This time the discussion is about TRUNCATE, as the former thread covered a
whole family of failures, making it harder to catch the proper attention
for each one. The problem is that when we look up the OID of the relation
to truncate, we don't use a callback with RangeVarGetRelidExtended, so a
lock acquisition is attempted before the privileges of the user running
the command are checked on the relation.

Attached is a patch I have been working on which refactors the code of
TRUNCATE in such a way that we check for privileges before trying to
acquire a lock, without any user-facing impact (I have reworked a couple
of comments compared to the last version). This includes a set of tests
showing the new behavior.

Like cbe24a6, perhaps we would not want to back-patch it? Based on the
past history (and the consensus being reached for the REINDEX case would
be to patch only HEAD), I would be actually inclined not to back-patch
this stuff and qualify that as an improvement. That's also less work
for me at commit :)

Thoughts?
--
Michael
Attachments
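To make the ordering problem described above concrete, here is a toy model of the two code paths (plain Python, not PostgreSQL code; all names are illustrative): with the old ordering, an unprivileged user queues up behind any existing lock, which is the denial-of-service vector, while the new ordering rejects the user before any lock is requested.

```python
class Relation:
    """Toy stand-in for a catalog entry; not a PostgreSQL structure."""
    def __init__(self, name, owner, locked_by=None):
        self.name = name
        self.owner = owner
        self.locked_by = locked_by  # session holding a conflicting lock, if any

def truncate_old(rel, user):
    """Old path: try to acquire the lock first, then check privileges."""
    if rel.locked_by is not None:
        return "waits for lock"      # even unprivileged users block here
    if user != rel.owner:
        return "permission denied"
    return "truncated"

def truncate_new(rel, user):
    """New path: a callback run by the relation lookup checks privileges
    before any lock is taken."""
    if user != rel.owner:
        return "permission denied"   # fails immediately, no lock wait
    if rel.locked_by is not None:
        return "waits for lock"
    return "truncated"

locked = Relation("some_table", owner="postgres", locked_by="session1")
print(truncate_old(locked, "attacker"))   # waits for lock  <- the DoS
print(truncate_new(locked, "attacker"))   # permission denied
print(truncate_new(locked, "postgres"))   # waits for lock (legitimate)
```

In the real patch, the privilege check is performed by the callback passed to RangeVarGetRelidExtended, and the lock is still taken normally once that check passes.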
Hi,

On 8/6/18, 11:59 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Attached is a patch I have been working on which refactors the code of
> TRUNCATE in such a way that we check for privileges before trying to
> acquire a lock, without any user-facing impact (I have reworked a couple
> of comments compared to the last version). This includes a set of tests
> showing the new behavior.

Here are some comments for the latest version of the patch.

-truncate_check_rel(Relation rel)
+truncate_check_rel(Oid relid, Form_pg_class reltuple)

Could we use HeapTupleGetOid(reltuple) instead of passing the OID
separately?

- if (rel->rd_rel->relkind != RELKIND_RELATION &&
-     rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+ if (reltuple->relkind != RELKIND_RELATION &&
+
+     reltuple->relkind != RELKIND_PARTITIONED_TABLE)

There appears to be an extra empty line here.

+# Test for lock lookup conflicts with TRUNCATE when working on unowned
+# relations, particularly catalogs should trigger an ERROR for all the
+# scenarios present here.

If possible, it might be worth it to check that TRUNCATE still blocks
when a role has privileges, too.

> Like cbe24a6, perhaps we would not want to back-patch it? Based on the
> past history (and the consensus being reached for the REINDEX case would
> be to patch only HEAD), I would be actually inclined not to back-patch
> this stuff and qualify that as an improvement. That's also less work
> for me at commit :)

Since the only behavior this patch changes is the order of locking and
permission checks, my vote would be to back-patch. However, since the
REINDEX piece won't be back-patched, I can see why we might not here,
either.

Nathan
On Wed, Aug 08, 2018 at 03:38:57PM +0000, Bossart, Nathan wrote:
> Here are some comments for the latest version of the patch.

Thanks for the review, Nathan!

> -truncate_check_rel(Relation rel)
> +truncate_check_rel(Oid relid, Form_pg_class reltuple)
>
> Could we use HeapTupleGetOid(reltuple) instead of passing the OID
> separately?

If that were a HeapTuple, yes. And it seems to me that we had better make
truncate_check_rel() depend directly on a Form_pg_class instead of
allowing the caller to pass anything and deforming the tuple within the
routine.

> - if (rel->rd_rel->relkind != RELKIND_RELATION &&
> -     rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
> + if (reltuple->relkind != RELKIND_RELATION &&
> +
> +     reltuple->relkind != RELKIND_PARTITIONED_TABLE)
>
> There appears to be an extra empty line here.

Fixed. I don't know where this came from.

> +# Test for lock lookup conflicts with TRUNCATE when working on unowned
> +# relations, particularly catalogs should trigger an ERROR for all the
> +# scenarios present here.
>
> If possible, it might be worth it to check that TRUNCATE still blocks
> when a role has privileges, too.

Good idea. I have added more tests for that. Doing a truncate on
pg_authid for installcheck was a very bad idea anyway (even if it failed
all the time), so I have switched to a custom table, with a GRANT
command to control the access with a custom role.

> Since the only behavior this patch changes is the order of locking and
> permission checks, my vote would be to back-patch. However, since the
> REINDEX piece won't be back-patched, I can see why we might not here,
> either.

This patch is a bit more invasive than the REINDEX one to be honest, and
as this is getting qualified as an improvement, at least on this thread,
I would be inclined to be more restrictive and just patch HEAD, but not
v11.

Attached is an updated patch with all your suggestions added.
--
Michael
Attachments
On 8/9/18, 5:29 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
>> -truncate_check_rel(Relation rel)
>> +truncate_check_rel(Oid relid, Form_pg_class reltuple)
>>
>> Could we use HeapTupleGetOid(reltuple) instead of passing the OID
>> separately?
>
> If that were a HeapTuple, yes. And it seems to me that we had better
> make truncate_check_rel() depend directly on a Form_pg_class instead of
> allowing the caller to pass anything and deforming the tuple within the
> routine.

Got it. I agree, it makes sense to force the caller to provide a
Form_pg_class.

>> +# Test for lock lookup conflicts with TRUNCATE when working on unowned
>> +# relations, particularly catalogs should trigger an ERROR for all the
>> +# scenarios present here.
>>
>> If possible, it might be worth it to check that TRUNCATE still blocks
>> when a role has privileges, too.
>
> Good idea. I have added more tests for that. Doing a truncate on
> pg_authid for installcheck was a very bad idea anyway (even if it failed
> all the time), so I have switched to a custom table, with a GRANT
> command to control the access with a custom role.

Good call. Accidentally truncating pg_authid might have led to some
interesting test results...

>> Since the only behavior this patch changes is the order of locking and
>> permission checks, my vote would be to back-patch. However, since the
>> REINDEX piece won't be back-patched, I can see why we might not here,
>> either.
>
> This patch is a bit more invasive than the REINDEX one to be honest, and
> as this is getting qualified as an improvement, at least on this thread,
> I would be inclined to be more restrictive and just patch HEAD, but not
> v11.

Alright.

> Attached is an updated patch with all your suggestions added.

Thanks! This patch builds cleanly, the new tests pass, and my manual
testing hasn't uncovered any issues. Notably, I cannot reproduce the
originally reported authentication issue by using TRUNCATE after this
change. Beyond a few small comment wording suggestions below, it looks
good to me.

+ /*
+  * RangeVarGetRelidExtended has done some basic checks with its
+  * callback, and only remain session-level checks.
+  */

This is definitely a nitpick, but I might rephrase this to "...so only
session-level checks remain" or to "...but we still need to do
session-level checks."

+/*
+ * Set of extra sanity checks to check if a given relation is safe to
+ * truncate. This is split with truncate_check_rel as
+ * RangeVarCallbackForTruncate can only call the former.
+ */
+static void
+truncate_check_activity(Relation rel)

It might be worth mentioning why RangeVarCallbackForTruncate can only
call truncate_check_rel() (we haven't opened the relation yet).

+# Test for lock lookup conflicts with TRUNCATE when working on unowned
+# relations, particularly catalogs should trigger an ERROR for all the
+# scenarios present here.

Since we are testing a few more things now, perhaps this could just be
"Test for locking conflicts with TRUNCATE commands."

+# TRUNCATE will directly fail

Maybe we could say something more like this: If the role doesn't have
privileges to TRUNCATE the table, TRUNCATE should immediately fail
without waiting for a lock.

+# TRUNCATE will block
+permutation "s1_begin" "s1_tab_lookup" "s2_grant" "s2_auth" "s2_truncate" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s2_truncate" "s1_tab_lookup" "s1_commit" "s2_reset"
+permutation "s1_begin" "s2_grant" "s2_auth" "s1_tab_lookup" "s2_truncate" "s1_commit" "s2_reset"
+permutation "s2_grant" "s2_auth" "s2_truncate" "s1_begin" "s1_tab_lookup" "s1_commit" "s2_reset"

The second and fourth tests don't seem to actually block. Perhaps we
could reword the comment to say something like this: If the role has
privileges to TRUNCATE the table, TRUNCATE will block if another session
holds a lock on the table.

Nathan
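As an aside on why only some of those permutations block: reduced to its inputs, the behavior under test comes down to a small decision table (a toy sketch in Python, not the isolation tester or PostgreSQL code). TRUNCATE only waits when the role has privileges AND another session still holds a conflicting lock at the moment TRUNCATE runs; in the second and fourth permutations s1's lock is not yet held.

```python
# Toy reduction of the outcomes the permutations above exercise.
def truncate_outcome(has_privilege, lock_held):
    if not has_privilege:
        return "fails immediately"   # privilege check happens before locking
    return "blocks" if lock_held else "runs"

# The four quoted permutations, reduced to whether s1's lock is still
# held when s2 issues TRUNCATE (the role is privileged in all four):
cases = [
    ("s1 locks, s2 truncates inside s1's transaction", True),
    ("s2 truncates before s1 takes its lock",          False),
    ("s1 locks after the grant, then s2 truncates",    True),
    ("s2 truncates before s1 even begins",             False),
]
for name, lock_held in cases:
    print(name, "->", truncate_outcome(True, lock_held))
```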
On Thu, Aug 09, 2018 at 03:27:04PM +0000, Bossart, Nathan wrote:
> Thanks! This patch builds cleanly, the new tests pass, and my manual
> testing hasn't uncovered any issues. Notably, I cannot reproduce the
> originally reported authentication issue by using TRUNCATE after this
> change. Beyond a few small comment wording suggestions below, it
> looks good to me.

Thanks, I have updated the patch as you suggested. Any more
improvements to it that you can foresee?

> The second and fourth tests don't seem to actually block.

Yeah, that's intentional.
--
Michael
Attachments
On 8/9/18, 11:31 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Thanks, I have updated the patch as you suggested. Any more
> improvements to it that you can foresee?

Looks good to me.

Nathan
On Thu, Aug 09, 2018 at 05:55:54PM +0000, Bossart, Nathan wrote:
> On 8/9/18, 11:31 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
>> Thanks, I have updated the patch as you suggested. Any more
>> improvements to it that you can foresee?
>
> Looks good to me.

Okay, pushed to HEAD. Now the VACUUM case remains, and then we will be
good. I still need to look more deeply at the last patch sent.
--
Michael
Attachments
On 2018-Aug-06, Michael Paquier wrote:

> Attached is a patch I have been working on which refactors the code of
> TRUNCATE in such a way that we check for privileges before trying to
> acquire a lock, without any user-facing impact (I have reworked a couple
> of comments compared to the last version). This includes a set of tests
> showing the new behavior.
>
> Like cbe24a6, perhaps we would not want to back-patch it? Based on the
> past history (and the consensus being reached for the REINDEX case would
> be to patch only HEAD), I would be actually inclined not to back-patch
> this stuff and qualify that as an improvement. That's also less work
> for me at commit :)

I'm not sure I understand your arguments for not back-patching this.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Aug 10, 2018 at 02:03:28PM -0400, Alvaro Herrera wrote:
> On 2018-Aug-06, Michael Paquier wrote:
>> Like cbe24a6, perhaps we would not want to back-patch it? Based on the
>> past history (and the consensus being reached for the REINDEX case would
>> be to patch only HEAD), I would be actually inclined not to back-patch
>> this stuff and qualify that as an improvement. That's also less work
>> for me at commit :)
>
> I'm not sure I understand your arguments for not back-patching this.

Mainly consistency. Looking at the git history for such cases, we have
not really bothered back-patching fixes, and those have been qualified
as improvements. If we were to close all the holes mentioned in the
original DOS thread, a back-patch to v11 could be thought of as
acceptable? That's where the REINDEX fix has found its way after all,
but that was way less code churn, and we are post beta 3 for v11...
--
Michael
Attachments
On 2018-Aug-10, Michael Paquier wrote:

> Mainly consistency. Looking at the git history for such cases, we have
> not really bothered back-patching fixes, and those have been qualified
> as improvements. If we were to close all the holes mentioned in the
> original DOS thread, a back-patch to v11 could be thought of as
> acceptable? That's where the REINDEX fix has found its way after all,
> but that was way less code churn, and we are post beta 3 for v11...

I was actually thinking of applying this to all back-branches, not just
pg11, considering it a fix for a pretty serious bug. But checking the
history, it seems that Robert patched this in 9.2 as new development
(2ad36c4e4, 1489e2f26, cbe24a6dd, 1da5c1195, 74a1d4fe7); holes remained,
but none was patched until 94da2a6a in pg10 -- took some time! And then
nobody cared about the ones you're patching now.

So I withdraw my argument, mostly because there's clearly not as much
interest in seeing this fixed as all that.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Aug 10, 2018 at 5:05 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I was actually thinking of applying this to all back-branches, not just
> pg11, considering it a fix for a pretty serious bug. But checking the
> history, it seems that Robert patched this in 9.2 as new development
> (2ad36c4e4, 1489e2f26, cbe24a6dd, 1da5c1195, 74a1d4fe7); holes remained,
> but none was patched until 94da2a6a in pg10 -- took some time! And then
> nobody cared about the ones you're patching now.
>
> So I withdraw my argument, mostly because there's clearly not as much
> interest in seeing this fixed as all that.

The original patches would, I think, have been pretty scary to
back-patch, since the infrastructure didn't exist in older branches and
we were churning a fairly large amount of code. Now that most places
are fixed and things have had five years to bake, we could conceivably
back-patch the remaining fixes. However, I wonder if we've really
looked into how many instances of this problem remain. If there's
still ten more that we haven't bothered to fix, back-patching one or
two that we've gotten around to doing something about doesn't make a
lot of sense to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Aug 13, 2018 at 01:39:06PM -0400, Robert Haas wrote:
> The original patches would, I think, have been pretty scary to
> back-patch, since the infrastructure didn't exist in older branches
> and we were churning a fairly large amount of code. Now that most
> places are fixed and things have had five years to bake, we could
> conceivably back-patch the remaining fixes. However, I wonder if
> we've really looked into how many instances of this problem remain.
> If there's still ten more that we haven't bothered to fix,
> back-patching one or two that we've gotten around to doing something
> about doesn't make a lot of sense to me.

If we are confident enough to say that all the holes have been patched,
then we could back-patch only down to v11 in my opinion, as REINDEX
needed a change of behavior for the handling of shared catalogs. FWIW,
I have spent some time fixing all the issues reported on the original
thread, but I did not double-check all commands using an exclusive
lock, so all the issues I know of are:
- REINDEX with shared catalogs, fixed by 661dd23.
- TRUNCATE, with a fix committed only on HEAD as f841ceb2.
- VACUUM FULL, for which I have submitted a patch proposal:
https://www.postgresql.org/message-id/20180812222142.GA6097%40paquier.xyz
--
Michael
Attachments
On 2018-Aug-16, Michael Paquier wrote:

> On Mon, Aug 13, 2018 at 01:39:06PM -0400, Robert Haas wrote:
>> The original patches would, I think, have been pretty scary to
>> back-patch, since the infrastructure didn't exist in older branches
>> and we were churning a fairly large amount of code. Now that most
>> places are fixed and things have had five years to bake, we could
>> conceivably back-patch the remaining fixes. However, I wonder if
>> we've really looked into how many instances of this problem remain.
>> If there's still ten more that we haven't bothered to fix,
>> back-patching one or two that we've gotten around to doing something
>> about doesn't make a lot of sense to me.
>
> If we are confident enough to say that all the holes have been patched,
> then we could back-patch only down to v11 in my opinion, as REINDEX
> needed a change of behavior for the handling of shared catalogs.

I searched for uses of RangeVarGetRelid, as well as heap_openrv and
relation_openrv, and there are a couple that look very suspicious. I
don't think we can claim yet that all holes are fixed.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services