Обсуждение: ModifyTable overheads in generic plans
Hi, I would like to discuss a refactoring patch that builds on top of the patches at [1] to address $subject. To get an idea for what eliminating these overheads looks like, take a look at the following benchmarking results. Note 1: I've forced the use of generic plan by setting plan_cache_mode to 'force_generic_plan' Note 2: The individual TPS figures as measured are quite noisy, though I just want to show the rough trend with increasing number of partitions. pgbench -i -s 10 --partitions={0, 10, 100, 1000} pgbench -T120 -f test.sql -M prepared test.sql: \set aid random(1, 1000000) update pgbench_accounts set abalance = abalance + 1 where aid = :aid; Without any of the patches: 0 tps = 13045.485121 (excluding connections establishing) 10 tps = 9358.157433 (excluding connections establishing) 100 tps = 1878.274500 (excluding connections establishing) 1000 tps = 84.684695 (excluding connections establishing) The slowdown as the partition count increases can be explained by the fact that UPDATE and DELETE can't currently use runtime partition pruning. So, even if any given transaction is only updating a single tuple in a single partition, the plans for *all* partitions are being initialized and also the ResultRelInfos. That is, a lot of useless work being done in InitPlan() and ExecInitModifyTable(). With the patches at [1] (latest 0001+0002 posted there), whereby the generic plan for UPDATE can now perform runtime pruning, numbers can be seen to improve, slightly: 0 tps = 12743.487196 (excluding connections establishing) 10 tps = 12644.240748 (excluding connections establishing) 100 tps = 4158.123345 (excluding connections establishing) 1000 tps = 391.248067 (excluding connections establishing) So even though runtime pruning enabled by those patches ensures that the useless plans are left untouched by the executor, the ResultRelInfos are still being made assuming *all* result relations will be processed. With the attached patches (0001+0002+0003) that I want to discuss here in this thread, numbers are further improved: 0 tps = 13419.283168 (excluding connections establishing) 10 tps = 12588.016095 (excluding connections establishing) 100 tps = 8560.824225 (excluding connections establishing) 1000 tps = 1926.553901 (excluding connections establishing) 0001 and 0002 are preparatory patches. 0003 teaches nodeModifyTable.c to make the ResultRelInfo for a given result relation lazily, that is, when the plan producing tuples to be updated/deleted actually produces one that belongs to that relation. So, if a transaction only updates one tuple, then only one ResultRelInfo would be made. For larger partition counts, that saves significant amount of work. However, there's one new loop in ExecInitModifyTable() added by the patches at [1] that loops over all partitions, which I haven't been able to eliminate so far and I'm seeing it cause significant bottleneck at higher partition counts. The loop is meant to create a hash table that maps result relation OIDs to their offsets in the PlannedStmt.resultRelations list. We need this mapping, because the ResultRelInfos are accessed from the query-global array using that offset. One approach that was mentioned by David Rowley at [1] to not have do this mapping is to make the result relation's scan node's targetlist emit the relation's RT index or ordinal position to begin with, instead of the table OID, but I haven't figured out a way to do that. Having taken care of the ModifyTable overheads (except the one mentioned in the last paragraph), a few more bottlenecks are seen to pop up at higher partition counts. Basically, they result from doing some pre-execution actions on relations contained in the plan by traversing the flat range table in whole. 1. AcquireExecutorLocks(): locks *all* partitions before executing the plan tree but runtime pruning allows to skip scanning all but one 2. ExecCheckRTPerms(): checks permissions of *all* partitions before executing the plan tree, but maybe it's okay to check only the ones that will be accessed Problem 1 has been discussed before and David Rowley even developed a patch that was discussed at [2]. The approach taken in the patch was to delay locking of the partitions contained in a generic plan that are potentially runtime pruneable, although as also described in the linked thread, that approach has a race condition whereby a concurrent session may invalidate the generic plan by altering a partition in the window between when AcquireExecutorLocks() runs on the plan and the plan is executed. Another solution suggested to me by Robert Haas in an off-list discussion is to teach AcquireExecutorLocks() or the nearby code to perform EXTERN parameter based pruning before passing the plan tree to the executor and lock partitions that survive that pruning. It's perhaps doable if we refactor the ExecFindInitialMatchingSubPlans() to not require a full-blown execution context. Or maybe we could do something more invasive by rewriting AcquireExecutorLocks() to walk the plan tree instead of the flat range table, looking for scan nodes and nodes that support runtime pruning to lock the appropriate relations. Regarding problem 2, I wonder if we shouldn't simply move the permission check to ExecGetRangeTableRelation(), which will be performed the first time a given relation is accessed during execution. Anyway, applying David's aforementioned patch gives the following numbers: 0 tps = 12325.890487 (excluding connections establishing) 10 tps = 12146.420443 (excluding connections establishing) 100 tps = 12807.850709 (excluding connections establishing) 1000 tps = 7578.652893 (excluding connections establishing) which suggests that it might be worthwhile try to find a solution for this. Finally, there's one more place that shows up in perf profiles at higher partition counts and that is LockReleaseAll(). David, Tsunakawa-san had worked on a patch [3], which still applies and can be shown to be quite beneficial when generic plans are involved. I couldn't get it to show major improvement over the above numbers in this case (for UPDATE that is), but maybe that's because the loop in ExecInitModifyTable() mentioned above is still in the way. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com [1] https://commitfest.postgresql.org/28/2575/ [2] https://www.postgresql.org/message-id/flat/CAKJS1f_kfRQ3ZpjQyHC7%3DPK9vrhxiHBQFZ%2Bhc0JCwwnRKkF3hg%40mail.gmail.com [3] https://www.postgresql.org/message-id/flat/CAKJS1f-7T9F1xLw5PqgOApcV6YX3WYC4XJHHCpxh8hzcZsA-xA%40mail.gmail.com#c57f2f592484bca76310f4c551d4de15
Вложения
On Fri, Jun 26, 2020 at 9:36 PM Amit Langote <amitlangote09@gmail.com> wrote: > I would like to discuss a refactoring patch that builds on top of the > patches at [1] to address $subject. I've added this to the next CF: https://commitfest.postgresql.org/28/2621/ -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
On Sat, 27 Jun 2020 at 00:36, Amit Langote <amitlangote09@gmail.com> wrote: > 2. ExecCheckRTPerms(): checks permissions of *all* partitions before > executing the plan tree, but maybe it's okay to check only the ones > that will be accessed I don't think it needs to be quite as complex as that. expand_single_inheritance_child will set the RangeTblEntry.requiredPerms to 0, so we never need to check permissions on a partition. The overhead of permission checking when there are many partitions is just down to the fact that ExecCheckRTPerms() loops over the entire rangetable and calls ExecCheckRTEPerms for each one. ExecCheckRTEPerms() does have very little work to do when requiredPerms is 0, but the loop itself and the function call overhead show up when you remove the other bottlenecks. I have a patch somewhere that just had the planner add the RTindexes with a non-zero requiredPerms and set that in the plan so that ExecCheckRTPerms could just look at the ones that actually needed something checked. There's a slight disadvantage there that for queries to non-partitioned tables that we need to build a Bitmapset that has all items from the rangetable. That's likely a small overhead, but not free, so perhaps there is a better way. David
On Mon, Jun 29, 2020 at 10:39 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Sat, 27 Jun 2020 at 00:36, Amit Langote <amitlangote09@gmail.com> wrote: > > 2. ExecCheckRTPerms(): checks permissions of *all* partitions before > > executing the plan tree, but maybe it's okay to check only the ones > > that will be accessed > > I don't think it needs to be quite as complex as that. > expand_single_inheritance_child will set the > RangeTblEntry.requiredPerms to 0, so we never need to check > permissions on a partition. The overhead of permission checking when > there are many partitions is just down to the fact that > ExecCheckRTPerms() loops over the entire rangetable and calls > ExecCheckRTEPerms for each one. ExecCheckRTEPerms() does have very > little work to do when requiredPerms is 0, but the loop itself and the > function call overhead show up when you remove the other bottlenecks. I had forgotten that we set requiredPerms to 0 for the inheritance child tables. > I have a patch somewhere that just had the planner add the RTindexes > with a non-zero requiredPerms and set that in the plan so that > ExecCheckRTPerms could just look at the ones that actually needed > something checked. There's a slight disadvantage there that for > queries to non-partitioned tables that we need to build a Bitmapset > that has all items from the rangetable. That's likely a small > overhead, but not free, so perhaps there is a better way. I can't think of anything for this that doesn't involve having one more list of RTEs or bitmapset of RT indexes in PlannedStmt. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
On Fri, Jun 26, 2020 at 9:36 PM Amit Langote <amitlangote09@gmail.com> wrote: > I would like to discuss a refactoring patch that builds on top of the > patches at [1] to address $subject. I forgot to update a place in postgres_fdw causing one of its tests to crash. Fixed in the attached updated version. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Вложения
> On 1 Jul 2020, at 08:30, Amit Langote <amitlangote09@gmail.com> wrote: > > On Fri, Jun 26, 2020 at 9:36 PM Amit Langote <amitlangote09@gmail.com> wrote: >> I would like to discuss a refactoring patch that builds on top of the >> patches at [1] to address $subject. > > I forgot to update a place in postgres_fdw causing one of its tests to crash. > > Fixed in the attached updated version. The attached 0003 fails to apply to current HEAD, please submit another rebased version. Marking the entry as Waiting on Author in the meantime. cheers ./daniel
Hi Daniel, On Wed, Jul 1, 2020 at 6:50 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > On 1 Jul 2020, at 08:30, Amit Langote <amitlangote09@gmail.com> wrote: > > > > On Fri, Jun 26, 2020 at 9:36 PM Amit Langote <amitlangote09@gmail.com> wrote: > >> I would like to discuss a refactoring patch that builds on top of the > >> patches at [1] to address $subject. > > > > I forgot to update a place in postgres_fdw causing one of its tests to crash. > > > > Fixed in the attached updated version. > > The attached 0003 fails to apply to current HEAD, please submit another rebased > version. Marking the entry as Waiting on Author in the meantime. Thank you for the heads up. Actually, as I noted in the first email, the patches here are to be applied on top of patches of another thread that I chose not to post here. But I can see how that is inconvenient both for the CF bot and other humans, so I'm attaching all of the patches. Another thing I could do is decouple the patches to discuss here from the patches of the other thread, which should be possible and might be good to avoid back and forth between the two threads. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Вложения
- v2-0005-Delay-initializing-UPDATE-DELETE-ResultRelInfos.patch
- v2-0003-Revise-how-some-FDW-executor-APIs-obtain-ResultRe.patch
- v2-0004-Do-not-set-rootResultRelIndex-unnecessarily.patch
- v2-0001-Overhaul-UPDATE-s-targetlist-processing.patch
- v2-0002-Overhaul-how-inherited-update-delete-are-handled.patch
> On 1 Jul 2020, at 15:38, Amit Langote <amitlangote09@gmail.com> wrote: > Another thing I could do is decouple the patches to discuss here from > the patches of the other thread, which should be possible and might be > good to avoid back and forth between the two threads. It sounds like it would make it easier for reviewers, so if it's possible with a reasonable effort it might be worth it. I've moved this entry to the next CF for now. cheers ./daniel
On Fri, Jun 26, 2020 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote: > 0001 and 0002 are preparatory patches. I read through these patches a bit but it's really unclear what the point of them is. I think they need better commit messages, or better comments, or both. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Aug 1, 2020 at 4:46 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jun 26, 2020 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote: > > 0001 and 0002 are preparatory patches. > > I read through these patches a bit but it's really unclear what the > point of them is. I think they need better commit messages, or better > comments, or both. Thanks for taking a look. Sorry about the lack of good commentary, which I have tried to address in the attached updated version. I extracted one more part as preparatory from the earlier 0003 patch, so there are 4 patches now. Also as discussed with Daniel, I have changed the patches so that they can be applied on plain HEAD instead of having to first apply the patches at [1]. Without runtime pruning for UPDATE/DELETE proposed in [1], optimizing ResultRelInfo creation by itself does not improve the performance/scalability by that much, but the benefit of lazily creating ResultRelInfos seems clear so I think maybe it's okay to pursue this independently. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com [1] https://www.postgresql.org/message-id/CA%2BHiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ%40mail.gmail.com
Вложения
On Tue, Aug 4, 2020 at 3:15 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Sat, Aug 1, 2020 at 4:46 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Jun 26, 2020 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > 0001 and 0002 are preparatory patches. > > > > I read through these patches a bit but it's really unclear what the > > point of them is. I think they need better commit messages, or better > > comments, or both. > > Thanks for taking a look. Sorry about the lack of good commentary, > which I have tried to address in the attached updated version. I > extracted one more part as preparatory from the earlier 0003 patch, so > there are 4 patches now. > > Also as discussed with Daniel, I have changed the patches so that they > can be applied on plain HEAD instead of having to first apply the > patches at [1]. Without runtime pruning for UPDATE/DELETE proposed in > [1], optimizing ResultRelInfo creation by itself does not improve the > performance/scalability by that much, but the benefit of lazily > creating ResultRelInfos seems clear so I think maybe it's okay to > pursue this independently. Per cfbot's automatic patch tester, there were some issues in the 0004 patch: nodeModifyTable.c: In function ‘ExecModifyTable’: 1529nodeModifyTable.c:2484:24: error: ‘junkfilter’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 1530 junkfilter->jf_junkAttNo, 1531 ^ 1532nodeModifyTable.c:2309:14: note: ‘junkfilter’ was declared here 1533 JunkFilter *junkfilter; 1534 ^ 1535cc1: all warnings being treated as errors 1536<builtin>: recipe for target 'nodeModifyTable.o' failed 1537make[3]: *** [nodeModifyTable.o] Error 1 Fixed in the attached updated version -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Вложения
Hello, On Fri, Aug 7, 2020 at 9:26 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Tue, Aug 4, 2020 at 3:15 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Sat, Aug 1, 2020 at 4:46 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > On Fri, Jun 26, 2020 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > > 0001 and 0002 are preparatory patches. > > > > > > I read through these patches a bit but it's really unclear what the > > > point of them is. I think they need better commit messages, or better > > > comments, or both. > > > > Thanks for taking a look. Sorry about the lack of good commentary, > > which I have tried to address in the attached updated version. I > > extracted one more part as preparatory from the earlier 0003 patch, so > > there are 4 patches now. > > > > Also as discussed with Daniel, I have changed the patches so that they > > can be applied on plain HEAD instead of having to first apply the > > patches at [1]. Without runtime pruning for UPDATE/DELETE proposed in > > [1], optimizing ResultRelInfo creation by itself does not improve the > > performance/scalability by that much, but the benefit of lazily > > creating ResultRelInfos seems clear so I think maybe it's okay to > > pursue this independently. > > Per cfbot's automatic patch tester, there were some issues in the 0004 patch: > > nodeModifyTable.c: In function ‘ExecModifyTable’: > 1529nodeModifyTable.c:2484:24: error: ‘junkfilter’ may be used > uninitialized in this function [-Werror=maybe-uninitialized] > 1530 junkfilter->jf_junkAttNo, > 1531 ^ > 1532nodeModifyTable.c:2309:14: note: ‘junkfilter’ was declared here > 1533 JunkFilter *junkfilter; > 1534 ^ > 1535cc1: all warnings being treated as errors > 1536<builtin>: recipe for target 'nodeModifyTable.o' failed > 1537make[3]: *** [nodeModifyTable.o] Error 1 > > Fixed in the attached updated version Needed a rebase due to f481d28232. Attached updated patches. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Вложения
Attached updated patches based on recent the discussion at: * Re: partition routing layering in nodeModifyTable.c * https://www.postgresql.org/message-id/CA%2BHiwqHpmMjenQqNpMHrhg3DRhqqQfby2RCT1HWVwMin3_5vMA%40mail.gmail.com 0001 adjusts how ForeignScanState.resultRelInfo is initialized for use by direct modify operations. 0002 refactors ResultRelInfo initialization do be don lazily on first use I call these v6, because the last version posted on this thread was v5, even though it went through a couple of iterations on the above thread. Sorry about the confusion. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
On 30/10/2020 08:13, Amit Langote wrote: > /* > * Perform WITH CHECK OPTIONS check, if any. > */ > static void > ExecProcessWithCheckOptions(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo, > TupleTableSlot *slot, WCOKind wco_kind) > { > ModifyTable *node = (ModifyTable *) mtstate->ps.plan; > EState *estate = mtstate->ps.state; > > if (node->withCheckOptionLists == NIL) > return; > > /* Initialize expression state if not already done. */ > if (resultRelInfo->ri_WithCheckOptions == NIL) > { > int whichrel = resultRelInfo - mtstate->resultRelInfo; > List *wcoList; > List *wcoExprs = NIL; > ListCell *ll; > > Assert(whichrel >= 0 && whichrel < mtstate->mt_nplans); > wcoList = (List *) list_nth(node->withCheckOptionLists, whichrel); > foreach(ll, wcoList) > { > WithCheckOption *wco = (WithCheckOption *) lfirst(ll); > ExprState *wcoExpr = ExecInitQual((List *) wco->qual, > &mtstate->ps); > > wcoExprs = lappend(wcoExprs, wcoExpr); > } > > resultRelInfo->ri_WithCheckOptions = wcoList; > resultRelInfo->ri_WithCheckOptionExprs = wcoExprs; > } > > /* > * ExecWithCheckOptions() will skip any WCOs which are not of the kind > * we are looking for at this point. > */ > ExecWithCheckOptions(wco_kind, resultRelInfo, slot, estate); > } Can we do this initialization in ExecGetResultRelation()? That would seem much more straightforward. Is there any advantage to delaying it here? And same thing with the junk filter and the RETURNING list. (/me reads patch further) I presume that's what you referred to in the commit message: > Also, extend this lazy initialization approach to some of the > individual fields of ResultRelInfo such that even for the result > relations that are initialized, those fields are only initialized on > first access. While no performance improvement is to be expected > there, it can lead to a simpler initialization logic of the > ResultRelInfo itself, because the conditions for whether a given > field is needed or not tends to look confusing. One side-effect > of this is that any "SubPlans" referenced in the expressions of > those fields are also lazily initialized and hence changes the > output of EXPLAIN (without ANALYZE) in some regression tests. I'm now curious what the initialization logic would look like, if we initialized those fields in ExecGetResultRelation(). At a quick glance on the conditions on when those initializations are done in the patch now, it would seem pretty straightforward. If the target list contains any junk columns, initialize junk filter, and if ModifyTable->returningLists is set, initialize RETURNING list. Maybe I'm missing something. - Heikki
On Mon, Nov 2, 2020 at 10:19 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 30/10/2020 08:13, Amit Langote wrote: > > /* > > * Perform WITH CHECK OPTIONS check, if any. > > */ > > static void > > ExecProcessWithCheckOptions(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo, > > TupleTableSlot *slot, WCOKind wco_kind) > > { > > ModifyTable *node = (ModifyTable *) mtstate->ps.plan; > > EState *estate = mtstate->ps.state; > > > > if (node->withCheckOptionLists == NIL) > > return; > > > > /* Initialize expression state if not already done. */ > > if (resultRelInfo->ri_WithCheckOptions == NIL) > > { > > int whichrel = resultRelInfo - mtstate->resultRelInfo; > > List *wcoList; > > List *wcoExprs = NIL; > > ListCell *ll; > > > > Assert(whichrel >= 0 && whichrel < mtstate->mt_nplans); > > wcoList = (List *) list_nth(node->withCheckOptionLists, whichrel); > > foreach(ll, wcoList) > > { > > WithCheckOption *wco = (WithCheckOption *) lfirst(ll); > > ExprState *wcoExpr = ExecInitQual((List *) wco->qual, > > &mtstate->ps); > > > > wcoExprs = lappend(wcoExprs, wcoExpr); > > } > > > > resultRelInfo->ri_WithCheckOptions = wcoList; > > resultRelInfo->ri_WithCheckOptionExprs = wcoExprs; > > } > > > > /* > > * ExecWithCheckOptions() will skip any WCOs which are not of the kind > > * we are looking for at this point. > > */ > > ExecWithCheckOptions(wco_kind, resultRelInfo, slot, estate); > > } > > Can we do this initialization in ExecGetResultRelation()? That would > seem much more straightforward. Is there any advantage to delaying it > here? And same thing with the junk filter and the RETURNING list. > > (/me reads patch further) I presume that's what you referred to in the > commit message: > > > Also, extend this lazy initialization approach to some of the > > individual fields of ResultRelInfo such that even for the result > > relations that are initialized, those fields are only initialized on > > first access. While no performance improvement is to be expected > > there, it can lead to a simpler initialization logic of the > > ResultRelInfo itself, because the conditions for whether a given > > field is needed or not tends to look confusing. One side-effect > > of this is that any "SubPlans" referenced in the expressions of > > those fields are also lazily initialized and hence changes the > > output of EXPLAIN (without ANALYZE) in some regression tests. > > > I'm now curious what the initialization logic would look like, if we > initialized those fields in ExecGetResultRelation(). At a quick glance > on the conditions on when those initializations are done in the patch > now, it would seem pretty straightforward. If the target list contains > any junk columns, initialize junk filter, and if > ModifyTable->returningLists is set, initialize RETURNING list. Maybe I'm > missing something. Yeah, it's not that complicated to initialize those things in ExecGetResultRelation(). In fact, ExecGetResultRelation() (or its subroutine ExecBuildResultRelation()) housed those initializations in the earlier versions of this patch, but I changed that after our discussion about being lazy about initializing as much stuff as we can. Maybe I should revert that? -- Amit Langote EDB: http://www.enterprisedb.com
On Mon, Nov 2, 2020 at 10:53 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Mon, Nov 2, 2020 at 10:19 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > (/me reads patch further) I presume that's what you referred to in the > > commit message: > > > > > Also, extend this lazy initialization approach to some of the > > > individual fields of ResultRelInfo such that even for the result > > > relations that are initialized, those fields are only initialized on > > > first access. While no performance improvement is to be expected > > > there, it can lead to a simpler initialization logic of the > > > ResultRelInfo itself, because the conditions for whether a given > > > field is needed or not tends to look confusing. One side-effect > > > of this is that any "SubPlans" referenced in the expressions of > > > those fields are also lazily initialized and hence changes the > > > output of EXPLAIN (without ANALYZE) in some regression tests. > > > > > > I'm now curious what the initialization logic would look like, if we > > initialized those fields in ExecGetResultRelation(). At a quick glance > > on the conditions on when those initializations are done in the patch > > now, it would seem pretty straightforward. If the target list contains > > any junk columns, initialize junk filter, and if > > ModifyTable->returningLists is set, initialize RETURNING list. Maybe I'm > > missing something. > > Yeah, it's not that complicated to initialize those things in > ExecGetResultRelation(). In fact, ExecGetResultRelation() (or its > subroutine ExecBuildResultRelation()) housed those initializations in > the earlier versions of this patch, but I changed that after our > discussion about being lazy about initializing as much stuff as we > can. Maybe I should revert that? Please check the attached if that looks better. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
On 03/11/2020 10:27, Amit Langote wrote: > Please check the attached if that looks better. Great, thanks! Yeah, I like that much better. This makes me a bit unhappy: > > /* Also let FDWs init themselves for foreign-table result rels */ > if (resultRelInfo->ri_FdwRoutine != NULL) > { > if (resultRelInfo->ri_usesFdwDirectModify) > { > ForeignScanState *fscan = (ForeignScanState *) mtstate->mt_plans[i]; > > /* > * For the FDW's convenience, set the ForeignScanState node's > * ResultRelInfo to let the FDW know which result relation it > * is going to work with. > */ > Assert(IsA(fscan, ForeignScanState)); > fscan->resultRelInfo = resultRelInfo; > resultRelInfo->ri_FdwRoutine->BeginDirectModify(fscan, eflags); > } > else if (resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL) > { > List *fdw_private = (List *) list_nth(node->fdwPrivLists, i); > > resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate, > resultRelInfo, > fdw_private, > i, > eflags); > } > } If you remember, I was unhappy with a similar assertion in the earlier patches [1]. I'm not sure what to do instead though. A few options: A) We could change FDW API so that BeginDirectModify takes the same arguments as BeginForeignModify(). That avoids the assumption that it's a ForeignScan node, because BeginForeignModify() doesn't take ForeignScanState as argument. That would be consistent, which is nice. But I think we'd somehow still need to pass the ResultRelInfo to the corresponding ForeignScan, and I'm not sure how. B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first call to ForeignNext(). C) Accept the Assertion. And add an elog() check in the planner for that with a proper error message. I'm leaning towards B), but maybe there's some better solution I didn't think of? Perhaps changing the API would make sense in any case, it is a bit weird as it is. Backwards-incompatible API changes are not nice, but I don't think there are many FDWs out there that implement the DirectModify functions. And those functions are pretty tightly coupled with the executor and ModifyTable node details anyway, so I don't feel like we can, or need to, guarantee that they stay unchanged across major versions. [1] https://www.postgresql.org/message-id/19c23dd9-89ce-75a3-9105-5fc05a46f94a%40iki.fi - Heikki
On Tue, Nov 3, 2020 at 9:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 03/11/2020 10:27, Amit Langote wrote: > > Please check the attached if that looks better. > > Great, thanks! Yeah, I like that much better. > > This makes me a bit unhappy: > > > > > /* Also let FDWs init themselves for foreign-table result rels */ > > if (resultRelInfo->ri_FdwRoutine != NULL) > > { > > if (resultRelInfo->ri_usesFdwDirectModify) > > { > > ForeignScanState *fscan = (ForeignScanState *) mtstate->mt_plans[i]; > > > > /* > > * For the FDW's convenience, set the ForeignScanState node's > > * ResultRelInfo to let the FDW know which result relation it > > * is going to work with. > > */ > > Assert(IsA(fscan, ForeignScanState)); > > fscan->resultRelInfo = resultRelInfo; > > resultRelInfo->ri_FdwRoutine->BeginDirectModify(fscan, eflags); > > } > > else if (resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL) > > { > > List *fdw_private = (List *) list_nth(node->fdwPrivLists, i); > > > > resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate, > > resultRelInfo, > > fdw_private, > > i, > > eflags); > > } > > } > > If you remember, I was unhappy with a similar assertion in the earlier > patches [1]. I'm not sure what to do instead though. A few options: > > A) We could change FDW API so that BeginDirectModify takes the same > arguments as BeginForeignModify(). That avoids the assumption that it's > a ForeignScan node, because BeginForeignModify() doesn't take > ForeignScanState as argument. That would be consistent, which is nice. > But I think we'd somehow still need to pass the ResultRelInfo to the > corresponding ForeignScan, and I'm not sure how. Maybe ForeignScan doesn't need to contain any result relation info then? ForeignScan.operation != CMD_SELECT is enough to tell it to call IterateDirectModify() as today. > B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first > call to ForeignNext(). > > C) Accept the Assertion. And add an elog() check in the planner for that > with a proper error message. > > I'm leaning towards B), but maybe there's some better solution I didn't > think of? Perhaps changing the API would make sense in any case, it is a > bit weird as it is. Backwards-incompatible API changes are not nice, but > I don't think there are many FDWs out there that implement the > DirectModify functions. And those functions are pretty tightly coupled > with the executor and ModifyTable node details anyway, so I don't feel > like we can, or need to, guarantee that they stay unchanged across major > versions. B is not too bad, but I tend to prefer doing A too. -- Amit Langote EDB: http://www.enterprisedb.com
On Wed, Nov 4, 2020 at 11:32 AM Amit Langote <amitlangote09@gmail.com> wrote: > On Tue, Nov 3, 2020 at 9:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > A) We could change FDW API so that BeginDirectModify takes the same > > arguments as BeginForeignModify(). That avoids the assumption that it's > > a ForeignScan node, because BeginForeignModify() doesn't take > > ForeignScanState as argument. That would be consistent, which is nice. > > But I think we'd somehow still need to pass the ResultRelInfo to the > > corresponding ForeignScan, and I'm not sure how. > > Maybe ForeignScan doesn't need to contain any result relation info > then? ForeignScan.operation != CMD_SELECT is enough to tell it to > call IterateDirectModify() as today. > > > B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first > > call to ForeignNext(). > > > > C) Accept the Assertion. And add an elog() check in the planner for that > > with a proper error message. > > > > I'm leaning towards B), but maybe there's some better solution I didn't > > think of? Perhaps changing the API would make sense in any case, it is a > > bit weird as it is. Backwards-incompatible API changes are not nice, but > > I don't think there are many FDWs out there that implement the > > DirectModify functions. And those functions are pretty tightly coupled > > with the executor and ModifyTable node details anyway, so I don't feel > > like we can, or need to, guarantee that they stay unchanged across major > > versions. > > B is not too bad, but I tend to prefer doing A too. How about I update the 0001 patch to implement A? -- Amit Langote EDB: http://www.enterprisedb.com
On Wed, Nov 4, 2020 at 11:32 AM Amit Langote <amitlangote09@gmail.com> wrote: > On Tue, Nov 3, 2020 at 9:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 03/11/2020 10:27, Amit Langote wrote: > > > Please check the attached if that looks better. > > > > Great, thanks! Yeah, I like that much better. > > > > This makes me a bit unhappy: > > > > > > > > /* Also let FDWs init themselves for foreign-table result rels */ > > > if (resultRelInfo->ri_FdwRoutine != NULL) > > > { > > > if (resultRelInfo->ri_usesFdwDirectModify) > > > { > > > ForeignScanState *fscan = (ForeignScanState *) mtstate->mt_plans[i]; > > > > > > /* > > > * For the FDW's convenience, set the ForeignScanState node's > > > * ResultRelInfo to let the FDW know which result relation it > > > * is going to work with. > > > */ > > > Assert(IsA(fscan, ForeignScanState)); > > > fscan->resultRelInfo = resultRelInfo; > > > resultRelInfo->ri_FdwRoutine->BeginDirectModify(fscan, eflags); > > > } > > > else if (resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL) > > > { > > > List *fdw_private = (List *) list_nth(node->fdwPrivLists, i); > > > > > > resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate, > > > resultRelInfo, > > > fdw_private, > > > i, > > > eflags); > > > } > > > } > > > > If you remember, I was unhappy with a similar assertion in the earlier > > patches [1]. I'm not sure what to do instead though. A few options: > > > > A) We could change FDW API so that BeginDirectModify takes the same > > arguments as BeginForeignModify(). That avoids the assumption that it's > > a ForeignScan node, because BeginForeignModify() doesn't take > > ForeignScanState as argument. That would be consistent, which is nice. > > But I think we'd somehow still need to pass the ResultRelInfo to the > > corresponding ForeignScan, and I'm not sure how. > > Maybe ForeignScan doesn't need to contain any result relation info > then? ForeignScan.operation != CMD_SELECT is enough to tell it to > call IterateDirectModify() as today. Hmm, I misspoke. We do still need ForeignScanState.resultRelInfo, because the IterateDirectModify() API uses it to return the remotely inserted/updated/deleted tuple for the RETURNING projection performed by ExecModifyTable(). > > B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first > > call to ForeignNext(). > > > > C) Accept the Assertion. And add an elog() check in the planner for that > > with a proper error message. > > > > I'm leaning towards B), but maybe there's some better solution I didn't > > think of? Perhaps changing the API would make sense in any case, it is a > > bit weird as it is. Backwards-incompatible API changes are not nice, but > > I don't think there are many FDWs out there that implement the > > DirectModify functions. And those functions are pretty tightly coupled > > with the executor and ModifyTable node details anyway, so I don't feel > > like we can, or need to, guarantee that they stay unchanged across major > > versions. > > B is not too bad, but I tend to prefer doing A too. On second thought, it seems A would amount to merely a cosmetic adjustment of the API, nothing more. B seems to get the job done for me and also doesn't unnecessarily break compatibility, so I've updated 0001 to implement B. Please give it a look. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
On 10/11/2020 13:12, Amit Langote wrote: > On Wed, Nov 4, 2020 at 11:32 AM Amit Langote <amitlangote09@gmail.com> wrote: >> On Tue, Nov 3, 2020 at 9:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >>> A) We could change FDW API so that BeginDirectModify takes the same >>> arguments as BeginForeignModify(). That avoids the assumption that it's >>> a ForeignScan node, because BeginForeignModify() doesn't take >>> ForeignScanState as argument. That would be consistent, which is nice. >>> But I think we'd somehow still need to pass the ResultRelInfo to the >>> corresponding ForeignScan, and I'm not sure how. >> >> Maybe ForeignScan doesn't need to contain any result relation info >> then? ForeignScan.operation != CMD_SELECT is enough to tell it to >> call IterateDirectModify() as today. > > Hmm, I misspoke. We do still need ForeignScanState.resultRelInfo, > because the IterateDirectModify() API uses it to return the remotely > inserted/updated/deleted tuple for the RETURNING projection performed > by ExecModifyTable(). > >>> B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first >>> call to ForeignNext(). >>> >>> C) Accept the Assertion. And add an elog() check in the planner for that >>> with a proper error message. >>> >>> I'm leaning towards B), but maybe there's some better solution I didn't >>> think of? Perhaps changing the API would make sense in any case, it is a >>> bit weird as it is. Backwards-incompatible API changes are not nice, but >>> I don't think there are many FDWs out there that implement the >>> DirectModify functions. And those functions are pretty tightly coupled >>> with the executor and ModifyTable node details anyway, so I don't feel >>> like we can, or need to, guarantee that they stay unchanged across major >>> versions. >> >> B is not too bad, but I tend to prefer doing A too. > > On second thought, it seems A would amount to merely a cosmetic > adjustment of the API, nothing more. B seems to get the job done for > me and also doesn't unnecessarily break compatibility, so I've updated > 0001 to implement B. Please give it a look. Looks good at a quick glance. It is a small API break that BeginDirectModify() is now called during execution, not at executor startup, but I don't think that's going to break FDWs in practice. One could argue, though, that if we're going to change the API, we should do it more loudly. So changing the arguments might be a good thing. The BeginDirectModify() and BeginForeignModify() interfaces are inconsistent, but that's not this patch's fault. I wonder if we could move the call to BeginForeignModify() also to ForeignNext(), though? And BeginForeignScan() too, while we're at it. Overall, this is probably fine as it is though. I'll review more thorougly tomorrow. - Heikki
On 10/11/2020 17:32, Heikki Linnakangas wrote: > On 10/11/2020 13:12, Amit Langote wrote: >> On second thought, it seems A would amount to merely a cosmetic >> adjustment of the API, nothing more. B seems to get the job done for >> me and also doesn't unnecessarily break compatibility, so I've updated >> 0001 to implement B. Please give it a look. > > Looks good at a quick glance. It is a small API break that > BeginDirectModify() is now called during execution, not at executor > startup, but I don't think that's going to break FDWs in practice. One > could argue, though, that if we're going to change the API, we should do > it more loudly. So changing the arguments might be a good thing. > > The BeginDirectModify() and BeginForeignModify() interfaces are > inconsistent, but that's not this patch's fault. I wonder if we could > move the call to BeginForeignModify() also to ForeignNext(), though? And > BeginForeignScan() too, while we're at it. With these patches, BeginForeignModify() and BeginDirectModify() are both called during execution, before the first IterateForeignScan/IterateDirectModify call. The documentation for BeginForeignModify() needs to be updated, it still claims that it's run at executor startup, but that's not true after these patches. So that needs to be updated. I think that's a good thing, because it means that BeginForeignModify() and BeginDirectModify() are called at the same stage, from the FDW's point of view. Even though BeginDirectModify() is called from ForeignNext(), and BeginForeignModify() from ExecModifyTable(), that difference isn't visible to the FDW; both are after executor startup but before the first Iterate call. - Heikki
Thanks for the review. On Wed, Nov 11, 2020 at 5:55 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 10/11/2020 17:32, Heikki Linnakangas wrote: > > On 10/11/2020 13:12, Amit Langote wrote: > >> On second thought, it seems A would amount to merely a cosmetic > >> adjustment of the API, nothing more. B seems to get the job done for > >> me and also doesn't unnecessarily break compatibility, so I've updated > >> 0001 to implement B. Please give it a look. > > > > Looks good at a quick glance. It is a small API break that > > BeginDirectModify() is now called during execution, not at executor > > startup, but I don't think that's going to break FDWs in practice. One > > could argue, though, that if we're going to change the API, we should do > > it more loudly. So changing the arguments might be a good thing. > > > > The BeginDirectModify() and BeginForeignModify() interfaces are > > inconsistent, but that's not this patch's fault. I wonder if we could > > move the call to BeginForeignModify() also to ForeignNext(), though? And > > BeginForeignScan() too, while we're at it. > > With these patches, BeginForeignModify() and BeginDirectModify() are > both called during execution, before the first > IterateForeignScan/IterateDirectModify call. The documentation for > BeginForeignModify() needs to be updated, it still claims that it's run > at executor startup, but that's not true after these patches. So that > needs to be updated. Good point, I've updated the patch to note that. > I think that's a good thing, because it means that BeginForeignModify() > and BeginDirectModify() are called at the same stage, from the FDW's > point of view. Even though BeginDirectModify() is called from > ForeignNext(), and BeginForeignModify() from ExecModifyTable(), that > difference isn't visible to the FDW; both are after executor startup but > before the first Iterate call. Right. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
I'm still a bit confused and unhappy about the initialization of ResultRelInfos and the various fields in them. We've made progress in the previous patches, but it's still a bit messy. > /* > * If transition tuples will be captured, initialize a map to convert > * child tuples into the format of the table mentioned in the query > * (root relation), because the transition tuple store can only store > * tuples in the root table format. However for INSERT, the map is > * only initialized for a given partition when the partition itself is > * first initialized by ExecFindPartition. Also, this map is also > * needed if an UPDATE ends up having to move tuples across > * partitions, because in that case the child tuple to be moved first > * needs to be converted into the root table's format. In that case, > * we use GetChildToRootMap() to either create one from scratch if > * we didn't already create it here. > * > * Note: We cannot always initialize this map lazily, that is, use > * GetChildToRootMap(), because AfterTriggerSaveEvent(), which needs > * the map, doesn't have access to the "target" relation that is > * needed to create the map. > */ > if (mtstate->mt_transition_capture && operation != CMD_INSERT) > { > Relation relation = resultRelInfo->ri_RelationDesc; > Relation targetRel = mtstate->rootResultRelInfo->ri_RelationDesc; > > resultRelInfo->ri_ChildToRootMap = > convert_tuples_by_name(RelationGetDescr(relation), > RelationGetDescr(targetRel)); > /* First time creating the map for this result relation. */ > Assert(!resultRelInfo->ri_ChildToRootMapValid); > resultRelInfo->ri_ChildToRootMapValid = true; > } The comment explains that AfterTriggerSaveEvent() cannot use GetChildToRootMap(), because it doesn't have access to the root target relation. But there is a field for that in ResultRelInfo: ri_PartitionRoot. However, that's only set up when we do partition routing. How about we rename ri_PartitionRoot to e.g ri_RootTarget, and set it always, even for non-partition inheritance? We have that information available when we initialize the ResultRelInfo, so might as well. Some code currently checks ri_PartitionRoot, to determine if a tuple that's been inserted, has been routed. For example: > /* > * Also check the tuple against the partition constraint, if there is > * one; except that if we got here via tuple-routing, we don't need to > * if there's no BR trigger defined on the partition. > */ > if (resultRelationDesc->rd_rel->relispartition && > (resultRelInfo->ri_PartitionRoot == NULL || > (resultRelInfo->ri_TrigDesc && > resultRelInfo->ri_TrigDesc->trig_insert_before_row))) > ExecPartitionCheck(resultRelInfo, slot, estate, true); So if we set ri_PartitionRoot always, we would need some other way to determine if the tuple at hand has actually been routed or not. But wouldn't that be a good thing anyway? Isn't it possible that the same ResultRelInfo is sometimes used for routed tuples, and sometimes for tuples that have been inserted/updated "directly"? ExecLookupUpdateResultRelByOid() sets that field lazily, so I think it would be possible to get here with ri_PartitionRoot either set or not, depending on whether an earlier cross-partition update was routed to the table. The above check is just an optimization, to skip unnecessary ExecPartitionCheck() calls, but I think this snippet in ExecConstraints() needs to get this right: > /* > * If the tuple has been routed, it's been converted to the > * partition's rowtype, which might differ from the root > * table's. We must convert it back to the root table's > * rowtype so that val_desc shown error message matches the > * input tuple. > */ > if (resultRelInfo->ri_PartitionRoot) > { > AttrMap *map; > > rel = resultRelInfo->ri_PartitionRoot; > tupdesc = RelationGetDescr(rel); > /* a reverse map */ > map = build_attrmap_by_name_if_req(orig_tupdesc, > tupdesc); > > /* > * Partition-specific slot's tupdesc can't be changed, so > * allocate a new one. > */ > if (map != NULL) > slot = execute_attr_map_slot(map, slot, > MakeTupleTableSlot(tupdesc, &TTSOpsVirtual)); > } Is that an existing bug, or am I missing? - Heikki
On Wed, Nov 11, 2020 at 10:14 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > I'm still a bit confused and unhappy about the initialization of > ResultRelInfos and the various fields in them. We've made progress in > the previous patches, but it's still a bit messy. > > > /* > > * If transition tuples will be captured, initialize a map to convert > > * child tuples into the format of the table mentioned in the query > > * (root relation), because the transition tuple store can only store > > * tuples in the root table format. However for INSERT, the map is > > * only initialized for a given partition when the partition itself is > > * first initialized by ExecFindPartition. Also, this map is also > > * needed if an UPDATE ends up having to move tuples across > > * partitions, because in that case the child tuple to be moved first > > * needs to be converted into the root table's format. In that case, > > * we use GetChildToRootMap() to either create one from scratch if > > * we didn't already create it here. > > * > > * Note: We cannot always initialize this map lazily, that is, use > > * GetChildToRootMap(), because AfterTriggerSaveEvent(), which needs > > * the map, doesn't have access to the "target" relation that is > > * needed to create the map. > > */ > > if (mtstate->mt_transition_capture && operation != CMD_INSERT) > > { > > Relation relation = resultRelInfo->ri_RelationDesc; > > Relation targetRel = mtstate->rootResultRelInfo->ri_RelationDesc; > > > > resultRelInfo->ri_ChildToRootMap = > > convert_tuples_by_name(RelationGetDescr(relation), > > RelationGetDescr(targetRel)); > > /* First time creating the map for this result relation. */ > > Assert(!resultRelInfo->ri_ChildToRootMapValid); > > resultRelInfo->ri_ChildToRootMapValid = true; > > } > > The comment explains that AfterTriggerSaveEvent() cannot use > GetChildToRootMap(), because it doesn't have access to the root target > relation. But there is a field for that in ResultRelInfo: > ri_PartitionRoot. However, that's only set up when we do partition routing. > > How about we rename ri_PartitionRoot to e.g ri_RootTarget, and set it > always, even for non-partition inheritance? We have that information > available when we initialize the ResultRelInfo, so might as well. Yeah, I agree it's better to use ri_PartitionRoot more generally like you describe here. > Some code currently checks ri_PartitionRoot, to determine if a tuple > that's been inserted, has been routed. For example: > > > /* > > * Also check the tuple against the partition constraint, if there is > > * one; except that if we got here via tuple-routing, we don't need to > > * if there's no BR trigger defined on the partition. > > */ > > if (resultRelationDesc->rd_rel->relispartition && > > (resultRelInfo->ri_PartitionRoot == NULL || > > (resultRelInfo->ri_TrigDesc && > > resultRelInfo->ri_TrigDesc->trig_insert_before_row))) > > ExecPartitionCheck(resultRelInfo, slot, estate, true); > > So if we set ri_PartitionRoot always, we would need some other way to > determine if the tuple at hand has actually been routed or not. But > wouldn't that be a good thing anyway? Isn't it possible that the same > ResultRelInfo is sometimes used for routed tuples, and sometimes for > tuples that have been inserted/updated "directly"? > ExecLookupUpdateResultRelByOid() sets that field lazily, so I think it > would be possible to get here with ri_PartitionRoot either set or not, > depending on whether an earlier cross-partition update was routed to the > table. ri_RelationDesc != ri_PartitionRoot gives whether the result relation is the original target relation of the query or not, so checking that should be enough here. > The above check is just an optimization, to skip unnecessary > ExecPartitionCheck() calls, but I think this snippet in > ExecConstraints() needs to get this right: > > > /* > > * If the tuple has been routed, it's been converted to the > > * partition's rowtype, which might differ from the root > > * table's. We must convert it back to the root table's > > * rowtype so that val_desc shown error message matches the > > * input tuple. > > */ > > if (resultRelInfo->ri_PartitionRoot) > > { > > AttrMap *map; > > > > rel = resultRelInfo->ri_PartitionRoot; > > tupdesc = RelationGetDescr(rel); > > /* a reverse map */ > > map = build_attrmap_by_name_if_req(orig_tupdesc, > > tupdesc); > > > > /* > > * Partition-specific slot's tupdesc can't be changed, so > > * allocate a new one. > > */ > > if (map != NULL) > > slot = execute_attr_map_slot(map, slot, > > MakeTupleTableSlot(tupdesc,&TTSOpsVirtual)); > > } > > Is that an existing bug, or am I missing? What it's doing is converting a routed tuple in the partition's tuple format back into the original target relation's format before showing the tuple in the error message. Note that we do this reverse conversion only for tuple routing target relations, not all child result relations, so in that sense it's a bit inconsistent. Maybe we don't need to be too pedantic about showing the exact same tuple as the user inserted (that is, one matching the "root" table's column order), so it seems okay to just remove these reverse-conversion blocks that are repeated in a number of places that show an error message after failing a constraint check. Attached new 0002 which does these adjustments. I went with ri_RootTargetDesc to go along with ri_RelationDesc. Also, I have updated the original 0002 (now 0003) to make GetChildToRootMap() use ri_RootTargetDesc instead of ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even AfterTriggerSaveEvent() can now use that function. This allows us to avoid having to initialize ri_ChildToRootMap anywhere but inside GetChildRootMap(), with that long comment defending doing so. :-) -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
On Thu, Nov 12, 2020 at 5:04 PM Amit Langote <amitlangote09@gmail.com> wrote: > Attached new 0002 which does these adjustments. I went with > ri_RootTargetDesc to go along with ri_RelationDesc. > > Also, I have updated the original 0002 (now 0003) to make > GetChildToRootMap() use ri_RootTargetDesc instead of > ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even > AfterTriggerSaveEvent() can now use that function. This allows us to > avoid having to initialize ri_ChildToRootMap anywhere but inside > GetChildRootMap(), with that long comment defending doing so. :-) These needed to be rebased due to recent copy.c upheavals. Attached. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
On Mon, Dec 7, 2020 at 3:53 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Thu, Nov 12, 2020 at 5:04 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Attached new 0002 which does these adjustments. I went with > > ri_RootTargetDesc to go along with ri_RelationDesc. > > > > Also, I have updated the original 0002 (now 0003) to make > > GetChildToRootMap() use ri_RootTargetDesc instead of > > ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even > > AfterTriggerSaveEvent() can now use that function. This allows us to > > avoid having to initialize ri_ChildToRootMap anywhere but inside > > GetChildRootMap(), with that long comment defending doing so. :-) > > These needed to be rebased due to recent copy.c upheavals. Attached. Needed to be rebased again. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
On Tue, Dec 22, 2020 at 5:16 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Mon, Dec 7, 2020 at 3:53 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Thu, Nov 12, 2020 at 5:04 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > Attached new 0002 which does these adjustments. I went with > > > ri_RootTargetDesc to go along with ri_RelationDesc. > > > > > > Also, I have updated the original 0002 (now 0003) to make > > > GetChildToRootMap() use ri_RootTargetDesc instead of > > > ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even > > > AfterTriggerSaveEvent() can now use that function. This allows us to > > > avoid having to initialize ri_ChildToRootMap anywhere but inside > > > GetChildRootMap(), with that long comment defending doing so. :-) > > > > These needed to be rebased due to recent copy.c upheavals. Attached. > > Needed to be rebased again. And again, this time over the recent batch insert API related patches. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
On Mon, Jan 25, 2021 at 2:23 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Tue, Dec 22, 2020 at 5:16 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Mon, Dec 7, 2020 at 3:53 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > On Thu, Nov 12, 2020 at 5:04 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > > Attached new 0002 which does these adjustments. I went with > > > > ri_RootTargetDesc to go along with ri_RelationDesc. > > > > > > > > Also, I have updated the original 0002 (now 0003) to make > > > > GetChildToRootMap() use ri_RootTargetDesc instead of > > > > ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even > > > > AfterTriggerSaveEvent() can now use that function. This allows us to > > > > avoid having to initialize ri_ChildToRootMap anywhere but inside > > > > GetChildRootMap(), with that long comment defending doing so. :-) > > > > > > These needed to be rebased due to recent copy.c upheavals. Attached. > > > > Needed to be rebased again. > > And again, this time over the recent batch insert API related patches. Another rebase. I've dropped what was patch 0001 in the previous set, because I think it has been rendered unnecessary due to recently committed changes. However, the rebase led to a couple of additional regression test output changes that I think are harmless. The changes are caused by the fact that ri_RootResultRelInfo now gets initialized in *all* child result relations, not just those that participate in tuple routing. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
Amit Langote <amitlangote09@gmail.com> writes: > [ v14-0002-Initialize-result-relation-information-lazily.patch ] Needs YA rebase over 86dc90056. regards, tom lane
On Thu, Apr 1, 2021 at 3:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Langote <amitlangote09@gmail.com> writes: > > [ v14-0002-Initialize-result-relation-information-lazily.patch ] > Needs YA rebase over 86dc90056. Done. I will post the updated results for -Mprepared benchmarks I did in the other thread shortly. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
On Thu, Apr 1, 2021 at 10:12 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Thu, Apr 1, 2021 at 3:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Langote <amitlangote09@gmail.com> writes: > > > [ v14-0002-Initialize-result-relation-information-lazily.patch ] > > Needs YA rebase over 86dc90056. > > Done. I will post the updated results for -Mprepared benchmarks I did > in the other thread shortly. Test details: pgbench -n -T60 -Mprepared -f nojoin.sql nojoin.sql: \set a random(1, 1000000) update test_table t set b = :a where a = :a; * test_table has 40 columns and partitions as shown below * plan_cache_mode = force_generic_plan Results: nparts master patched 64 6262 17118 128 3449 12082 256 1722 7643 1024 359 2099 * tps figures shown are the median of 3 runs. So, drastic speedup can be seen by even just not creating ResultRelInfos for child relations that are not updated, as the patch does. I haven't yet included any changes for AcquireExecutorLocks() and ExecCheckRTPerms() bottlenecks that still remain and cause the drop in tps as partition count increases. -- Amit Langote EDB: http://www.enterprisedb.com
Amit Langote <amitlangote09@gmail.com> writes: > On Thu, Apr 1, 2021 at 3:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Amit Langote <amitlangote09@gmail.com> writes: > [ v14-0002-Initialize-result-relation-information-lazily.patch ] >> Needs YA rebase over 86dc90056. > Done. I spent some time looking this over. There are bits of it we can adopt without too much trouble, but I'm afraid that 0001 (delay FDW BeginDirectModify until the first actual update) is a nonstarter, which makes the main idea of delaying ExecInitResultRelation unworkable. My fear about 0001 is that it will destroy any hope of direct updates on different remote partitions executing with consistent semantics (i.e. compatible snapshots), because some row updates triggered by the local query may have already happened before a given partition gets to start its remote query. Maybe we can work around that, but I do not want to commit a major restructuring that assumes we can dodge this problem when we don't yet even have a fix for cross-partition updates that does rely on the assumption of synchronous startup. In some desultory performance testing here, it seemed like a significant part of the cost is ExecOpenIndices, and I don't see a reason offhand why we could not delay/skip that. I also concur with delaying construction of ri_ChildToRootMap and the partition_tuple_routing data structures, since many queries will never need those at all. > * PartitionTupleRouting.subplan_resultrel_htab is removed in favor > of using ModifyTableState.mt_resultOidHash to look up an UPDATE > result relation by OID. Hmm, that sounds promising too, though I didn't look at the details. Anyway, I think the way to proceed for now is to grab the low-hanging fruit of things that clearly won't change any semantics. But tail end of the dev cycle is no time to be making really fundamental changes in how FDW direct modify works. regards, tom lane
On Sun, Apr 4, 2021 at 10:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Langote <amitlangote09@gmail.com> writes: > > On Thu, Apr 1, 2021 at 3:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Amit Langote <amitlangote09@gmail.com> writes: > > [ v14-0002-Initialize-result-relation-information-lazily.patch ] > >> Needs YA rebase over 86dc90056. > > > Done. > > I spent some time looking this over. Thanks. > There are bits of it we can > adopt without too much trouble, but I'm afraid that 0001 (delay > FDW BeginDirectModify until the first actual update) is a nonstarter, > which makes the main idea of delaying ExecInitResultRelation unworkable. > > My fear about 0001 is that it will destroy any hope of direct updates > on different remote partitions executing with consistent semantics > (i.e. compatible snapshots), because some row updates triggered by the > local query may have already happened before a given partition gets to > start its remote query. Maybe we can work around that, but I do not > want to commit a major restructuring that assumes we can dodge this > problem when we don't yet even have a fix for cross-partition updates > that does rely on the assumption of synchronous startup. Hmm, okay, I can understand the concern. > In some desultory performance testing here, it seemed like a > significant part of the cost is ExecOpenIndices, and I don't see > a reason offhand why we could not delay/skip that. I also concur > with delaying construction of ri_ChildToRootMap and the > partition_tuple_routing data structures, since many queries will > never need those at all. As I mentioned in [1], creating ri_projectNew can be expensive too, especially as column count (and partition count for the generic plan case) grows. I think we should have an static inline initialize-on-first-access accessor function for that field too. Actually, I remember considering having such accessor functions (all static inline) for ri_WithCheckOptionExprs, ri_projectReturning, ri_onConflictArbiterIndexes, and ri_onConflict (needed by ON CONFLICT UPDATE) as well, prompted by Heikki's comments earlier in the discussion. I also remember, before even writing this patch, not liking that WCO and RETURNING expressions are initialized in their own separate loops, rather than being part of the earlier loop that says: /* * Do additional per-result-relation initialization. */ for (i = 0; i < nrels; i++) { I guess ri_RowIdAttNo initialization can go into the same loop. > > * PartitionTupleRouting.subplan_resultrel_htab is removed in favor > > of using ModifyTableState.mt_resultOidHash to look up an UPDATE > > result relation by OID. > > Hmm, that sounds promising too, though I didn't look at the details. > > Anyway, I think the way to proceed for now is to grab the low-hanging > fruit of things that clearly won't change any semantics. But tail end > of the dev cycle is no time to be making really fundamental changes > in how FDW direct modify works. I agree. -- Amit Langote EDB: http://www.enterprisedb.com [1] https://www.postgresql.org/message-id/CA+HiwqHLUNhMxy46Mrb04VJpN=HUdm9bD7xdZ6f5h2o4imX79g@mail.gmail.com
Amit Langote <amitlangote09@gmail.com> writes: > On Sun, Apr 4, 2021 at 10:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> In some desultory performance testing here, it seemed like a >> significant part of the cost is ExecOpenIndices, and I don't see >> a reason offhand why we could not delay/skip that. I also concur >> with delaying construction of ri_ChildToRootMap and the >> partition_tuple_routing data structures, since many queries will >> never need those at all. > As I mentioned in [1], creating ri_projectNew can be expensive too, > especially as column count (and partition count for the generic plan > case) grows. I think we should have an static inline > initialize-on-first-access accessor function for that field too. > Actually, I remember considering having such accessor functions (all > static inline) for ri_WithCheckOptionExprs, ri_projectReturning, > ri_onConflictArbiterIndexes, and ri_onConflict (needed by ON CONFLICT > UPDATE) as well, prompted by Heikki's comments earlier in the > discussion. I also remember, before even writing this patch, not > liking that WCO and RETURNING expressions are initialized in their own > separate loops, rather than being part of the earlier loop that says: Sure, we might as well try to improve the cosmetics here. >> Anyway, I think the way to proceed for now is to grab the low-hanging >> fruit of things that clearly won't change any semantics. But tail end >> of the dev cycle is no time to be making really fundamental changes >> in how FDW direct modify works. > I agree. OK. Do you want to pull out the bits of the patch that we can still do without postponing BeginDirectModify? Another thing we could consider, perhaps, is keeping the behavior the same for foreign tables but postponing init of local ones. To avoid opening the relations to figure out which kind they are, we'd have to rely on the RTE copies of relkind, which is a bit worrisome --- I'm not certain that those are guaranteed to be up-to-date --- but it's probably okay since there is no way to convert a regular table to foreign or vice versa. Anyway, that idea seems fairly messy so I'm inclined to just pursue the lower-hanging fruit for now. regards, tom lane
On Mon, Apr 5, 2021 at 1:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Langote <amitlangote09@gmail.com> writes: > > On Sun, Apr 4, 2021 at 10:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> In some desultory performance testing here, it seemed like a > >> significant part of the cost is ExecOpenIndices, and I don't see > >> a reason offhand why we could not delay/skip that. I also concur > >> with delaying construction of ri_ChildToRootMap and the > >> partition_tuple_routing data structures, since many queries will > >> never need those at all. > > > As I mentioned in [1], creating ri_projectNew can be expensive too, > > especially as column count (and partition count for the generic plan > > case) grows. I think we should have an static inline > > initialize-on-first-access accessor function for that field too. > > > Actually, I remember considering having such accessor functions (all > > static inline) for ri_WithCheckOptionExprs, ri_projectReturning, > > ri_onConflictArbiterIndexes, and ri_onConflict (needed by ON CONFLICT > > UPDATE) as well, prompted by Heikki's comments earlier in the > > discussion. I also remember, before even writing this patch, not > > liking that WCO and RETURNING expressions are initialized in their own > > separate loops, rather than being part of the earlier loop that says: > > Sure, we might as well try to improve the cosmetics here. > > >> Anyway, I think the way to proceed for now is to grab the low-hanging > >> fruit of things that clearly won't change any semantics. But tail end > >> of the dev cycle is no time to be making really fundamental changes > >> in how FDW direct modify works. > > > I agree. > > OK. Do you want to pull out the bits of the patch that we can still > do without postponing BeginDirectModify? I ended up with the attached, whereby ExecInitResultRelation() is now performed for all relations before calling ExecInitNode() on the subplan. As mentioned, I moved other per-result-rel initializations into the same loop that does ExecInitResultRelation(), while moving code related to some initializations into initialize-on-first-access accessor functions for the concerned fields. I chose to do that for ri_WIthCheckOptionExprs, ri_projectReturning, and ri_projectNew. ExecInitNode() is called on the subplan (to set outerPlanState(mtstate) that is) after all of the per-result-rel initializations are done. One of the initializations is calling BeginForeignModify() for non-direct modifications, an API to which we currently pass mtstate. Moving that to before setting outerPlanState(mtstate) so as to be in the same loop as other initializations had me worried just a little bit given a modification I had to perform in postgresBeginForeignModify(): @@ -1879,7 +1879,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate, rte, resultRelInfo, mtstate->operation, - outerPlanState(mtstate)->plan, + outerPlan(mtstate->ps.plan), query, target_attrs, values_end_len, Though I think that this is harmless, because I'd think that the implementers of this API shouldn't really rely too strongly on assuming that outerPlanState(mtstate) is valid when it is called, if postgres_fdw's implementation is any indication. Another slightly ugly bit is the dependence of direct modify API on ri_projectReturning being set even if it doesn't care for anything else in the ResultRelInfo. So in ExecInitModifyTable() ri_projectReturning initialization is not skipped for directly-modified foreign result relations. Notes on regression test changes: * Initializing WCO quals during execution instead of during ExecInitNode() of ModifyTable() causes a couple of regression test changes in updatable_view.out that were a bit unexpected for me -- Subplans that are referenced in WCO quals are no longer shown in the plain EXPLAIN output. Even though that's a user-visible change, maybe we can live with that? * ri_RootResultRelInfo in *all* child relations instead of just in tuple-routing result relations has caused changes to inherit.out and privileges.out. I think that's basically down to ExecConstraints() et al doing one thing for child relations in which ri_RootResultRelInfo is set and another for those in which it is not. Now it's set in *all* child relations, so it always does the former thing. I remember having checked that those changes are only cosmetic when I first encountered them. * Moving PartitionTupleRouting initialization to be done lazily for cross-partition update cases causes changes to update.out. They have to do with the fact that the violations of the actual target table's partition constraint are now shown as such, instead of reporting them as occurring on one of the leaf partitions. Again, only cosmetic. > Another thing we could consider, perhaps, is keeping the behavior > the same for foreign tables but postponing init of local ones. > To avoid opening the relations to figure out which kind they are, > we'd have to rely on the RTE copies of relkind, which is a bit > worrisome --- I'm not certain that those are guaranteed to be > up-to-date --- but it's probably okay since there is no way to > convert a regular table to foreign or vice versa. Anyway, that > idea seems fairly messy so I'm inclined to just pursue the > lower-hanging fruit for now. It would be nice to try that idea out, but I tend to agree with the last part. Also, I'm fairly happy with the kind of performance improvement I see even with the lower-hanging fruit patch for my earlier earlier shared benchmark that tests the performance of generic plan execution: HEAD (especially with 86dc90056 now in): nparts 10cols 20cols 40cols 64 6926 6394 6253 128 3758 3501 3482 256 1938 1822 1776 1024 406 371 406 Patched: 64 13147 12554 14787 128 7850 9788 9631 256 4472 5599 5638 1024 1218 1503 1309 I also tried with a version where the new tuple projections are built in ExecInitModifyTable() as opposed to lazily: 64 10937 9969 8535 128 6586 5903 4887 256 3613 3118 2654 1024 884 749 652 This tells us that delaying initializing new tuple projection for updates can have a sizable speedup and better scalability. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
Amit Langote <amitlangote09@gmail.com> writes: > On Mon, Apr 5, 2021 at 1:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> OK. Do you want to pull out the bits of the patch that we can still >> do without postponing BeginDirectModify? > I ended up with the attached, whereby ExecInitResultRelation() is now > performed for all relations before calling ExecInitNode() on the > subplan. As mentioned, I moved other per-result-rel initializations > into the same loop that does ExecInitResultRelation(), while moving > code related to some initializations into initialize-on-first-access > accessor functions for the concerned fields. I chose to do that for > ri_WIthCheckOptionExprs, ri_projectReturning, and ri_projectNew. I pushed the parts of this that I thought were safe and productive. The business about moving the subplan tree initialization to after calling FDWs' BeginForeignModify functions seems to me to be a nonstarter. Existing FDWs are going to expect their scan initializations to have been done first. I'm surprised that postgres_fdw seemed to need only a one-line fix; it could have been far worse. The amount of trouble that could cause is absolutely not worth it to remove one loop over the result relations. I also could not get excited about postponing initialization of RETURNING or WITH CHECK OPTIONS expressions. I grant that that can be helpful when those features are used, but I doubt that RETURNING is used that heavily, and WITH CHECK OPTIONS is surely next door to nonexistent in performance-critical queries. If the feature isn't used, the cost of the existing code is about zero. So I couldn't see that it was worth the amount of code thrashing and risk of new bugs involved. The bit you noted about EXPLAIN missing a subplan is pretty scary in this connection; I'm not at all sure that that's just cosmetic. (Having said that, I'm wondering if there are bugs in these cases for cross-partition updates that target a previously-not-used partition. So we might have things to fix anyway.) Anyway, looking at the test case you posted at the very top of this thread, I was getting this with HEAD on Friday: nparts TPS 0 12152 10 8672 100 2753 1000 314 and after the two patches I just pushed, it looks like: 0 12105 10 9928 100 5433 1000 938 So while there's certainly work left to do, that's not bad for some low-hanging-fruit grabbing. regards, tom lane
Hi, On 2021-04-06 19:24:11 -0400, Tom Lane wrote: > I also could not get excited about postponing initialization of RETURNING > or WITH CHECK OPTIONS expressions. I grant that that can be helpful > when those features are used, but I doubt that RETURNING is used that > heavily, and WITH CHECK OPTIONS is surely next door to nonexistent > in performance-critical queries. FWIW, there's a number of ORMs etc that use it on every insert (there's not really a better way to get the serial when you also want to do pipelining). > nparts TPS > 0 12152 > 10 8672 > 100 2753 > 1000 314 > > and after the two patches I just pushed, it looks like: > > 0 12105 > 10 9928 > 100 5433 > 1000 938 > > So while there's certainly work left to do, that's not bad for > some low-hanging-fruit grabbing. Nice. 3x at the upper end is pretty good. Greetings, Andres Freund
On Wed, Apr 7, 2021 at 8:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Langote <amitlangote09@gmail.com> writes: > > On Mon, Apr 5, 2021 at 1:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> OK. Do you want to pull out the bits of the patch that we can still > >> do without postponing BeginDirectModify? > > > I ended up with the attached, whereby ExecInitResultRelation() is now > > performed for all relations before calling ExecInitNode() on the > > subplan. As mentioned, I moved other per-result-rel initializations > > into the same loop that does ExecInitResultRelation(), while moving > > code related to some initializations into initialize-on-first-access > > accessor functions for the concerned fields. I chose to do that for > > ri_WIthCheckOptionExprs, ri_projectReturning, and ri_projectNew. > > I pushed the parts of this that I thought were safe and productive. Thank you. +/* + * ExecInitInsertProjection + * Do one-time initialization of projection data for INSERT tuples. + * + * INSERT queries may need a projection to filter out junk attrs in the tlist. + * + * This is "one-time" for any given result rel, but we might touch + * more than one result rel in the course of a partitioned INSERT. I don't think we need this last bit for INSERT, because the result rels for leaf partitions will never have to go through ExecInitInsertProjection(). Leaf partitions are never directly fed tuples that ExecModifyTable() extracts out of the subplan, because those tuples will have gone through the root target table's projection before being passed to tuple routing. So, if INSERTs will ever need a projection, only the partitioned table being inserted into will need to have one built for. Also, I think we should update the commentary around ri_projectNew a bit to make it clear that noplace beside ExecGet{Insert|Update}Tuple should be touching it and the associated slots. + * This is "one-time" for any given result rel, but we might touch more than + * one result rel in the course of a partitioned UPDATE, and each one needs + * its own projection due to possible column order variation. Minor quibble, but should we write it as "...in the course of an inherited UPDATE"? Attached patch contains these changes. > The business about moving the subplan tree initialization to after > calling FDWs' BeginForeignModify functions seems to me to be a > nonstarter. Existing FDWs are going to expect their scan initializations > to have been done first. I'm surprised that postgres_fdw seemed to > need only a one-line fix; it could have been far worse. The amount of > trouble that could cause is absolutely not worth it to remove one loop > over the result relations. Okay, that sounds fair. After all, we write this about 'mtstate' in the description of BeginForeignModify(), which I had failed to notice: "mtstate is the overall state of the ModifyTable plan node being executed; global data about the plan and execution state is available via this structure." > I also could not get excited about postponing initialization of RETURNING > or WITH CHECK OPTIONS expressions. I grant that that can be helpful > when those features are used, but I doubt that RETURNING is used that > heavily, and WITH CHECK OPTIONS is surely next door to nonexistent > in performance-critical queries. If the feature isn't used, the cost > of the existing code is about zero. So I couldn't see that it was worth > the amount of code thrashing and risk of new bugs involved. Okay. > The bit you > noted about EXPLAIN missing a subplan is pretty scary in this connection; > I'm not at all sure that that's just cosmetic. Yeah, this and... > (Having said that, I'm wondering if there are bugs in these cases for > cross-partition updates that target a previously-not-used partition. > So we might have things to fix anyway.) ...this would need to be looked at a bit more closely, which I'll try to do sometime later this week. > Anyway, looking at the test case you posted at the very top of this > thread, I was getting this with HEAD on Friday: > > nparts TPS > 0 12152 > 10 8672 > 100 2753 > 1000 314 > > and after the two patches I just pushed, it looks like: > > 0 12105 > 10 9928 > 100 5433 > 1000 938 > > So while there's certainly work left to do, that's not bad for > some low-hanging-fruit grabbing. Yes, certainly. I reran my usual benchmark and got the following numbers, this time comparing v13.2 against the latest HEAD: nparts 10cols 20cols 40cols v13.2 64 3231 2747 2217 128 1528 1269 1121 256 709 652 491 1024 96 78 67 v14dev HEAD 64 14835 14360 14563 128 9469 9601 9490 256 5523 5383 5268 1024 1482 1415 1366 Clearly, we've made some very good progress here. Thanks. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
Amit Langote <amitlangote09@gmail.com> writes: > Also, I think we should update the commentary around ri_projectNew a > bit to make it clear that noplace beside ExecGet{Insert|Update}Tuple > should be touching it and the associated slots. Hm. I pushed your comment fixes in nodeModifyTable.c, but not this change, because it seemed to be more verbose and not really an improvement. Why are these fields any more hands-off than any others? Besides which, there certainly is other code touching ri_oldTupleSlot. Anyway, I've marked the CF entry closed, because I think this is about as far as we can get for v14. I'm not averse to revisiting the RETURNING and WITH CHECK OPTIONS issues later, but it looks to me like that needs more study. > I reran my usual benchmark and got the following numbers, this time > comparing v13.2 against the latest HEAD: > nparts 10cols 20cols 40cols > v13.2 > 64 3231 2747 2217 > 128 1528 1269 1121 > 256 709 652 491 > 1024 96 78 67 > v14dev HEAD > 64 14835 14360 14563 > 128 9469 9601 9490 > 256 5523 5383 5268 > 1024 1482 1415 1366 > Clearly, we've made some very good progress here. Thanks. Indeed, that's a pretty impressive comparison. regards, tom lane
On Wed, Apr 7, 2021 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > v13.2 > > 64 3231 2747 2217 > > 128 1528 1269 1121 > > 256 709 652 491 > > 1024 96 78 67 > > > v14dev HEAD > > 64 14835 14360 14563 > > 128 9469 9601 9490 > > 256 5523 5383 5268 > > 1024 1482 1415 1366 > > > Clearly, we've made some very good progress here. Thanks. > > Indeed, that's a pretty impressive comparison. +1. That looks like a big improvement. In a vacuum, you'd hope that partitioning a table would make things faster rather than slower, when only one partition is implicated. Or at least that the speed would stay about the same. And, while this is a lot better, we're clearly not there yet. So I hope that, in future releases, we can continue to find ways to whittle down the overhead. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Apr 7, 2021 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Indeed, that's a pretty impressive comparison. > +1. That looks like a big improvement. > In a vacuum, you'd hope that partitioning a table would make things > faster rather than slower, when only one partition is implicated. Or > at least that the speed would stay about the same. And, while this is > a lot better, we're clearly not there yet. So I hope that, in future > releases, we can continue to find ways to whittle down the overhead. Note that this test case includes plan_cache_mode = force_generic_plan, so it's deliberately kneecapping our ability to tell that "only one partition is implicated". I think things would often be better in production cases. No argument that there's not work left to do, though. regards, tom lane
On Thu, Apr 8, 2021 at 1:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Langote <amitlangote09@gmail.com> writes: > > Also, I think we should update the commentary around ri_projectNew a > > bit to make it clear that noplace beside ExecGet{Insert|Update}Tuple > > should be touching it and the associated slots. > > Hm. I pushed your comment fixes in nodeModifyTable.c, but not this > change, because it seemed to be more verbose and not really an > improvement. Why are these fields any more hands-off than any others? > Besides which, there certainly is other code touching ri_oldTupleSlot. Oops, that's right. > Anyway, I've marked the CF entry closed, because I think this is about > as far as we can get for v14. I'm not averse to revisiting the > RETURNING and WITH CHECK OPTIONS issues later, but it looks to me like > that needs more study. Sure, I will look into that. -- Amit Langote EDB: http://www.enterprisedb.com
On Thu, Apr 8, 2021 at 3:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Wed, Apr 7, 2021 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Indeed, that's a pretty impressive comparison. > > > +1. That looks like a big improvement. > > > In a vacuum, you'd hope that partitioning a table would make things > > faster rather than slower, when only one partition is implicated. Or > > at least that the speed would stay about the same. And, while this is > > a lot better, we're clearly not there yet. So I hope that, in future > > releases, we can continue to find ways to whittle down the overhead. > > Note that this test case includes plan_cache_mode = force_generic_plan, > so it's deliberately kneecapping our ability to tell that "only one > partition is implicated". For the record, here are the numbers for plan_cache_mode = auto. (Actually, plancache.c always goes with custom planning for partitioned tables.) v13.2 nparts 10cols 20cols 40cols 64 13391 12140 10958 128 13436 12297 10643 256 12564 12294 10355 1024 11450 10600 9020 v14dev HEAD 64 14925 14648 13361 128 14379 14333 13138 256 14478 14246 13316 1024 12744 12621 11579 There's 10-20% improvement in this case too for various partition counts, which really has more to do with 86dc90056 than the work done here. -- Amit Langote EDB: http://www.enterprisedb.com
On Thu, 8 Apr 2021 at 15:32, Amit Langote <amitlangote09@gmail.com> wrote: > There's 10-20% improvement in this case too for various partition > counts, which really has more to do with 86dc90056 than the work done > here. I'm not sure of the exact query you're running, but I imagine the reason that it wasn't that slow with custom plans was down to 428b260f87. So the remaining slowness for the generic plan case with large numbers of partitions in the plan vs custom plans plan-time pruning is a) locking run-time pruned partitions; and; b) permission checks during executor startup? Aside from the WCO and RETURNING stuff you mentioned, I mean. David
On Thu, Apr 8, 2021 at 1:54 PM David Rowley <dgrowleyml@gmail.com> wrote: > On Thu, 8 Apr 2021 at 15:32, Amit Langote <amitlangote09@gmail.com> wrote: > > There's 10-20% improvement in this case too for various partition > > counts, which really has more to do with 86dc90056 than the work done > > here. > > I'm not sure of the exact query you're running, The query is basically this: \set a random(1, 1000000) update test_table set b = :a where a = :a; > but I imagine the > reason that it wasn't that slow with custom plans was down to > 428b260f87. Right, 428b260f87 is certainly why we are seeing numbers this big at all. However, I was saying that 86dc90056 is what makes v14 HEAD run about 10-20% faster than *v13.2* in this benchmark. Note that inheritance_planner() in v13, which, although not as bad as it used to be in v11, is still more expensive than a single grouping_planner() call for a given query that we now get thanks to 86dc90056. > So the remaining slowness for the generic plan case with large numbers > of partitions in the plan vs custom plans plan-time pruning is a) > locking run-time pruned partitions; and; b) permission checks during > executor startup? Actually, we didn't move ahead with making the ResulRelInfos themselves lazily as I had proposed in the original patch, so ExecInitModifyTable() still builds ResultRelInfos for all partitions. Although we did move initializations of some fields of it out of ExecInitModifyTable() --- commits a1115fa0, c5b7ba4e, saving a decent amount of time spent there. We need to study closely whether initializing foreign partition's updates (direct or otherwise) lazily doesn't produce wrong semantics before we can do that and we need the ResultRelInfos to pass to those APIs. -- Amit Langote EDB: http://www.enterprisedb.com
On Wed, Apr 7, 2021 at 5:18 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Apr 7, 2021 at 8:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I also could not get excited about postponing initialization of RETURNING > > or WITH CHECK OPTIONS expressions. I grant that that can be helpful > > when those features are used, but I doubt that RETURNING is used that > > heavily, and WITH CHECK OPTIONS is surely next door to nonexistent > > in performance-critical queries. If the feature isn't used, the cost > > of the existing code is about zero. So I couldn't see that it was worth > > the amount of code thrashing and risk of new bugs involved. > > Okay. > > > The bit you > > noted about EXPLAIN missing a subplan is pretty scary in this connection; > > I'm not at all sure that that's just cosmetic. > > Yeah, this and... I looked into this and can't see why this isn't just cosmetic as far as ModifyTable is concerned. "EXPLAIN missing a subplan" here just means that ModifyTableState.PlanState.subPlan is not set. Besides ExplainNode(), only ExecReScan() looks at PlanState.subPlan, and that does not seem relevant to ModifyTable, because it doesn't support rescanning. I don't see any such problems with creating RETURNING projections on-demand either. > > (Having said that, I'm wondering if there are bugs in these cases for > > cross-partition updates that target a previously-not-used partition. > > So we might have things to fix anyway.) > > ...this would need to be looked at a bit more closely, which I'll try > to do sometime later this week. Given the above, I can't think of any undiscovered problems related to WCO and RETURNING expression states in the cases where cross-partition updates target partitions that need to be initialized by ExecInitPartitionInfo(). Here is the result for the test case in updatable_views.sql modified to use partitioning and cross-partition updates: CREATE TABLE base_tbl (a int) partition by range (a); CREATE TABLE base_tbl1 PARTITION OF base_tbl FOR VALUES FROM (1) TO (6); CREATE TABLE base_tbl2 PARTITION OF base_tbl FOR VALUES FROM (6) TO (11); CREATE TABLE base_tbl3 PARTITION OF base_tbl FOR VALUES FROM (11) TO (15); CREATE TABLE ref_tbl (a int PRIMARY KEY); INSERT INTO ref_tbl SELECT * FROM generate_series(1,10); CREATE VIEW rw_view1 AS SELECT * FROM base_tbl b WHERE EXISTS(SELECT 1 FROM ref_tbl r WHERE r.a = b.a) WITH CHECK OPTION; INSERT INTO rw_view1 VALUES (1); INSERT 0 1 INSERT INTO rw_view1 VALUES (11); ERROR: new row violates check option for view "rw_view1" DETAIL: Failing row contains (11). -- Both are cross-partition updates where the target relation is -- lazily initialized in ExecInitPartitionInfo(), along with the WCO -- qual ExprState UPDATE rw_view1 SET a = a + 5 WHERE a = 1; UPDATE 1 UPDATE rw_view1 SET a = a + 5 WHERE a = 6; ERROR: new row violates check option for view "rw_view1" DETAIL: Failing row contains (11). EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (5); QUERY PLAN ---------------------- Insert on base_tbl b -> Result (2 rows) EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5 WHERE a = 1; QUERY PLAN -------------------------------------------------------- Update on base_tbl b Update on base_tbl1 b_1 -> Nested Loop -> Index Scan using ref_tbl_pkey on ref_tbl r Index Cond: (a = 1) -> Seq Scan on base_tbl1 b_1 Filter: (a = 1) (7 rows) EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5 WHERE a = 6; QUERY PLAN -------------------------------------------------------- Update on base_tbl b Update on base_tbl2 b_1 -> Nested Loop -> Index Scan using ref_tbl_pkey on ref_tbl r Index Cond: (a = 6) -> Seq Scan on base_tbl2 b_1 Filter: (a = 6) (7 rows) Patch attached. I tested the performance benefit of doing this by modifying the update query used in earlier benchmarks to have a RETURNING * clause, getting the following TPS numbers: -Mprepared (plan_cache_mode=force_generic_plan) nparts 10cols 20cols 40cols HEAD 64 10909 9067 7171 128 6903 5624 4161 256 3748 3056 2219 1024 953 738 427 Patched 64 13817 13395 12754 128 9271 9102 8279 256 5345 5207 5083 1024 1463 1443 1389 Also, I don't see much impact of checking if (node->returningLists) in the per-result-rel initialization loop in the common cases where there's no RETURNING. -- Amit Langote EDB: http://www.enterprisedb.com