Re: generic plans and "initial" pruning
От | Amit Langote |
---|---|
Тема | Re: generic plans and "initial" pruning |
Дата | |
Msg-id | CA+HiwqFOsJ21nmvoKtPkzxKfYzACajj-mAOoJ+-y_O6g+-v-aA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: generic plans and "initial" pruning (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: generic plans and "initial" pruning
Re: generic plans and "initial" pruning |
Список | pgsql-hackers |
On Thu, Jan 13, 2022 at 3:20 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Jan 12, 2022 at 9:32 AM Amit Langote <amitlangote09@gmail.com> wrote: > > Or, maybe this won't be a concern if performing ExecutorStart() is > > made a part of CheckCachedPlan() somehow, which would then take locks > > on the relation as the PlanState tree is built capturing any plan > > invalidations, instead of AcquireExecutorLocks(). That does sound like > > an ambitious undertaking though. > > On the surface that would seem to involve abstraction violations, but > maybe that could be finessed somehow. The plancache shouldn't know too > much about what the executor is going to do with the plan, but it > could ask the executor to perform a step that has been designed for > use by the plancache. I guess the core problem here is how to pass > around information that is node-specific before we've stood up the > executor state tree. Maybe the executor could have a function that > does the pruning and returns some kind of array of results that can be > used both to decide what to lock and also what to consider as pruned > at the start of execution. (I'm hand-waving about the details because > I don't know.) The attached patch implements this idea. Sorry for the delay in getting this out and thanks to Robert for the off-list discussions on this. So the new executor "step" you mention is the function ExecutorPrep in the patch, which calls a recursive function ExecPrepNode on the plan tree's top node, much as ExecutorStart calls (via InitPlan) ExecInitNode to construct a PlanState tree for actual execution paralleling the plan tree. For now, ExecutorPrep() / ExecPrepNode() does mainly two things if and as it walks the plan tree: 1) Extract the RT indexes of RTE_RELATION entries and add them to a bitmapset in the result struct, 2) If the node contains a PartitionPruneInfo, perform its "initial pruning steps" and store the result of doing so in a per-plan-node node called PlanPrepOutput. The bitmapset and the array containing per-plan-node PlanPrepOutput nodes are returned in a node called ExecPrepOutput, which is the result of ExecutorPrep, to its calling module (say, plancache.c), which, after it's done using that information, must pass it forward to subsequent execution steps. That is done by passing it, via the module's callers, to CreateQueryDesc() which remembers the ExecPrepOutput in QueryDesc that is eventually passed to ExecutorStart(). A bunch of other details are mentioned in the patch's commit message, which I'm pasting below for anyone reading to spot any obvious flaws (no-go's) of this approach: Invent a new executor "prep" phase The new phase, implemented by execMain.c:ExecutorPrep() and its recursive underling execProcnode.c:ExecPrepNode(), takes a query's PlannedStmt and processes the plan tree contained in it to produce a ExecPrepOutput node as result. As the plan tree is walked, each node must add the RT index(es) of any relation(s) that it directly manipulates to a bitmapset member of ExecPrepOutput (for example, an IndexScan node must add the Scan's scanrelid). Also, each node may want to make a PlanPrepOutput node containing additional information that may be of interest to the calling module or to the later execution phases, if the node can provide one (for example, an Append node may perform initial pruning and add a set of "initially valid subplans" to the PlanPrepOutput). The PlanPrepOutput nodess of all the plan nodes are added to an array in the ExecPrepOutput, which is indexed using the individual nodes' plan_node_id; a NULL is stored in the array slots of nodes that don't have anything interesting to add to the PlanPrepOutput. The ExecPrepOutput thus produced is passed to CreateQueryDesc() and subsequently to ExecutorStart() via QueryDesc, which then makes it available to the executor routines via the query's EState. The main goal of adding this new phase is, for now, to allow cached cached generic plans containing scans of partitioned tables using Append/MergeAppend to be executed more efficiently by the prep phase doing any initial pruning, instead of deferring that to ExecutorStart(). That may allow AcquireExecutorLocks() on the plan to lock only only the minimal set of relations/partitions, that is those whose subplans survive the initial pruning. Implementation notes: * To allow initial pruning to be done as part of the pre-execution prep phase as opposed to as part of ExecutorStart(), this refactors ExecCreatePartitionPruneState() and ExecFindInitialMatchingSubPlans() to pass the information needed to do initial pruning directly as parameters instead of getting that from the EState and the PlanState of the parent Append/MergeAppend, both of which would not be available in ExecutorPrep(). Another, sort of non-essential-to-this- goal, refactoring this does is moving the partition pruning initialization stanzas in ExecInitAppend() and ExecInitMergeAppend() both of which contain the same cod into its own function ExecInitPartitionPruning(). * To pass the ExecPrepOutput(s) created by the plancache module's invocation of ExecutorPrep() to the callers of the module, which in turn would pass them down to ExecutorStart(), CachedPlan gets a new List field that stores those ExecPrepOutputs, containing one element for each PlannedStmt also contained in the CachedPlan. The new list is stored in a child context of the context containing the PlannedStmts, though unlike the latter, it is reset on every invocation of CheckCachedPlan(), which in turn calls ExecutorPrep() with a new set of bound Params. * AcquireExecutorLocks() is now made to loop over a bitmapset of RT indexes, those of relations returned in ExecPrepOutput, instead of over the whole range table. With initial pruning that is also done as part of ExcecutorPrep(), only relations from non-pruned nodes of the plan tree would get locked as a result of this new arrangement. * PlannedStmt gets a new field usesPrepExecPruning that indicates whether any of the nodes of the plan tree contain "initial" (or "pre-execution") pruning steps, which saves ExecutorPrep() the trouble of walking the plan tree only to find out whether that's the case. * PartitionPruneInfo nodes now explicitly stores whether the steps contained in any of the individual PartitionedRelPruneInfos embedded in it contain initial pruning steps (those that can be performed during ExecutorPrep) and execution pruning steps (those that can only be performed during ExecutorRun), as flags contains_initial_steps and contains_exec_steps, respectively. In fact, the aforementioned PlannedStmt field's value is a logical OR of the values of the former across all PartitionPruneInfo nodes embedded in the plan tree. * PlannedStmt also gets a bitmapset field to store the RT indexes of all relation RTEs referenced in the query that is populated when contructing the flat range table in setrefs.c, which effectively contains all the relations that the planner must have locked. In the case of a cached plan, AcquireExecutorLocks() must lock all of those relations, except those whose subnodes get pruned as result of ExecutorPrep(). * PlannedStmt gets yet another field numPlanNodes that records the highest plan_node_id assigned to any of the node contained in the tree, which serves as the size to use when allocating the PlanPrepOutput array. Maybe this should be more than one patch? Say: 0001 to add ExecutorPrep and the boilerplate, 0002 to teach plancache.c to use the new facility Thoughts? -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления: