Re: generic plans and "initial" pruning

Поиск
Список
Период
Сортировка
От Amit Langote
Тема Re: generic plans and "initial" pruning
Дата
Msg-id CA+HiwqFOsJ21nmvoKtPkzxKfYzACajj-mAOoJ+-y_O6g+-v-aA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: generic plans and "initial" pruning  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: generic plans and "initial" pruning
Re: generic plans and "initial" pruning
Список pgsql-hackers
On Thu, Jan 13, 2022 at 3:20 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 12, 2022 at 9:32 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > Or, maybe this won't be a concern if performing ExecutorStart() is
> > made a part of CheckCachedPlan() somehow, which would then take locks
> > on the relation as the PlanState tree is built capturing any plan
> > invalidations, instead of AcquireExecutorLocks(). That does sound like
> > an ambitious undertaking though.
>
> On the surface that would seem to involve abstraction violations, but
> maybe that could be finessed somehow. The plancache shouldn't know too
> much about what the executor is going to do with the plan, but it
> could ask the executor to perform a step that has been designed for
> use by the plancache. I guess the core problem here is how to pass
> around information that is node-specific before we've stood up the
> executor state tree. Maybe the executor could have a function that
> does the pruning and returns some kind of array of results that can be
> used both to decide what to lock and also what to consider as pruned
> at the start of execution. (I'm hand-waving about the details because
> I don't know.)

The attached patch implements this idea.  Sorry for the delay in
getting this out and thanks to Robert for the off-list discussions on
this.

So the new executor "step" you mention is the function ExecutorPrep in
the patch, which calls a recursive function ExecPrepNode on the plan
tree's top node, much as ExecutorStart calls (via InitPlan)
ExecInitNode to construct a PlanState tree for actual execution
paralleling the plan tree.

For now, ExecutorPrep() / ExecPrepNode() does mainly two things if and
as it walks the plan tree: 1) Extract the RT indexes of RTE_RELATION
entries and add them to a bitmapset in the result struct, 2) If the
node contains a PartitionPruneInfo, perform its "initial pruning
steps" and store the result of doing so in a per-plan-node node called
PlanPrepOutput.  The bitmapset and the array containing per-plan-node
PlanPrepOutput nodes are returned in a node called ExecPrepOutput,
which is the result of ExecutorPrep, to its calling module (say,
plancache.c), which, after it's done using that information, must pass
it forward to subsequent execution steps.  That is done by passing it,
via the module's callers, to CreateQueryDesc() which remembers the
ExecPrepOutput in QueryDesc that is eventually passed to
ExecutorStart().

A bunch of other details are mentioned in the patch's commit message,
which I'm pasting below for anyone reading to spot any obvious flaws
(no-go's) of this approach:

    Invent a new executor "prep" phase

    The new phase, implemented by execMain.c:ExecutorPrep() and its
    recursive underling execProcnode.c:ExecPrepNode(), takes a query's
    PlannedStmt and processes the plan tree contained in it to produce
    a ExecPrepOutput node as result.

    As the plan tree is walked, each node must add the RT index(es) of
    any relation(s) that it directly manipulates to a bitmapset member of
    ExecPrepOutput (for example, an IndexScan node must add the Scan's
    scanrelid).  Also, each node may want to make a PlanPrepOutput node
    containing additional information that may be of interest to the
    calling module or to the later execution phases, if the node can
    provide one (for example, an Append node may perform initial pruning
    and add a set of "initially valid subplans" to the PlanPrepOutput).
    The PlanPrepOutput nodess of all the plan nodes are added to an array
    in the ExecPrepOutput, which is indexed using the individual nodes'
    plan_node_id; a NULL is stored in the array slots of nodes that
    don't have anything interesting to add to the PlanPrepOutput.

    The ExecPrepOutput thus produced is passed to CreateQueryDesc()
    and subsequently to ExecutorStart() via QueryDesc, which then makes
    it available to the executor routines via the query's EState.

    The main goal of adding this new phase is, for now, to allow cached
    cached generic plans containing scans of partitioned tables using
    Append/MergeAppend to be executed more efficiently by the prep phase
    doing any initial pruning, instead of deferring that to
    ExecutorStart().  That may allow AcquireExecutorLocks() on the plan
    to lock only only the minimal set of relations/partitions, that is
    those whose subplans survive the initial pruning.

    Implementation notes:

    * To allow initial pruning to be done as part of the pre-execution
    prep phase as opposed to as part of ExecutorStart(), this refactors
    ExecCreatePartitionPruneState() and ExecFindInitialMatchingSubPlans()
    to pass the information needed to do initial pruning directly as
    parameters instead of getting that from the EState and the PlanState
    of the parent Append/MergeAppend, both of which would not be
    available in ExecutorPrep().  Another, sort of non-essential-to-this-
    goal, refactoring this does is moving the partition pruning
    initialization stanzas in ExecInitAppend() and ExecInitMergeAppend()
    both of which contain the same cod into its own function
    ExecInitPartitionPruning().

    * To pass the ExecPrepOutput(s) created by the plancache module's
    invocation of ExecutorPrep() to the callers of the module, which in
    turn would pass them down to ExecutorStart(), CachedPlan gets a new
    List field that stores those ExecPrepOutputs, containing one element
    for each PlannedStmt also contained in the CachedPlan.  The new list
    is stored in a child context of the context containing the
    PlannedStmts, though unlike the latter, it is reset on every
    invocation of CheckCachedPlan(), which in turn calls ExecutorPrep()
    with a new set of bound Params.

    * AcquireExecutorLocks() is now made to loop over a bitmapset of RT
    indexes, those of relations returned in ExecPrepOutput, instead of
    over the whole range table.  With initial pruning that is also done
    as part of ExcecutorPrep(), only relations from non-pruned nodes of
    the plan tree would get locked as a result of this new arrangement.

    * PlannedStmt gets a new field usesPrepExecPruning that indicates
    whether any of the nodes of the plan tree contain "initial" (or
    "pre-execution") pruning steps, which saves ExecutorPrep() the
    trouble of walking the plan tree only to find out whether that's
    the case.

    * PartitionPruneInfo nodes now explicitly stores whether the steps
    contained in any of the individual PartitionedRelPruneInfos embedded
    in it contain initial pruning steps (those that can be performed
    during ExecutorPrep) and execution pruning steps (those that can only
    be performed during ExecutorRun), as flags contains_initial_steps and
    contains_exec_steps, respectively.  In fact, the aforementioned
    PlannedStmt field's value is a logical OR of the values of the former
    across all PartitionPruneInfo nodes embedded in the plan tree.

    * PlannedStmt also gets a bitmapset field to store the RT indexes of
    all relation RTEs referenced in the query that is populated when
    contructing the flat range table in setrefs.c, which effectively
    contains all the relations that the planner must have locked. In the
    case of a cached plan, AcquireExecutorLocks() must lock all of those
    relations, except those whose subnodes get pruned as result of
    ExecutorPrep().

    * PlannedStmt gets yet another field numPlanNodes that records the
    highest plan_node_id assigned to any of the node contained in the
    tree, which serves as the size to use when allocating the
    PlanPrepOutput array.

Maybe this should be more than one patch?  Say:

0001 to add ExecutorPrep and the boilerplate,
0002 to teach plancache.c to use the new facility

Thoughts?

--
Amit Langote
EDB: http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dilip Kumar
Дата:
Сообщение: Re: decoupling table and index vacuum
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Database-level collation version tracking