[v9.5] Custom Plan API

Поиск
Список
Период
Сортировка
От Kouhei Kaigai
Тема [v9.5] Custom Plan API
Дата
Msg-id 9A28C8860F777E439AA12E8AEA7694F8F9E7B1@BPXM15GP.gisp.nec.co.jp
обсуждение исходный текст
Ответы Re: [v9.5] Custom Plan API
Re: [v9.5] Custom Plan API
Список pgsql-hackers
Prior to the development cycle towards v9.5, I'd like to reopen
the discussion of custom-plan interface. Even though we had lots
of discussion during the last three commit-fests, several issues
are still under discussion. So, I'd like to clarify direction of
the implementation, prior to the first commit-fest.

(1) DDL support and system catalog

Simon suggested that DDL command should be supported to track custom-
plan providers being installed, and to avoid nonsense hook calls
if it is an obvious case that custom-plan provider can help. It also
makes sense to give a chance to load extensions once installed.
(In the previous design, I assumed modules are loaded by LOAD command
or *_preload_libraries parameters).

I tried to implement the following syntax:

  CREATE CUSTOM PLAN <name> FOR (scan|join|any) HANDLER <func_name>;

It records a particular function as an entrypoint of custom-plan provider,
then it will be called when planner tries to find out the best path to scan
or join relations. This function takes an argument (INTERNAL type) that packs
information to construct and register an alternative scan/join path, like
PlannerInfo, RelOptInfo and so on.

(*) The data structure below will be supplied, in case of scan path.
  typedef struct {
     uint32          custom_class;
     PlannerInfo    *root;
     RelOptInfo     *baserel;
     RangeTblEntry  *rte;
  } customScanArg;

This function, usually implemented with C-language, can construct a custom
object being delivered from CustomPath type that contains a set of function
pointers; including functions that populate another objects delivered from
CustomPlan or CustomPlanState as I did in the patch towards v9.4 development.

Properties of individual custom-plan providers are recorded in the
pg_custom_plan system catalog. Right now, its definition is quite simple
- only superuser can create / drop custom-plan providers, and its definition
does not belong to a particular namespace.
Because of this assumption (only superuser can touch), I don't put database
ACL mechanism here.
What kind of characteristics should be there?


(2) Static functions to be exported

Tom concerned that custom-plan API needs several key functions can be
called by extensions, although these are declared as static functions,
thus, it looks like a part of interfaces.
Once people thought it is stable ones we can use beyond the version up,
it may become a barrier to the future improvement in the core code.
Is it a right understanding, isn't it?

One solution is to write a notice clearly, like: "these external functions
are not stable interfaces, so extension should not assumed these functions
are available beyond future version up".

Nevertheless, more stable functions are more kindness for authors of extensions.
So, I tried a few approaches.

First of all, we categorized functions into three categories.
(A) It walks on plan/expression tree recursively.
(B) It modifies internal state of the core backend.
(C) It is commonly used but in a particular source file.

Although the number of functions are not so many, (A) and (B) must have
its entrypoint from extensions. If unavailable, extension needs to manage
a copied code with small enhancement by itself, and its burden is similar
to just branching the tree.
Example of (A) are: create_plan_recurse, set_plan_refs, ...
Example of (B) are: fix_expr_common, ...

On the other hands, (C) functions are helpful if available, however, it
is not mandatory requirement to implement.

Our first trial, according to the proposition by Tom, is to investigate
a common walker function on plan tree as we are now doing on expression
tree. We expected, we can give function pointers of key routines to
extensions, instead of exporting the static functions.
However, it didn't work well because existing recursive call takes
various kind of jobs for each plan-node type, so it didn't fit a structure
of walker functions; that applies a uniform operation for each node.

Note that, I assumed the following walker functions that applies plan_walker
or expr_walker on the underlying plan/expression trees.
    bool
    plan_tree_walker(Plan *plan,
                     bool (*plan_walker) (),
                     bool (*expr_walker) (),
                     void *context)
Please tell me if it is different from your ideas, I'll reconsider it.

On the next, I tried another approach that gives function pointers of
(A) and (B) functions as a part of custom-plan interface.
It is workable at least, however, it seems to me its interface definition
has advantage in comparison to the original approach.

For example, below is definition of the callback in setref.c.

+   void    (*SetCustomPlanRef)(PlannerInfo *root,
+                               CustomPlan *custom_plan,
+                               int rtoffset,
+                               Plan *(*fn_set_plan_refs)(PlannerInfo *root,
+                                                         Plan *plan,
+                                                         int rtoffset),
+                               void (*fn_fix_expr_common)(PlannerInfo *root,
+                                                          Node *node));

Extension needs set_plan_refs() and fix_expr_common() at least, I added
function pointers of them. But this definition has to be updated according
to the future update of these functions. It does not seem to me a proper
way to smooth the impact of future internal change.

So, I'd like to find out where is a good common ground to solve the matter.

One idea is the first simple solution. The core PostgreSQL will be developed
independently from the out-of-tree modules, so we don't care about stability
of declaration of internal functions, even if it is exported to multiple
source files. (I believe it is our usual manner.)

One other idea is, a refactoring of the core backend to consolidate routines
per plan-node, not processing stage. For example, createplan.c contains most
of codes commonly needed to create plan, in addition to individual plan node.
Let's assume a function like create_seqscan_plan() are located in a separated
source file, then routines to be exported become clear.
One expected disadvantage is, this refactoring makes complicated to back patches.

Do you have any other ideas to implement it well?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Kohei KaiGai [mailto:kaigai@kaigai.gr.jp]
> Sent: Tuesday, April 29, 2014 10:07 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Tom Lane; Andres Freund; Robert Haas; Simon Riggs; PgHacker; Stephen
> Frost; Shigeru Hanada; Jim Mlodgenski; Peter Eisentraut
> Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
>
> >> Yeah.  I'm still not exactly convinced that custom-scan will ever
> >> allow independent development of new plan types (which, with all due
> >> respect to Robert, is what it was being sold as last year in Ottawa).
> >> But I'm not opposed in principle to committing it, if we can find a
> >> way to have a cleaner API for things like setrefs.c.  It seems like
> >> late-stage planner processing in general is an issue for this patch
> >> (createplan.c and subselect.c are also looking messy).  EXPLAIN isn't
> too great either.
> >>
> >> I'm not sure exactly what to do about those cases, but I wonder
> >> whether things would get better if we had the equivalent of
> >> expression_tree_walker/mutator capability for plan nodes.  The state
> >> of affairs in setrefs and subselect, at least, is a bit reminiscent
> >> of the bad old days when we had lots of different bespoke code for
> >> traversing expression trees.
> >>
> > Hmm. If we have something like expression_tree_walker/mutator for plan
> > nodes, we can pass a walker/mutator function's pointer instead of
> > exposing static functions that takes recursive jobs.
> > If custom-plan provider (that has sub-plans) got a callback with
> > walker/ mutator pointer, all it has to do for sub-plans are calling
> > this new plan-tree walking support routine with supplied walker/mutator.
> > It seems to me more simple design than what I did.
> >
> I tried to code the similar walker/mutator functions on plan-node tree,
> however, it was not available to implement these routines enough simple,
> because the job of walker/mutator functions are not uniform thus caller
> side also must have a large switch-case branches.
>
> I picked up setrefs.c for my investigation.
> The set_plan_refs() applies fix_scan_list() on the expression tree being
> appeared in the plan node if it is delivered from Scan, however, it also
> applies set_join_references() for subclass of Join, or
> set_dummy_tlist_references() for some other plan nodes.
> It implies that the walker/mutator functions of Plan node has to apply
> different operation according to the type of Plan node. I'm not certain
> how much different forms are needed.
> (In addition, set_plan_refs() performs usually like a walker, but often
> performs as a mutator if trivial subquery....)
>
> I'm expecting the function like below. It allows to call plan_walker
> function for each plan-node and also allows to call expr_walker function
> for each expression-node on the plan node.
>
>     bool
>     plan_tree_walker(Plan *plan,
>                      bool (*plan_walker) (),
>                      bool (*expr_walker) (),
>                      void *context)
>
> I'd like to see if something other form to implement this routine.
>
>
> One alternative idea to give custom-plan provider a chance to handle its
> subplans is, to give function pointers (1) to handle recursion of plan-tree
> and (2) to set up backend's internal state.
> In case of setrefs.c, set_plan_refs() and fix_expr_common() are minimum
> necessity for extensions. It also kills necessity to export static
> functions.
>
> How about your thought?
> --
> KaiGai Kohei <kaigai@kaigai.gr.jp>

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: tests for client programs
Следующее
От: Andres Freund
Дата:
Сообщение: Re: tests for client programs