Re: Custom Scan APIs (Re: Custom Plan node)

Поиск
Список
Период
Сортировка
От	Kouhei Kaigai
Тема	Re: Custom Scan APIs (Re: Custom Plan node)
Дата	5 марта 2014 г. 08:22:29
Msg-id	9A28C8860F777E439AA12E8AEA7694F8F853A2@BPXM15GP.gisp.nec.co.jp обсуждение исходный текст
Ответ на	Re: Custom Scan APIs (Re: Custom Plan node) (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: Custom Scan APIs (Re: Custom Plan node)
Список	pgsql-hackers
Дерево обсуждения
Tom, thanks for your detailed comments.

> I apologize for not having paid much attention to this thread so far.
> It kept getting stuck on my "to look at later" queue.  Anyway, I've taken
> a preliminary look at the v7 patch now.
>
> While the patch seems roughly along the lines of what we talked about last
> PGCon, I share Stephen's unease about a lot of the details.  It's not
> entirely clear that these hooks are really good for anything, and it's even
> less clear what APIs the hook functions should be expected to depend on.
> I really do not like the approach embodied in the later patches of "oh,
> we'll just expose whatever static planner functions seem convenient".
> That's not an approach that's available when doing actual external
> development of an extension, and even if it were that doesn't make it a
> good idea.  The larger the exposed surface of functions the harder it is
> to know what's safe to change.
>
Hmm. It needs a clear reasoning to expose the function rather than
its convenience.

> Anyway, on to specifics:
>
> * Please drop the whole register_custom_provider/get_custom_provider API.
> There is no reason other than debugging for a provider to have a name at
> all, and if we expect providers to have unique names then that creates a
> collision risk for independently-created extensions.  AFAICS, it's
> sufficient for a function hooked into one of the add-a-path hooks to include
> a pointer to a struct-of-function-pointers in the Path object it returns,
> and similarly the CustomScan Plan object can contain a pointer inserted
> when it's created.  I don't object to having a name field in the function
> pointer structs for debugging reasons, but I don't want any lookups being
> done on it.
>
One thing I was worrying about is how copyObject() and nodeToString() support
set of function pointer tables around custom-scan node, however, you suggested
to change the assumption here. So, I think these functions become unnecessary.

> * The function-struct pointers should be marked const in the referencing
> nodes, to indicate that the core code won't be modifying or copying them.
> In practice they'd probably be statically allocated constants in the
> extensions anyway.
>
OK,

> * The patch does lots of violence to the separation between planner and
> executor, starting with the decision to include nodes/relation.h in
> executor.h.  That will not do at all.  I see that you did that because you
> wanted to make ExecSupportsMarkRestore take a Path, but we need some other
> answer.  One slightly grotty answer is to invent two different customscan
> Plan types, one that supports mark/restore and one that doesn't, so that
> ExecSupportsMarkRestore could still just look at the Plan type tag.
> (BTW, path->pathtype is supposed to contain the node tag of the Plan node
> that the path would produce.  Putting T_CustomPath in it is entirely
> tone-deaf.)  Another way would be to remove ExecSupportsMarkRestore in
> favor of some new function in the planner; but it's good to keep it in
> execAmi.c since that has other knowledge of which plan types support
> mark/restore.
>
OK, I'll add one derivative node delivertive plan node type,
CustomScanMarkRestore for instance.
Probably, it shall be populated on the create_customscan_plan()
according to the flag being set on the CustomPath.

> * More generally, I'm not convinced that exactly one Path type and exactly
> one Plan type is going to get us very far.  It seems rather ugly to use
> the same Plan type for both scan and join nodes, and what will happen if
> somebody wants to build a custom Append node, or something else that has
> neither zero nor two subplans?
>
In the previous discussion, CustomJoin will be nonsense because we know
limited number of join algorithms: nest-loop, hash-join and merge-join, unlike
variation of logic to scan relations. Also, IIUC, someone didn't want to add
custom- something node types for each built-in types.
So, we concluded to put CustomScan node to replace built-in join / scan at
that time.
Regarding to the Append node, it probably needs to be enhanced to have
list of subplans on CustomScan, or add individual CustomAppend node, or
opaque "CustomPlan" may be sufficient if it handles EXPLAIN by itself.

> * nodeCustom.h is being completely abused.  That should only export the
> functions in nodeCustom.c, which are going to be pretty much one-liners
> anyway.  The right place to put the function pointer struct definitions
> is someplace else.  I'd be inclined to start by separating the function
> pointers into two structs, one for functions needed for a Path and one for
> functions needed for a Plan, so that you don't have this problem of having
> to import everything the planner knows into an executor header or vice versa.
> Most likely you could just put the Path function pointer struct declaration
> next to CustomPath in relation.h, and the one for Plans next to CustomPlan
> (or the variants thereof) in plannodes.h.
>
Yes. I didn't have clear idea where we should put the definition of interfaces.
Probably, InitCustomScanPlan (maybe, CreateCustomScanPlan) shall be moved to
relation.h, and rest of callbacks shall be moved to plannodes.h.

> * The set of fields provided in CustomScan seems nonsensical.  I'm not even
> sure that it should be derived from Scan; that's okay for plan types that
> actually are scans of a base relation, but it's confusing overhead for
> anything that's say a join, or a custom sort node, or anything like that.
> Maybe one argument for multiple plan node types is that one would be derived
> from Scan and one directly from Plan.
>
The reason why CustomScan is derived from Scan is, some of backend code
wants to know rtindex of the relation to be referenced by this CustomScan.
The scanrelid of Scan is used in three points: nodeCustom.c, setrefs.c and
explain.c. The usage in nodeCustom.c is just for service routines, and the
usage in setrefs.c can be moved to the extension according to your suggestion.
We need to investigate the usage in explain.c; ExplainPreScanNode() walks
around the nodes to collect relids referenced in this query. If we don't
want to put a callback for this specific usage, it is a reasonable choice
to show the backend the associated scanrelid of CustomScan.
Is it a confusable rule, if extension has to set 0 when a particular relation
is not scanned in this CustomScan.

> * More generally, what this definition for CustomScan exposes is that we
> have no idea whatever what fields a custom plan node might need.  I'm
> inclined to think that what we should be assuming is that any custom path
> or plan node is really an object of a struct type known only to its providing
> extension, whose first field is the CustomPath or CustomPlan struct known
> to the core backend.  (Think C++ subclassing.)  This would imply that
> copyfuncs/equalfuncs/outfuncs support would have to be provided by the
> extension, which is in principle possible if we add function pointers for
> those operations to the struct linked to from the path/plan object.
> (Notationally this might be a bit of a pain, since the macros that we use
> in the functions in copyfuncs.c etc aren't public.  Not sure if it's worth
> exposing those somewhere, or if people should just copy/paste them.)  This
> approach would probably also avoid the need for the klugy bitmapset
> representation you propose in patch 3.
>
It's a breakthrough!
Probably, Path node needs to have a callback on outfuncs.c.
Also, Plan node needs to have a callback on copyfuncs.c and outfuncs.c.
I think, prototype of the callback functions are not specific to custom-
scan node, so it should be declared in the nodes/nodes.h.

> * This also implies that create_customscan_plan is completely bogus.
> A custom plan provider *must* provide a callback function for that, because
> only it will know how big a node to palloc.  There are far too many
> assumptions in create_customscan_plan anyway; I think there is probably
> nothing at all in that function that shouldn't be getting done by the custom
> provider instead.
>
OK, InitCustomScanPlan shall become CreateCustomScanPlan probably, to
return an object being palloc()'ed with arbitrary size.

> * Likewise, there is way too much hard-wired stuff in explain.c.  It should
> not be optional whether a custom plan provider provides an explain support
> function, and that function should be doing pretty much everything involved
> in printing the node.
>
Probably, the hunk around show_customscan_info() call can be entirely moved
to the extension side. If so, I want ExplainNode() being an extern function,
because it allows extensions to print underlying plan-nodes.

> * I don't believe in the hard-wired implementation in setrefs.c either.
>
Are you saying the hard-wired portion in setrefs.c can be moved to the
extension side? If fix_scan_expr() become extern function, I think it
is feasible.

> * Get rid of the CUSTOM_VAR business, too (including custom_tupdesc).
> That's at best badly thought out, and at worst a source of new bugs.
> Under what circumstances is a custom plan node going to contain any Vars
> that don't reduce to either scan or input-plan variables?  None, because
> that would imply that it was doing something unrelated to the requested
> query.
>
I also want to rid it, if we have alternative idea to solve the issue.
The varno of Var-node originally has rt-index of the relation being referenced,
then setrefs.c adjusts it according to the offset and relations type to be
referenced.
In case of Var-node that references joined-relations, it shall be replaced to
either INNER_VAR or OUTER_VAR according the location of underlying
relations. It eventually makes ExecEvalScalarVar() to reference either
ecxt_innertuple or ecxt_outertuple, however, it is problematic if we already
consolidated tuples come from the both side into one.
For example, the enhanced postgres_fdw fetches the result set of remote
join query, thus a tuple contains the fields come from both side.
In this case, what varno shall be suitable to put?

> * The API for add_join_path_hook seems overcomplex, as well as too full
> of implementation details that should remain private to joinpath.c.
> I particularly object to passing the mergeclause lists, which seem unlikely
> to be of interest for non-mergejoin plan types anyway.
> More generally, it seems likely that this hook is at the wrong level of
> abstraction; it forces the hook function to concern itself with a lot of
> stuff about join legality and parameterization (which I note your patch3
> code fails to do; but that's not an optional thing).
>
I'd like to see if you have idea where the hook shall be located, and which
kind of abstraction is suitable.

> * After a bit of reflection I think the best thing to do might be to ditch
> add_join_path_hook for 9.4, on these grounds:
> 1. You've got enough to do to get the rest of the patch committable.
> 2. Like Stephen, I feel that the proposed usage as a substitute for FDW-based
> foreign join planning is not the direction we want to travel.
> 3. Without that, the use case for new join nodes seems pretty marginal,
> as opposed to say alternative sort-node implementations (a case not
> supported by this patch, except to the extent that you could use them in
> explicit-sort mergejoins if you duplicated large parts of joinpath.c).
>
Are you suggesting an alternative merge join path that uses sort-node
on top of custom-scan node, aren't you?
Probably, this path shall be constructed using existing MergeJoin path
that has one or two CustomScan node for sorting or scan and sorting.
Can I share your suggestion correctly?

> * Getting nitpicky ... what is the rationale for the doubled underscore
> in the CUSTOM__ flag names?  That's just a typo waiting to happen IMO.
>
Sorry, I intended the double underline as a separator towards prefix portion
of this label.

> * Why is there more than one call site for add_scan_path_hook?  I don't
> see the need for the calling macro with the randomly inconsistent name,
> either.
>
Where is the best place to do? Even though I cannot imagine the situation
to run sub-query or cte by extensions, its path is constructed during
set_rel_size(), so I had to put the hook for each set_xxxx_pathlist()
functions.

> * The test arrangement for contrib/ctidscan is needlessly complex, and
> testing it in the core tests is a bogus idea anyway.  Why not just let it
> contain its own test script like most other contrib modules?
>
I thought the regression test of CustomScan interface is useful in the
core test. However, it seems to me a reasonable suggestion to implement
this test as usual contrib regression test. So, I'll adjust it.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
В списке pgsql-hackers по дате отправления:
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Custom Scan APIs (Re: Custom Plan node)