Discussion: Custom Plan node
Hi,

The attached patch adds a new plan node type, CustomPlan, that enables extensions to get control during query execution via registered callbacks.

Right now, all the jobs of the executor are built in, except for foreign scan, so an extension has no way to run its own code in place of a particular portion of the plan tree. That is painful for people who want to implement an edge feature in the executor, because the only option is to replace the whole executor portion, an unreasonable maintenance burden.

CustomPlan requires two steps from an extension: registration of a set of callbacks, and manipulation of the plan tree.

First, the extension has to register a set of callbacks under a unique name using RegisterCustomPlan(). The callbacks are defined as follows, and the extension is responsible for making these routines work correctly:

    void BeginCustomPlan(CustomPlanState *cestate, int eflags);
    TupleTableSlot *ExecCustomPlan(CustomPlanState *node);
    Node *MultiExecCustomPlan(CustomPlanState *node);
    void EndCustomPlan(CustomPlanState *node);
    void ExplainCustomPlan(CustomPlanState *node, ExplainState *es);
    void ReScanCustomPlan(CustomPlanState *node);
    void ExecMarkPosCustomPlan(CustomPlanState *node);
    void ExecRestrPosCustomPlan(CustomPlanState *node);

These callbacks are invoked when the plan tree contains a CustomPlan node. However, the usual code paths never construct this node type for any SQL input, so the extension needs to manipulate the already-constructed plan tree. That is the second step: the extension puts its local code on planner_hook to examine and manipulate the PlannedStmt object. It can replace particular nodes in the plan tree with CustomPlan nodes, or inject them at arbitrary points.

Though my intention is to implement GPU-accelerated table scans and other stuff on top of this feature, other useful features can probably be built on it as well. At the developer meeting, someone suggested it may be useful for the PG-XC folks to implement clustered scans.
Also, I have an idea to implement an in-memory query cache that can cut off a particular branch of the plan tree. Probably other folks have other ideas too.

The contrib/xtime module shows a simple example that records the elapsed time of the underlying plan node, then prints it at the end of execution. For example, this query constructs the following plan tree, as we usually see:

postgres=# EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.a = t2.x WHERE x BETWEEN 1000 AND 1200 ORDER BY y;
                     QUERY PLAN
-----------------------------------------------------
 Sort
   Sort Key: t2.y
   ->  Nested Loop
         ->  Seq Scan on t2
               Filter: ((x >= 1000) AND (x <= 1200))
         ->  Index Scan using t1_pkey on t1
               Index Cond: (a = t2.x)
(7 rows)

Once the xtime module manipulates the plan tree to inject CustomPlan nodes, it becomes as follows:

postgres=# LOAD '$libdir/xtime';
LOAD
postgres=# EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.a = t2.x WHERE x BETWEEN 1000 AND 1200 ORDER BY y;
                           QUERY PLAN
-----------------------------------------------------------------
 CustomPlan:xtime
   ->  Sort
         Sort Key: y
         ->  CustomPlan:xtime
               ->  Nested Loop
                     ->  CustomPlan:xtime on t2
                           Filter: ((x >= 1000) AND (x <= 1200))
                     ->  CustomPlan:xtime
                           ->  Index Scan using t1_pkey on t1
                                 Index Cond: (a = x)
(10 rows)

You can see CustomPlan nodes with the name "xtime" appearing in the plan tree; when the executor meets a CustomPlan node during recursive execution, it calls the functions registered as callbacks of "xtime".

The extension has to set the name of the custom plan provider, at least, when it constructs a CustomPlan node and puts it into the target plan tree. The set of callbacks is looked up by that name and installed in the CustomPlanState object for execution, in ExecInitNode(). The reason I didn't put function pointers into the node directly is that plan nodes need to remain compliant with copyObject() and friends.

Please send any comments.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments
Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
> The attached patch adds a new plan node type; CustomPlan that enables
> extensions to get control during query execution, via registered callbacks.

TBH, I think this is really an exercise in building useless mechanism. I don't believe that any actually *interesting* new types of plan node can be inserted into a query plan without invasive changes to the planner, and so it's a bit pointless to set up hooks whereby you can avoid touching any source code in the executor.

> ... Extension will put its local code on the planner_hook
> to reference and manipulate PlannedStmt object.

That is hardly a credible design for doing anything interesting with custom plans. It's got exactly the same problem you are complaining about for the executor, ie you have to replace the whole of the planner if you try to do things that way.

One other point here is: if you need more than one kind of custom plan node, how will you tell what's what? I doubt you can completely eliminate the need for IsA-style tests, especially in the planner area. The sample contrib module here already exposes the failure mode I'm worried about: it falls down as soon as it sees a plan node type it doesn't know. If you could show me how this would work together with some other extension that's also adding custom plan nodes of its own, then I might think you had something.

In the same vein, the patch fails to provide credible behavior for ExecSupportsMarkRestore, ExecMaterializesOutput, ExplainPreScanNode, search_plan_tree, and probably some other places that need to know about all possible plan node types. Even if you'd covered every one of those bases, you've still only got support for "generic" plan nodes having no particularly unique properties.
As an example of what I'm thinking about here, NestLoop, which might be considered the most vanilla of all join plan nodes, actually has a lot of specialized infrastructure in both the planner and the executor to support its ability to pass outer-relation values into the inner-relation scan. I think that as soon as you try to do anything of real interest with custom plan nodes, you'll be finding you need special-purpose additions that no set of generic hooks could reasonably cover.

In short, I don't understand or agree with this idea that major changes should be implementable without touching any of the core code in any way. This is open source --- if you need a modified version then modify it. I used to build systems that needed hook-style extensibility because the core code was burned into ROM; but that's not what we're dealing with today, and I don't really see the argument for sacrificing readability and performance by putting hooks everywhere, especially in places with vague, ever-changing API contracts.

			regards, tom lane
On Fri, Sep 6, 2013 at 4:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
>> The attached patch adds a new plan node type; CustomPlan that enables
>> extensions to get control during query execution, via registered callbacks.
>
> TBH, I think this is really an exercise in building useless mechanism.
> I don't believe that any actually *interesting* new types of plan node can
> be inserted into a query plan without invasive changes to the planner, and
> so it's a bit pointless to set up hooks whereby you can avoid touching any
> source code in the executor.

I find this a somewhat depressing response. Didn't we discuss this exact design at the developer meeting in Ottawa? I thought it sounded reasonable to you then, or at least I don't remember you panning it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I find this a somewhat depressing response. Didn't we discuss this
> exact design at the developer meeting in Ottawa? I thought it sounded
> reasonable to you then, or at least I don't remember you panning it.

What I recall saying is that I didn't see how the planner side of it would work ... and I still don't see that. I'd be okay with committing executor-side fixes only if we had a vision of where we'd go on the planner side; but this patch doesn't offer any path forward there.

This is not unlike the FDW stuff, where getting a reasonable set of planner APIs in place was by far the hardest part (and isn't really done even yet, since you still can't do remote joins or remote aggregation in any reasonable fashion). But you can do simple stuff reasonably simply, without reimplementing all of the planner along the way --- and I think we should look for some equivalent level of usefulness from this before we commit it.

			regards, tom lane
2013/9/7 Tom Lane <tgl@sss.pgh.pa.us>:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I find this a somewhat depressing response. Didn't we discuss this
>> exact design at the developer meeting in Ottawa? I thought it sounded
>> reasonable to you then, or at least I don't remember you panning it.
>
> What I recall saying is that I didn't see how the planner side of it would
> work ... and I still don't see that. I'd be okay with committing
> executor-side fixes only if we had a vision of where we'd go on the
> planner side; but this patch doesn't offer any path forward there.
>
The reason this patch sticks to the executor side is that, in Ottawa, we concluded not to patch the planner code from the beginning, because of its complexity. I also agree that planner support for custom plans would help construct better execution plans; however, the feature makes sense even if it starts as functionality that offers a way to rearrange an already-constructed plan tree.

Anyway, let me investigate what kind of APIs would need to be added for the planner stage as well.

> This is not unlike the FDW stuff, where getting a reasonable set of
> planner APIs in place was by far the hardest part (and isn't really done
> even yet, since you still can't do remote joins or remote aggregation in
> any reasonable fashion). But you can do simple stuff reasonably simply,
> without reimplementing all of the planner along the way --- and I think
> we should look for some equivalent level of usefulness from this before
> we commit it.
>
Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
2013/9/7 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> The reason why this patch stick on executor-side is we concluded
> not to patch the planner code from the beginning in Ottawa because
> of its complexity.
>
> Anyway, let me investigate what's kind of APIs to be added for planner
> stage also.
>
Here is a brief idea for adding planner support for custom nodes, if we need it from the beginning. Of course, it has not been considered in detail yet and needs much brushing up; however, it may serve as a draft for implementing this feature.

We may be able to categorize plan node types into three groups: scan, join, and others.

Even though the planner tries various combinations of joins and scans to minimize the estimated cost, we have fewer options for the other types, like T_Agg and so on. It seems to me the other node types are placed mostly according to the query's form, so it is not a big problem if all we can do for them is manipulate the plan tree in planner_hook. That is similar to what the proposed patch does.
So, let's focus on joins and scans. We need to give extensions a chance to override a built-in path when they can offer a cheaper one. That leads to an API that allows adding alternative paths while the built-in machinery is constructing candidate paths. Once such a path is added, the paths can be compared according to their estimated costs.

For example, assume a query joins foreign tables A and B managed by the same postgres_fdw server; a remote join likely has a cheaper cost than a local join. If an extension is capable of handling this case correctly, it can add an alternative "custom-join" path with the cheaper cost. This path would then be transformed into a "CustomJoin" node that launches a query to fetch the result of A join B, joined remotely. In this case, it does not matter if the "CustomJoin" node has underlying ForeignScan nodes in its left/right trees, because the extension can handle them however it wants.

So, the following APIs may be needed for planner support, at least:

* API to add an alternative join path, in addition to the built-in join logic.
* API to add an alternative scan path, in addition to the built-in scan logic.
* API to construct a "CustomJoin" node from the related path.
* API to construct a "CustomScan" node from the related path.

Any comments, please.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
On Sat, Sep 07, 2013 at 02:49:54PM +0200, Kohei KaiGai wrote:
> So, the following APIs may be needed for planner support, at least.
>
> * API to add an alternative join path, in addition to built-in join logic.
> * API to add an alternative scan path, in addition to built-in scan logic.
> * API to construct "CustomJoin" according to the related path.
> * API to construct "CustomScan" according to the related path.
>
> Any comment please.

The broad outlines look great.

Do we have any way, at least conceptually, to consider the graph of the cluster with edges weighted by network bandwidth and latency?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
2013/9/7 David Fetter <david@fetter.org>:
> The broad outlines look great.
>
> Do we have any way, at least conceptually, to consider the graph of
> the cluster with edges weighted by network bandwidth and latency?
>
As postgres_fdw is now doing?
Its configuration allows adding a cost for connecting to the remote server as startup cost, and also a cost for transferring data over the network, multiplied by the estimated number of rows, according to the per-server configuration. I think this is the responsibility of the custom plan provider, and it fully depends on the nature of what the provider wants to offer.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
On Sat, Sep 07, 2013 at 05:21:31PM +0200, Kohei KaiGai wrote:
> As postgres_fdw is now doing? Its configuration allows to add cost
> to connect remote server as startup cost, and also add cost to
> transfer data on network being multiplexed with estimated number of
> rows, according to per-server configuration. I think it is
> responsibility of the custom plan provider, and fully depends on the
> nature of what does it want to provide.

Interesting :)

Sorry I was unclear. I meant something that could take into account the bandwidths and latencies throughout the graph, not just the connections directly from the originating node.

Cheers,
David.
On Fri, Sep 6, 2013 at 7:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What I recall saying is that I didn't see how the planner side of it would
> work ... and I still don't see that. I'd be okay with committing
> executor-side fixes only if we had a vision of where we'd go on the
> planner side; but this patch doesn't offer any path forward there.
>
> This is not unlike the FDW stuff, where getting a reasonable set of
> planner APIs in place was by far the hardest part (and isn't really done
> even yet, since you still can't do remote joins or remote aggregation in
> any reasonable fashion). But you can do simple stuff reasonably simply,
> without reimplementing all of the planner along the way --- and I think
> we should look for some equivalent level of usefulness from this before
> we commit it.

I do think there are problems with this as written. The example consumer of the hook seems to contain a complete list of plan nodes, which is an oxymoron in the face of a facility to add custom plan nodes.

But, I guess I'm not yet convinced that one-for-one substitution of nodes is impossible even with something about this simple. If someone can do a post-pass over the plan tree and replace a SeqScan node with an AwesomeSeqScan node or a Sort node with a RadixSort node, would that constitute a sufficient POC to justify this infrastructure? Obviously, what you'd really want is to be able to inject those nodes (with proper costing) at the time they'd otherwise be generated, since it could affect whether or not a path involving a substituted node survives in the first place, but I'm not sure it's reasonable to expect the planner infrastructure for such changes in the same patch as the executor hooks.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> But, I guess I'm not yet convinced that one-for-one substitution of
> nodes is impossible even with something about this simple. If someone
> can do a post-pass over the plan tree and replace a SeqScan node with
> an AwesomeSeqScan node or a Sort node with a RadixSort node, would
> that constitute a sufficient POC to justify this infrastructure?

No, for exactly the reason you mention: such a change wouldn't have been accounted for in the planner's other choices, and thus this isn't anything more than a kluge. In these specific examples you'd have to ask whether it wouldn't make more sense to be modifying or hooking the executor's code for the existing plan node types, anyway. The main reason I can see for not attacking it like that would be if you wanted the planner to do something different --- which the above approach forecloses.

Let me be clear that I'm not against the concept of custom plan nodes. But it was obvious from the beginning that making the executor deal with them would be much easier than making the planner deal with them. I don't think we should commit a bunch of executor-side infrastructure in the absence of any (ahem) plan for doing something realistic on the planner side. Either that infrastructure will go unused, or we'll be facing a continual stream of demands for doubtless-half-baked planner changes so that people can do something with it.

I'd be willing to put in the infrastructure as soon as it's clear that we have a way forward, but not if it's never going to be more than a kluge.

			regards, tom lane
* Robert Haas (robertmhaas@gmail.com) wrote:
> But, I guess I'm not yet convinced that one-for-one substitution of
> nodes is impossible even with something about this simple.

Couldn't that be done with hooks in those specific plan nodes, or similar..? Of course, as Tom points out, that wouldn't address how the costing is done, and it could end up being wrong if the implementation of the node is completely different.

All that said, I've already been wishing for a way to change how Append works to allow for parallel execution through FDWs; eg: you have a bunch of foreign tables (say, 32) pointing to independent PG clusters on independent pieces of hardware which can all execute a given request in parallel. With a UNION ALL view created over top of those tables, it'd be great if we fired off all the queries at once and then went through collecting the responses, instead of going through them serially.

The same approach could actually apply to Appends which go across tablespaces, if you consider that independent tablespaces mean independent and parallelizable I/O access. Of course, all of this would need to deal sanely with ORDER BY and LIMIT cases.

Thanks,

Stephen
On Mon, Sep 9, 2013 at 1:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Let me be clear that I'm not against the concept of custom plan nodes.
> But it was obvious from the beginning that making the executor deal with
> them would be much easier than making the planner deal with them. I don't
> think we should commit a bunch of executor-side infrastructure in the
> absence of any (ahem) plan for doing something realistic on the planner
> side. Either that infrastructure will go unused, or we'll be facing a
> continual stream of demands for doubtless-half-baked planner changes
> so that people can do something with it.
>
> I'd be willing to put in the infrastructure as soon as it's clear that we
> have a way forward, but not if it's never going to be more than a kluge.

Fair enough, I think. So the action item for KaiGai is to think of how the planner integration might work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Sep 10, 2013 at 11:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I'd be willing to put in the infrastructure as soon as it's clear that we
>> have a way forward, but not if it's never going to be more than a kluge.
>
> Fair enough, I think. So the action item for KaiGai is to think of
> how the planner integration might work.

It's hard to imagine how the planner could possibly be pluggable in a generally useful way. It sounds like putting an insurmountable barrier in place that blocks a feature that would be useful in the Executor.

If you only want to handle nodes which directly replace one of the existing nodes providing the same semantics, then that might be possible. But I think you also want to be able to interpose new nodes in the tree, for example for the query cache idea.

Frankly, I think the planner is hard enough to change when you have full access to the source; I can't imagine trying to manipulate it from a distance like this.

-- 
greg
Greg Stark <stark@mit.edu> writes:
> It's hard to imagine how the planner could possibly be pluggable in a
> generally useful way. It sounds like putting an insurmountable barrier
> in place that blocks a feature that would be useful in the Executor.

But it's *not* useful without a credible way to modify the planner.

> Frankly I think the planner is hard enough to change when you have
> full access to the source I can't imagine trying to manipulate it from
> a distance like this.

Yeah. To tell the truth, I'm not really convinced that purely-plugin addition of new plan node types is going to be worth anything. I think pretty much any interesting project (for example, the parallelization of Append nodes that was mumbled about above) is going to require changes that won't fit within such an infrastructure. Still, I'm willing to accept a patch for plugin plan nodes, *if* it addresses the hard part of that problem and not only the easy part.

			regards, tom lane
2013/9/10 Robert Haas <robertmhaas@gmail.com>:
> Fair enough, I think. So the action item for KaiGai is to think of
> how the planner integration might work.
>
Do you think the idea I mentioned upthread is worth investigating in more detail? Or does it seem to you a short-sighted way to fit this infrastructure to the planner?

It categorizes plan nodes into three groups: join, scan, and other stuff. Cost-based estimation mostly applies to joins and scans, so abstracted scan and join nodes may make sense, to inform the core planner what a custom plan node is trying to do. On the other hand, the other stuff, like Agg, must be added according to the provided query, even if its estimated cost is not small, for execution to behave as the query describes. So I thought the important part is the integration of join and scan, while manipulation of the plan tree is sufficient for the other stuff.

How about your opinion?

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
On Tue, Sep 10, 2013 at 11:45 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> So, I thought important one is integration of join and scan, but manipulation
> of plan tree for other stuff is sufficient.
>
> How about your opinion?

Well, I don't know that I'm smart enough to predict every sort of thing that someone might want to do here, unfortunately. This is a difficult area: there are many possible things someone might want to do, and as Tom points out, there's a lot of special handling of particular node types that can make things difficult. And I can't claim to be an expert in this area.

That having been said, I think the idea of a CustomScan node is probably worth investigating. I don't know if that would work out well or poorly, but I think it would be worth some experimentation. Perhaps you could have a hook that gets called for each baserel, and can decide whether or not it wishes to inject any additional paths; and then a CustomScan node that could be used to introduce such paths. I've been thinking that we really ought to have the ability to optimize CTID range queries, like SELECT * FROM foo WHERE ctid > 'some constant'. We have a Tid Scan node, but it only handles equalities, not inequalities.
I suppose this functionality should really be in core, but since it's not, it might make an interesting test for the infrastructure you're proposing. You may be able to think of something else you like better; it's just a thought. I am a little less sanguine about the chances of a CustomJoin node working out well. I agree that we need something to handle join pushdown, but it seems to me that might be done by providing a Foreign Scan path into the joinrel rather than by adding a concept of foreign joins per se. There are other possible new join types, like the Range Join that Jeff Davis has mentioned in the past, which might be interesting. But overall, I can't see us adding very many more join types, so I'm not totally sure how much extensibility would help us here. We've added a few scan types over the years (index-only scans in 9.2, and bitmap index scans in 8.1, I think) but all of our existing join types go back to time immemorial. And I think that lumping everything else together under "not a scan or join" has the least promise of all. Is replacing Append really the same as replacing Sort? I think we'll need to think harder here about what we're trying to accomplish and how to get there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
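The CTID inequality idea can be sketched outside the backend. The following standalone C fragment is purely illustrative — `TidSketch` and `tid_compare` are invented names, not PostgreSQL source — and shows the lexicographic (block, offset) comparison a `ctid > 'constant'` range scan would need, beyond the equality check the existing Tid Scan node performs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a tuple identifier: a (block number,
 * line offset) pair, ordered by block first and then by offset. */
typedef struct
{
    uint32_t block;     /* heap block number */
    uint16_t offset;    /* line pointer offset within the block */
} TidSketch;

/* Lexicographic comparison; a "ctid > constant" scan could begin at
 * the first tuple for which this returns a positive value. */
static int
tid_compare(TidSketch a, TidSketch b)
{
    if (a.block != b.block)
        return (a.block < b.block) ? -1 : 1;
    if (a.offset != b.offset)
        return (a.offset < b.offset) ? -1 : 1;
    return 0;
}
```

A real implementation would presumably work with the backend's own tuple-identifier type and seek the heap scan directly to the first qualifying block, rather than comparing every tuple, but the ordering itself is this simple.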
2013/9/13 Robert Haas <robertmhaas@gmail.com>: > On Tue, Sep 10, 2013 at 11:45 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: >>> Fair enough, I think. So the action item for KaiGai is to think of >>> how the planner integration might work. >>> >> Do you think the idea I mentioned upthread is worth investigating in more >> detail? Or does it seem short-sighted to try to fit this infrastructure >> into the planner? >> >> It categorizes plan nodes into three groups: joins, scans, and other stuff. >> Cost-based estimation mostly applies to joins and scans, so abstract scan >> and join nodes may make sense as a way to tell the core planner what a >> custom plan node is trying to do. >> On the other hand, other stuff, like Agg, must be added according to the >> provided query, even if its estimated cost is not small, so that execution >> matches what the query describes. >> So I think the important part is planner integration for joins and scans; >> for the other stuff, manipulation of the plan tree is sufficient. >> >> What is your opinion? > > Well, I don't know that I'm smart enough to predict every sort of > thing that someone might want to do here, unfortunately. This is a > difficult area: there are many possible things someone might want to > do, and as Tom points out, there's a lot of special handling of > particular node types that can make things difficult. And I can't > claim to be an expert in this area. > Sorry for my late response. I've been investigating the planner code to find a way to integrate this custom API; it is still in progress. One piece of special handling I found is that create_join_plan() adjusts root->curOuterRels prior to recursing into the inner tree in the NestLoop case. Probably we need some flags to control this kind of special handling in the core. Listing up all of these cases is a hard job, so it seems to me we will need to check for them as the code is built...
> That having been said, I think the idea of a CustomScan node is > probably worth investigating. I don't know if that would work out > well or poorly, but I think it would be worth some experimentation. > Perhaps you could have a hook that gets called for each baserel, and > can decide whether or not it wishes to inject any additional paths; > and then a CustomScan node that could be used to introduce such paths. > I've been thinking that we really ought to have the ability to > optimize CTID range queries, like SELECT * FROM foo WHERE ctid > 'some > constant'. We have a Tid Scan node, but it only handles equalities, > not inequalities. I suppose this functionality should really be in > core, but since it's not, it might make an interesting test for the > infrastructure you're proposing. You may be able to think of > something else you like better; it's just a thought. > The framework above is exactly what I had in mind. Probably we have to put a hook in the functions invoked by set_base_rel_pathlist() to add alternative ways to scan the given baserel; then set_cheapest() will choose the most reasonable one. The attached patch, which is just a work in progress, shows where I am trying to put the hook; please grep it for "add_custom_scan_paths". Regarding a "guest module" for this framework, another idea I have is a built-in query cache module that returns a previously cached scan result if the table contents have not been updated since the previous run. It probably makes sense in cases where most of the rows are filtered out by the scan. Anyway, I'd like to come up with something useful to demonstrate this API. > I am a little less sanguine about the chances of a CustomJoin node > working out well. I agree that we need something to handle join > pushdown, but it seems to me that might be done by providing a Foreign > Scan path into the joinrel rather than by adding a concept of foreign > joins per se.
> Indeed, if we have a hook on add_paths_to_joinrel(), it also makes sense for foreign tables; probably the planner will then choose a foreign-scan path over an existing join node stacked on top of foreign scans. > There are other possible new join types, like the Range > Join that Jeff Davis has mentioned in the past, which might be > interesting. But overall, I can't see us adding very many more join > types, so I'm not totally sure how much extensibility would help us > here. We've added a few scan types over the years (index only scans > in 9.2, and bitmap index scans in 8.1, I think) but all of our > existing join types go back to time immemorial. > That seems to me a significant point. Even if the custom node an extension adds in place of a join looks like a scan to core PostgreSQL, the extension will still be able to run its own join logic, equivalent to a join, inside it. I think the built-in query cache idea above makes sense as a demonstration of such a pseudo-join node; it could hold a materialized, already-joined result. At least, it seems sufficient for my target: table joins accelerated with a GPU device. > And I think that lumping everything else together under "not a scan or > join" has the least promise of all. Is replacing Append really the > same as replacing Sort? I think we'll need to think harder here about > what we're trying to accomplish and how to get there. > As long as the extension modifies the PlannedStmt in the planner_hook, I don't think it is so difficult, as I demonstrated in the previous patch. Unlike scans and joins, the existing code is not designed to compare multiple possible paths for these nodes, so it seems to me that a facility to adjust an already-constructed plan tree is sufficient for most usage, because the extension can decide whether it can offer a cheaper path than the built-in ones. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
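The per-baserel hook being discussed would presumably follow PostgreSQL's usual hook-chaining convention (as planner_hook already does). Here is a standalone mock in plain C — not the actual patch; `add_custom_scan_paths_hook` and the mock types are invented from the names in this thread — showing the core calling the hook after adding its built-in paths, and an extension saving and chaining any previously installed hook:

```c
#include <stddef.h>

/* Mock stand-in for a base relation being planned. */
typedef struct
{
    int n_paths;    /* number of candidate scan paths collected so far */
} MockRelOptInfo;

/* Hook type: an extension may add more candidate scan paths. */
typedef void (*add_scan_paths_hook_type)(MockRelOptInfo *rel);

/* Core-side hook variable (mirrors how planner_hook is declared). */
static add_scan_paths_hook_type add_custom_scan_paths_hook = NULL;

/* Core side: after adding the built-in paths for a baserel, give
 * extensions a chance to add theirs; a set_cheapest() equivalent
 * would then pick the cheapest of all candidates. */
void
mock_set_base_rel_pathlist(MockRelOptInfo *rel)
{
    rel->n_paths = 1;                   /* built-in sequential-scan path */
    if (add_custom_scan_paths_hook)
        add_custom_scan_paths_hook(rel);
}

/* Extension side: remember and chain to any previous hook. */
static add_scan_paths_hook_type prev_add_scan_paths_hook = NULL;

static void
my_add_scan_paths(MockRelOptInfo *rel)
{
    if (prev_add_scan_paths_hook)
        prev_add_scan_paths_hook(rel);
    rel->n_paths++;                     /* our custom-scan path candidate */
}

/* Extension's _PG_init() equivalent: install the hook. */
void
mock_extension_init(void)
{
    prev_add_scan_paths_hook = add_custom_scan_paths_hook;
    add_custom_scan_paths_hook = my_add_scan_paths;
}
```

Saving and re-invoking the previous hook is what lets several extensions coexist on the same hook point; the hook only adds candidates, and cost comparison stays entirely in the core.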
Attachments
On Thu, Oct 3, 2013 at 3:05 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: > Sorry for my late response. I've been investigating the planner code > to find a way to integrate this custom API; it is still in progress. > One piece of special handling I found is that create_join_plan() adjusts > root->curOuterRels prior to recursing into the inner tree in the > NestLoop case. Probably we need some flags to control this kind of > special handling in the core. > Listing up all of these cases is a hard job, so it seems to me we will > need to check for them as the code is built... I'm pretty sure that is only a very small tip of a very large iceberg. > The framework above is exactly what I had in mind. > Probably we have to put a hook in the functions invoked by > set_base_rel_pathlist() to add alternative ways to scan > the given baserel; then set_cheapest() will choose the > most reasonable one. > The attached patch, which is just a work in progress, shows > where I am trying to put the hook; please grep it for > "add_custom_scan_paths". That seems fairly reasonable to me, but I think we'd want a working demonstration. > Regarding a "guest module" for this framework, another > idea I have is a built-in query cache module that returns > a previously cached scan result if the table contents have > not been updated since the previous run. It probably makes > sense in cases where most of the rows are filtered out by > the scan. > Anyway, I'd like to come up with something useful to demonstrate > this API. I doubt that has any chance of working well. Supposing that query caching is a feature we want, I don't think that plan trees are the right place to try to install it. That sounds like something that ought to be done before we plan and execute the query in the first place. >> I am a little less sanguine about the chances of a CustomJoin node >> working out well.
I agree that we need something to handle join >> pushdown, but it seems to me that might be done by providing a Foreign >> Scan path into the joinrel rather than by adding a concept of foreign >> joins per se. >> > Indeed, if we have a hook on add_paths_to_joinrel(), it also makes > sense for foreign tables; probably the planner will then choose a > foreign-scan path over an existing join node stacked on top of > foreign scans. Yes, I think it's reasonable to think about injecting a scan path into a join node. >> And I think that lumping everything else together under "not a scan or >> join" has the least promise of all. Is replacing Append really the >> same as replacing Sort? I think we'll need to think harder here about >> what we're trying to accomplish and how to get there. >> > As long as the extension modifies the PlannedStmt in the planner_hook, > I don't think it is so difficult, as I demonstrated in the previous patch. > Unlike scans and joins, the existing code is not designed to compare > multiple possible paths for these nodes, so it seems to me that a > facility to adjust an already-constructed plan tree is sufficient for > most usage, because the extension can decide whether it can offer a > cheaper path than the built-in ones. Well, there were a lot of problems with your demonstration, which have already been pointed out upthread. I'm skeptical about the idea of simply replacing planner nodes wholesale, and Tom is outright opposed. I think you'll do better to focus on a narrower case - I'd suggest custom scan nodes - and leave the rest as a project for another time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2013/10/3 Robert Haas <robertmhaas@gmail.com>: >>> I am a little less sanguine about the chances of a CustomJoin node >>> working out well. I agree that we need something to handle join >>> pushdown, but it seems to me that might be done by providing a Foreign >>> Scan path into the joinrel rather than by adding a concept of foreign >>> joins per se. >>> >> Indeed, if we have a hook on add_paths_to_joinrel(), it also makes >> sense for foreign tables; probably the planner will then choose a >> foreign-scan path over an existing join node stacked on top of >> foreign scans. > > Yes, I think it's reasonable to think about injecting a scan path into > a join node. > >>> And I think that lumping everything else together under "not a scan or >>> join" has the least promise of all. Is replacing Append really the >>> same as replacing Sort? I think we'll need to think harder here about >>> what we're trying to accomplish and how to get there. >>> >> As long as the extension modifies the PlannedStmt in the planner_hook, >> I don't think it is so difficult, as I demonstrated in the previous patch. >> Unlike scans and joins, the existing code is not designed to compare >> multiple possible paths for these nodes, so it seems to me that a >> facility to adjust an already-constructed plan tree is sufficient for >> most usage, because the extension can decide whether it can offer a >> cheaper path than the built-in ones. > > Well, there were a lot of problems with your demonstration, which have > already been pointed out upthread. I'm skeptical about the idea of > simply replacing planner nodes wholesale, and Tom is outright opposed. > I think you'll do better to focus on a narrower case - I'd suggest > custom scan nodes - and leave the rest as a project for another time. > Thanks, that makes it clear what we should target for v9.4 development. Towards the next commitfest, I'm planning to develop the following features: * A CustomScan node that can run custom code instead of the built-in scan nodes.
* Join push-down in postgres_fdw, using a hook placed in add_paths_to_joinrel(), for demonstration purposes. * Some new way to scan a relation; your suggested ctid scan with less-than or greater-than qualifiers is probably a good example, also for demonstration purposes. The above set of jobs will probably be the first chunk of this feature; then let's do the other stuff, like Append, Sort, Aggregate and so on, later. That seems a reasonable strategy to me. Even though it is off-topic in this thread.... >> Regarding a "guest module" for this framework, another >> idea I have is a built-in query cache module that returns >> a previously cached scan result if the table contents have >> not been updated since the previous run. It probably makes >> sense in cases where most of the rows are filtered out by >> the scan. >> Anyway, I'd like to come up with something useful to demonstrate >> this API. > > I doubt that has any chance of working well. Supposing that query > caching is a feature we want, I don't think that plan trees are the > right place to try to install it. That sounds like something that > ought to be done before we plan and execute the query in the first > place. > The query cache I mention above is intended to cache a part of the query tree. For example, suppose an ad-hoc query joins A, B, and C, and the data analyst keeps modifying the qualifiers attached to C while joining A and B the same way each time. I thought it would make sense to cache the result of that static join; the idea is similar to a materialized view. If the executor can pick up a cached result for A x B, all it needs to do is join C with the cached result. Of course, I know there may be difficulties, like cache invalidation. :-( Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
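The invalidation concern at the end can be made concrete with a toy model. Everything below is invented for illustration — a per-table modification counter stands in for whatever real machinery would detect updates to A or B — and shows only the validity logic a cached A x B result would need, not any actual PostgreSQL infrastructure:

```c
#include <stdbool.h>

/* Toy table: mod_count is bumped on every update to the table. */
typedef struct
{
    int mod_count;
} MockTable;

/* Cached join result of A x B, remembered together with the
 * modification counters it was built against. */
typedef struct
{
    bool valid;
    int  a_mod_count;
    int  b_mod_count;
    int  n_rows;        /* stand-in for the materialized rows */
} MockJoinCache;

/* Cache hit only if neither input table has changed since the
 * cached result was built. */
bool
cache_lookup(const MockJoinCache *c, const MockTable *a, const MockTable *b)
{
    return c->valid
        && c->a_mod_count == a->mod_count
        && c->b_mod_count == b->mod_count;
}

/* (Re)build the cache entry against the current table states. */
void
cache_store(MockJoinCache *c, const MockTable *a, const MockTable *b,
            int n_rows)
{
    c->valid = true;
    c->a_mod_count = a->mod_count;
    c->b_mod_count = b->mod_count;
    c->n_rows = n_rows;
}
```

On a hit, the executor would join C against the cached rows instead of recomputing A x B; any update to either input forces a rebuild, which is the easy half of the problem KaiGai acknowledges.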