WIP: Upper planner pathification
From | Tom Lane |
---|---|
Subject | WIP: Upper planner pathification |
Date | |
Msg-id | 3795.1456689808@sss.pgh.pa.us |
Responses | Re: WIP: Upper planner pathification (Andres Freund <andres@anarazel.de>) |
 | Re: WIP: Upper planner pathification (Simon Riggs <simon@2ndQuadrant.com>) |
 | Re: WIP: Upper planner pathification (Robert Haas <robertmhaas@gmail.com>) |
 | Re: WIP: Upper planner pathification (Teodor Sigaev <teodor@sigaev.ru>) |
 | Re: WIP: Upper planner pathification (Robert Haas <robertmhaas@gmail.com>) |
 | Re: WIP: Upper planner pathification (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |

Those with long memories will recall that I've been waving my arms about $SUBJECT for more than five years. I started to work seriously on a patch last summer, and here is a version that I feel comfortable exposing to public scrutiny (which is not to call it "done"; more below).

The basic point of this patch is to apply the generate-and-compare-Paths paradigm to the planning steps after query_planner(), which only covers scan and join processing (the FROM and WHERE parts of a query). These later steps deal with grouping/aggregation, window functions, SELECT DISTINCT, ORDER BY, LockRows (SELECT FOR UPDATE), LIMIT/OFFSET, and ModifyTable. Also UNION/INTERSECT/EXCEPT.

Back in the bad old days we had only one way to do any of that stuff, so there was no real problem with the approach of converting query_planner's answer into a Plan and then stacking more Plan nodes atop that. Over time we grew other ways to do those steps, and chose between those ways with ad-hoc code in grouping_planner(). That was messy enough in itself, but it had other disadvantages too: subquery_planner() had to choose and return a single Plan, without regard to what the outer query might need. (Well, we did pass down a tuple_fraction parameter, but that is a pretty limited bit of information.)

An even larger problem is that we had no way to handle addition of new alternative plan types for these upper-planning steps without fundamental hacking on grouping_planner(). An example is the code I added in commit addc42c339208d6a and later (planagg.c and other places) for optimization of MIN/MAX aggregates: that code had a positively incestuous relationship with grouping_planner(), and was darn ugly in multiple other ways besides.

Of late, the main way this issue has surfaced is that we have no practical way to plan pushdown of aggregates or updates on remote tables to the responsible FDWs, because the FDWs cannot create Paths representing such operations.
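
(Illustration, not part of the patch.) For readers less familiar with the planner, the generate-and-compare idea mentioned above can be shown in miniature. The sketch below is a toy model under loud assumptions: `ToyPath`, `ToyRel`, `toy_add_path()` and `toy_cheapest_path()` are hypothetical names invented for this example, not PostgreSQL's real structs or functions, and the comparison looks only at a single total cost, whereas the real planner also weighs things such as sort order and startup cost.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT PostgreSQL's Path/RelOptInfo structs. */
typedef struct ToyPath
{
    const char *label;          /* human-readable strategy name */
    double      total_cost;     /* estimated cost of running this strategy */
} ToyPath;

#define TOY_MAX_PATHS 8

typedef struct ToyRel
{
    ToyPath     paths[TOY_MAX_PATHS];   /* candidate strategies for one step */
    int         npaths;                 /* number of candidates registered */
} ToyRel;

/*
 * "Generate" step: any piece of code may register a candidate strategy
 * for this planning step without the caller knowing about it in advance.
 */
void
toy_add_path(ToyRel *rel, const char *label, double total_cost)
{
    assert(rel->npaths < TOY_MAX_PATHS);
    rel->paths[rel->npaths].label = label;
    rel->paths[rel->npaths].total_cost = total_cost;
    rel->npaths++;
}

/*
 * "Compare" step: pick the cheapest registered candidate.
 * Returns NULL if no candidate has been registered.
 */
const ToyPath *
toy_cheapest_path(const ToyRel *rel)
{
    const ToyPath *best = NULL;

    for (int i = 0; i < rel->npaths; i++)
    {
        if (best == NULL || rel->paths[i].total_cost < best->total_cost)
            best = &rel->paths[i];
    }
    return best;
}
```

A caller would register, say, a sort-based and a hash-based aggregation candidate with made-up costs and then ask for the cheapest one. The point of the design is that a third candidate (an index-based MIN/MAX strategy, or one supplied by an FDW) can be added with one more `toy_add_path()` call, leaving the selection code untouched.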

The present patch addresses this problem by inventing Path nodes to represent every post-scan/join step, and changing the API of grouping_planner() and subquery_planner() so that they return sets of Paths rather than single Plans. Creation of a Plan tree happens only after control returns to the top level of standard_planner(). The Path nodes for these post-scan/join steps are attached to "upper relation" RelOptInfos that didn't exist before. There are provisions for FDWs to inject candidate Paths for these upper-level steps. As proof of concept for that, planagg.c has been revised to work by injecting a new Path into the grouping/aggregation upper rel, rather than predetermining what the answer will be. This vastly decreases its coupling with both grouping_planner and some other parts of the system such as equivclass.c (though, the Law of Conservation of Cruft being what it is, I did have to push some knowledge about planagg.c's work into setrefs.c).

I'm pretty pleased with the way this turned out. grouping_planner() is about half the length it was before, and much more straightforward IMO. planagg.c no longer seems like a complete hack; it's a reasonable prototype for injecting nontraditional implementation paths into aggregation or other late planner stages, and grouping_planner() doesn't need to know about it.

The patch does add a lot of net new lines (and it's not done) but most of the new code is very straightforward boilerplate.

The main thing that makes this WIP and not committable is that I've not yet bothered to implement outfuncs.c code and some other debug support for all the new path struct types. A lot of the new function header comments remain to be fleshed out too, and some more documentation needs to be written. But I think it's reviewable as-is; the other stuff would just make it even longer but not more interesting.

There's a lot of future work to be done within this skeleton. Notably, I did not fix the UNION/INTERSECT/EXCEPT planning code to consider multiple paths; it still only generates a single Path tree. That code needs to be rewritten from scratch, probably, and it seems like doing so is a separate project. I'd also like to do some more refactoring in createplan.c: some code paths are still doing redundant cost estimation, and I'm growing increasingly dissatisfied with the "use_physical_tlist" hack. But that seems like a separable issue as well.

So, where to go from here? I'm acutely aware that we're hard up against the final 9.6 commitfest, and that we discourage major patches arriving so late in a devel cycle. But I simply couldn't get this done any faster. I don't really want to hold it over for the 9.7 devel cycle. It's been enough trouble maintaining this patch in the face of conflicting commits over the last year or so (it's probably still got bugs related to parallel query...), and there definitely are conflicting patches in the upcoming 'fest. And the lack of this infrastructure is blocking progress on FDWs and some other things.

So I'd really like to get this into 9.6. I'm happy to put it into the March commitfest if someone will volunteer to review it.

Comments?

regards, tom lane