PoC: adding CustomJoin, separate from CustomScan
From | Tomas Vondra |
---|---|
Subject | PoC: adding CustomJoin, separate from CustomScan |
Date | |
Msg-id | b8fa8ed4-4444-404f-82f8-8b86c6e82d4d@vondra.me |
Replies | Re: PoC: adding CustomJoin, separate from CustomScan |
List | pgsql-hackers |
Hi,

I've been experimenting with executor nodes inspired by papers on executor robustness (think "Algorithms that don't explode if an estimate is off."). And I decided to use the CustomScan API, because it seemed like an ideal solution for my experiments - convenient, isolated, easy to try on other releases, ...

I'm going to discuss a couple issues with using CustomScan for joins, and propose some improvements to the CustomScan API to address those. I'd welcome feedback on the proposed changes / ideas of alternative approaches, etc. Of course, I may be wrong / missing something about the CustomScan design, feel free to point that out.

For the "scan" algorithm (SmoothScan [1]), this mostly worked fine. I ended up copying some (a lot) of the code matching clauses to an index from indexscan planning, and that's a bit ugly. But the CustomScan API does not promise to address that, so it did not surprise me, and I accept that and I can deal with that (at least for now). Other than that, the CustomScan worked fine for my experimental "scan" method.

But when implementing a custom join (generalized join [2]), it was a quite different story. The CustomScan claims to support joins, you just need to use the set_join_pathlist hook, and set a couple fields in the plan/executor nodes differently. Like, leave scanrelid=0 etc. And that kinda works for the planning phase, but at execution it turned out to be much trickier.

The main hurdle I ran into is how do you construct the result tuple? In regular joins, you can do that by setting ecxt_innertuple/ecxt_outertuple, and calling ExecProject(). Or something along those lines.

But for CustomScan joins, that's not possible - the targetlist is modified so that all the Vars have INDEX_VAR, as pointed out by a comment in primnodes.h:

> In ForeignScan and CustomScan plan nodes, INDEX_VAR is abused to
> signify references to columns of a custom scan tuple type.

Which makes sense, because while the CustomScan can have nested plans, it does not have a concept of an explicit inner/outer plan.

It seemed to me I'd have to essentially build the tuples "on my own", which seems quite tricky and inconvenient. And also a bit against the idea of CustomScan shielding extensions from this kind of "core" stuff.

I may be entirely wrong, of course. Perhaps I'm missing something, and there's a simple way to do this? I tried to look at existing extensions implementing joins through CustomScan, but there are not that many, and I haven't found any good solution.

I also tried reading through the ~2014 threads related to CustomScan, and how it got modified to allow joins. But I don't see this discussed there either. It seems to me the CustomScan received the minimum amount of "tweaks" to allow joins, but it's not very practical.

I realize ForeignScan supports joins in a very similar way (i.e. you leave scanrelid=0, etc.). But I think there's a difference - the foreign join code is not really supposed to build the tuples, it gets the "formed" tuples from somewhere else, more or less. For example postgres_fdw deparses a query, sends it somewhere, that other node does the actual join, builds the tuple from inner/outer, and sends it back. The postgres_fdw code does not need to worry about mapping the target list to inner/outer etc. This does not work for CustomScan I think (unless it's doing the same sort of query offloading).

I did ask for suggestions on Discord what's the right way to do this with CustomScan and joins, and the response was that this may not have been thought through very carefully, and some improvements may be necessary to make it more convenient.

So I decided to give this a try. The way I see it, most of the issues stem from grafting joins onto an interface that's designed for scans. The whole sequence of custom nodes:

    CustomPath -> CustomScan -> CustomScanState

is based on scans.
CustomPath "inherits" from Path, CustomScan from "Scan", CustomScanState from "ScanState". It's not clear to me how to make this work with "JoinPath", "JoinPlan" and "JoinState" in a reasonable way. I suppose some of the "join" data can be stashed in the private fields of the structs, but then the various planner/executor parts need to know about that in some way. How else would setrefs do the right thing with translating the targetlist into inner/outer references? (Maybe it could be done in PlanCustomPath, but it seems too early?)

The "proper" way seems to be to have separate nodes for joins:

    CustomJoinPath -> CustomJoin -> CustomJoinState

The attached PoC patch does that (except for the CustomJoinPath, it can do with CustomPath for now). It's more or less a copy-paste, adjusting all the places with "case CustomScan" to also deal with "CustomJoin" - either in the same way, or sometimes in a way that works for joins. It's fairly mechanical.

With this patch, my custom join can simply do

    econtext->ecxt_outertuple = outer;
    econtext->ecxt_innertuple = inner;

    return ExecProject(node->js.ps.ps_ProjInfo);

and it works.

One thing that surprised me a bit is that there's no testing extension implementing a simple custom scan/join. So it's hard to show this :-(

I'm sure there are places that need more work (some of which are marked with FIXME, and I probably missed some). But it surprised me how small the patch is - most of it is the mechanical adjustments of switches. It would get a bit larger, e.g. due to sgml docs (which the PoC patch does not update).

The patch also renames a couple structs to have "Scan" in them, e.g. CustomExecMethods are now CustomScanExecMethods. This is necessary because the methods get the "plan state" of the particular type (i.e. CustomScanState vs. CustomJoinState), etc. I guess we could do with some "shared" state, but it seems like a recipe for confusion, and I'm not sure it'd remove that much "code" anyway.

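To make the inner/outer point a bit more concrete, here's a toy, standalone C model of the idea. It is not PostgreSQL code and does not use any PostgreSQL API: every Toy* name (ToyVar, ToyExprContext, ToyPlanState, ToyJoinState, ToyCustomJoinState, toy_project, toy_custom_join_emit) is a hypothetical stand-in I made up for illustration. It only shows two things: a join state that embeds a generic plan state as its first member (the shape behind node->js.ps), and a projection step that can resolve each target entry because it knows whether the entry refers to the inner or the outer tuple.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, NOT PostgreSQL definitions. */

/* Where a target-list entry reads its value from. */
typedef enum { TOY_INNER, TOY_OUTER, TOY_INDEX } ToyVarSource;

typedef struct {
    ToyVarSource source;
    int          attno;      /* 1-based column number within the source tuple */
} ToyVar;

typedef struct {
    const int *values;
    int        natts;
} ToyTuple;

/* Rough analogue of an expression context with three tuple "slots". */
typedef struct {
    const ToyTuple *inner;   /* analogue of ecxt_innertuple */
    const ToyTuple *outer;   /* analogue of ecxt_outertuple */
    const ToyTuple *scan;    /* analogue of the scan tuple */
} ToyExprContext;

/* Generic per-node state: target list plus expression context. */
typedef struct {
    const ToyVar  *targetlist;
    int            ntargets;
    ToyExprContext econtext;
} ToyPlanState;

/* "Inheritance" by embedding the parent struct as the first member. */
typedef struct { ToyPlanState ps; } ToyJoinState;
typedef struct { ToyJoinState js; void *private_data; } ToyCustomJoinState;

/*
 * Evaluate the target list into out[]. Returns 0 on success, or -1 if an
 * entry references a tuple that is not set or a column that does not exist.
 */
static int
toy_project(const ToyPlanState *ps, int *out)
{
    assert(ps != NULL && out != NULL);

    for (int i = 0; i < ps->ntargets; i++)
    {
        const ToyVar   *v = &ps->targetlist[i];
        const ToyTuple *t = NULL;

        switch (v->source)
        {
            case TOY_INNER: t = ps->econtext.inner; break;
            case TOY_OUTER: t = ps->econtext.outer; break;
            case TOY_INDEX: t = ps->econtext.scan;  break;
        }

        if (t == NULL || v->attno < 1 || v->attno > t->natts)
            return -1;

        out[i] = t->values[v->attno - 1];
    }
    return 0;
}

/* The join only sets the two input tuples and lets projection do the rest. */
static int
toy_custom_join_emit(ToyCustomJoinState *node,
                     const ToyTuple *outer, const ToyTuple *inner, int *out)
{
    node->js.ps.econtext.outer = outer;
    node->js.ps.econtext.inner = inner;
    return toy_project(&node->js.ps, out);
}
```

In this toy, a target list of (outer column 2, inner column 3) over an outer tuple (10, 20) and an inner tuple (30, 40, 50) projects to (20, 50). If every entry were instead tagged TOY_INDEX, the same call would fail unless someone had already built a combined tuple and stored it in the scan slot, which is roughly the situation described above for joins done through CustomScan.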
So that's what I have for now.

Note: I mentioned some extensions implementing SmoothScan/G-join. I plan to publish those once I polish them a bit more. It's more research than something ready to use right now.

regards

[1] https://scholar.harvard.edu/files/stratos/files/smooth_vldbj.pdf

[2] https://dl.gi.de/server/api/core/bitstreams/ce8e3fab-0bac-45fc-a6d4-66edaa52d574/content

--
Tomas Vondra