Discussion: Custom Scans and private data
(sending again, forgot to cc hackers, sorry for the duplicate)

Hi,

I'm trying to use the custom scan API to replace code that currently does
everything via hooks and isn't safe against copyObject() (yes, that's not
a grand plan).

To me, dealing with CustomScan->custom_private seems to be extraordinarily
painful when dealing with nontrivial data structures. And such structures
aren't all that unlikely when dealing with a custom scan over something
more complex.

Now, custom scans likely modeled private data after FDWs. But it's already
rather painful there (in fact, it's the one thing I've heard people
complain about repeatedly, besides the inability to push down operations).
Just look at the ugly hangups postgres_fdw goes through to pass data
around - storing it in lists with indexes determining the content and
such. That somewhat works if only integers and such need to be stored, but
if you have more complex data it really is a PITA.

The best alternatives I found are a) to serialize data into a string,
base64 or so, or b) to use a Const node over a bytea datum. It seems
rather absurd to have to go through deserialization at every execution.

Since we already have CustomScan->methods, it seems rather reasonable to
have a CopyCustomScan callback and let it do the copying of the private
data if present? Or possibly of the whole node, which would allow
embedding it into a bigger node?

I looked at pg-strom and it seems to go through rather ugly procedures to
deal with the problem:
https://github.com/pg-strom/devel/blob/master/src/gpujoin.c
form_gpujoin_info/deform_gpujoin_info.

What's the advantage of the current model?

Greetings,

Andres Freund
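To make the pain concrete, here is a minimal, self-contained sketch - plain C, no PostgreSQL headers, and all names (ScanPrivate, pack_private, ...) invented for illustration - of option (b): flattening a private struct into a length-prefixed blob at plan time and unpacking it again at every execution, which is roughly what stuffing the data into a bytea Const amounts to.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat per-scan private state (no pointers inside). */
typedef struct ScanPrivate
{
	int32_t		chunk_size;
	int32_t		device_id;
	double		est_rows;
} ScanPrivate;

/*
 * Flatten the struct into a malloc'd byte blob - roughly what would end
 * up inside a bytea Const. Done once at plan time.
 */
static unsigned char *
pack_private(const ScanPrivate *p, size_t *len)
{
	unsigned char *buf;

	*len = sizeof(ScanPrivate);
	buf = malloc(*len);
	memcpy(buf, p, *len);
	return buf;
}

/* The deserialization that has to happen again at every execution. */
static ScanPrivate
unpack_private(const unsigned char *buf, size_t len)
{
	ScanPrivate p;

	assert(len == sizeof(ScanPrivate));
	memcpy(&p, buf, len);
	return p;
}
```

Note that this only works at all because ScanPrivate is flat; the complaint in the thread is precisely that anything with pointers or variable-length members forces far uglier hand-written serialization.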
Andres Freund <andres@anarazel.de> writes:
> Since we already have CustomScan->methods, it seems to be rather
> reasonable to have a CopyCustomScan callback and let it do the copying
> of the private data if present? Or possibly of the whole node, which'd
> allow to embed it into a bigger node?

Weren't there rumblings of requiring plan trees to be deserializable/
reserializable, not just copiable, so that they could be passed to
background workers? Not that I'm particularly in favor of that, but if
you're going to go in the direction of allowing private data formats to
be copied, then I think you're likely to have to address the other thing.

In any case, since this convention already exists for FDWs, I'm not sure
why we should make it different for CustomScan.

regards, tom lane
On 2015-08-25 14:42:32 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Since we already have CustomScan->methods, it seems to be rather
> > reasonable to have a CopyCustomScan callback and let it do the copying
> > of the private data if present? Or possibly of the whole node, which'd
> > allow to embed it into a bigger node?
>
> Weren't there rumblings of requiring plan trees to be deserializable/
> reserializable, not just copiable, so that they could be passed to
> background workers?

Yes, I do recall that as well. That's going to require a good bit of
independent work - currently there are callbacks as pointers in nodes;
that's obviously not going to fly well across processes. Additionally,
custom scan already has a TextOutCustomScan callback, even if it's
currently probably intended for debugging.

I rather doubt that adding out/readfuncs without the ability to do
something about what exactly is read in is going to work well. But I
admit I'm not too sure about it.

> In any case, since this convention already exists for FDWs I'm not
> sure why we should make it different for CustomScan.

I think it was a noticeable mistake in the FDW case, but we already
released with that. We shouldn't make the same mistake twice. Looking at
postgres_fdw and the pg-strom example linked upthread imo pretty clearly
shows how ugly this is. There's also the rather noticeable difference
that we already have callbacks in the node for custom scans (used by
outfuncs), making this rather trivial to add.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2015-08-25 14:42:32 -0400, Tom Lane wrote:
>> In any case, since this convention already exists for FDWs I'm not
>> sure why we should make it different for CustomScan.

> I think it was a noticeable mistake in the fdw case, but we already
> released with that. We shouldn't make the same mistake twice.

I don't agree that it was a mistake, and I do think there is value in
consistency. In the case at hand, it would not be too hard to provide
some utility functions for some common cases; for instance, if you want
to just store a struct, we could offer convenience functions to wrap
that in a bytea constant and unwrap it again. Those could be useful for
both FDWs and custom scans.

(The bigger picture here is that we always intended to offer a bunch of
support functions to make writing FDWs easier, once we'd figured out
what made sense. The fact that we haven't done that work yet doesn't
make it a bad approach. Nor does "shove it all into some callbacks" mean
that the callbacks will be easy to write.)

> Looking at
> postgres_fdw and the pg-strom example linked upthread imo pretty clearly
> shows how ugly this is. There's also the rather noticeable difference
> that we already have callbacks in the node for custom scans (used by
> outfuncs), making this rather trivial to add.

I will manfully refrain from taking that bait.

regards, tom lane
> On 2015-08-25 14:42:32 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > Since we already have CustomScan->methods, it seems to be rather
> > > reasonable to have a CopyCustomScan callback and let it do the copying
> > > of the private data if present? Or possibly of the whole node, which'd
> > > allow to embed it into a bigger node?
> >
> > Weren't there rumblings of requiring plan trees to be deserializable/
> > reserializable, not just copiable, so that they could be passed to
> > background workers?
>
> Yes, I do recall that as well. that's going to require a good bit of
> independent stuff - currently there's callbacks as pointers in nodes;
> that's obviously not going to fly well across processes. Additionally
> custom scan already has a TextOutCustomScan callback, even if it's
> currently probably intended for debugging.
>
> I rather doubt that adding out/readfuncs without the ability to do
> something about what exactly is read in is going to work well. But I
> admit I'm not too sure about it.
>
> > In any case, since this convention already exists for FDWs I'm not
> > sure why we should make it different for CustomScan.
>
> I think it was a noticeable mistake in the fdw case, but we already
> released with that. We shouldn't make the same mistake twice. Looking at
> postgres_fdw and the pg-strom example linked upthread imo pretty clearly
> shows how ugly this is. There's also the rather noticeable difference
> that we already have callbacks in the node for custom scans (used by
> outfuncs), making this rather trivial to add.

As Tom pointed out, the primary reason why CustomScan required the
provider to save its private data in custom_exprs/custom_private was
awareness of copyObject(). In addition, custom_exprs is expected to link
expression nodes to be processed in setrefs.c and subselect.c, because
the core implementation cannot know which type of data is managed in
private.

Is your concern about custom_private only?
Even if we have extra callbacks like CopyObjectCustomScan() and
TextReadCustomScan(), how do we handle the situation where the core
implementation needs to know the location of expression nodes? Is
custom_exprs retained as-is?

An earlier version of the CustomScan interface had individual callbacks
for setrefs.c and subselect.c; however, its structure depended on the
current implementation too much, so we moved to an implementation with
two individual private fields rather than callbacks in outfuncs.c.

On the other hand, I'm inclined to think TextOutCustomScan() might be a
misdesign for supporting serialize/deserialize via readfuncs.c:
http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F80111D04F@BPXM15GP.gisp.nec.co.jp
I think it should be deprecated rather than enhanced.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015-08-25 19:22:04 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think it was a noticeable mistake in the fdw case, but we already
> > released with that. We shouldn't make the same mistake twice.
>
> I don't agree that it was a mistake, and I do think there is value in
> consistency.

Having to write ad-hoc serialization code in a different way in each FDW
seems like a bad idea to me. The output of that serialization is rather
hard to read since there are no field names and such, and it's quite
inefficient. I have a hard time seeing the benefits.

> In the case at hand, it would not be too hard to provide some utility
> functions for some common cases; for instance, if you want to just store
> a struct, we could offer convenience functions to wrap that in a bytea
> constant and unwrap it again. Those could be useful for both FDWs and
> custom scans.

Yes, that'd already be an improvement. Right now it seems the best way
to achieve that is to either base64 the output into a Value node or use
a bytea inside a Const node. Alternatively you can make it a list of
ints, but that obviously isn't a grand idea.

But imo that's not really good enough as soon as you have more than a
single struct. For anything beyond that you'll have to write
serialization/deserialization code again. Additionally, you're in many
cases forced to always use the serialized representation, even if
there's no copyObject() in between, since you don't have a proper way to
store non-serialized information! Check out serializePlanData in
https://github.com/laurenz/oracle_fdw/blob/master/oracle_fdw.c - that's
something we shouldn't put people through.

Actually, *especially* if we at some point want to ship plans over the
wire, that kind of serialization is a bad idea - it's very hard to have
any sort of checks. out/readfuncs.c-style serialization will have field
names, but you can't do the equivalent from these interfaces.
> Nor does "shove it all into some callbacks" mean that the callbacks
> will be easy to write.)

Being able to perform copying/serialization only when necessary, storing
the data in a non-serialized manner, seems like a pretty clear
improvement. I think it's important to be able to easily store core data
in fields, but that's just a copyObject() call away.

> > Looking at
> > postgres_fdw and the pg-strom example linked upthread imo pretty clearly
> > shows how ugly this is. There's also the rather noticeable difference
> > that we already have callbacks in the node for custom scans (used by
> > outfuncs), making this rather trivial to add.
>
> I will manfully refrain from taking that bait.

While that comment made me laugh, I'm not really sure why my examples
are bait. One is probably the most used FDW, the other the only user of
the custom scan interface I know of.

Greetings,

Andres Freund
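For reference, the positional-list idiom criticized upthread (the form_*/deform_* pattern in postgres_fdw and pg-strom) can be sketched in a few lines of plain C with an array standing in for a PostgreSQL List; the names here are invented, not the actual fdw code. Field meaning is carried only by position, so the "form" and "deform" sides must be kept in lockstep by hand.

```c
#include <assert.h>

/*
 * Invented stand-in for the fdw_private list: the meaning of each slot
 * is defined solely by its index, which is the fragility complained
 * about in the thread.
 */
enum
{
	FDW_IDX_FETCH_SIZE,			/* how many rows to fetch at once */
	FDW_IDX_USE_REMOTE_SORT,	/* push ORDER BY to the remote side? */
	FDW_NUM_IDX
};

/* "form" side: writer must know the index layout... */
static void
form_private(long out[FDW_NUM_IDX], long fetch_size, long use_remote_sort)
{
	out[FDW_IDX_FETCH_SIZE] = fetch_size;
	out[FDW_IDX_USE_REMOTE_SORT] = use_remote_sort;
}

/* ...and the "deform" side must agree with it exactly. */
static long
deform_private(const long in[FDW_NUM_IDX], int idx)
{
	assert(idx >= 0 && idx < FDW_NUM_IDX);
	return in[idx];
}
```

Any reordering or insertion of a field silently changes the meaning of every later slot, which is why the thread calls this a PITA for anything nontrivial.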
On 2015-08-26 00:55:48 +0000, Kouhei Kaigai wrote:
> As Tom pointed out, the primary reason why CustomScan required provider
> to save its private data on custom_exprs/custom_private were awareness
> of copyObject().

Well, a callback brings that with it as well. I do think it makes sense
to *allow* not having a callback and relying on copyObject() to do the
work.

> In addition, custom_exprs are expected to link expression node to be
> processed in setrefs.c and subselect.c because core implementation
> cannot know which type of data is managed in private.

Right.

> Is your concern about custom_private only?

Yes, pretty much. There's very little chance you can expand the
expression tree with custom expressions and survive the experience. So
custom_exprs etc. make sense.

> Even if we have extra callbacks like CopyObjectCustomScan() and
> TextReadCustomScan(), how do we handle the situation where the core
> implementation needs to know the location of expression nodes? Is
> custom_exprs retained as-is?

Yes.

> An earlier version of the CustomScan interface had individual callbacks
> for setrefs.c and subselect.c; however, its structure depended on the
> current implementation too much, so we moved to an implementation with
> two individual private fields rather than callbacks in outfuncs.c.

I agree with that choice.

> On the other hand, I'm inclined to think TextOutCustomScan() might be a
> misdesign for supporting serialize/deserialize via readfuncs.c:
> http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F80111D04F@BPXM15GP.gisp.nec.co.jp
> I think it should be deprecated rather than enhanced.

Well, right now there's no support for reading/writing plans at all. But
if we add it, TextOutCustomScan() seems like a small problem in
comparison to others. CustomScan contains pointers; that's definitely
not something you can ship over the wire and expect to work. We'll
probably have to store a soname + function name instead.
More generally, I rather doubt that it'll always make sense to
serialize/deserialize in a generic manner between different backends. It
can very well make sense to refer to backend-local state in a plan - you
need to be able to take that into account upon
serialization/deserialization. Consider e.g. a connection id for an FDW
that uses pooling.

Greetings,

Andres Freund
> On 2015-08-26 00:55:48 +0000, Kouhei Kaigai wrote:
> > As Tom pointed out, the primary reason why CustomScan required provider
> > to save its private data on custom_exprs/custom_private were awareness
> > of copyObject().
>
> Well, a callback brings that with it as well. I do think it makes sense
> to *allow* not to have a callback and rely on copyObject() to do the
> work.
>
> > In addition, custom_exprs are expected to link expression node to be
> > processed in setrefs.c and subselect.c because core implementation
> > cannot know which type of data is managed in private.
>
> Right.
>
> > Do you concern about custom_private only?
>
> Yes, pretty much. There's very little chance you can expand the
> expression tree with custom expressions and survive the experience. So
> custom_exprs etc. makes sense.

I can understand your concern; however, I'm not certain which is the
best way to do it. Regarding CustomPath and CustomScanState, we expect
the developer to define their own data structure derived from these
types, like:

    typedef struct
    {
        CustomPath  cpath;
        List       *host_quals;    /* RestrictInfo run on host */
        List       *dev_quals;     /* RestrictInfo run on device */
    } GpuScanPath;

On the other hand, the CustomScan node prohibits this manner because of
copyObject() awareness. If we could have a similar design, it would look
like a harmonious design to me.

Here is my random thought on how to realize it, even though the current
specification requires CustomScan to follow the existing copyObject()
manner:

- _copyCustomScan() needs to allow allocating larger than
  sizeof(CustomScan), according to the requirement of the custom-scan
  provider.
- We may need a utility function to copy the common part. It is not
  preferable for the custom-scan provider to implement it itself.
- For upcoming readfuncs.c support, I don't want to have the
  READ_XXX_FIELD() macros on the extension side. Even though pg_strtok()
  is a public function, _readBitmapset() is a static function.
- Is custom_private deprecated?
Also, do we force providers to have CopyObjectCustomScan() and
potentially TextReadCustomScan()? I want to keep custom_private as-is,
because a very simple custom-scan provider may have at most several
primitive values as its private data.

> > Even if we have extra
> > callbacks like CopyObjectCustomScan() and TextReadCustomScan(),
> > how do we care about the situation when core implementation needs to
> > know the location of expression nodes? Is custom_exprs retained as is?
>
> Yes.
>
> > In the earlier version of CustomScan interface had individual
> > callbacks on setrefs.c and subselect.c, however, its structure
> > depended on the current implementation too much, then we moved
> > to the implementation to have two individual private fields
> > rather than callbacks on outfuncs.c.
>
> I agree with that choice.
>
> > On the other hands, I'm inclined to think TextOutCustomScan() might
> > be a misdesign to support serialize/deserialize via readfuncs.c.
> > http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F80111D04F@BPXM15GP.gisp.nec.co.jp
> > I think it shall be deprecated rather then enhancement.
>
> Well, right now there's no support for reading/writing plans at all. But
> if we add it, TextOutCustomScan() seems like a small problem in
> comparison to others. CustomScan contains pointers, that's definitely
> not something you can ship over the wire and expect to work. We'll
> probably have to store a soname + function name instead.

It may not be that far in the future. The ParallelAppend feature I've
discussed in another thread needs the capability of plan tree
serialization, and it is under development:
https://github.com/kaigai/sepgsql/blob/fappend/src/backend/nodes/readfuncs.c#L1506

I thought we might need to define a new DDL command to associate a
particular custom-scan name with a set of callback pointer tables, using
a persistent system catalog. Your idea, soname + symbol name, may be a
solution.
> More generally I rather doubt that it'll always make sense to
> serialize/deserialize in a generic manner between different backends. It
> very well can make sense to refer to backend-local state in a plan - you
> need to be able to take that into account upon
> serialization/deserialization. Consider e.g. a connection id for an FDW
> that uses pooling.

+1. I also prefer a more generic mechanism to serialize/deserialize,
rather than a custom-scan specific one.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
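The derived-struct pattern KaiGai shows for CustomPath (GpuScanPath embedding a CustomPath as its first member) can be sketched in miniature like this; the names (BasePathDemo, GpuPathDemo, path_cost) are invented stand-ins, not PostgreSQL's. Because the base part is the first member, a pointer to the derived struct can be handed to generic code that only knows the base type.

```c
#include <assert.h>

/* Invented stand-in for a base node type such as CustomPath. */
typedef struct BasePathDemo
{
	int		pathtype;
	double	cost;
} BasePathDemo;

/* A provider's derived type: base part MUST be the first member. */
typedef struct GpuPathDemo
{
	BasePathDemo cpath;		/* base part, first member */
	int			 device_id;	/* provider-specific extension */
} GpuPathDemo;

/* Generic code that only knows about the base type. */
static double
path_cost(const BasePathDemo *p)
{
	return p->cost;
}
```

C guarantees that a pointer to a struct, suitably cast, points at its first member, which is what makes this "inheritance" idiom well-defined; the thread's point is that copyObject() would need provider help to copy the extended tail it cannot see.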
On 2015-08-26 23:55:16 +0000, Kouhei Kaigai wrote:
> - _copyCustomScan() needs to allow to allocate larger than sizeof(CustomScan),
>   according to the requirement by custom-scan provider.

I think it's somewhat nice to allow larger objects, but I don't think
it's absolutely necessary.

> - We may need to have a utility function to copy the common part.
>   It is not preferable to implement by custom-scan provider itself.

I think that could be done by having the CopyCustomScan callback simply
return a new node, *without* filling out those fields.

> - For upcoming readfuncs.c support, I don't want to have READ_XXX_FIELD()
>   macro on the extension side. Even though pg_strtok() is a public function,
>   _readBitmapset() is static function.

Yea, the general copying code for other node types should be reusable
for that. Invoking parseNodeString() should be usable.

> - Is custom_private deprecated? Also, do we force to have CopyObjectCustomScan()
>   and potentially TextReadCustomScan()?

I'd either remove it, or add a second void * private field which has to
be copied by CopyObjectCustomScan(), while leaving the responsibility
for custom_private as-is. I honestly don't see too much value in keeping
it.

> It may not be a long future. The ParallelAppend feature, I've discussed
> in another thread, needs capability of plan tree serialization, and under
> the development:

Yes, I skimmed through those discussions.

Greetings,

Andres Freund
On Wed, Aug 26, 2015 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:
>> > Looking at
>> > postgres_fdw and the pg-strom example linked upthread imo pretty clearly
>> > shows how ugly this is. There's also the rather noticeable difference
>> > that we already have callbacks in the node for custom scans (used by
>> > outfuncs), making this rather trivial to add.
>>
>> I will manfully refrain from taking that bait.
>
> While that comment made me laugh, I'm not really sure why my examples
> are bait. One is the probably most used fdw, the other the only user of
> the custom scan interface I know of.

I suspect what Tom is getting at is that he thinks the custom scan
interface is a giant piece of dung which I ought not to have committed
unless somebody was holding a gun to my head at the time, and that
putting function pointers in the structure at all was a particularly
poor design decision, on par with the Roman decision to make their water
pipes out of a metal which is toxic if ingested. He and I are going to
have to agree to disagree on that.

I think it's great that you are experimenting with the feature, and I
think we ought to try to make it better, even if that requires
incompatible changes to the API.

When we originally discussed this topic at a previous PGCon developer
meeting, it was suggested that we might want to have a facility for
custom nodes, and I at least had the impression that this might be
separate from particular facilities we might have for custom paths or
custom plans or custom plan-states. For example, suppose you can do
this:

    void
    InitCustomNodeAndres(CustomNodeTemplate *tmpl)
    {
        tmpl->outfunc = _outAndres;
        tmpl->copyfunc = _copyAndres;
        tmpl->equalfunc = _equalAndres;
        tmpl->readfunc = _readAndres;
    }

    void
    _PG_init()
    {
        RegisterCustomNodeType("Andres", "andres", "InitCustomNodeAndres");
    }

    ...
    Andres *andres = makeCustomNode(Andres);

If we had something like that, then you could address your use case by
making the structures you want to pass around be nodes rather than
having to invent a way to smash whatever you have into a binary blob.
That would be pretty cool. Maybe, with a bit of work, we could even find
a way to get rid of the idea of custom-path, custom-plan, and custom
scan-state as node types in their own right, and instead have some trick
where they can just be general custom nodes that have one or two extra
methods defined for the stuff that is specific to the planner and
executor. Not sure exactly how to set that up off-hand, but maybe
there's a good way.

In general, I think we've made extensibility very hard in areas like
this. There are a whole bunch of thorny problems here that are
interconnected. For example, suppose you want to add a new kind of
SQL-visible object to the system, something EnterpriseDB has needed to
do repeatedly over the years. You have to modify like 40 core source
files. You need new parser rules: but the parser is not extensible. You
need new keywords: but the keyword table is not extensible. To represent
the results of parsing, you need new parse nodes: but the node system is
not extensible. You need to classify your statement as DML for logging
purposes, you need to hook it into the event trigger system, you need to
hook it into objectaddress.c, you need a new command tag, you need to
add your new command to the appropriate place in ProcessUtility. You can
sort of get control in ProcessUtility from a hook, but it's awkward, and
the others can't be handled at all. So you just have to hack core.

It would be nice to fully solve this problem so that some of the EDB
stuff could exist as an extension rather than a modification of the core
code, but given the magnitude of the problem I don't expect to get there
any time soon. Still, I think making core facilities more accessible to
extensions has a ton of merit.
FDWs, background workers, dynamic shared memory, logical decoding
plugins, DDL deparsing, and, yeah, even custom scans are all relatively
recent examples of new facilities we've been careful to make usable
outside core, and there is, I think, a good deal of evidence that every
one of those facilities has gotten use by developers other than the ones
who installed the core support, which I personally find really
satisfying. I predict that if we take steps to make the node system more
extensible, we will find that that capability gets plenty of use,
including some uses we can't now predict.

And even more broadly: I think the time has come to get really skeptical
about every place in our code where there's a big giant switch. All of
those represent places where non-core code need not apply. And all of
them represent places where, instead of each module registering itself
with all the correct subsystems, each subsystem knows about all of the
things that plug into it. Whoever designed the shared-memory
initialization mechanism was smart: it knows very little itself, and
relies on its clients to tell it what they need. There are some things I
don't like about that mechanism, but at least we got that part right,
and so extensions can use it.

A lot of other places, unfortunately, took the simpler approach of just
hard-coding everything. There are some places where performance or other
concerns justify that approach, but there are a lot of places where I
think another approach would be better. EDB is far from the only company
that would benefit from making more of these subsystems extensible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Aug 26, 2015 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:
>> While that comment made me laugh, I'm not really sure why my examples
>> are bait. One is the probably most used fdw, the other the only user of
>> the custom scan interface I know of.

> I suspect what Tom is getting at is that he thinks the custom scan
> interface is ... [ROTFL]

No, actually, I'm not interested in fighting that battle. But I do think
that letting custom-scan extensions fool about with the semantics of
plan-copying (and plan-serialization, for that matter) is a risky choice
that is not justified by some marginal gains in code readability,
especially when there are other feasible ways to attack the readability
problem. copyObject needs to be a pretty self-contained operation with
not a lot of external dependencies, and I'm afraid there would be too
much temptation for extensions to take shortcuts that would fail later.

> [ ruminations about how to improve the system's extensibility ]

Yeah, I've entertained similar thoughts in the past. It's not very clear
how we get there from here, though --- for instance, any fundamental
change in the Node system would break so much code that it's unlikely
anyone would thank us for it, even assuming we ever finished the
conversion project.

I'm also less than convinced that "I should be able to install major new
features without touching the core code" is actually a sane or useful
goal to have. It sounds great in the abstract, but long ago (at HP) I
worked on systems that were actually built to be able to do that ... and
I can tell you that the tradeoffs you have to make to do that are not
very pleasant. As a concrete example, you mentioned the lack of
extensibility of the bison parser, which is undoubtedly a blocking issue
if you want a new feature that would require new SQL syntax.
While bison is certainly a PITA in some ways, it has one saving grace
that I think a lot of people underappreciate: if it compiles your
grammar, the grammar is (probably) unambiguous and has no unreachable
elements. The extensible parser we had at HP was, um, considerably less
capable of detecting conflicts or ambiguity. Maybe we just didn't know
how to do it, but I suspect that that is a fundamental tradeoff when you
have incomplete information.

I don't want to sound too negative --- there are probably improvements
we can make in this area. But I'm skeptical of sticking in hooks that
haven't been pretty carefully thought through. We've done a lot of that
in the past, and I think the net result is that we have a lot of useless
hooks, or at least hooks that can't be used in a way that we don't break
every release or two.

regards, tom lane
Hi,

On 2015-08-27 18:59:13 -0400, Tom Lane wrote:
> But I do think that letting custom-scan extensions fool about with
> the semantics of plan-copying (and plan-serialization for that matter)
> is a risky choice that is not justified by some marginal gains in code
> readability

To me the likelihood of having bugs in manual (de)serialization via
lists et al. is higher than that of introducing bugs via copyfuncs. Or
even out/readfuncs, especially as we could easily add a bunch of
verifications to the latter by verifying field names in READ_*. Whether
a prepared statement fails because a copyfuncs hook failed, or whether
it fails in the FDW itself, doesn't seem to make that much of a
difference.

> especially when there are other feasible ways to attack the
> readability problem.

Let's at least add a Value variant that can store strings that aren't
null-terminated; that'd already make things a lot better. Right now you
need to reuse Const nodes and stuff a bytea in there or such.

But even then, that only works if you have a single flat data structure
that doesn't have any pointers in it. copyObject()-style copying can
easily cope with that; a binary blob really can't.

As far as I can see, right now you basically need to write code for
custom scans so that between planning and execution you unconditionally
do:

Planning:
1) create your internal data
2) serialize data

Execution:
3) unserialize data
4) use unserialized data

There's no way to stash data between 2 and 3 anywhere, even if there's
no need for the whole copying rigmarole because it was just a plain
statement.

If you look at the various *Scan, *Join etc. nodes we have in core, most
of them have a bunch of variable-length data structures and pointers in
them. All of that you can't sanely do for a custom scan node.

> copyObject needs to be a pretty self-contained operation with not a
> lot of external dependencies

Why?
> > [ ruminations about how to improve the system's extensibility ]
>
> Yeah, I've entertained similar thoughts in the past. It's not very clear
> how we get there from here, though --- for instance, any fundamental
> change in the Node system would break so much code that it's unlikely
> anyone would thank us for it, even assuming we ever finished the
> conversion project.

How about adding a single 'ExtensibleNode' node type (inheriting from
Node as normal)? Many of the switches over node types would gain one
additional entry for that, and do something like:

    case T_ExtensibleNode:
        handler = LookupHandler(((ExtensibleNode *) node)->extended_type,
                                Handler_ExecInitNode);
        if (handler)
        {
            result = ((ExecInitNodeHandler *) handler)(node, estate, eflags);
            break;
        }
        /* fall through to error */
    default:
        elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));

That should be doable one-by-one in many of these bigger switches.
There are obviously some limits to where that's doable, but I don't
think it'd break much code? How exactly to identify extended_type in a
way that makes sense, e.g. across restarting pg, backends, and
primary/standby, isn't trivial, but I do think it should be doable.

> I'm also less than convinced that "I should be able to install major new
> features without touching the core code" is actually a sane or useful goal
> to have.

Yea, I have some doubts there as well. There are a lot of features,
though, which we really don't want in core, where some of the
limitations of the node system really do become problematic.

> As a concrete example, you mentioned the lack of extensibility of the
> bison parser, which is undoubtedly a blocking issue if you want a new
> feature that would require new SQL syntax.

While that's a problem where I really don't have any good answer, I also
think in many cases it's primarily DDL, and primarily extending existing
DDL.
By far, most things I've seen want to add information to tables and/or
columns - and storage options actually kinda give you that at the
grammar level. We just don't allow you to sanely hook into them, and
there's a bunch of other pointless restrictions.

Greetings,

Andres Freund
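The ExtensibleNode dispatch sketched above can be mocked up as self-contained C to show the control flow; every name here (DemoNode, ext_registry, demo_init_node, ...) is an invented stand-in, not an actual PostgreSQL symbol. A registry maps extended_type to a handler, and the switch falls through to the usual error path when no handler is registered.

```c
#include <assert.h>
#include <stddef.h>

typedef enum DemoTag
{
	T_SeqScanDemo,
	T_ExtensibleDemo
} DemoTag;

typedef struct DemoNode
{
	DemoTag	tag;
	int		extended_type;	/* key into the handler registry */
} DemoNode;

typedef int (*InitHandler) (DemoNode *node);

/* The registry extensions fill in at load time. */
#define MAX_EXT_TYPES 8
static InitHandler ext_registry[MAX_EXT_TYPES];

static void
register_ext_type(int extended_type, InitHandler h)
{
	ext_registry[extended_type] = h;
}

/* A handler an extension might register. */
static int
demo_ext_init(DemoNode *node)
{
	(void) node;
	return 42;
}

/* One of the "big switches", grown a single ExtensibleNode entry. */
static int
demo_init_node(DemoNode *node)
{
	switch (node->tag)
	{
		case T_SeqScanDemo:
			return 0;		/* built-in node, handled inline */
		case T_ExtensibleDemo:
			if (ext_registry[node->extended_type] != NULL)
				return ext_registry[node->extended_type] (node);
			/* fall through to error */
		default:
			return -1;		/* "unrecognized node type" */
	}
}
```

The open problem the email flags remains visible even in the mock: the integer extended_type only works while registrations are identical in every process, which is why a stable identifier (e.g. soname + symbol name) would be needed across backends or a primary/standby pair.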
On 2015-08-27 23:23:54 +0100, Robert Haas wrote:
> I think it's great that you are experimenting with the feature, and I
> think we ought to try to make it better, even if that requires
> incompatible changes to the API.

Yea, I think it's too early to consider this API stable at all. That's
imo just normal when making something new extensible.

> When we originally discussed this topic at a previous PGCon developer
> meeting, it was suggested that we might want to have a facility for
> custom nodes, and I at least had the impression that this might be
> separate from particular facilities we might have for custom paths or
> custom plans or custom plan-states.

I think we can actually use Custom* to test-drive concepts we might want
to use more widely at a later stage.

> For example, suppose you can do this:
>
>     void
>     InitCustomNodeAndres(CustomNodeTemplate *tmpl)
>     {
>         tmpl->outfunc = _outAndres;
>         tmpl->copyfunc = _copyAndres;
>         tmpl->equalfunc = _equalAndres;
>         tmpl->readfunc = _readAndres;
>     }
>
>     void
>     _PG_init()
>     {
>         RegisterCustomNodeType("Andres", "andres", "InitCustomNodeAndres");
>     }
>
>     ...
>
>     Andres *andres = makeCustomNode(Andres);
>
> If we had something like that, then you could address your use case by
> making the structures you want to pass around be nodes rather than
> having to invent a way to smash whatever you have into a binary blob.

Yes, that'd be pretty helpful already. On the other hand, it'd still not
make some of the CustomScan members superfluous - I do think exprs/tlist
make a fair amount of sense for handling setrefs.c et al.

> In general, I think we've made extensibility very hard in areas like
> this.

Well, a lot of it is just *hard* to make extensible. If we pay too much
attention to extensibility, nobody is going to want to extend things
anymore, because postgres will become too slow :(. Also, I do think we
have to be careful about not making it too "cheap" to have significant
features outside of core.
If you look at mysql, the massive fragmentation around storage engines
really has cost them a lot.

> There are a whole bunch of thorny problems here that are
> interconnected. For example, suppose you want to add a new kind of
> SQL-visible object to the system, something EnterpriseDB has needed to
> do repeatedly over the years. You have to modify like 40 core source
> files.

FWIW, I think this is also a problem for core code. It's very easy to
forget individual pieces even if you're diligent and know what you're
doing.

I think there are a bunch of things that could massively ease that,
without actually costing that much. Just off the top of my head:

1) Allow additional syscaches to be added at runtime, including emitting
   the relevant invalidations
2) Allow registering a callback from dependency.c, and add some generic
   output to objectaddress.c
3) Make relation/attribute reloptions actually extensible

With these three you can actually add new catalogs and database objects
in a fairly sane manner, even if the user interface isn't the prettiest.
But I do think that relation options + utility hooks that turn "generic"
DDL into the normal DDL & your custom action can get you pretty far. But
having to write your own caches and triggers that emit sys/relcache
invalidations is painful. Handling dependencies in a meaningful manner
is, to my knowledge, impossible. Unless you want to default to CASCADE
and do stuff in the drop event trigger...

If WITH (...) or security labels are too ugly or crummy, I think you can
also get rather far using functions. Good, because solving

> You need new parser rules: but the parser is not extensible.
> You need new keywords: but the keyword table is not extensible.

without massive performance and/or maintainability regressions seems
hard.

Greetings,

Andres Freund