Discussion: Parallelized polymorphic aggs, and aggtype vs aggoutputtype
I believe I've narrowed down the cause of this failure [1]:

regression=# explain SELECT min(col) FROM enumtest;
ERROR:  type matched to anyenum is not an enum type: anyenum

The core reason for this seems to be that apply_partialaggref_adjustment fails to consider the possibility that it's dealing with a polymorphic transtype. It will happily label the partial aggregate with a polymorphic aggoutputtype, which is broken in itself (since we won't know the properties of the output type for purposes such as datum-copying). But the above error comes out before we have any opportunity to crash from that, because we then end up with the upper-level combining aggregate having an input that is shown as being of polymorphic type, which means it can't resolve what its transtype is supposed to be.

The business about having a (newly added) separate aggserialtype makes this doubly dangerous, because if that applies we would be showing the input to the combining aggregate as being of the serial type, which might have *nothing* to do with the original input type, thereby breaking polymorphic transtype resolution even further. But whether we're using that or just the transtype, showing the input to the combining aggregate as being of some type other than the original input is likely to break any agg implementation function that uses get_call_expr_argtype or similar to figure out what type it's supposed to be working on.

We might need to fix this by adding the resolved transtype to Aggref nodes, rather than trying to re-resolve it at executor start. Some tracing I did on the way to identifying this bug suggests that that would save a fair amount of planner and executor-start overhead, because we are calling get_aggregate_argtypes, and thereby enforce_generic_type_consistency, half a dozen times per aggregate, not to mention resolve_aggregate_transtype itself.

However, before trying to fix any of that, I'd like to demand an explanation as to why aggoutputtype exists at all. It seems incredibly confusing to have both that and aggtype, and certainly the code comments give no hint of what the semantics are supposed to be when those fields are different. I think aggoutputtype needs to go away again.

			regards, tom lane

[1] https://www.postgresql.org/message-id/22782.1466100870@sss.pgh.pa.us
On Thu, Jun 16, 2016 at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> However, before trying to fix any of that, I'd like to demand an
> explanation as to why aggoutputtype exists at all. It seems incredibly
> confusing to have both that and aggtype, and certainly the code comments
> give no hint of what the semantics are supposed to be when those fields
> are different. I think aggoutputtype needs to go away again.

fix_combine_agg_expr, or more precisely search_indexed_tlist_for_partial_aggref, needs to be able to match a finalize-aggregate node type to a partial-aggregate node type beneath it. aggtype is one of the fields that gets compared as part of that process, and it won't be the same if you make aggtype mean the result of that particular node rather than the result of the aggregate. nodeAgg.c also uses aggtype for matching purposes; not sure if there is a similar problem there or not.

On the other hand, exprType() needs to return the actual output type of that particular node, which is given by aggoutputtype. I think that requirement is non-negotiable, so I suspect the only way to get by without two fields is if all the stuff for which aggtype is being used is inessential; in that case, we could remove it and then rename aggoutputtype back to aggtype.

Thanks for looking at this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 16, 2016 at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> However, before trying to fix any of that, I'd like to demand an
>> explanation as to why aggoutputtype exists at all. It seems incredibly
>> confusing to have both that and aggtype, and certainly the code comments
>> give no hint of what the semantics are supposed to be when those fields
>> are different. I think aggoutputtype needs to go away again.

> fix_combine_agg_expr, or more precisely search_indexed_tlist_for_partial_aggref,
> needs to be able to match a finalize-aggregate node type to a
> partial-aggregate node type beneath it.

Meh. I think this is probably telling us that trying to repurpose Aggref as the representation of a partial aggregate may have been mistaken. Or maybe just that fix_combine_agg_expr was a bad idea. It seems likely to me that that could have been dispensed with altogether in return for slightly more work in create_grouping_paths, for instance transforming the upper-level Aggrefs into something that looks more like

	Aggref(PartialAggref(original-arguments))

whereupon the pre-existing match logic should be sufficient to replace the "PartialAggref(original-arguments)" subtree with a suitable Var in the upper aggregation plan node.

> aggtype is one of the fields
> that gets compared as part of that process, and it won't be the same
> if you make aggtype mean the result of that particular node rather
> than the result of the aggregate. nodeAgg.c also uses aggtype for
> matching purposes; not sure if there is a similar problem there or
> not.

AFAICS, nodeAgg.c's check on that is not logically necessary: if you have identical inputs and the same aggregate function then you should certainly expect the same output type. The only reason we're making it is that the code originally used equal() to detect identical aggregates, and somebody slavishly copied all the comparisons when splitting up that test so as to distinguish "identical inputs" from "identical aggregates". I'll reserve judgement on whether search_indexed_tlist_for_partial_aggref has any idea what it's doing in this regard.

			regards, tom lane
On Thu, Jun 16, 2016 at 6:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Meh. I think this is probably telling us that trying to repurpose Aggref
> as the representation of a partial aggregate may have been mistaken.
> Or maybe just that fix_combine_agg_expr was a bad idea. It seems likely
> to me that that could have been dispensed with altogether in return for
> slightly more work in create_grouping_paths, for instance transforming
> the upper-level Aggrefs into something that looks more like
> 	Aggref(PartialAggref(original-arguments))
> whereupon the pre-existing match logic should be sufficient to replace
> the "PartialAggref(original-arguments)" subtree with a suitable Var
> in the upper aggregation plan node.

I don't mind if you feel the need to refactor this. In David's original patch, he had a PartialAggref node that basically acted as a wrapper for an Aggref node, effectively behaving like a flag on the underlying Aggref. I thought that was ugly and argued for including the required information in the Aggref itself. Now what you are proposing here is a bit different and it may work better, but there are tricky things like how it all deparses in the EXPLAIN output - you may recall that I fixed a problem in that area a while back. We're trying to represent a concept that doesn't exist at the SQL level, and that makes things a bit complicated: if the Aggref's input is a PartialAggref, will the deparse code be able to match that up to the correct catalog entry? But if you've got a good idea, have at it.

>> aggtype is one of the fields
>> that gets compared as part of that process, and it won't be the same
>> if you make aggtype mean the result of that particular node rather
>> than the result of the aggregate. nodeAgg.c also uses aggtype for
>> matching purposes; not sure if there is a similar problem there or
>> not.

> AFAICS, nodeAgg.c's check on that is not logically necessary: if you have
> identical inputs and the same aggregate function then you should certainly
> expect the same output type. The only reason we're making it is that the
> code originally used equal() to detect identical aggregates, and somebody
> slavishly copied all the comparisons when splitting up that test so as to
> distinguish "identical inputs" from "identical aggregates". I'll reserve
> judgement on whether search_indexed_tlist_for_partial_aggref has any idea
> what it's doing in this regard.

Sure, it's all cargo-cult programming. I don't want to presume to read David's mind, but I think we were both thinking that minimizing the amount of tinkering that we did would cut down on the likelihood of bugs. Now, you've found a bug that got through, so we can either double down on the design we've already picked, or we can change it. I'm not set on one approach or the other; I thought what we ended up with was fairly reasonable given the bloated mess that is struct Aggref, but I've got no deep attachment to it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 16, 2016 at 6:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Meh. I think this is probably telling us that trying to repurpose Aggref
>> as the representation of a partial aggregate may have been mistaken.

> I don't mind if you feel the need to refactor this.

I'm not sure yet. What I think we need to work out first is exactly how polymorphic parallel aggregates ought to identify which concrete data types they are using. There were well-defined rules before, but how do we adapt those to two levels of aggregate evaluation? Are we hard-wiring an assumption that the aggregate transtype is the same at both levels? What assumptions do we want to make about the relationship of the data transfer type (the lower level's output type) to the polymorphic input and trans types?

A key point here is that the way that polymorphic aggregate support functions used to find out what they were doing is that we'd set up dummy FuncExprs that would satisfy get_call_expr_argtype(). That relied on the fact that the transfn and finalfn calls were specified in a way that corresponded to legal SQL function calls. It's not clear to me whether that statement still holds for parallelized aggregates.

			regards, tom lane
I wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I don't mind if you feel the need to refactor this.

> I'm not sure yet. What I think we need to work out first is exactly
> how polymorphic parallel aggregates ought to identify which concrete
> data types they are using. There were well-defined rules before,
> but how do we adapt those to two levels of aggregate evaluation?

After reviewing things a bit, it seems like the first design-level issue there is what about polymorphic combine functions. So far as the type system is concerned, there's no issue: the declared signature is always going to be "combine(foo, foo) returns foo" and it doesn't matter whether foo is an ordinary type, a polymorphic one, or INTERNAL. The other question is whether the function itself knows what it's operating on in the current invocation. For regular polymorphic types this seems no different from any other usage. If the transtype is INTERNAL, then the type system provides no help; but we could reasonably assume that the internal representation itself contains as much information as the combine func needs. We could add extra arguments like the "finalfunc_extra" option does for the finalfunc, but I don't really see a need --- that hack is mainly to satisfy the type system that the finalfunc's signature is sane, not to tell the finalfunc something it has no other way to find out. So I think we're probably OK as far as the combine function is concerned.

The situation is much direr as far as serialize/deserialize are concerned. These basically don't work at all for polymorphic transtypes: if you try to declare "deserialize(bytea) returns anyarray", the type system won't let you. Perhaps that's not an issue because you shouldn't really need serialize/deserialize for anything except INTERNAL transtype. However, there's a far bigger problem which is that "deserialize(bytea) returns internal" is a blatant security hole, which I see you ignored the defense against in opr_sanity.sql. The claim in the comment there that it's okay if we check for aggregate context is a joke --- or haven't you heard that we allow users to create aggregates? That's not nearly good enough to prevent unsafe usage of such functions. Not to mention that CREATE FUNCTION won't allow creation of such functions, so extensions are locked out of using this feature.

This has to be redesigned or else reverted entirely. I'm not going to take no for an answer.

A possible solution is to give deserialize an extra dummy argument, along the lines of "deserialize(bytea, internal) returns internal", thereby ensuring it can't be called in any non-system-originated contexts. This is still rather dangerous if the other argument is variable, as somebody might be able to abuse an internal-taking function by naming it as the deserialize function for a maliciously-designed aggregate. What I'm inclined to do to lock it down further is to drop the "serialtype" argument to CREATE AGGREGATE, which seems rather pointless (what else would you ever use besides bytea?). Instead, insist that serialize/deserialize apply *only* when the transtype is INTERNAL, and their signatures are exactly "serialize(internal) returns bytea" and "deserialize(bytea, internal) returns internal", never anything else.

A different way of locking things down, which might be cleaner in the long run, is to invent a new pseudo-type for the sole purpose of being the serialization type, that is we'd insist on the signatures being "serialize(internal) returns serialized_internal" and "deserialize(serialized_internal) returns internal", where serialized_internal has the representation properties of bytea but is usable for no other purpose than this. Not sure it's worth the trouble though.

Anyway, setting aside the security angle, it doesn't seem like there is anything wrong with the high-level design for polymorphic cases. I'm now thinking the problem I saw is just a garden variety implementation bug (or bugs). I'm still not very happy with the confusion around aggtype though, not least because I suspect it contributed directly to this bug. I also feel that both the code comments and the user-facing documentation for this feature are well below project standards, eg where's the discussion in section 35.10?

			regards, tom lane
On Fri, Jun 17, 2016 at 1:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I don't mind if you feel the need to refactor this.

>> I'm not sure yet. What I think we need to work out first is exactly
>> how polymorphic parallel aggregates ought to identify which concrete
>> data types they are using. There were well-defined rules before,
>> but how do we adapt those to two levels of aggregate evaluation?

> After reviewing things a bit, it seems like the first design-level issue
> there is what about polymorphic combine functions. So far as the type
> system is concerned, there's no issue: the declared signature is always
> going to be "combine(foo, foo) returns foo" and it doesn't matter whether
> foo is an ordinary type, a polymorphic one, or INTERNAL. The other
> question is whether the function itself knows what it's operating on in
> the current invocation. For regular polymorphic types this seems no
> different from any other usage. If the transtype is INTERNAL, then the
> type system provides no help; but we could reasonably assume that the
> internal representation itself contains as much information as the combine
> func needs. We could add extra arguments like the "finalfunc_extra"
> option does for the finalfunc, but I don't really see a need --- that hack
> is mainly to satisfy the type system that the finalfunc's signature is
> sane, not to tell the finalfunc something it has no other way to find out.
> So I think we're probably OK as far as the combine function is concerned.

OK....

> The situation is much direr as far as serialize/deserialize are concerned.
> These basically don't work at all for polymorphic transtypes: if you try
> to declare "deserialize(bytea) returns anyarray", the type system won't
> let you. Perhaps that's not an issue because you shouldn't really need
> serialize/deserialize for anything except INTERNAL transtype.

In fact, I think that it won't even let you declare a serialize or deserialize function for anything other than an INTERNAL transtype right now. There was a PostGIS use case that made it seem like we might want to relax that, but currently I believe that's the law of the land.

> However,
> there's a far bigger problem which is that "deserialize(bytea) returns
> internal" is a blatant security hole, which I see you ignored the defense
> against in opr_sanity.sql. The claim in the comment there that it's okay
> if we check for aggregate context is a joke --- or haven't you heard that
> we allow users to create aggregates? That's not nearly good enough to
> prevent unsafe usage of such functions.

That may be true, but it isn't obvious to me. The user can't simply say SELECT numeric_avg_deserialize(...) or whatever because the function won't execute when called that way. How would you arrange to call such a function in a context where it wouldn't fail outright but would return a value that you could then hand to some other function that expects a type-internal input?

> Not to mention that CREATE
> FUNCTION won't allow creation of such functions, so extensions are locked
> out of using this feature.

Oops.

> This has to be redesigned or else reverted entirely. I'm not going to
> take no for an answer.

Please don't act as if I've been refusing to fix bugs. I've been doing practically nothing else.

> A possible solution is to give deserialize an extra dummy argument, along
> the lines of "deserialize(bytea, internal) returns internal", thereby
> ensuring it can't be called in any non-system-originated contexts.

Seems promising.

> This
> is still rather dangerous if the other argument is variable, as somebody
> might be able to abuse an internal-taking function by naming it as the
> deserialize function for a maliciously-designed aggregate.

The complex and, as far as I can tell, largely undocumented set of rules about what sorts of internal-taking-and-returning functions we allow to exist strikes me as a house of cards just waiting for the next stiff breeze to come along.

> What I'm
> inclined to do to lock it down further is to drop the "serialtype"
> argument to CREATE AGGREGATE, which seems rather pointless (what else
> would you ever use besides bytea?). Instead, insist that
> serialize/deserialize apply *only* when the transtype is INTERNAL, and
> their signatures are exactly "serialize(internal) returns bytea" and
> "deserialize(bytea, internal) returns internal", never anything else.

I originally had the thought that you might want to make the serialized representation something that could be human-readable. I'd actually like to have an SQL syntax that outputs the transition state or, if it's internal, the serialized representation of it. That would allow us to do partitionwise aggregation given a query like SELECT avg(foo) FROM parent_table_for_several_foreign_tables. You'd need to be able to ship something like SELECT PARTIAL avg(foo) FROM t1 to each child. And it might be nicer if the results came back in some human-readable format rather than as a bytea, but it's not essential, of course.

> A different way of locking things down, which might be cleaner in the
> long run, is to invent a new pseudo-type for the sole purpose of being
> the serialization type, that is we'd insist on the signatures being
> "serialize(internal) returns serialized_internal" and
> "deserialize(serialized_internal) returns internal", where
> serialized_internal has the representation properties of bytea but is
> usable for no other purpose than this. Not sure it's worth the trouble
> though.

I think we should break up internal into various kinds of internal depending on what kind of a thing we've actually got a pointer to. We don't need more synonyms for bytea, IMO.

> Anyway, setting aside the security angle, it doesn't seem like there is
> anything wrong with the high-level design for polymorphic cases. I'm now
> thinking the problem I saw is just a garden variety implementation bug
> (or bugs). I'm still not very happy with the confusion around aggtype
> though, not least because I suspect it contributed directly to this bug.
> I also feel that both the code comments and the user-facing documentation
> for this feature are well below project standards, eg where's the
> discussion in section 35.10?

Good point. That needs to be addressed -- ideally by David, but if he doesn't then I guess I'll have to do it. If you have other specific comments, please send them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jun 17, 2016 at 1:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> there's a far bigger problem which is that "deserialize(bytea) returns
>> internal" is a blatant security hole, which I see you ignored the defense
>> against in opr_sanity.sql. The claim in the comment there that it's okay
>> if we check for aggregate context is a joke --- or haven't you heard that
>> we allow users to create aggregates? That's not nearly good enough to
>> prevent unsafe usage of such functions.

> That may be true, but it isn't obvious to me. The user can't simply
> say SELECT numeric_avg_deserialize(...) or whatever because the
> function won't execute when called that way. How would you arrange to
> call such a function in a context where it wouldn't fail outright but
> would return a value that you could then hand to some other function
> that expects a type-internal input?

The concern I have is that you could stick it into an aggregate that isn't the one it's expecting to be used in, or into a slot in that aggregate that isn't deserialize(), and the run-time test can't detect either of those things. Now maybe this is locked down sufficiently by the fact that we don't let non-superusers create aggregates with transtype INTERNAL, but that seems pretty shaky to me given the number of moving parts in aggregates these days and the fact that we keep adding more. But whether there's a live security bug there today is really moot, because of:

>> Not to mention that CREATE
>> FUNCTION won't allow creation of such functions, so extensions are locked
>> out of using this feature.

> Oops.

I think that means we *have* to change this.

>> A possible solution is to give deserialize an extra dummy argument, along
>> the lines of "deserialize(bytea, internal) returns internal", thereby
>> ensuring it can't be called in any non-system-originated contexts.

> Seems promising.

If nobody has a better idea, I'll pursue that next week or so. It doesn't seem like a must-fix-for-beta2. (Letting it slide means we're locked into an initdb or pg_upgrade again for beta3, but I don't have a lot of hope of avoiding that anyway.) Meanwhile, I'll go see if I can work out exactly what's broken about the polymorphic type-slinging. That I would like to see working in beta2.

> I originally had the thought that you might want to make the
> serialized representation something that could be human-readable.

Perhaps, but on efficiency grounds I doubt that people would want that to be the same API that's used for partial aggregation. I could see adding an additional function slot "prettyprint(internal) returns text" to aggregates for this purpose. In any case, even if we lock down the declared type to be bytea, there's nothing stopping someone from defining their serialization code to output an ASCII string.

> I think we should break up internal into various kinds of internal
> depending on what kind of a thing we've actually got a pointer to.

Not a bad long-term project, but it's not happening in this cycle. I'm not very sure how we'd go about it anyway --- for examples like this, every new user-defined aggregate potentially wants its own flavor of "internal", so how do we manage that?

			regards, tom lane
Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I think we should break up internal into various kinds of internal
>> depending on what kind of a thing we've actually got a pointer to.
>
> Not a bad long-term project, but it's not happening in this cycle.
> I'm not very sure how we'd go about it anyway --- for examples
> like this, every new user-defined aggregate potentially wants its
> own flavor of "internal", so how do we manage that?

Can't we have them be ExtensibleNode?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 17, 2016 at 3:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The concern I have is that you could stick it into an aggregate that isn't
> the one it's expecting to be used in, or into a slot in that aggregate
> that isn't deserialize(), and the run-time test can't detect either of
> those things. Now maybe this is locked down sufficiently by the fact
> that we don't let non-superusers create aggregates with transtype
> INTERNAL, but that seems pretty shaky to me given the number of moving
> parts in aggregates these days and the fact that we keep adding more.

Well, I'm not averse to changing it for more security, but "there could be a bug in there somewhere" is a bit different from "the claim in the comment there that it's okay if we check for aggregate context is a joke".

>>> Not to mention that CREATE
>>> FUNCTION won't allow creation of such functions, so extensions are locked
>>> out of using this feature.
>
>> Oops.
>
> I think that means we *have* to change this.

Well, we don't *have* to change things for this reason, but it's certainly not at all desirable for user-defined aggregates to be locked out of this functionality. So I'm in favor of changing it.

>> I think we should break up internal into various kinds of internal
>> depending on what kind of a thing we've actually got a pointer to.
>
> Not a bad long-term project, but it's not happening in this cycle.
> I'm not very sure how we'd go about it anyway --- for examples
> like this, every new user-defined aggregate potentially wants its
> own flavor of "internal", so how do we manage that?

I think we'd want some way to easily spin up new internal-ish types. CREATE TYPE myinternalthingy AS INTERNAL, or something like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I think we should break up internal into various kinds of internal
>>> depending on what kind of a thing we've actually got a pointer to.

>> Not a bad long-term project, but it's not happening in this cycle.
>> I'm not very sure how we'd go about it anyway --- for examples
>> like this, every new user-defined aggregate potentially wants its
>> own flavor of "internal", so how do we manage that?

> Can't we have them be ExtensibleNode?

No, these are data types and values of data types, not parsetree nodes. The core problem is to teach the type system to prevent you from sticking foo(internal) into a context where the actual value passed will be some other kind of "internal" than it's expecting.

Having said that, another possible solution is to establish a convention that every sort of "internal" thingy should have an ID word (with a different magic number for each kind of thingy) that all functions check. But I'm not convinced that that's going to be convenient to do everywhere, and it's certainly not likely to be easier to bolt on than a bunch of new internal-ish type OIDs.

			regards, tom lane
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Can't we have them be ExtensibleNode?
>
> No, these are data types and values of data types, not parsetree nodes.

Ah, right.

> The core problem is to teach the type system to prevent you from
> sticking foo(internal) into a context where the actual value passed
> will be some other kind of "internal" than it's expecting.

Makes sense as a safety measure.

> Having said that, another possible solution is to establish a convention
> that every sort of "internal" thingy should have an ID word (with a
> different magic number for each kind of thingy) that all functions check.
> But I'm not convinced that that's going to be convenient to do everywhere,
> and it's certainly not likely to be easier to bolt on than a bunch of new
> internal-ish type OIDs.

I think the magic number is a lot more convenient than new type OIDs. As the number of internal types grows, it will become a nuisance, considering that you need to add boilerplate for in/out/send/recv functions, document it as a pseudotype, yadda yadda yadda. Some central place to define the internal numbers to ensure uniqueness, akin to duplicate_oids, is probably a necessity, though, which may be a bit too much scripting to be indulging in, this late in the cycle.

I suppose we could have done pg_ddl_command this way, but it's already in 9.5 so it's probably too late for that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I wrote: > Meanwhile, I'll go see if I can work out exactly what's broken about the > polymorphic type-slinging. That I would like to see working in beta2. OK, I've got at least the core of the problem. fix_combine_agg_expr changes the upper aggregate's Aggref node so that its args list is a single Var of the transtype. This has nothing whatsoever to do with the original user-supplied aggregate arguments, not as to datatype nor even number of arguments. Nonetheless, nodeAgg.c uses that args list as input to get_aggregate_argtypes(), and thence to resolve_aggregate_transtype() and build_aggregate_finalfn_expr(), as well as some miscellaneous checks in build_pertrans_for_aggref (though by luck it appears that those checks aren't reachable for a combining Aggref). It's basically just luck that this ever works; we'd have noticed it long ago if it weren't that so few of the aggregates that currently have combinefns are either polymorphic or multiple-argument. Likely the only polymorphic one that ever got tested is anyarray_agg, and that accidentally fails to fail because (1) enforce_generic_type_consistency has special-case exceptions for ANYARRAY, and (2) the array-mashing code doesn't actually need to know which array type it's dealing with since it finds that out by looking into the array values. We could fix the resolve_aggregate_transtype aspect of this by storing the resolved transtype into Aggref nodes sometime before whacking them into partial format, rather than re-deducing it during nodeAgg startup. I'm inclined to do that just on efficiency grounds; but it's not enough to fix the problem here, because we'd still be passing the wrong argument types to build_aggregate_finalfn_expr(), and any finalfunc that actually used that information would fail. 
It seems like the best way to fix this is to add another field to Aggref that is an OID list of the original argument types, which we don't change when replacing the argument list for a combining agg, and to look to that rather than the physical args list when extracting the argument types in nodeAgg.c. This is kind of bulky and grotty, but it should work reliably. I also thought about leaving the original args list in place, and having a secondary args list used only for combining Aggrefs that contains just the Var referencing the subsidiary transtype result. An attraction of that is that we'd not need that Rube-Goldberg-worthy method for printing combining Aggrefs in ruleutils.c. However, the cruft would still leak out somewhere, because if we just did that much then setrefs.c would want to find all the aggregate input values in the output of the subsidiary plan node, where of course they aren't and shouldn't be. Maybe we could skip resolving the original args list into subplan references in this case, but that might well break ruleutils, so it just seems like a mess. Yet another idea is to have nodeAgg do something similar to what ruleutils does and drill down through the subsidiary-transtype-result Var to find the partial Aggref and grab its argument list. But that's just doubling down on a mechanism that we should be looking to get rid of not emulate. So at this point my proposal is: 1. Add an OID-list field to Aggref holding the data types of the user-supplied arguments. This can be filled in parse analysis since it won't change thereafter. Replace calls to get_aggregate_argtypes() with use of the OID-list field, or maybe keep the function but have it look at the OID list not the physical args list. 2. Add an OID field to Aggref holding the resolved (non polymorphic) transition data type's OID. For the moment we could just set this during parse analysis, since we do not support changing the transtype of an existing aggregate. 
If we ever decide we want to allow that, the lookup could be postponed into the planner. Replace calls to resolve_aggregate_transtype with use of this field. (resolve_aggregate_transtype() wouldn't disappear, but it would be invoked only once during parse analysis, not multiple times per query.) #2 isn't necessary to fix the bug, but as long as we are doing #1 we might as well do #2 too, to buy back some of the planner overhead added by parallel query. Barring objections I'll make this happen by tomorrow. I still don't like anything about the way that the matching logic in fix_combine_agg_expr works, but at the moment I can't point to any observable bug from that, so I'll not try to change it before beta2. regards, tom lane
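The shape of proposal items #1 and #2 can be sketched as follows. This is a hedged illustration only — the real Aggref node and its field names differ, and the fill step here is a mock of parse analysis — but it shows the point: once the argument-type list and the resolved transtype travel with the node, rewriting the physical args list for the combine phase loses nothing that nodeAgg.c needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical OID for illustration */
enum { INT8OID = 20 };

/* Simplified sketch of an Aggref node carrying the proposed fields:
 * the original argument types (#1) and the resolved transition type
 * (#2) are stored on the node itself. */
typedef struct Aggref
{
    unsigned aggfnoid;       /* OID of the aggregate function */
    int      aggtranstype;   /* resolved, non-polymorphic transtype OID */
    int      aggargtypes[8]; /* OIDs of the user-supplied argument types */
    int      numargtypes;
    void    *args;           /* physical args list; rewritten for combine */
} Aggref;

/* Filled once during parse analysis, per the proposal */
static void
fill_agg_types(Aggref *agg, const int *argtypes, int nargs, int transtype)
{
    for (int i = 0; i < nargs; i++)
        agg->aggargtypes[i] = argtypes[i];
    agg->numargtypes = nargs;
    agg->aggtranstype = transtype;
}
```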
On 18 June 2016 at 05:45, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The situation is much direr as far as serialize/deserialize are concerned. > These basically don't work at all for polymorphic transtypes: if you try > to declare "deserialize(bytea) returns anyarray", the type system won't > let you. Perhaps that's not an issue because you shouldn't really need > serialize/deserialize for anything except INTERNAL transtype. However, > there's a far bigger problem which is that "deserialize(bytea) returns > internal" is a blatant security hole, which I see you ignored the defense > against in opr_sanity.sql. The claim in the comment there that it's okay > if we check for aggregate context is a joke --- or haven't you heard that > we allow users to create aggregates? That's not nearly good enough to > prevent unsafe usage of such functions. Not to mention that CREATE > FUNCTION won't allow creation of such functions, so extensions are locked > out of using this feature. This is an unfortunate oversight, and it is my fault. > This has to be redesigned or else reverted entirely. I'm not going to > take no for an answer. I don't think anyone is against fixing any of the said issues. I personally was committed to other non-work related things Friday - Sunday and I was unable to do anything with this. > A possible solution is to give deserialize an extra dummy argument, along > the lines of "deserialize(bytea, internal) returns internal", thereby > ensuring it can't be called in any non-system-originated contexts. This > is still rather dangerous if the other argument is variable, as somebody > might be able to abuse an internal-taking function by naming it as the > deserialize function for a maliciously-designed aggregate. What I'm > inclined to do to lock it down further is to drop the "serialtype" > argument to CREATE AGGREGATE, which seems rather pointless (what else > would you ever use besides bytea?). 
Instead, insist that > serialize/deserialize apply *only* when the transtype is INTERNAL, and > their signatures are exactly "serialize(internal) returns bytea" and > "deserialize(bytea, internal) returns internal", never anything else. This is also the only way that I can think of to fix this issue. If we can agree that the fix should be to insist that the deserialisation function take an additional 2nd parameter of INTERNAL, then I can write a patch to fix this, and include a patch for the document section 35.10 to explain better about parallelising user defined aggregates. > A different way of locking things down, which might be cleaner in the > long run, is to invent a new pseudo-type for the sole purpose of being > the serialization type, that is we'd insist on the signatures being > "serialize(internal) returns serialized_internal" and > "deserialize(serialized_internal) returns internal", where > serialized_internal has the representation properties of bytea but is > usable for no other purpose than this. Not sure it's worth the trouble > though. I agree this idea has merit, but seems like quite a large project for 9.6 at this stage. Adding the extra parameter to the deserialisation function seems more reasonable at this stage, although once we get that we're likely stuck with it even if we devise a better way to handle all this later. > Anyway, setting aside the security angle, it doesn't seem like there is > anything wrong with the high-level design for polymorphic cases. I'm now > thinking the problem I saw is just a garden variety implementation bug > (or bugs). I'm still not very happy with the confusion around aggtype > though, not least because I suspect it contributed directly to this bug. > I also feel that both the code comments and the user-facing documentation > for this feature are well below project standards, eg where's the > discussion in section 35.10?
Thank you for committing the fix for the originally reported problem with the polymorphic aggregates. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 18 June 2016 at 09:29, Tom Lane <tgl@sss.pgh.pa.us> wrote: > So at this point my proposal is: > > 1. Add an OID-list field to Aggref holding the data types of the > user-supplied arguments. This can be filled in parse analysis since it > won't change thereafter. Replace calls to get_aggregate_argtypes() with > use of the OID-list field, or maybe keep the function but have it look > at the OID list not the physical args list. > > 2. Add an OID field to Aggref holding the resolved (non polymorphic) > transition data type's OID. For the moment we could just set this > during parse analysis, since we do not support changing the transtype > of an existing aggregate. If we ever decide we want to allow that, > the lookup could be postponed into the planner. Replace calls to > resolve_aggregate_transtype with use of this field. > (resolve_aggregate_transtype() wouldn't disappear, but it would be > invoked only once during parse analysis, not multiple times per query.) > > #2 isn't necessary to fix the bug, but as long as we are doing #1 > we might as well do #2 too, to buy back some of the planner overhead > added by parallel query. > > Barring objections I'll make this happen by tomorrow. Thanks for committing this change. From reading over the commit I see you've left: + * XXX need more documentation about partial aggregation here Would the attached cover off what you had imagined might go here? > I still don't like anything about the way that the matching logic in > fix_combine_agg_expr works, but at the moment I can't point to any > observable bug from that, so I'll not try to change it before beta2. Right, I see more work is needed in that area as there's now an out-of-date comment at the top of search_indexed_tlist_for_partial_aggref. The comment claims we only ignore aggoutputtype, but you added aggtranstype and aggargtypes to that list. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachments
On 20 June 2016 at 19:06, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 18 June 2016 at 05:45, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> A possible solution is to give deserialize an extra dummy argument, along >> the lines of "deserialize(bytea, internal) returns internal", thereby >> ensuring it can't be called in any non-system-originated contexts. This >> is still rather dangerous if the other argument is variable, as somebody >> might be able to abuse an internal-taking function by naming it as the >> deserialize function for a maliciously-designed aggregate. What I'm >> inclined to do to lock it down further is to drop the "serialtype" >> argument to CREATE AGGREGATE, which seems rather pointless (what else >> would you ever use besides bytea?). Instead, insist that >> serialize/deserialize apply *only* when the transtype is INTERNAL, and >> their signatures are exactly "serialize(internal) returns bytea" and >> "deserialize(bytea, internal) returns internal", never anything else. > > This is also the only way that I can think of to fix this issue. If we > can agree that the fix should be to insist that the deserialisation > function take an additional 2nd parameter of INTERNAL, then I can > write a patch to fix this, and include a patch for the document > section 35.10 to explain better about parallelising user defined > aggregates. I've gone and implemented the dummy argument approach for deserialization functions. If we go with this, I can then write the docs for 35.10 which'll serve to explain parallel user defined aggregates in detail. Some notes about the patch; I didn't remove the comments at the top of each deserial function which mention something like: * numeric_avg_serialize(numeric_avg_deserialize(bytea)) must result in a value * which matches the original bytea value. I'm thinking that perhaps these now make a little less sense, given that numeric_avg_deserialize is now numeric_avg_deserialize(bytea, internal). 
Perhaps these should be updated or removed. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachments
On 20 June 2016 at 22:19, David Rowley <david.rowley@2ndquadrant.com> wrote: > I've gone and implemented the dummy argument approach for > deserialization functions. > > If we go with this, I can then write the docs for 35.10 which'll serve > to explain parallel user defined aggregates in detail. I've attached my proposed patch for the user defined aggregate docs. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachments
David Rowley <david.rowley@2ndquadrant.com> writes: > I've gone and implemented the dummy argument approach for > deserialization functions. How do you feel about the further idea of locking down the signatures to be exactly "serialize(internal) returns bytea" and "deserialize(bytea, internal) returns internal", and removing pg_aggregate.aggserialtype? I don't see very much value in allowing any other nominal transmission type besides bytea; and the less flexibility in these function signatures, the less chance of confusion/misuse of other internal-related functions. regards, tom lane
I wrote: > David Rowley <david.rowley@2ndquadrant.com> writes: >> I've gone and implemented the dummy argument approach for >> deserialization functions. > How do you feel about the further idea of locking down the signatures > to be exactly "serialize(internal) returns bytea" and "deserialize(bytea, > internal) returns internal", and removing pg_aggregate.aggserialtype? > I don't see very much value in allowing any other nominal transmission > type besides bytea; and the less flexibility in these function signatures, > the less chance of confusion/misuse of other internal-related functions. Not hearing any objections, pushed that way. regards, tom lane
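The committed shape of the serialize/deserialize pair can be sketched in miniature. This is a hedged illustration with a hypothetical transition state and plain C stand-ins rather than real fmgr-calling-convention functions, but it mirrors the locked-down signatures: serialize flattens the internal state to bytes, and deserialize carries a dummy second argument that exists only so the function can never be matched from user-level SQL.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical internal transition state for an avg-style aggregate */
typedef struct AvgState { int64_t sum; int64_t count; } AvgState;

/* Mirrors "serialize(internal) returns bytea": flatten the state.
 * 'out' must point to at least sizeof(AvgState) bytes. */
static void
avg_serialize(const AvgState *state, unsigned char *out)
{
    memcpy(out, state, sizeof(AvgState));
}

/* Mirrors "deserialize(bytea, internal) returns internal".  The second
 * argument is the dummy: it is never used, and is present only to make
 * the signature uncallable from user SQL, as in the committed design. */
static void
avg_deserialize(const unsigned char *in, void *dummy, AvgState *out)
{
    (void) dummy;
    memcpy(out, in, sizeof(AvgState));
}
```

A round trip through the pair must reproduce the original state, which is the property the comments in numeric.c describe.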
On 23 June 2016 at 08:53, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> David Rowley <david.rowley@2ndquadrant.com> writes: >>> I've gone and implemented the dummy argument approach for >>> deserialization functions. > >> How do you feel about the further idea of locking down the signatures >> to be exactly "serialize(internal) returns bytea" and "deserialize(bytea, >> internal) returns internal", and removing pg_aggregate.aggserialtype? >> I don't see very much value in allowing any other nominal transmission >> type besides bytea; and the less flexibility in these function signatures, >> the less chance of confusion/misuse of other internal-related functions. I don't object to removing it. If we find a use for non-bytea serial types down the line, then we can add it again. > Not hearing any objections, pushed that way. Many thanks for committing those fixes. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes: > I've attached my proposed patch for the user defined aggregate docs. I whacked this around some more and pushed it. While working on that, I noticed what seems to me to be a minor bug. The behavior that I'd expect (and that I documented) for a deserialization function is that it just allocates its result in the current, short-lived memory context, since it will be the combine function's responsibility to merge that into the long-lived transition state. But it looks to me like the deserialization functions in numeric.c are allocating their results in the aggregate context, which will mean a leak. (For example, numeric_avg_deserialize creates its result using makeNumericAggState which forces the result into the agg context.) Now this leak isn't going to be real significant unless we accumulate a whole lot of partial results in one query, which would be unusual and maybe even impossible in the current parallel-query environment. But it still seems wrong. Please check, and submit a patch if I'm right about what's happening. regards, tom lane
On 23 June 2016 at 11:22, Tom Lane <tgl@sss.pgh.pa.us> wrote: > While working on that, I noticed what seems to me to be a minor bug. > The behavior that I'd expect (and that I documented) for a deserialization > function is that it just allocates its result in the current, short-lived > memory context, since it will be the combine function's responsibility to > merge that into the long-lived transition state. But it looks to me like > the deserialization functions in numeric.c are allocating their results > in the aggregate context, which will mean a leak. (For example, > numeric_avg_deserialize creates its result using makeNumericAggState > which forces the result into the agg context.) Yes, you're right. In the end I decided to add a makeNumericAggStateCurrentContext() function which does not perform any memory context switching at all. It seems like this can be used for the combine functions too, since they've already switched to the aggregate memory context. This should save a few cycles during aggregate combine, and not expend any extra as some alternatives, like adding a flag to makeNumericAggState(). -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachments
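The leak and its fix can be modelled with toy memory contexts. This sketch is purely illustrative — the contexts here just count allocations, standing in for palloc arenas — but it shows the difference between the makeNumericAggState-style pattern (which switches into the long-lived aggregate context) and the makeNumericAggStateCurrentContext-style pattern (which allocates wherever the caller already is, so a deserialization result lands in the short-lived context).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for PostgreSQL memory contexts: each context merely
 * counts its allocations so we can observe where a result lands. */
typedef struct MemoryContext { int nallocs; } MemoryContext;

static MemoryContext *CurrentMemoryContext;

static void *
ctx_alloc(size_t size)
{
    CurrentMemoryContext->nallocs++;
    return malloc(size);
}

static MemoryContext *
switch_to(MemoryContext *ctx)
{
    MemoryContext *old = CurrentMemoryContext;
    CurrentMemoryContext = ctx;
    return old;
}

/* Old pattern: forces the state into the long-lived aggregate context
 * even when called from a deserialization function -- the leak. */
static void *
make_state(MemoryContext *agg_context)
{
    MemoryContext *old = switch_to(agg_context);
    void *state = ctx_alloc(16);
    switch_to(old);
    return state;
}

/* Fixed pattern: allocate in whatever context is current; no switching. */
static void *
make_state_current_context(void)
{
    return ctx_alloc(16);
}
```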
David Rowley <david.rowley@2ndquadrant.com> writes: > On 23 June 2016 at 11:22, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The behavior that I'd expect (and that I documented) for a deserialization >> function is that it just allocates its result in the current, short-lived >> memory context, since it will be the combine function's responsibility to >> merge that into the long-lived transition state. But it looks to me like >> the deserialization functions in numeric.c are allocating their results >> in the aggregate context, which will mean a leak. (For example, >> numeric_avg_deserialize creates its result using makeNumericAggState >> which forces the result into the agg context.) > Yes, you're right. > In the end I decided to add a makeNumericAggStateCurrentContext() > function which does not perform any memory context switching at all. > It seems like this can be used for the combine functions too, since > they've already switched to the aggregate memory context. This should > save a few cycles during aggregate combine, and not expend any extra > as some alternatives, like adding a flag to makeNumericAggState(). You missed the ones using makePolyNumAggState --- I fixed that and pushed it. regards, tom lane
On Thu, Jun 23, 2016 at 10:57:26AM -0400, Tom Lane wrote: > David Rowley <david.rowley@2ndquadrant.com> writes: > > On 23 June 2016 at 11:22, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> The behavior that I'd expect (and that I documented) for a deserialization > >> function is that it just allocates its result in the current, short-lived > >> memory context, since it will be the combine function's responsibility to > >> merge that into the long-lived transition state. But it looks to me like > >> the deserialization functions in numeric.c are allocating their results > >> in the aggregate context, which will mean a leak. (For example, > >> numeric_avg_deserialize creates its result using makeNumericAggState > >> which forces the result into the agg context.) > > > Yes, you're right. > > > In the end I decided to add a makeNumericAggStateCurrentContext() > > function which does not perform any memory context switching at all. > > It seems like this can be used for the combine functions too, since > > they've already switched to the aggregate memory context. This should > > save a few cycles during aggregate combine, and not expend any extra > > as some alternatives, like adding a flag to makeNumericAggState(). > > You missed the ones using makePolyNumAggState --- I fixed that and > pushed it. What, if anything, is yet required to close this 9.6 open item?
Noah Misch <noah@leadboat.com> writes: > What, if anything, is yet required to close this 9.6 open item? The original complaint about polymorphic aggs is fixed to my satisfaction. The business about how non-text-format EXPLAIN reports parallel plans (<16002.1466972724@sss.pgh.pa.us>) remains, but perhaps we should view that as an independent issue. regards, tom lane