Обсуждение: How can I pushdown of functions used in targetlist with FDW ?
How can I pushdown of functions used in targetlist with FDW ? I know non-built-in functions can pushdown with WHERE clauses. https://www.postgresql.org/docs/12/postgres-fdw.html#id-1.11.7.42.10 But I would like to pushdown function when use targetlist. For example, I would like to change "Remote SQL: SELECT h FROM public.hs" to "Remote SQL: SELECT avals(h) FROM public.hs" in bellow case. postgres=# explain (verbose) SELECT avals(h) FROM tab_hstore; QUERY PLAN ------------------------------------------------------------------------------ Foreign Scan on public.tab_hstore (cost=100.00..157.52 rows=1462 width=32) Output: avals(h) Remote SQL: SELECT h FROM public.hs How to pushdown function in targetlist with FDW? And if it can not pushdown, do you have a plan to add features? Regards, Shigeo Hirose
shigeo Hirose <shigeo.hirose@toshiba.co.jp> writes: > How can I pushdown of functions used in targetlist with FDW ? There is, AFAIK, no provision for that. There's not a lot of reason to consider adding it either, because there's no reason to suppose that the remote node can run such a function any faster than the local node. So the difficulty and risk of determining/assuming that f(x) on the local node is the same as f(x) on the remote node doesn't seem like it'd be repaid. You can force matters by making a foreign table that points to a view on the remote side, where the view includes the function call as an output column. It wouldn't be perfectly transparent of course; you'd have to modify the local query to refer to that column rather than calling the function explicitly. regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > shigeo Hirose <shigeo.hirose@toshiba.co.jp> writes: > > How can I pushdown of functions used in targetlist with FDW ? > > There is, AFAIK, no provision for that. There's not a lot of > reason to consider adding it either, because there's no reason > to suppose that the remote node can run such a function any > faster than the local node. So the difficulty and risk of > determining/assuming that f(x) on the local node is the same > as f(x) on the remote node doesn't seem like it'd be repaid. Well, for starters, isn't there something in the SQL/MED spec about this..? Next, we already do this for aggregates. Also, I don't think I agree with this particular position- it's entirely possible that the remote server is much less loaded/busy than the local one and therefore it'd be beneficial, overall, to run that function on the remote system and not the local one. The function might even have side-effects or use other objects in the system to run. The function may also return a much smaller result than pulling back the raw data (of course, the opposite could also possibly be true). I guess my gut feeling is that, in general, we should push down as much of the query as possible, as we do for aggregates and joins and conditionals. I appreciate that there's some complications here when it comes to figuring out if it's possible/reasonable to push down something like, say, a volatile function that's locally defined and known to exist on the remote. Haven't got a particular idea how to address that offhand but that doesn't change my feelings that we should have a way to do this generally and that it would actually be useful to have. Thanks, Stephen
Вложения
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> There is, AFAIK, no provision for that. There's not a lot of >> reason to consider adding it either, because there's no reason >> to suppose that the remote node can run such a function any >> faster than the local node. > Also, I don't think I agree with this particular position- it's entirely > possible that the remote server is much less loaded/busy than the local > one and therefore it'd be beneficial, overall, to run that function on > the remote system and not the local one. Or the reverse. We have no way of estimating such effects. > The function might even have > side-effects or use other objects in the system to run. In such a case, we *can't* be pushing down. The entire foundational principle of this behavior is that we must be certain that a pushed-down operation has exactly the same semantics on either node. Otherwise, the planner's choices are not optimizations but query behavior changes, and we can't have that. If you've got a function like that to run on the remote end, you have to embed it in a remote view -- we simply don't have another option. > The function > may also return a much smaller result than pulling back the raw data (of > course, the opposite could also possibly be true). Yeah, this is a legitimate point, but again we have no very good way of estimating which is better. In general, there's an awful lot of postgres_fdw's behavior that depends on the assumption that the remote and local servers are pretty interchangeable, not least that we take the remote's cost numbers at face value when preparing cost numbers for a foreign scan. The only consideration that we really can reliably optimize push-down choices with is trying to reduce the volume of data transmitted, and we do that without consideration for whether the operations pushed across might take more or less time when run on the other server. Maybe someday that could be improved, but it seems like a nontrivial research project involving a lot more moving parts than just this point. regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> There is, AFAIK, no provision for that. There's not a lot of > >> reason to consider adding it either, because there's no reason > >> to suppose that the remote node can run such a function any > >> faster than the local node. > > > Also, I don't think I agree with this particular position- it's entirely > > possible that the remote server is much less loaded/busy than the local > > one and therefore it'd be beneficial, overall, to run that function on > > the remote system and not the local one. > > Or the reverse. We have no way of estimating such effects. Yeah, it'd be nice if there was an overall solution, but I don't know that we have to solve that to support this. > > The function might even have > > side-effects or use other objects in the system to run. > > In such a case, we *can't* be pushing down. The entire foundational > principle of this behavior is that we must be certain that a pushed-down > operation has exactly the same semantics on either node. Otherwise, > the planner's choices are not optimizations but query behavior changes, > and we can't have that. If you've got a function like that to run > on the remote end, you have to embed it in a remote view -- we simply > don't have another option. This is where things get complicated because I would argue that we *can* push down and in some cases we *have* to, but we definitely need to sort out when to do which and I think that's where the whole SQL/MED stuff and FOREIGN FUNCTIONs come into play. In other words, I see options like: a) Function is a FOREIGN FUNCTION, therefore we must push it down b) Function is only available locally and therefore we must run it locally c) Function is available in *both* places and expected to have the same behavior 'a' and 'b' are pretty clear. In an ideal world, we'd figure out where the *best* place to run 'c' is and then run it there. Perhaps there's a way to set up a cost model for it, but at least at the moment, we only have one cost for functions and it'd almost certainly be the same value for local as for remote in the 'c' case, which means we need something else. For my 2c, I'd be inclined to use a heuristic of "if it's actually possible for us to push it down, we should do so." Of course there'll be cases where it's terrible to do so and so it'd be good if there was some way for users to tell us which to do (maybe we have a GUC for it? The enable_* mechanism is an unfortunately large hammer but it might cover enough cases (enable_funcpushdown?)). > > The function > > may also return a much smaller result than pulling back the raw data (of > > course, the opposite could also possibly be true). > > Yeah, this is a legitimate point, but again we have no very good way > of estimating which is better. Yeah, I agree with that. > In general, there's an awful lot of postgres_fdw's behavior that depends > on the assumption that the remote and local servers are pretty > interchangeable, not least that we take the remote's cost numbers at > face value when preparing cost numbers for a foreign scan. The only > consideration that we really can reliably optimize push-down choices with > is trying to reduce the volume of data transmitted, and we do that without > consideration for whether the operations pushed across might take more or > less time when run on the other server. Maybe someday that could be > improved, but it seems like a nontrivial research project involving a > lot more moving parts than just this point. I agree with all of that- but it seems like we've got people asking for it, and I can understand why they are, and therefore I'd be inclined to provide a way for users to get that behavior and I'm even inclined to say that pushing down should be the 'default' if everything else is equal and it's possible to do so. Figuring out a way to decide which is likely to be better based on stats and other information could then be a research project to improve on the simple heuristic. Thanks, Stephen
Вложения
Stephen Frost <sfrost@snowman.net> writes: > I agree with all of that- but it seems like we've got people asking for > it, and I can understand why they are, and therefore I'd be inclined to > provide a way for users to get that behavior and I'm even inclined to > say that pushing down should be the 'default' if everything else is > equal and it's possible to do so. [ shrug... ] I think that's a completely arbitrary value judgment unsupported by any facts. Furthermore, it'd greatly expand our exposure to problems with "we thought this function behaves the same on the remote end, but it doesn't". That hazard makes me feel that the default should *never* be to encourage pushing down, unless there's a pretty clear potential win to justify the risk of breakage. If SQL/MED has a notion of a "foreign function" that is only executable on the remote side, and that fails if you try to execute it locally, then it'd make sense to implement that feature and then expect postgres_fdw to try hard to push down such calls. But we don't have that (yet). I'm unconvinced that ordinary functions ought to be treated as if remote execution is preferable. regards, tom lane
At Thu, 09 Jan 2020 14:12:25 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Stephen Frost <sfrost@snowman.net> writes: > > I agree with all of that- but it seems like we've got people asking for > > it, and I can understand why they are, and therefore I'd be inclined to > > provide a way for users to get that behavior and I'm even inclined to > > say that pushing down should be the 'default' if everything else is > > equal and it's possible to do so. > > [ shrug... ] I think that's a completely arbitrary value judgment > unsupported by any facts. Furthermore, it'd greatly expand our exposure > to problems with "we thought this function behaves the same on the remote > end, but it doesn't". That hazard makes me feel that the default should > *never* be to encourage pushing down, unless there's a pretty clear > potential win to justify the risk of breakage. > > If SQL/MED has a notion of a "foreign function" that is only executable on > the remote side, and that fails if you try to execute it locally, then > it'd make sense to implement that feature and then expect postgres_fdw to > try hard to push down such calls. But we don't have that (yet). I'm > unconvinced that ordinary functions ought to be treated as if remote > execution is preferable. Isn't ROUTING MAPPING [1] that? Definers should define one at their own risk as table constraints are. [1] https://www.postgresql.org/message-id/CAD21AoCSzZO%2Bd3EL-9w%3D97aC28%3DRb3%2BBh77wRLcP7HCH8-Bn9A%40mail.gmail.com regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > At Thu, 09 Jan 2020 14:12:25 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in >> If SQL/MED has a notion of a "foreign function" that is only executable on >> the remote side, and that fails if you try to execute it locally, then >> it'd make sense to implement that feature and then expect postgres_fdw to >> try hard to push down such calls. But we don't have that (yet). I'm >> unconvinced that ordinary functions ought to be treated as if remote >> execution is preferable. > Isn't ROUTING MAPPING [1] that? Definers should define one at their > own risk as table constraints are. Hmm. It looks like that patch is moribund, and I can't say that I'm excited about reviving it. Aside from the syntactical reserved-word problems, it seems like a mighty heavyweight way of attacking the issue --- that is, instead of a property directly attached to the function of interest, you've now got this entire other system catalog structure that has to be searched for relevant information. And that needs all the usual support for a new object type, eg ALTER infrastructure, pg_dump support, etc etc etc. Plus, once you've created a mapping, it's only good for one server so you have to do it over again for each server. I guess in the general case you have to have that, but I'd sure want some kind of shortcut for the common case where e.g. ABS() means the same thing everywhere. The SQL committee do love verbosity, don't they. regards, tom lane
At Thu, 09 Jan 2020 20:25:20 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > > Isn't ROUTING MAPPING [1] that? Definers should define one at their > > own risk as table constraints are. > > Hmm. It looks like that patch is moribund, and I can't say that I'm > excited about reviving it. Aside from the syntactical reserved-word > problems, it seems like a mighty heavyweight way of attacking the issue I don't mean that the patch itself is usable right now, but am simply asking about the feature. > --- that is, instead of a property directly attached to the function of > interest, you've now got this entire other system catalog structure that > has to be searched for relevant information. And that needs all the usual > support for a new object type, eg ALTER infrastructure, pg_dump support, > etc etc etc. Yeah, that sounds really combersome to me.. > Plus, once you've created a mapping, it's only good for one server so you > have to do it over again for each server. I guess in the general case > you have to have that, but I'd sure want some kind of shortcut for the > common case where e.g. ABS() means the same thing everywhere. As for the user side, I understand that they want to push down certain functions despite of the troubles, but it's better if simpler. About the idea of function attrributes, I think push-down'ability is not one of function's properties. postgres_fdw can push down many intrinsic functions. oracle_fdw can push down less functions. file_fdw cannot at all. So it doesn't seem rather an ability of FDW drivers. Putting aside the maintenance trouble, it would work for intrinsic functions. Even with the feature, we still cannot push down user-defined functions, or incompatible intrinsic functions, which the user wants to push down knowing about the incompatibility. If we allow that, we need to have a means to tell foreign servers about such functions. A means other than ROUTINE MAPPING I can come up with is server options and foreign table options. > The SQL committee do love verbosity, don't they. Sure.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: How can I pushdown of functions used in targetlist with FDW ?
От
hirose shigeo(廣瀬 繁雄 ○SWC□OST)
Дата:
Thank you for informations I didn't know "CREATE ROUTINE MAPPING" thread. In my development, it may be necessary to push down features, whether they are remote only or remote and local. Now I understand community concerns about function pushdown. I will investigate more and if needed I will create new design and post to hackers. Regards, Shigeo Hirose On 2020/01/10 12:24, Kyotaro Horiguchi wrote: > At Thu, 09 Jan 2020 20:25:20 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in >> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: >>> Isn't ROUTING MAPPING [1] that? Definers should define one at their >>> own risk as table constraints are. >> >> Hmm. It looks like that patch is moribund, and I can't say that I'm >> excited about reviving it. Aside from the syntactical reserved-word >> problems, it seems like a mighty heavyweight way of attacking the issue > > I don't mean that the patch itself is usable right now, but am simply > asking about the feature. > >> --- that is, instead of a property directly attached to the function of >> interest, you've now got this entire other system catalog structure that >> has to be searched for relevant information. And that needs all the usual >> support for a new object type, eg ALTER infrastructure, pg_dump support, >> etc etc etc. > > Yeah, that sounds really combersome to me.. > >> Plus, once you've created a mapping, it's only good for one server so you >> have to do it over again for each server. I guess in the general case >> you have to have that, but I'd sure want some kind of shortcut for the >> common case where e.g. ABS() means the same thing everywhere. > > As for the user side, I understand that they want to push down certain > functions despite of the troubles, but it's better if simpler. > > About the idea of function attrributes, I think push-down'ability is > not one of function's properties. postgres_fdw can push down many > intrinsic functions. oracle_fdw can push down less functions. file_fdw > cannot at all. So it doesn't seem rather an ability of FDW drivers. > Putting aside the maintenance trouble, it would work for intrinsic > functions. > > Even with the feature, we still cannot push down user-defined > functions, or incompatible intrinsic functions, which the user wants > to push down knowing about the incompatibility. If we allow that, we > need to have a means to tell foreign servers about such functions. A > means other than ROUTINE MAPPING I can come up with is server options > and foreign table options. > >> The SQL committee do love verbosity, don't they. > > Sure.. > > regards. >