Обсуждение: How can I pushdown of functions used in targetlist with FDW ?

Поиск
Список
Период
Сортировка

How can I pushdown of functions used in targetlist with FDW ?

От
shigeo Hirose
Дата:
How can I pushdown of functions used in targetlist with FDW ?

I know non-built-in functions can pushdown with WHERE clauses.
https://www.postgresql.org/docs/12/postgres-fdw.html#id-1.11.7.42.10

But I would like to pushdown function when use targetlist.
For example, I would like to change "Remote SQL: SELECT h FROM 
public.hs" to "Remote SQL: SELECT avals(h) FROM public.hs" in bellow case.

postgres=# explain (verbose) SELECT avals(h) FROM tab_hstore;
                                   QUERY PLAN
------------------------------------------------------------------------------
  Foreign Scan on public.tab_hstore  (cost=100.00..157.52 rows=1462 
width=32)
    Output: avals(h)
    Remote SQL: SELECT h FROM public.hs


How to pushdown function in targetlist with FDW?
And if it can not pushdown, do you have a plan to add features?

Regards,
Shigeo Hirose



Re: How can I pushdown of functions used in targetlist with FDW ?

От
Tom Lane
Дата:
shigeo Hirose <shigeo.hirose@toshiba.co.jp> writes:
> How can I pushdown of functions used in targetlist with FDW ?

There is, AFAIK, no provision for that.  There's not a lot of
reason to consider adding it either, because there's no reason
to suppose that the remote node can run such a function any
faster than the local node.  So the difficulty and risk of
determining/assuming that f(x) on the local node is the same
as f(x) on the remote node doesn't seem like it'd be repaid.

You can force matters by making a foreign table that points to a view
on the remote side, where the view includes the function call as an
output column.  It wouldn't be perfectly transparent of course; you'd
have to modify the local query to refer to that column rather than
calling the function explicitly.

            regards, tom lane



Re: How can I pushdown of functions used in targetlist with FDW ?

От
Stephen Frost
Дата:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> shigeo Hirose <shigeo.hirose@toshiba.co.jp> writes:
> > How can I pushdown of functions used in targetlist with FDW ?
>
> There is, AFAIK, no provision for that.  There's not a lot of
> reason to consider adding it either, because there's no reason
> to suppose that the remote node can run such a function any
> faster than the local node.  So the difficulty and risk of
> determining/assuming that f(x) on the local node is the same
> as f(x) on the remote node doesn't seem like it'd be repaid.

Well, for starters, isn't there something in the SQL/MED spec about
this..?

Next, we already do this for aggregates.

Also, I don't think I agree with this particular position- it's entirely
possible that the remote server is much less loaded/busy than the local
one and therefore it'd be beneficial, overall, to run that function on
the remote system and not the local one.  The function might even have
side-effects or use other objects in the system to run.  The function
may also return a much smaller result than pulling back the raw data (of
course, the opposite could also possibly be true).

I guess my gut feeling is that, in general, we should push down as much
of the query as possible, as we do for aggregates and joins and
conditionals.  I appreciate that there's some complications here when it
comes to figuring out if it's possible/reasonable to push down something
like, say, a volatile function that's locally defined and known to exist
on the remote.  Haven't got a particular idea how to address that
offhand but that doesn't change my feelings that we should have a way to
do this generally and that it would actually be useful to have.

Thanks,

Stephen

Вложения

Re: How can I pushdown of functions used in targetlist with FDW ?

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> There is, AFAIK, no provision for that.  There's not a lot of
>> reason to consider adding it either, because there's no reason
>> to suppose that the remote node can run such a function any
>> faster than the local node.

> Also, I don't think I agree with this particular position- it's entirely
> possible that the remote server is much less loaded/busy than the local
> one and therefore it'd be beneficial, overall, to run that function on
> the remote system and not the local one.

Or the reverse.  We have no way of estimating such effects.

> The function might even have
> side-effects or use other objects in the system to run.

In such a case, we *can't* be pushing down.  The entire foundational
principle of this behavior is that we must be certain that a pushed-down
operation has exactly the same semantics on either node.  Otherwise,
the planner's choices are not optimizations but query behavior changes,
and we can't have that.  If you've got a function like that to run
on the remote end, you have to embed it in a remote view -- we simply
don't have another option.

> The function
> may also return a much smaller result than pulling back the raw data (of
> course, the opposite could also possibly be true).

Yeah, this is a legitimate point, but again we have no very good way
of estimating which is better.

In general, there's an awful lot of postgres_fdw's behavior that depends
on the assumption that the remote and local servers are pretty
interchangeable, not least that we take the remote's cost numbers at
face value when preparing cost numbers for a foreign scan.  The only
consideration that we really can reliably optimize push-down choices with
is trying to reduce the volume of data transmitted, and we do that without
consideration for whether the operations pushed across might take more or
less time when run on the other server.  Maybe someday that could be
improved, but it seems like a nontrivial research project involving a
lot more moving parts than just this point.

            regards, tom lane



Re: How can I pushdown of functions used in targetlist with FDW ?

От
Stephen Frost
Дата:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> There is, AFAIK, no provision for that.  There's not a lot of
> >> reason to consider adding it either, because there's no reason
> >> to suppose that the remote node can run such a function any
> >> faster than the local node.
>
> > Also, I don't think I agree with this particular position- it's entirely
> > possible that the remote server is much less loaded/busy than the local
> > one and therefore it'd be beneficial, overall, to run that function on
> > the remote system and not the local one.
>
> Or the reverse.  We have no way of estimating such effects.

Yeah, it'd be nice if there was an overall solution, but I don't know
that we have to solve that to support this.

> > The function might even have
> > side-effects or use other objects in the system to run.
>
> In such a case, we *can't* be pushing down.  The entire foundational
> principle of this behavior is that we must be certain that a pushed-down
> operation has exactly the same semantics on either node.  Otherwise,
> the planner's choices are not optimizations but query behavior changes,
> and we can't have that.  If you've got a function like that to run
> on the remote end, you have to embed it in a remote view -- we simply
> don't have another option.

This is where things get complicated because I would argue that we *can*
push down and in some cases we *have* to, but we definitely need to sort
out when to do which and I think that's where the whole SQL/MED stuff
and FOREIGN FUNCTIONs come into play.

In other words, I see options like:

a) Function is a FOREIGN FUNCTION, therefore we must push it down
b) Function is only available locally and therefore we must run it
   locally
c) Function is available in *both* places and expected to have the same
   behavior

'a' and 'b' are pretty clear.  In an ideal world, we'd figure out where
the *best* place to run 'c' is and then run it there.  Perhaps there's a
way to set up a cost model for it, but at least at the moment, we only
have one cost for functions and it'd almost certainly be the same value
for local as for remote in the 'c' case, which means we need something
else.  For my 2c, I'd be inclined to use a heuristic of "if it's
actually possible for us to push it down, we should do so."  Of course
there'll be cases where it's terrible to do so and so it'd be good if
there was some way for users to tell us which to do (maybe we have a GUC
for it?  The enable_* mechanism is an unfortunately large hammer but it
might cover enough cases (enable_funcpushdown?)).

> > The function
> > may also return a much smaller result than pulling back the raw data (of
> > course, the opposite could also possibly be true).
>
> Yeah, this is a legitimate point, but again we have no very good way
> of estimating which is better.

Yeah, I agree with that.

> In general, there's an awful lot of postgres_fdw's behavior that depends
> on the assumption that the remote and local servers are pretty
> interchangeable, not least that we take the remote's cost numbers at
> face value when preparing cost numbers for a foreign scan.  The only
> consideration that we really can reliably optimize push-down choices with
> is trying to reduce the volume of data transmitted, and we do that without
> consideration for whether the operations pushed across might take more or
> less time when run on the other server.  Maybe someday that could be
> improved, but it seems like a nontrivial research project involving a
> lot more moving parts than just this point.

I agree with all of that- but it seems like we've got people asking for
it, and I can understand why they are, and therefore I'd be inclined to
provide a way for users to get that behavior and I'm even inclined to
say that pushing down should be the 'default' if everything else is
equal and it's possible to do so.  Figuring out a way to decide which is
likely to be better based on stats and other information could then be a
research project to improve on the simple heuristic.

Thanks,

Stephen

Вложения

Re: How can I pushdown of functions used in targetlist with FDW ?

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> I agree with all of that- but it seems like we've got people asking for
> it, and I can understand why they are, and therefore I'd be inclined to
> provide a way for users to get that behavior and I'm even inclined to
> say that pushing down should be the 'default' if everything else is
> equal and it's possible to do so.

[ shrug... ]  I think that's a completely arbitrary value judgment
unsupported by any facts.  Furthermore, it'd greatly expand our exposure
to problems with "we thought this function behaves the same on the remote
end, but it doesn't".  That hazard makes me feel that the default should
*never* be to encourage pushing down, unless there's a pretty clear
potential win to justify the risk of breakage.

If SQL/MED has a notion of a "foreign function" that is only executable on
the remote side, and that fails if you try to execute it locally, then
it'd make sense to implement that feature and then expect postgres_fdw to
try hard to push down such calls.  But we don't have that (yet).  I'm
unconvinced that ordinary functions ought to be treated as if remote
execution is preferable.

            regards, tom lane



Re: How can I pushdown of functions used in targetlist with FDW ?

От
Kyotaro Horiguchi
Дата:
At Thu, 09 Jan 2020 14:12:25 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Stephen Frost <sfrost@snowman.net> writes:
> > I agree with all of that- but it seems like we've got people asking for
> > it, and I can understand why they are, and therefore I'd be inclined to
> > provide a way for users to get that behavior and I'm even inclined to
> > say that pushing down should be the 'default' if everything else is
> > equal and it's possible to do so.
> 
> [ shrug... ]  I think that's a completely arbitrary value judgment
> unsupported by any facts.  Furthermore, it'd greatly expand our exposure
> to problems with "we thought this function behaves the same on the remote
> end, but it doesn't".  That hazard makes me feel that the default should
> *never* be to encourage pushing down, unless there's a pretty clear
> potential win to justify the risk of breakage.
> 
> If SQL/MED has a notion of a "foreign function" that is only executable on
> the remote side, and that fails if you try to execute it locally, then
> it'd make sense to implement that feature and then expect postgres_fdw to
> try hard to push down such calls.  But we don't have that (yet).  I'm
> unconvinced that ordinary functions ought to be treated as if remote
> execution is preferable.

Isn't ROUTING MAPPING [1] that?  Definers should define one at their
own risk as table constraints are.

[1] https://www.postgresql.org/message-id/CAD21AoCSzZO%2Bd3EL-9w%3D97aC28%3DRb3%2BBh77wRLcP7HCH8-Bn9A%40mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: How can I pushdown of functions used in targetlist with FDW ?

От
Tom Lane
Дата:
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> At Thu, 09 Jan 2020 14:12:25 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
>> If SQL/MED has a notion of a "foreign function" that is only executable on
>> the remote side, and that fails if you try to execute it locally, then
>> it'd make sense to implement that feature and then expect postgres_fdw to
>> try hard to push down such calls.  But we don't have that (yet).  I'm
>> unconvinced that ordinary functions ought to be treated as if remote
>> execution is preferable.

> Isn't ROUTING MAPPING [1] that?  Definers should define one at their
> own risk as table constraints are.

Hmm.  It looks like that patch is moribund, and I can't say that I'm
excited about reviving it.  Aside from the syntactical reserved-word
problems, it seems like a mighty heavyweight way of attacking the issue
--- that is, instead of a property directly attached to the function of
interest, you've now got this entire other system catalog structure that
has to be searched for relevant information.  And that needs all the usual
support for a new object type, eg ALTER infrastructure, pg_dump support,
etc etc etc.

Plus, once you've created a mapping, it's only good for one server so you
have to do it over again for each server.  I guess in the general case
you have to have that, but I'd sure want some kind of shortcut for the
common case where e.g. ABS() means the same thing everywhere.

The SQL committee do love verbosity, don't they.

            regards, tom lane



Re: How can I pushdown of functions used in targetlist with FDW ?

От
Kyotaro Horiguchi
Дата:
At Thu, 09 Jan 2020 20:25:20 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> > Isn't ROUTING MAPPING [1] that?  Definers should define one at their
> > own risk as table constraints are.
> 
> Hmm.  It looks like that patch is moribund, and I can't say that I'm
> excited about reviving it.  Aside from the syntactical reserved-word
> problems, it seems like a mighty heavyweight way of attacking the issue

I don't mean that the patch itself is usable right now, but am simply
asking about the feature.

> --- that is, instead of a property directly attached to the function of
> interest, you've now got this entire other system catalog structure that
> has to be searched for relevant information.  And that needs all the usual
> support for a new object type, eg ALTER infrastructure, pg_dump support,
> etc etc etc.

Yeah, that sounds really combersome to me..

> Plus, once you've created a mapping, it's only good for one server so you
> have to do it over again for each server.  I guess in the general case
> you have to have that, but I'd sure want some kind of shortcut for the
> common case where e.g. ABS() means the same thing everywhere.

As for the user side, I understand that they want to push down certain
functions despite of the troubles, but it's better if simpler.

About the idea of function attrributes, I think push-down'ability is
not one of function's properties. postgres_fdw can push down many
intrinsic functions. oracle_fdw can push down less functions. file_fdw
cannot at all. So it doesn't seem rather an ability of FDW drivers.
Putting aside the maintenance trouble, it would work for intrinsic
functions.

Even with the feature, we still cannot push down user-defined
functions, or incompatible intrinsic functions, which the user wants
to push down knowing about the incompatibility.  If we allow that, we
need to have a means to tell foreign servers about such functions.  A
means other than ROUTINE MAPPING I can come up with is server options
and foreign table options.

> The SQL committee do love verbosity, don't they.

Sure..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: How can I pushdown of functions used in targetlist with FDW ?

От
hirose shigeo(廣瀬 繁雄 ○SWC□OST)
Дата:
Thank you for informations
I didn't know "CREATE ROUTINE MAPPING" thread.
In my development, it may be necessary to push down features, whether 
they are remote only or remote and local.
Now I understand community concerns about function pushdown.
I will investigate more and if needed I will create new design and post 
to hackers.

Regards,

Shigeo Hirose


On 2020/01/10 12:24, Kyotaro Horiguchi wrote:
> At Thu, 09 Jan 2020 20:25:20 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in
>> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
>>> Isn't ROUTING MAPPING [1] that?  Definers should define one at their
>>> own risk as table constraints are.
>>
>> Hmm.  It looks like that patch is moribund, and I can't say that I'm
>> excited about reviving it.  Aside from the syntactical reserved-word
>> problems, it seems like a mighty heavyweight way of attacking the issue
> 
> I don't mean that the patch itself is usable right now, but am simply
> asking about the feature.
> 
>> --- that is, instead of a property directly attached to the function of
>> interest, you've now got this entire other system catalog structure that
>> has to be searched for relevant information.  And that needs all the usual
>> support for a new object type, eg ALTER infrastructure, pg_dump support,
>> etc etc etc.
> 
> Yeah, that sounds really combersome to me..
> 
>> Plus, once you've created a mapping, it's only good for one server so you
>> have to do it over again for each server.  I guess in the general case
>> you have to have that, but I'd sure want some kind of shortcut for the
>> common case where e.g. ABS() means the same thing everywhere.
> 
> As for the user side, I understand that they want to push down certain
> functions despite of the troubles, but it's better if simpler.
> 
> About the idea of function attrributes, I think push-down'ability is
> not one of function's properties. postgres_fdw can push down many
> intrinsic functions. oracle_fdw can push down less functions. file_fdw
> cannot at all. So it doesn't seem rather an ability of FDW drivers.
> Putting aside the maintenance trouble, it would work for intrinsic
> functions.
> 
> Even with the feature, we still cannot push down user-defined
> functions, or incompatible intrinsic functions, which the user wants
> to push down knowing about the incompatibility.  If we allow that, we
> need to have a means to tell foreign servers about such functions.  A
> means other than ROUTINE MAPPING I can come up with is server options
> and foreign table options.
> 
>> The SQL committee do love verbosity, don't they.
> 
> Sure..
> 
> regards.
>