Обсуждение: jsonb, collection & postgres_fdw

Поиск
Список
Период
Сортировка

jsonb, collection & postgres_fdw

От
Konstantin Knizhnik
Дата:
Hi hackers,

Right now jsonb functions are treated as non-shippable by postgres_fdw 
and so predicates with them are not pushed down to foreign server:

create table jt(content jsonb);
create extension postgres_fdw;
create server pg_fdw  FOREIGN DATA WRAPPER postgres_fdw options(host 
'127.0.0.1', dbname 'postgres');
create user mapping for current_user server pg_fdw options (user 
'postgres');
create foreign table fjt(content jsonb) server pg_fdw options 
(table_name 'jt');
postgres=# explain select * from fjt where jsonb_exists(content, 'some');
                           QUERY PLAN
--------------------------------------------------------------
  Foreign Scan on fjt  (cost=100.00..157.50 rows=487 width=32)
    Filter: jsonb_exists(content, 'some'::text)

It is because of the following check  in postgres_fdw:

                 /*
                  * If function's input collation is not derived from a 
foreign
                  * Var, it can't be sent to remote.
                  */
                 if (fe->inputcollid == InvalidOid)
                      /* OK, inputs are all noncollatable */ ;
                 else if (inner_cxt.state != FDW_COLLATE_SAFE ||
                          fe->inputcollid != inner_cxt.collation)
                     return false;

In my case
(gdb) p fe->inputcollid
$1 = 100
(gdb) p inner_cxt.collation
$3 = 0
(gdb) p inner_cxt.state
$4 = FDW_COLLATE_NONE


I wonder if there is some way of making postgres_fdw to push this this 
function to foreign server?
May be this check should be changed to:

                 if (fe->inputcollid == InvalidOid || inner_cxt.state == 
FDW_COLLATE_NONE)
                      /* OK, inputs are all noncollatable */ ;




Re: jsonb, collection & postgres_fdw

От
Tom Lane
Дата:
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> Right now jsonb functions are treated as non-shippable by postgres_fdw 
> and so predicates with them are not pushed down to foreign server:

Yeah, that's kind of annoying, but breaking the collation check
is not an acceptable fix.  And what you're proposing *does* break it.
The issue here is that the function's input collation is coming from
the default collation applied to the text constant, and we can't assume
that that will be the same on the remote side.

In reality, of course, jsonb_exists doesn't care about its input collation
--- but postgres_fdw has no way to know that.  I don't see any easy way
around that.

One idea that would probably work in a lot of postgres_fdw usage scenarios
is to have a foreign-server-level flag that says "all the collations on
that server behave the same as the local ones, and the default collation
is the same too", and then we just skip the collation checking altogether.
But I'm a bit worried that if someone mistakenly sets that flag, the
misbehavior will be very hard to detect.

            regards, tom lane



Re: jsonb, collection & postgres_fdw

От
Konstantin Knizhnik
Дата:

On 13.08.2020 20:00, Tom Lane wrote:
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>> Right now jsonb functions are treated as non-shippable by postgres_fdw
>> and so predicates with them are not pushed down to foreign server:
> Yeah, that's kind of annoying, but breaking the collation check
> is not an acceptable fix.  And what you're proposing *does* break it.
> The issue here is that the function's input collation is coming from
> the default collation applied to the text constant, and we can't assume
> that that will be the same on the remote side.
>
> In reality, of course, jsonb_exists doesn't care about its input collation
> --- but postgres_fdw has no way to know that.  I don't see any easy way
> around that.
>
> One idea that would probably work in a lot of postgres_fdw usage scenarios
> is to have a foreign-server-level flag that says "all the collations on
> that server behave the same as the local ones, and the default collation
> is the same too", and then we just skip the collation checking altogether.
> But I'm a bit worried that if someone mistakenly sets that flag, the
> misbehavior will be very hard to detect.
>
>             regards, tom lane
Thank you for clarification.
And sorry for mistyping in topic (there should be "collation" instead of 
"collection").
Actually I do not know much about handling collations in Postgres and 
particularly in postgres_fdw.
Can you (or somebody else) provide more information about this fragment 
of code:
                 /*
                  * If function's input collation is not derived from a 
foreign
                  * Var, it can't be sent to remote.
                  */
                 if (fe->inputcollid == InvalidOid)
                      /* OK, inputs are all noncollatable */ ;
                 else if (inner_cxt.state != FDW_COLLATE_SAFE ||
                          fe->inputcollid != inner_cxt.collation)
                     return false;

So we have function call expression which arguments have associated 
collation,
but function itself is collection-neutral: funccollid = 0
Why it is not safe to push this function call to the remote server?
Why it breaks collation check?
If there are some unsafe operations with collations during argument 
evaluation, then
we detect it while recursive processing of arguments.

I agree that my proposed fix is not correct.
But what about this check:

                 if (fe->inputcollid == InvalidOid)
                      /* OK, inputs are all noncollatable */ ;
                 else if (fe->funccollid == InvalidOid)
                      /* OK, function is noncollatable */ ;

Or funccollid=0 doesn't mean that collations of function arguments do 
not affect function behavior?




Re: jsonb, collection & postgres_fdw

От
Tom Lane
Дата:
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> Or funccollid=0 doesn't mean that collations of function arguments do 
> not affect function behavior?

No, it does not.  As I said already, there is no way to tell from outside
a function whether it pays attention to collation or not.  funccollid
is the collation to ascribe to the function's *output*, but that's always
zero for a non-collatable output type such as boolean.  An example
is text_lt(), which returns boolean but surely does depend on the input
collation.  We don't really have any way to distinguish between that and
jsonb_exists().

In hindsight, it was probably a bad idea not to have a way to mark whether
functions care about collation.  I don't know if it'd be practical to
retrofit such a marker now.

            regards, tom lane



Re: jsonb, collection & postgres_fdw

От
Bharath Rupireddy
Дата:
On Thu, Aug 13, 2020 at 8:54 PM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
>
> Right now jsonb functions are treated as non-shippable by postgres_fdw
> and so predicates with them are not pushed down to foreign server:
>
> I wonder if there is some way of making postgres_fdw to push this this
> function to foreign server?
> May be this check should be changed to:
>
>                  if (fe->inputcollid == InvalidOid || inner_cxt.state ==
> FDW_COLLATE_NONE)
>                       /* OK, inputs are all noncollatable */ ;
>

I think, in general, we may want to push the some of the local
functions that may filter out tuples/rows to remote backend to reduce
the data transfer(assuming collation and other settings are similar to
that of the local backend), but definitely, not this way. One possible
issue could be that, what if these functions are supported/installed
on the local server, but not on the remote? May be because the remote
postgres server version is different than that of the local? Is there
a version check between local and remote servers in postgres_fdw?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: jsonb, collection & postgres_fdw

От
Konstantin Knizhnik
Дата:

On 14.08.2020 09:40, Bharath Rupireddy wrote:
> On Thu, Aug 13, 2020 at 8:54 PM Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> Right now jsonb functions are treated as non-shippable by postgres_fdw
>> and so predicates with them are not pushed down to foreign server:
>>
>> I wonder if there is some way of making postgres_fdw to push this this
>> function to foreign server?
>> May be this check should be changed to:
>>
>>                   if (fe->inputcollid == InvalidOid || inner_cxt.state ==
>> FDW_COLLATE_NONE)
>>                        /* OK, inputs are all noncollatable */ ;
>>
> I think, in general, we may want to push the some of the local
> functions that may filter out tuples/rows to remote backend to reduce
> the data transfer(assuming collation and other settings are similar to
> that of the local backend), but definitely, not this way. One possible
> issue could be that, what if these functions are supported/installed
> on the local server, but not on the remote? May be because the remote
> postgres server version is different than that of the local? Is there
> a version check between local and remote servers in postgres_fdw?
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com

Right now postgres_fdw treat as shippable only builtin functions or 
functions from extensions explicitly specified as shippable extensions 
in parameters of this FDW server. So I do no see a problem here. Yes, 
foreign server may have different version of Postgres which doesn't have
this built-in function or its  profile is different. It can happen if 
postgres_fdw is used to connect two different servers which are 
maintained independently. But in most cases I think, postgres_fdw is 
used to organize some kind of cluster. In this case all nodes are 
identical (hardware, OS, postgres version) and performance is very 
critical (because scalability - of one of the goal of replacing single 
node with cluster).
This is why push down of predicates is very critical in this case.

I still do not completely understand current criteria of shippable 
functions.
I understood Tom's explanation, but:

postgres=# create table t1(t text collate "C");
CREATE TABLE
postgres=# create foreign table ft1(t text collate "ru_RU") server 
pg_fdw options (table_name 't1');
CREATE FOREIGN TABLE
postgres=# explain select * from ft1 where lower(t)='some';
                          QUERY PLAN
------------------------------------------------------------
  Foreign Scan on ft1  (cost=100.00..132.07 rows=7 width=32)
(1 row)

lower(t) is pushed to remote server despite to the fact that "t" has 
different collations at local and remote servers.
Also when initialize postgres database, you can specify default collation.
I have not found any place in postgres_fdw which tries to check if 
default collation of remote and local servers are the same
or specify collation explicitly when them are different.

 From my point of view, it will be nice to have flag in postgres_fdw 
server indicating that foreign and remote servers are identical
and treat all functions as shippable in this case (not only built-in 
ones are belonging to explicitly specified shippable extensions).
It will simplify using postres_fdw in clusters and makes it more efficient.





Re: jsonb, collection & postgres_fdw

От
Tom Lane
Дата:
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> I still do not completely understand current criteria of shippable 
> functions.
> I understood Tom's explanation, but:

> postgres=# create table t1(t text collate "C");
> CREATE TABLE
> postgres=# create foreign table ft1(t text collate "ru_RU") server 
> pg_fdw options (table_name 't1');
> CREATE FOREIGN TABLE
> postgres=# explain select * from ft1 where lower(t)='some';
>                           QUERY PLAN
> ------------------------------------------------------------
>   Foreign Scan on ft1  (cost=100.00..132.07 rows=7 width=32)
> (1 row)

> lower(t) is pushed to remote server despite to the fact that "t" has 
> different collations at local and remote servers.

Well, that's the case because you lied while creating the foreign
table.  We have no practical way to cross-check whether the foreign
table's declaration is an accurate representation of the remote table,
so we just take it on faith that it is.

The problem that the collation check is trying to solve is that we
can't safely push COLLATE clauses to the remote server, because it
may not have the same set of collation names as the local server.
So we can only push clauses whose collation is entirely derivable
from the table column(s) they use.  And then, per the above, we rely on
the user to make sure that the local and remote columns have equivalent
collations.  (Which conceivably would have different names.)

>  From my point of view, it will be nice to have flag in postgres_fdw 
> server indicating that foreign and remote servers are identical
> and treat all functions as shippable in this case (not only built-in 
> ones are belonging to explicitly specified shippable extensions).

Perhaps, but not everyone has that use-case.  I'd even argue that it's
a minority use-case.

            regards, tom lane



Re: jsonb, collection & postgres_fdw

От
Bharath Rupireddy
Дата:
On Fri, Aug 14, 2020 at 12:46 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
>
> Right now postgres_fdw treat as shippable only builtin functions or
> functions from extensions explicitly specified as shippable extensions
> in parameters of this FDW server. So I do no see a problem here. Yes,
> foreign server may have different version of Postgres which doesn't have
> this built-in function or its  profile is different. It can happen if
> postgres_fdw is used to connect two different servers which are
> maintained independently. But in most cases I think, postgres_fdw is
> used to organize some kind of cluster. In this case all nodes are
> identical (hardware, OS, postgres version) and performance is very
> critical (because scalability - of one of the goal of replacing single
> node with cluster).
> This is why push down of predicates is very critical in this case.
>

Agree, push down of predicates(with functions) to the remote backend helps a lot. But, is it safe to push all the functions? For instance, functions that deal with time/time zones, volatile functions etc. I'm not exactly sure whether we will have some issues here. Since postgres_fdw can also be used for independently maintained postgres servers(may be with different versions), we must have a mechanism to know the compatibility.

>
>  From my point of view, it will be nice to have flag in postgres_fdw
> server indicating that foreign and remote servers are identical
> and treat all functions as shippable in this case (not only built-in
> ones are belonging to explicitly specified shippable extensions).
> It will simplify using postres_fdw in clusters and makes it more efficient.
>

I think it's better not to have a flag for this. As we have to deal with the compatibility not only at the server version level, but also at each function level. We could have something like a configuration file which allows the user to specify the list of functions that are safely pushable to remote in his/her own postgres_fdw setup, and let the postgres_fdw refer this configuration file, while checking the pushability of the functions to remote. This way, the user has some control over what's pushed and what's not. Of course, this pushability check can only happen after the mandatory checks happening currently such as remote backend configuration settings such as collations etc.

Feel free to correct me.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: jsonb, collection & postgres_fdw

От
Ashutosh Bapat
Дата:
On Mon, Aug 17, 2020 at 7:32 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Aug 14, 2020 at 12:46 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> >
> > Right now postgres_fdw treat as shippable only builtin functions or
> > functions from extensions explicitly specified as shippable extensions
> > in parameters of this FDW server. So I do no see a problem here. Yes,
> > foreign server may have different version of Postgres which doesn't have
> > this built-in function or its  profile is different. It can happen if
> > postgres_fdw is used to connect two different servers which are
> > maintained independently. But in most cases I think, postgres_fdw is
> > used to organize some kind of cluster. In this case all nodes are
> > identical (hardware, OS, postgres version) and performance is very
> > critical (because scalability - of one of the goal of replacing single
> > node with cluster).
> > This is why push down of predicates is very critical in this case.
> >
>
> Agree, push down of predicates(with functions) to the remote backend helps a lot. But, is it safe to push all the
functions?For instance, functions that deal with time/time zones, volatile functions etc. I'm not exactly sure whether
wewill have some issues here. Since postgres_fdw can also be used for independently maintained postgres servers(may be
withdifferent versions), we must have a mechanism to know the compatibility. 
>
> >
> >  From my point of view, it will be nice to have flag in postgres_fdw
> > server indicating that foreign and remote servers are identical
> > and treat all functions as shippable in this case (not only built-in
> > ones are belonging to explicitly specified shippable extensions).
> > It will simplify using postres_fdw in clusters and makes it more efficient.
> >
>
> I think it's better not to have a flag for this. As we have to deal with the compatibility not only at the server
versionlevel, but also at each function level. We could have something like a configuration file which allows the user
tospecify the list of functions that are safely pushable to remote in his/her own postgres_fdw setup, and let the
postgres_fdwrefer this configuration file, while checking the pushability of the functions to remote. This way, the
userhas some control over what's pushed and what's not. Of course, this pushability check can only happen after the
mandatorychecks happening currently such as remote backend configuration settings such as collations etc. 

I agree with most of this. We need a way for a user to tell us which
function is safe to be executed on the foreign server (not just
postgres_fdw, but other kinds of FDWs as well). But maintaining that
as a configurable file and associating safety with an FDW isn't
sufficient. We should maintain that as a catalog. A function may be
safe to push down based on the FDW (a given function always behaves in
the same way on any of the servers of an FDW as its peer locally), or
may be associated with a server (a function is available and behaves
same as its local peer on certain server/s but not all). Going further
a local function may map to a function with a different name on the
remote server/fdw, so that same catalog may maintain the function
mapping. An FDW may decide to cache relevant information, update the
catalog using IMPORT FOREIGN SCHEMA(or ROUTINE), or add some defaults
when installing the extension.

More details are required to be worked out but here my initial thoughts on this.


--
Best Wishes,
Ashutosh Bapat



Re: jsonb, collection & postgres_fdw

От
Ashutosh Bapat
Дата:


On Tue, 18 Aug 2020 at 17:36, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Mon, Aug 17, 2020 at 7:32 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Aug 14, 2020 at 12:46 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> >
> > Right now postgres_fdw treat as shippable only builtin functions or
> > functions from extensions explicitly specified as shippable extensions
> > in parameters of this FDW server. So I do no see a problem here. Yes,
> > foreign server may have different version of Postgres which doesn't have
> > this built-in function or its  profile is different. It can happen if
> > postgres_fdw is used to connect two different servers which are
> > maintained independently. But in most cases I think, postgres_fdw is
> > used to organize some kind of cluster. In this case all nodes are
> > identical (hardware, OS, postgres version) and performance is very
> > critical (because scalability - of one of the goal of replacing single
> > node with cluster).
> > This is why push down of predicates is very critical in this case.
> >
>
> Agree, push down of predicates(with functions) to the remote backend helps a lot. But, is it safe to push all the functions? For instance, functions that deal with time/time zones, volatile functions etc. I'm not exactly sure whether we will have some issues here. Since postgres_fdw can also be used for independently maintained postgres servers(may be with different versions), we must have a mechanism to know the compatibility.
>
> >
> >  From my point of view, it will be nice to have flag in postgres_fdw
> > server indicating that foreign and remote servers are identical
> > and treat all functions as shippable in this case (not only built-in
> > ones are belonging to explicitly specified shippable extensions).
> > It will simplify using postres_fdw in clusters and makes it more efficient.
> >
>
> I think it's better not to have a flag for this. As we have to deal with the compatibility not only at the server version level, but also at each function level. We could have something like a configuration file which allows the user to specify the list of functions that are safely pushable to remote in his/her own postgres_fdw setup, and let the postgres_fdw refer this configuration file, while checking the pushability of the functions to remote. This way, the user has some control over what's pushed and what's not. Of course, this pushability check can only happen after the mandatory checks happening currently such as remote backend configuration settings such as collations etc.
I agree with most of this. We need a way for a user to tell us which
function is safe to be executed on the foreign server (not just
postgres_fdw, but other kinds of FDWs as well). But maintaining that
as a configurable file and associating safety with an FDW isn't
sufficient. We should maintain that as a catalog. A function may be
safe to push down based on the FDW (a given function always behaves in
the same way on any of the servers of an FDW as its peer locally), or
may be associated with a server (a function is available and behaves
same as its local peer on certain server/s but not all). Going further
a local function may map to a function with a different name on the
remote server/fdw, so that same catalog may maintain the function
mapping. An FDW may decide to cache relevant information, update the
catalog using IMPORT FOREIGN SCHEMA(or ROUTINE), or add some defaults
when installing the extension.

While looking at something else in postgres_fdw, I came across an old feature which I had completely forgotten about. We allow extensions to be added to server options. Any object belonging to these extensions, including functions, can be shipped to the foreign server. See postres_fdw/sql/postgres_fdw.sql for examples. This is an awkward way since there is no way to control individual functions and a UDF has to be part of an extension to be shippable. It doesn't provide flexibility to map a local function to a remote one if their names differ. But we have something. May be we could dig past conversations to understand why it was done this way.

--
Best Wishes,
Ashutosh