Обсуждение: use_remote_estimate usage for join pushdown in postgres_fdw

Поиск
Список
Период
Сортировка

use_remote_estimate usage for join pushdown in postgres_fdw

От
Ashutosh Bapat
Дата:
Hi All,
postgres_fdw documentation says following about use_remote_estimate (http://www.postgresql.org/docs/devel/static/postgres-fdw.html)
--
use_remote_estimate
This option, which can be specified for a foreign table or a foreign server, controls whether postgres_fdw issues remote EXPLAIN commands to obtain cost estimates. A setting for a foreign table overrides any setting for its server, but only for that table. The default is false.
--

I am trying to see, how should we use this option in the context of join pushdown and for
that matter any pushdown involving more than one table.

I came up with following arguments
1. Foreign base relations derive their use_remote_estimate setting either from the server setting or the per table setting. A join between two foreign relations should derive its use_remote_estimate setting from the joining relations (recursively). This means that we will use EXPLAIN to estimate costs of join if "all" the involved base foreign relations have use_remote_estimate true (either they derive it from the server level setting or table level setting).

2. Similar to 1, but use EXPLAIN to estimate costs if "any" of the involved base foreign relations have use_remote_estimate is true.

3. Since join between two foreign relations is not a table level phenomenon, but a server level phenomenon, we should use server level setting. This means that we will use EXPLAIN output to estimate costs of join if the foreign server has use_remote_estimate true, irrespective of the setting for individual foreign relations involved in that join.

Unfortunately the documentation and comments in code do not say much about the intention (i.e. why and how is this setting expected to be used) of this setting in the context or server.

The intention behind server level setting is more confusing. It does not override table level setting, so it is not intended to be used for a prohibitive reason like e.g. server doesn't support EXPLAIN the way it will be interpreted locally. It seems to act more like a default in case table level setting is absent. User may set table level use_remote_estimate to true, if cost of EXPLAIN is very small compared to that of table scan (with or without indexes) or adding conditional clauses to the query alters the costs heavily that the cost of EXPLAIN itself is justified. But I can be wrong about these intentions.

If we go by the above intention behind table level setting, 2nd argument makes more sense as the table for which use_remote_estimate is true, can change the cost of join heavily because of the clauses in the join and it's better to get it from the foreign server than guessing it locally.

Comments/suggestions are welcome.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: use_remote_estimate usage for join pushdown in postgres_fdw

От
Robert Haas
Дата:
On Fri, Dec 11, 2015 at 4:44 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> Hi All,
> postgres_fdw documentation says following about use_remote_estimate
> (http://www.postgresql.org/docs/devel/static/postgres-fdw.html)
> --
> use_remote_estimate
> This option, which can be specified for a foreign table or a foreign server,
> controls whether postgres_fdw issues remote EXPLAIN commands to obtain cost
> estimates. A setting for a foreign table overrides any setting for its
> server, but only for that table. The default is false.
> --
>
> I am trying to see, how should we use this option in the context of join
> pushdown and for
> that matter any pushdown involving more than one table.
>
> I came up with following arguments
> 1. Foreign base relations derive their use_remote_estimate setting either
> from the server setting or the per table setting. A join between two foreign
> relations should derive its use_remote_estimate setting from the joining
> relations (recursively). This means that we will use EXPLAIN to estimate
> costs of join if "all" the involved base foreign relations have
> use_remote_estimate true (either they derive it from the server level
> setting or table level setting).
>
> 2. Similar to 1, but use EXPLAIN to estimate costs if "any" of the involved
> base foreign relations have use_remote_estimate is true.
>
> 3. Since join between two foreign relations is not a table level phenomenon,
> but a server level phenomenon, we should use server level setting. This
> means that we will use EXPLAIN output to estimate costs of join if the
> foreign server has use_remote_estimate true, irrespective of the setting for
> individual foreign relations involved in that join.
>
> Unfortunately the documentation and comments in code do not say much about
> the intention (i.e. why and how is this setting expected to be used) of this
> setting in the context or server.
>
> The intention behind server level setting is more confusing. It does not
> override table level setting, so it is not intended to be used for a
> prohibitive reason like e.g. server doesn't support EXPLAIN the way it will
> be interpreted locally. It seems to act more like a default in case table
> level setting is absent. User may set table level use_remote_estimate to
> true, if cost of EXPLAIN is very small compared to that of table scan (with
> or without indexes) or adding conditional clauses to the query alters the
> costs heavily that the cost of EXPLAIN itself is justified. But I can be
> wrong about these intentions.
>
> If we go by the above intention behind table level setting, 2nd argument
> makes more sense as the table for which use_remote_estimate is true, can
> change the cost of join heavily because of the clauses in the join and it's
> better to get it from the foreign server than guessing it locally.
>
> Comments/suggestions are welcome.

I like option #2.  I don't really have a strong reason for that, but
it feels intuitive to me that we err on the side of using remote
estimates when in doubt.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: use_remote_estimate usage for join pushdown in postgres_fdw

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> I like option #2.  I don't really have a strong reason for that, but
> it feels intuitive to me that we err on the side of using remote
> estimates when in doubt.

If we believe that, why isn't the default value of use_remote_estimate true?
(Maybe it should be.)

Another option that should be considered is that joins should pay
attention only to the server-level setting and not to the individual
tables' settings.  This would surely be cheaper to implement, and
it avoids any questions about whether to OR or AND the individual
settings.
        regards, tom lane



Re: use_remote_estimate usage for join pushdown in postgres_fdw

От
Robert Haas
Дата:
On Wed, Dec 16, 2015 at 1:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I like option #2.  I don't really have a strong reason for that, but
>> it feels intuitive to me that we err on the side of using remote
>> estimates when in doubt.
>
> If we believe that, why isn't the default value of use_remote_estimate true?
> (Maybe it should be.)
>
> Another option that should be considered is that joins should pay
> attention only to the server-level setting and not to the individual
> tables' settings.  This would surely be cheaper to implement, and
> it avoids any questions about whether to OR or AND the individual
> settings.

That was Ashutosh's option #3.

use_remote_estimate is a pretty expensive option, which is why it's
not on by default.  But if you are willing to spend that effort for a
scan of table A parameterized by a value from table B, it seems likely
to me that you are also willing to spend the effort to accurately cost
a pushed-down join of A and B.  Actually, it seems like it would be
more surprising if you weren't: we're willing to accurately cost
iterating the scan of B, but not pushing the whole join down?  Hmm.

That's an arguable position, of course.

Actually, I think that neither use_remote_estimate nor
!use_remote_estimate is a particularly great option.
!use_remote_estimate produces results that are basically pulled out of
a hat.  use_remote_estimate produces good estimates, but it's pretty
expensive for a planning operation.  I'd like to have some other
alternative, like a local cache of metadata that we can consult when
!use_remote_estimate instead of just making things up, which might
tell us things like what indexes exist on the remote side.  But that's
a different project.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: use_remote_estimate usage for join pushdown in postgres_fdw

От
Ashutosh Bapat
Дата:


On Wed, Dec 16, 2015 at 11:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
> I like option #2.  I don't really have a strong reason for that, but
> it feels intuitive to me that we err on the side of using remote
> estimates when in doubt.

If we believe that, why isn't the default value of use_remote_estimate true?
(Maybe it should be.)

Another option that should be considered is that joins should pay
attention only to the server-level setting and not to the individual
tables' settings.  This would surely be cheaper to implement, and
it avoids any questions about whether to OR or AND the individual
settings.


To implement server-level setting, we will need to pull it out of ForeignServer structure like
 442     foreach(lc, fpinfo->server->options)
 443     {
 444         DefElem    *def = (DefElem *) lfirst(lc);
 445
 446         if (strcmp(def->defname, "use_remote_estimate") == 0)
 447             fpinfo->use_remote_estimate = defGetBoolean(def);
 ...
 455     }

whereas use_remote_estimate setting for joining relations is readily available in PgFdwRelationInfo

 58     /* Options extracted from catalogs. */
 59     bool        use_remote_estimate;
 60     Cost        fdw_startup_cost;
 61     Cost        fdw_tuple_cost;
 62     List       *shippable_extensions;   /* OIDs of whitelisted extensions */
 ...
 76 } ;

This use_remote_estimate is true if server level option is true or table level option is true.

ANDing or ORing use_remote_estimate from the joining relations' PgFdwRelationInfo looks cheaper than pulling it out of ForeignServer structure.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company