Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Дата
Msg-id CAFjFpRdHgeNOhM0AB6Gxz1eVx_yOqkYwuKddZeB5vPzfBaeCnQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Ответы Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi All,
It's been a long time since last patch on this thread was posted. I have started
to work on supporting join pushdown for postgres_fdw. Attached please find three
patches
1. pg_fdw_core.patch: changes in core related to user mapping handling, GUC
                      enable_foreignjoin
2. pg_fdw_join.patch: postgres_fdw changes for supporting join pushdown
3. pg_join_pd.patch: patch which combines both of these for easy testing.

This mail describes the high level changes included in the patch.

GUC enable_foreignjoin
======================
Like enable_*join GUCs, this GUC when ON allows planner to consider pushing down
joins to the foreign server. If OFF, foreign join paths will not be considered.

Building joinrel for joins involving foreign relations
======================================================
RelOptInfo gets a new field umid for user mapping id used for the given foreign
relation. For base relations it's set to the user mapping of effective user and
foreign server for that relation. While building RelOptInfo for a join between
two foreign relations, if server and user mapping of joining relations are same,
they are copied to the RelOptInfo of the join relation being built. Also,
fdwroutine is set in joinrel to indicate that the core code considers this join
as pushable. It should suffice just to check equality of the user mapping oid
since same user mapping implies same server.

Since user mapping oid is available in RelOptInfo, FDWs can get user
mapping information by using this oid, new function GetUserMappingById()
facilitates that.

Generating paths
================
If fdwroutine is set in joinrel, add_paths_to_joinrel() calls
GetForeignJoinPaths hook of corresponding FDW. For postgres_fdw the hook is
implemented by function postgresGetForeignJoinPaths(). Follows the description
of this function.

In order to avoid considering same joinrel again, fdw_private member of
RelOptInfo is set by this function and populated with FDW specific information
about joinrel. When this member is set, it indicates that the pushable paths
have been considered for the given joinrel.

A join between two foreign relations is considered safe to push down if
1. The joining sides are pushable
2. The type of join is OUTER or INNER (LEFT/RIGHT/FULL/INNER). SEMI and ANTI
   joins are not considered right now, because of difficulties in constructing
   the queries involving those. The join clauses of SEMI/ANTI joins are not in a
   form that can be readily converted to IN/EXISTS/NOT EXIST kind of expression.
   We might consider this as future optimization.
3. Joining sides do not have clauses which can not be pushed down to the foreign
   server. For an OUTER join this is important since those clauses need to be
   applied before performing the join and thus join can not be pushed to the
   foreign server. An example is
   SELECT * FROM ft1 LEFT JOIN (SELECT * FROM ft2 where local_cond) ft2 ON (join clause)
   Here the local_cond on ft2 needs to be executed before performing LEFT JOIN
   between ft1 and ft2.
   This condition can be relaxed for an INNER join by pulling the local clauses
   up the join tree. But this needs more investigation and is not considered in
   this version.
4. The join conditions (e.g. conditions in ON clause) are all safe to push down.
   This is important for OUTER joins as pushing down join clauses partially and
   applying rest locally changes the result. There are ways [1] by which partial
   OUTER join can be completed by applying unpushable clauses locally and then
   nullifying the nullable side and eliminating duplicate non-nullable side
   rows. But that's again out of scope of first version of postgres_fdw join
   pushdown.

A ForeignPath is created for a safe-to-push-down join. Recursively applying this
procedure ends in having a single ForeignPath node for whole pushable join tree,
which may represent a join between more than two foreign tables.

Generating plan
===============
postgresGetForeignPlan() is used to create plan from path chosen by the
optimizer for both joins and base relation scans. This function constructs the SQL to
be sent to the remote node and create ForeignPlan node.

The function first separates given scan_clauses into pushable and safe-to-push
clauses. For joinrel baserestrictlist is NIL and we are not considering
parameterized paths, so there are no scan clauses expected. Next the function
constructs the SQL to be sent to the foreign server. Then it constructs the
ForeignScan node. Rest of this section describes the logic to construct the SQL
for join; the logic is implemented as function deparseSelectSqlForRel().

deparseSelectSqlForRel() builds the SQL for given joinrel (and now for baserel
asd well) recursively.
For joinrels
1. it constructs SQL representing either side of join, by calling itself
   in recursive fashion.
2. These SQLs are converted into subqueries and become part of the FROM clause
   with appropriate JOIN type and clauses. The left and right subqueries are
   given aliases "l" and "r" respectively. The columns in each subquery are
   aliased as "a1", "a2", "a3" and so on. Thus the third column on left side can
   be referenced as "l.a3" at any recursion level.
3. Targetlist is added representing the columns in the join result expected at
   that level.
4. The join clauses are added as part of ON clause
5. Any clauses that planner has deemed fit to be evaluated at that level of join
   are added as part of WHERE clause.

It uses the same old logic for deparsing SQL for base relations, except for the
deparsing the targetlist. When plan is being constructed only for a base
relation, the targetlist (SELECT clause) is constructed by including all the
columns by looking at attrs_used. This does not work when the base relation is
part of the join being pushed down, since the join targetlist depend upon the
targetlist in RelOptInfo of base relation and not necessarily the targetlist
obtained from attrs_used.

Row marks
---------
Because of recursive nature of SQL, the names of relations referenced in row
marks are not available at the top level in SQL built for a given top level join
relation. Hence we have to add FOR SHARE/UPDATE clauses to the subqueries built
for foreign base relations. This causes all the rows participating in join (not
the join result) from the base relations to get locked (on foreign server).
Ideally for a top level row mark clause, we should be locking only those rows
(on foreign server) which are part of the top level join result. But that
requires flattening of the FROM clause constructed, which would require some
signficant intelligence in the deparser code. I have left this out of scope for
at least this version of the patch.

Examples of remote SQL can be found in the expected output of regression test postgres_fdw.sql.

Foreign plan execution
======================
For pushed down joins, the tuple descriptor for the result is obtained from the
targetlist of the plan. Accordingly the error callback and result handling has
been modified.

Explain output for a pushed down join provides the join expression that it
represents.

TODOs
=====
This patch is very much WIP patch to show case the approach and invite early
comments. I will continue to improve the patch and some of the areas that will
be improved are
1. Costing of foreign join paths.
2. Various TODOs in the patch, making it more readable, finishing etc.
3. Tests
4. Any comments/suggestions on approach or the attached patch.

In another thread Robert, Fujita-san and Kaigai-san are discussing about
EvalPlanQual support for foreign joins. Corresponding changes to postgres_fdw
will need to be added once those changes get committed.

Items that will be considered in subsequent patches for 9.6
===========================================================
1. Parameterized paths for join: For regular joins, the parameterized paths for
   joins consider the parameterization of the joining paths. In case of foreign
   join, we do not consider any specific paths for the joining foreign
   relations and it acts as a scan. It should be treated as if parameterizing a
   scan and not like regular join. This requires some work and even without that
   the functionality supported by this patch is quite useful. So, I have left it
   out of the scope for this patch and will be considered in subsequent patch.

2. Foreign join paths with pathkeys: There's my patch pending for considering
   pathkeys for foreign base relation scan. The support for foreign join paths
   with pathkeys will added once that patch gets committed.


Suggestions/comments are welcome.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Rushabh Lathia
Дата:
Сообщение: Re: Getting sorted data from foreign server for merge join
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Parallel Seq Scan