Re: WIP: Join push-down for foreign tables

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: WIP: Join push-down for foreign tables
Дата
Msg-id 4EC4CDC0.6040609@enterprisedb.com
обсуждение исходный текст
Ответ на Re: WIP: Join push-down for foreign tables  (Shigeru Hanada <shigeru.hanada@gmail.com>)
Ответы Re: WIP: Join push-down for foreign tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On 15.11.2011 19:16, Shigeru Hanada wrote:
> This is the second effort for $SUBJECT.  Attached patch requires
> pgsql_fdw patches[1] to be applied previously.  This patch provides:
>
> * Changes for backend
>    * Add new planner node ForeignJoinPath and related routines.  In
>      current design, planner consider all of possible join combinations
>      between foreign tables, similar to local joins such as nested loop,
>      hash join and merge join.  And if foreign join is cheapest, planner
>      produces a ForeignScan plan node for a join.  So executor is not
>      modified heavily since 9.1.
>    * Add new FDW callback for planning join push-down between foreign
>      tables on same server.  This function is optional, and allowed to
>      return NULL to tell planner that that join can't be handled by the
>      FDW.

So the way a three-way join is planned, is that the planner first asks 
the FDW to plan ForeignPaths of scanning the individual tables. Then it 
asks the FDW to consider pairwise joins of those ForeignPaths. Then it 
asks the FDW to consider joins of the constructed ForeignPaths and 
ForeignJoinPaths. Ie. the plan involving a join of three or more remote 
tables is built bottom-up, just like a join of local tables.

When the FDW recognizes it's being asked to join a ForeignJoinPath and a 
ForeignPath, or two ForeignJoinPaths, it throws away the old SQL it 
constructed to do the two-way join, and builds a new one to join all 
three tables. That seems tedious, when there are a lot of tables 
involved. A FDW like the pgsql_fdw that constructs an SQL query doesn't 
need to consider pairs of joins. It could just as well build the SQL for 
the three-way join directly. I think the API needs to reflect that.

I wonder if we should have a heuristic to not even consider doing a join 
locally, if it can be done remotely. For a query like this:

SELECT * FROM remote1 a, remote2 b, remote3 c WHERE a.id = b.id AND c.id 
= b.id

it's quite obvious that the best plan is to do the join remotely, rather 
than pull all the rows from all tables, and do the join locally. In 
theory, if the remote database is remarkably bad at performing joins, it 
might be faster to pull in all the data and do it locally, but I can't 
really imagine that happening in practice.

> * Changes for pgsql_fdw
>    * Implemente PlanForeignJoin callback function.

A couple of basic bugs I bumped into:

* WHERE-clause building fails on a cartesian product ("SELECT * FROM 
remote1, remote2")

* The join planning in pgsql_fdw seems to get confused and gives up if 
there are any local tables also involved in the query (e.g "explain 
SELECT * FROM remote1, remote2 LEFT OUTER JOIN local1 on (local1.a = 
remote2.a) WHERE remote1.a = remote2.a;")

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Hitoshi Harada
Дата:
Сообщение: Re: (PATCH) Adding CORRESPONDING to Set Operations
Следующее
От: "Albe Laurenz"
Дата:
Сообщение: Re: Disable OpenSSL compression